UBC Theses and Dissertations



Perception of lexical tones by homeland and heritage speakers of Cantonese
Lam, Wai Man, 2018



Perception of lexical tones by homeland and heritage speakers of Cantonese

by

Wai Man Lam

B.A., The Chinese University of Hong Kong, 2004
M.Phil., The Chinese University of Hong Kong, 2007

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES

(Linguistics)

The University of British Columbia
(Vancouver)

November 2018

© Wai Man Lam, 2018

The following individuals certify that they have read, and recommend to the Faculty of Graduate and Postdoctoral Studies for acceptance, the dissertation entitled:

Perception of lexical tones by homeland and heritage speakers of Cantonese

submitted by Wai Man Lam in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Linguistics.

Examining Committee:

Kathleen Currie Hall, Department of Linguistics
Co-supervisor

Douglas Pulleyblank, Department of Linguistics
Co-supervisor

Molly Babel, Department of Linguistics
Supervisory Committee Member

Valter Ciocca, School of Audiology and Speech Sciences
University Examiner

Márton Sóskuthy, Department of Linguistics
University Examiner

Abstract

This dissertation compares the lexical tone perception abilities of two populations with different bilingual configurations. The first population consists of Cantonese-dominant adults who grew up in Hong Kong (referred to as homeland speakers). The second consists of English-dominant adults who grew up in a Cantonese-speaking household in Canada (heritage speakers). From infancy, both groups were exposed to Cantonese as their chronologically first language. After the onset of schooling, however, each became dominant in the majority language of its own society. Given this background, this study investigates whether heritage speakers' perception of lexical tones in a non-dominant first language (Cantonese) shows cross-language effects from a dominant second language (English) that has no contrastive dimension of tone.

A series of perception experiments was conducted using the word identification paradigm. Eight types of audio stimuli were presented to homeland and heritage speakers (N=34 per group). Each stimulus type represented a specific configuration of four variables:
- whether the acoustic signal contained segmental information;
- whether it contained tonal information;
- whether the target word was isolated or embedded in a carrier sentence with semantic context;
- whether the meaning of the target word was congruous with the carrier sentence.
In each trial, participants saw pictures of the target word and of minimally contrastive tonal competitors. They were instructed to choose the picture that represented what they heard.

The major findings of this study were as follows.
(1) Among the eight stimulus types, the accuracy gap between the two groups was largest when the stimuli were low-pass-filtered monosyllables with no segmental information or semantic context. This suggests that homeland speakers have a significantly greater ability to identify tonally contrastive words by relying solely on tonal information.
(2) Both groups confused overlapping subsets of tone pairs, but heritage speakers had a higher error percentage. This indicates a quantitative but not qualitative difference between the two groups.
(3) When the target word was semantically incongruous with the carrier sentence, homeland speakers outperformed heritage speakers by attending to acoustic information, while heritage speakers relied on semantic information relatively more often.
In other words, the two groups used different listening strategies in tone identification.

Lay Summary

The purpose of this research is to investigate the effects of post-childhood linguistic experience on bilingual speakers' ability to perceive the speech sounds of a first language that uses pitch variation to distinguish word meaning. Sixty-eight young adults raised by Cantonese-speaking parents performed a listening task. Cantonese words and sentences were played, and participants were instructed to select a picture that represented what they heard. Half of the participants grew up in Hong Kong. The other half grew up in Canada and generally felt more comfortable with English. Both groups were exposed to Cantonese from birth. Even so, the Hong Kong group was better at paying attention to the pitch of a word, while the Canadian group relied on the overall meaning of a sentence relatively more often. These results inform our understanding of how multilingual competence works in the human mind, especially in a world where migration is common.

Preface

This dissertation is an original intellectual product of the author, Wai Man Lam. The name Zoe Wai-Man Lam is also used in other published works by the author.

All projects and associated methods were approved by the Behavioural Research Ethics Board of the University of British Columbia [certificate #H16-00297]. All data collection took place in the Speech in Context Laboratory at the University of British Columbia, Vancouver.

A preliminary version of Chapter 1, Chapter 4, and Chapter 5 was presented at the 29th North American Conference on Chinese Linguistics, held at Rutgers University, New Jersey, in June 2017.

Table of Contents

Abstract
Lay Summary
Preface
Table of Contents
List of Tables
List of Figures
The LSHK Cantonese Romanization Scheme
List of Abbreviations
Glossary
Acknowledgments
Dedication
1 Introduction
  1.1 Cantonese in Canada
  1.2 Heritage speakers of Cantonese
  1.3 Goals of this dissertation
  1.4 The Cantonese language
    1.4.1 Origin and spread
    1.4.2 Is Cantonese a language?
    1.4.3 Segmental inventories
    1.4.4 Syllable structure
    1.4.5 Tones
    1.4.6 Romanization
    1.4.7 Writing system
    1.4.8 Summary
  1.5 The structure of this dissertation
2 Who Are Heritage Speakers?
  2.1 Defining key terms
    2.1.1 Heritage languages
    2.1.2 Bilingualism
    2.1.3 Language dominance
    2.1.4 Heritage and homeland speakers
  2.2 Configurations on the bilingual continuum
    2.2.1 Configuration A: Monolinguals
    2.2.2 Configurations B to D: L1-dominant bilinguals
    2.2.3 Configuration E: Perfectly balanced bilinguals
    2.2.4 Configurations F to H: L2-dominant bilinguals
    2.2.5 Configuration I: Replacive bilinguals
    2.2.6 Summary
3 What Is Tonal Perception?
  3.1 The acoustic and perceptual aspects of lexical tone
  3.2 Perception of Cantonese tones by Cantonese-learning infants and children
  3.3 Perception of Cantonese tones by adult homeland speakers
    3.3.1 Acoustic and perceptual correlates of tone identity
    3.3.2 Tone mergers
  3.4 Perception of Cantonese tones by non-Cantonese speakers
  3.5 Tone and heritage speakers of Cantonese
  3.6 Hypotheses to be tested
4 Methodology
  4.1 An overview of the experimental design
    4.1.1 The word identification paradigm
    4.1.2 Variables being controlled
    4.1.3 Stimulus types
    4.1.4 Summary
  4.2 Pilot Study 1: Familiarity with target words
    4.2.1 Background and purpose
    4.2.2 Procedures
    4.2.3 Participants
    4.2.4 Results
  4.3 Pilot Study 2: Semantic congruity of sentences
    4.3.1 Background and purpose
    4.3.2 Procedures
    4.3.3 Participants
    4.3.4 Results
  4.4 Main study
    4.4.1 Materials
    4.4.2 Procedures
    4.4.3 Participants
5 Results
  5.1 Overview: Hypothesis testing with generalized logistic mixed models
  5.2 Response to Research Question 1: Accuracy
    5.2.1 With vs. without context
    5.2.2 With vs. without congruity
    5.2.3 With vs. without segmental information
    5.2.4 With vs. without tonal information
    5.2.5 Interim summary
  5.3 Response to Research Question 2: Confusion patterns
    5.3.1 How to read a confusion matrix
    5.3.2 The Mantel test for comparing global similarity of matrices
    5.3.3 Comparison of confusion patterns
    5.3.4 Interim summary
  5.4 Response to Research Question 3: Use of acoustic and semantic cues
6 Discussion and conclusion
  6.1 Summary of research findings
  6.2 Discussion and implications
    6.2.1 Sound change trends in heritage Cantonese
    6.2.2 Tonal perception and heritage bilingualism
    6.2.3 Language pedagogy for heritage learners of Cantonese
  6.3 Conclusions
References
Appendix A Materials Used in the Experiment
  A.1 Words
  A.2 Sentences
  A.3 Pictures
  A.4 Instructions
    A.4.1 Written Cantonese
    A.4.2 Romanization
    A.4.3 English translation
  A.5 Story used for the story listening task
    A.5.1 Written Cantonese
    A.5.2 Romanization
    A.5.3 English Translation
Appendix B Language background questionnaire

List of Tables

Table 1.1 Top five non-official languages spoken at home in Canada in the 2016 Census (Statistics Canada, 2017c)
Table 1.2 Top five non-official languages spoken at home in Metro Vancouver in the 2016 Census (Statistics Canada, 2017b)
Table 1.3 Top 10 countries of birth of recent immigrants, 1981–2006 (Statistics Canada, 2009)
Table 1.4 The phonemic consonant inventory of Cantonese
Table 1.5 Examples of Cantonese syllables
Table 1.6 The phonemic tone inventory of Cantonese; tone numerals are based on Bauer & Benedict (1997)
Table 1.7 Allotones in Cantonese; tone numerals are based on Bauer & Benedict (1997)
Table 1.8 Derived high rising tone in attenuative reduplication
Table 1.9 The LSHK Cantonese Romanization Scheme (Jyutping)
Table 3.1 The phonemic tone inventory of Cantonese
Table 4.1 A summary of stimulus types and procedures of the main study
Table 4.2 Summary of stimulus types, arranged by the anticipated accuracy gap between homeland and heritage speakers (from smallest to largest)
Table 4.3 Minimal sextuplets used in previous studies
Table 4.4 Tonal quadruplets used in the current study
Table 4.5 Comparison of word frequency per one million words and familiarity ratings of three English words (Gernsbacher, 1984)
Table 4.6 Word frequency of target words out of a total of 180,000 word tokens in the Hong Kong Cantonese Corpus (Luke & Wong, 2015)
Table 4.7 Examples of carrier phrases and (in)congruous target words
Table 4.8 An example of how tone-button correspondence was counterbalanced for a tone set
Table 4.9 Examples of practice trials on Day 1
Table 4.10 Examples of practice trials on Day 2
Table 4.11 A sample of the third experimental block representing the tone set [2 3 4 6]
Table 4.12 Calculation of Subject #345’s language dominance score
Table 4.13 Age of included participants (in years)
Table 4.14 t-test comparison of homeland and heritage speakers’ self-rated language proficiency on a scale of 0–6: 0=“not well at all” and 6=“very well”
Table 5.1 Fixed and random effects of three generalized logistic mixed models predicting accuracy
Table 5.2 Summary of fixed effects of Model I, a generalized logistic mixed model that included the interaction of is there tone and population, predicting accuracy
Table 5.3 Summary of fixed effects of Model II, a generalized logistic mixed model that included the interaction of is there segment and population, predicting accuracy
Table 5.4 Summary of fixed effects of Model III, a generalized logistic mixed model that included the interaction of context and population, predicting accuracy
Table 5.5 Recap of stimulus types and predicted results
Table 5.6 Accuracy rates arranged by effect size in form of Cohen’s d (smallest to largest)
Table 5.7 Interpretation of Cohen’s d (Cohen, 1988; Sawilowsky, 2009)
Table 5.8 Stimulus types with all acoustic information
Table 5.9 Stimulus types with no tone
Table 5.10 Stimulus types with no segments
Table 5.11 Stimulus types with context and all acoustic information
Table 5.12 Stimulus types with context and tonal information only
Table 5.13 Stimulus types with no context
Table 5.14 Stimulus types with context and congruity
Table 5.15 Stimulus types with context but no congruity
Table 5.16 Stimulus types with segmental information but no context
Table 5.17 Stimulus types with segments, context, and congruity
Table 5.18 Summary of stimulus types and variables, arranged by effect size in form of Cohen’s d (smallest to largest)
Table 5.19 Interpretation of the Mantel r statistic (Mantel, 1967)
Table 5.20 Procedures to implement the Mantel test, adapted from Tang (2015)
Table 5.21 Toy Matrix D at Step 1 (raw counts)
Table 5.22 Toy Matrix D after Step 2 (smoothing)
Table 5.23 Toy Matrix D after Step 3 (proportion)
Table 5.24 Toy Matrix D after Step 4 (similarity)
Table 5.25 Toy Matrix D after Step 5 (distance)
Table 5.26 Summary of Mantel test results comparing global similarity of homeland and heritage speakers’ confusion matrices; rows were arranged by r values (largest to smallest)
Table 5.27 Counts and percentages of different cues used by homeland and heritage speakers for Type 5B (normal, incongruous sentences) and Type 6B stimuli (the last word of the incongruous sentence has no segments)
Table 5.28 Counts and percentages of two types of incorrect responses for Type 5B (normal, incongruous sentences) and Type 6B stimuli (the last word of the incongruous sentence has no segments)
Table A.1 Words used in the main study
Table A.2 Tonal quadruplets used in the current study (identical to Table 4.4)

List of Figures

Figure 1.1 Language development of homeland and heritage speakers (adapted from Montrul, 2012)
Figure 1.2 Geographical distribution of Yue subgroups based on Wurm et al. (1987)
Figure 1.3 The phonemic vowel inventory of Cantonese
Figure 1.4 Pitch contours of the six phonemic tones produced by a female homeland speaker who participated in the current study. The x-axis represents 100 equally spaced steps in the vocalic portion of a syllable. Average f0 values were extracted from 162 word tokens. Shaded areas around each contour indicate values within a confidence interval of 95%.
Figure 2.1 The bilingual continuum (adapted from Valdés, 2001, p. 41)
Figure 4.1 A summary of dependent and independent variables of this study
Figure 4.2 A screenshot of Pilot Study 1
Figure 4.3 Distribution of homeland and heritage speakers’ familiarity ratings; 1=“not familiar at all”, 4=“very familiar”
Figure 4.4 Comparison of homeland and heritage speakers’ ratings for individual words; 1=“not familiar at all”, 4=“very familiar”
Figure 4.5 A screenshot of Pilot Study 2
Figure 4.6 Results of Pilot Study 2
Figure 4.7 Spectrogram of the syllable fu2 (high rising tone, unmanipulated)
Figure 4.8 Spectrogram of the syllable fu2 (high rising tone, low-pass filter applied)
Figure 4.9 Spectrogram of the syllable fu2 (high rising tone, pitch being reset at 200 Hz)
Figure 4.10 Spectrogram of the syllable fu3 (mid level tone, unmanipulated)
Figure 4.11 A sample picture set: fu1 “exhale”, fu3 “pants”, fu4 “help by holding another person’s arm”, and fu5 “woman”
Figure 4.12 Picture shown during the story listening task
Figure 4.13 An example of the picture learning task
Figure 4.14 Pictures used in practice trials: zoeng1 “piece (of paper)”, zoeng2 “prize”, zoeng3 “sauce”, and zoeng6 “elephant”
Figure 4.15 Procedures to screen and categorize participants
Figure 4.16 Language dominance scores of the two populations
Figure 4.17 Self-rated language proficiency of homeland and heritage speakers on a scale of 0–6; 6=“very well” and 0=“not well at all”
Figure 5.1 A sample boxplot
Figure 5.2 Comparison of stimulus types with all acoustic information
Figure 5.3 Comparison of stimulus types with no tone
Figure 5.4 Comparison of stimulus types with no segments
Figure 5.5 Comparison of stimulus types with context and all acoustic information
Figure 5.6 Comparison of stimulus types with no segments
Figure 5.7 Comparison of stimulus types with no context
Figure 5.8 Comparison of stimulus types with context and congruity
Figure 5.9 Comparison of stimulus types with context but no congruity
Figure 5.10 Comparison of stimulus types with segmental information but no context
Figure 5.11 Comparison of stimulus types with segments, context, and congruity
Figure 5.12 How to interpret a confusion matrix
Figure 5.13 Confusion matrices showing perfect accuracy and accuracy at chance respectively
Figure 5.14 Confusion matrices showing two possibilities of T2-T5 merger
Figure 5.15 Toy matrices demonstrating a strong correlation
Figure 5.16 Toy matrices demonstrating a modest correlation
Figure 5.17 Confusion patterns of homeland and heritage speakers for Type 6B stimuli
Figure 5.18 Confusion patterns of homeland and heritage speakers for Type 5B stimuli
Figure 5.19 Confusion patterns of homeland and heritage speakers for Type 3 stimuli
Figure 5.20 Confusion patterns of homeland and heritage speakers for Type 1 stimuli
Figure 5.21 Confusion patterns of homeland and heritage speakers for Type 2 stimuli
Figure 5.22 Confusion patterns of homeland and heritage speakers for Type 5A stimuli
Figure 5.23 Confusion patterns of homeland and heritage speakers for Type 6A stimuli
Figure 5.24 Confusion patterns of homeland and heritage speakers for Type 4 stimuli
Figure 5.25 Confusion patterns of Subject #320 for Type 4 stimuli
Figure 5.26 Comparison of cues used by homeland and heritage speakers for Type 5B and Type 6B stimuli
Figure A.1 fan1 “share”
Figure A.2 fan2 “powder”
Figure A.3 fan3 “sleep”
Figure A.4 fan4 “tomb”
Figure A.5 fan6 “portion”
Figure A.6 fu1 “exhale”
Figure A.7 fu2 “tiger”
Figure A.8 fu3 “pants”
Figure A.9 fu4 “help by holding another person’s arm”
Figure A.10 fu5 “woman”
Figure A.11 fu6 “negative”
Figure A.12 ji1 “cure”
Figure A.13 ji2 “chair”
Figure A.14 ji4 “infant/child”
Figure A.15 ji5 “ear”
Figure A.16 ji6 “two”
Figure A.17 se2 “write”
Figure A.18 se3 “diarrhea”
Figure A.19 se4 “snake”
Figure A.20 se5 “society”
Figure A.21 se6 “shoot”
Figure A.22 si1 “lion”
Figure A.23 si2 “poop”
Figure A.24 si3 “try”
Figure A.25 si4 “key”
Figure A.26 si5 “market”
Figure A.27 si6 “nurse”
Figure A.28 zoeng1 “piece (of paper)”
Figure A.29 zoeng2 “prize”
Figure A.30 zoeng3 “sauce”
Figure A.31 zoeng6 “elephant”
Figure A.32 jan4 “human”
Figure A.33 taai3 joeng4 “the sun”
Figure A.34 bak1 fung1 “the north wind”

The LSHK Cantonese Romanization Scheme

Consonants
IPA     Jyutping
p       b
pʰ      p
m       m
f       f
t       d
tʰ      t
n       n
l       l
ts      z
tsʰ     c
s       s
j       j
k       g
kʰ      k
ŋ       ng
h       h
kʷ      gw
kʷʰ     kw
w       w

Vowels and glides
IPA     Jyutping
iː      i
e       i
yː      yu
uː      u
o       u
ɛː      e
ɔː      o
œː      oe
ɵ       eo
ɐ       a
aː      aa
iw      iu
ɵɥ      eoi
uɥ      ui
ej      ei
ɔj      oi
ɐj      ai
ɐw      au
aːj     aai
aːw     aau

Tones
IPA            Jyutping
55 (5) ˥       1
25 ˨˥          2
33 (3) ˧       3
21 ˨˩          4
23 ˨˧          5
22 (2) ˨       6

List of Abbreviations

BLP     Bilingual Language Profile
f0      Fundamental frequency
F2      Second formant
GLMM    Generalized logistic mixed model
Hz      Hertz
IPA     International Phonetic Alphabet
L1      First language (based on the order of acquisition)
L2      Second language (based on the order of acquisition)
LOC     Linguistics Outside the Classroom
LSHK    The Linguistic Society of Hong Kong
ms      Milliseconds
UBC     The University of British Columbia
VOT     Voice onset time

Glossary

bilingual: a person who uses two languages in everyday life, but does not necessarily have equal mastery of listening, speaking, reading, and writing skills in both languages
Bilingual Language Profile: a questionnaire developed by Birdsong, Gertken & Amengual (2012) to elicit language background information and assess language dominance on a gradient scale using four criteria: language history, language use, language proficiency, and language attitudes
dominant language: a person’s default or preferred language for speaking and thinking
heritage language: any language that has a cultural connection to an individual’s family or community, but is not the primary language used in government, education, and public communication
heritage speaker: a bilingual who was exposed to a heritage language early in life, but later became dominant in the majority language of the society
homeland speaker: a person whose default or preferred language for speaking and thinking is the language that s/he was exposed to early in life; during the period of exposure, this language not only has a cultural connection to his/her family or community, but is also the language of primary use in government, education, and public communication
lexical tone: the use of pitch to distinguish meaning at the word level
majority language: the primary language used in government, education, and public communication in a society
merger: a sound change whereby two or more contrastive phonological categories are replaced by a single category
tonal perception: the process of extracting relevant auditory cues from a continuous speech signal and mapping pitch attributes to discrete phonological categories

Note: These are basic definitions. The complexities of these terms are discussed in Chapter 2 and Chapter 3.

Acknowledgments

This dissertation was completed thanks to the contributions of numerous mentors, colleagues, and friends. I owe my deepest gratitude to my co-supervisors, Kathleen Currie Hall and Douglas Pulleyblank. I cannot thank you enough for guiding me through the early stage of formulating the methodology of this study. Your input made all the difference. I thoroughly enjoyed every one of our meetings, not only because of the intellectual stimulation (also known as “grilling”), but also because of your sense of humour, which never failed to give me a warm boost of morale. Kathleen’s speedy and meticulous comments on my drafts were immensely helpful. The knowledge that I have gained goes far beyond statistical analysis; in retrospect, every small step paved the way to becoming a better scientist with logical and critical thinking. Doug’s insightful feedback reminded me to step back and look at the big picture whenever I was struggling to get the details right. I am extremely grateful for the encouragement that I received when I felt I was not smart enough. From daily interactions with Kathleen and Doug I have learned that academia is not just about being smart; what matters more is hard work, dedication, and humility. Thank you both so much for these life lessons.

I would like to express my greatest appreciation to Molly Babel of my dissertation committee, who piqued my interest in speech perception and sociolinguistic variation.
Your seminars on perceptual adaptation, attention and salience, and heritage linguistics played a major role in inspiring me to pursue this dissertation topic. Your passion and enthusiasm shone through your comments on my writing. Thank you for your comprehensive advice and thoughtful suggestions.

My funding sources certainly deserve acknowledgment. This study was supported by awards and scholarships from the University of British Columbia and by a research grant awarded to Douglas Pulleyblank by the Social Sciences and Humanities Research Council of Canada [#435-2016-0369].

Throughout my graduate career I have benefited from the wisdom of many professors. I am indebted to Martina Wiltschko, who taught me how to be an effective writer. I appreciate Hotze Rullmann’s skill at communicating abstract ideas in the clearest way possible. My heartfelt thanks go to Ping Jiang, Virginia Yip, Robert Bauer, and Cathy Wong, who introduced me to the fascinating world of linguistics and encouraged me to pursue a PhD. I would like to acknowledge invaluable advice from Marjorie Chan, who generously took the time to meet me during conferences. I am tremendously fortunate to have met Henry Yu, whose research on Chinese Canadian history gave me an interdisciplinary perspective on my research.

I would not have been able to finish my PhD without peer support. It was an unforgettably humbling experience to study and work with Blake Allen, Joash Johannes, Adriana Osa-Gómez, Oksana Tkachman, Sihwei Chen, Roger Lo, Michael Fry, Andrei Anghelescu, and Natalie Weber at UBC Linguistics. I am also grateful for the moral support of Sweden Xiao, Michelle Chen, Kamila Kolpashnikova, Tomoharu Hirota, Irene Setiawan, Hyunju Kwon, Nathaniel Lim, Foong Yen Chong, Denise Chan, Siu Pong Cheng, and Hoi Wing Chan.
Thank you all for giving me a push whenever I needed it.

Special thanks to staff, friends, and students at UBC St John’s College, UBC Cantonese Language Program, UBC Asian Canadian and Asian Migration Studies, UBC Graduate Pathways to Success Program, as well as members of the Vancouver Chinatown community. You gave me the momentum to make academic research accessible to the general public.

[To my family] Finally, I sincerely thank my family for their boundless support and understanding, which gave me the space and freedom to pursue my dreams. Over these years I have spent less time with you, yet your love for me has not diminished; instead, it has grown day by day. You will always be the driving force behind my daily efforts.

For my parents
Respectfully dedicated to my late father and my mother

Chapter 1
Introduction

Migration of linguistic communities has led to various language contact phenomena throughout history: cross-continental trade gave rise to pidgins and creoles (Holm, 1989); early maritime explorers brought loanwords into their language out of the need to communicate (Cutler, 2000); and colonization and empire created varieties of English around the globe (Schneider, 2007). This dissertation is generally about immigration, the relevant keyword for the current century, and its byproduct, a population called heritage speakers. The general questions asked in this study are: what happens when children of Cantonese-speaking immigrants grow up in a predominantly English-speaking society? As they enter adulthood, does English affect the way they perceive speech sounds in Cantonese? In particular, will their extensive exposure to a language that does not use pitch to distinguish word meaning (English) affect their perception of a language that does (Cantonese)?

1.1 Cantonese in Canada

A linguistically diverse country like Canada is an ideal place to study language contact induced by immigration. While English and French are the official languages, 207 other languages were reported as a mother tongue in the 2016 Census (Statistics Canada, 2017e). These 207 non-official languages can be divided into two groups: 67 of them are Aboriginal languages, which are traditionally spoken by the First Nations, the Métis, and the Inuit peoples.
The other 140 non-official languages are non-Aboriginal. According to the 2016 Census, 7.3 million people, or 21.1% of the Canadian population, speak a non-official and non-Aboriginal language at home (Statistics Canada, 2017a).

Among these 140 non-official and non-Aboriginal languages, Cantonese is the second most-spoken home language in Canada, with a population of 594,705 (see Table 1.1). In the Metro Vancouver area, Cantonese is the mother tongue of a population of 193,030 (see Table 1.2). Many of these Cantonese speakers came from Hong Kong in the 1980s and 1990s. As Table 1.3 shows, Hong Kong was the top origin of immigrants in 1991 and 1996 (Statistics Canada, 2009). These were the years around major political events, such as the Tiananmen Massacre in 1989 and the handover of Hong Kong’s sovereignty from the United Kingdom to the People’s Republic of China in 1997 (Li, 2005; Wong, 1999). Between 1980 and 2006, a total of 215,430 people emigrated from Hong Kong to Canada (Statistics Canada, 2012).

1.2 Heritage speakers of Cantonese

The focus of this dissertation is individuals who grew up in Canada and were raised by Cantonese-speaking parents who had emigrated from Hong Kong. For the purpose of the current study, these individuals are hereafter referred to as heritage speakers, and Cantonese is considered their heritage language. These terms have long been used in discussions of the social politics of multilingualism, language education, and pedagogy, especially in North America (Cummins & Danesi, 1990; Kagan & Dillon, 2001; Peyton, Ranard & McGinnis, 2001, among others). In recent years, they have also been used in linguistics (Benmamoun, Montrul & Polinsky, 2010, 2013b; Nagy, 2015, among others).
Definitions vary from source to source, but in the most general sense, heritage languages are “languages other than the de facto dominant language in a given social context” (Kelleher, 2010, p.1). For a detailed discussion of the definitions of heritage languages and their speakers from a linguistic perspective, see Chapter 2.

Table 1.1: Top five non-official languages spoken at home in Canada in the 2016 Census (Statistics Canada, 2017c)
Language      Population
1. Mandarin   641,100
2. Cantonese  594,705
3. Punjabi    568,375
4. Spanish    553,495
5. Tagalog    525,375

Table 1.2: Top five non-official languages spoken at home in Metro Vancouver in the 2016 Census (Statistics Canada, 2017b)
Language      Population
1. Cantonese  193,030
2. Mandarin   180,170
3. Punjabi    163,400
4. Tagalog    78,830
5. Korean     47,920

Table 1.3: Top 10 countries of birth of recent immigrants, 1981–2006 (Statistics Canada, 2009)
Rank  1981         1991         1996         2001         2006
1.    U.K.         Hong Kong    Hong Kong    P.R. China   P.R. China
2.    Vietnam      Poland       P.R. China   India        India
3.    U.S.A.       P.R. China   India        Philippines  Philippines
4.    India        India        Philippines  Pakistan     Pakistan
5.    Philippines  Philippines  Sri Lanka    Hong Kong    U.S.A.
6.    Jamaica      U.K.         Poland       Iran         S. Korea
7.    Hong Kong    Vietnam      Taiwan       Taiwan       Romania
8.    Portugal     U.S.A.       Vietnam      U.S.A.       Iran
9.    Taiwan       Lebanon      U.S.A.       S. Korea     U.K.
10.   P.R. China   Portugal     U.K.         Sri Lanka    Colombia

The term heritage language can also be defined by what it is not. First, a heritage language is not “foreign” to its speakers, because of their personal or cultural connections to it. Second, an ethnolinguistic community may be a “minority” according to official definitions but not in a sociological sense if its economic power is taken into consideration. For example, Chinese is officially a visible minority according to the Government of Canada (Statistics Canada, 2017d).
However, over the past two decades, Chinese immigrants to Canada have brought along wealth and labour, and China is regarded as a “titanic economic power” that has had an impact on the economy of Metro Vancouver (Barnes & Hutton, 2016, p.11). In addition, languages of visible minorities may be the most-spoken mother tongues in certain regions, so they are not “minority” languages in a numerical sense either. For example, in Richmond, British Columbia, 44.8% of residents speak a variety of Chinese¹ as their mother tongue, and 33.1% speak English as their mother tongue (City of Richmond, 2017, p.4). Taken together, “minority language” is not an appropriate label from either a socioeconomic or a numerical perspective.

Lastly, specifically in the context of Cantonese in Metro Vancouver, the term “immigrant language” could be controversial. According to the Government of Canada, immigrant languages in Canada are defined as those “whose presence is initially due to immigration after English and French colonization” [italics by author] (Statistics Canada, 2017b, para. 2). However, languages from southern China, including varieties of Cantonese, had been brought to the Pacific Northwest by miners and railroad builders since the first half of the 19th century (Yu, 2011), before British Columbia joined Confederation on 20 July, 1871 (Ormsby, 1958). It is thus questionable whether Cantonese meets this definition, given that it was spoken before the region became part of Canada. Moreover, the relation between colonial settlers and Aboriginal peoples is a complex issue. Unlike other parts of Canada, 95% of British Columbia is on the unceded territory of the First Nations peoples (City of Vancouver, 2014), which means the land has never been surrendered or given away to the colonizers. It is therefore debatable who is an “immigrant” on Aboriginal lands.

¹ In the report by the City of Richmond, varieties of Chinese such as Cantonese and Mandarin are categorized into one group, namely “Chinese”.
Since this dissertation focuses on the linguistic rather than the political aspect of the issue, the term heritage language is used throughout the subsequent discussion to avoid tangential controversies.

Anecdotal reports point out that some heritage speakers cannot speak Cantonese fluently even though their parents do. Usually they can express themselves better in the majority language of the society, which is English in the context of Metro Vancouver. Below are three Vancouver-based media stories, in which the interviewees’ Cantonese proficiency ranges from semi-fluency to mere passive knowledge:

I usually won’t be shy [to speak Cantonese] because if the person I’m talking to only speaks Cantonese, I think the person would appreciate... even if I have an accent, or can’t fully express myself. But if we are talking about my younger brother... I think he might not even try. (Cheong & Lee, 2015)

Doris Chow understands the words but can’t speak Cantonese, the first language she learned in life, fluently. It’s a language that is slowly disappearing from Vancouver but the city’s Chinatown is one place where Cantonese still rules... (Li, 2016, para. 1)

I am Russell... I am Cantonese myself, but I can’t really speak the language... My parents having both immigrated from Hong Kong... Like a lot of other Chinese Canadians, I went to Chinese school from a young age, but since I only spoke English with my parents, I failed to pick up the language that well. My family’s loss of a language over just one generation intrigued me. (Chiong et al., 2017)

The phenomena associated with heritage speakers are intriguing not only on a personal level to Russell from the third story above, but also to linguists from an academic perspective. How are heritage speakers different from Cantonese speakers who grew up in Hong Kong? Since Cantonese tones are notoriously difficult for second language learners (Boyle, 1997), do heritage speakers face similar difficulties despite their early exposure to Cantonese?
These general questions will be narrowed down in the next section.

1.3 Goals of this dissertation

This dissertation investigates how Cantonese lexical tones are perceived by two populations of Cantonese-English bilinguals with different configurations of language dominance, as shown in Figure 1.1. The first population, referred to as homeland speakers, consists of Cantonese speakers who grew up in Hong Kong, where English is taught in school as a second language. They are commonly considered the prototypical native speakers. The second population, known as heritage speakers, consists of individuals who grew up in a Cantonese-speaking household in Canada, on the other side of the Pacific Ocean. These speakers were exposed to Cantonese in a family setting from early childhood. Up to that point their development of Cantonese was similar to that of homeland speakers. However, after the onset of schooling, English became their most frequently used language in day-to-day situations. As a result, heritage speakers may be more comfortable using English than Cantonese. Of particular note for this study is that Cantonese is a tone language, while English is not. This raises an interesting question: as Cantonese gradually becomes the weaker language of heritage speakers, does their ability to perceive Cantonese tonal contrasts also become weaker as they enter adulthood?

Given such differences in exposure and domains of language use between homeland and heritage speakers, the current study aims to answer the following questions:

(1) Research questions of the current study
a. Do homeland and heritage speakers behave differently in terms of their ability to identify tonally contrastive words?
b. Do homeland and heritage speakers exhibit similar confusion patterns with respect to lexical tone perception?
c. Do homeland and heritage speakers make use of the same type of information when identifying a word from a tonally contrastive set? In particular, are acoustic and semantic information equally useful?

Figure 1.1: Language development of homeland and heritage speakers (adapted from Montrul, 2012)

A personal anecdote that inspired me to ask these questions illustrates the linguistic phenomenon in question. As a homeland speaker who lives in Canada as an international student, I often interact with heritage speakers born and raised in Canada. At a dinner gathering in Vancouver Chinatown, I was ordering food in Cantonese for a group of heritage speakers. One of the dishes was called gon1 bin1 sei3 gwai3 dau2 “stir-fried green beans”, in which sei3 gwai3 means “four seasons”, as in (2a)². Both sei3 and gwai3 are produced with Tone 3, the mid level tone. When I said the name of this dish, one of my friends could not help laughing. Seeing my puzzled face, he said in English, “Did you just say goddamn beans? That’s so funny.” After processing his reaction for five seconds, I realized that he had probably heard something like (2b). Sei2 gwai2 “goddamn” is produced with Tone 2, the high rising tone. These two phrases involve two tonally contrastive minimal pairs: the first pair is sei3 “four” and sei2 “die”, and the second pair is gwai3 “season” and gwai2 “ghost”.

(2) a. gon1  bin1      sei3  gwai3   dau2
       dry   stir-fry  four  season  bean
       ‘stir-fried green beans’
    b. gon1  bin1      sei2  gwai2   dau2
       dry   stir-fry  die   ghost   bean
       ‘stir-fried goddamn beans’

It never occurred to me, someone from Hong Kong, that these two phrases could make a good pun, because the mid level tone and the high rising tone are perceptually quite distinct to me. My friend, however, must have perceived them as similar sounds to find it funny.

² Cantonese words and sentences in this dissertation are romanized following the LSHK Cantonese Romanization Scheme unless otherwise specified. For details, see the explanation in Section 1.4.6.
In other words, there must be some differences between the perception of tonally contrastive minimal pairs by me (a homeland speaker) and by my friend (a heritage speaker). Using this example, I illustrate my research questions in (3):

(3) An instantiation of the research questions in (1)
a. Are both populations equally able to identify sei2 “die” and sei3 “four”?
b. Is it possible that certain minimal pairs (e.g. sei3-sei2) are more confusing for one population than for the other?
c. Consider a semantically congruous sentence such as jat1 nin4 jau5 sei3 gwai3 “there are four seasons in a year”. If both populations are able to identify the target word as sei3 “four” rather than sei2 “die”, are they using the same information to reach this conclusion? Do they rely on the acoustic cues (i.e. what they heard) or the semantic information (i.e. what makes sense)?

1.4 The Cantonese language

This section introduces aspects of the Cantonese language that are necessary for understanding this dissertation. It addresses three questions. First, where is Cantonese spoken? The second question is asked twice with different emphases: is Cantonese a language (or multiple languages)? Is Cantonese a language (or a dialect)? Lastly, what are the linguistic features of this language?

1.4.1 Origin and spread

Cantonese, also known as Yue (粵), belongs to the Sinitic branch of the Sino-Tibetan language family. Originating from southern China, it is named after Canton (Guangzhou in Mandarin), the capital city of Guangdong province (Chao, 1947).
According to Crissman (2012), Cantonese is spoken by 59.58 million people in Guangdong province, the Guangxi Zhuang Autonomous Region, the Hong Kong Special Administrative Region, and the Macau Special Administrative Region of the People’s Republic of China. Since migrants from Guangdong dominated trans-Pacific labour migration from the 19th to the early 20th century (Yu, 2011), Cantonese communities can also be found in Malaysia, Singapore, Vietnam, Thailand, Brunei, Indonesia, the Philippines, the Netherlands, Australia, Canada, and the United States (cf. Chau, 2011; Clyne & Kipp, 1997; Hashimoto, 1972; Lewis, Simons & Fennig, 2009; Tan, 2005). The total number of speakers worldwide is estimated at 73.76 million (Lewis et al., 2009).

1.4.2 Is Cantonese a language?

The term Cantonese is ambiguous, since it can be used in either a broad or a narrow sense in the literature. As an umbrella term it refers to the Yue dialect group as a whole. According to Wurm, Li & Baumann (1987), Yue can be classified into seven subgroups: Guangfu (廣府), Siyi (四邑), Gaoyang (高陽), Goulou (勾漏), Wuhua (吳化), Qinlian (欽廉) and Yongxun (邕潯). Their geographical distribution is shown on the map in Figure 1.2 alongside other non-Yue Sinitic languages, namely Min, Hakka, Ping, and Mandarin. For detailed discussions of the phonological differences and grammatical diversity across these subgroups, see Hashimoto (1972) and Kwok, Chin & Tsou (2016) respectively.

Among the seven subgroups, Siyi and Guangfu are the most relevant to the discussion of overseas Cantonese communities. Historically, according to Yu (2011), 80% of the Chinese migrants to Canada between 1910 and 1923 were from the counties of Taishan (台山), Kaiping (開平), Xinhui (新會), and Enping (恩平), all of which fall into Region 2 in Figure 1.2, where Siyi (四邑, literally “four counties”) Cantonese is spoken. Significant Siyi Cantonese communities can also be found in the continental United States due to a similar migration history (Chao, 1947).
However, in Honolulu, Hawaii, most Chinese migrants were from Zhongshan (中山) (Yu, 2011), which falls within the area of the Guangfu subgroup. In the 1980s and 1990s, migrants from Hong Kong to Canada (the focus of this dissertation), the United States, and Australia mostly spoke Guangfu Cantonese, which is marked as Region 1 in Figure 1.2. Therefore, in sociohistorical discussions of overseas Cantonese communities, the general term “Cantonese” does not always refer to the same language variety.

Linguistically, the Siyi and Guangfu varieties are not entirely mutually intelligible, although they share cognates as well as syntactic and phonological features. According to a listening comprehension test by Szeto (2000), spoken Taishanese (which belongs to the Siyi subgroup) is 31.3% intelligible to speakers of Hong Kong Cantonese (which belongs to the Guangfu subgroup).

Figure 1.2: Geographical distribution of Yue subgroups based on Wurm et al. (1987). Image from Iacoponi (2012), used under Creative Commons Attribution 3.0 Unported License

When used in its narrow sense, Cantonese refers to the specific variety spoken in Guangzhou, Hong Kong, and Macau³ within the Guangfu subgroup. It is considered a “genuine regional standard” (Ramsey, 1987, p.99) that enjoys a more prestigious social status than other varieties of Yue. It is not only the lingua franca for doing business in Southern China (Bauer & Benedict, 1997), but also the language used in television programs produced by Television Broadcasts Limited, the major exporter of Cantonese popular culture from Hong Kong to diaspora communities all over the world (To & Lau, 1995).
This variety is thus called Standard Cantonese in many linguistic studies (Fung, 2000; Lee, 1993; Yu, 2007, among others), even though there has never been a language standardization agency for Cantonese, nor is it the official language of any independent nation.

Although Guangzhou is the origin of Standard Cantonese, the centre of Cantonese language and culture has shifted from Guangzhou to Hong Kong since the 1950s (Bauer, 2016). The first reason is the adoption of Mandarin, officially known as Putonghua (普通話, literally “common speech”), as the only national language of the People’s Republic of China. Second, the influx of non-Cantonese-speaking workers from other provinces in China weakened the dominance of Cantonese in Guangzhou. Before 1997, Hong Kong was a British colony, and so Cantonese was able to maintain its status as the city’s de facto official language, even though it has never been a de jure one (Bauer & Benedict, 1997). Considering the pervasive use of Cantonese in various domains (e.g., government, education, mass media, cultural industry), Bolton (2011) even calls Hong Kong the “Cantonese-speaking capital of the world” (p.9).

Spoken Cantonese, in both its broad and narrow senses, is mutually unintelligible with Mandarin (Bauer, 2016; Bauer & Benedict, 1997; Cheng & Tang, 2016a; Matthews & Yip, 2013; Olson, 1998). Both Cantonese and Mandarin belong to the Sinitic branch of the Sino-Tibetan family, but they have different sound systems, vocabulary, and morphosyntax (Cheng & Tang, 2016a; Tang & Cheng, 2014).

³ The varieties spoken in these three regions are very similar. However, due to its colonial history, Hong Kong Cantonese is characterized by as many as 700 loanwords from English (Wong, Bauer & Lam, 2009), which played a role in expanding the language’s syllabary (Bauer, 1985), setting Hong Kong Cantonese apart from Guangzhou or Macau Cantonese. For a detailed explanation of the phonetic and phonological differences between these varieties, see Gui (2005).
Despite the linguistic differences, Cantonese is often considered a Chinese dialect by the general public (including the speakers themselves) due to similarities in orthography, as well as for sociocultural and political reasons. For a detailed discussion of the language versus dialect debate, see Cheng & Tang (2014). Some authors point out that the English term dialect is not an accurate translation of the Chinese term fangyan (方言, literally “regional speech”), and have thus proposed other terms such as regionalect (DeFrancis, 1986) and topolect (Mair, 1991). For the purpose of the current linguistic study, Cantonese is referred to as a language throughout this dissertation.

Since this study focuses on Cantonese speakers from Hong Kong and heritage speakers whose parents are from Hong Kong, the relevant language variety is Hong Kong Cantonese, hereafter simply referred to as Cantonese unless otherwise specified. The following subsections introduce the phonology of this language variety.

1.4.3 Segmental inventories

Since tone is the focus of this study, the segmental phonology of Cantonese will be introduced only briefly. There are 19 phonemic consonants in Cantonese (cf. Bauer & Benedict, 1997; Cheng & Tang, 2016a; Zee, 1991), as listed in Table 1.4⁴. For stops and affricates, aspirated and unaspirated consonants with the same place of articulation are contrastive. In other words, /p/ and /pʰ/ are two different phonemes, and so are /ts/ and /tsʰ/. All 19 consonants can occur syllable-initially, but only /p, t, k, m, n, ŋ/ can be syllable-final. When occupying the syllable-final position, /p, t, k/ are realized as unreleased [p̚, t̚, k̚] or as a glottal stop [ʔ]. Another characteristic of Cantonese is syllabic nasals: only /m/ and /ŋ/ can stand alone as the syllabic consonants [m̩] and [ŋ̍]. Lastly, the semi-vowels /j, w/ can be offglides. When /j/ is an offglide that follows a rounded non-front nuclear vowel /ɵ/ or /u/, it is phonetically realized as a labial-palatal approximant [ɥ].

Intertalker variation has been observed in the phonetic realization of these phonemes. First, the sibilants /s, ts, tsʰ/ have free variants [ʃ, tʃ, tʃʰ] when followed by a rounded vowel. Some speakers pronounce /kʷ/ as [k], /kʷʰ/ as [kʰ], syllable-initial /n/ as [l], syllable-initial /ŋ/ as [ʔ], and syllable-final /ŋ/ as [n] (Bauer & Benedict, 1997; Cheung, 2007; Zee, 1999). Linguists document such variation as ongoing mergers (Law, Fung & Bauer, 2001), but it is often stigmatized as “lazy” or “sloppy” pronunciation in the speaker community (Bauer, 2016).

Table 1.4: The phonemic consonant inventory of Cantonese (rows: manner of articulation; columns: place of articulation)
Manner / Place           Labial  Alveolar  Palatal  Velar  Labial-velar  Glottal
Stop, unaspirated        p       t                  k      kʷ
Stop, aspirated          pʰ      tʰ                 kʰ     kʷʰ
Nasal                    m       n                  ŋ
Fricative                f       s                                       h
Affricate, unaspirated           ts
Affricate, aspirated             tsʰ
Approximant                      l         j               w

The classification of phonemic and allophonic vowels is not as straightforward as that of the consonants, but most works agree that there are 11 vowel phonemes (Bauer & Benedict, 1997; Cheng & Tang, 2016a), which are shown in Figure 1.3⁵. Although vowel length is not contrastive, /iː, yː, uː, ɛː, ɔː, œː, aː/ are conventionally marked with an optional length diacritic to reflect their phonetic property (Zee, 1991). For a detailed discussion of different ways to analyze the vowel system, see Barrie (2003) and Bauer & Benedict (1997).

Figure 1.3: The phonemic vowel inventory of Cantonese

⁴ On the consonant chart of Zee (1991), /l/ is described as denti-alveolar, while /ts, tsʰ/ are between alveolar and postalveolar. These are collapsed into “alveolar” in Table 1.4.
⁵ Some works transcribe the vowels differently. For example, Hashimoto (1972) uses /ø/ instead of /ɵ/. Zee (1991) uses /ɪ/ instead of /e/, and /ʊ/ instead of /o/. Barrie (2003) uses /ʌ/ instead of /ɐ/.

1.4.4 Syllable structure

A canonical Cantonese syllable is maximally CVC, and syllabic nasals (N̩) are also found.
Consonant clusters are rare and occur only in ideophones, onomatopoeia, and loanwords (Bauer, 1985). Gliding vowels [iw, ɵɥ, uɥ, ej, ɔj, ɐj, ɐw, aːj, aːw] are described as diphthongs in language textbooks (Matthews & Yip, 2013); however, since [j, w] are never followed by a consonant, these glides can be phonologically analyzed as coda consonants (Bauer & Benedict, 1997). Examples of each syllable type are listed in Table 1.5.

Table 1.5: Examples of Cantonese syllables
Syllable type                                         Example   Gloss
CV          open syllable                             [saː55]   “sand”
CVC[j,w]    closed syllable with an offglide          [saːj55]  “waste”
CVC[m,n,ŋ]  closed syllable with a nasal coda         [saːm55]  “three”
CVC[p,t,k]  closed syllable with an obstruent coda    [saːt3]   “kill”
N̩           syllable with a nasal nucleus             [m̩21]     “not”

The duration of syllables with an obstruent coda has been found to be significantly shorter than that of syllables with a sonorant coda (Kao, 1971). This interacts with the tone system: syllables ending with [p, t, k] carry “checked” or “entering” tones, as in [saːt3] in Table 1.5. This will be elaborated in the next subsection.

1.4.5 Tones⁶

Cantonese is a tone language, which means that changing the pitch of a word also changes the meaning of the word (Yip, 2002). For example, as shown in Table 1.6, when the syllable [sɛː] is produced with a high level pitch, it means “some”, but when it is produced with a low level pitch, it means “shoot”. This works analogously to changing vowels in English: when the low vowel [æ] in “bad” [bæd] is changed to the low-mid vowel [ɛ], as in “bed” [bɛd], the meaning of the word changes completely from “the opposite of good” to “an object we sleep on”.
In other words, just like consonants and vowels, tone is contrastive.

Table 1.6: The phonemic tone inventory of Cantonese; tone numerals are based on Bauer & Benedict (1997)
Tone  Description   Tone numerals  Example   Gloss
1     high level    55             [sɛː55]   “some”
2     high rising   25             [sɛː25]   “write”
3     mid level     33             [sɛː33]   “diarrhea”
4     low falling   21             [sɛː21]   “snake”
5     low rising    23             [sɛː23]   “society”
6     low level     22             [sɛː22]   “shoot”

Figure 1.4: Pitch contours of the six phonemic tones produced by a female homeland speaker who participated in the current study. The x-axis represents 100 equally spaced steps in the vocalic portion of a syllable. Average f0 values were extracted from 162 word tokens. Shaded areas around each contour indicate values within a 95% confidence interval.

⁶ This subsection is a basic introduction to Cantonese tones. A detailed literature review on Cantonese tonal perception will be presented in Chapter 3.
⁷ Historically, T1 is a high falling tone [51] (Chao, 1947). Some Guangzhou speakers and older Hong Kong speakers still produce the high falling tone. Since the current study focuses on college-age individuals, the high falling tone is treated as a free variant of T1.

There are six phonemic lexical tones in Cantonese (Bauer & Benedict, 1997), which are presented in Table 1.6. Their pitch contours are shown in Figure 1.4. The first tone (T1) is the high level⁷ tone, which marks the upper boundary of the tonal space. According to the tone numeral system originally developed by Chao (1947), T1 is represented as [55], meaning that it starts with a high pitch and also ends with a high pitch. In Figure 1.4, the ending pitch is slightly lower than the starting pitch, arguably due to declination (Li, Lee & Qian, 2002; Wong, 2006), whereby a drop in subglottal pressure over the course of an utterance leads to a drop in fundamental frequency (Ladd, 1984; Lieberman, 1966).
However, since the effect is phonetic, the ending pitch is still represented as [5].

The second tone (T2) is the high rising tone. Although Chao (1947) and earlier works represent it as [35], Bauer & Benedict (1997) provide phonetic evidence that the starting point of the high rising tone is as low as the starting point of the low rising tone [23]. They suggest that this contour tone may have undergone a sound change since 1947. This contour tone ends with a pitch as high as the ending pitch of T1. Therefore, the tone numerals for the high rising tone are [25].

The third tone (T3) is the mid level tone. Its starting pitch is higher than that of the low falling tone [21], the low rising tone [23], and the low level tone [22], so the tone numerals for T3 are [33]. As with T1, T3’s ending pitch is slightly lower than its starting pitch due to phonetic declination effects, but its ending pitch is still represented as [3].

The fourth tone (T4) is the low falling tone, which marks the lower boundary of the tonal space. Its starting pitch is similar to that of the low rising tone [23] and the low level tone [22]. Therefore, the tone numerals for T4 are [21]. Previous studies have found that T4 is sometimes produced with creaky voice (Yu & Lam, 2014). Yu & Lam (2014) also point out that adding creaky voice to T4 raises its accuracy rate in a tone identification task; in particular, it helps to distinguish T4 from T6, the low level tone.

The fifth tone (T5) is the low rising tone. It is represented as [13] in some works (e.g. Cheung, 2007), but more commonly as [23] (e.g. Bauer & Benedict, 1997). As shown in Figure 1.4, the first 40% of its pitch contour is the same as that of T2, the high rising tone. Both T2 and T5 have a rising contour, and they differ only in the magnitude of the pitch change towards the end of the tone. T5 has a lower ending pitch than T2, and so its tone numerals are [23].
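The Figure 1.4 caption describes contours averaged over 162 word tokens after each vocalic portion is divided into 100 equally spaced steps. Because every token is resampled to the same number of proportional steps, a statement such as “the first 40% of the T5 contour is the same as T2” compares the same relative positions across tokens of different durations. The Python sketch below illustrates one way such time-normalized mean contours and their 95% intervals could be computed. It is an illustration under stated assumptions, not the procedure used to produce Figure 1.4: the function names are hypothetical, each f0 track is assumed to be an array of Hz values already restricted to the vocalic portion, linear interpolation is assumed for resampling, and the interval is a normal approximation (mean ± 1.96 standard errors), since the caption does not say how its confidence interval was derived.

```python
import numpy as np

N_STEPS = 100  # equally spaced steps per vocalic portion, as in the Figure 1.4 caption


def time_normalize(f0_track, n_steps=N_STEPS):
    """Resample one token's f0 track (Hz) to n_steps equally spaced points.

    Hypothetical helper: assumes the track already covers only the vocalic
    portion and uses linear interpolation between measured frames.
    """
    f0 = np.asarray(f0_track, dtype=float)
    old_positions = np.linspace(0.0, 1.0, num=len(f0))  # proportional time of each frame
    new_positions = np.linspace(0.0, 1.0, num=n_steps)  # 0% ... 100% of the vowel
    return np.interp(new_positions, old_positions, f0)


def mean_contour_with_ci(tokens, n_steps=N_STEPS, z=1.96):
    """Average several tokens of one tone step by step.

    Returns (mean, lower, upper). The band is a normal-approximation
    95% interval (mean +/- z * standard error), which is an assumption.
    Needs at least two tokens so the sample standard deviation is defined.
    """
    curves = np.vstack([time_normalize(t, n_steps) for t in tokens])
    mean = curves.mean(axis=0)
    sem = curves.std(axis=0, ddof=1) / np.sqrt(curves.shape[0])
    return mean, mean - z * sem, mean + z * sem


if __name__ == "__main__":
    # Synthetic rising-tone tokens with different numbers of frames (made-up values).
    rng = np.random.default_rng(0)
    tokens = [np.linspace(180, 260, n) + rng.normal(0, 3, n) for n in (40, 55, 70)]
    mean, lower, upper = mean_contour_with_ci(tokens)
    print(mean.shape)       # (100,)
    print(mean[:40].shape)  # first 40% of the normalized contour: (40,)
```

Resampling to proportional steps rather than absolute milliseconds is what allows tokens of different lengths to be averaged point by point; the trade-off is that absolute duration information is discarded.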
Ciocca & Lui (2003) found that the distinction between T2 and T5 is the hardest for children acquiring Cantonese. By age 10, most children achieve an accuracy rate of 90% for most tone pairs, but for T2-T5 the accuracy rate is just 70%. It is also well documented that some adult native speakers are merging the two rising tones (Bauer, Cheung & Cheung, 2003; Fung & Wong, 2011; Mok, Zuo & Wong, 2013), which suggests an ongoing sound change in its initial stage. For a detailed discussion, see Section 3.3.2.

The sixth tone (T6) is the low level tone. Its starting pitch is slightly higher than, but still similar to, that of T4 and T5, so the tone numerals for T6 are [22]. As with the other level tones, declination effects can be seen towards the end of the tone.

Apart from the six phonemic tones, Cantonese has three allotones, also known as checked or entering tones (Chao, 1947). They are T7 high checked [5], T8 mid checked [3], and T9 low checked [2]. They occur only in syllables with an obstruent coda [p, t, k]. Since they have a shorter duration (Zee, 1991), they are represented by one-digit tone numerals only, as in Table 1.7. Each checked tone is an allotone of its corresponding level tone: the high checked tone (T7) is an allotone of the high level tone (T1), mid checked (T8) is an allotone of mid level (T3), and low checked (T9) is an allotone of low level (T6). These three allotones are not the focus of the present study, and none of the target words used in the experiment had checked tones. For a detailed discussion of the history and phonology of checked tones, see Bauer & Benedict (1997).

Table 1.7: Allotones in Cantonese; tone numerals are based on Bauer & Benedict (1997)
Tone  Description    Tone numeral  Example  Gloss
7     high checked   5             [sek5]   “colour”
8     mid checked    3             [sɛk3]   “kiss”
9     low checked    2             [sɛk2]   “stone”

In general, there is no tone sandhi in Cantonese, except that certain morphological processes may change a word’s tone into the high rising tone.
An example of such processes is attenuative reduplication, which is illustrated in Table 1.8. None of the sentences used as stimuli in the current study involved these morphological processes. For details of morphologically induced tone change, see Bauer & Benedict (1997) and Yu (2007).

Table 1.8: Derived high rising tone in attenuative reduplication
Regular adjective  Gloss      Reduplicated form        Gloss
[kʷɐj33]           “costly”   [kʷɐj33 kʷɐj25 tej25]    “a little bit costly”
[haːm21]           “salty”    [haːm21 haːm25 tej25]    “a little bit salty”
[lok22]            “green”    [lok22 lok25 tej25]      “a little bit green”

1.4.6 Romanization

Although numerous ways to romanize Cantonese have been proposed by missionaries and authors of language textbooks, the Jyutping (粵拼) system developed by the Linguistic Society of Hong Kong, officially known as the LSHK Cantonese Romanization Scheme (Tang et al., 2002), is the de facto standard in academia (Cheng & Tang, 2016b). For this reason, Cantonese words and sentences in the current study are romanized following the Jyutping convention. Table 1.9 lists Jyutping symbols and their corresponding International Phonetic Alphabet (IPA) symbols.

Table 1.9: The LSHK Cantonese Romanization Scheme (Jyutping)

Consonants
IPA   Jyutping
p     b
pʰ    p
m     m
f     f
t     d
tʰ    t
n     n
l     l
ts    z
tsʰ   c
s     s
j     j
k     g
kʰ    k
ŋ     ng
h     h
kʷ    gw
kʷʰ   kw
w     w

Vowels and glides
IPA   Jyutping
iː    i
e     i
yː    yu
uː    u
o     u
ɛː    e
ɔː    o
œː    oe
ɵ     eo
ɐ     a
aː    aa
iw    iu
ɵɥ    eoi
uɥ    ui
ej    ei
ɔj    oi
ɐj    ai
ɐw    au
aːj   aai
aːw   aau

Tones
IPA      Jyutping
55 (5)   1
25       2
33 (3)   3
21       4
23       5
22 (2)   6

Readers who are familiar with IPA but unfamiliar with Jyutping should note the following points. First, the unaspirated stops and affricates [p, t, k, kʷ, ts] are romanized as b, d, g, gw, z. Note that there are no voiced obstruents in Cantonese, and the romanization does not reflect the actual voicing properties of these sounds. Second, syllable-initial [j] is also j in the romanization, but [j] as an offglide is romanized as i. Similarly, [w] is romanized as u when it is an offglide.
Third, [ɐ] and [aː] are represented by a and aa respectively in Jyutping. Fourth, [iː] and [e] are both romanized as i, and [uː] and [o] are both romanized as u. This is because the two vowels in each pair are in complementary distribution: [e] and [o] occur only before velars, and [iː] and [uː] occur elsewhere. For a detailed analysis of these alternations, see Bauer & Benedict (1997). Since [e, o] are already covered by i and u, the Jyutping symbols e and o actually represent [ɛː] and [ɔː] respectively. Last but not least, tones are represented by numbers. The romanization scheme does not distinguish checked tones from regular tones, since checked tones can be predicted from the segmental environment. Therefore, both 55 (high level) and 5 (high checked) are represented as 1.

1.4.7 Writing system

Although the current study focuses on the spoken language, two concerns regarding the writing system need to be explained, as they will be relevant to the research methodology in Chapter 4.

The dominant script in Hong Kong is Traditional Chinese [8]. The non-alphabetic orthography is notoriously complex, as characters are learned as logograms (Baron & Strawson, 1976). In particular, 90% of Chinese characters are ideophonetic compounds, while 10% are pictographs or ideographs (Zhu, 1987). According to a corpus study of Chinese textbooks in Hong Kong, a primary school student will have been introduced to 2,570 to 3,844 characters by the end of the sixth school year (Chung & Leung, 2008). It is important to note that heritage speakers of Cantonese (or any variety of Chinese) do not acquire reading and writing skills through exposure to the spoken language at home alone. Xiao (2006) compared the performance of non-heritage and heritage students (who were exposed to some variety of Chinese at home) in an intensive Chinese course at a New England university, and found that the heritage students did significantly better in speaking and listening than in reading and writing. This point is crucial to some of the methodological decisions made for the present study, which are elaborated in Section 4.2.2 and Section 4.4.1.2. In particular, since not all heritage speakers can read Chinese, pictures were used instead of Chinese characters.

[8] In recent years, Simplified Chinese has become more visible due to the influx of tourists from mainland China (Choi, Liu, Pang & Chow, 2008). Readers may also note that Guangzhou Cantonese speakers write in Simplified Chinese.

The issue of writing is complicated by a distinction between two kinds of "written Cantonese". The first is simply writing Cantonese the way it is spoken, which often involves lexical items that are unfamiliar to Mandarin speakers. An example is provided in (4), which shows how "to be available" is expressed in spoken Cantonese. This way of writing is common in informal contexts, such as texting with friends and family, or discussion on social media and internet forums. Snow (2004) comments that written Cantonese is somewhat stigmatized due to its association with lower-class life. The second kind of "written Cantonese" is the way that educated people typically write, and refers to written Standard Chinese, which can be read aloud with Cantonese pronunciation. In formal contexts such as official notices and academic writing, Standard Chinese grammar must be followed and Standard Chinese lexical items must be used, as in (5). When such a text is read aloud (e.g., in a Chinese class at a primary school that adopts Cantonese as the language of instruction), the characters are pronounced with Cantonese pronunciation. For comparison, the Mandarin pronunciation of the same characters is shown in (6).
The distinction between written Cantonese and Standard Chinese will be important in Section 4.2 regarding which words should be used in an experiment involving heritage speakers. For example, even though (5) can be considered a "Cantonese" word in formal texts, a heritage speaker may never have heard it in the family setting, unless s/he had attended a Chinese school for formal language instruction. To ensure that heritage speakers actually knew the words used in the experiment, words from Standard Chinese like (5) were avoided.

(4) Written Cantonese read in Cantonese:
    得閒
    dak1      haan4
    obtain    free-time
    'to be available'

(5) Written Standard Chinese read in Cantonese:
    有空
    jau5      hung1
    have      empty
    'to be available'

(6) Written Standard Chinese read in Mandarin:
    有空
    yǒu       kōng
    have      empty
    'to be available'

1.4.8 Summary

To sum up, varieties of Cantonese are spoken in different overseas communities around the world. In the current study, the baseline language for comparing homeland and heritage speakers is Hong Kong Cantonese. Both populations were exposed to this variety in their family. This variety has six phonemic lexical tones and three allophonic checked tones. The Chinese writing system is complex and literacy usually requires formal education, which means heritage speakers are less likely to be fully literate in Chinese.

1.5 The structure of this dissertation

This introductory chapter is followed by two literature review chapters. Chapter 2 discusses the definition of heritage speakers and their role in the bilingualism literature. Chapter 3 reviews previous studies on Cantonese tone perception, providing the research context of the current study. Chapter 4 describes the experimental paradigm, materials, and participants of the study. Chapter 5 explains the statistical tests used for data analysis and presents the results of the experiment.
Lastly, Chapter 6 discusses the implications of the major findings and concludes the dissertation.

Chapter 2
Who Are Heritage Speakers?

This chapter is a literature review focusing on heritage speakers (vis-à-vis homeland speakers). Previous studies have approached this topic from different perspectives, such as language policy, applied linguistics, and theoretical linguistics. Consequently, the definitions of key terms vary among authors. Section 2.1 compares their motivations and defines these key terms in the context of this dissertation. Section 2.2 outlines what is known about the linguistic behaviour of heritage speakers, and provides the research background that leads to the current study on the perception of lexical tones. Tone-related studies will be reviewed in Chapter 3.

2.1 Defining key terms

The heritage speaker population is known for its heterogeneity and variability, which makes it highly challenging to define (Benmamoun et al., 2013a; Montrul, 2013; Polinsky & Kagan, 2007; Valdés, 2001; Van Deusen-Scholl, 2003; Wiley, 2001; Zyzik, 2016).
To jumpstart the discussion, I propose the following working definition:

(7) Definition of a heritage speaker (Version 1)
    A heritage speaker is a bilingual who was exposed to a heritage language early in life, but later became dominant in the majority language of the society.

The working definition in (7) contains keywords that can be interpreted in various ways, such as "heritage language", "bilingual", and "dominant". To ensure that readers of different backgrounds come to the same understanding of (7), each of the subsections that follow focuses on one keyword, discusses possible ways to interpret it, and explains why a particular interpretation is adopted in the context of this dissertation.

2.1.1 Heritage languages

The term heritage language is conventionally used in the literature on language policy and education, often in countries with a history of colonization and immigration, such as Canada (Cummins, 1992, 2005; Cummins & Danesi, 1990; Duff, 2008; Duff & Li, 2009), the United States (Fishman, 2014; García, 2005; Hornberger & Wang, 2008; Peyton et al., 2001), and Australia (Brinton, Kagan & Bauckus, 2008; Elder, 2005, 2009; Hornberger, 2005). Definitions given by governmental institutions are usually specific to the sociopolitical context of the relevant country.
This subsection introduces the definition of heritage languages according to the federal and provincial governments of Canada, the country of residence of the heritage Cantonese speakers under investigation in the present study. The discussion is then followed by a comparison with the sociolinguistic definition.

When defined sociopolitically, heritage languages in Canada are languages other than the two official languages, namely English and French. According to Cummins (2005), the term heritage languages emerged in 1977, when the Ontario Heritage Language Program was introduced in schools to offer two and a half hours of heritage language instruction per week if 25 students expressed interest in their family language, such as Italian, Portuguese, or Ukrainian (Ontario Ministry of Education, 1991). In 1991, the Senate and House of Commons of Canada enacted the Canadian Heritage Languages Institute Act, which defined a heritage language as "a language, other than one of the official languages of Canada, that contributes to the linguistic heritage of Canada" (Government of Canada, 1991). A more recent definition by the Ministry of Education in Manitoba states that heritage languages are "all languages other than English, French, or Aboriginal, taught in the public school system during the regular school day, either as a regular subject, or as a language of instruction, or as a language of instruction in an enhanced heritage language program" (Government of Manitoba, 2018).

The sociolinguistic definition of heritage languages differs from those provided by governmental institutions in that it is not country-specific and can be applied to any given social context to focus on the de facto (non-)dominance of languages in a community.
In the applied linguistics literature, a heritage language can refer to any language that has a cultural connection to an individual's family or community, but is not the primary language used in government, education, and public communication (Fishman, 2001; Kelleher, 2010). As such, this definition is useful for identifying heritage languages in countries without an official language, such as the United States (Peyton et al., 2001). Since English is the de facto dominant language in the United States, heritage languages there are languages other than English (Fishman, 2001). In addition, the sociolinguistic definition is useful for describing the language ecology of countries where the official language is not the de facto primary language of the society. For example, Wales has two official languages, namely Welsh and English. According to the 2011 census, only 11% (310,600) of respondents above the age of three reported that they could speak Welsh fluently (Welsh Government and Welsh Language Commissioner, 2015). Although Welsh has official status in Wales, its de facto status fits the sociolinguistic definition of a heritage language, as English is the de facto majority language of Wales. This case shows that the sociopolitical and sociolinguistic classifications do not always yield the same result.

In the context of this dissertation, both the sociopolitical and sociolinguistic definitions accurately describe the status of Cantonese in Canada. Since the general goal of the current study is to contribute to the field of heritage linguistics, the sociolinguistic definition is more useful, as it is meaningful in non-Canadian contexts as well.
It allows comparison between heritage Cantonese speakers in Canada and other heritage speaker communities in any part of the world with a similar language ecology. Adopting the sociolinguistic definition, (8) is a revised version of (7), where new content is underlined:

(8) Definition of a heritage speaker (Version 2)
    A heritage speaker is a bilingual; early in life s/he was exposed to a language that has a cultural connection to his/her family or community, but later became dominant in a different language that is of primary use in government, education, and public communication.

2.1.2 Bilingualism

In its most general sense, a bilingual is a person who uses two languages in everyday life (Grosjean, 1982). In the bilingualism literature, opinions differ on the specific meaning of "use". The first debate pertains to the level of language proficiency: if a person is very fluent in one language but only semi-fluent in the other, is s/he bilingual? The second debate concerns the relative importance of the four language skills, namely speaking, listening, reading, and writing: if a person has all four skills in one language but is illiterate in the other, is s/he bilingual? The rest of this subsection elaborates on the narrow and broad definitions of bilingualism, each of which leads to a different answer to these questions, and concludes with an explanation of why the broad definition is adopted for the current study.

When defined narrowly, bilingualism is an equally excellent mastery of two languages. This view is commonly found in discussions of second language acquisition. Bloomfield (1933) states that bilingualism is an outcome of "perfect foreign-language learning" (pp. 55–56), when a foreign-language learner has reached a level of proficiency that is indistinguishable from that of native speakers, while at the same time maintaining his/her native language.
A bilingual is therefore an individual who has "native-like control of two languages" (pp. 55–56). In language testing, the term bilingual is sometimes used as a descriptor of the top level of proficiency. For example, on the scale of the United States Foreign Service Institute, Level Five (the highest level) is called "native or bilingual proficiency", which means the individual has "complete fluency in the language such that his speech on all levels is fully accepted by educated native speakers in all its features" (Fulcher, 2014, p. 227). This interpretation of bilingual is close to laypeople's understanding of the term. LinkedIn, a business networking website with 500 million users from 200 countries (Darrow, 2017), adopts a similar scale with "native or bilingual proficiency" as the highest level (Ali, 2015). To sum up, the narrow or lay definition is largely motivated by the need to describe an ideal outcome of language learning.

For researchers who are interested in describing the linguistic competence of actual individuals, the idea of a perfectly balanced bilingual who has mastered all language skills in both languages may be unrealistic or even "mythical" (Valdés, 2001, p. 40). While perfectly balanced bilinguals do exist, they are rare, because it is unlikely for an individual to have exactly the same amount of exposure to two languages, and it is also unlikely that the two languages are spoken equally frequently in the same domains of language use (Dornic, 1978; Grosjean, 1998; Myers-Scotton, 2005). Although children whose parents each speak a different native language may be more likely to receive relatively balanced linguistic input, the longitudinal study of Yip & Matthews (2007) shows that the ratio of linguistic input is often affected by factors other than the parents. For example, the child may spend more time with either paternal or maternal grandparents and relatives, because often one side does not live in the same country as the child.
Among the six children studied, only one was truly a balanced bilingual. This shows that if the narrow definition is adopted, the vast majority of actual individuals would fall outside it.

By contrast, the broad definition views bilingualism as a continuum: a bilingual may be maximally fluent in two languages, or minimally competent in at least one of the four language skills (listening, speaking, reading, and writing) in one of the two languages (Gertken, Amengual & Birdsong, 2014; Macnamara, 1967; Valdés, 2001). Between these two ends there is a wide range of bilinguals with different abilities. In previous studies of heritage speakers (Amengual, 2017; Casillas, 2015, among others), the majority of the subjects fell between the two ends of the continuum. Since the current study is also about heritage speakers, participants were expected to fall between the two ends of the continuum as well. Therefore, in the context of this dissertation, the broad definition of a bilingual better describes the actual language abilities of the subjects. Applying the broad definition, (9) is a revised definition of a heritage speaker, where new content is underlined:

(9) Definition of a heritage speaker (Version 3)
    A heritage speaker is a person who uses two languages in everyday life, but does not necessarily have an equal mastery of listening, speaking, reading, and writing skills for both languages; early in life s/he was exposed to a language that has a cultural connection to his/her family or community, but later became dominant in a different language that is of primary use in government, education, and public communication.

2.1.3 Language dominance

Derived from the state of having two or more languages in the mind, dominance is a psychological construct with a relativistic nature (Gertken et al., 2014; Grosjean, 1998). A dominant language is the "default language for speaking and thinking" (Harris et al., 2006, p. 264), and is sometimes called the preferred language (Dodson, 1981). According to Gertken et al. (2014), dominance is a function of four components: age of acquisition, frequency of use, language proficiency, and language attitudes. An early age of acquisition, high frequency of use, high language proficiency, and strong cultural identification with the language are all positively correlated with dominance. For a detailed explanation of how these components can be assessed, see the discussion of the Bilingual Language Profile (Birdsong, Gertken & Amengual, 2012) in Section 4.4.3.2.

Proficiency and dominance are distinct concepts even though they are correlated (Schmeißer, Hager, Gil, Jansen, Geveler, Eichler, Patuto & Müller, 2016). First, dominance requires at least a bilingual context, but proficiency does not. A monolingual person's language proficiency can be assessed and discussed, but the relativistic concept of dominance would be irrelevant and inapplicable if there is only one language in question (Gertken et al., 2014). Second, dominance is affected by the psychosocial factor of language attitudes. A bilingual person who is equally proficient in two languages can identify more strongly with one language than the other (Marian & Kaushanskaya, 2004), which affects which language s/he wants to use in a given situation. Third, it is possible for a bilingual's dominant language to be the less proficient one. For example, immigrants who have been immersed in an L2 environment for many years may become L2-dominant, even if this L2 remains less proficient than their L1 (Harris et al., 2006).
Taken together, although proficiency is a component of dominance, proficiency alone is not sufficient to predict dominance.

Applying the aforementioned definition of dominance, (10) is a revised version of (9), where new content is underlined:

(10) Definition of a heritage speaker (Version 4)
     A heritage speaker is a person who uses two languages in everyday life, but does not necessarily have an equal mastery of listening, speaking, reading, and writing skills for both languages; early in life s/he was exposed to a language that has a cultural connection to his/her family or community, but later a different language that is of primary use in government, education, and public communication has become his/her default or preferred language for speaking and thinking.

2.1.4 Heritage and homeland speakers

The previous subsections have discussed key terms that are crucial to defining who a heritage speaker is. The two definitions in (11) and (12) convey the same message, but the latter is explicit about the meaning of "heritage languages", "bilingualism", and "dominance":

(11) Definition of a prototypical heritage speaker (condensed version, same as (7))
     A heritage speaker is a bilingual who was exposed to a heritage language early in life, but later became dominant in the majority language of the society.

(12) Definition of a prototypical heritage speaker (elaborated version, same as (10))
     A heritage speaker is a person who uses two languages in everyday life, but does not necessarily have an equal mastery of listening, speaking, reading, and writing skills for both languages; early in life s/he was exposed to a language that has a cultural connection to his/her family or community, but later a different language that is of primary use in government, education, and public communication has become his/her default or preferred language for speaking and thinking.

It should be noted that (11) and (12) are definitions of a prototypical heritage speaker.
Since the heritage speaker population is highly heterogeneous, there will always be heritage speakers who fall outside a given definition. Zyzik (2016) points out that although no definition is perfect, it is useful to characterize a prototype in heritage language research, so that a population with similar attributes can be identified and studied. In the prototype model of categorization, the boundary between categories can be fuzzy (Rosch, 1973). The prototype is central in the category, and members away from the centre fall on a gradient scale of typicality: the farther a member is from the centre, the less typical it is. Therefore, (11) and (12) by no means suggest that perfectly balanced Cantonese-English bilinguals born and raised in Canada are not heritage speakers. Instead, they suggest that such individuals are less typical heritage speakers than English-dominant ones.

Now that a prototypical heritage speaker has been defined, it is necessary to define a prototypical homeland speaker, whose linguistic behaviour will be compared with that of heritage speakers in the current study:

(13) Definition of a prototypical homeland speaker
     A homeland speaker is a person whose default or preferred language for speaking and thinking is the language that s/he was exposed to early in life; during the period of exposure, this language not only has a cultural connection to his/her family or community, but is also the language of primary use in government, education, and public communication.

There are several reasons to use the term homeland speakers rather than other choices. First, the term native speaker is not used, even though native speakers typically have the attributes in (13). If heritage speakers and native speakers were viewed as two distinct populations, it would suggest that heritage speakers are not native speakers. Such a view requires a clear definition of a native speaker, which is beyond the scope of this dissertation.
For the reasons stated in Rothman & Treffers-Daller (2014), I assume that it is possible for heritage speakers to be native speakers of their heritage language. The use of the term native speaker vis-à-vis heritage speaker is therefore avoided. Second, the term non-heritage speaker is not used due to its ambiguity: L2 learners who have no familial or cultural connection to the target language can also be called non-heritage speakers. The term homeland speakers excludes L2 learners and avoids this ambiguity. Lastly, language-specific terms like Canadian Cantonese speakers and Hong Kong Cantonese speakers are not used, because they parallel the names of dialectal varieties, such as Guangzhou Cantonese speakers or Macau Cantonese speakers. While geographical difference does play a role in shaping language ecology, bilingual configuration is the crucial difference between the two populations in the current study. To sum up, the terms heritage and homeland are relatively unambiguous and highlight the crux of the phenomenon in question without problematic assumptions.

The next section locates homeland and heritage speakers within the bigger picture of the bilingual continuum, and reviews previous studies on the linguistic behaviour of individuals with different bilingual configurations.

2.2 Configurations on the bilingual continuum

Now that the keywords have been defined, this section summarizes what is known about different types of speakers on the bilingual continuum, and sets the stage for the research questions of the current study.

Although most bilinguals can be discretely classified as dominant in one language, Grosjean (2001) points out that dominance can be gradient and not necessarily dichotomous. If dominance were only dichotomous, it could not describe differences in the degree of dominance.
A dichotomous view of dominance would also obscure longitudinal changes in the relative strengths of the two languages over a bilingual individual's lifetime. Therefore, a gradient view of dominance can better capture intra-group and intra-speaker variation.

The bilingual continuum in Figure 2.1 is a visualization of different degrees of dominance. Nine configurations are labelled with English letters for easy identification in subsequent discussion. For each configuration, the numbers "1" and "2" indicate the first language (L1) and second language (L2) respectively, based on the order of acquisition. Configurations A and I represent monolinguals, so they have only "1" or "2" respectively. These two configurations mark the two ends of the continuum. For each bilingual configuration between A and I, the relative strength of the two languages is indicated by their font size. In Configuration E, which is the middle of the continuum, "1" and "2" have the same font size, which represents balanced bilingualism. L1-dominant configurations are between (but do not include) A and E, with the strength of L2 increasing from left to right. In the present study, homeland speakers of Cantonese were expected to fall within this range of L1-dominant configurations. Lastly, L2-dominant configurations are between (but do not include) E and I, with the strength of L1 decreasing from left to right. Heritage speakers in this study were anticipated to fall within this range of L2-dominant configurations.

Figure 2.1: The bilingual continuum (adapted from Valdés, 2001, p. 41)

Using the bilingual continuum as an anchor, the following subsections discuss a range of speakers documented in the existing literature, each of which is representative of a certain configuration on the continuum.

2.2.1 Configuration A: Monolinguals

Configuration A in Figure 2.1 represents a monolingual who has not acquired or learned a second language.
In linguistic inquiry, an adult with this configuration (and with no hearing impairment or speech disorder) often represents an ideal native speaker who has achieved complete acquisition of a target language. Such speakers' speech is a valuable source of data for understanding the grammar of their language. In experimental studies (e.g., Flege, 1987; Flege & Eefting, 1988), monolingual speakers often form the control group, whose linguistic behaviour is compared with that of another population, such as L2 learners of the target language or monolingual speakers of another language.

In an age of globalization, monolinguals are becoming a minority. According to Crystal (2012), two-thirds of children in the world grow up in a bilingual environment (p. 17), and non-native speakers of English have come to outnumber native English speakers (p. 69). In Hong Kong, for example, the government adopts a language education policy of biliteracy (Chinese and English) and trilingualism (Cantonese, English, and Mandarin) (Bolton, 2011). Therefore, in previous linguistic studies involving adult Cantonese speakers recruited from the university community (e.g., Ma, Ciocca & Whitehill, 2011), participants were not monolinguals, as they had learned English as a second language. For the current study of adult speakers, it was expected that no participants would be monolingual Cantonese speakers with Configuration A.

Bilingual configurations are to the right of Configuration A on the continuum. As Grosjean (1989) points out, bilinguals are not simply two monolinguals living in separate compartments within one person. There is no way for a speaker to "switch off" one language completely when using the other. Psycholinguistic studies show that cross-language activation in processing leads to cross-language competition in comprehension (Dijkstra, 2005) and production (La Heij, 2005). Such cross-language effects have been found to be bidirectional: L1 may influence L2, and L2 may also influence L1 (Flege, 1987).
The subsequent discussion focuses on previous studies of how L2 influences L1, since this is most relevant to the hypotheses of the current study.

2.2.2 Configurations B to D: L1-dominant bilinguals

Configurations B, C, and D all represent L1 dominance, but they differ from each other in the strength of L2. In Figure 2.1, L2 in Configuration D is comparatively stronger than in C; similarly, L2 in C is comparatively stronger than in B, albeit still non-dominant. A psycholinguistic study that pertains to this range of configurations is Linck, Kroll & Sunderman (2009), which found that immersion in L2 can attenuate lexical access to a dominant L1. Their participants were native English speakers learning Spanish as an L2, and were matched for self-rated Spanish and English proficiency. They were categorized into two groups based on immersion experience: the classroom group was attending an intermediate-level Spanish course at an American university and had no immersion experience, while the immersion group had been studying in Spain for three months. Due to immersion, the latter group used Spanish more frequently (at the time the study was conducted) than the classroom group, which corresponds to a greater strength of Spanish. Therefore, the classroom group and the immersion group can be mapped onto the bilingual continuum as Configuration B and Configuration C respectively [1]. In a verbal fluency task, both groups were presented with one category name at a time (such as "animals"), and were asked to produce as many category exemplars (such as "dog" and "cat") as possible within 30 seconds. The task was done in both Spanish and English in two separate blocks. In their results, the immersion group produced more Spanish exemplars than the classroom group, which is not surprising.
A perhaps more interesting finding isthat the immersion group produced significantly fewer English exemplarsthan the classroom group, despite the fact that both groups consisted ofnative English speakers. The interaction between language and group wassignificant. The authors conclude that L2 immersion has inhibitory effectson L1 access. The significance of this study is that even immersion in L2 forjust three months can leave an impact on L1. Heritage speakers, the focusof this dissertation, have had “L2 immersion” most of their lives. This raisesthe question of whether a long period of L2 immersion will further influenceother processes such as phoneme production.Flege (1987) found that phonetic spaces of adult speakers can berestructured if they are highly experienced in their L2. Such restructuring1The classroom group and the immersion group could be anywhere betweenConfiguration A and E, as long as the classroom group is to the left of the immersiongroup. For simplicity I map them onto the continuum as Configuration B and ConfigurationC respectively.37may affect phoneme production of L1, when a phoneme is used in bothlanguages but realized with phonetic differences. For example, /t/ is used inboth English and French, but the English /t/ is produced with a longer voiceonset time (VOT) at 77 milliseconds (ms) on average, while the French /t/has a shorter VOT at 33 ms on average. Groups of L1 English speakers whostarted learning French as L2 in late adolescence or early adulthood wereasked to produce English words (e.g. two) and French words (e.g. tous),and their VOT values were compared with those of monolingual Englishand monolingual French speakers. Comparable to Configuration C, the firstgroup of subjects consisted of American university students who had studiedin France for less than a year but had already returned to the US for at leastthree months when the study was conducted. 
Their average VOT for English /t/ (72 ms) did not differ from that of monolingual English speakers (77 ms). Another group of subjects, comparable to Configuration D, consisted of Americans who were married to French spouses and had lived in France for an average of 11.7 years when the study was conducted. Their average VOT for English /t/ (49 ms) was considerably shorter than that of English monolinguals (77 ms). In other words, their English /t/ had become more French-like. To match this group of L1 English speakers who were highly experienced in French, the author recruited native speakers of French who had lived in Chicago for an average of 12.2 years at the time of the study to do the same production task. They produced French /t/ with a significantly longer VOT (50 ms) than monolingual French speakers (33 ms). In other words, their French /t/ had become more English-like. The implication of this study is that a dominant L1 can be vulnerable to influences from an L2 learned after late adolescence. For heritage speakers, will a non-dominant L1 be even more vulnerable to influences from a dominant L2 acquired in early childhood? Can L2 affect not only production, but also perception?

In their perception study, Samuel & Larraza (2015) point out that extensive L2 experience may entail exposure to L1 spoken with an L2 accent, which makes speakers adapt by accepting “wrong” pronunciations of L1 phonemes as allophonic variants. In Basque, the voiceless predorso-alveolar affricate /ts̻/ and the voiceless alveopalatal affricate /tʃ/ are contrastive, but in Spanish only /tʃ/ is used. L1 Spanish speakers may find it challenging to produce Basque words with /ts̻/, and tend to produce a more /tʃ/-like consonant instead. In their experiment, highly proficient Basque-Spanish early bilinguals (comparable to Configuration D) were trained to match unusual objects with new Basque words that contained either /ts̻/ or /tʃ/.
Since these were new words, the L1 Basque speakers could never have heard them pronounced with a Spanish accent. After the training session, L1 Basque and L1 Spanish bilinguals performed a picture-name matching task. A picture of an object was presented on the screen while a spoken word was played over the headphones. Participants had to press a button to indicate whether the picture matched the word. They were told explicitly that there would be minor deviations in the pronunciation of some words, which should be considered a mismatch. Surprisingly, even L1 Basque speakers failed to reject non-words half of the time, although their accuracy was still significantly higher than that of L1 Spanish speakers. Another part of the experiment was an AXB discrimination task, in which the L1 Basque participants did very well. Their high discrimination accuracy eliminated the possibility that they were unable to detect the acoustic differences between the two affricates. The authors concluded that L1 Basque speakers’ acceptance of “mispronounced” words was not a sign of poor perception but rather reflected a dual-mapping process, in which two phonetic variants are mapped onto a single lexical representation. Such a process can facilitate efficient lexical access in an environment where L2-accented speech is often heard. The significance of this study is that L2 effects on L1 perception may not always indicate perceptual “impairment”; rather, they could be a sign of perceptual flexibility and adaptation as a strategy for efficient communication with other speakers.

2.2.3 Configuration E: Perfectly balanced bilinguals

Configuration E represents perfectly balanced bilinguals. As mentioned previously, even children of parents who each speak a different native language tend to be dominant in one of the two languages (Yip & Matthews, 2007). Individuals raised in bilingual societies are usually dominant in one of the two languages as well.
For example, in a study by Sebastián-Gallés, Echeverría & Bosch (2005) on Catalan-Spanish simultaneous bilinguals in Barcelona, all 40 participants had one Spanish-speaking parent and one Catalan-speaking parent. Although this may seem to be the perfect environment for producing balanced bilinguals, the participants were found to be either Catalan-dominant or Spanish-dominant, as they had different patterns of language use at home and for socializing. Therefore, even though bilinguals can be equally proficient in both languages, bilinguals who are perfectly balanced in terms of dominance are extremely rare. Amengual Watson (2013) even comments that “the perfectly balanced bilingual probably does not exist” (p.7).

2.2.4 Configurations F to H: L2-dominant bilinguals

Heritage speakers, the subject of investigation in this dissertation, are typically L2-dominant, which means they fall to the right of Configuration E in Figure 2.1, where L2 is stronger than L1. They are highly heterogeneous in terms of the strength of their heritage language (L1), ranging from fluent speakers (Configuration F), to semi-fluent speakers (Configuration G), to receptive listeners who can understand the language but can barely speak it or cannot speak it at all (Configuration H).

This range of configurations is particularly useful for describing changes in language use patterns across generations of immigrants. According to Valdés (2001), each new generation moves closer to the right end of the bilingual continuum than the previous generation. Having migrated from their home country to a host country, the first generation is typically either monolingual or L1-dominant, and hence comparable to Configurations A to D. Their children, the second generation, were born and raised in the host country. They tend to be dominant in the majority language of the society, but they continue to have at least some proficiency in the heritage language in order to communicate with the first generation.
Therefore, their bilingual configuration often falls to the right of Configuration E. The third generation are the children of heritage speakers. Since their parents are L2-dominant, at home they may communicate in the majority language of the society more often than in the heritage language. Those who spend time with first-generation family members may be exposed to the heritage language more often than those who do not. Therefore, the bilingual configuration of the third generation may drift further to the right on the continuum.

Recognizing these cross-generational differences, the Heritage Language Documentation Corpus constructed by Nagy (2009) contains language data from all three generations mentioned above. Nagy (2015), a study based on this corpus, shows that first-generation Ukrainian speakers in the Greater Toronto Area have significantly shorter VOTs (an average of 26 ms) than the newer generations when producing Ukrainian words with /p t k/. On the other hand, the average VOTs of the second (38 ms) and third generations (43 ms) did not differ significantly from each other. Such linguistic variation across generations has important implications for this dissertation. Although the second and third generations can both be considered heritage speakers, the second generation received language input mostly from homeland speakers, while the third generation received language input mostly from heritage speakers. To control for potential effects of differences in the input, only second-generation speakers were included in the present study (see Section 4.4.3.2). This ensured that both homeland and heritage speakers were exposed to the same baseline variety of Cantonese, spoken by homeland speakers of the parental generation.
This way, any observed difference between the two groups in the experiment would reflect their different perception of lexical tones rather than linguistic variation in the input from parents of different generations.

Apart from cross-generational comparisons, previous studies on heritage speakers have also looked into L2 effects across different aspects of grammar. Benmamoun et al. (2013b), Montrul (2013), and Polinsky & Kagan (2007) argue that morphology, syntax, and semantics are more vulnerable domains than phonetics and phonology. For morphology, English-dominant receptive listeners of Inuttitut in Canada accepted ungrammatical sentences that omit case morphemes, incurring an error rate of 40% in a grammaticality judgment task (Sherkina-Lieber, Pérez-Leroux & Johns, 2011). Heritage speakers of classifier languages such as Mandarin (Ming & Tao, 2008) and Cantonese (Wei & Lee, 2001) sometimes paired a noun with the wrong classifier or omitted classifiers completely. For syntax, adult heritage speakers of Russian in the United States found it challenging to process object relative clauses in a picture-matching task (Polinsky, 2008). For semantics, heritage Spanish speakers in the United States were found to be insensitive to nuances of the subjunctive mood (Montrul, 2009). Phonetics and phonology, however, are considered less vulnerable areas of linguistic knowledge. Researchers agree that heritage speakers have more native-like pronunciation than L2 learners (Benmamoun et al., 2013b; Montrul, 2013), even though homeland speakers might sometimes find it “off” or “funny” (Montrul, 2013, p.378).

The phonetic and phonological knowledge of L2-dominant bilinguals raises interesting questions. First, the previous subsection on Configurations B to D has shown that even a dominant L1 can be affected by a non-dominant L2. For L2-dominant bilinguals, to what extent are L1 phonetics and phonology (in)vulnerable to L2 effects?
Second, if homeland and heritage speakers share similar phonetic and phonological knowledge, would it suggest that early exposure guarantees maintenance of this knowledge for the rest of one’s life? Third, if there are L2 effects on heritage speakers’ L1 phonetics or phonology (like those observed in L1-dominant bilinguals), are they symmetric between production and perception? Last but not least, what would be the mechanism of such effects, if any? The rest of this subsection reviews previous studies on speech production and perception that have shed light on these questions, with the goal of motivating the research questions of the current study.

Chang, Yao, Haynes & Rhodes (2011) compare the production of American English /u/ and Mandarin /u/ by three groups of English-Mandarin bilinguals residing in the United States: (1) L1-dominant Mandarin speakers born and educated in mainland China or Taiwan up to at least seventh grade, who learned English as their L2; (2) L1-dominant English speakers born and educated in the United States who started to learn Mandarin as L2 after the age of 18; and (3) heritage speakers of Mandarin comparable to Configuration F or G, who grew up in the United States and spoke Mandarin in a family setting. All participants were instructed to read aloud English and Mandarin words that contained /u/. The second formant (F2) of their /u/ tokens was measured. An /u/ with a higher F2 is more English-like, and an /u/ with a lower F2 is more Mandarin-like. After obtaining the F2 values, the authors measured the acoustic distance between each subject’s English and Mandarin /u/. Two possible situations may lead to a small acoustic distance: a speaker produces a Mandarin-like /u/ for both English and Mandarin words, or a speaker produces an English-like /u/ for both English and Mandarin words.
In their results, the two L1-dominant groups (those from mainland China or Taiwan, and those from the United States learning Mandarin in school) produced a relatively small acoustic distance between their English and Mandarin /u/, as their F2 values were close to their respective L1 vowel spaces. In contrast, the heritage group had the largest acoustic separation between the two languages. Their English /u/ had a higher F2 (hence more English-like) than the L1-dominant speakers’, and their Mandarin /u/ had a lower F2 (hence more Mandarin-like) than the L1-dominant speakers’. This means that, among the three groups, heritage speakers’ vowels showed the closest approximation to the phonetic norms of both languages. The findings echo those of Saadah (2011) on the production of Arabic and English vowels by heritage speakers of Arabic, whose vowel spaces showed two separate vowel categories for English and Arabic. A possible interpretation of these results is that early exposure to both languages allowed heritage speakers to be sensitive to fine-grained phonetic details in the input, dissimilate cross-language vowel categories, and create more polarized phonetic spaces for the two languages. On the other hand, L1-dominant speakers who learned the L2 later in life may have assimilated L2 categories into existing L1 categories, producing L2 vowels that were closer to L1 phonetic norms.

In another production study, Antoniou, Best, Tyler & Kroos (2010, 2011) looked into the production of English and Greek consonants by heritage speakers of Greek in Australia, and found that cross-language effects differed between unilingual and code-switching contexts. In particular, L1 effects were observed in code-switching contexts, even though the participants were L2-dominant. All subjects were exposed to Greek from birth, and quickly became English-dominant after the onset of schooling between ages three and four.
According to self-reports, they kept using Greek on a daily basis, and so were comparable to Configuration F or G. Voiceless stops /p t k/ are phonetically realized with a short-lag VOT in Greek but a long-lag VOT in English. When asked to produce English-only or Greek-only sentences (such as “say pa again” or “λ´ι πα άλλο”), heritage speakers produced consonants with VOT values that were indistinguishable from those of monolingual speakers of each language. However, when asked to produce the target words in carrier sentences of the other language (such as “say πα again” or “λ´ι pa άλλο”), cross-language effects were unidirectional: L1 affected the production of L2 targets, but L2 did not affect L1 targets. When switching from Greek to English (“λ´ι pa άλλο”), heritage speakers produced English targets with Greek-like VOT values. However, when switching from English to Greek (“say πα again”), their Greek targets had Greek-like VOT values without any sign of effects from English. The significance of this study is that cross-language interference is not “across the board” but can vary in more complex linguistic processes such as code-switching, in which L1 effects can be observed despite the subjects’ dominance in L2. Certain methodological decisions in this dissertation were made in light of these findings. Since code-switching is not relevant to the current study, measures were taken to make sure that participants would operate in a unilingual mode.
As Chapter 4 will explain, spoken Cantonese was used in all task instructions, and a Cantonese story-listening task was inserted before the actual experiment. These measures ensured that all participants, who were residing in Canada when this study was conducted, were attuned to a unilingual Cantonese listening environment.

Configuration H has the weakest L1 among the L2-dominant configurations in Figure 2.1, and is comparable to receptive bilinguals who have at least some degree of listening competence but no productive ability in their L1. For this reason, receptive bilinguals are usually recruited for perception rather than production studies (e.g., Celata & Cancila, 2010; Tees & Werker, 1984). Since their L1 is so much weaker than in Configurations F and G, will their ability to discriminate L1 phonemes differ from homeland speakers’? Or could early exposure to two languages enable them to separate two phonological systems successfully in perception, just as heritage speakers did in production in Chang et al. (2011)? Two perception studies offer different answers to this question.

Tees & Werker (1984) point out that linguistic perceptual abilities can be maintained even after a long period of disuse if an individual had early experience hearing the relevant contrasts. In their study the critical contrast was the dental /t̪/ versus the retroflex /ʈ/ in Hindi. Two groups of students from a Hindi language course at a Canadian university performed a category-change discrimination task, in which they were asked to press a button when they heard a change in a stream of sounds such as [ʈa ʈa ʈa t̪a t̪a ...]. The first group had no experience of Hindi prior to the course and were typical L2 learners. The second group had early experience of Hindi but had almost no, or very limited, ability to speak or understand it when the course started (hence comparable to Configuration H).
They had either lived in India in the first year or two of their lives and stopped using any Indian language with this place contrast in the family after moving to North America, or had a Hindi-speaking relative living with their family in North America in the first year or two of their lives. Both groups were tested twice: the first test took place in the first or second week of the Hindi language course, and the second test was held one year after the course started. Results of the first test showed a significant difference between the two groups. The “early experience group” had an accuracy rate of 90%, comparable to seven-month-old infants in Werker, Gilbert, Humphrey & Tees (1981), while the “no early experience group” had an accuracy rate below 10%. In the second test, the early experience group improved to over 90%, close to native adult speakers of Hindi. The “no early experience group” also improved, but their accuracy was still below 20% a year after the course started. It is noteworthy that the two groups’ average course grades did not differ significantly in the end, which suggests that the “early experience group” did not have developmental advantages for other aspects of grammar such as vocabulary or syntax. These findings were similar to those of Oh, Jun, Knightly & Au (2003) on the perception of Korean consonants by individuals who heard Korean regularly during childhood but had minimal exposure to the language after childhood. The authors of both studies posit that early linguistic experience, however limited, is beneficial to the long-term maintenance of perceptual ability.

Contrary to the findings of Tees & Werker (1984), Celata & Cancila (2010) reported phonological-perceptual attrition in heritage speakers of Lucchese (spoken in Lucchesia, northern Tuscany, Italy).
The phonemic contrast under investigation was singleton versus geminate consonants in Lucchese, such as casa /ˈkasa/ “house” versus cassa /ˈkasːa/ “box”. Carrier sentences with target words containing singleton or geminate consonants were read aloud, and participants were asked to indicate what they believed had been uttered on a sheet of paper printed with choices written in Italian orthography. Participants in the heritage group (whom the authors called “second-generation immigrants”) were born and raised in the United States; they reported that they could not speak the language, although they could understand their parents at least to some extent when the parents spoke Lucchese to each other. In the results, heritage speakers (or rather, listeners) had an error rate of 46.19%, which is significantly higher than the error rate of 8.75% found for Lucchese speakers residing in Italy. In the same perception study, non-words with singleton or geminate consonants (such as /asa/ and /asːa/) were manipulated to create stimuli with varying consonant lengths on a six-step phonetic continuum (Step 1 singleton, Step 6 geminate). Lucchese speakers residing in Italy showed categorical perception, with a steep increase in “geminate” responses at Step 3 or Step 4. For the heritage group, however, the increase in “geminate” responses was gradual along the continuum, showing little sign of categorical perception. The authors concluded that the heritage group showed strong impairment in the perception of the consonant length feature in both experiments. According to the authors, heritage listeners’ insensitivity to the singleton-geminate contrast was due to their reliance on the phonological system of American English.
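The difference between categorical and gradual identification can be made concrete with a logistic identification function, a common way of modelling responses along a continuum. The sketch below is illustrative only: the boundary (between Steps 3 and 4) and the two slope values are hypothetical, chosen to mimic the qualitative patterns described above, and are not fitted to Celata & Cancila’s (2010) data. A steep slope produces one large jump in “geminate” responses between two adjacent steps, while a shallow slope spreads the change across all six steps.

```python
import math


def identification_curve(steps, boundary, slope):
    """Proportion of "geminate" responses at each step, modelled as a logistic curve."""
    return [1.0 / (1.0 + math.exp(-slope * (s - boundary))) for s in steps]


def largest_adjacent_jump(curve):
    """Largest rise in "geminate" responses between two neighbouring steps."""
    return max(b - a for a, b in zip(curve, curve[1:]))


steps = list(range(1, 7))  # Step 1 = singleton ... Step 6 = geminate

# Hypothetical parameters (not fitted to any real data): the same boundary,
# a steep slope for categorical perception, a shallow slope for gradual perception.
categorical = identification_curve(steps, boundary=3.5, slope=4.0)
gradual = identification_curve(steps, boundary=3.5, slope=0.8)

for s, c, g in zip(steps, categorical, gradual):
    print(f"Step {s}: categorical {c:.2f}, gradual {g:.2f}")

print(f"Largest adjacent jump: categorical {largest_adjacent_jump(categorical):.2f}, "
      f"gradual {largest_adjacent_jump(gradual):.2f}")
```

With these settings, the steep curve rises from about 0.12 to about 0.88 between Steps 3 and 4, whereas the shallow curve never rises by more than about 0.2 between neighbouring steps.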
Since consonant length is not a useful cue for making lexical contrasts in American English, heritage listeners of Lucchese may have adjusted their processing strategies and attended only to contrasts that are relevant to American English, the dominant language used extensively in their daily lives. In sum, early exposure to L1 does not necessarily guarantee native-like perception for the rest of one’s life. Continuous exposure to and use of L2 may result in a change of listening strategies (see also Bruggeman, 2016; Rafat, Mohaghegh & Stevenson, 2017).

2.2.5 Configuration I: Replacive bilinguals

Marking the end of the bilingual continuum, Configuration I represents individuals who have no conscious recollection of the language that they were first exposed to in life, which is common for international adoptees. Since these individuals have neither receptive nor productive competence in their pre-adoption L1, they are functionally monolingual. For this reason, in Figure 2.1 the number “1” is absent from Configuration I and only “2” is shown. De Geer (1992) and Gauthier & Genesee (2011) call the language acquired after adoption a “second first language”: “second” in terms of chronological order, but “first” in that it is the first language of which the adoptee has any conscious memory of exposure. Although individuals with Configuration I are functionally monolingual, they are not as monolingual as those with Configuration A, in the sense that they have been exposed to two languages in total during their lives. Therefore, Configuration I is still of interest to researchers of bilingualism. For example, Yip (2013) uses the term “replacive bilingualism” (p.120) to refer to cases where an adopted child’s native language was replaced entirely.
These cases raise an important question: can early childhood language memory remain accessible in adulthood after a long period of zero exposure?

Oh, Au & Jun (2010) show that international adoptees have an advantage over novice L2 learners when (re)learning phonemes of the pre-adoption language. The critical contrast in the study was the three-way distinction of stops in Korean: lenis /t/, tense /t*/, and aspirated /tʰ/. Their subjects were two groups of young adults attending a first-semester Korean language course at an American university. The first group consisted of English-speaking L2 learners who had no Korean exposure in childhood. The second group consisted of monolingual English-speaking individuals adopted from Korea to the United States as young children, between the ages of three months and one year. They differed from the Hindi listeners in Tees & Werker (1984) in that they had no conscious recollection of the pre-adoption language. Therefore, these functionally monolingual adoptees were comparable to Configuration I. All participants were asked to do an ABX discrimination task, in which the first two words (A and B) were produced by the same talker, and the last word (X) was produced by a different talker. Participants responded by pressing a button to indicate whether X was the same word as A or B. According to the results, Korean adoptees were significantly better than novice L2 learners at distinguishing lenis and aspirated stops, but not tense stops. The authors see the adoptees’ phoneme distinction ability as a sign of retention and re-activation of long-ago childhood language memory, which was not lost completely but was only inactive.
Their findings provide converging evidence for the claim in other studies that early linguistic exposure benefits phoneme perception despite the lack of post-childhood exposure.

2.2.6 Summary

To sum up, previous research suggests that early exposure allows, but does not guarantee, maintenance of productive or perceptual abilities for L1 across all contexts of language use. On the one hand, it is possible for heritage speakers to perform identically to L1-dominant native speakers in production (Antoniou et al., 2010; Saadah, 2011) and perception (Tees & Werker, 1984), even for individuals who had stopped receiving L1 input for an extended period of time (Oh et al., 2010, 2003). On the other hand, heritage speakers may behave linguistically differently from homeland speakers due to language attrition (Celata & Cancila, 2010) or to having more polarized phonetic spaces for two phonologies (Chang et al., 2011). Lastly, cross-language effects can come from a dominant language (Celata & Cancila, 2010), a non-dominant language (Antoniou et al., 2011; Flege, 1987; Linck et al., 2009; Samuel & Larraza, 2015), or the state of having two languages in the mind (Chang et al., 2011). No single generalization can account for all of the aforementioned cases.

How can tone, the contrastive dimension under investigation in this dissertation, contribute to the literature on bilingualism and, in particular, to the growing field of heritage linguistics? The studies reviewed so far have dealt with segmental phonemes, namely consonants and vowels. Even though the L1 and L2 in each study have different consonant and vowel inventories, on a general level both language systems share the property of using segmental contrasts to encode lexical contrasts. All observed cross-language influences on vowels and consonants pertain to the same contrastive dimension. For a language pair like Cantonese and English, however, the two systems do not share the same dimensions for lexical contrasts.
Cantonese has both segmental and tonal dimensions, while English has the segmental but not the tonal dimension. On the one hand, it can be argued that English, which has no lexical tone system, contains nothing that could affect the tonal dimension of Cantonese in a bilingual’s mind. On the other hand, it can be argued that both languages have suprasegmental phonologies, as English does make use of pitch variations for stress and intonation, which carry meaning at the phrase or sentence level. From this perspective, cross-language effects may be possible between the suprasegmental phonologies of the two languages. Before exploring which of the two claims is empirically supported, the next chapter will explain the tone system of Cantonese and what is known about homeland speakers’ tonal perception, the basis for comparison with heritage speakers’ perceptual abilities for lexical tones.

Chapter 3
What Is Tonal Perception?

The focus of this literature review chapter is tone, or more precisely, the perception of tone. Section 3.1 clarifies the relationship among three concepts, namely fundamental frequency, pitch, and tone. It is followed by a summary of previous studies on Cantonese tonal perception by three groups of individuals¹: children acquiring Cantonese (Section 3.2), adult homeland speakers (Section 3.3), and non-Cantonese speakers (Section 3.4). Results of these studies constitute the basis for comparison with the results of the current study in Chapter 5. Section 3.5 discusses studies related to tone and heritage speakers. Lastly, Section 3.6 summarizes the two literature review chapters, which lead to the hypotheses to be tested in this study.

3.1 The acoustic and perceptual aspects of lexical tone

Before the discussion of previous studies, the meanings of three related but not equivalent terms need to be clarified: fundamental frequency, pitch, and tone.
The first, fundamental frequency (f0), is an acoustic term referring to the number of cycles per second in a sound wave, measured in hertz (Hz) (Titze, 1994). It pertains to the physical properties of the acoustic signal itself, and does not depend on the perspective of the receiver of the signal. For example, an f0 of 22,000 Hz is measurable, regardless of whether it is audible to the human ear.

¹ All participants in the studies mentioned in this chapter had no reported hearing impairment, speech disorder, or abnormalities in cognitive development.

The second term, pitch, is perceptual in nature. It depends on the perspective of a hearer, whose auditory system processes the acoustic signal and determines what is heard. Due to the structure of the human basilar membrane, f0 values do not translate linearly into pitch (von Békésy, 1960). For example, to a human hearer, the pitch difference between 200 Hz and 300 Hz is much bigger than that between 12,000 Hz and 12,100 Hz, even though the f0 difference in both cases is the same, at 100 Hz. Therefore, pitch is only relevant when a signal is processed by a hearer.

While pitch can be an attribute of speech or non-speech signals (such as music or a fire alarm), tone is a linguistic term. It refers to the use of pitch variations (among other things, such as duration and phonation type) to mark lexical contrasts or morphological or syntactic categories (Yip, 2002). This makes tone a phonological category, like consonants and vowels.
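The nonlinear relation between f0 and pitch described above can be illustrated with the mel scale, one common psychoacoustic approximation of perceived pitch. The choice of the mel scale and of the formula 2595 × log10(1 + f/700) is an assumption made for this sketch; it is not the measure used in this dissertation, and tone research often relies on other scales such as semitones. The sketch simply compares the two 100 Hz intervals mentioned in the example.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (assumed formula: 2595 * log10(1 + f/700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_interval(f1_hz, f2_hz):
    """Size of the interval between two frequencies on the mel scale."""
    return abs(hz_to_mel(f2_hz) - hz_to_mel(f1_hz))


# Two intervals that are both 100 Hz wide in acoustic terms.
low_interval = mel_interval(200, 300)
high_interval = mel_interval(12000, 12100)

print(f"200 vs 300 Hz:       {low_interval:.1f} mel")
print(f"12,000 vs 12,100 Hz: {high_interval:.1f} mel")
```

Under this formula, the interval from 200 to 300 Hz comes out at roughly 119 mel and the interval from 12,000 to 12,100 Hz at roughly 9 mel, in line with the point that equal steps in Hz are not equal steps in pitch.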
For tone languages with contrasts involving multiple dimensions (e.g., pitch, duration, phonation type), f0 is often the most important acoustic correlate of tone, while relative pitch height (and changes in it) is often the most important perceptual correlate (Gandour, 1978; Yip, 2002).

Putting these together, tonal perception is the process of extracting relevant auditory cues from a continuous speech signal and mapping pitch attributes onto discrete phonological categories. A speaker of a tone language has the ability to locate boundaries between separate tonal categories along a continuous dimension (Gandour, 1978; Gandour & Krishnan, 2015). The basic question asked in this dissertation is whether homeland and heritage speakers of Cantonese share the same phonological knowledge of tone.

There is a large body of literature on Cantonese tonal perception, which will be summarized in the upcoming sections. Readers who have not read the introduction to Cantonese tones in Section 1.4.5 are advised to do so before reading the rest of this chapter.
The phonemic tone inventory of Cantonese is repeated below as Table 3.1 for easy reference.

Table 3.1: The phonemic tone inventory of Cantonese

Tone  Description   Tone numerals  Example  Gloss
1     high level    55             se1      “some”
2     high rising   25             se2      “write”
3     mid level     33             se3      “diarrhea”
4     low falling   21             se4      “snake”
5     low rising    23             se5      “society”
6     low level     22             se6      “shoot”

3.2 Perception of Cantonese tones by Cantonese-learning infants and children

Although the subjects of the current study were adults, previous work on Cantonese-learning infants and children serves as an important reference for the kind of phonological knowledge shared by homeland and heritage speakers before their language development diverged towards different bilingual configurations, as in Figure 1.1 from Chapter 1. These references can serve as pointers² to whether an observed difference between homeland and heritage speakers is due to language attrition (i.e., the knowledge was acquired but later eroded or lost) or incomplete acquisition (i.e., it was never acquired in the first place). If heritage speakers lack a certain type of phonological knowledge that Cantonese-learning infants or children possess, it may be a sign of language attrition in heritage speakers. However, if heritage speakers, infants, and children all lack a certain kind of phonological knowledge that adult homeland speakers possess, it may be a sign of incomplete acquisition by heritage speakers.

² I call them “pointers” only because I recognize that different methodological paradigms were adopted in these studies, and adult heritage speakers were infants at a different point in time from the infants being tested.

Converging evidence from previous research suggests that Cantonese-learning infants are capable of detecting tonal contrasts as early as four months of age. Yeung, Chen & Werker (2013) is a Vancouver-based study on the tone discrimination abilities of Cantonese- and English-learning infants at four months and nine months of age.
In particular, they focused on the Cantonese tonal contrast between T2 [25] and T3 [33]. According to the parents’ reports, all Cantonese-learning infants were exposed exclusively to Cantonese at least 90% of the time and spent little time with English speakers. In other words, these infants had the potential to become heritage speakers of Cantonese after the onset of schooling, since English is the majority language for education and public communication in Vancouver. The results show that all four-month-old infants (both Cantonese- and English-learning) were sensitive to the T2-T3 contrast. However, the performance of nine-month-old infants differed: nine-month-old Cantonese-learning infants maintained their ability to discriminate T2 and T3, but English-learning infants did not. The implication for adult heritage speakers is that if they fail to discriminate T2 [25] and T3 [33], it would more likely be a sign of attrition than of incomplete acquisition.

Several Hong Kong-based studies confirm the general tone discrimination ability of Cantonese-learning infants and children, though their performance varied across tone pairs. Lei (2007) adopted the Conditioned Head Turn procedure (Eilers, Wilson & Moore, 1977) to investigate the perception of three level tones (T1 [55], T3 [33], and T6 [22]) by Cantonese-learning infants at six and eight months of age. The results suggest a possible relationship between acoustic distance and ease of discrimination. T1 [55] and T6 [22] are the most different in pitch height and were best discriminated by the infants. On the other hand, T3 [33] and T6 [22] have more similar pitch heights and were comparatively more difficult for the infants.

In another Hong Kong-based study, on children two to three years of age, Lee, Chiu & van Hasselt (2002) found that f0 onsets play a more important role in children’s tonal perception than f0 offsets do.
Their stimuli included only the three tones with the most salient perceptual cues: T1 [55] has the highest pitch among the six tones and marks the upper boundary of the Cantonese tonal space; T2 [25] has the largest magnitude of pitch change; and T4 [21] marks the lower boundary of the tonal space. The experimenter read aloud words and non-words (presented as names of dolls with different facial expressions and costumes) live, and the children were asked to point at one of two pictures (for words) or dolls (for non-words) representing a tonally contrastive minimal pair. Their overall accuracy was 90.6% for words and 72.7% for non-words, but their performance for T2-T4 (87% for words and 66% for non-words) was significantly worse than for T1-T2 (93% for words and 77% for non-words) and T1-T4 (92% for words and 75% for non-words). The authors point out that T2 [25] and T4 [21] have similar f0 onsets, which makes them more difficult to tell apart than the other two pairs, whose f0 onsets are more distinct. The study concludes that three-year-old children’s tonal perception abilities were only partially developed.

As for Cantonese-learning children of four to six years of age, two Hong Kong-based studies confirmed that tonal perception improves during this period, but that children’s accuracy is still significantly lower than adults’. In Ciocca & Lui (2003), all contrastive pairs in the Cantonese tone inventory were tested with children aged four, six, and ten, as well as with adults. Participants listened to recorded stimuli and were instructed to choose one of two pictures on a screen representing a minimal pair. Significant improvement in accuracy was observed between four and six years of age, as well as between six and ten years of age; after ten years of age, no significant improvement was observed. Among all tone pairs, T3-T6 and T2-T5 were the last to be acquired in perception (see also Ciocca & Ip, 2008).
For the level tones T3 [33] and T6 [22], four-year-olds’ accuracy was at chance (50%), six-year-olds’ was close to 80%, showing significant improvement, and ten-year-olds’ was near 90%, close to adults’ 95%. For the rising tones T2 [25] and T5 [23], no significant improvement was observed between age four and age six: both groups achieved an accuracy rate close to 65%. Ten-year-olds reached 70% on this pair, their lowest score among the tone pairs. Adults’ accuracy did not reach ceiling either, at only 80%. In a more recent study, Wong & Leung (2018) adopted a slightly different picture identification task, with four pictures instead of two (one target word, one tonally contrastive competitor, and two segmentally contrastive competitors). Six-year-olds had fairly high accuracy (over 80% for all tones), but overall it was still significantly lower than adults’ (99–100%). In sum, in both Ciocca & Lui (2003) and Wong & Leung (2018), six-year-olds had not yet reached adult accuracy.

These findings raise interesting questions for the case of heritage speakers: if tonal perception is still not fully acquired at six years of age, what will happen if a child starts to receive significantly more exposure to English around this time? Will s/he continue to acquire Cantonese tonal contrasts, or will the acquisition be interrupted?
If s/he continues to acquire Cantonese tonal contrasts, will it be done through the lens of English phonology?

3.3 Perception of Cantonese tones by adult homeland speakers

Cantonese tonal perception by adult homeland speakers has been studied extensively from a variety of perspectives, including contextual effects (Fox & Qi, 1990; Francis, Ciocca, Wong, Leung & Chu, 2006; Gu & Lee, 2007; Li, Lee & Qian, 2002; Wong, 2007; Zhang, Peng, Wang & Wang, 2015; Zheng, Peng, Tsang & Wang, 2006), speech rate effects (Wong, 2011), auditory attention and memory (Law et al., 2013), normalization for intra- and inter-talker variation (Chang, Yao & Huang, 2017; Wong & Diehl, 2003; Zhang, Peng & Wang, 2011), temporal resolution (Yu, 2017), tone-intonation interaction (Ma, Ciocca & Whitehill, 2006, 2011; Vance, 1976), cross-modal perception (Burnham, Ciocca, Lauw, Lau & Stokes, 2000; Burnham, Lau, Tam & Schoknecht, 2001), and comparison with machine recognition (Lee, Lau, Wong & Ching, 2002; Peng & Wang, 2005; Yu, 2017). This section reviews only the two areas most relevant to the present study, namely the acoustic and perceptual correlates of tone, and sound change in progress. Results for homeland speakers in the present study were expected to be generally similar to those in prior research.

3.3.1 Acoustic and perceptual correlates of tone identity

In the literature on Cantonese tonal perception, it is generally agreed that f0 is the main acoustic correlate of tone identity (Fok-Chan, 1974; Gandour, 1981; Khouw & Ciocca, 2007; Lee et al., 2015; Vance, 1977). In particular, Tong, Lee, Lee & Burnham (2015) found that the three most important acoustic cues for accurate tonal perception among adult speakers are average f0, f0 onset, and f0 major slope.
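To make these three cues concrete, the sketch below computes them from a sampled f0 contour. It is a minimal illustration under stated assumptions, not the procedure of Tong et al. (2015): the helper name `f0_cues` is hypothetical, the contour values are invented, and “major slope” is approximated here by the least-squares slope over the whole contour, which may differ from how that study defined it.

```python
from statistics import mean


def f0_cues(f0_hz, times_s):
    """Summarize one syllable's f0 contour (hypothetical helper).

    Assumption: "major slope" is approximated by the ordinary
    least-squares slope of f0 against time over the whole contour.
    """
    if len(f0_hz) != len(times_s) or len(f0_hz) < 2:
        raise ValueError("need at least two paired f0/time samples")
    t_bar, f_bar = mean(times_s), mean(f0_hz)
    # Least-squares slope: covariance(t, f0) / variance(t).
    num = sum((t - t_bar) * (f - f_bar) for t, f in zip(times_s, f0_hz))
    den = sum((t - t_bar) ** 2 for t in times_s)
    if den == 0:
        raise ValueError("time samples must not all be identical")
    return {
        "mean_f0": f_bar,          # average f0 over the syllable
        "onset_f0": f0_hz[0],      # first f0 sample
        "offset_f0": f0_hz[-1],    # last f0 sample
        "slope_hz_per_s": num / den,
    }


# Invented rising contour sampled every 50 ms (not real data).
print(f0_cues([180, 185, 195, 210, 230], [0.0, 0.05, 0.10, 0.15, 0.20]))
```

A positive slope marks a rising contour and a slope near zero a level one, which ties these acoustic cues to the perceptual categories discussed next.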
As for non-f0 cues, duration and intensity are not particularly useful for tonal perception (Fok-Chan, 1974; Khouw & Ciocca, 2007; Tong et al., 2015), even though T2 [25] has the longest duration and T1 [55] has the highest intensity among all six tones (Tong et al., 2015). Lastly, creaky phonation is useful but not required for accurate identification of T4 [21] (Yu & Lam, 2014). Each of these perceptual cues is explained below.

The six Cantonese tones can be perceptually divided into “level” and “contour” tones according to whether there is a change of relative pitch height within the syllable (Fok-Chan, 1974). Within the “no change of relative pitch” group, tones can be divided by pitch height into “high level” T1 [55], “mid level” T3 [33], and “low level” T6 [22] (Gandour, 1981). It should be noted that these tones are perceptually level, but acoustically they show some f0 declination towards the end of the syllable (Li et al., 2002; Wong, 2006). Results of the tone identification task in Francis, Ciocca & Ng (2003) provide evidence that the perception of level tones is categorical. Synthesized stimuli were created along a tonal continuum from Step 1 “low” [22] to Step 10 “high” [55]. In their results, the category boundary between low level and mid level was clear around Step 4, and the boundary between mid level and high level was also clear, falling between Step 7 and Step 8. According to a study on the relationship between linguistic tones and musical tones (Yiu, 2013), the perceptual distance between T1 [55] and T3 [33] is two semitones, while the distance between T3 [33] and T6 [22] is only one semitone. T1 [55] is separated from the other tones in the perceptual tonal space and often receives the highest accuracy score in perception studies (e.g., Lee et al., 2015). On the other hand, T3 [33] and T6 [22] are closer together in the tonal space, so the confusion rate of T3-T6 is often higher than that of T1-T3 (e.g., Mok & Wong, 2010).

Contour tones fall into the “with change of relative pitch” group, which can be further divided by the direction of pitch change into “rising” (T2 [25], T5 [23]) and “falling” (T4 [21]) (Fok-Chan, 1974; Gandour, 1981; Khouw & Ciocca, 2007). Francis et al. (2003) confirmed that the perception of T2, T4, and T5 is categorical. They created synthesized stimuli along a tonal continuum from Step 1 “low falling” [21] to Step 10 “high rising” [25]. In their results, the category boundary between low falling and low rising was clear, with a crossover between Steps 3 and 4. The boundary between low rising and high rising was also clear, with a crossover at Step 7.

Although perceptually T4 [21] and T6 [22] belong to the “contour” and “level” groups respectively, acoustically the falling slope of T4 [21] is similar to the declination of T6 [22] (Wong, 2006). Despite their acoustic similarity, T4 [21] often receives a high score in word identification tasks (e.g., Khouw & Ciocca, 2007), and the tone pair T4-T6 has low confusion rates in AX discrimination tasks (e.g., Mok & Wong, 2010). A possible explanation is that T4 [21] is sometimes, but not consistently, realized with creaky voice quality. In the perception study by Yu & Lam (2014), creaky voice was added to some T6 [22] utterances in the stimuli. Participants responded with T4 significantly more often when the stimuli had creaky voice, even though the stimuli’s f0 had T6 [22] properties. The authors concluded that creaky voice facilitates identification of T4 [21] in addition to its low f0.

The last perceptual dimension, namely “magnitude of pitch change”, is primarily relevant for distinguishing between T2 [25] and T5 [23] (Fok-Chan, 1974; Gandour, 1981; Khouw & Ciocca, 2007; Vance, 1977).
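The perceptual dimensions described above (level versus contour, direction of change, and magnitude of change), together with overall height, can be read directly off the Chao numerals in Table 3.1. The sketch below is a hypothetical helper (`describe`) that works on idealized two-digit numerals rather than measured f0, so it ignores the acoustic declination of level tones noted above; treating height as the midpoint of onset and offset is likewise an assumption.

```python
# Idealized Chao tone numerals from Table 3.1.
CHAO = {1: "55", 2: "25", 3: "33", 4: "21", 5: "23", 6: "22"}


def describe(numerals):
    """Classify a two-digit Chao numeral along the perceptual dimensions.

    Hypothetical helper: height is taken as the midpoint of onset and
    offset (an assumption); magnitude is the absolute onset-offset change.
    """
    onset, offset = int(numerals[0]), int(numerals[1])
    change = offset - onset
    if change == 0:
        shape = "level"
    elif change > 0:
        shape = "rising"
    else:
        shape = "falling"
    return {"shape": shape, "height": (onset + offset) / 2, "magnitude": abs(change)}


for tone, numerals in CHAO.items():
    print(f"T{tone} [{numerals}]", describe(numerals))
```

Run on the six tones, it groups T1, T3, and T6 as level, T2 and T5 as rising, and T4 as falling, and it shows why magnitude is the dimension that separates T2 (a change of three steps) from T5 (a change of one step).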
T2 [25] and T5 [23] are both “contour” and “rising” tones, but T2 [25] has a larger magnitude of pitch change because its offset reaches a higher peak than that of T5 [23]. Since both tones have similar pitch onsets, the latter half of the syllable contains most of the cues that separate them (Khouw & Ciocca, 2007; Lam, Hall & Pulleyblank, 2016). T2-T5 is often the most confusable pair in perception studies (e.g., the control group in Mok & Wong, 2010), which has led researchers to believe that the two tonal categories are in the process of merging. The next subsection elaborates on this sound change in progress.

3.3.2 Tone mergers

Cantonese speakers’ confusion between T2 [25] and T5 [23] in production and perception has been observed since the 2000s (Kej, Smyth, So, Lau & Capell, 2002). Over the years, descriptions of the phenomenon have shifted from “tone production errors” (Kej et al., 2002, p.35), to “some kind of change is going on” (Bauer, Cheung & Cheung, 2003, p.222), to “initial stages of tone merging in progress” (Mok et al., 2013, p.364). Individuals who have merged the tones show considerable inter- and intra-speaker variation, as described below.

The first type of variation concerns how the tones are merged: T5 merging into T2, T2 merging into T5, or a single general rising tone that is different from both T2 and T5. In the production study by Kej et al. (2002), six out of 15 participants were found to have difficulties with T2 and T5, but their tonal production patterns differed. Three of them produced T5 with T2-like offsets, two of them produced T2 with T5-like offsets, and one of them produced only one rising tone midway between T2 and T5. Fung & Wong (2011) examined the acoustics of this “midway” rising tone as produced by six speakers in their early 20s, and found that its onset is higher than those of T2 and T5, while its offset is similar to T2’s³. As a result, this rising tone has a slope similar to T5’s, even though it has a T2-like offset.
Fung, Wong & Law (2011) recorded words produced by two age groups (average ages 22.3 and 53.17 years) and concluded that older speakers tend to merge T5 into T2, while younger speakers tend to have the “midway” rising tone. The two age groups thus demonstrated merger by transfer and merger by approximation, respectively.

³ The authors did not say this, but as far as I understand, this “midway” tone would be [45] in Chao’s tone numerals.

Apart from inter-speaker variation, intra-speaker variation has also been observed. Bauer et al. (2003) investigated tone production by two speakers (30 and 35 years of age) in two experimental conditions. In the first condition, participants were instructed to read aloud Cantonese words one by one, each of which carried one of the six phonemic Cantonese tones. In the second condition, all words to be read aloud were T2-T5 minimal pairs presented in two columns, so participants were more aware of what was being tested. The first speaker produced relatively distinct offsets for T2 and T5 in the first condition, but produced only T5-like tones in the second condition. The second speaker produced both T2 and T5 with T2-like offsets in both conditions. In sum, their production patterns varied depending on the experimental condition. Given that merging may not happen across all words with the same tone, Mok et al. (2013) explored the relationship between word frequency and merger, but did not find any correlation between the two.

Speakers who merge in production do not necessarily merge in perception. In the AX discrimination tasks of Mok & Wong (2010) and Mok et al. (2013), participants who did not produce the T2-T5 contrast achieved a high discrimination accuracy rate of 90%, although their reaction times were longer than those of participants who did not merge the tones in production.
The authors suggested that this could be a sign that the merger was still in progress, or that the nature of the AX discrimination task caused participants to attend to acoustic details that they might not normally notice in naturalistic spontaneous speech.

If the T2-T5 contrast is difficult for homeland speakers, will it be difficult for heritage speakers as well? On the one hand, if heritage speakers also find these two tones perceptually similar, the failure to perceive the contrast may lead to a T2-T5 merger. On the other hand, in social dialectology, the language variety spoken in a geographically separated community is relatively more resistant to language change than that of the mainstream linguistic population (Wolfram & Schilling-Estes, 2003). If the parents of heritage speakers in the present study had migrated away from Hong Kong before the T2-T5 merger became a trend, they may have passed on a merger-free variety of Cantonese to their children. It remains an open question whether the merger phenomenon is unique to the homeland speaker population.

3.4 Perception of Cantonese tones by non-Cantonese speakers

Tonal perception by naïve listeners with no Cantonese proficiency can tease apart which tonal contrasts are perceptually distinct on a universal basis and which are better perceived by Cantonese speakers only. Previous studies have looked into the perception of Cantonese tones by native speakers of tone and non-tone languages, including English (Francis, Ciocca, Ma & Fenn, 2008; Qin & Mok, 2011, 2013), French (Qin & Mok, 2011, 2013), Tagalog (Chung, 2009), Thai (Burnham, Lau, Tam & Schoknecht, 2001; Chung, 2009), and Mandarin (Chang et al., 2017; Francis et al., 2008; Lee et al., 1996; Qin & Mok, 2011, 2013). Because English is the dominant language of heritage Cantonese speakers in Canada, this section reviews only perception studies of English speakers with no prior Cantonese knowledge.
If a dominant L2 is able to affect L1 phonology, heritage speakers’ confusion patterns in the current study are expected to share similarities with those of Cantonese-naïve English speakers.

Two studies using different experimental paradigms came to the same conclusion: English speakers find it challenging to distinguish low tones that differ only in the direction of pitch change. In Francis et al. (2008), English speakers with no prior knowledge of any tone language performed a forced-choice Cantonese word identification task before and after a training phase (ten hours in total over the course of 16 to 30 days). In each trial, six tonally contrastive lexical items were shown on the screen along with pictures of their pitch contours, their romanization, and their English translation. Before training, the participants achieved an accuracy rate of at least 80% for T1 [55], T2 [25], and T3 [33]. They had lower accuracy for the three low tones with different contours, namely T4 [21] (61%), T5 [23] (45%), and T6 [22] (18%). The participants heard T4 [21] as T6 [22] 32% of the time, and T6 [22] as T4 [21] 21% of the time. T5 [23] was heard as T3 [33] 30% of the time, but confusion in the other direction was rare (only 5%). After training, the participants’ overall accuracy increased by 15%. Their post-training accuracy rates for T4 [21], T5 [23], and T6 [22] were 74%, 77%, and 51% respectively. Despite this improvement, T6 [22] was still the hardest tone after training. The results of another study, by Qin & Mok (2011), were in accordance with those of Francis et al. (2008). In their AX discrimination task, in which participants were asked whether two sounds were the same or different, English speakers erroneously responded “the same” 60% of the time when T5 [23] and T6 [22] were presented. Both Francis et al. (2008) and Qin & Mok (2011) performed multidimensional scaling analyses on their own similarity rating tasks.
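Percentages such as “T4 heard as T6 32% of the time” are row-normalized proportions of a confusion matrix, in which each row is a target tone and each column a response. The sketch below shows that computation under stated assumptions: the counts are invented for illustration and are not the data of Francis et al. (2008), and the function name `summarize_confusions` is hypothetical.

```python
def summarize_confusions(counts):
    """Row-normalize a confusion matrix given as counts[target][response].

    Hypothetical helper: for each target tone, returns its accuracy and
    its most frequent wrong response, both as proportions of that
    target's trials.
    """
    summary = {}
    for target, row in counts.items():
        total = sum(row.values())
        if total == 0:
            continue  # no trials recorded for this target
        errors = {resp: n for resp, n in row.items() if resp != target and n > 0}
        top = max(errors, key=errors.get) if errors else None
        summary[target] = {
            "accuracy": row.get(target, 0) / total,
            "top_confusion": top,
            "top_confusion_rate": errors[top] / total if top is not None else 0.0,
        }
    return summary


# Invented counts (20 trials per target), for illustration only.
counts = {
    "T4": {"T4": 12, "T6": 6, "T5": 2},
    "T5": {"T5": 9, "T3": 6, "T6": 4, "T2": 1},
    "T6": {"T6": 4, "T4": 5, "T5": 11},
}
for tone, stats in summarize_confusions(counts).items():
    print(tone, stats)
```

Because each row is normalized separately, confusions can be asymmetric, as in the T5-T3 case above: the rate at which T5 is heard as T3 need not equal the rate at which T3 is heard as T5.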
Although their interpretations of the dimensions differ, the results of both studies agree that T5 [23] and T6 [22] are very close in the perceptual space of English speakers.

One difference between the two studies’ results, however, concerns English speakers’ confusion patterns for T2 [25], T3 [33], and T5 [23]. In the AX discrimination task of Qin & Mok (2011), the error rate for the pair T2-T5 was 75%, the highest among all pairs, while the error rate for the pair T3-T5 was only 12.5%. In other words, the low rising tone T5 [23] was perceived as more similar to the high rising tone T2 [25] than to the mid level tone T3 [33]. In their multidimensional scaling analysis, one dimension was labelled “starting pitch height”. Since T2 [25] and T5 [23] have similar starting pitch heights, they are close along this dimension; T5 [23] and T3 [33] have different starting pitch heights, and so are farther apart. By contrast, in Francis et al. (2008), T5 [23] was rarely heard as T2 [25] (5% in pre-training, 2% in post-training); instead, T5 [23] was more often confused with T3 [33] (30% in pre-training, 7% in post-training). Accordingly, in the multidimensional scaling analysis of Francis et al. (2008), one dimension was labelled “pitch height” in an overall sense, not referring to the starting or ending pitch in particular. Since T5 [23] and T3 [33] have similar overall pitch heights, they are close along this dimension, whereas T2 [25] and T5 [23] differ more in overall pitch height and are farther apart. In sum, English speakers’ perception of T2-T5 in Qin & Mok (2011) was more similar to that of homeland Cantonese speakers.

The authors of both studies relate their findings to suprasegmental patterns in English prosody. Qin & Mok (2011) explain that f0 is one of the acoustic correlates of English lexical stress (Lieberman, 1960).
In addition, English intonation has high or low boundary tones at the edges of phrases or sentences (Liberman, 1975; Pierrehumbert, 1980). English speakers are therefore sensitive to differences in pitch height, which explains why they did well in discriminating the three level tones of Cantonese. Since English boundary tones are either high or low, the Cantonese low falling, low rising, and low level tones were perceived as a single category, namely “low”. As a result, English speakers did less well in perceiving the difference between level and contour tones within the lower pitch range. Lastly, Francis et al. (2008) point out that English questions have a rising intonation comparable to Cantonese T2 [25]. English speakers may have assimilated this Cantonese lexical category into an English intonational category, and so were able to identify it well. As for T5 [23], although it is also a rising tone, its magnitude of pitch change may not be large enough for it to be assimilated into the rising category of English intonation. This may explain why T5 [23] was rarely confused with T2 [25].

In sum, Francis et al. (2008) and Qin & Mok (2011) agree that Cantonese-naïve English speakers are more sensitive to average pitch height than to the direction or magnitude of pitch change. If heritage speakers in the present study also have difficulty discriminating T4 [21], T5 [23], and T6 [22], this may be evidence of cross-language effects from the non-lexical suprasegmental phonology of a dominant L2 on the lexical-tonal suprasegmental phonology of a non-dominant L1.

3.5 Tone and heritage speakers of Cantonese

Two studies on adult Cantonese speakers in Canada reported different findings from each other with respect to tonal production and perception. The earlier one, by So (2000), was based in Metro Vancouver, British Columbia, and involved three groups of participants categorized by their age of arrival in Canada: before seven, between 10 and 15, and after 16.
The later study, by Soo & Monahan (2017), was based in Toronto, Ontario. They recruited participants from Canada and Hong Kong, who were categorized into two groups by their dominant language. A comparison of the two studies’ methodologies and outcomes is presented below.

With regard to production, So (2000) found that participants who moved to Canada before age seven (and so were, in terms of linguistic background, the most similar to heritage speakers as defined in this dissertation) had a significantly smaller tonal space than those who arrived after 16. In her study, target words were elicited in both isolated and embedded forms. f0 values were first transformed into musical semitones, and then into Chao’s tone numerals. Participants who arrived in Canada before age seven produced the high level tone (T1) with a pitch comparable to [44]. Since the upper boundary of their tonal space was lower than the usual [55], their whole tonal space was compressed. The offset of the high rising tone (T2, normally [25]) became lower, resulting in a rising tone like [2 3.5]. Similarly, the offset of the low rising tone (T5, normally [23]) also became lower, resulting in a rising tone like [2 2.5]. As a result, the difference between the offsets of T2 and T5 was smaller than for participants who arrived after age 16, the group most similar to homeland speakers. The author also measured the percentage change in f0 along different sections of the syllable for contour tones, and confirmed that the two groups produced rising tones with significantly different slopes. These findings are consistent with Chang & Yao (2016) on heritage speakers of Mandarin, in that heritage speakers diverged from native norms in tonal production.

Soo & Monahan (2017), however, reported that homeland⁴ and heritage speakers produced tones with similar slopes. In their study, participants were instructed to read aloud words in isolation.
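So’s (2000) two-step conversion, from Hz to semitones and then to Chao’s five-point scale, can be sketched as follows. The semitone formula (12 semitones per doubling of frequency) is standard; the linear mapping of a semitone range onto 1–5 is one common approach and is only an assumption about So’s exact procedure. The function names and the 100–200 Hz range are hypothetical.

```python
import math


def hz_to_semitones(f0_hz, ref_hz):
    """Distance of f0 from a reference frequency, in semitones (12 per octave)."""
    return 12 * math.log2(f0_hz / ref_hz)


def semitones_to_chao(st, st_min, st_max):
    """Map a semitone value linearly onto Chao's 1-5 scale.

    The linear mapping over [st_min, st_max] is an assumed scheme,
    not necessarily the one used by So (2000).
    """
    if st_max <= st_min:
        raise ValueError("st_max must be greater than st_min")
    return 1 + 4 * (st - st_min) / (st_max - st_min)


# Hypothetical speaker whose f0 range spans one octave, 100-200 Hz.
ref = 100.0
st_top = hz_to_semitones(200.0, ref)  # 12 semitones above the bottom
for f0 in (100.0, 141.42, 200.0):
    st = hz_to_semitones(f0, ref)
    print(f"{f0:.2f} Hz -> {st:.2f} st -> Chao {semitones_to_chao(st, 0.0, st_top):.2f}")
```

On the semitone scale, the two-semitone distance between T1 and T3 reported by Yiu (2013) corresponds to a frequency ratio of 2^(2/12), about 1.12.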
Soo & Monahan extracted f0 values at eight equally spaced points along the duration of the syllable and normalized them. The results showed that homeland and heritage speakers produced overlapping pitch contours for each of the six lexical tones, and no deviation in tonal production was found in the heritage group.

⁴ In the original study, this group was referred to as “native speakers”.

The results of the perception portions of the two studies also differed. So (2000) adopted the word identification paradigm, in which participants heard a monosyllabic word and were asked to choose, from six pictures, the one that represented what they heard. The average accuracy of those who arrived before age seven (59.38%) was significantly lower than that of those who arrived after age 16 (87.50%). The former group’s confusion patterns showed that the tonal contrasts T2-T5, T3-T6, and T4-T6 were difficult to perceive. In addition, incorrect T1 responses were spread across target tones T2, T3, T4, T5, and T6. The author concluded that those who arrived in Canada before age seven experienced confusion with all tones.

Using the AX discrimination paradigm, Soo & Monahan (2017) reported that their two groups’ performance did not differ significantly. In each trial, participants listened to a pair of tonally contrastive words and were asked to choose between “same” and “different”. D-prime scores of the two groups were compared, and there was no effect of group. The authors acknowledged that the results were somewhat surprising, and pointed out that the AX discrimination task taps into the phonetic-perceptual level of processing, but not the abstract-phonological level. While the task was able to show that heritage speakers were aware of the phonetic differences between two stimuli, it was not able to show whether the listeners used this phonetic information to draw lexical contrasts. This comment is similar to that of Mok et al. (2013), who found that homeland speakers who merged tones in production could still do well in the AX discrimination task. To sum up, the different methodologies used, the different linguistic dynamics of the Metro Vancouver and Greater Toronto areas, and the 17-year gap between So (2000) and Soo & Monahan (2017) may have led to the different results and conclusions.

3.6 Hypotheses to be tested

Building on the previous research discussed in Chapter 2 and Chapter 3, this dissertation posits the three hypotheses in (14) regarding homeland and heritage speakers’ perception of Cantonese lexical tones. First, studies such as Celata & Cancila (2010) point out that heritage speakers are less sensitive to L1 sound contrasts that are not phonemic in the L2, their dominant language. Since Cantonese is a tone language and English is not, I hypothesize that heritage speakers of Cantonese, who are English-dominant, make less use of tonal information for word identification.

Second, based on the finding of So (2000) that individuals who had migrated from Hong Kong to Canada before seven years of age experienced confusion with all tones, I hypothesize that heritage speakers born and raised in Canada will show a similar pattern, in that they will show more confusion than homeland speakers when asked to identify a word from a tonally contrastive set. Although less confusion is anticipated from homeland speakers, they are expected to confuse T2 [25] and T5 [23] due to the ongoing sound change described in Section 3.3.2. I am agnostic as to whether heritage speakers follow this particular trend of tone merger.

The last hypothesis stems from the first: if heritage speakers make less use of tonal information, how can they understand a tone language? In everyday speech, words occur in sentences that provide semantic context, which serves as a non-acoustic, top-down cue for the listener. I hypothesize that semantic information is especially useful for heritage speakers.
Even if a word is perceptually confusable with the other members of its tonally contrastive set, semantic context can help to resolve the potential ambiguity.

(14) Hypotheses of the current study
a. Compared with homeland speakers, heritage speakers make less use of tonal information for word identification.
b. Homeland and heritage speakers exhibit different confusion patterns with respect to lexical tone perception.
c. Compared with homeland speakers, heritage speakers rely on semantic information to a greater degree.

Apart from filling an empirical gap, the current study also aims to build a methodology that can better tackle the following issues. First, how can we be sure that differences between homeland and heritage speakers, if any, are mainly due to their tonal perception abilities, and not to lower overall Cantonese proficiency or a lower comfort level with performing a Cantonese task in a laboratory setting? Second, in daily language use, words are embedded in sentences and are rarely uttered in isolation. How do heritage speakers make use of acoustic tonal information when non-acoustic cues are available as well? The next chapter explains how the manipulation of variables in the present study can tease tone apart from general factors and, at the same time, put tone into competition with semantic information.

Chapter 4
Methodology

This chapter walks through the experimental design and procedures of two pilot studies and the main study. Section 4.1 presents an overview of the word identification paradigm and the independent variables being manipulated. Sections 4.2 and 4.3 explain in detail how target words and carrier phrases were selected through Pilot Studies 1 and 2 respectively. Section 4.4 describes the materials, procedures, and participants of the main study.

4.1 An overview of the experimental design

Before discussing the research methodology, I recapitulate the purpose of this study.
The research questions listed below are repeated from (1) in Chapter 1.

(1) Research questions of the current study (repeated)
a. Do homeland and heritage speakers behave differently in terms of their ability to identify tonally contrastive words?
b. Do homeland and heritage speakers exhibit similar confusion patterns with respect to lexical tone perception?
c. Do homeland and heritage speakers make use of the same type of information when identifying a word from a tonally contrastive set? In particular, are acoustic and semantic information equally useful?

The following subsections discuss how the experimental design of the current study addressed these research questions.

4.1.1 The word identification paradigm

To investigate homeland and heritage speakers’ tonal perception at the lexical phonological level, a forced-choice word identification paradigm was adopted. Every trial in the experiment followed the same procedure: pictures representing the target word and its tonally contrastive competitors were presented on a computer screen, each corresponding to a button on a response device. At the same time, an audio stimulus containing the target word was presented. In all cases, this stimulus ended with the target word, but the exact nature of the stimulus varied during the course of the experiment, as described in Section 4.1.3. Participants were asked to identify the word they heard and respond by pressing the corresponding button on the device. Their accuracy rate, the dependent variable of this study, was measured and compared. Using the same paradigm throughout the study allowed direct comparison of results.

Although only one paradigm was adopted, different types of stimuli were used to address different questions. A summary of these stimuli is given in Table 4.1. The “Type” column shows how each stimulus type is referred to in subsequent discussion.
Types 1, 2, and 3 were monosyllabic words for answering the question in (1a): do homeland and heritage speakers behave differently in terms of their ability to identify tonally contrastive words? Types 4, 5A, 5B, 6A, and 6B were sentences with a semantic context. Their purpose was to answer (1c): do the two groups find acoustic and semantic information equally useful when discriminating tonally contrastive words? All stimulus types can answer the question in (1b): do the two groups exhibit similar confusion patterns with respect to lexical tone perception?

Table 4.1: A summary of stimulus types and procedures of the main study

DAY 1
Pre-tasks | No. of trials
Story listening | 1
Picture learning | 31
Practice trials | 8

Block | Type | Segment | Tone | Context | Congruity | Example of stimuli | No. of trials
First | 1 | ✓ | ✓ | ✗ | not appl. | fan3 “sleep” | 60
First | 2 | ✓ | ✗ | ✗ | not appl. | fan_ “sleep” | 60
First | 3 | ✗ | ✓ | ✗ | not appl. | 3 “sleep” | 60
Second | 4 | ✓ | ✗ | ✓ | ✓ | sap_ ji_ dim_ zung_ hou_ soeng_ cong_ fan_ “At twelve (you) should go to bed and sleep” | 60

Post-task: Language background questionnaire

DAY 2
Pre-tasks | No. of trials
Picture learning | 31
Practice trials | 8

Block | Type | Segment | Tone | Context | Congruity | Example of stimuli | No. of trials
Third | 5A | ✓ | ✓ | ✓ | ✓ | sap6 ji6 dim2 zung1 hou2 soeng5 cong4 fan3 “At twelve (you) should go to bed and sleep” | 60
Third | 5B | ✓ | ✓ | ✓ | ✗ | sap6 ji6 dim2 zung1 hou2 soeng5 cong4 fan2 “At twelve (you) should go to bed and powder” | 60 × 3
Third | 6A | ✗ | ✓ | ✓ | ✓ | sap6 ji6 dim2 zung1 hou2 soeng5 cong4 3 “At twelve (you) should go to bed and sleep” | 60
Third | 6B | ✗ | ✓ | ✓ | ✗ | sap6 ji6 dim2 zung1 hou2 soeng5 cong4 2 “At twelve (you) should go to bed and powder” | 60 × 3

4.1.2 Variables being controlled

Types 1, 2, and 3 were all monosyllables, and Types 4, 5A, 5B, 6A, and 6B were all sentences. Even so, each type represented a different way of controlling the four independent variables in Figure 4.1: the availability of (i) segmental information, (ii) tonal information, (iii) semantic context, and (iv) semantic congruity of the target word with the carrier phrase. In Table 4.1, the tick (✓) and cross (✗) signs in the columns “Segment”, “Tone”, “Context”, and “Congruity” indicate how each stimulus type manipulated these variables.

Note that “congruity” differs from the other three variables in two ways. First, its applicability depends on another variable, namely “context”. A monosyllabic word has no semantic context, so for such stimuli “congruity” is not applicable, as in Types 1–3 in Table 4.1. For this reason, “congruity” appears immediately under “context” in Figure 4.1, rather than immediately under “semantic information”. Second, “segmental information”, “tonal information”, and “semantic context” were each either present or absent. Congruity, when applicable, was either congruous or incongruous rather than present or absent.

4.1.3 Stimulus types

This subsection explains the purpose of the eight stimulus types. (For details of procedures, such as the number of trials per stimulus type, see Section 4.4.2.)
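The four manipulated variables and the trial counts in Table 4.1 fit in a small data structure, which also makes explicit that congruity depends on context. The sketch below is illustrative Python. The class, field and function names are assumptions, and the pre-tasks (story listening, picture learning, practice trials) are left out of the counts.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class StimulusType:
    """One row of Table 4.1 (illustrative encoding)."""
    name: str
    day: int
    segment: bool
    tone: bool
    context: bool
    congruous: Optional[bool]  # None = not applicable (no carrier sentence)
    trials: int

    def __post_init__(self) -> None:
        # Congruity is defined only when a semantic context is present.
        if self.context != (self.congruous is not None):
            raise ValueError(f"Type {self.name}: congruity must be set if and only if context is present")


DESIGN: List[StimulusType] = [
    StimulusType("1", 1, True, True, False, None, 60),
    StimulusType("2", 1, True, False, False, None, 60),
    StimulusType("3", 1, False, True, False, None, 60),
    StimulusType("4", 1, True, False, True, True, 60),
    StimulusType("5A", 2, True, True, True, True, 60),
    StimulusType("5B", 2, True, True, True, False, 60 * 3),
    StimulusType("6A", 2, False, True, True, True, 60),
    StimulusType("6B", 2, False, True, True, False, 60 * 3),
]


def trials_per_day(design: List[StimulusType]) -> Dict[int, int]:
    """Sum experimental trials (excluding pre-tasks) for each session day."""
    totals: Dict[int, int] = {}
    for s in design:
        totals[s.day] = totals.get(s.day, 0) + s.trials
    return totals
```

With these entries, Day 1 sums to 240 experimental trials and Day 2 to 480, for 720 in total. Type 5A is the only type whose four variables are all marked ✓.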
The expected results are also discussed, based on the hypothesis that heritage speakers rely less on tonal information and more on semantic information.

4.1.3.1 Monosyllabic words

The first three types of stimuli were all monosyllabic words. (For details of how these words were chosen, see Section 4.2.) They shared a lack of semantic context, so a cross sign (✗) was put in the “Context” column for Types 1, 2, and 3. Since the target words were not in a carrier sentence, semantic congruity was not applicable.

Figure 4.1: A summary of dependent and independent variables of this study

The difference among Types 1, 2, and 3 lay in the kind of acoustic information that was available. Type 1 stimuli were unmanipulated, so both segmental and tonal information were available. For example, participants would hear fan3 and had to decide whether the word was fan2 “powder”, fan3 “sleep”, or some other tonally contrastive competitor shown on the screen. Semantic information was unavailable but all acoustic information was available. This stimulus type was therefore expected to be fairly, but not extremely, challenging for heritage speakers. For homeland speakers, who were hypothesized to rely less on semantic information, Type 1 was not expected to be challenging at all.

Type 1 alone, however, would not be adequate to answer the first research question. Suppose the heritage group did have a lower accuracy than the homeland group. This could be due to general differences in language proficiency, such as weaker listening comprehension skills or a lower comfort level when completing a task in their non-dominant language. In that case, heritage speakers would perform worse than homeland speakers in any kind of Cantonese listening task. Therefore, this stimulus type alone would not be sufficient to tease tone perception abilities apart from overall language abilities.

Types 2 and 3 were therefore designed to contrast the two populations’ tone discrimination abilities specifically.
Containing only segmental but not tonal information, Type 2 was a control task that was expected to be equally challenging for both populations. (For details of how tonal information was removed, see Section 4.4.1.1.) For example, participants would hear fan_ and had to decide whether the word was fan2 “powder”, fan3 “sleep”, or some other tonally contrastive competitor shown on the screen. The target words and their tonally contrastive competitors were segmentally identical (all were fan), so participants were expected to be able only to guess. The expected accuracy for both groups was therefore at chance level. This would show that heritage speakers did not simply do worse than homeland speakers in any kind of Cantonese listening task.

Unlike Type 2, Type 3 contained only tonal information and no segmental information. (For details of how segmental information was removed, see Section 4.4.1.1.) Pitch was the only available acoustic information, and no consonants or vowels could be identified. For example, participants would hear 3 (a pitch that corresponded to T3 [33]) and had to decide whether the word was fan3 “sleep”, fan2 “powder”, or some other competitor. Since the target word and its tonally contrastive competitors were segmentally identical, segmental information was redundant for these stimuli. A listener capable of using tonal information should be able to identify the target word even when segmental information is unavailable.

Type 1 was unmanipulated speech, whereas Type 3 sounded unnatural (comparable to hummed speech), so both populations were expected to show a higher accuracy for Type 1 than for Type 3. Given the hypothesis that tonal information was less useful for heritage speakers, Type 3 was anticipated to be particularly difficult for the heritage group. The largest accuracy gap between homeland and heritage speakers was therefore expected for this type.

Types 1, 2, and 3 were randomized in the same block during the experiment.
They were not put into separate blocks, because Type 2 (and possibly Type 3) in a block of its own could be a frustrating task. To motivate participants to pay attention, monosyllabic stimuli of different levels of difficulty were mixed in the same block.

4.1.3.2 Sentences

The remaining stimulus types (4, 5A, 5B, 6A, 6B) were designed to answer the third research question, which concerns the use of acoustic versus semantic information. They offered semantic information in addition to segmental or tonal information. In these stimuli, the target word was embedded in a carrier sentence. (For details of how the sentences were chosen, see Section 4.3.) Since a context was provided, all of them received a ✓ sign in the “Context” column of Table 4.1.

Type 4 was presented on Day 1 after Types 1–3. It was a control task containing sentences with no tonal information. Participants were asked to identify the last word of the sentence, as in sap_ ji_ dim_ zung_ hou_ soeng_ cong_ fan_ “At twelve (you) should go to bed and sleep”, where the underscore indicates an absence of pitch information. These monotonous stimuli resembled alaryngeal speech, which is produced by individuals who use an electrolarynx to speak after surgical removal of the larynx.

Law, Ma & Yiu (2009) report that alaryngeal Cantonese speakers had difficulty producing varying pitches, yet the sentences they produced were fairly intelligible. In their sentence intelligibility test, Cantonese speakers aged 25–33, with no speech impairment and no prior experience with alaryngeal speech, were asked to transcribe sentences uttered by electrolarynx users. Electrolaryngeal speech received an average intelligibility score of 77.3%. For this reason, the monotonous Type 4 stimuli in the current study were also expected to be reasonably comprehensible, and semantic information was considered available in them. Therefore, in Table 4.1, Type 4 received a ✓ sign in the “Context” column.
Since the (non-)use of tonal information was hypothesized to be the key difference between homeland and heritage speakers, removing such information was expected to render the two groups equal. The predicted result for this task was that heritage and homeland speakers would achieve a similar level of accuracy.

The remaining stimulus types were presented on Day 2 of the experiment. Types 5A and 5B contained the same acoustic information (hence both are called Type 5) and differed only in congruity: 5A was congruous and 5B was incongruous. Acoustically, both were unmanipulated and kept all segmental and tonal information. Semantically, however, they were not the same. In Type 5A, the meaning of the target word was congruous with its carrier phrase. For example, in sap6 ji6 dim2 zung1 hou2 soeng5 cong4 fan3 “At twelve (you) should go to bed and sleep”, the target word fan3 “sleep” is relevant to the rest of the sentence, because the function of a bed is to provide a sleeping surface. Note that 5A was the only stimulus type that received a ✓ mark for all four variables. Since it offered the most information, it was expected to be the easiest task for both populations.

Type 5A alone, however, would not be sufficient to answer the third research question, because a high accuracy on this stimulus type has several possible explanations. First, the participant attended to the tonal information and decided solely on the basis of what was available in the acoustic signal. Second, the participant ignored the tonal information entirely and simply chose a word that made sense in the sentence. Third, the participant used both the acoustic and the semantic information in the stimuli.
Because of this ambiguity, Type 5A alone could not answer the question of whether heritage speakers tend to rely more on one type of information than another. The next stimulus types were designed to solve this problem.

Type 5B stimuli were sentences that did not make sense, because the target word was semantically incongruous with the carrier phrase. An example is sap6 ji6 dim2 zung1 hou2 soeng5 cong4 fan2 “At twelve (you) should go to bed and powder”: in terms of meaning, going to bed at twelve has no relationship with the action of powdering. At the beginning of the experiment, participants were explicitly told that some sentences might not make sense and that all they had to do was identify the last word they heard. (For details of the instructions given to participants, see Section 4.4.1.4.) In other words, they were instructed to attend to acoustic information.

A participant who picked the correct answer (fan2 “powder”) would show an ability to use tonal information and to actively choose not to rely on the semantic context. A participant who consistently chose the word that made sense (fan3 “sleep”) would instead appear to over-rely on semantic information. Given the hypothesis that heritage speakers rely more on semantic than on acoustic information, this task was expected to be significantly more challenging for heritage than for homeland speakers.

The last two stimulus types, 6A and 6B, contained less acoustic information than 5A and 5B: all segmental information of the target word¹ was removed, but its tonal information was kept. Type 6A was the congruous version, such as sap6 ji6 dim2 zung1 hou2 soeng5 cong4 3 “At twelve (you) should go to bed and sleep”. Type 6B was the incongruous version, such as sap6 ji6 dim2 zung1 hou2 soeng5 cong4 2 “At twelve (you) should go to bed and powder”.
Lacking segmental information, these two stimulus types were expected to be more challenging than 5A and 5B for heritage speakers.

Types 5A, 5B, 6A, and 6B were randomized and presented as one block on Day 2. Presenting them as four separate blocks in the order 5A, 5B, 6A, 6B could have been a frustrating experience for participants. They would have heard many incongruous sentences in a row (5B and 6B), and also many sentences in a row whose last word had no segments (6A and 6B). In that case, even if Type 5A had the highest accuracy and 6B a low accuracy as expected, the result might merely reflect participants’ loss of attention over the course of the session. To maintain participants’ morale and to avoid potential noise in the results, these stimulus types were mixed in the same block.

¹ Segmental information of words in the carrier phrase was not removed. Had it been removed, the sentence would sound like 6 6 2 1 2 5 4 3 “At twelve (you) should go to bed and sleep”. The carrier phrase would then be incomprehensible and unable to create the semantic context needed for the predicted effect.

4.1.4 Summary

To sum up, only one paradigm, namely the word identification paradigm, was adopted for easy comparison of results. Eight types of stimuli were designed to control four variables: the availability of segmental information, tonal information, semantic context, and semantic congruity. The stimulus types in Table 4.2 are the same as those in Table 4.1, but they are arranged by the anticipated accuracy gap between homeland and heritage speakers. The predictions were based on the hypothesis that tonal information was least useful, and semantic context most useful, for heritage speakers. Both populations were expected to perform similarly for stimuli with no tone (Types 2 and 4).
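The prediction that Type 2 accuracy would be at chance can be made concrete. Assume four pictures per trial, one for each member of a tonal quadruplet (see Section 4.2). Chance accuracy is then 1/4. One simple way to check whether a participant's Type 2 score departs from that level is a one-sided exact binomial test. The sketch below is illustrative Python. The function names and the conventional α = .05 are assumptions, and this is not necessarily the analysis used in this dissertation.

```python
from math import comb
from typing import Optional


def binom_tail(k: int, n: int, p: float) -> float:
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def above_chance(correct: int, trials: int, n_choices: int = 4, alpha: float = 0.05) -> bool:
    """One-sided exact binomial test against chance = 1 / n_choices (assumed 4 pictures)."""
    return binom_tail(correct, trials, 1 / n_choices) < alpha


def min_correct_above_chance(trials: int, n_choices: int = 4, alpha: float = 0.05) -> Optional[int]:
    """Smallest number of correct responses that would count as above chance, if any."""
    return next((k for k in range(trials + 1) if above_chance(k, trials, n_choices, alpha)), None)
```

Under this assumption, a 60-trial Type 2 block yields about 15 correct responses by guessing alone. `min_correct_above_chance(60)` gives the score above which guessing stops being a plausible explanation.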
Type 5A, which contained all types of information, should reflect the baseline Cantonese listening abilities of the two groups. Types 1, 5B, 6A, and 6B were predicted to be more difficult, as they lacked certain information. Lastly, Type 3 contained only tonal information, which was hypothesized to be least useful for heritage speakers. As a result, their accuracy on it was predicted to be far lower than that of homeland speakers.

Table 4.2: Summary of stimulus types, arranged by the anticipated accuracy gap between homeland and heritage speakers (from smallest to largest)
Type | Description | Segment | Tone | Context | Congruity | Predicted result
2 | Words with no tone | ✓ | ✗ | ✗ | Not appl. | Homeland = Heritage
4 | Sentences with no tone | ✓ | ✗ | ✓ | ✓ | Homeland = Heritage
5A | Normal sentences (congruous) | ✓ | ✓ | ✓ | ✓ | Homeland > Heritage
5B | Normal sentences (incongruous) | ✓ | ✓ | ✓ | ✗ | Homeland >> Heritage
6A | The last word of the sentence has no segments (congruous) | ✗ | ✓ | ✓ | ✓ | Homeland >> Heritage
6B | The last word of the sentence has no segments (incongruous) | ✗ | ✓ | ✓ | ✗ | Homeland >> Heritage
1 | Normal words | ✓ | ✓ | ✗ | Not appl. | Homeland >> Heritage
3 | Words with no segments | ✗ | ✓ | ✗ | Not appl. | Homeland >>> Heritage

4.2 Pilot Study 1: Familiarity with target words

This section explains the purpose, procedures, and results of Pilot Study 1, a word familiarity rating task that determined which target words were used in the main study.

4.2.1 Background and purpose

Heritage speakers have, on average, weaker Chinese literacy skills, so choosing target words for a Cantonese word identification task can be challenging. In the present study, target words were chosen on four criteria: availability of minimally contrastive sets, usage in daily life, imageability, and, lastly, familiarity, which is the focus of Pilot Study 1. Each criterion is explained below.

Since this dissertation investigates homeland and heritage speakers’ tone discrimination abilities, the target words should form tonally contrastive minimal sets.
In other words, members of a word set should have the same segmental phonemes and differ in lexical tone only. In previous work on homeland Cantonese speakers (e.g., Burnham, Ciocca, Lauw, Lau & Stokes, 2000; Ciocca & Lui, 2003; Fok-Chan, 1974), minimal sextuplets were used as target words; these are summarized in Table 4.3. Every phonotactically well-formed syllable without an obstruent coda can be uttered with any of the six lexical tones, but not every syllable–tone combination is meaningful. Take the syllable wu as an example: wu1 “dirt”, wu2 “pot”, wu3 “nuisance”, wu4 “lake”, and wu6 “mutual” are all real words, but wu5 does not mean anything in Cantonese. In other words, wu5 is an accidental gap. Such gaps are in fact pervasive in the lexicon. Only a limited set of syllables (fan, fu, jan, jau, ji, jyun, lau, se, seoi, si, and wai) combine with every one of the six lexical tones to yield a meaningful word.

Beyond the limited number of full contrastive sets, a word’s usage in daily life (or lack of it) should be taken into consideration, particularly in a study of heritage speakers. For instance, the syllable jyun has a full contrastive set, as shown in Table 4.3. However, the word jyun1 “Mandarin duck” is commonly used in romantic poetry but rarely heard in daily conversation. Heritage speakers acquire the language in a family setting and receive their education in a predominantly English environment, so they are unlikely to be familiar with this literary term. Furthermore, as discussed in Section 1.4.7, certain words are used more frequently in Standard Chinese, which must be learned in formal language classes.
Examples from Table 4.3 include fan5 “diligence”, jan6 “pregnancy”, seoi5 “mental state”, jau3 “young”, se1 “some”, wai5 “great”, jau1 “rest”, and ji3 “idea”, all of which have more colloquial counterparts in spoken Cantonese. Using these words could have undesirable consequences. If heritage speakers showed a low accuracy in a word identification task, it would be difficult to tell whether this reflected an inability to distinguish lexical tones or a lack of familiarity with the lexical items.

Meanings that are hard to express through pictures cannot be used either. Written Chinese characters are largely opaque with respect to pronunciation and must be learned through formal education. As data from the linguistic background questionnaire in Section 4.4.3.3 will show, most heritage speakers in the subject pool were not literate in Chinese. To rule out the possibility that a low accuracy rate was due to participants’ lack of reading skills, the experiment presented pictures rather than written characters as choices. Among the words in Table 4.3, jau5 “have”, lau3 “instigate”, seoi3 “tax”, wai1 “might”, and wai6 “position” have relatively abstract meanings that are difficult to represent with drawings. Therefore, even though these words are used in daily contexts, their tonal sextuplets were not used.

So far, within the limited set of syllables that can generate meaningful tonal sextuplets, fan, jan, jau, ji, jyun, lau, se, seoi, and wai have each failed at least one of the criteria above. This leaves only si and fu as usable candidates. If all 720 trials in the current study involved only si and fu, the experiment might become tedious, and participants might quickly lose attention, which would affect their accuracy. In addition, using a variety of syllables and a robust set of lexical items can enhance the generalizability of the results.
It was thus necessary to explore options other than tonal sextuplets.

Table 4.3: Minimal sextuplets used in previous studies
Author(s) | Syllable | T1 | T2 | T3 | T4 | T5 | T6
Fok-Chan (1974) | fu | man | bitter | richness | help | woman | father
Fok-Chan (1974) | jyun | Mandarin duck | gentle | grumble | finish | soft | wish
Yiu & Fok (1995) | ji | cure | chair | idea | child | ear | two
Burnham et al. (2000) | fu | husband | tiger | rich | hold | woman | father
So (2000) | fu | husband | tiger | pants | symbol | woman | tofu
So (2000) | si | lion | history | attempt | time | city | trained person
Ciocca & Lui (2003) | ji | clothing | chair | Italian | child | ear | two
Khouw & Ciocca (2007) | si | poetry | history | try | time | city | surname
Francis et al. (2008) | se | some | write | spill | snake | society | shoot
Francis et al. (2008) | si | thought | history | try | time | market | event
Francis et al. (2008) | jau | rest | grapefruit | young | from | have | and
Francis et al. (2008) | fan | separate | noodle | command | burn | diligence | portion
Francis et al. (2008) | fu | husband | bitter | rich | appropriate | woman | negative
Francis et al. (2008) | ji | cure | chair | meaning | son | ear | two
Kung et al. (2014) | jan | joy | endure | print | human | pull/draw | pregnancy
Kung et al. (2014) | si | teacher | history | try | time | market | matter
Kung et al. (2014) | seoi | bad | water | tax | hang down | mental state | sleep
Kung et al. (2014) | fu | husband | bitter | rich | symbol | woman | negative
Kung et al. (2014) | wai | might | destroy | comfort | encircle | great | position
Yu & Lam (2014) | lau | angry | twist | instigate | stay | willow | leak
Lam et al. (2016) | fu | exhale | bitter | wealth | match | woman | load
Lam et al. (2016) | se | some | write | diarrhea | snake | society | shoot
Lam et al. (2016) | si | poetry | history | time | time | market | right

The solution adopted was to use counterbalanced tonal quadruplets instead of sextuplets, as in Table 4.4. The first column of that table lists all possible four-tone combinations. With six lexical tones in the inventory, there are 15 possible four-tone combinations. All permutations of the same four tones (e.g., [1 2 3 4], [2 1 3 4], [3 1 2 4], and [4 1 2 3]) were treated as the same set, namely [1 2 3 4]. (For details of how the four tones were ordered in the main study, see Section 4.4.1.3.) The second column of the table shows the five syllables used: fan, fu, ji, se, and si. Each syllable was matched with three tone sets.
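The count of 15 sets and the way the five syllables divide them can be checked mechanically. The order of tones within a set does not matter, so the sets are combinations, C(6, 4) = 15, rather than permutations. The Python sketch below enumerates them and checks the syllable assignment transcribed from Table 4.4. The variable and function names are illustrative.

```python
from itertools import combinations
from math import comb

# All unordered four-tone sets drawn from the six Cantonese tones,
# in lexicographic order: (1, 2, 3, 4), (1, 2, 3, 5), ...
TONE_SETS = list(combinations(range(1, 7), 4))
N_SETS = comb(6, 4)  # 15

# Syllable-to-tone-set assignment as listed in Table 4.4.
ASSIGNMENT = {
    "si": [(1, 2, 3, 4), (1, 2, 4, 6), (1, 2, 5, 6)],
    "fu": [(1, 2, 3, 5), (1, 3, 4, 5), (1, 3, 5, 6)],
    "fan": [(1, 2, 3, 6), (1, 3, 4, 6), (2, 3, 4, 6)],
    "ji": [(1, 2, 4, 5), (1, 4, 5, 6), (2, 4, 5, 6)],
    "se": [(2, 3, 4, 5), (2, 3, 5, 6), (3, 4, 5, 6)],
}


def covers_each_set_once(assignment) -> bool:
    """True if every four-tone set is assigned to exactly one syllable."""
    used = sorted(s for sets in assignment.values() for s in sets)
    return used == TONE_SETS


def unique_words(assignment) -> set:
    """Distinct syllable+tone words (e.g. 'fan3') across all assigned sets."""
    return {f"{syl}{tone}" for syl, sets in assignment.items() for s in sets for tone in s}
```

`unique_words(ASSIGNMENT)` yields 27 distinct words, matching 3 × 5 + 2 × 6. The only syllable–tone combinations it never produces are ji3, se1 and fan5, the three words deliberately avoided.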
For example, the syllable ji was matched with [1 2 4 5], [1 4 5 6], and [2 4 5 6], which avoids the abstract word ji3 “idea”. Similarly, the syllable se was matched with [2 3 4 5], [2 3 5 6], and [3 4 5 6], which avoids se1 “some”, a word that is hard to represent with a picture. Lastly, fan was matched with [1 2 3 6], [1 3 4 6], and [2 3 4 6], which avoids fan5 “diligence”, a word more commonly used in Standard Chinese. To sum up, three syllables (ji, se, fan) occurred with only five of the six tones, while the other two syllables (fu, si) occurred with all six tones. The number of unique words was therefore 3 × 5 + 2 × 6 = 27.

When a syllable–tone combination has many homophones, the meaning with the highest imageability (together with familiarity, discussed below) was chosen. As shown in Table 4.3, si1 has many homophones: when spoken, it could mean “lion”, “poetry”, “thought”, or “teacher”. Since pictures were used to elicit responses, “poetry” and “thought” were less desirable candidates, and “lion” was chosen for the experiment instead. A noteworthy case is si2, which can mean “history” or “poop” unless disambiguated in writing. All studies listed in Table 4.3 used only the character for “history”, possibly because words related to bodily functions were not favoured in contexts such as an academic experiment. For the purpose of the current study, however, si2 “poop” met more of the criteria for a usable target word. First, it can easily be represented by a picture (see Appendix A for the picture used in the main study). Second, it is a common word in infant-directed speech.
According to Frank, Braginsky, Yurovsky & Marchman (2017) and Tardif, Fletcher, Liang & Kaciroti (2009), most Cantonese-learning infants from Hong Kong had acquired the word si2 “poop” by 30 months of age. Adult heritage speakers who grew up listening to Cantonese in a family setting were therefore expected to be familiar with this lexical item.

Table 4.4: Tonal quadruplets used in the current study
Tone set | Syllable | T1 | T2 | T3 | T4 | T5 | T6
1 2 3 4 | si | lion | poop | try | key | – | –
1 2 3 5 | fu | exhale | tiger | pants | – | woman | –
1 2 3 6 | fan | share | powder | sleep | – | – | portion
1 2 4 5 | ji | cure | chair | – | child | ear | –
1 2 4 6 | si | lion | poop | – | key | – | nurse/trained person
1 2 5 6 | si | lion | poop | – | – | market | nurse/trained person
1 3 4 5 | fu | exhale | – | pants | help | woman | –
1 3 4 6 | fan | share | – | sleep | tomb | – | portion
1 3 5 6 | fu | exhale | – | pants | – | woman | negative
1 4 5 6 | ji | cure | – | – | child | ear | two
2 3 4 5 | se | – | write | diarrhea | snake | society | –
2 3 4 6 | fan | – | powder | sleep | tomb | – | portion
2 3 5 6 | se | – | write | diarrhea | – | society | shoot
2 4 5 6 | ji | – | chair | – | child | ear | two
3 4 5 6 | se | – | – | diarrhea | snake | society | shoot
Total number of unique words: 27

Lastly, similar pictures within the same tone set were avoided. Consider fu in the first row of Table 4.3: fu1 “man” and fu6 “father” are both male human beings. A baby could be drawn next to the “father” to distinguish it from “man”, but “father” would still logically entail “man”. To solve this problem, only words that do not cause semantic confusion with another word in the same tone set were chosen. For instance, in Table 4.4, fu1 “exhale”, fu2 “tiger”, fu3 “pants”, fu4 “help by holding someone’s arm”, fu5 “woman”, and fu6 “negative” are all semantically distinct. The same principle was applied to the pictures for all tone sets. For more details about the pictures, see Section 4.4.1.2 and Appendix A.

The last criterion for choosing target words was familiarity, which is not the same as word frequency.
Familiarity is a subjective measure, based on familiarity ratings, that reflects a speaker’s experiential encounter with a lexical item (Connine, Mullennix, Shernoff & Yelen, 1990). Word frequency, on the other hand, is an objective measure based on the number of occurrences of a word, usually in a corpus. A word can have a low frequency but receive a high familiarity rating. Gernsbacher (1984) illustrates this with data from The Teacher’s Word Book of 30,000 Words (Thorndike & Lorge, 1963). The English word ultra had a low frequency (237 occurrences per one million words) but was rated as highly familiar. Its rating was comparable to that of super, a word that was both highly familiar and highly frequent (8,031 occurrences per one million words). Ultra (237 occurrences) and twixt (287 occurrences) were similar in word frequency, yet ultra received a high familiarity rating and twixt a low one. Table 4.5 summarizes the comparison. These examples show that there is no direct mapping between word frequency and familiarity.

Table 4.5: Comparison of word frequency per one million words and familiarity ratings of three English words (Gernsbacher, 1984)
Word | Frequency | Familiarity rating
super | high (8,031) | high
ultra | low (237) | high
twixt | low (287) | low

For Cantonese, word frequency data are available from the Hong Kong Cantonese Corpus (Luke & Wong, 2015). This database comprises 180,000 word tokens of spontaneous speech produced by 100 speakers from Hong Kong. The proposed target words in Table 4.4 were searched in the corpus, and their frequencies are summarized in Table 4.6. The frequencies cover a large range: ji6 “two” was very frequent (199 occurrences), but ji2 “chair” was not found in the corpus at all (0 occurrences).
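Table 4.5 reports frequencies per one million words, whereas Table 4.6 reports raw counts out of 180,000 tokens. A simple normalization puts the two on the same scale. The sketch below is illustrative Python with a hypothetical function name. It only rescales a raw count by the corpus size given in the text.

```python
CORPUS_TOKENS = 180_000  # size of the Hong Kong Cantonese Corpus as given in the text


def per_million(count: int, corpus_tokens: int = CORPUS_TOKENS) -> float:
    """Rescale a raw corpus count to occurrences per one million words."""
    return count * 1_000_000 / corpus_tokens


# A few raw counts taken from Table 4.6.
for word, count in [("ji6 'two'", 199), ("fan3 'sleep'", 30), ("si2 'poop'", 1), ("ji2 'chair'", 0)]:
    print(f"{word}: {per_million(count):.1f} per million")
```

On this scale, ji6 “two” corresponds to about 1,106 occurrences per million. A single occurrence in this corpus corresponds to about 5.6 per million, so low counts in a corpus of this size are a coarse measure.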
If word frequency were the sole criterion for choosing target words, most of the words in Table 4.6 could not be used.

Table 4.6: Word frequency of target words out of a total of 180,000 word tokens in the Hong Kong Cantonese Corpus (Luke & Wong, 2015)
Written form | Word | Meaning | Frequency
分 | fan1 | share | 124
粉 | fan2 | powder | 7
瞓 | fan3 | sleep | 30
墳 | fan4 | tomb | 0
份 | fan6 | portion | 12
呼 | fu1 | exhale | 8
虎 | fu2 | tiger | 1
褲 | fu3 | pants | 6
扶 | fu4 | help by holding another person’s arm | 1
婦 | fu5 | woman | 9
負 | fu6 | negative | 10
醫 | ji1 | cure | 60
椅 | ji2 | chair | 0
兒 | ji4 | child | 2
耳 | ji5 | ear | 6
二 | ji6 | two | 199
寫 | se2 | write | 68
瀉 | se3 | diarrhea | 1
蛇 | se4 | snake | 6
社 | se5 | society | 1
射 | se6 | shoot | 5
獅 | si1 | lion | 3
屎 | si2 | poop | 1
試 | si3 | try | 77
匙 | si4 | key | 3
市 | si5 | market | 1
士 | si6 | nurse/trained person | 43

Since word frequency does not equal familiarity, a separate measure of familiarity was needed. Pilot Study 1, a word familiarity rating task, was conducted to confirm that Cantonese speakers were indeed familiar with the words listed in Table 4.4. All homeland speakers were assumed to be very familiar with all proposed target words. The same could not be assumed for heritage speakers, whose exposure to the language varied. It was thus necessary to make sure that heritage speakers were reasonably familiar with these words. If they did not know a particular word at all, they would not know its tone either. A low accuracy rate for that word in the main study would then reveal nothing about their tone discrimination ability. For this reason, the heritage group was the population of interest in this pilot study, even though data from both homeland and heritage speakers were collected and analyzed.

4.2.2 Procedures

Pilot Study 1 was conducted as an online questionnaire hosted on UBC FluidSurveys (FluidSurveys, 2017).
Since the main target population of this pilot study was heritage speakers, the online survey was circulated on social media platforms, mainly among UBC student communities and interest clubs rather than Hong Kong-based online communities. The first page of the survey was a consent form. It stated that participation was voluntary, that no personally identifiable information would be elicited, and that all collected data would be used for academic purposes only. Anyone with at least one Cantonese-speaking parent was invited to participate; high fluency in Cantonese and the ability to read Chinese were not required. Since international students from Hong Kong were studying at UBC, the survey was expected to reach some homeland speakers as well. To separate their data from the heritage speakers’ data, a language background questionnaire² was placed between the consent form and the main part of the pilot study. It contained questions on language proficiency, countries lived in, and parents’ native languages.

² The language background questionnaire used in Pilot Study 1 was a condensed version of the one used in the main study. For details of the latter, see Section 4.4.3.2.

The main part of the questionnaire was entitled “How familiar are you with this SPOKEN word?” The word “spoken” was emphasized to make clear that participants were not supposed to rate a written character. The 27 target words (see Table 4.4) were presented in random order. For each word, its romanized form and English translation were provided, and its written form in Traditional Chinese was displayed in parentheses for reference. Since heritage speakers were expected to have limited Chinese reading skills, an audio file of each word was also available, as shown in Figure 4.2. Participants could click to listen to the word spoken by the author.
They were asked to rate its familiarity on a four-point scale: “very familiar”, “quite familiar”, “not so familiar”, and “not familiar at all”.

Figure 4.2: A screenshot of Pilot Study 1

4.2.3 Participants

A total of 648 individuals attempted the questionnaire. Responses from 286 individuals were excluded from analysis, either because they did not complete the whole survey or because they did not have any Cantonese-speaking parents. Of the 362 individuals who answered all questions on the survey and reported having Cantonese-speaking parents, 237 grew up in Canada and were therefore categorized as heritage speakers. As expected, although the online survey was circulated mainly among communities of a Canadian university, it also reached international students originally from Hong Kong, as well as friends and relatives of heritage speakers from Hong Kong. Of the 362 individuals whose data were analyzed, 125 grew up in Hong Kong and were therefore categorized as homeland speakers.

4.2.4 Results

Results of Pilot Study 1 are presented as histograms in Figure 4.3 and as boxplots in Figure 4.4. The histograms compare the overall distribution of ratings by the two groups, while the boxplots contrast their familiarity with individual lexical items. Each figure is discussed below.

Figure 4.3a and Figure 4.3b show that both homeland and heritage speakers rated most lexical items as “very familiar”. The number of data points differed between homeland speakers (125 participants × 27 words = 3,375 data points) and heritage speakers (237 participants × 27 words = 6,399 data points). However, the histograms were intended to compare the overall distribution of ratings, not the number of responses. In both figures, the x-axis represents familiarity ratings on the four-point scale. As expected, homeland speakers rated almost all lexical items “4” (very familiar), as Figure 4.3a shows.
There were only two responses of “1” (not familiar at all), two of “2” (not so familiar), and 15 of “3” (quite familiar), which are barely visible in Figure 4.3a. Heritage speakers (Figure 4.3b) gave relatively more ratings below “4” than homeland speakers did: there were 429 responses of “1” (not familiar at all), 440 of “2” (not so familiar), and 482 of “3” (quite familiar). However, there were still 5,026 responses of “4” (very familiar), far outnumbering all other responses. Both groups’ distributions were therefore skewed towards “very familiar”, although the homeland group’s distribution was even more skewed than the heritage group’s.

The boxplots in Figure 4.4 contrast the two groups’ ratings for individual words. In both panels, the y-axis represents ratings on the same four-point scale. Homeland speakers (Figure 4.4a) rated almost all words as “very familiar” (M = 3.99, SD = 0.01). For heritage speakers (Figure 4.4b), the mean rating across all words fell between “quite familiar” and “very familiar” (M = 3.58, SD = 0.32). The words that received the lowest ratings were fu6 “negative” (M = 3.00, SD = 0.03), fan4 “tomb” (M = 2.94, SD = 0.03), and se3 “diarrhea” (M = 2.89, SD = 0.03).
Even though the ratings of these three words were relatively low, they were still very close to 3, which corresponds to “quite familiar”.

To sum up, Pilot Study 1 confirmed that heritage speakers were reasonably familiar with the 27 proposed target words listed in Table 4.4. These words were therefore appropriate as target words in the main study.[3]

Figure 4.3: Distribution of homeland and heritage speakers’ familiarity ratings; 1 = “not familiar at all”, 4 = “very familiar”

[3] In the main study, a picture learning task was inserted before the experimental block as an extra measure to ensure that heritage speakers knew the meaning of all target words, especially those that received relatively low ratings in Pilot Study 1. For details of the picture learning task, see Section 4.4.2.2.

Figure 4.4: Comparison of homeland and heritage speakers’ ratings for individual words; 1 = “not familiar at all”, 4 = “very familiar”

4.3 Pilot Study 2: Semantic congruity of sentences

This section explains how carrier phrases were chosen for the experiment, a process that involved Pilot Study 2, a congruity judgment task.

4.3.1 Background and purpose

As mentioned in Section 4.1.3.2, Type 5 and Type 6 stimuli were sentences rather than monosyllabic words. Each of these sentences consisted of an introductory carrier phrase followed by the target word as the last word. Three factors were considered in choosing carrier phrases.

First, sentence length was controlled. If some sentences were shorter than others, different sentence-final pitch declination patterns might be observed (Li et al., 2002). To avoid any potential effects of pitch declination on tone perception, every carrier phrase was seven syllables long, so a complete sentence (carrier phrase plus target word) was always eight syllables long. See Appendix A for the full list of 147 sentences.

Second, to avoid potential confusion, none of the carrier phrases contained any of the 27 words listed in Table 4.4.
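Both constraints described so far (seven syllables per carrier phrase, and no target word inside a carrier phrase) can be checked mechanically. The following Python sketch assumes that carrier phrases are stored as space-separated Jyutping syllables with tone numbers; the function name and the small set of target words are hypothetical, chosen only to illustrate the check.

```python
CARRIER_LENGTH = 7  # syllables per carrier phrase; the target word makes it 8

def check_carrier(carrier, target_words):
    """Return a list of problems with a Jyutping carrier phrase.

    carrier: space-separated toned Jyutping syllables, e.g. "bei2 ji1 ...".
    target_words: set of toned syllables used as target words.
    An empty list means both constraints are satisfied.
    """
    syllables = carrier.split()
    problems = []
    if len(syllables) != CARRIER_LENGTH:
        problems.append(f"has {len(syllables)} syllables, expected {CARRIER_LENGTH}")
    # Any syllable identical to a target word could be mistaken for the answer.
    clashes = sorted(set(syllables) & target_words)
    if clashes:
        problems.append("contains target word(s): " + ", ".join(clashes))
    return problems

# Hypothetical subset of target words, for illustration only.
targets = {"ji1", "ji5", "si1", "si2", "si3", "si4"}

print(check_carrier("bei2 ji1 sang1 tai2 haa5 nei5 zek3", targets))
# ['contains target word(s): ji1']
print(check_carrier("gwo3 nin4 tong4 jan4 gaai1 jau5 mou5", targets))
# []
```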
For example, a sentence like bei2 ji1 sang1 tai2 haa5 nei5 zek3 ji5 “Let the doctor see your ear” would not be usable, because the carrier phrase contained the syllable ji1 “cure” (as in ji1 sang1 “doctor, one who cures”), which might be confused with the target word ji5 “ear”. In that case, some participants might choose the picture for “cure”, even though they were instructed to pay attention to the last word.

Finally, since one goal of this study is to compare how homeland and heritage speakers make use of acoustic and semantic cues, it was important that each carrier phrase created a semantic context more relevant to one particular word than to its tonal competitors. Table 4.7 shows four examples involving the tone set [1 2 3 4]. The carrier phrase gwo3 nin4 tong4 jan4 gaai1 jau5 mou5... “There is ... dance in Chinatown during the Lunar New Year” contained the words “Chinatown”, “Lunar New Year”, and “dance”, which constructed a semantic context likely to be associated with “lion”, since the lion dance is a traditional art performed during the Lunar New Year. The word si1 “lion” was therefore semantically congruous with this carrier phrase. Its tonal competitors, however, yielded semantically incongruous sentences, namely “There is poop dance in Chinatown during the Lunar New Year”, “There is try dance in Chinatown during the Lunar New Year”, and “There is key dance in Chinatown during the Lunar New Year”. Pilot Study 2 was conducted to confirm that other Cantonese speakers shared the author’s congruity judgments.

Table 4.7: Examples of carrier phrases and (in)congruous target words

Carrier phrase | Congruous | Incongruous
gwo3 nin4 tong4 jan4 gaai1 jau5 mou5... “There is ... dance in Chinatown during the Lunar New Year” | si1 “lion” | si2 “poop”, si3 “try”, si4 “key”
fong3 gau2 gei3 zyu6 zap1 faan1 di1... “Clean up the ... after walking your dog” | si2 “poop” | si1 “lion”, si3 “try”, si4 “key”
gam1 ci3 m4 dak1 haa6 ci3 zoi3... “If you fail this time, ... again next time” | si3 “try” | si1 “lion”, si2 “poop”, si4 “key”
ceot1 mun4 hau2 gei3 dak1 daai3 so2... “Remember to take your ... when leaving home” | si4 “key” | si1 “lion”, si2 “poop”, si3 “try”

4.3.2 Procedures

As with Pilot Study 1, Pilot Study 2 was conducted as an online questionnaire hosted on UBC FluidSurveys (FluidSurveys, 2017). Since this pilot study did not target a particular population, it was circulated on the social media platforms of university student groups at UBC as well as at universities in Hong Kong. Individuals who had Cantonese-speaking parents were invited to complete the study. The first page of the questionnaire was a consent form, followed by a short language background survey.

The main part of the questionnaire had a header that read, in Traditional Chinese, “Do you think the sentences below make sense?”, followed by a list of sentences, as in Figure 4.5. Congruous (N = 27) and incongruous (N = 127) sentences were presented in random order. Two options were offered below each sentence: “it makes sense” (literally “the meaning is complete” in Cantonese) and “it does not make sense” (literally “the meaning is not through” in Cantonese).

Unlike Pilot Study 1, which provided bilingual instructions, Pilot Study 2 was presented in written Chinese only. The purpose was to disambiguate tonally similar words: if the sentences had been presented in audio form only, some participants might have answered “it makes sense” for an intentionally incongruous sentence simply because they thought a congruous sentence had been uttered in an odd way or with an accent. Since Chinese literacy was required to complete the survey, it was anticipated that most participants in Pilot Study 2 would be homeland speakers.

Figure 4.5: A screenshot of Pilot Study 2

4.3.3 Participants

A total of 541 individuals attempted the online survey, but only 371 of them had Cantonese-speaking parents and completed all questions.
The data of these 371 individuals were included in the following analysis.

4.3.4 Results

The results of Pilot Study 2 are summarized in Figure 4.6. The x-axis shows two groups of sentences: those intended to be congruous (e.g., “There is lion dance in Chinatown during the Lunar New Year”) and those intended to be incongruous (e.g., “There is key dance in Chinatown during the Lunar New Year”). The y-axis represents the mean ratings given by participants, where 1 means “it makes sense” and 0 means “it does not make sense”.

Figure 4.6: Results of Pilot Study 2

In general, participants’ judgments matched the expected sentence congruity. All sentences intended to be congruous received ratings close to 1 (M = 0.98, SD = 0.01). Most sentences intended to be incongruous received ratings close to 0 (M = 0.01, SD = 0.02). The outlier, with a mean rating of 0.27, is the incongruous sentence nei5 bong1 ngo5 sik6 maai5 ngo5 go2 fan1 “You can help me eat my share (verb)”. Its congruous counterpart has the target word fan6 “portion” instead, which reads “You can help me eat my portion”. A possible explanation for the relatively high rating of this incongruous sentence is the orthographic similarity of 分 fan1 “share (verb)” and 份 fan6 “portion”.
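The coding behind Figure 4.6 can be illustrated with a short Python sketch: each judgment is coded 1 (“it makes sense”) or 0 (“it does not make sense”), the codes are averaged per sentence, and each mean is compared with the intended congruity. The 0.2 tolerance used to flag unexpected sentences, the sentence labels, and the toy judgments are all hypothetical; with this tolerance, a sentence like the fan1 outlier (mean 0.27) would be flagged.

```python
from statistics import mean

def sentence_means(judgments):
    """Average 0/1 judgments per sentence (1 = "it makes sense")."""
    return {sid: mean(codes) for sid, codes in judgments.items()}

def flag_unexpected(means, intended, tolerance=0.2):
    """Return sentences whose mean departs from the intended value.

    intended maps each sentence id to 1 (meant to be congruous)
    or 0 (meant to be incongruous). tolerance is a hypothetical cutoff.
    """
    return sorted(sid for sid, m in means.items()
                  if abs(m - intended[sid]) > tolerance)

# Hypothetical toy judgments from five respondents.
judgments = {
    "fan6_congruous": [1, 1, 1, 1, 1],
    "fan1_incongruous": [0, 1, 0, 0, 1],
    "si4_incongruous": [0, 0, 0, 0, 0],
}
intended = {"fan6_congruous": 1, "fan1_incongruous": 0, "si4_incongruous": 0}
means = sentence_means(judgments)
print(flag_unexpected(means, intended))  # ['fan1_incongruous']
```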
It was assumed that orthographic confusion would not affect the main study, in which only audio stimuli were to be presented and response options were to be given in the form of pictures (see Section 4.4.1.2 for details).

To conclude, Pilot Study 2 confirmed that the proposed carrier phrases created the desired semantic effect and were thus appropriate as stimuli for the main study.

4.4 Main study

This section describes the materials, procedures, and participants of the main study.

4.4.1 Materials

The materials discussed in this subsection are the stimuli, the pictures presented as response options, the tones and their corresponding buttons on the response device, and the instructions for participants.

4.4.1.1 Stimuli

All stimuli for the experimental blocks were recorded by a 25-year-old female[4] native Cantonese speaker born and raised in Hong Kong, whose parents were also native Cantonese speakers born and raised in Hong Kong. After completing her undergraduate degree in Hong Kong, she moved to Vancouver at the age of 22, so she had lived in Canada for three years at the time of recording. She spoke Cantonese daily with her family and friends. No known hearing loss or speech disorder was reported. She was not linguistically trained and was naïve about the purpose of the study. She was compensated C$10 per hour for her time.

[4] The author had recruited both male and female talkers, but all male talkers either had extremely creaky voice, especially when producing T4 [21] words, or seemed, in the author’s judgment, to have merged the two rising tones (T2 [25] and T5 [23]) in production. The female talker was selected in the end because of her voice quality and pitch contours, not because of her gender.

The recording session took place in a sound-attenuated booth at the UBC Interdisciplinary Speech Research Lab. All stimuli were recorded at a sampling rate of 44,100 Hz with a USBPre 2 High-Resolution Audio Interface manufactured by Sound Devices. The talker was presented with a list of 27 words and 147 sentences written in Traditional Chinese characters and was asked to say each item three times at a natural speed. If necessary (e.g., if an item did not sound natural, if she sounded like she was laughing when producing a semantically incongruous sentence, or if she wanted to clear her throat), she could repeat an item as many times as she wanted.

Special attention was paid to words with T4 [21] and to semantically incongruous sentences. As pointed out by Yu & Lam (2014), creaky voice facilitates the identification of T4. To control for the effect of phonation on accuracy, the talker was instructed to produce speech with normal phonation as much as possible. Since the talker was not linguistically trained and did not know what normal phonation meant, the author demonstrated the production of several words with normal phonation and with creaky voice, and asked the talker to do her best to produce the former. Creaky voice could still be found in some of her T4 words, but never across the entire syllable. As for sentences (especially the semantically incongruous ones), the talker was asked to re-record them if they were produced hesitantly or if a pause was inserted before the last word. After listening to all recorded tokens, the author selected the best ones as stimuli, based on the naturalness of the production and how well each pitch contour represented the intended lexical tone.

Note that this talker recorded stimuli only for the experimental blocks. All materials that were not part of the experimental blocks (task instructions, the story listening task, and the picture learning task) were recorded by the author.
For more details, see Section 4.4.2.

After the recorded tokens to be used as stimuli had been selected, the sound files were edited for the purposes of the experiment. As mentioned in Section 4.1, two of the variables controlled in this study were tonal and segmental information. In Table 4.1, target words in Types 3 and 6 had their segmental information filtered out, to test how well heritage speakers could identify a tone by relying solely on tonal information. In Types 2 and 4, tonal information was removed while segmental information was kept. These stimuli were created with functions in Praat (Boersma, 2002), as explained below.

For stimuli of Type 3 (monosyllabic words with no segments) and Types 6A and 6B (sentences in which the last word had no segments), a low-pass filter was applied to remove segmental information, using the Praat function “Filter (pass Hann band)”. The cutoff frequency was 350 Hz and the smoothing frequency was 20 Hz. This configuration was chosen for several reasons. First, the talker’s T1 [55] was approximately 250 Hz, so to retain all tonal information, the cutoff had to be higher than 250 Hz. Second, a cutoff of 350 Hz successfully filtered out the acoustic signal relevant for identifying this speaker’s vowels and consonants. For example, compare the unmanipulated fu2 “tiger” in Figure 4.7 with its low-pass-filtered version in Figure 4.8. In Figure 4.8, all energy above 350 Hz was removed, so there were no clear F1 and F2, which means that the vowel would be unidentifiable. The high-frequency noise needed for identifying fricatives was also removed. Lastly, other cutoff frequencies were tested, but the resulting files sounded irritating to the human ear. The chosen configuration created stimuli that best resembled muffled speech, which is common in phone conversations when reception is poor.
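The effect of a low-pass filter with a 350 Hz cutoff and 20 Hz smoothing can be approximated in plain Python. This sketch is not Praat’s implementation: it assumes a cos² (Hann-shaped) roll-off of width 2 × smoothing centred on the cutoff, applies it with a naive discrete Fourier transform, and uses a toy sampling rate of 4,000 Hz instead of 44,100 Hz so that the O(n²) transform stays fast. A 200 Hz component (in the f0 range) survives, while a 1,000 Hz component (standing in for formant-region energy) is removed.

```python
import cmath
import math

def hann_lowpass_gain(freq, cutoff=350.0, smoothing=20.0):
    """Gain of a low-pass filter with a raised-cosine (Hann-shaped) edge.

    Assumption: gain is 1 below cutoff - smoothing, 0 above cutoff + smoothing,
    and follows a cos^2 curve in between.
    """
    lo, hi = cutoff - smoothing, cutoff + smoothing
    if freq <= lo:
        return 1.0
    if freq >= hi:
        return 0.0
    return math.cos(math.pi / 2 * (freq - lo) / (hi - lo)) ** 2

def lowpass(samples, rate, cutoff=350.0, smoothing=20.0):
    """Filter a real signal in the frequency domain with a naive DFT (O(n^2))."""
    n = len(samples)
    spectrum = [sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    for k in range(n):
        # Bins above n/2 mirror negative frequencies; using |f| keeps the output real.
        freq = min(k, n - k) * rate / n
        spectrum[k] *= hann_lowpass_gain(freq, cutoff, smoothing)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

rate, n = 4000, 400  # 0.1 s at a toy sampling rate
low = [math.sin(2 * math.pi * 200 * t / rate) for t in range(n)]
high = [0.5 * math.sin(2 * math.pi * 1000 * t / rate) for t in range(n)]
filtered = lowpass([a + b for a, b in zip(low, high)], rate)
print(max(abs(f - a) for f, a in zip(filtered, low)))  # close to 0
```

Both test frequencies fall exactly on DFT bins here, so there is no spectral leakage; real speech would also need windowing or a proper FFT library.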
This choice helped minimize any listening discomfort that might affect participants’ accuracy in the experiment.

For Type 2 (monosyllabic words with no tone) and Type 4 (sentences with no tone), tonal information was “removed” in a different sense. Since all vocalic elements must have f0 values, f0 cannot be removed without affecting segmental information. Instead, the f0 of these stimuli was reset to a uniform 200 Hz with the Praat functions “remove pitch point” and “add pitch point”. For instance, compare the unmanipulated fu2 “tiger” in Figure 4.7 with its “no tone” version in Figure 4.9: the rising pitch contour in Figure 4.7 is completely flattened in Figure 4.9. As a result, certain perceptual correlates of tone, namely the direction and magnitude of pitch change, were made unavailable. The frequency of 200 Hz was chosen because the talker produced T3 [33] at a similar frequency, as in fu3 “pants” in Figure 4.10. Although a naturally produced T3 [33] showed f0 declination towards the end of the syllable, its average pitch was still very similar to that of a Type 2 stimulus after pitch reset, as in Figure 4.9. This provided the grounds for a prediction: when the only available perceptual cue was pitch height, participants who attended to this cue should always respond that they heard T3 [33] when presented with Type 2 stimuli.

To ensure that no stimulus was louder than others (which might startle participants), the amplitude of all eight types of stimuli was normalized with the command Scale... 0.99996948 in a Praat script adapted from Crosswhite (2009), which scaled each file in a given directory so that all output files had the same peak amplitude. Syllable duration, however, was not normalized. Although duration has been found to be an important perceptual correlate of Mandarin tones (Blicher, Diehl & Cohen, 1990), this is not the case for Cantonese.
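Two of the manipulations above, resetting f0 to a flat 200 Hz and scaling every file to the same peak amplitude, can be sketched in plain Python. The sketch does not call Praat: it assumes that a pitch contour is a list of hypothetical (time, f0) pairs and that a sound is a list of samples between -1 and 1. The peak value 0.99996948 comes from the Praat command quoted above and appears to equal 32767/32768, the largest positive 16-bit sample expressed as a fraction of full scale.

```python
def flatten_contour(points, target_hz=200.0):
    """Replace every f0 value with target_hz, keeping the time points.

    points: list of (time_s, f0_hz) pairs, a hypothetical stand-in for a
    Praat PitchTier. The result keeps pitch height but has no pitch movement.
    """
    return [(t, target_hz) for t, _ in points]

def pitch_change(points):
    """Signed f0 change from first to last point (direction and magnitude)."""
    return points[-1][1] - points[0][1]

def scale_peak(samples, peak=0.99996948):
    """Scale samples so that the largest absolute value equals peak."""
    current = max(abs(s) for s in samples)
    if current == 0:
        return list(samples)  # silence cannot be rescaled
    factor = peak / current
    return [s * factor for s in samples]

# Hypothetical rising contour for a T2-like syllable.
rising = [(0.00, 190.0), (0.10, 205.0), (0.20, 240.0), (0.30, 265.0)]
flat = flatten_contour(rising)
print(pitch_change(rising), pitch_change(flat))  # 75.0 0.0

quiet = [0.1, -0.25, 0.2]
loud = scale_peak(quiet)
print(max(abs(s) for s in loud))
```

Scaling by a single factor per file changes loudness without altering the relative amplitudes within the file.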
There has been no evidence that duration affects the perception of Cantonese tones (Fok-Chan, 1974; Tong, McBride & Burnham, 2014; Vance, 1976). To keep the stimuli as natural as possible, no action was taken to control syllable duration.

Figure 4.7: Spectrogram of the syllable fu2 (high rising tone, unmanipulated)
Figure 4.8: Spectrogram of the syllable fu2 (high rising tone, low-pass filter applied)
Figure 4.9: Spectrogram of the syllable fu2 (high rising tone, pitch reset to 200 Hz)
Figure 4.10: Spectrogram of the syllable fu3 (mid level tone, unmanipulated)

4.4.1.2 Pictures

In their summary of results from the National Heritage Language Survey conducted in the United States, Carreira & Kagan (2011) report that more than 80% of the respondents who were heritage speakers of Cantonese or Mandarin rated their Chinese reading and writing skills “low” to “intermediate”. It was therefore anticipated that the heritage Cantonese speakers from Canada in the current study would have weak Chinese literacy skills. To ensure that all participants, especially those who had not received formal education in Chinese, could understand the options given in the forced-choice task, pictures instead of Chinese characters were presented as choices, as in Figure 4.11. All pictures were hand-drawn by the author alone, so that no picture was visually more salient than another because of differing artistic styles. The hand-drawn pictures were scanned as black-and-white bitmap files, so that no picture was more eye-catching than another in terms of colour. For the full collection of pictures, see Appendix A.

Figure 4.11: A sample picture set: fu1 “exhale”, fu3 “pants”, fu4 “help by holding another person’s arm”, and fu5 “woman”. Note: Numbers refer to buttons on the button box (not tones).

Special attention was paid to potentially confusing pictures. For example, fu1 “exhale”, fu4 “help by holding another person’s arm”, and fu5 “woman” were in the same tone set [1 3 4 5].
In Figure 4.11, the exhaling person in the fu1 picture and the helper and helpee in the fu4 picture did not have stereotypically female features, so that they would not be confused with fu5 “woman”. Another example was the pair fan1 “share” and fan6 “portion”, which could be visually ambiguous, since the action of sharing produces multiple portions of the object being shared. Earlier versions of these pictures were shown to a Cantonese instructor at UBC who did not participate in the experiment. Based on the instructor’s feedback, the pictures were amended to make sure that the intended meaning would be conveyed.

In addition to the measures above, all pictures and their intended meanings were presented in a picture learning task that preceded the experimental block, so that participants were guided to the intended interpretation of each picture for the purpose of the experiment. For details of the picture learning task, see Section 4.4.2.2.

4.4.1.3 Tones and buttons

Measures were taken to counterbalance the correspondence between tones and response buttons on the response device. Consider the numerically ordered tone sets [1 2 3 4], [1 3 4 5], and [1 4 5 6]. T4 corresponds to the fourth button in [1 2 3 4], the third in [1 3 4 5], and the second in [1 4 5 6]. However, T1 always corresponds to the first button across these three tone sets. If a participant responded by pressing Button 1 most of the time, there could be two possible reasons: s/he either had a T1 bias, or simply preferred pressing the first button regardless of what was heard. To avoid this confound, the position of a tone within a set had to vary.

Table 4.8 illustrates the solution to this potential problem with the example of [1 4 5 6]. The second column shows that the tones within this set were ordered in four different ways: [1 4 5 6], [4 5 6 1], [5 6 1 4], and [6 1 4 5]. The tone-button correspondence was different in every row, so that every tone could correspond to any button.
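The four orders in Table 4.8 are the cyclic rotations of the tone set, which is why each tone meets each button exactly once (a Latin-square arrangement). The Python sketch below generates the rotations, checks that property, and maps a pressed button back to the tone it stood for in a given order, as scoring would require. The function names are hypothetical; the experiment itself was programmed in E-Prime.

```python
def rotations(tone_set):
    """All cyclic rotations of a tone set, starting with the set itself."""
    return [tone_set[i:] + tone_set[:i] for i in range(len(tone_set))]

def tone_for_button(order, button):
    """Map a 1-based button number to the tone it stands for in this order."""
    return order[button - 1]

def each_tone_once_per_button(orders):
    """True if every button position hosts every tone exactly once."""
    for position in range(len(orders[0])):
        column = [order[position] for order in orders]
        if sorted(column) != sorted(orders[0]):
            return False
    return True

orders = rotations([1, 4, 5, 6])
for order in orders:
    print(order)
# [1, 4, 5, 6]
# [4, 5, 6, 1]
# [5, 6, 1, 4]
# [6, 1, 4, 5]
print(each_tone_once_per_button(orders))  # True
print(tone_for_button([4, 5, 6, 1], 2))  # 5
```

Rotations are only one way to satisfy the property; any Latin square over the four tones would also place each tone on each button exactly once.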
Although this was not an exhaustive list of all possible orders (e.g., [6 5 4 1] was a possible order but was not represented in Table 4.8), it was sufficient to make each tone occur with each button exactly once. In this way, a participant who had a bias towards a particular tone could be distinguished from a participant who preferred a particular button regardless of what was heard: the former would press the four buttons more or less equally often, while the latter would press Button 1 most of the time.

Table 4.8: An example of how tone-button correspondence was counterbalanced for a tone set

Tone set | Order | Button 1 | Button 2 | Button 3 | Button 4
[1 4 5 6] | [1 4 5 6] | T1 | T4 | T5 | T6
[1 4 5 6] | [4 5 6 1] | T4 | T5 | T6 | T1
[1 4 5 6] | [5 6 1 4] | T5 | T6 | T1 | T4
[1 4 5 6] | [6 1 4 5] | T6 | T1 | T4 | T5

4.4.1.4 Instructions

All instructions were recorded by the author in spoken Cantonese. This medium had an advantage over written Cantonese: it ensured that all participants, regardless of their Chinese reading proficiency, would be able to follow. Although written English could also be understood by both groups, research has shown that silent reading activates the phonology of the language being read (McCutchen & Perfetti, 1982; Newman & Connolly, 2004; Perfetti, Bell & Delaney, 1988). This might lead to interlanguage interference effects in online processing, which are beyond the scope of this study. To avoid activating English phonology during the experiment, English, whether spoken or written, was not used. The content of the task instructions is explained in the next subsection. For a full transcript of the instructions in written Chinese and Cantonese romanization, see Appendix A.

4.4.2 Procedures

Each participant was invited to attend two experiment sessions separated by at least 24 hours, both of which took place at the UBC Speech in Context Lab. All tasks were programmed in E-Prime Version 2.0 (Schneider, Eschman & Zuccolotto, 2002). A step-by-step walkthrough of the procedures in Table 4.1 is provided below.
Readers are advised to bookmark Table 4.1 for easy reference.

On Day 1, all participants were given a printed consent form stating the purpose, risks, and benefits of the experiment, as well as the experimenter’s contact information. They were asked to sign the form if they agreed to participate. In all collected data, however, participants were identified only by a subject number to ensure anonymity. After signing the consent form, participants were brought to a sound-attenuated computer booth equipped with a response button box and AKG K240 Studio headphones. Once seated in the booth, they were instructed to put on the headphones and press any button on the box to start.

4.4.2.1 Story listening task

The experiment session began with a story listening task, whose purpose was to tune participants into a unilingual Cantonese-speaking environment. Since all participants were recruited from the university student community in Vancouver, both homeland and heritage speakers were attending classes conducted in English, except for foreign language courses. Listening to a Cantonese story before the experimental block gave both groups a chance to warm up and get ready to process Cantonese speech. This task also helped minimize potential effects of code-switching on bilingual language processing (Antoniou et al., 2010, 2011).

A 40-second story called The Sun and the Wind was played to the participants at the beginning of Day 1. The story did not contain any of the target words used in the experimental block. (For a full script of the story, see Appendix A.) The story was recorded by the author, whose voice differed from that of the talker in the experimental block. To make sure that participants would listen carefully, they were told that they would have to answer a question after the story. A picture of headphones was shown in the centre of the screen while the story was being played, as in Figure 4.12.
Participants had a chance to let the experimenter know if the headphones were not working or if the volume was uncomfortable. When the story finished, a multiple-choice question was posed in spoken Cantonese, asking “Who won in the end?”. Three pictures (Sun, Wind, Man) were displayed on the screen, each corresponding to a button on the button box, and participants were instructed to respond by pressing a button. Their response to this question, however, was not used to assess their comprehension of spoken Cantonese. Since this was the very first question requiring the button box, participants were still getting used to the sensitivity of the buttons. Responses to this question therefore played no role in determining whether a participant’s data were included or excluded.

Figure 4.12: Picture shown during the story listening task

4.4.2.2 Picture learning task

Performed after the story listening task and before the practice block, the picture learning task served to clarify the intended meaning of the pictures, each of which could be compatible with multiple words. For example, when participants saw the picture of se6 “shoot”, they might interpret it as coeng1 “gun”. Another concern was that some words are more commonly used in disyllabic form in spoken Cantonese. In isolation, the word fu2 can mean either “bitter” or “tiger”. In spoken contexts, “bitter” is always monosyllabic, but “tiger” is more commonly referred to as lou5 fu2 (in which lou5 literally means “old”, although lou5 fu2 is a set phrase that makes no reference to a tiger’s age). The picture learning procedure helped participants associate the monosyllabic word fu2 with “tiger” instead of its homophone “bitter”.

The picture learning task proceeded as follows. A picture was shown in the centre of the screen, as in Figure 4.13.
At the same time, participants heard the sentence “X, this picture refers to X, as in XY”, where X was the target word and Y was another word that frequently collocates with X in Cantonese. Y could precede or follow X, depending on the word. Participants could listen to each sentence only once, and could press any button on the box to continue to the next picture. A total of 31 pictures were presented: the 27 target words for the main task (as in Table 4.4) and 4 for the practice block (see the next subsection).

Figure 4.13: An example of the picture learning task. Audio: fu2, li1 fuk1 tou4 hai6 lou5 fu2 go3 fu2 “Tiger (monosyllabic), this picture is tiger (monosyllabic), as in tiger (disyllabic)”

4.4.2.3 Practice trials for monosyllabic words

The purpose of the practice block was to familiarize participants with the format of the task and the use of the button box. In this block, all words and pictures were different from those of the experimental block. A word set with the syllable zoeng was used: zoeng1 “piece (of paper)”, zoeng2 “prize”, zoeng3 “sauce”, and zoeng6 “elephant”.

An English translation of the spoken instructions is provided in (15). While the spoken instructions were being played, the same headphone picture as in Figure 4.12 was shown on the screen. After listening to the instructions, the participant could press any button on the box to start.

The following procedure applied not only to practice trials but also to trials in all experimental blocks. In each trial, participants heard one stimulus through the headphones while a picture set like Figure 4.14 was shown on the screen. They were asked to press the button on the box that corresponded to the picture representing the word they heard. No feedback was provided, which means participants were not told whether their answer was correct. After the participant responded by pressing a button, the screen was blank for 500 ms.
After that, another stimulus was played while another picture set was shown.

Figure 4.14: Pictures used in practice trials: zoeng1 “piece (of paper)”, zoeng2 “prize”, zoeng3 “sauce”, and zoeng6 “elephant”. Note: Numbers refer to buttons on the button box (not tones).

(15) English translation of instructions for Day 1 practice trials, originally in spoken Cantonese:
You are going to listen to some words. Which picture represents the word that you heard? You can only choose one picture and respond by using the button box. Note that sometimes the word may be unclear, and it is intentional. You just need to try your best to answer. When you are ready, press any button on the box to start.

There were eight trials in the practice block, which formed a representative sample of the upcoming experimental block, with normal words, words with no tone, and words with no segments, as in Table 4.9. This sample of stimulus types helped participants understand that some stimuli were intentionally unclear or unnatural, so that they would not be surprised when they heard these stimulus types in the experimental block. Note that participants were not told that these eight trials were for training purposes, and so they were expected to respond as they normally would in the experimental block. Since the practice trials served only to familiarize participants with the task, performance in this block was not analyzed.

Table 4.9: Examples of practice trials on Day 1

Type | Target | Competitors
Normal | zoeng2 “prize” | zoeng1 “piece”, zoeng3 “sauce”, zoeng6 “elephant”
No tone | zoeng “piece” | zoeng “prize”, zoeng “sauce”, zoeng “elephant”
No segment | 3 “sauce” | 1 “piece”, 2 “prize”, 6 “elephant”

4.4.2.4 The first experimental block: Types 1–3 randomized

The first experimental block immediately followed the practice block. It consisted of randomized stimuli of Type 1 (normal words), Type 2 (words with no tone), and Type 3 (words with no segments). For each stimulus type, there were 60 trials (15 tone sets × 4 words per set).
Therefore, there were a total of 180 trials in the first experimental block (60 trials × 3 stimulus types).

4.4.2.5 The second experimental block: Type 4

The instructions for the second experimental block, given in (16), were played immediately after the first experimental block. This block contained only stimuli of Type 4 (sentences with no tone), with sentences representing different tone sets presented in random order. The total number of trials in this block was 60 (15 tone sets × 4 words per set).

(16) English translation of instructions for the second experimental block, originally in spoken Cantonese:
You are going to listen to some sentences. What is the last word of the sentence? Note that the sentences may sound unnatural, which is intentional. Just try your best to answer. If you have questions, please let the experimenter know. When you are ready, press any button on the box to start.

Although Type 4 stimuli were sentences, they were not presented on Day 2 with the other sentence stimulus types (Types 5A, 5B, 6A, and 6B). The first experimental block on Day 1 (Types 1–3) contained 180 trials, whereas the third block on Day 2 (Types 5A, 5B, 6A, and 6B) contained 480 trials. Had Type 4 been presented on Day 2, the session might have taken more than 60 minutes. To avoid listening fatigue on Day 2, which could add noise to the results, Type 4 was presented on Day 1 instead.

4.4.2.6 Language background questionnaire

After completing the second experimental block, participants were instructed to complete a language background questionnaire hosted on UBC FluidSurveys (FluidSurveys, 2017). It contained questions about their age, education level, language history, language use, language proficiency, language attitudes, and the native languages of their parents. For a detailed explanation of this questionnaire and a full list of questions, see Section 4.4.3.2 and Appendix B, respectively.

Participants were asked to complete the questionnaire on Day 1 instead of Day 2 for two reasons.
First, since participation was voluntary, some108participants might not return to complete the session on Day 2. Inthis case, their language background information would still be availableif they completed the questionnaire on Day 1. Second, Day 1 taskstook approximately 40 minutes to complete, while Day 2 tasks tookapproximately 60 minutes to complete. To avoid making the session toolong on Day 2, the questionnaire was arranged to be completed on Day 1.After filling out the questionnaire, participants could sign up for a timeslot for the second experiment session. Day 1 and Day 2 were separated bya minimum of 24 hours. Due to scheduling factors (e.g. within a schoolterm students always had free time on the same day of the week), mostparticipants attended the second experiment session one week after theirfirst.4.4.2.7 Picture learning task (repeated)The session on Day 2 began with the same picture learning task describedin Section 4.4.2.2. It served as a reminder of the intended meaning of thepictures.4.4.2.8 Practice trials for sentencesLike Day 1, a practice block with eight trials preceded the experimentalblock on Day 2. The target words were also zoeng1 “piece (of paper)”,zoeng2 “prize”, zoeng3 “sauce”, or zoeng6 “elephant”, and the samepicture set in Figure 4.14 was used. However, unlike Day 1 practicetrials which were monosyllabic words, Day 2 practice trials were normalsentences, congruous and incongruous sentences, as well as sentences witha segmentless target word at the end, as in Table 4.10. They formed arepresentative sample of the upcoming experimental block. This helpedparticipants understand that some stimuli were intended to be semanticallyanomalous, and some words were intentionally unclear.English translation of the spoken instructions is provided in (17). It isimportant to note that participants were told explicitly that some sentencesmight not make sense, and their task was to identify the last word that they109heard. 
If they were unsure about the instructions, they had the opportunity to clarify with the experimenter during the practice trials, so that their performance in the experimental block would not be affected. However, they were not told that these eight trials were for training purposes, and they were expected to respond as they normally would.

(17) English translation of instructions for Day 2 practice trials, originally in spoken Cantonese:
You are going to listen to some sentences. What is the last word of the sentence? Some words may be unclear, and you just need to try your best. Note that some sentences may not make sense, which is intentional. All you need to do is identify the last word that you heard. For example, if you hear “there are too many student names to remember”, then you should respond with “remember” (gei3). However, if you hear “there are too many student names to airplane”, then you should respond with “airplane” (gei1) but not “remember” (gei3). If you have questions, please let the experimenter know. When you are ready, press any button on the box to start.

4.4.2.9 The third experimental block: Types 5A–6B randomized

Immediately following the practice block, the third experimental block consisted of randomized stimuli of four types:
- Type 5A: normal, congruous sentences;
- Type 5B: normal, incongruous sentences;
- Type 6A: congruous sentences in which the last word had no segments;
- Type 6B: incongruous sentences in which the last word had no segments.
Since this was a long block with a total of 480 trials, there was a break every 120 trials. Participants could press any button on the box to resume whenever they were ready to continue.

The number of trials per stimulus type in this block is explained below using examples from the tone set [2 3 4 6]. All stimuli in Table 4.11a, Table 4.11b, Table 4.11c, and Table 4.11d shared the same carrier phrase sap6 ji6 dim2 zung1 hou2 soeng5 cong4...
“at twelve (you) should go to bed and...”, and all of their targets and competitors represented the tone set [2 3 4 6]. However, they differed in whether the target word was semantically congruous with the carrier phrase, and in whether the last word contained segmental information.

Table 4.10: Examples of practice trials on Day 2

Type                       Carrier phrase                                   Target      Competitors
Normal (congruous)         dung6 mat6 jyun4 zeoi3 daai6 ge3 hai6...         zoeng6      zoeng1   zoeng2   zoeng3
                           The biggest (animal) in the zoo is the ...       elephant    piece    prize    sauce
Normal (incongruous)       dung6 mat6 jyun4 zeoi3 daai6 ge3 hai6...         zoeng3      zoeng1   zoeng2   zoeng6
                           The biggest (animal) in the zoo is the ...       sauce       piece    prize    elephant
No segment (congruous)     dung6 mat6 jyun4 zeoi3 daai6 ge3 hai6...         6           1        2        3
                           The biggest (animal) in the zoo is the ...       elephant    piece    prize    sauce
No segment (incongruous)   dung6 mat6 jyun4 zeoi3 daai6 ge3 hai6...         3           1        2        6
                           The biggest (animal) in the zoo is the ...       sauce       piece    prize    elephant

The total number of trials for Type 5A was 60. In Table 4.11a, the word fan3 “sleep” was semantically congruous with this particular carrier phrase. However, the word fan2 was congruous with another carrier phrase, namely bong1 bi4 bi1 caa4 di1 song2 san1... “put some baby ... on the baby”. In other words, for each tone set, each member of the quadruplet was semantically congruous with one unique carrier phrase. There were 15 tone sets in total (see Table 4.4). The total number of trials for Type 5A was therefore 60 (1 member of a quadruplet * 4 unique carrier phrases * 15 tone sets).

The same applied to Type 6A stimuli, which were also congruous sentences and therefore had a total of 60 trials. However, for Type 6A, the last word of the sentence had no segments, as in Table 4.11c.

The total number of trials for Type 5B was 180. In Table 4.11b, there were three rows, unlike Table 4.11a, which had only one.
This was because only one member of a tonally contrastive quadruplet was meant to be semantically congruous with a given carrier phrase, while the other three members were not. Since Type 5B consisted of incongruous sentences, its total number of trials was 180 (3 members of a quadruplet * 4 unique carrier phrases * 15 tone sets).

The same applied to Type 6B stimuli, which were also incongruous sentences and therefore had a total of 180 trials. However, for Type 6B, the last word of the sentence had no segments, as in Table 4.11d.

After the last trial of the third experimental block, the English phrase “The end” was shown on the screen. Participants were given either cash or course credit for their participation (see Section 4.4.3.1 on subject recruitment). This concluded the second (and last) experiment session.

Table 4.11: A sample of the third experimental block representing the tone set [2 3 4 6]

a. Type 5A: Normal (congruous)
Carrier phrase                                    Target     Competitors
sap6 ji6 dim2 zung1 hou2 soeng5 cong4...          fan3       fan2     fan4     fan6
At twelve (you) should go to bed and ...          sleep      powder   tomb     portion

b. Type 5B: Normal (incongruous)
Carrier phrase                                    Target     Competitors
sap6 ji6 dim2 zung1 hou2 soeng5 cong4...          fan2       fan3     fan4     fan6
At twelve (you) should go to bed and ...          powder     sleep    tomb     portion
sap6 ji6 dim2 zung1 hou2 soeng5 cong4...          fan4       fan2     fan3     fan6
At twelve (you) should go to bed and ...          tomb       powder   sleep    portion
sap6 ji6 dim2 zung1 hou2 soeng5 cong4...          fan6       fan2     fan3     fan4
At twelve (you) should go to bed and ...          portion    powder   sleep    tomb

c. Type 6A: The last word has no segments (congruous)
Carrier phrase                                    Target     Competitors
sap6 ji6 dim2 zung1 hou2 soeng5 cong4...          3          2        4        6
At twelve (you) should go to bed and ...          sleep      powder   tomb     portion

d. Type 6B: The last word has no segments (incongruous)
Carrier phrase                                    Target     Competitors
sap6 ji6 dim2 zung1 hou2 soeng5 cong4...          2          3        4        6
At twelve (you) should go to bed and ...          powder     sleep    tomb     portion
sap6 ji6 dim2 zung1 hou2 soeng5 cong4...          4          2        3        6
At twelve (you) should go to bed and ...          tomb       powder   sleep    portion
sap6 ji6 dim2 zung1 hou2 soeng5 cong4...          6          2        3        4
At twelve (you) should go to bed and ...          portion    powder   sleep    tomb

4.4.3 Participants

This subsection describes how participants for the main study were recruited and screened, and reports the demographic information of the included participants.

4.4.3.1 Recruitment

All participants were recruited from the university community in Vancouver. Electronic recruitment flyers were circulated on social media such as Facebook. They were also sent via email to three student clubs at UBC, namely the UBC Mahjong Club, the UBC Hong Kong Student Association, and Hong Kong at Heart. The flyers were written in both Chinese and English, so that both homeland and heritage speakers could read them. The flyer stated that all individuals who had at least one Cantonese-speaking parent from Hong Kong were welcome to sign up; high fluency in Cantonese or the ability to read Chinese was not required. Interested individuals could contact the author via email to sign up for a time slot. Participants recruited through social media and student clubs were compensated C$10 for Day 1 and C$15 for Day 2. The higher payment for Day 2 was an incentive for participants to return and complete the second part of the experiment.

Participants were also recruited through Linguistics Outside the Classroom (LOC), an initiative to encourage undergraduate students to get involved in research conducted by members of the Department of Linguistics (UBC Department of Linguistics, 2014). Course instructors had the option to include LOC credits in their course syllabi. Students could satisfy LOC requirements either by writing a summary of a research seminar or colloquium that they attended, or by participating in experiments. Students could view available time slots and sign up on the UBC Linguistics Sign-up System (Sona Systems Ltd., 2017). Each user was identified by a unique participant code instead of his or her name.
As a general rule of LOC, no one should be refused participation in any experiment. As a result, Cantonese speakers from China, Macau, and Malaysia, and even non-Cantonese speakers, were allowed to sign up, even though their data were not usable. All participants recruited through LOC were awarded course credit, regardless of the usability of their data.

4.4.3.2 Screening

A total of 100 individuals participated in the study. However, only 68 participants' data were included in the analysis in Chapter 5; the data of 32 participants were excluded for various reasons. Inclusion decisions were based on responses to a language background questionnaire, which was also used to categorize participants into homeland and heritage speakers. The screening process is summarized in Figure 4.15, and the purpose of specific questions on the questionnaire is explained as follows.

As shown in Figure 4.15, the first screening criterion was whether a participant had completed all tasks in the experiment. Three participants attended only the Day 1 session and did not return on Day 2. To balance the number of responses across all stimulus types, their data were excluded.

The remaining screening criteria in Figure 4.15 were based on participants' responses to the language background questionnaire. At the end of the Day 1 session, all participants were asked to complete this online questionnaire. It comprised 30 questions about demographic information and language background, and took 10 to 15 minutes to complete. All responses were anonymous and no personally identifiable information was elicited. Of the 30 questions, 19 were from the Bilingual Language Profile (Birdsong et al., 2012), and 11 were designed specifically for the current study. Both types of questions are explained below.
For the complete list of questions, see Appendix B.

Figure 4.15: Procedures to screen and categorize participants

The Bilingual Language Profile (BLP) developed by Birdsong et al. (2012) was used to elicit language background information. It started with general demographic information, such as age and education level. The main part of the questionnaire was designed to assess language dominance on a gradient scale using four criteria: language history, language use, language proficiency, and language attitudes. Participants were asked the same set of questions for each of their two languages. For example, if there is a question like “How well do you speak English?”, then there is also a question like “How well do you speak Cantonese?” Responses to the questions add up to a global language score for each language. Subtracting the global language score for Cantonese from that for English yields the language dominance score, which ranges from –218 (extremely Cantonese-dominant) to 218 (extremely English-dominant)[5]. A dominance score of zero indicates balanced bilingualism. Table 4.12 illustrates the calculation using the data of Subject #345 as an example. Her global language score for Cantonese was 113.78 out of 218, while her global language score for English was 192.96 out of 218. Thus, her language dominance score was 192.96 minus 113.78, which is 79.18. Since 79.18 is above zero, Subject #345 was English-dominant.

Table 4.12: Calculation of Subject #345's language dominance score

Module score (out of 54.5)             Cantonese    English
I. Language history                        30.42      45.85
II. Language use                           15.26      38.15
III. Language proficiency                  38.59      54.48
IV. Language attitudes                     29.51      54.48
Global language score (out of 218)        113.78     192.96
Language dominance score = 192.96 – 113.78 = 79.18

[5] These positive and negative numbers on the scale are arbitrary and by no means imply a value judgment of whether it is better or worse to be dominant in a particular language.

The BLP was supplemented with an additional set of questions for the purpose of this study. First, “What is your dominant language?” was added to capture participants' own judgment of their language dominance. All except two participants reported a dominant language that matched their BLP language dominance score. The two exceptions were Subjects #331 and #339, who had dominance scores of 26.52 and 7.00 respectively. Although their scores were above zero, which suggests dominance in English, their self-reported dominant language was Cantonese.

Gollan, Weissberger, Runnqvist, Montoya & Cera (2012) discuss measures of language dominance. According to their discussion, a discrepancy between the result of a measurement method and an individual's self-assessment does not mean that s/he is “wrong”. Rather, it indicates that s/he may have focused on factors not included in the measurement method. Since every individual takes different factors into account when assessing their own language dominance, no single measure can be a perfectly “complete” assessment. Due to this discrepancy, the dominant language of Subjects #331 and #339 was considered undetermined, and so their data were excluded.

In addition, they were the only subjects who reported that they attended an international school when they were living in Hong Kong. According to Lai, Li & Gong (2016), international schools in Hong Kong adopt the curricula of the United States, Australia, or Canada. In the majority of these schools, Chinese language classes are offered as a subject, but they are taught in Mandarin rather than Cantonese.
As for student demographics, international schools mainly cater for children of expatriates, and on average 76% of students are non-local. In other words, English is not only the main language of instruction, but also the language of social interaction at these schools. This distinguished Subjects #331 and #339 from individuals who went to a local school in Hong Kong, where Cantonese is the majority language in various domains of language use.

Another question added for the purpose of this study was “Indicate the cities and countries that you have lived in along with how old you were when you lived there”. The BLP already asks “How many years have you spent in a country/region where Cantonese is spoken?”, but that question does not specify a particular Cantonese-speaking region. An additional open-ended question was therefore added to separate Hong Kong Cantonese speakers from speakers of other varieties of Cantonese.

Nine participants reported that they grew up in the Guangdong province of China or in Malaysia. Their data were excluded from analysis, because tonal features of these varieties may affect their perception of Hong Kong Cantonese. For example, Guangzhou Cantonese has undergone a merger of the non-high level tones (T3 [33] and T6 [22]) (Ou, 2012). This differs from the merger of rising tones (T2 [25] and T5 [23]) in Hong Kong Cantonese. Malaysian Cantonese has only five phonemic tones (Hsiar, 2007), while Hong Kong Cantonese has six.

A few additional questions were about the background of participants' parents. “What is the native language of your father/mother?” served to exclude individuals who did not have at least one Cantonese-speaking parent. Twelve participants were excluded since neither of their parents spoke Cantonese. In fact, according to their self-ratings for Cantonese proficiency, these 12 participants did not speak or barely spoke Cantonese.
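The screening described so far relies on the BLP dominance score. The sketch below shows the arithmetic behind Table 4.12 and the consistency check that excluded Subjects #331 and #339. It is a minimal Python illustration, not the BLP's own scoring procedure. The function names (global_score, dominance_score, dominance_matches_self_report) are hypothetical. The sketch assumes that the four module scores (each out of 54.5) have already been computed for each language, so that a global score is simply their sum (out of 218). This is consistent with the totals in Table 4.12.

```python
# Hypothetical sketch of the BLP dominance arithmetic illustrated in Table 4.12.
# Assumption: module scores (each out of 54.5) are already computed per language.

MODULES = ("history", "use", "proficiency", "attitudes")


def global_score(modules: dict[str, float]) -> float:
    """Sum of the four module scores (maximum 4 * 54.5 = 218)."""
    return round(sum(modules[m] for m in MODULES), 2)


def dominance_score(cantonese: dict[str, float], english: dict[str, float]) -> float:
    """English global score minus Cantonese global score, on the -218..218 scale.

    Negative values indicate Cantonese dominance; positive values English dominance.
    """
    return round(global_score(english) - global_score(cantonese), 2)


def dominance_matches_self_report(score: float, self_reported: str) -> bool:
    """Criterion (ii): the sign of the score must agree with the self-reported language."""
    if score < 0:
        return self_reported == "Cantonese"
    if score > 0:
        return self_reported == "English"
    # A score of exactly 0 means balanced bilingualism; treating it as a mismatch
    # is an assumption of this sketch (no such case is reported in the text).
    return False


# Module scores for Subject #345, copied from Table 4.12.
s345_cantonese = {"history": 30.42, "use": 15.26, "proficiency": 38.59, "attitudes": 29.51}
s345_english = {"history": 45.85, "use": 38.15, "proficiency": 54.48, "attitudes": 54.48}

score = dominance_score(s345_cantonese, s345_english)
print(global_score(s345_cantonese), global_score(s345_english), score)

# Subjects #331 and #339: positive (English-leaning) scores, but Cantonese self-reported.
print(dominance_matches_self_report(26.52, "Cantonese"))
print(dominance_matches_self_report(7.00, "Cantonese"))
```

Run as written, this prints 113.78 192.96 79.18, matching Table 4.12. It then prints False twice, corresponding to the two mismatches that led to exclusion. Rounding to two decimals only mirrors the precision reported in the table.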
Out of 66 individuals whose data were included for analysis, 65 had two Cantonese-speaking parents. The participant who had only one Cantonese-speaking parent was Subject #314, who had a Cantonese-speaking mother and a Spanish-speaking father from Peru. Since she met the criterion of having at least one Cantonese-speaking parent, her data were included for analysis. No unusual patterns were observed in her experiment responses.

Two more questions were added to exclude heritage speakers whose parents were also heritage speakers. As mentioned in Chapter 2, second- and third-generation heritage speakers of Ukrainian in Toronto produced Ukrainian obstruent consonants with different VOTs. In other words, the linguistic input that the third generation received may differ from the input that the second generation had received from the homeland generation. For the purpose of the current study, both homeland and heritage speakers needed to have been exposed to the same baseline variety of the language for a fair comparison. This way, any difference observed between the two groups in the experiment would reflect their different perception of lexical tones. It would not reflect linguistic variation in the input from parents of different generations. For this reason, the following questions were included in the survey: “Is your father/mother an immigrant to Canada? Where is s/he originally from?” Two participants reported that their parents were born and raised in Canada, and so they were excluded from data analysis. Lastly, four Canadian-born participants reported that their parents were originally from Guangzhou.
For the reason stated previously regarding tonal features of regional varieties of Cantonese, the data of these four participants were excluded.

Participants who had not been excluded for the reasons above were categorized as either homeland or heritage speakers. The categorization was based on their response to the question “Indicate the cities and countries that you have lived in along with how old you were when you lived there”. Those who had lived in Hong Kong for at least 10 years between birth and age 15 were categorized as homeland speakers. Those who had lived in Canada for at least 10 years between birth and age 15 were categorized as heritage speakers.

To conclude, only participants who met all of the following criteria were included:
(i) they completed all required tasks;
(ii) their self-reported dominant language matched their language dominance score;
(iii) they had at least one parent from Hong Kong who spoke Cantonese as a native language;
(iv) they had lived in Hong Kong or Canada for at least 10 years between birth and age 15.

Based on these criteria, 32 participants were excluded. Of the 68 participants who were included, 34 had spent at least 10 years in Hong Kong from birth to age 15, and were therefore considered homeland speakers. The other 34 had spent at least 10 years in Canada from birth to age 15, and were therefore considered heritage speakers[6]. The demographic information of these 68 individuals is presented in the next subsection.

[6] The balanced numbers were a result of continuous data collection until there were 34 participants in each group.

4.4.3.3 Demographics of included participants

The age, language dominance scores, and language proficiency of included participants are reported as follows.

As mentioned in Chapter 3, Cantonese tone merger in Hong Kong is more common among speakers in their 20s than among those in their 50s (Fung et al., 2011). It was therefore important that the two groups in the current study fell within the same age range.
Since the 68 included participants were recruited from the university student community, they were all young adults (M = 20.78 years of age, SD = 3.10). As summarized in Table 4.13, the average age of the homeland speaker group was 20.85 years (SD = 2.97), and the average age of the heritage speaker group was 20.71 years (SD = 3.27).

Table 4.13: Age of included participants (in years)

             Minimum   Maximum   M       SD
homeland     18        33        20.85   2.97
heritage     18        31        20.71   3.27

So far, no known studies have suggested any relationship between gender and Cantonese tone perception, so no attempt was made to balance the number of male and female participants. In the current study there were more female than male participants, and this was true for both populations. The homeland group had nine male and 25 female participants, and the heritage group had 12 male and 22 female participants.

As mentioned in Section 4.4.3.2, the language background questionnaire elicited demographic details like age and gender. It also collected the information required to generate language dominance scores, following the method introduced by Gertken et al. (2014). In Figure 4.16, the two ends of the x-axis are the minimal and maximal endpoints of the BLP scale, –218 (extremely Cantonese-dominant) and 218 (extremely English-dominant). The dotted line in the middle marks zero, which represents balanced bilingualism.

In the current study, all homeland speakers had negative scores (hence Cantonese-dominant), and all heritage speakers had positive scores (hence English-dominant). The group means are indicated by asterisks on the boxplot. The mean score of the homeland group was –63.66 (SD = 30.09), while the mean score of the heritage group was 89.80 (SD = 36.12). The homeland group was closer to zero, which can be attributed to the fact that all homeland speakers were residing in Canada when they participated in the study.
They had a relatively high level of English proficiency in order to study at a post-secondary institution, and English was also used for day-to-day interactions on and off campus.

Figure 4.16: Language dominance scores of the two populations

Recall that Chapter 2 discussed nine configurations on the bilingual continuum. These configurations were introduced for a conceptual discussion and do not map linearly onto dominance scores. Nevertheless, the two endpoints and the midpoint of Figure 2.1 and Figure 4.16 are comparable to each other:
- A dominance score of –218 requires a global language score of 218 for L1 (the highest possible score) and 0 for L2 (the lowest possible score).
- A dominance score of 218 is the reverse: a global language score of 0 for L1 (the lowest possible score) and 218 for L2 (the highest possible score).
Therefore, Configuration A and a dominance score of –218 can each indicate L1 monolingualism. Configuration I and a dominance score of 218 can each indicate L2 monolingualism. Configuration E and a dominance score of 0 can each indicate balanced bilingualism.

Comparing the bilingual continuum in Figure 2.1 with the language dominance scale in Figure 4.16 shows that both groups' dominance scores fell within their respective expected ranges. Section 2.2 stated the expected configurations: L1-dominant (between A and E) for homeland speakers and L2-dominant (between E and I) for heritage speakers. Results in Figure 4.16 match these expectations: homeland speakers' scores were between –218 and 0, and heritage speakers' scores were between 0 and 218.

As discussed in Section 2.1.3, dominance and proficiency are related but different concepts (Schmeißer et al., 2016), so data about participants' language proficiency are reported separately as follows.
All participants were instructed to rate their listening, speaking, reading and writing skills in both languages on a scale of 0–6, where 0 means “not well at all” and 6 means “very well”. Table 4.14a and Table 4.14b summarize their mean self-ratings for Cantonese and English respectively. They also present the results of t tests for two independent samples with Bonferroni correction, comparing the two groups' ratings for each skill and each language. Figure 4.17 visualizes the same data as boxplots, in which mean values are marked with an asterisk and median values are marked with centre lines in the box. Observations from these data and their implications for the main study are discussed as follows.

Table 4.14: t-test comparison of homeland and heritage speakers' self-rated language proficiency on a scale of 0–6: 0 = “not well at all” and 6 = “very well”

a. Cantonese language skills
              Homeland        Heritage
              M      SD       M      SD       t         df       p
Listening     5.65   0.64     4.47   1.13     5.26      52.36    <.001
Speaking      5.79   0.48     3.53   1.50     8.38      39.63    <.001
Reading       5.56   0.70     1.82   1.47     12.68     45.83    <.001
Writing       5.32   0.94     1.41   1.35     13.84     59.04    <.001

b. English language skills
              Homeland        Heritage
              M      SD       M      SD       t         df       p
Listening     4.79   0.73     5.94   0.24     –8.71     40.00    <.001
Speaking      4.56   0.93     5.94   0.24     –8.42     37.36    <.001
Reading       4.64   0.77     5.91   0.29     –8.93     41.96    <.001
Writing       4.23   1.10     5.82   0.46     –7.76     44.09    <.001

Note: Since there were eight pairwise comparisons, a Bonferroni correction was made. The alpha level after correction was 0.00625.

Figure 4.17: Self-rated language proficiency of homeland and heritage speakers on a scale of 0–6; 6 = “very well” and 0 = “not well at all”

As an overview, no heritage speaker rated all four Cantonese language skills as 0. In Figure 4.17a (“Cantonese listening”), the lowest rating given was 1.
In other words, all heritage speakers (and homeland speakers too) in the current study met the definition of a bilingual given in Chapter 2. They were minimally competent in at least one of the four language skills in each of the two languages, even though they did not have equal mastery of listening, speaking, reading, and writing in both languages.

Several observations can be made regarding heritage speakers' self-rated Cantonese proficiency:
- Their self-rated listening ability was the highest of the four language skills (M = 4.47, SD = 1.13).
- Their self-rated speaking ability was generally fair (M = 3.53), but its standard deviation was the highest of the four skills (1.50), indicating a high degree of variability. The height of the red box in Figure 4.17b, whose whiskers span from 0 to 6, also shows this variability.
- The heritage group's literacy skills were much lower than their listening and speaking skills. Their average self-ratings for reading (M = 1.82, SD = 1.47) and writing (M = 1.41, SD = 1.35) were both below 2. These two red boxes are therefore pulled farther away from the blue ones that represent the homeland group.
As a result, the four red boxes in the top row of Figure 4.17 appear to descend from left to right.

These observations about heritage speakers' Cantonese language skills are consistent with those from previous studies. In both Canada (the current study) and the United States (Carreira & Kagan, 2011), heritage Cantonese speakers gave higher ratings for their aural skills (listening and speaking) than for their literacy skills (reading and writing).
Their low ratings for reading also met earlier expectations, which had led to the decision to use pictures rather than written Chinese characters as options in the forced-choice experiment. The heritage population showed high variability in Cantonese proficiency (especially for listening skills). This suggests that high variance in accuracy could also be expected in the results of the word-identification study. The heritage population might even split into two clear groups: highly proficient speakers with extremely high accuracy, and less proficient speakers with extremely low accuracy. In this case, the mean alone could lead to misleading interpretations. For this reason, other measures were taken to make fair generalizations about the population. To address within-group variability, generalized logistic mixed models with both fixed and random effects were used for data analysis. For details, see Section 5.1.

Heritage speakers' self-rated English proficiency was high and close to ceiling across all four language skills. The mean ratings for listening, speaking, reading, and writing were 5.94, 5.94, 5.91, and 5.82 respectively. The standard deviations were all below 0.50, which means there was very little variability. In the bottom row of Figure 4.17, the “red boxes” are rendered as lines because the fences and whiskers overlap.

Homeland speakers also gave high ratings for their dominant language, which is, in this case, Cantonese. All four skills had a mean rating above 5 in Table 4.14a: 5.65 for listening (SD = 0.64), 5.79 for speaking (SD = 0.48), 5.56 for reading (SD = 0.70), and 5.32 for writing (SD = 0.94).

Like heritage speakers, homeland speakers gave lower ratings for their non-dominant language than for their dominant language, but the difference was less drastic than for heritage speakers.
Their mean ratings for English language skills were all below 5 but above 4: 4.79 for listening (SD = 0.73), 4.56 for speaking (SD = 0.93), 4.64 for reading (SD = 0.77), and 4.23 for writing (SD = 1.10). In the bottom row of Figure 4.17, the asterisks inside the blue boxes descend somewhat from left to right, but not as dramatically as the red boxes in the top row.

On the whole, the average self-ratings of homeland and heritage speakers were significantly different for each language and each skill, as shown in Table 4.14. The word-identification tasks in the main study involved only listening comprehension in Cantonese, so ratings for this skill were the most noteworthy. Homeland speakers' self-rated Cantonese listening proficiency (M = 5.65, SD = 0.64) was significantly higher than heritage speakers' (M = 4.47, SD = 1.13), t(52.36) = 5.26, p < .001. Given this difference in Cantonese listening ability, an accuracy gap would also be expected in the results of the main study.
- If the accuracy gap were constant across stimulus types, the observed differences between the two groups would merely reflect their different Cantonese proficiency.
- If the accuracy gap were bigger for a particular stimulus type (e.g. Type 3, words with no segments) than for others, the observed differences would be due to factors beyond language proficiency (e.g. their tone discrimination abilities).

To sum up, both homeland and heritage speakers gave higher ratings for their respective dominant language. However, for heritage speakers, the difference between the dominant and non-dominant language was more extreme. Ratings for their dominant language (English) were extremely high and variability was extremely small. In contrast, ratings for their non-dominant language (Cantonese) varied a lot from skill to skill, and even within a particular skill there was a lot of variability within the same population.
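The degrees of freedom in Table 4.14 are not integers. This suggests Welch's unequal-variance t test, although the table itself does not name the variant. Under that assumption, the t statistics and Welch-Satterthwaite degrees of freedom can be approximately recomputed from the reported means and standard deviations (n = 34 per group). The plain-Python sketch below is an illustrative re-derivation, not the analysis script used in the study. The function name welch_t is hypothetical. Small discrepancies are expected because the published means and SDs are rounded. The corrected alpha is 0.05 / 8, which equals the 0.00625 given in the note to Table 4.14.

```python
import math


def welch_t(m1: float, sd1: float, n1: int,
            m2: float, sd2: float, n2: int) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom from summary stats."""
    v1 = sd1 ** 2 / n1  # squared standard error of the group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of the group 2 mean
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


N = 34              # participants per group
ALPHA = 0.05
N_COMPARISONS = 8   # 4 skills x 2 languages
bonferroni_alpha = ALPHA / N_COMPARISONS

# Homeland vs heritage, Cantonese listening and writing (means and SDs from Table 4.14a).
listening = welch_t(5.65, 0.64, N, 4.47, 1.13, N)
writing = welch_t(5.32, 0.94, N, 1.41, 1.35, N)

for label, (t, df) in [("listening", listening), ("writing", writing)]:
    print(f"Cantonese {label}: t({df:.2f}) = {t:.2f}")
print(f"Bonferroni-corrected alpha = {bonferroni_alpha}")
```

For Cantonese listening, this gives roughly t(52.2) = 5.30, compared with the reported t(52.36) = 5.26. For writing, it gives roughly t(58.9) = 13.86, compared with t(59.04) = 13.84. The sketch omits p values because they need a t-distribution CDF. If SciPy is available, scipy.stats.ttest_ind_from_stats(..., equal_var=False) performs the same Welch computation and also returns the p value.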
In general, these ratings met earlier expectations.

Chapter 5
Results

This chapter is divided into four sections. Section 5.1 presents an overview of results by analyzing outputs of generalized logistic mixed models. The other three sections discuss the three research questions in (1) in detail:
- Section 5.2 responds to the first research question by discussing the effects of variable manipulation on accuracy.
- Section 5.3 answers the second research question by reporting patterns of tonal confusion.
- Section 5.4 addresses the third research question by comparing the two populations' use of acoustic and semantic cues.

(1) Research questions of the current study (repeated)
a. Do homeland and heritage speakers behave differently in terms of their ability to identify tonally contrastive words?
b. Do homeland and heritage speakers exhibit similar confusion patterns with respect to lexical tone perception?
c. Do homeland and heritage speakers make use of the same type of information when identifying a word from a tonally contrastive set? In particular, are acoustic and semantic information equally useful?

5.1 Overview: Hypothesis testing with generalized logistic mixed models

This section gives an overview of whether the manipulated independent variables were effective predictors of the dependent variable of this study, namely accuracy. These variables include the presence or absence of segmental or tonal information. Results of the word-identification experiment were analyzed in R (R Core Team, 2013) using the lme4 package (Bates, Mächler, Bolker & Walker, 2015). Generalized logistic mixed models (GLMMs) were fit to the experiment data to see which combination of model parameters could best describe the data. In all models, accuracy was treated as a binary variable, where 1 means correct and 0 means incorrect.
Therefore, logistic regression was implemented with the glmer function and the argument family="binomial".

GLMMs are “mixed” because the predictors are a combination of fixed and random effects. The fixed effects to be examined are the experimentally manipulated independent variables, namely population and the type of information available in the stimuli: is_there_segment, is_there_tone, and context. Of these, population, is_there_segment, and is_there_tone were binary variables. For each, 0 refers to the reference category (“homeland” or “there is no X”), and 1 refers to the non-reference category (“heritage” or “there is X”). The variable context had three levels: “yes congruous”, “yes incongruous”, and “no”. “Yes congruous” was chosen as the reference category. This facilitated two comparisons: no context versus congruous context, and congruous versus incongruous context.

Interactions of the variables above were also fixed effects. These interactions may change the predictive power of a model drastically, because they can show whether the effect of Predictor A is significantly different for different values of Predictor B. For example, take the interaction of is_there_tone and population (expressed with an asterisk, as in is_there_tone*population). Its coefficients can show whether the presence or absence of tone affected homeland speakers significantly differently from heritage speakers. This is exactly what was being tested in this dissertation.

Four-way (is_there_segment*is_there_tone*context*population) and three-way (is_there_tone*context*population) interactions were considered, but these models failed to converge. Models with multiple two-way interactions were also considered, but they too failed to converge. When only one two-way interaction was included at a time, the models converged successfully.
The three converging models are referred to as Models I, II, and III; they included is_there_tone*population, is_there_segment*population, and context*population as a fixed effect, respectively, as shown in Table 5.1.

Table 5.1: Fixed and random effects of three generalized logistic mixed models predicting accuracy
Model | Fixed effects | Random effects
I | is_there_segment + is_there_tone + context + population + is_there_tone*population | random slopes for subject and syllable for is_there_segment, is_there_tone, and context
II | is_there_segment + is_there_tone + context + population + is_there_segment*population | random slopes for subject and syllable for is_there_segment, is_there_tone, and context
III | is_there_segment + is_there_tone + context + population + context*population | by-subject and by-syllable random intercepts

To improve the models’ ability to assess the effects of the fixed-effect variables, random effects were included to account for unpredictable idiosyncratic variation. For example, the following situations could have happened (though none was actually reported): Participant A was startled when hearing the word si2 “poop” during the experiment and would press the wrong button whenever this word came up. Participant B found the picture for se3 “diarrhea” amusing and would always be distracted by it, hence not paying attention to the stimulus. Participant C knew the word ji1 “to cure” particularly well compared with other Tone 1 words because his father was a doctor (ji1 sang1) and he grew up hearing this word every day. To account for such within-population variability, subject and syllable were examined as random effects in the models.

As emphasized by Barr, Levy, Scheepers & Tily (2013), in order to minimize Type I error rates in confirmatory hypothesis testing, random-effects structures should be the maximal ones justified by the design.
To follow their recommendation, a random-effects structure with all of the following was considered for every model: by-subject and by-syllable random intercepts, as well as by-subject and by-syllable random slopes for is_there_segment, is_there_tone, and context.

A model with the interaction context*population and both random slopes and random intercepts was considered, but it failed to converge. A modified model including random slopes but excluding random intercepts also failed to converge. Therefore, the random-effects structure of Model III in Table 5.1 had only random intercepts, not random slopes.

Models with random intercepts, random slopes, and a fixed effect of is_there_tone*population or is_there_segment*population converged successfully. To test whether random intercepts improved the predictive power of a model, pairs of models that differed only in the inclusion or exclusion of random intercepts were compared through a likelihood ratio test using the anova() function. The test showed that by-subject and by-syllable random intercepts did not improve the models, χ²(1) = 0, p = 1. For this reason, random intercepts were dropped from Models I and II, but random slopes were kept, as shown in Table 5.1.

Each of Models I, II, and III included the interaction of population with a different variable. Their outputs are reported and analyzed in the rest of this section. Note that logistic regression coefficients are log odds¹, so they are not predicted accuracy values. However, the coefficients do show whether accuracy was boosted or lowered.

Coefficient estimates of fixed effects in Model I are summarized in Table 5.2. The intercept corresponds to the reference categories, namely “homeland”, “there is no X”, and “yes congruous”. The effects of is_there_segment (β = 0.630, SE = 0.128, z = 4.925, p < .001) and is_there_tone (β = 2.962, SE = 0.314, z = 9.375, p < .001) were significant.
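For readers who want to reproduce the arithmetic, the two computations in this part of the analysis can be sketched in Python. This is a minimal illustration under stated assumptions, not the R code used for the analysis: the hypothetical function `inv_logit` applies the log-odds-to-probability conversion of footnote 1 to the Model I estimates in Table 5.2, and the hypothetical function `lrt_df1` computes a likelihood ratio statistic and its p value for a χ² distribution with one degree of freedom. The log-likelihoods passed to `lrt_df1` are placeholders, because the log-likelihoods of the compared models are not reported here.

```python
import math


def inv_logit(log_odds):
    """Convert log odds to a probability: p = odds / (1 + odds), odds = exp(log odds)."""
    odds = math.exp(log_odds)
    return odds / (1 + odds)


def lrt_df1(loglik_reduced, loglik_full):
    """Likelihood ratio statistic and p value, assuming the models differ by one parameter.

    For a chi-square variable with 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    stat = max(0.0, 2 * (loglik_full - loglik_reduced))
    return stat, math.erfc(math.sqrt(stat / 2))


# Model I fixed-effect estimates from Table 5.2. The intercept corresponds to the
# reference levels: homeland, no segments, no tone, congruous context.
INTERCEPT = 1.452
TONE = 2.962
HERITAGE = -0.094
TONE_X_HERITAGE = -1.404

# As noted in the text, no real stimulus lacked both segments and tone, so the
# "no tone" cells are reference points only.
log_odds = {
    ("homeland", "no tone"): INTERCEPT,
    ("homeland", "tone"): INTERCEPT + TONE,
    ("heritage", "no tone"): INTERCEPT + HERITAGE,
    ("heritage", "tone"): INTERCEPT + HERITAGE + TONE + TONE_X_HERITAGE,
}

for (group, tone), value in log_odds.items():
    print(f"{group:8} {tone:7} log odds = {value:.3f}  p = {inv_logit(value):.3f}")

# Tone boost in log odds: TONE for homeland speakers, TONE + TONE_X_HERITAGE for heritage.
print(f"homeland boost = {TONE:.3f}, heritage boost = {TONE + TONE_X_HERITAGE:.3f}")

# Placeholder log-likelihoods: identical fits give chi2(1) = 0 and p = 1.
print(lrt_df1(-100.0, -100.0))
```

Because only one degree of freedom is assumed, the χ² tail probability reduces to the complementary error function, which keeps the sketch free of third-party dependencies; a version for arbitrary degrees of freedom would need a statistics library.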
The positive coefficients for is_there_segment and is_there_tone mean that the presence of segments or tone significantly boosted homeland speakers’ performance. As for context, “yes incongruous” had a negative coefficient, which indicates that going from “yes congruous” to “yes incongruous” led to a significant decrease in accuracy for homeland speakers (β = –2.408, SE = 0.209, z = –11.543, p < .001). Similarly, going from “yes congruous” to “no” also led to a significant decrease in accuracy for homeland speakers (β = –2.796, SE = 0.208, z = –13.429, p < .001). These results were not surprising, as turning a congruous context into an incongruous one, or removing the semantic context altogether, was expected to increase the difficulty of the task in general.

Since heritage speakers were of particular interest, the variable population and its interaction with is_there_tone are the focus of the discussion for Model I. In Table 5.2, the effect of population was not significant (β = –0.094, SE = 0.095, z = –0.997, p = .319). It should be clarified that this coefficient is relative to the intercept: when there was a congruous context but no segments and no tone, going from “homeland” to “heritage” did not have a significant effect on accuracy. No actual stimulus lacked both segments and tone, but such a stimulus would have posed an impossible task for both homeland and heritage speakers. Therefore, it was not surprising that the effect of population was not significant.

¹ Conversion from log odds to probabilities can be done using the formula p = odds/(1 + odds), where the odds are calculated by exponentiating the sum of the relevant coefficients in the model. For details, see Chapter 8 of Sweet & Grace-Martin (1999).

Table 5.2: Summary of fixed effects of Model I, a generalized logistic mixed model that included the interaction of is_there_tone and population, predicting accuracy
Term | β | SE | z | p
(Intercept) | 1.452 | 0.245 | 5.914 | < .001 ***
is_there_segment(yes) | 0.630 | 0.128 | 4.925 | < .001 ***
is_there_tone(yes) | 2.962 | 0.314 | 9.375 | < .001 ***
context(yes_incong) | –2.408 | 0.209 | –11.543 | < .001 ***
context(no) | –2.796 | 0.208 | –13.429 | < .001 ***
population(heritage) | –0.094 | 0.095 | –0.997 | .319
is_there_tone(yes)*population(heritage) | –1.404 | 0.212 | –6.623 | < .001 ***
Note: * p < .05; ** p < .01; *** p < .001
Number of observations: 48960
Akaike’s Information Criterion: 37924.1
Bayesian Information Criterion: 38248.9
Log-likelihood: –18925.0

The most important observation in Table 5.2 is that the effect of the interaction is_there_tone*population was significant (β = –1.404, SE = 0.212, z = –6.623, p < .001). In other words, the presence or absence of tonal information had significantly different effects for the two populations. For homeland speakers, going from “there is no tone” to “there is tone” means an increase in log odds from 1.452 (intercept) to 4.414 (1.452 + 2.962). For heritage speakers, the same change means an increase in log odds from 1.358 (1.452 – 0.094) to 2.916 (1.452 – 0.094 + 2.962 – 1.404). Thus, the presence of tonal information gave a boost to both homeland and heritage speakers, but the boost for homeland speakers (2.962) was significantly bigger than that for heritage speakers (1.558). This supports the hypothesis that heritage speakers make less use of tonal information when identifying a word from a tonally contrastive set.

Coefficient estimates of fixed effects in Model II are reported as follows. The effects of interest in this model are is_there_segment and its interaction with population.
As summarized in Table 5.3, the effect of is_there_segment was significant (β = 0.513, SE = 0.153, z = 3.349, p < .001). This means that going from “there are no segments” to “there are segments” enhanced homeland speakers’ accuracy. However, the effect of is_there_segment*population was not significant (β = 0.211, SE = 0.169, z = 1.251, p = .211). In other words, the presence or absence of segmental information affected both homeland and heritage speakers, but its impact on homeland speakers was not significantly different from its impact on heritage speakers.

Table 5.3: Summary of fixed effects of Model II, a generalized logistic mixed model that included the interaction of is_there_segment and population, predicting accuracy
Term | β | SE | z | p
(Intercept) | 1.733 | 0.252 | 6.874 | < .001 ***
is_there_segment(yes) | 0.513 | 0.153 | 3.349 | < .001 ***
is_there_tone(yes) | 2.254 | 0.312 | 7.238 | < .001 ***
context(yes_incong) | –2.417 | 0.210 | –11.533 | < .001 ***
context(no) | –2.808 | 0.209 | –13.435 | < .001 ***
population(heritage) | –0.620 | 0.180 | –3.445 | < .001 ***
is_there_segment(yes)*population(heritage) | 0.211 | 0.169 | 1.251 | .211
Note: * p < .05; ** p < .01; *** p < .001
Number of observations: 48960
Akaike’s Information Criterion: 37956.7
Bayesian Information Criterion: 38281.6
Log-likelihood: –18941.4

Coefficient estimates of Model III, which included context and its interaction with population, are reported as follows. As mentioned previously, the random-effects structure of Model III was different from that of Models I and II. Due to convergence problems, Model III did not include by-subject and by-syllable random slopes for is_there_segment, is_there_tone, and context. However, by-subject and by-syllable random intercepts were included.

Fixed effects of Model III are summarized in Table 5.4. The effects of context(yes_incong) (β = –1.589, SE = 0.066, z = –24.24, p < .001) and context(no) (β = –2.240, SE = 0.063, z = –35.52, p < .001) were significant.
These coefficients were negative, which means that going from “yes congruous” to “yes incongruous” lowered the accuracy of homeland speakers, as did going from “yes congruous” to “no”. This was not surprising, as removing the semantic context or making a sentence incongruous would increase the difficulty of the task in general.

Table 5.4: Summary of fixed effects of Model III, a generalized logistic mixed model that included the interaction of context and population, predicting accuracy
Term | β | SE | z | p
(Intercept) | 1.565 | 0.173 | 9.06 | < .001 ***
is_there_segment(yes) | 0.540 | 0.027 | 19.78 | < .001 ***
is_there_tone(yes) | 1.871 | 0.040 | 46.80 | < .001 ***
context(yes_incong) | –1.589 | 0.066 | –24.24 | < .001 ***
context(no) | –2.240 | 0.063 | –35.52 | < .001 ***
population(heritage) | –1.069 | 0.183 | –5.85 | < .001 ***
context(yes_incong)*population(heritage) | –0.446 | 0.075 | –5.92 | < .001 ***
context(no)*population(heritage) | 0.113 | 0.079 | 1.43 | .152
Note: * p < .05; ** p < .01; *** p < .001
Number of observations: 48960
Akaike’s Information Criterion: 42779.4
Bayesian Information Criterion: 42867.2
Log-likelihood: –21379.7

Since heritage speakers were of particular interest, the interaction of context and population is the focus of the discussion. Because context had three levels, Table 5.4 contains two interaction terms. First, the interaction of context(yes_incong) with population(heritage) was significant (β = –0.446, SE = 0.075, z = –5.92, p < .001). Going from “yes congruous” to “yes incongruous” lowered the accuracy of both homeland and heritage speakers.
However, this impact was bigger on heritage speakers than on homeland speakers. Specifically, for homeland speakers, the impact was –1.589 in log odds, whereas for heritage speakers it was –2.035 (–1.589 – 0.446). The fact that the heritage group’s performance suffered more supports the hypothesis that heritage speakers found it more difficult to actively ignore semantic context and use tonal information when the target word was semantically incongruous with the carrier phrase.

By contrast, the effect of context(no)*population(heritage) was not significant (β = 0.113, SE = 0.079, z = 1.43, p = .152). This suggests that going from “yes congruous” to “no” lowered the accuracy of both homeland and heritage speakers, but the impact was not significantly different between the two groups.

Overall, results of logistic regression with fixed and random effects confirmed that the presence of tonal information gave a significantly bigger boost to homeland speakers’ performance than to heritage speakers’. When the target word was semantically incongruous with the carrier phrase, the performance of the heritage group suffered significantly more than that of the homeland group. Since heritage speakers are known to be a heterogeneous group, the inclusion of random effects ensured that inter-subject variability was factored into the analysis, so that the scope of inference can be extended to the entire population.

Following this overview section, Section 5.2 compares the two groups’ average accuracy for each stimulus type in the word-identification experiment.

5.2 Response to Research Question 1: Accuracy

The rest of this chapter addresses the three research questions of this dissertation in more detail.
This section responds to the first research question, which is restated below with its null and alternative hypotheses:

Research Question 1:
Do homeland and heritage speakers behave differently in terms of their ability to identify tonally contrastive words?
H0: There is no difference between homeland and heritage speakers in terms of their ability to identify tonally contrastive words.
H1: Compared with homeland speakers, heritage speakers are less able to identify tonally contrastive words.

In Section 3.6, it was hypothesized that heritage speakers make less use of tonal information for word identification than homeland speakers. In general, this hypothesis predicts that H0 will be rejected in favour of H1. However, the predictions may differ for specific stimulus types. Recall from Section 4.1.3 that each of the eight stimulus types represented a specific way to manipulate four variables: segmental information, tonal information, semantic context, and semantic congruity. A recap of the tested configurations is provided in Table 5.5. Consider Type 2 (words with no tone) and Type 4 (sentences with no tone) as examples. Since the removal of tonal information was hypothesized to render the two groups equal, no difference between homeland and heritage speakers was expected for these types, hence predicting that H0 would not be rejected. For the remaining stimulus types, an accuracy gap between the two groups was expected, in line with H1. The crucial observation, however, is not only the presence or absence of an accuracy gap, but also the size of the gap. Therefore, the subsequent discussion will address two questions for each stimulus type: first, was any observed difference between homeland and heritage speakers statistically significant?
Second, if there was a significant difference, what was its magnitude?

Table 5.5: Recap of stimulus types and predicted results
Type | Segment | Tone | Context | Congruity | Predicted result
1 | ✓ | ✓ | ✗ | not appl. | homeland >> heritage
2 | ✓ | ✗ | ✗ | not appl. | homeland = heritage
3 | ✗ | ✓ | ✗ | not appl. | homeland >>> heritage
4 | ✓ | ✗ | ✓ | ✓ | homeland = heritage
5A | ✓ | ✓ | ✓ | ✓ | homeland > heritage
5B | ✓ | ✓ | ✓ | ✗ | homeland >> heritage
6A | ✗ | ✓ | ✓ | ✓ | homeland >> heritage
6B | ✗ | ✓ | ✓ | ✗ | homeland >> heritage

To answer the first question regarding statistical significance, t tests for two independent samples were conducted to compare the average accuracy of homeland and heritage speakers for each stimulus type. To decide whether t tests assuming equal or unequal variances should be used, F tests for comparing the variances of two samples were performed. Results of the F tests indicate that the variances of homeland and heritage speakers were significantly different for Types 1, 3, 5A, 5B, 6A, and 6B, so t tests assuming unequal variances were used for these types. For Types 2 and 4, variances were not significantly different, so t tests assuming equal variances were used. An overview of the results is presented in Table 5.6. Since eight pairwise t tests were performed on a single data set, a Bonferroni correction was applied to reduce the chance of false-positive results: the conventional alpha level (0.05) was divided by the number of pairwise comparisons (in this case, 8), which yields a more conservative alpha level of 0.00625.
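The Bonferroni adjustment and the unequal-variance (Welch) t test described above can be sketched as follows. This is a minimal Python illustration, not the software actually used for the analysis: the function names are hypothetical, the F ratio follows one common convention (larger variance over smaller), and the score lists are made-up placeholders rather than data from this study. The sketch computes test statistics and degrees of freedom only; p values would come from the t and F distributions.

```python
import statistics


def bonferroni_alpha(alpha, n_comparisons):
    """Divide the family-wise alpha level by the number of comparisons."""
    return alpha / n_comparisons


def variance_ratio(a, b):
    """F statistic for two sample variances (one convention: larger over smaller)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return max(va, vb) / min(va, vb)


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(a), len(b)
    # Squared standard errors of each group mean.
    se2_a = statistics.variance(a) / na
    se2_b = statistics.variance(b) / nb
    t = (statistics.mean(a) - statistics.mean(b)) / (se2_a + se2_b) ** 0.5
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


alpha = bonferroni_alpha(0.05, 8)  # eight pairwise t tests
print(f"adjusted alpha = {alpha}")

# p values reported for Types 2 and 4 in Table 5.6.
for stimulus_type, p in {"2": 0.26, "4": 0.09}.items():
    verdict = "significant" if p < alpha else "not significant"
    print(f"Type {stimulus_type}: p = {p} -> {verdict}")

# Hypothetical scores, NOT data from this study.
group_a = [1, 2, 3, 4]
group_b = [2, 4, 6, 8]
t, df = welch_t(group_a, group_b)
print(f"F = {variance_ratio(group_a, group_b):.2f}, t = {t:.3f}, df = {df:.2f}")
```

The Welch-Satterthwaite formula generally yields non-integer degrees of freedom, because it weights each group's variance by its own sample size instead of pooling the two.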
Under this correction, a p value above 0.00625 indicates that an observed difference is not statistically significant, while a p value below 0.00625 indicates that the difference is statistically significant.

Table 5.6: Accuracy rates arranged by effect size (Cohen’s d), from smallest to largest
Type | Population | M (%) | SD | t | df | p | Cohen’s d | Predicted result
2 | homeland | 30.05 | 45.86 | 1.13 | 63.64 | .26 | 0.27 | homeland = heritage
  | heritage | 28.97 | 45.37 | | | | |
4 | homeland | 89.22 | 31.03 | 1.71 | 60.14 | .09 | 0.41 | homeland = heritage
  | heritage | 82.25 | 38.21 | | | | |
5A | homeland | 95.83 | 19.99 | 3.99 | 57.60 | < .001 | 0.97 | homeland > heritage
  | heritage | 86.37 | 34.32 | | | | |
5B | homeland | 87.14 | 33.37 | 5.07 | 42.36 | < .001 | 1.22 | homeland >> heritage
  | heritage | 66.31 | 47.27 | | | | |
6A | homeland | 94.17 | 23.44 | 5.26 | 58.49 | < .001 | 1.28 | homeland >> heritage
  | heritage | 80.54 | 39.60 | | | | |
6B | homeland | 85.34 | 35.44 | 5.67 | 48.32 | < .001 | 1.37 | homeland >> heritage
  | heritage | 56.72 | 49.55 | | | | |
1 | homeland | 91.37 | 28.08 | 7.89 | 42.83 | < .001 | 1.91 | homeland >> heritage
  | heritage | 69.22 | 46.17 | | | | |
3 | homeland | 80.54 | 39.60 | 8.28 | 56.96 | < .001 | 2.00 | homeland >>> heritage
  | heritage | 49.41 | 50.01 | | | | |
Note: The alpha level after Bonferroni correction was 0.00625.

To answer the second question regarding the magnitude of difference, effect sizes were measured as Cohen’s d (Cohen, 1988), calculated by dividing the difference between two means by the pooled standard deviation of the data. The resulting value can be interpreted as shown in Table 5.7. The eight stimulus types in Table 5.6 are arranged by effect size from smallest to largest. These results will be further discussed in the rest of this section.

Table 5.7: Interpretation of Cohen’s d (Cohen, 1988; Sawilowsky, 2009)
Cohen’s d | Effect size
2.00 | huge
1.20 | very large
0.80 | large
0.50 | medium
0.20 | small
0.01 | very small

In subsequent discussion, data will be visualized using boxplots, as in the example in Figure 5.1, which is based on real data from this study.
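The two descriptive tools used in the rest of this section, Cohen’s d labelled with the thresholds in Table 5.7 and the 1.5 × IQR rule for boxplot whiskers, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the score lists are made-up rather than data from this study, each threshold in Table 5.7 is treated as an inclusive lower bound, and quartile conventions differ across software, so the whisker ends may not exactly match the plotting package used for the figures.

```python
import statistics

# Lower bounds of the effect-size labels in Table 5.7 (Cohen, 1988; Sawilowsky, 2009).
THRESHOLDS = [
    (2.00, "huge"),
    (1.20, "very large"),
    (0.80, "large"),
    (0.50, "medium"),
    (0.20, "small"),
    (0.01, "very small"),
]


def cohens_d(a, b):
    """Difference between two means divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5


def label_effect(d):
    """Return the Table 5.7 label for |d|, or None if |d| is below 0.01."""
    for lower_bound, label in THRESHOLDS:
        if abs(d) >= lower_bound:
            return label
    return None


def whiskers(data):
    """Whisker ends and outliers under the 1.5 x IQR rule.

    Quartiles use linear interpolation (method="inclusive"); other software
    may compute quartiles differently.
    """
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    reach = 1.5 * (q3 - q1)
    low_limit, high_limit = q1 - reach, q3 + reach
    inside = [x for x in data if low_limit <= x <= high_limit]
    outliers = sorted(x for x in data if x < low_limit or x > high_limit)
    return min(inside), max(inside), outliers


for d in (0.27, 0.97, 1.28, 1.91, 2.00):  # effect sizes reported in Table 5.6
    print(d, label_effect(d))

print(whiskers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))  # hypothetical scores
```

Under the inclusive-lower-bound assumption, d = 1.91 is labelled “very large” and d = 2.00 “huge”, which matches the wording “close to huge” used for Type 1 below.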
The central line and the asterisk in the box indicate the median and the mean, respectively. The lower and upper boundaries of the box show the lower and upper quartiles, respectively. Whiskers are drawn to include any data points within a certain distance of the box: the interquartile range is multiplied by 1.5, and this number is added to the upper quartile and subtracted from the lower quartile. The lower and upper whiskers show the lowest and highest data points within these limits, respectively. Points outside these limits are outliers.

Each of the four subsections that follow illustrates the effect of one independent variable. Subsets of the stimulus types in Table 5.5 will be singled out for discussion, and effect sizes will be checked against the predictions made in Chapter 4. Section 5.2.5 will provide an interim summary and revisit the first research question of this dissertation.

Figure 5.1: A sample boxplot

5.2.1 With vs. without context

The first effect to be examined is the presence or absence of semantic context. Stimulus types that differ only in this variable will be discussed side by side: normal words will be compared with normal congruous sentences, words with no tone will be compared with sentences with no tone, and words with no segments will be compared with congruous sentences with no segments.

The first pair for comparison is Type 1 (normal words) and Type 5A (normal, congruous sentences), and their variable configuration is summarized in Table 5.8. This pair is presented first because these two stimulus types offer a window into the two groups’ baseline listening ability in Cantonese. Type 5A has a ✓ for all four variables, which means it contained all of the following listening cues: segmental information, tonal information, semantic context, and semantic congruity. Therefore, this stimulus type should be the least challenging task for both homeland and heritage speakers.
Recall that in Section 4.4.3.3, homeland (M = 5.65 out of 6, SD = 0.64) and heritage speakers (M = 4.47 out of 6, SD = 1.13) gave significantly different self-ratings for their Cantonese listening abilities. It was expected that their results for Type 5A would reflect this baseline difference.

As for Type 1, although these stimuli also contained segmental and tonal information, the target words were not embedded in a carrier phrase, so listeners did not have a frame of reference for the talker’s pitch. Generally speaking, monosyllabic words in isolation are more difficult to identify than words in sentences, so both groups were expected to have lower accuracy for Type 1 than for Type 5A. However, what is of particular interest here is not the difference in accuracy within one population across stimulus types; rather, it is the comparison of accuracy gaps between the two groups across stimulus types. If heritage speakers indeed relied more on semantic context than on tonal information in word identification, a larger accuracy gap between homeland and heritage speakers would be expected for Type 1 than for Type 5A. However, if homeland and heritage speakers did not differ in how much they relied on semantic information, the accuracy gaps for Type 1 and Type 5A were expected to be the same, merely reflecting their difference in overall Cantonese proficiency.

Results for Type 5A in Figure 5.2b met expectations. The accuracy of homeland speakers was near ceiling (M = 95.83%, SD = 19.99). Four outliers fell below 90%, but they were all above 70%. The short whiskers suggest very little variability among homeland speakers. Heritage speakers also did well in this task (M = 86.37%, SD = 34.32), but their accuracy was significantly lower than that of homeland speakers, t(57.60) = 3.99, p < .001.
The relatively high standard deviation and longer whiskers in the plot suggest large variability within the heritage group.

The effect size for Type 5A was large (d = 0.97), which has important implications for the other stimulus types: if heritage speakers’ lower overall Cantonese proficiency were the sole reason for any difference between the two populations, then a similar effect size (about 0.97) would be expected across all eight stimulus types. However, if a larger effect size was observed in other stimulus types, the additional difference must be due to factors other than heritage speakers’ lower overall Cantonese proficiency.

Results for Type 1 in Figure 5.2a also met expectations. The difference between homeland (M = 91.37%, SD = 28.08) and heritage speakers (M = 69.22%, SD = 46.17) was significant, t(42.83) = 7.89, p < .001. Whiskers for the heritage group span from 36.67% to 95.00%, which suggests an even higher degree of variability than for Type 5A. The effect size for Type 1 stimuli was close to huge (d = 1.91).

Table 5.8: Stimulus types with all acoustic information
Type | Segment | Tone | Context | Congruity | Predicted result
1 | ✓ | ✓ | ✗ | not appl. | homeland >> heritage
5A | ✓ | ✓ | ✓ | ✓ | homeland > heritage

Figure 5.2: Comparison of stimulus types with all acoustic information

The actual results above matched the predicted results: even when all acoustic information was available, the difference between the two groups for normal words was close to huge. When semantic context was available in Type 5A, however, the gap between the two groups became smaller. This implies that semantic context helped both groups achieve higher accuracy, but it helped heritage speakers proportionally more.

The second pair for comparison is Type 2 (words with no tone) and Type 4 (sentences with no tone), and their variable configuration is summarized in Table 5.9.
According to the hypotheses of this dissertation (see Section 3.6), the difference between homeland and heritage speakers lay in the usefulness of tonal information for word identification. Therefore, removing tonal information from the stimuli would render the two groups equal (or close to equal, considering their different overall language proficiency). In particular, Type 2 (words with no tone) was expected to be an impossible task for both groups, as the target word and competitors had identical segments and differed only in tone. It was predicted that the accuracy of both groups would be at chance (25%). As for Type 4, no significant difference was expected, as it was hypothesized that both populations would be equally able to make use of semantic information for word identification.

Table 5.9: Stimulus types with no tone
Type | Segment | Tone | Context | Congruity | Predicted result
2 | ✓ | ✗ | ✗ | not appl. | homeland = heritage
4 | ✓ | ✗ | ✓ | ✓ | homeland = heritage

Figure 5.3: Comparison of stimulus types with no tone

Results for Type 2 in Figure 5.3a confirmed that both groups’ performance was close to chance (25%). The accuracy of homeland (M = 30.05%, SD = 45.86) and heritage speakers (M = 28.97%, SD = 45.37) was very similar, t(63.64) = 1.13, p = .26, d = 0.27. Both groups had similar standard deviations, which suggests that neither population had more within-group variability than the other. These results met the expectation that Type 2 was an impossible task for both groups.

For Type 4 in Figure 5.3b, the difference between homeland (M = 89.22%, SD = 31.03) and heritage speakers (M = 82.25%, SD = 38.21) was also not significant, t(60.14) = 1.71, p = .09, d = 0.41. It is noteworthy that five outliers in the homeland group scored below 70%. These individuals mostly chose words with the mid level tone instead of words that would make sense in the given sentence. These cases will be discussed in detail in Section 5.3.
To sum up, the results of Type 2 and Type 4 show that when tonal information was unavailable, the two groups performed similarly: either they both performed poorly (as in Type 2), or they both performed well (as in Type 4).

The third pair for comparison is Type 3 (words with no segments) and Type 6A (the last word of the congruous sentence has no segments), and their variable configuration is summarized in Table 5.10. Although Type 6A stimuli were congruous sentences, they were predicted to be relatively challenging for heritage speakers because they lacked segmental information and because congruous and incongruous sentences were mixed in one experimental block. As for Type 3, its lack of semantic context was predicted to make the task even harder for heritage speakers. Therefore, the difference between the two groups was expected to be larger for Type 3 than for Type 6A.

Results for Type 3 in Figure 5.4a show that although this task was extremely challenging for heritage speakers, their performance was above chance (M = 49.41%, SD = 50.01). As expected, homeland speakers did significantly better (M = 80.54%, SD = 39.60) than heritage speakers, t(56.96) = 8.28, p < .001. The effect size of this comparison was huge (d = 2.00), the largest of all stimulus types. For Type 6A in Figure 5.4b, the difference between homeland (M = 94.17%, SD = 23.44) and heritage speakers (M = 80.54%, SD = 39.60) was significant, t(58.49) = 5.26, p < .001. The effect size was smaller than that of Type 3, but still very large (d = 1.28). Since Types 3 and 6A differed only in the presence or absence of semantic context, it can be concluded that the presence of semantic context decreased the gap between homeland and heritage speakers.

Table 5.10: Stimulus types with no segments
Type | Segment | Tone | Context | Congruity | Predicted result
3 | ✗ | ✓ | ✗ | not appl. | homeland >>> heritage
6A | ✗ | ✓ | ✓ | ✓ | homeland >> heritage

Figure 5.4: Comparison of stimulus types with no segments

5.2.2 With vs. without congruity

This subsection examines the effect of congruity by comparing stimulus types that differed only in this variable. The first pair for comparison is Type 5A (normal, congruous sentences) and Type 5B (normal, incongruous sentences). Their variable configuration is summarized in Table 5.11. As mentioned previously, the two groups’ performance in Type 5A was expected to reflect their baseline Cantonese listening proficiency. Therefore, an accuracy gap was expected for Type 5A stimuli. As for Type 5B, the target words were semantically incongruous with the carrier phrase. It was hypothesized that heritage speakers would find it challenging to actively ignore the semantic context and identify a word by relying on tonal information only. Therefore, the gap between homeland and heritage speakers was predicted to be larger for Type 5B than for 5A.

Table 5.11: Stimulus types with context and all acoustic information
Type | Segment | Tone | Context | Congruity | Predicted result
5A | ✓ | ✓ | ✓ | ✓ | homeland > heritage
5B | ✓ | ✓ | ✓ | ✗ | homeland >> heritage

Figure 5.5: Comparison of stimulus types with context and all acoustic information

The results presented in Figure 5.5 show that this prediction was correct. For Type 5A, the difference between homeland (M = 95.83%, SD = 19.99) and heritage speakers (M = 86.37%, SD = 34.32) was significant, t(57.60) = 3.99, p < .001; the effect size was large (d = 0.97). For Type 5B, the difference between homeland (M = 87.14%, SD = 33.37) and heritage speakers (M = 66.31%, SD = 47.27) was also significant, t(42.36) = 5.07, p < .001; the effect size was very large (d = 1.22). Both populations showed more intra-group variability for Type 5B than for 5A. For Type 5B in Figure 5.5b, the most accurate listener from the homeland group achieved 97.78% accuracy, and the bulk of this group achieved at least 73.47% accuracy. There was one outlier who achieved only 55.56% accuracy, suggesting that this homeland speaker also found Type 5B challenging.
As for heritage speakers, their accuracy range was even larger. On the one hand, the most accurate listener from this group achieved 92.78% accuracy, which was comparable to the performance of homeland speakers. On the other hand, there were six outliers in the heritage group. In particular, three of the six outliers fell below chance (25%), and the lowest outlier was at 8.89%. An accuracy below chance suggests that these errors were not made randomly. Section 5.4 will explain that heritage speakers relied on semantic context relatively more often than homeland speakers did, and so they made errors by choosing a word that was semantically congruous with the carrier phrase. This pulled the two groups apart and led to a larger accuracy gap in Type 5B than in Type 5A.

The next pair for comparison is Type 6A (the last word of the congruous sentence has no segments) and Type 6B (the last word of the incongruous sentence has no segments). Their variable configuration is summarized in Table 5.12. Note that 5A–5B and 6A–6B were both congruous–incongruous pairs. In general, the “B” stimuli were expected to be harder for heritage speakers than their “A” counterparts due to the difference in congruity. However, 6A and 6B had the additional complication of lacking segmental information, which means heritage speakers would have to rely on tonal information for both. Given the hypothesis that heritage speakers were less adept at using tonal information, Types 6A and 6B were both predicted to be challenging for them.

Results show that Types 6A and 6B had similar effect sizes. For Type 6A in Figure 5.6a, the difference between homeland (M = 94.17%, SD = 23.44) and heritage speakers (M = 80.54%, SD = 39.60) was significant, t(58.49) = 5.26, p < .001. For Type 6B in Figure 5.6b, the difference in average accuracy between homeland (M = 85.34%, SD = 35.44) and heritage speakers (M = 56.72%, SD = 49.55) was also significant, t(48.32) = 5.67, p < .001.
The effect sizes of Type 6A (d = 1.28) and 6B (d = 1.37) were similar, and both correspond to “very large”.

Table 5.12: Stimulus types with context and tonal information only
Type | Segment | Tone | Context | Congruity | Predicted result
6A | ✗ | ✓ | ✓ | ✓ | homeland >> heritage
6B | ✗ | ✓ | ✓ | ✗ | homeland >> heritage

Figure 5.6: Comparison of stimulus types with no segments

For each population, a wider accuracy range was observed for Type 6B than for 6A. In Figure 5.6b, the most accurate listener from the homeland group achieved 96.67% accuracy, while the lowest outlier in the same group was at 31.67%. The heritage group showed even more extreme performance, ranging from 3.33% to 91.11%. Six heritage speakers’ accuracy fell below chance (25%), which suggests that their errors were not random. Section 5.4 will further investigate the patterns in these errors and show that it was comparatively harder for heritage speakers to ignore the semantic context when the target word was semantically incongruous with its carrier phrase.

At this point, the boxplots of all eight stimulus types have been shown. In the two subsections that follow, different pairs from the same set of boxplots will be juxtaposed to illustrate the effects of variables that have not yet been discussed. Relevant statistics such as mean accuracy and standard deviations will be repeated for clarity.

5.2.3 With vs. without segmental information

This subsection discusses the effect of the presence or absence of segmental information by comparing stimulus types that differed only in this variable. The first pair for comparison is Type 1 (normal words) and Type 3 (words with no segments). Their variable configuration is summarized in Table 5.13. Type 1 stimuli were tonally contrastive monosyllabic words that did not provide any semantic context. Since semantic context was hypothesized to be an important cue for heritage speakers, a big accuracy gap between the two populations was expected for Type 1.
In addition to the lack of context, Type 3 also lacked segmental information. Although segmental information was redundant in this task, since target words and competitors were segmentally identical, segmental information does facilitate lexical activation in general. When segmental information was also taken away, the only listening cue left was tonal information, which was hypothesized to be the least helpful cue for the heritage group. Therefore, the gap between the two groups was expected to be even bigger for Type 3 than for Type 1.

Results in Figure 5.7 generally met expectations. For Type 1, the homeland group (M = 91.37%, SD = 28.08) did significantly better than the heritage group (M = 69.22%, SD = 46.17), t(42.83) = 7.89, p < .001. For Type 3, the homeland group (M = 80.54%, SD = 39.60) also did significantly better than the heritage group (M = 49.41%, SD = 50.01), t(56.96) = 8.28, p < .001. The effect sizes of both stimulus types were at the “huge” end of the scale. Although the effect size of Type 1 (d = 1.91) was smaller than that of Type 3 (d = 2.00), this difference was smaller than the distance between two adjacent thresholds on the effect size scale.

Table 5.13: Stimulus types with no context
Type  Segment  Tone  Context  Congruity  Predicted result
1     ✓        ✓     ✗        not appl.  homeland >> heritage
3     ✗        ✓     ✗        not appl.  homeland >>> heritage

Figure 5.7: Comparison of stimulus types with no context

For Type 3, a wide range of accuracy was observed for both populations. However, the two groups’ accuracy distributions in Figure 5.7b were skewed in opposite directions. For the homeland group, the lower whisker was longer than the upper whisker, and the median was closer to the upper quartile than to the lower quartile. This suggests that while some homeland speakers did find this task challenging, they were in the minority. The opposite can be observed in the heritage group: the upper whisker was longer than the lower whisker, and the median was closer to the lower quartile than to the upper quartile.
This suggests that while it was possible for heritage speakers to achieve an accuracy comparable to that of homeland speakers, those who did so were in the minority. The skewed distributions pulled the two groups apart, which led to a wider gap between them in Type 3 than in Type 1.

The second pair for comparison is Type 5A (normal congruous sentences) and 6A (the last word of the congruous sentence has no segments). Their variable configuration is summarized in Table 5.14. As in the case of Type 1 vs. Type 3, the only difference between Type 5A and Type 6A was that Type 6A lacked segmental information, so it was predicted that this would increase the level of difficulty for heritage speakers.

Table 5.14: Stimulus types with context and congruity
Type  Segment  Tone  Context  Congruity  Predicted result
5A    ✓        ✓     ✓        ✓          homeland > heritage
6A    ✗        ✓     ✓        ✓          homeland >> heritage

Figure 5.8: Comparison of stimulus types with context and congruity

Results in Figure 5.8 show that the prediction was correct. For Type 5A, the difference between homeland (M = 95.83%, SD = 19.99) and heritage speakers (M = 86.37%, SD = 34.32) was significant, t(57.60) = 3.99, p < .001. For Type 6A, the difference between homeland (M = 94.17%, SD = 23.44) and heritage speakers (M = 80.54%, SD = 39.60) was also significant, t(58.49) = 5.26, p < .001. The effect size of Type 5A was large (d = 0.97), while that of Type 6A was very large (d = 1.28). This difference in effect size confirmed that the lack of segmental information widened the gap between the two groups in Type 6A compared with Type 5A.

The last pair for comparison in this subsection is Type 5B (normal incongruous sentences) and 6B (the last word of the incongruous sentence has no segments). Their variable configuration is summarized in Table 5.15. Like 5A-6A, 5B-6B were “with segment - no segment” pairs of sentences. In general, the “6” stimuli without segmental information were expected to be more difficult than their “5” counterparts with segmental information.
However, Types 5B and 6B had the additional complication of having target words that were semantically incongruous with the carrier phrase. Given the hypothesis that heritage speakers would find it hard to actively ignore the semantic context and identify the word solely on the basis of acoustic information, all kinds of incongruous sentences would be more challenging for heritage speakers than for homeland speakers. Therefore, it was predicted that 5B and 6B would show a similar magnitude of difference between the two groups.

Results in Figure 5.9 are as follows. For Type 5B, the difference between homeland (M = 87.14%, SD = 33.37) and heritage speakers (M = 66.31%, SD = 47.27) was significant, t(42.36) = 5.07, p < .001. For Type 6B, the difference in average accuracy between homeland (M = 85.34%, SD = 35.44) and heritage speakers (M = 57.72%, SD = 49.55) was also significant, t(48.32) = 5.67, p < .001. The effect sizes for Type 5B and Type 6B were 1.22 and 1.37 respectively, both of which translate to “very large”. Although the two effect sizes were not the same, their difference was smaller than the distance between two adjacent thresholds on the effect size scale. This was similar to the difference between the effect sizes of Type 1 (normal words) and Type 3 (words with no segments).

Table 5.15: Stimulus types with context but no congruity
Type  Segment  Tone  Context  Congruity  Predicted result
5B    ✓        ✓     ✓        ✗          homeland >> heritage
6B    ✗        ✓     ✓        ✗          homeland >> heritage

Figure 5.9: Comparison of stimulus types with context but no congruity

Taken together, the absence of segmental information had more impact on the accuracy gap for tasks that were relatively easy (Type 5A vs. 6A). For stimulus types that were already complicated by the absence of semantic context (Type 1 vs. Type 3) or by incongruity (Type 5B vs. Type 6B), the absence of segmental information had less impact on the accuracy gap between the homeland and heritage groups.

5.2.4 With vs. without tonal information

Last but not least, this subsection discusses the impact of the presence or absence of tonal information, the variable of most interest in this study. In general, the lack of tonal information should make a task more difficult. Perhaps a more interesting question is how much difference the presence of context alone would make for each group in the absence of tonal information.

The first pair for comparison is Type 1 (normal words) and Type 2 (words with no tone). Their variable configuration is summarized in Table 5.16. As mentioned previously, Type 1 stimuli were tonally contrastive monosyllabic words. As there was no semantic context and only acoustic information was available, a large accuracy gap was expected between homeland and heritage speakers. For Type 2, the removal of tonal information from tonally contrastive monosyllabic words was expected to bring the two groups closer to equal, in that both groups’ accuracy would be near chance (25%). Therefore, the accuracy gap for Type 1 was expected to be larger than that for Type 2.

Results for Type 1 in Figure 5.10a show that, compared with homeland speakers (M = 91.37%, SD = 28.08), heritage speakers (M = 69.22%, SD = 46.17) were indeed less able to identify a word from its tonally contrastive competitors in the absence of semantic context, t(42.83) = 7.89, p < .001. However, heritage speakers were not completely unable to distinguish between tonally contrastive words in isolation: if they had been unable to use tonal information at all, their performance in Type 1 would have been close to chance (25%). As for Type 2, results in Figure 5.10b confirmed that it was an impossible task, and both groups’ performance was close to chance. The mean accuracy of homeland (M = 30.05%, SD = 45.86) and heritage speakers (M = 28.97%, SD = 45.37) was very similar, t(63.64) = 1.13, p = .26.
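The verbal labels attached to effect sizes in this section follow the Cohen's d thresholds listed under Table 5.18 (0.20 small, 0.50 medium, 0.80 large, 1.20 very large, 2.00 huge). The minimal sketch below maps a d value onto those labels. The `cohens_d` helper uses the common pooled-standard-deviation formula; this is an assumption, since the section does not state how d was computed, and both function names are hypothetical.

```python
import math
from statistics import mean, stdev

# Lower bound of each label, from the thresholds listed under Table 5.18.
THRESHOLDS = [(2.00, "huge"), (1.20, "very large"), (0.80, "large"),
              (0.50, "medium"), (0.20, "small")]


def effect_size_label(d: float) -> str:
    """Return the label of the highest threshold that |d| reaches."""
    magnitude = abs(d)
    for lower_bound, label in THRESHOLDS:
        if magnitude >= lower_bound:
            return label
    # Hypothetical label: the table gives no name for |d| < 0.20.
    return "below small"


def cohens_d(x, y) -> float:
    """Cohen's d with a pooled standard deviation (assumed formula)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


# Effect sizes reported in this section, labelled with the thresholds above.
for stimulus_type, d in [("2", 0.27), ("4", 0.41), ("5A", 0.97), ("1", 1.91), ("3", 2.00)]:
    print(stimulus_type, d, effect_size_label(d))
```

Under these thresholds, Type 1 (d = 1.91) falls just short of the “huge” cut-off, which matches its description as close to huge.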
As predicted, the accuracy gaps of these two stimulus types were very different: the effect size was close to huge for Type 1 (d = 1.91) but only small for Type 2 (d = 0.27). This suggests that the absence of tonal information closed the accuracy gap between homeland and heritage speakers.

Table 5.16: Stimulus types with segmental information but no context
Type  Segment  Tone  Context  Congruity  Predicted result
1     ✓        ✓     ✗        not appl.  homeland >> heritage
2     ✓        ✗     ✗        not appl.  homeland = heritage

Figure 5.10: Comparison of stimulus types with segmental information but no context

The last pair for comparison is Type 5A (normal congruous sentences) and Type 4 (sentences with no tone). Their variable configuration is summarized in Table 5.17. As mentioned previously, Type 5A was expected to reflect the difference between homeland and heritage speakers’ baseline listening proficiency in Cantonese. Type 4 stimuli were “toneless” sentences, in the sense that the f0 of the entire sentence (both the carrier phrase and the target word at the end) was reset to 200 Hz, which was similar to the talker’s mid level tone. This stimulus block contained only congruous sentences, so the correct answers were words that made sense in the carrier phrase. Given the hypothesis that homeland and heritage speakers differed in their ability to make use of tonal information, it was predicted that the removal of tonal information would make the two groups’ performance in Type 4 equal.

Table 5.17: Stimulus types with segments, context, and congruity
Type  Segment  Tone  Context  Congruity  Predicted result
5A    ✓        ✓     ✓        ✓          homeland > heritage
4     ✓        ✗     ✓        ✓          homeland = heritage

Figure 5.11: Comparison of stimulus types with segments, context, and congruity

For Type 5A in Figure 5.11a, the difference between homeland (M = 95.83%, SD = 19.99) and heritage speakers (M = 86.37%, SD = 34.32) was significant, t(57.60) = 3.99, p < .001.
However, for Type 4 in Figure 5.11b, the difference between homeland (M = 89.22%, SD = 31.03) and heritage speakers (M = 82.25%, SD = 38.21) was not significant, t(60.14) = 1.71, p = .09. The effect size was large for Type 5A (d = 0.97) but small for Type 4 (d = 0.41). Although the effect size for Type 4 was not as small as expected, the difference between Types 5A and 4 in terms of effect size met expectations.

Taken together, results in this subsection confirmed that the absence of tonal information minimized the accuracy gap between homeland and heritage speakers. As expected, the results for Type 2 and Type 4 provide evidence that homeland speakers did not simply do better than heritage speakers across the board because of a difference in baseline proficiency. The crucial factor determining the magnitude of the difference between the two groups was the configuration of variables.

5.2.5 Interim summary

To sum up, the absence of tonal information brought homeland and heritage speakers closer to each other in terms of accuracy, but the absence of segmental information, semantic context or congruity widened the gap between the two populations. In general, the results presented in this section support the hypothesis that heritage speakers are less able to distinguish tonally contrastive words than homeland speakers are. More importantly, this difference between the two populations was not static across stimulus types; rather, the magnitude of the difference can be predicted by the type of information available, as summarized in Table 5.18. When tone was the only available type of information, the accuracy gap was the largest, as in Type 3. The presence of segmental information made the accuracy gap smaller (e.g. Type 5A vs. Type 6A), but the presence of semantic context and congruity had an even bigger impact on minimizing the accuracy gap (Type 1 vs.
Type 5A).

Table 5.18: Summary of stimulus types and variables, arranged by effect size in the form of Cohen’s d (smallest to largest)
Type  Description of stimulus type                              Seg.  T.  Ct.  Cg.        Cohen’s d
2     Words with no tone                                        ✓     ✗   ✗    not appl.  0.27
4     Sentences with no tone                                    ✓     ✗   ✓    ✓          0.41
5A    Normal sentences (congr.)                                 ✓     ✓   ✓    ✓          0.97
5B    Normal sentences (incongr.)                               ✓     ✓   ✓    ✗          1.22
6A    Last word of the sentence has no segments (congruous)     ✗     ✓   ✓    ✓          1.28
6B    Last word of the sentence has no segments (incongruous)   ✗     ✓   ✓    ✗          1.37
1     Normal words                                              ✓     ✓   ✗    not appl.  1.91
3     Words with no segments                                    ✗     ✓   ✗    not appl.  2.00
Seg. = Segment, T. = Tone, Ct. = Context, Cg. = Congruity
Cohen’s d thresholds: 0.20 = small, 0.50 = medium, 0.80 = large, 1.20 = very large, 2.00 = huge

5.3 Response to Research Question 2: Confusion patterns

This section addresses the second research question of this dissertation: do homeland and heritage speakers exhibit similar confusion patterns with respect to lexical tone perception? The null and alternative hypotheses are listed as follows:

Research Question 2:
Do homeland and heritage speakers exhibit similar confusion patterns with respect to lexical tone perception?
H0: There is no difference between homeland and heritage speakers in terms of their confusion patterns.
H1: Homeland and heritage speakers have different confusion patterns.

Previously in Section 3.6, it was hypothesized that heritage speakers would experience more confusion than homeland speakers when asked to identify a word from a tonally contrastive set. Therefore, it was anticipated that H0 would be rejected in favour of H1.

In order to measure differences in confusion patterns, data will be presented as confusion matrices in this section. Section 5.3.1 is an introduction to confusion matrices as a type of data visualization. It is followed by Section 5.3.2, which explains the Mantel test for comparing global similarity of matrices (Mantel, 1967).
Data analysis for the current study will be presented in Section 5.3.3, followed by an interim summary in Section 5.3.4.

5.3.1 How to read a confusion matrix

This subsection is an introduction to confusion matrices. Readers who are familiar with this type of data visualization may skip ahead to Section 5.3.2 on the statistical test used for comparing confusion matrices.

A confusion matrix is a two-dimensional table, as in Figure 5.12. The horizontal axis represents the target category, which in this study means the intended tone. The vertical axis represents the category that a stimulus was perceived as, which in this case means participants’ responses. Since there are six tones, the matrix has 36 (6 × 6) cells, each of which has a “nickname” in the form [x,y]. For example, [5,2] is the cell that represents “the stimulus was T5; participants answered T2”, and [6,6] means “the stimulus was T6; participants answered T6”. The number printed in each cell indicates the percentage of instances of the target category (the column) that were classified as the category given by the row label. The higher the percentage, the darker the shade of the cell. The percentages in each column always add up to 100.

Figure 5.12: How to interpret a confusion matrix

Four hypothetical situations and their respective patterns are presented as follows. First, if participants have an accuracy rate of 100%, their confusion pattern will look like Figure 5.13a. The cells [1,1], [2,2], [3,3], [4,4], [5,5], and [6,6] are instances in which participants identified the correct tone. These cells form a diagonal from the lower left corner to the upper right corner.² Therefore, a dark diagonal is a sign of high accuracy.

² In the literature, confusion matrices often have diagonals from the upper left to the lower right corner.
Those in this section, however, have diagonals from the lower left to the upper right corner, so that tone numbers are arranged in a more intuitive order: numbers on the x-axis go from left to right, while those on the y-axis go from bottom to top.

Figure 5.13: Confusion matrices showing perfect accuracy and accuracy at chance respectively
Figure 5.14: Confusion matrices showing two possibilities of T2-T5 merger

The second hypothetical situation is at the other extreme, namely random responses. Figure 5.13b has no clear diagonal at all, and all cells are equally dark. This means that each intended tone has been perceived as every tone equally frequently. Therefore, an evenly shaded matrix indicates random responses.

The third and fourth situations pertain to the tone merger phenomenon discussed in Section 3.3.2. Figure 5.14 shows two possible ways of merging T2 (high rising) and T5 (low rising). The first is merging one category into another existing category, which involves confusion in only one direction. In Figure 5.14a, the cell [5,2] has a much darker shade than [5,5], which means T5 is perceived as T2 most of the time (83.33%). However, T2 is still consistently perceived as T2 (see [2,2]) and is never identified as T5 (see [2,5]). In other words, T5 is merging into T2 in perception.

The second kind of merger is merging two categories into one new category, which involves confusion in two directions. In Figure 5.14b, the cells [2,2], [2,5], [5,2], and [5,5] have the same shade. T2 is perceived as T2 or T5 with equal probability, and similarly T5 is perceived as T2 or T5 with equal probability. In this case T2 and T5 are merging into a new category (a general rising tone, for example) in perception.
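The column-wise percentages described above can be sketched as follows. This is a minimal illustration, assuming that responses are stored as (intended tone, response tone) pairs; the function name and the counts are hypothetical, chosen only so that T5 is heard as T2 83.33% of the time while T2 is never heard as T5, as in the one-directional merger of Figure 5.14a.

```python
from collections import Counter

TONES = [1, 2, 3, 4, 5, 6]


def confusion_percentages(trials):
    """Build a confusion matrix from (intended, response) tone pairs.

    Returns m such that m[x][y] is the percentage of T<x> stimuli answered
    as T<y>, i.e. cell [x,y] in the notation above. The values for each
    intended tone (one column in the figures) sum to 100.
    """
    trials = list(trials)
    counts = Counter(trials)
    presented = Counter(intended for intended, _ in trials)
    return {
        x: {y: 100 * counts[(x, y)] / presented[x] for y in TONES}
        for x in TONES
        if presented[x] > 0
    }


# Hypothetical data: six trials per tone, all correct except T5,
# which is answered as T2 in five of its six trials.
trials = [(t, t) for t in TONES if t != 5 for _ in range(6)]
trials += [(5, 2)] * 5 + [(5, 5)]

m = confusion_percentages(trials)
print(round(m[5][2], 2), m[2][2], m[2][5])  # 83.33 100.0 0.0
```

Keying the result by intended tone first mirrors the [x,y] cell names, so each inner dictionary corresponds to one column of the plotted matrix.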
To sum up, both types of merger reduce the number of contrastive rising tones from two to one: in the first case the rising tone that remains in the inventory would be the high rising tone, while in the second case it would be a general rising tone.

Figure 5.14a and Figure 5.14b look more similar to each other than Figure 5.13a and Figure 5.13b do. When more pairs with varying degrees of similarity are compared, however, the question arises of how similarity can be measured and ranked. The next subsection explains how similarity can be quantified statistically.

5.3.2 The Mantel test for comparing global similarity of matrices

This subsection explains the Mantel test (Mantel, 1967), which is used for assessing the strength of correlation between two distance matrices. Readers who are familiar with this statistical method may skip ahead to Section 5.3.3 for results.

Consider a basic question: what does similarity mean for confusion matrices? Figure 5.15 and Figure 5.16 present two pairs of toy matrices, which show the confusion patterns of six hypothetical categories α, β, γ, δ, ε, and ζ. Either pair can be argued to be the more similar one, for different reasons. On the one hand, one can argue that Toy Matrices A and C in Figure 5.16 are more similar than A and B are in Figure 5.15, because A and C have the same number of shaded non-diagonal cells (two), and the percentages shown in these non-diagonal cells are also the same (16.67). On the other hand, one can argue that Toy Matrices A and B are more similar than A and C are. Even though their numbers of shaded non-diagonal cells differ (A has two but B has three), these cells overlap in terms of distribution. The cells [β,ε] and [ε,β] are shaded in both matrices. It is noteworthy that these cells are “mirrored” versions of each other, a sign of mutual confusion. In Toy Matrix C, however, the cells [α,ζ] and [δ,α] are shaded, while their counterparts in A are not.
The sets of categories being confused do not overlap, and mutual confusion is found only in Toy Matrix A, not in C. In this respect A and C are quite different.

The Mantel test for comparing global similarity of matrices shares the rationale of the latter argument above: it is crucial to recognize that cells in a confusion matrix are not independent of each other, in that any given cell and its “mirrored twin” inform us about confusion of the same pair of categories. The output of the test is known as the Mantel r statistic, which indicates the strength of the correlation between the two matrices. A guide to interpreting this value is provided in Table 5.19. Mantel r statistics of 0 and ±1 indicate no correlation and perfect correlation respectively. Values between 0 and ±1 are intermediate levels from weakly correlated to very strongly correlated. The Mantel test was conducted for the Toy Matrices in Figure 5.15 and Figure 5.16. Results suggest that Toy Matrices A and B are very strongly correlated (r = .92, p = .02), while A and C are only modestly correlated (r = .23, p = .16). Therefore, “similarity” of confusion matrices as measured by the Mantel test does not simply pertain to the total number of shaded non-diagonal cells or the total percentages of those cells; what is more crucial is the relationship between these shaded cells and the sets of categories being confused.

Figure 5.15: Toy matrices demonstrating a strong correlation
Figure 5.16: Toy matrices demonstrating a modest correlation

Table 5.19: Interpretation of the Mantel r statistic (Mantel, 1967)
Mantel r statistic   Correlation
±1                   perfect
±0.8 to ±0.9         very strong
±0.5 to ±0.8         strong
±0.3 to ±0.5         moderate
±0.1 to ±0.3         modest
below ±0.1           weak
0                    none

The information presented so far should be adequate for readers to interpret the results presented in Section 5.3.3. Readers who are interested in how the Mantel r statistic and significance level are calculated may keep reading this subsection, which goes into the details of each step.
Table 5.20 is an overview of all procedures, which were implemented in R (R Core Team, 2013) with the vegan package (Oksanen et al., 2018). Among the seven steps, Step 6 (compute correlation coefficient) and Step 7 (calculate significance level) are directly relevant to data analysis. As Step 6 requires distance matrices as input, Steps 2–5 are intermediate steps whose purpose is to transform raw counts into the right input for data analysis; the output of each step becomes the input to the next step.

To illustrate the steps, a walk-through of the transformation of Toy Matrix D from raw counts (Table 5.21) to distance (Table 5.25) is presented as follows. First, raw counts were obtained and presented in Table 5.21. As usual, the x- and y-axes represent the intended and perceived categories respectively. In this table some cells contain a zero. These zeros would be a problem for Step 4 (convert proportion to similarity), so the next step serves to replace zeros with a non-zero value.

Table 5.20: Procedures to implement the Mantel test, adapted from Tang (2015)
Step  Description                        Method
1     Obtain raw counts of responses     For each intended category, count how many times it was perceived as each category
2     “Smooth” the matrix                Witten & Bell (1991)
3     Convert counts to proportion       For each cell, divide the value by the total number of times that intended category was presented
4     Convert proportion to similarity   Shepard (1972)
5     Convert similarity to distance     Shepard (1972)
6     Compute correlation coefficient    Kendall’s tau
7     Calculate significance level       10,000 permutations

Table 5.21: Toy Matrix D at Step 1 (raw counts)
ζ       0     0     0     0     0   100
ε       0    50     0     0    33     0
δ       0     0     0   100     0     0
γ       0     0   100     0     0     0
β       0    50     0     0    33     0
α     100     0     0     0    34     0
        α     β     γ     δ     ε     ζ

At Step 2, a smoothing technique (Witten & Bell, 1991) was employed to scale down the non-zero counts slightly (a process known as “discounting”) and redistribute the difference to cells that had zeros (a process known as
“backoff”). Compare the matrix before (Table 5.21) and after smoothing (Table 5.22). Cells that used to have a value of 100 are now 98.522, while cells that used to have a value of 0 are now 0.328. This was done as follows. First, for non-zero cells, the discounting formula in (18) was applied. For every non-zero cell, this formula takes the original value and multiplies it by a fraction. The fraction’s numerator is the total number of responses in the matrix, and its denominator is the sum of the total number of responses and the number of non-zero cells in the matrix. Since the number of non-zero cells is positive, the numerator is always smaller than the denominator. As a result, the product after multiplication by this fraction is always smaller than the original value, which is why the process is called “discounting”. For example, in Table 5.21, the number of non-zero cells was 9, the total number of responses was 600, and the cell [α,α] had the value of 100. Therefore, the new value for this cell after discounting would be 100 × (600/(600 + 9)), which yields 98.522, as in Table 5.22.

Table 5.22: Toy Matrix D after Step 2 (smoothing)
ζ    0.328   0.328   0.328   0.328   0.328  98.522
ε    0.328  49.261   0.328   0.328  32.512   0.328
δ    0.328   0.328   0.328  98.522   0.328   0.328
γ    0.328   0.328  98.522   0.328   0.328   0.328
β    0.328  49.261   0.328   0.328  32.512   0.328
α   98.522   0.328   0.328   0.328  33.498   0.328
        α       β       γ       δ       ε       ζ

(18) Formula for discounting non-zero responses (Witten & Bell, 1991)
$$V_{\mathrm{!zero}} = R \times \frac{N_T}{N_T + N_{\mathrm{!zero}}}$$
• V_!zero is the new value for a non-zero cell after smoothing
• R is the original raw count in a non-zero cell before smoothing
• N_T is the total number of responses
• N_!zero is the number of non-zero cells

Second, for zero cells, the backoff formula in (19) was applied.
This formula takes the ratio of non-zero to zero cells in the matrix and multiplies it by the same fraction described above (the numerator is the total number of responses in the matrix, and the denominator is the sum of the total number of responses and the number of non-zero cells in the matrix). For example, the total number of zero cells in Table 5.21 was 27. The new value for the cell [α,β] after backoff would therefore be (9/27) × (600/(600 + 9)), which yields 0.328. Hence, cells that used to be 0 in Table 5.21 have become 0.328 in Table 5.22. The matrix has been “smoothed” and is ready to be fed to the next step.

(19) Backoff formula for zero responses (Witten & Bell, 1991)
$$V_{\mathrm{zero}} = \frac{N_{\mathrm{!zero}}}{N_{\mathrm{zero}}} \times \frac{N_T}{N_T + N_{\mathrm{!zero}}}$$
• V_zero is the new value for a zero cell after smoothing
• N_!zero is the number of non-zero cells
• N_zero is the number of zero cells
• N_T is the total number of responses

Step 3, converting smoothed values to proportions, is quite straightforward. For each cell, the smoothed value was divided by the number of times the respective intended category was presented (i.e. the total of that column, which is 100 for every category in this example). Take the cell [α,α] as an example: its value was 98.522 after Step 2 in Table 5.22, and the α category was presented 100 times. Therefore, the new value after Step 3 would be 98.522 divided by 100, which gives 0.985 in Table 5.23. Similarly, the cell [α,β] used to be 0.328; after Step 3 it has become 0.003. Now that the values have become proportions, they are ready to be converted to similarity.

Table 5.23: Toy Matrix D after Step 3 (proportion)
ζ   0.003  0.003  0.003  0.003  0.003  0.985
ε   0.003  0.493  0.003  0.003  0.327  0.003
δ   0.003  0.003  0.003  0.985  0.003  0.003
γ   0.003  0.003  0.985  0.003  0.003  0.003
β   0.003  0.493  0.003  0.003  0.327  0.003
α   0.985  0.003  0.003  0.003  0.337  0.003
    α      β      γ      δ      ε      ζ

The purpose of Step 4 is to obtain a value to quantify the perceived similarity of two categories.
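Before turning to that metric, Steps 2 and 3 as described above can be sketched in Python. This is an independent illustration, not the R/vegan code used in the thesis, and the layout is an assumption: `RAW[i][j]` is the number of times intended category i was perceived as category j, so each printed column of Table 5.21 becomes one Python row. Step 3 divides by the number of presentations (the raw column total, 100 here), as in the worked example; dividing by the smoothed column totals instead would shift some values in the third decimal place.

```python
CATEGORIES = ["α", "β", "γ", "δ", "ε", "ζ"]

# Toy Matrix D at Step 1 (Table 5.21), assumed layout:
# RAW[i][j] = times intended category i was perceived as category j.
RAW = [
    [100, 0, 0, 0, 0, 0],   # intended α
    [0, 50, 0, 0, 50, 0],   # intended β
    [0, 0, 100, 0, 0, 0],   # intended γ
    [0, 0, 0, 100, 0, 0],   # intended δ
    [34, 33, 0, 0, 33, 0],  # intended ε
    [0, 0, 0, 0, 0, 100],   # intended ζ
]


def witten_bell_smooth(raw):
    """Step 2: discount non-zero cells, formula (18); back off zeros, formula (19)."""
    cells = [v for row in raw for v in row]
    n_total = sum(cells)                          # N_T
    n_nonzero = sum(1 for v in cells if v > 0)    # N_!zero
    n_zero = len(cells) - n_nonzero               # N_zero
    if n_zero == 0:
        return [[float(v) for v in row] for row in raw]  # nothing to smooth
    factor = n_total / (n_total + n_nonzero)
    backoff_value = (n_nonzero / n_zero) * factor
    return [[v * factor if v > 0 else backoff_value for v in row] for row in raw]


def to_proportions(smoothed, raw):
    """Step 3: divide by how often each intended category was presented."""
    result = []
    for smoothed_row, raw_row in zip(smoothed, raw):
        presented = sum(raw_row)
        result.append([v / presented for v in smoothed_row])
    return result


smoothed = witten_bell_smooth(RAW)
proportions = to_proportions(smoothed, RAW)
ALPHA, BETA, EPSILON = (CATEGORIES.index(c) for c in "αβε")
print(round(smoothed[ALPHA][ALPHA], 3), round(smoothed[ALPHA][BETA], 3))  # 98.522 0.328
print(round(proportions[ALPHA][ALPHA], 3), round(proportions[BETA][EPSILON], 3))  # 0.985 0.493
```

A side effect of this smoothing scheme is that the matrix total is preserved: the mass removed from the nine non-zero cells is exactly what the 27 zero cells receive, so the smoothed counts still sum to 600.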
The metric proposed by Shepard (1972) in (20) was adopted, in which 0 means “not similar at all” and 1 means “extremely similar”. Conceptually, the formula can be understood as the ratio of how often two categories are confused with each other to how often these two categories are correctly perceived. If two categories are always perceived correctly with no confusion at all, the ratio would be 0:1, or 0/1 when represented as a fraction, which gives the output value 0. This means these categories are perceived as “not similar at all”. However, if two categories are confused with each other half of the time and perceived correctly half of the time, the ratio would be 0.5:0.5, or 0.5/0.5, which yields 1. This result means the two categories are perceived as “extremely similar”. Note that if two categories are never perceived correctly as the intended category, the denominator of the fraction in (20) will be zero, which means the result will be undefined and the remaining steps cannot be carried out. This explains why the zeros in Table 5.21 must be removed by the smoothing technique at Step 2.

To illustrate how this formula works, consider the categories β and ε in Table 5.23. The numerator of the fraction would be the sum of [β,ε] (0.493) and [ε,β] (0.327), which yields 0.82. The denominator of the fraction would be the sum of [β,β] (0.493) and [ε,ε] (0.327), which is also 0.82. The fact that the numerator and the denominator are identical suggests that the categories β and ε were confused with each other as often as they were perceived correctly. The output of the formula is therefore 0.82/0.82, resulting in 1, as in Table 5.24.
In other words, β and ε were perceived as “extremely similar”.

(20) Formula for converting proportion to similarity (Shepard, 1972)
$$S_{xy} = \frac{p(x,y) + p(y,x)}{p(x,x) + p(y,y)}$$
• S_xy is the similarity value between two categories x and y.
• p(x,y) is the proportion of times that x was incorrectly perceived as y.
• p(y,x) is the proportion of times that y was incorrectly perceived as x.
• p(x,x) is the proportion of times that x was correctly perceived as x.
• p(y,y) is the proportion of times that y was correctly perceived as y.

Table 5.24: Toy Matrix D after Step 4 (similarity)
ζ   0.003  0.004  0.003  0.003  0.005  1
ε   0.259  1      0.005  0.005  1      0.005
δ   0.003  0.004  0.003  1      0.005  0.003
γ   0.003  0.004  1      0.003  0.005  0.003
β   0.004  1      0.004  0.004  1      0.004
α   1      0.004  0.003  0.003  0.259  0.003
    α      β      γ      δ      ε      ζ

The formula in (20) has two important consequences. First, the similarity between β and ε will always be the same as the similarity between ε and β. At Step 3 (proportion), the cell [β,ε] and its “mirrored twin” [ε,β] had different values, as in Table 5.23. However, after Step 4, the values of both cells have become 1, as in Table 5.24. Second, after conversion to similarity, values in the diagonal cells (such as [α,α] and [β,β]) will always be 1, regardless of how different they were from each other before conversion. This is because the numerator and the denominator of the formula are always identical for diagonal cells. For example, to calculate the similarity between β and β, the numerator will be (p(β,β) + p(β,β)) and the denominator will also be (p(β,β) + p(β,β)); the only possible result is 1.

The purpose of Step 5 is to convert similarity to distance. Similarity and distance are like two sides of a coin: if two categories are extremely similar, the perceptual distance between them must be very small or even non-existent. Therefore, a similarity value of 1 (extremely similar) would be equivalent to a perceptual distance of 0 (no perceptual distance).
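Steps 4 and 5 can be sketched together, taking as input the rounded proportions printed in Table 5.23 (same assumed layout as before: `P[i][j]` is the proportion of intended category i perceived as j). Formula (20) gives the similarity, and Shepard's law, formula (21), gives the distance. The natural logarithm is assumed because it reproduces the values in Table 5.25 (for example, −ln 0.003 ≈ 5.809). The tables in this section round at every step, so values computed without intermediate rounding can differ slightly in the third decimal place (here the α-ε distance comes out as 1.350 rather than 1.351).

```python
import math

# Proportions from Table 5.23, P[i][j] = intended i perceived as j.
P = [
    [0.985, 0.003, 0.003, 0.003, 0.003, 0.003],  # α
    [0.003, 0.493, 0.003, 0.003, 0.493, 0.003],  # β
    [0.003, 0.003, 0.985, 0.003, 0.003, 0.003],  # γ
    [0.003, 0.003, 0.003, 0.985, 0.003, 0.003],  # δ
    [0.337, 0.327, 0.003, 0.003, 0.327, 0.003],  # ε
    [0.003, 0.003, 0.003, 0.003, 0.003, 0.985],  # ζ
]


def shepard_similarity(p):
    """Step 4, formula (20): S_xy = (p(x,y) + p(y,x)) / (p(x,x) + p(y,y))."""
    n = len(p)
    return [[(p[x][y] + p[y][x]) / (p[x][x] + p[y][y]) for y in range(n)]
            for x in range(n)]


def shepard_distance(s):
    """Step 5, formula (21): D_xy = -log(S_xy), using the natural logarithm."""
    return [[-math.log(v) for v in row] for row in s]


S = shepard_similarity(P)
D = shepard_distance(S)
print(round(S[0][4], 3), S[1][4], round(D[0][4], 3))  # 0.259 1.0 1.35
```

Both consequences noted for Step 4 fall out of the formula directly: swapping x and y leaves numerator and denominator unchanged, and on the diagonal they are identical.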
To implement the conversion, Shepard’s law (Shepard, 1972) was used, as in (21). The idea of this formula is to turn the similarity value between two categories into its negative (natural) logarithm. As a result, a similarity value of 1 becomes a distance value of 0, as in Table 5.25. A relatively small similarity value (e.g. 0.003 in Table 5.24) becomes a relatively large distance value (e.g. 5.809 in Table 5.25). The consequences mentioned in Step 4 still hold after Step 5: all diagonal cells will always have a distance value of 0, and any given cell and its “mirrored twin” will always have identical values. This fact about “mirrored twins” is especially important for the next step, because it reflects the assumption that cells in a confusion matrix are not independent of each other. A distance matrix like Table 5.25 is now an appropriate input for correlation analysis via the Mantel test.

(21) Formula for converting similarity to distance (Shepard, 1972)
$$D_{xy} = -\log S_{xy}$$
• D_xy is the distance value between two categories x and y.
• S_xy is the similarity value between two categories x and y.

Table 5.25: Toy Matrix D after Step 5 (distance)
ζ   5.809  5.521  5.809  5.809  5.298  0
ε   1.351  0      5.298  5.298  0      5.298
δ   5.809  5.521  5.809  0      5.298  5.809
γ   5.809  5.521  0      5.809  5.298  5.809
β   5.521  0      5.521  5.521  0      5.521
α   0      5.521  5.809  5.809  1.351  5.809
    α      β      γ      δ      ε      ζ

The last two steps are procedures of the Mantel test itself. The test compares two distance matrices and computes their correlation coefficient (known as the Mantel r statistic) and significance level (the p value). To compute the correlation coefficient (Step 6), Kendall’s tau³ was used. The formula in (22) takes the difference between the number of concordant and discordant pairs and divides this difference by the total number of pairs.
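The computation in (22), applied to two distance matrices, can be sketched as follows. Because each distance matrix is symmetric with a zero diagonal (see Steps 4 and 5), only the cells on one side of the diagonal are compared; for six categories that gives n = 15 distances per matrix. Tied pairs are counted as neither concordant nor discordant, exactly as the formula is written (Kendall's tau-a); implementations that correct for ties (tau-b) can give different values when many distances are tied. The function names and the toy matrices are hypothetical.

```python
from itertools import combinations


def upper_triangle(matrix):
    """Cells above the diagonal, i.e. one copy of each 'mirrored twin' pair."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def kendall_tau(x, y):
    """Formula (22): (concordant - discordant) / (n(n-1)/2); ties count as neither."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Two hypothetical 3x3 distance matrices.
d1 = [[0, 1, 2], [1, 0, 3], [2, 3, 0]]
d2 = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
print(kendall_tau(upper_triangle(d1), upper_triangle(d2)))  # 1/3: two concordant pairs, one discordant
```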
This gives an output value ranging from 0 (no correlation) to ±1 (perfect correlation), as listed in Table 5.19.

(22) Formula for computing the correlation coefficient of two distance matrices

$$r_\tau = \frac{(\text{number of concordant pairs}) - (\text{number of discordant pairs})}{n(n-1)/2}$$

• $r_\tau$ is the correlation coefficient (Kendall’s tau).
• $n$ is the number of observations.

To obtain a significance level (Step 7), 10,000 permutations were performed. A permutation means shuffling the rows and columns of the matrices (the same reordering is applied to both rows and columns) and then recomputing the correlation coefficient with the shuffled matrices. The purpose is to see whether the recomputed correlation coefficient would be larger than the original coefficient. If this process is repeated 10,000 times (see footnote 4) and the recomputed coefficient of the shuffled matrices frequently has a greater value than that of the unshuffled matrices, it would suggest that any observed correlation could be due to chance. In this case a high p value will be obtained, which means the correlation is not statistically significant. However, if there are only a few instances in which the recomputed coefficient is larger than the original one, it would suggest that the observed correlation in the unshuffled matrices is unlikely to have happened by chance. In this case a low p value will be obtained, which means the correlation is statistically significant.

Footnote 3: To avoid potential errors associated with Spearman’s rho, Kendall’s tau was used instead. For more details, see the discussion in Tang (2015).

In the next subsection, confusion matrices from the actual experiment will be discussed. Although distance matrices are the input for the Mantel test, they are not suitable for visualizing the directionality of confusion, because cells that are mirrored twins will always have the same value. Therefore, in the next subsection, values in the cells will indicate the percentage of responses, which better displays (a)symmetrical patterns of potential tone mergers.
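Steps 4 to 7 above can be sketched in a few functions. The following is a minimal Python sketch using only the standard library, offered under stated assumptions rather than as the code used in this study: the function names are hypothetical; the input is a Step 3 proportion matrix whose diagonal cells are all above zero; formula (22) is implemented exactly as written, so tied pairs count as neither concordant nor discordant; only the cells above the diagonal are correlated, which is the usual practice for symmetric distance matrices; and the p value follows the common (hits + 1)/(permutations + 1) convention.

```python
import math
import random
from itertools import combinations
from typing import List, Tuple

Matrix = List[List[float]]


def proportions_to_similarity(p: Matrix) -> Matrix:
    """Step 4, formula (20): S_xy = (p(x,y) + p(y,x)) / (p(x,x) + p(y,y)).

    Assumption: every diagonal proportion p(x,x) is > 0, so no denominator is 0.
    """
    n = len(p)
    return [[(p[x][y] + p[y][x]) / (p[x][x] + p[y][y]) for y in range(n)]
            for x in range(n)]


def similarity_to_distance(s: Matrix) -> Matrix:
    """Step 5, formula (21): D_xy = -log(S_xy), using the natural logarithm."""
    return [[-math.log(v) for v in row] for row in s]


def off_diagonal(d: Matrix, order: List[int]) -> List[float]:
    """Cells above the diagonal after reordering rows and columns by `order`."""
    return [d[order[i]][order[j]] for i, j in combinations(range(len(d)), 2)]


def kendall_tau(a: List[float], b: List[float]) -> float:
    """Step 6, formula (22): (concordant - discordant) / (n(n-1)/2)."""
    concordant = discordant = 0
    for i, j in combinations(range(len(a)), 2):
        sign = (a[i] - a[j]) * (b[i] - b[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # sign == 0 is a tie: counted as neither (assumption, tau-a style)
    n = len(a)
    return (concordant - discordant) / (n * (n - 1) / 2)


def mantel_test(d1: Matrix, d2: Matrix, permutations: int = 10_000,
                seed: int = 0) -> Tuple[float, float]:
    """Step 7: one-sided permutation p value for the Mantel r statistic."""
    identity = list(range(len(d1)))
    x = off_diagonal(d1, identity)
    observed = kendall_tau(x, off_diagonal(d2, identity))
    rng = random.Random(seed)
    order = identity[:]
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)  # one reordering, applied to both rows and columns of d2
        if kendall_tau(x, off_diagonal(d2, order)) >= observed:
            hits += 1
    # Counting the unshuffled arrangement as one possible outcome is a common
    # convention (assumption), which keeps the p value above zero.
    return observed, (hits + 1) / (permutations + 1)


if __name__ == "__main__":
    # Hypothetical 3 x 3 proportion matrices for two groups (not data from the study);
    # the study's tone matrices would be 6 x 6.
    group_a = [[0.80, 0.15, 0.05], [0.10, 0.85, 0.05], [0.05, 0.05, 0.90]]
    group_b = [[0.70, 0.25, 0.05], [0.20, 0.75, 0.05], [0.10, 0.05, 0.85]]
    d_a = similarity_to_distance(proportions_to_similarity(group_a))
    d_b = similarity_to_distance(proportions_to_similarity(group_b))
    r, p = mantel_test(d_a, d_b, permutations=999)
    print(f"Mantel r = {r:.2f}, p = {p:.3f}")
```

Because the distance matrices are symmetric with zero diagonals, correlating only the cells above the diagonal loses no information. Formula (22) corresponds to the variant usually called tau-a; many statistical packages report tau-b instead, which adjusts the denominator for ties, so the two can differ when a matrix contains tied distances, as Table 5.25 does.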
Although the transformation of each matrix from raw counts to distance will not be shown, readers should bear in mind that all matrices in the next subsection have gone through the seven steps in order to compute the Mantel r statistic and significance level. Therefore, matrices that seem similar to the human eye (e.g. those in Figure 5.16) may be only weakly correlated according to the Mantel test.

Footnote 4: In theory, the number of permutations should be as large as possible so as to lower the uncertainty and enhance the reliability of the result. As discussed in Tang (2015), 10,000 permutations is a good number, as it is larger than the minimum number of permutations (1,000) and at the same time the computation time required is reasonable.

5.3.3 Comparison of confusion patterns

Although the boxplots in Section 5.2 and the confusion matrices in the current subsection were based on responses of the same group of participants from the same tasks, the focus of this subsection is different. The purpose of Section 5.2 was to compare homeland and heritage speakers’ average percentage of correct responses. This subsection, by contrast, is mainly about their incorrect responses: were the errors random? Were there patterns in the errors? If there were patterns, were they different between the two groups? To answer these questions (which restate Research Question 2), the incorrect responses represented by non-diagonal cells will be discussed in more detail.

As an overview, Table 5.26 summarizes the results of the Mantel test comparing the global similarity of homeland and heritage speakers’ confusion matrices. The eight stimulus types are arranged by the Mantel r statistic from largest to smallest. Overall, the r values were between 0.30 and 0.81. In other words, all homeland-heritage pairs fell between “modestly correlated” and “strongly correlated” (see Table 5.19). This suggests that the two groups’ confusion matrices were quite similar in general.
However, their degree of similarity varied across stimulus types. When the stimuli were incongruous sentences (Types 5B and 6B), the two groups had the most strongly correlated confusion matrices, with r values above 0.60. Monosyllabic words (Types 1, 2, and 3) came next, with r values around 0.50, showing moderate to strong correlations. Lastly, congruous sentences (Types 4, 5A, and 6A) had r values close to 0.30, showing modest to moderate correlations. Taken together, stimulus types representing different variable configurations fell into different ranges on the scale of correlation strength.

While the Mantel r statistic and p values were calculated based on distance scores, the cells in all matrices presented below show proportions in percentages. Recall from Section 5.3.2 that the conversion from proportions to distances erases any asymmetry in the direction of confusion. Presenting proportions instead of distances allows the comparison of homeland and heritage speakers’ directionality of confusion, which is crucial for answering the questions raised in Chapter 3 regarding trends of tone merger.

The rest of this subsection compares the confusion matrices of homeland and heritage speakers for each stimulus type. The eight homeland-heritage pairs will be discussed in the same order shown in Table 5.26. In other words, the most correlated pair will be discussed first, and the least correlated pair will be discussed last. For each pair of matrices, the left one represents responses of homeland speakers, while the right one represents those of heritage speakers, as in Figure 5.17a and Figure 5.17b respectively. When perusing the matrices, readers may find it useful to bookmark Table 1.6 and Figure 1.4 on page 16 for a quick review of tone numbers and their respective tone numerals or contours. Following this subsection, an interim summary in Section 5.3.4 will discuss possible interpretations of the observed patterns.

Table 5.26: Summary of Mantel test results comparing global similarity of homeland and heritage speakers’ confusion matrices; rows are arranged by r values (largest to smallest)

Type | Description of stimulus type | Seg. | T. | Ct. | Cg. | r | p
6B | Last word of the sentence has no segments (incongruous) | ✗ | ✓ | ✓ | ✗ | .81 | .001
5B | Normal sentences (incongruous) | ✓ | ✓ | ✓ | ✗ | .64 | .001
3 | Words with no segments | ✗ | ✓ | ✗ | n.a. | .58 | .007
1 | Normal words | ✓ | ✓ | ✗ | n.a. | .50 | .013
2 | Words with no tone | ✓ | ✗ | ✗ | n.a. | .49 | .015
5A | Normal sentences (congruous) | ✓ | ✓ | ✓ | ✓ | .33 | .078
6A | Last word of the sentence has no segments (congruous) | ✗ | ✓ | ✓ | ✓ | .33 | .046
4 | Sentences with no tone | ✓ | ✗ | ✓ | ✓ | .30 | .057

Seg. = Segment, T. = Tone, Ct. = Context, Cg. = Congruity; ✓ = present (or congruous), ✗ = absent (or incongruous).

The two most similar pairs of matrices were Type 6B (the last word of the incongruous sentence has no segments) and Type 5B (normal, incongruous sentences), which are represented by Figure 5.17 and Figure 5.18 respectively. The Mantel r statistic was 0.81 for Type 6B (p = .001) and 0.64 for Type 5B (p = .001), a sign of strong correlation between the homeland and heritage matrices. In both Type 6B (Figure 5.17) and Type 5B (Figure 5.18), non-diagonal cells were not evenly shaded, which suggests that errors were not random. For both homeland and heritage speakers, the same set of paired cells, namely [6,3]&[3,6] and [5,2]&[2,5], stood out as relatively darker than other non-diagonal cells in the same matrix. In other words, when the stimuli were incongruous sentences, both populations found T2-T5 (high rising and low rising) and T3-T6 (mid level and low level) relatively more confusable than other tone pairs, possibly due to their similar pitch contours. For each of the cells that stood out (e.g. [5,2] in Figure 5.18a and Figure 5.18b), the heritage cell was always darker than its homeland counterpart, which shows that confusion between these tones happened more often for the heritage group. It can be concluded that homeland and heritage speakers shared similar types of errors but differed mainly in the quantity of such errors.

Figure 5.17: Confusion patterns of homeland and heritage speakers for Type 6B stimuli

Figure 5.18: Confusion patterns of homeland and heritage speakers for Type 5B stimuli

A difference between Types 5B and 6B was that the patterns of the latter were more symmetrical than those of the former. In Type 5B (Figure 5.18), the cell [6,3] was much darker than [3,6] (homeland: 24.69% vs 8.96%; heritage: 29.80% vs 10.69%), and [5,2] was much darker than [2,5] (homeland: 8.23% vs 4.58%; heritage: 23.73% vs 11.37%). The asymmetrical patterns suggest that confusion took place in one direction (e.g. T5 heard as T2) more often than in the other (e.g. T2 heard as T5). However, in Type 6B (Figure 5.17), [3,6] and [6,3] had a similar shade, and this was true for both populations (homeland: 16.77% vs 11.15%; heritage: 18.82% vs 16.96%). The same can be observed for [2,5] and [5,2] (homeland: 7.40% vs 7.08%; heritage: 15.00% vs 17.94%). The symmetry indicates that confusion took place in both directions: T2 was heard as T5, and T5 was also heard as T2. A possible reason is that, as segmental information was taken away from Type 6B stimuli, the task became more difficult than Type 5B, which led to more mutual confusion for the tone pairs that were already confusable even when segmental information was available.

A qualitative difference between homeland and heritage speakers was the confusion between T4 (low falling) and T6 (low level). In Figure 5.17b (heritage), the cell [6,4] was quite dark (13.04%) compared with other non-diagonal cells, but its counterpart in Figure 5.17a (homeland) was not (3.75%).
Similarly, in Figure 5.18b (heritage), the cell [4,6] was shaded (11.08%), but the same cell in Figure 5.18a (homeland) was not (1.46%). This suggests that T4-T6 confusion was unique to heritage speakers.

The next most correlated pair was Type 3 (words with no segments). Its Mantel r statistic was 0.58 (p = .007), which indicates a strong correlation. On the whole, even though the diagonal in Figure 5.19b (heritage) was visible, its non-diagonal cells were shaded relatively evenly compared with Figure 5.19a (homeland). These features make Figure 5.19b (heritage) look like a blend of the two sample matrices for perfect accuracy and random answers in Figure 5.13. This lack of obvious patterns in non-diagonal cells suggests that when the only available cue was tonal information, heritage speakers (who were hypothesized to struggle with using this cue) had little to rely on to make a judgment; therefore, the answer had to be picked more or less randomly. As for homeland speakers, they exhibited T3-T6 confusion again, but this time the direction of confusion was different. In Figure 5.19a, the cell [3,6] stood out as a darker non-diagonal cell (26.76%) compared with [6,3] (6.18%). In other words, T3 was heard as T6 more often. This was different from Type 5B, in which the other direction of confusion was observed more often.

Figure 5.19: Confusion patterns of homeland and heritage speakers for Type 3 stimuli

Type 1 (normal words) matrices had a Mantel r statistic of 0.50 (p = .013), which indicates that the correlation was between moderate and strong. Some recurring patterns from previous stimulus types can be found. The first recurring pattern was T3-T6 (mid level and low level) confusion. In Figure 5.20a (homeland), [6,3] and [3,6] stood out as two of the darkest non-diagonal cells (11.47% vs 10.00%).
However, in Figure 5.20b (heritage), [6,3] was darker than [3,6] (21.76% vs 11.18%). This suggests that among homeland speakers, confusion took place in both directions, which was similar to what happened for Type 6B but different from Types 5B and 3. Among heritage speakers, by contrast, T3-T6 confusion took place in one direction more often than the other, which was similar to what happened in Type 5B but different from Types 6B and 3. Thus far, the direction of T3-T6 confusion has been inconsistent.

Figure 5.20: Confusion patterns of homeland and heritage speakers for Type 1 stimuli

The second recurring pattern was T2-T5 (high rising and low rising) confusion. In both Figure 5.20a and Figure 5.20b, the cells [5,2] and [2,5] show that confusion took place mostly in one direction (T5 heard as T2), and this was true for both homeland and heritage speakers. Thus far, the direction of T2-T5 confusion has been more consistent across stimulus types than that of T3-T6 confusion.

In the discussion of Types 6B and 5B, it was mentioned that T4-T6 (low falling and low level) confusion was unique to heritage speakers. This was true for the patterns observed in Type 1 too. The cells [6,4] and [4,6] were shaded in the heritage matrix in Figure 5.20b (11.18% and 13.24%) but not so much in the homeland matrix in Figure 5.20a (3.53% and 0.59%). This suggests that heritage speakers had T4-T6 confusion across a variety of stimulus types, including both words and sentences.

Type 2 (words with no tone) matrices were moderately correlated (r = .49, p = .015). Recall from Section 4.4.1.1 that the f0 of Type 2 was reset to a uniform pitch (200 Hz) close to the talker’s T3 (mid level tone). These stimuli had “no tone” in the sense that the pitch height and pitch contour of the intended tone, which are important perceptual correlates of Cantonese tones, were made unavailable.
Since semantic context was not provided, the task was predicted to be impossible for both homeland and heritage speakers, and so no clear diagonals were expected. As seen in Figure 5.21, neither group’s matrix had a clear diagonal, which met previous expectations. Although these two matrices may look extremely similar to the human eye, none of the shaded cells were paired like [6,3] and [3,6] in Figure 5.17, which explains why Type 2’s Mantel statistic was lower than Type 6B’s.

One obvious pattern in Type 2 (Figure 5.21) was that both groups of speakers identified a high proportion of the stimuli as T3 (mid level), as indicated by the darker shading in the whole T3 row. This indicates a general T3 bias, presumably due to the stimuli’s uniform pitch at 200 Hz, which was close to the f0 of this talker’s T3. The cell [3,3] showed the highest accuracy, as the pitch of the stimuli after manipulation was similar to that of the intended tone. However, the participants did not show a T3 bias for all Type 2 stimuli. In both the homeland and heritage matrices, the top row for T6 was somewhat shaded as well, which means the stimuli were sometimes identified as T6 (low level). This was not surprising, given that both groups exhibited T3-T6 confusion in other stimulus types as well.

Figure 5.21: Confusion patterns of homeland and heritage speakers for Type 2 stimuli

The last observation was that the cell [4,4] in both matrices was shaded, which suggests that in some instances both groups of participants were able to identify T4 (low falling) despite not having access to tonal information. This was possibly due to the availability of non-tonal cues such as occasional creaky voice in the stimuli. Overall, confusion often went in one direction (i.e. other tones were heard as T3), and so the two groups’ patterns were similar in this respect.

The last three stimulus types to be discussed were all congruous sentences.
Type 5A (normal, congruous sentences) had a Mantel r statistic of 0.33 (p = .078), which means the two matrices were moderately correlated (see footnote 5). Since this type of stimuli contained both acoustic and semantic information, it was predicted to be the least challenging task for both homeland and heritage speakers. In Figure 5.22, both matrices had a clear diagonal, which indicates a high accuracy rate. In Figure 5.22a (homeland), all percentages printed on non-diagonal cells were so small that the cells were close to white, which indicates almost perfect accuracy. As a result, any errors made by heritage speakers would pull the two populations apart. The most common error among heritage speakers was represented by the cell [5,2] in Figure 5.22b, which stood out as the darkest non-diagonal cell, while its counterpart in Figure 5.22a (homeland) was not as dark. This shaded cell indicates that even when both acoustic and semantic cues were available, heritage speakers identified T5 (low rising) as T2 (high rising) 13.82% of the time. This direction of T2-T5 confusion was consistent with the patterns observed for the stimulus types above. In other words, heritage speakers exhibited T2-T5 confusion in this specific direction across both difficult (e.g. Type 6B) and easy (e.g. Type 5A) stimulus types. As for homeland speakers, although they showed T2-T5 and T3-T6 confusion in Types 6B, 5B, 3, and 1, they were able to perceive the intended tonal categories almost perfectly when acoustic and semantic cues were both available.

Footnote 5: Since the focus of this section is the evaluation of the range of the strength of the correlation across stimulus types, rather than an evaluation of whether any given type shows a significant degree of correlation, the interpretation of the last three stimulus types is included despite their relatively higher p-values.

Figure 5.22: Confusion patterns of homeland and heritage speakers for Type 5A stimuli

Similar to those for Type 5A, the confusion matrices for Type 6A (the last word of the congruous sentence has no segments) had a Mantel r statistic of 0.33 (p = .046), which indicates a moderate correlation. The homeland matrices in Figure 5.23a and Figure 5.22a were very similar to each other, in that most non-diagonal cells were almost white (except [3,6] in Figure 5.23a, which read 8.12%). Since the homeland group barely made any errors, any errors made by heritage speakers would increase the difference between the two groups. This was indeed what happened to the heritage matrix in Figure 5.23b: many cells such as [3,5], [5,2], [6,3], [6,4], and [6,5] were shaded (albeit relatively lightly compared with Figure 5.19), but their counterparts in the homeland matrix were close to white. Among these shaded cells, [5,2] (10.88%) and [3,6] (10.88%) were relatively dark. They provided additional evidence for heritage speakers’ T3-T6 and T2-T5 confusions.

Figure 5.23: Confusion patterns of homeland and heritage speakers for Type 6A stimuli

According to the Mantel test, the least correlated pair was Type 4 (sentences with no tone) in Figure 5.24 (r = .30, p = .057). Although this was the least correlated pair among the eight stimulus types, it was still moderately correlated and thus quite similar to Types 5A and 6A in terms of the strength of the correlation. Type 4 stimuli contained no tonal information, but the target word was embedded in a congruous sentence that offered semantic context, so it was predicted that the two populations would not differ significantly in terms of accuracy. In Figure 5.24, both matrices had a dark diagonal, a sign of high accuracy rates (homeland 89.22%, heritage 82.25%).
However, their error patterns were quite different: in Figure 5.24a (homeland) there was a very weak T3 bias, as indicated by a lightly shaded T3 row, whereas in Figure 5.24b (heritage) T1 was sometimes misidentified as other tones, as indicated by a lightly shaded T1 column.

Figure 5.24: Confusion patterns of homeland and heritage speakers for Type 4 stimuli

The weak T3 bias in the homeland group could be attributed to a few subjects who might have interpreted the instructions differently from the majority of participants. These subjects were also the outliers who scored below 70% in Figure 5.3. Recall that Type 4 was a separate block on its own, and the question asked of the participants was “What is the last word of the sentence?” It was up to the participants to decide whether they should make use of semantic cues, or ignore them and use acoustic cues only. Although the majority of homeland speakers used semantic cues for tone identification, a few participants used acoustic cues more often. For example, in Figure 5.25, Subject #320 often responded with T3 (mid level). This is why this figure looks more similar to Type 2 (Figure 5.21a) even though it belongs to a different stimulus type. Interestingly (but not surprisingly), no one from the heritage group did the same in a consistent manner. A possible explanation is that if heritage speakers attended to semantic cues more often, it might not occur to them that there was another way to interpret the instructions.

Figure 5.25: Confusion patterns of Subject #320 for Type 4 stimuli

5.3.4 Interim summary

In this interim summary, the second research question of this dissertation is revisited: Do homeland and heritage speakers exhibit similar confusion patterns with respect to lexical tone perception? Overall, results of the Mantel test suggest that homeland and heritage speakers had fairly similar confusion matrices.
Although the strength of correlation varied depending on the stimulus type, all r values fell within the range between “moderately correlated” and “strongly correlated”.

Homeland and heritage speakers showed confusion for overlapping subsets of lexical tones. First, they both showed T2-T5 (high rising and low rising) confusion. As discussed in Section 3.3.2, Cantonese in Hong Kong has been undergoing a T2-T5 merger in both production and perception. Results of the current study confirmed that heritage Cantonese in Canada is following a similar sound change trend in perception. In particular, T5 was heard as T2 more often than T2 was heard as T5. This direction of confusion suggests that T5 is merging into T2.

The second pattern shared by both groups was T3-T6 (mid level and low level) confusion. However, unlike T2-T5, the direction of confusion for T3-T6 was inconsistent across stimulus types. For example, for both groups, T6 was heard as T3 more often than the other way round for Type 5B (normal, incongruous sentences), but T3 was heard as T6 more often than the other way round for Type 3 (words with no segments).

Although the types of confusion were similar, the frequency of confusion differed between the two groups. For every confused tone pair, heritage speakers had a higher error percentage than homeland speakers, as shown in the matrices for Types 1, 5A, 5B, 6A, and 6B. This suggests that both groups perceived the same subset of tonal categories as similar to each other, but these categories were perceived as even more similar by heritage speakers.

Another difference between the two groups was that T4-T6 (low falling and low level) confusion was unique to heritage speakers. Recall from Chapter 3 that Cantonese-naïve English speakers in Francis et al. (2008) also exhibited T4-T6 confusion, possibly due to influence from English intonational categories.
Since English intonation has utterance-final “high” or “low” boundary tones (Liberman, 1975; Pierrehumbert, 1980), all low tones in Cantonese (T4 low falling, T5 low rising, T6 low level) may be perceptually assimilated into the “low” category. In terms of the discrimination of T4 and T6, heritage Cantonese speakers in the current study were similar to Cantonese-naïve English speakers. However, they did not share other tone discrimination patterns. In the current study, heritage Cantonese speakers did not show much T5-T6 confusion, which was common among Cantonese-naïve English speakers in Francis et al. (2008) and Qin & Mok (2011).

To conclude, the hypothesis that homeland and heritage speakers exhibit different confusion patterns with respect to lexical tone perception holds only from a quantitative perspective, not from a qualitative one.

5.4 Response to Research Question 3: Use of acoustic and semantic cues

This section addresses the third question, regarding the type of information used in word identification. The null and alternative hypotheses are listed as follows:

Research Question 3:
Do homeland and heritage speakers make use of the same type of information when identifying a word from a tonally contrastive set? In particular, are acoustic and semantic information equally useful?

H0: There is no difference between homeland and heritage speakers in terms of what information they use in tone identification.
H1: Homeland and heritage speakers use different information in tone identification.

Previously, in Section 3.6, it was hypothesized that heritage speakers would rely on semantic information more often than homeland speakers. Therefore, it was anticipated that H0 would be rejected in favour of H1. To test this hypothesis, the stimuli must contain both acoustic information and semantic context, so that both types of cues are available for participants to choose from.
Therefore, monosyllabic stimuli (Types 1, 2, and 3), which had no semantic context, are not useful for testing this hypothesis.

Congruous sentences (Types 5A and 6A) are not useful for testing the hypothesis either, as acoustic and semantic information were not in conflict. Consider this situation: the stimulus was sap6 ji6 dim2 zung1 hou2 soeng5 cong4 fan3 “at twelve o’clock you’d better go to bed and sleep”, and participants were asked whether the last word was fan1 “share”, fan2 “powder”, fan3 “sleep”, or fan4 “tomb”. Participant A paid attention to the tone and picked a word with Tone 3, which is fan3 “sleep”, the correct answer. Participant B, on the other hand, considered the semantic content and picked a word that would make sense in this sentence, which was also fan3 “sleep”, the correct answer. Even though the two participants used different cues, they ended up with the same answer. In other words, it would not be possible to conclude which particular cue was used, or whether both cues were used simultaneously.

Incongruous sentences (Types 5B and 6B) are the most useful for testing the hypothesis, as acoustic and semantic information were in conflict and participants would have to use only one of them. As mentioned in Section 4.4.2, participants were told that some sentences might not make sense, and all they had to do was identify the last word that they heard. In other words, they were instructed to ignore semantic cues and focus on acoustic cues. The explicit instructions ensured that participants’ different responses would reflect different word-identification strategies rather than different ways of interpreting the instructions. Consider this incongruous sentence: the stimulus was sap6 ji6 dim2 zung1 hou2 soeng5 cong4 fan2 “at twelve o’clock you’d better go to bed and powder”. Participants A, B, and C responded with fan2 “powder”, fan3 “sleep”, and fan4 “tomb” respectively. Only Participant A was correct.
From their answers, it can be concluded that Participant A made the decision by using acoustic cues (especially tonal information, since the choices were tonally contrastive), Participant B relied on semantic information and picked a word that would make sense (despite being instructed to focus on acoustic cues), and Participant C used neither type of cue. Therefore, data from Types 5B and 6B were able to tease apart how various cues were used in tone identification. The rest of this section will focus on these two stimulus types.

All responses to Type 5B (normal, incongruous sentences) and Type 6B (the last word of the incongruous sentence has no segments) stimuli were grouped into three categories representing the kind of cues being used: “acoustic”, “semantic”, and “neither”. Counts and percentages are presented in Table 5.27, and the same data are visualized in Figure 5.26. Pearson’s chi-squared test of homogeneity, applied to the 2 × 3 table of group by cue, was performed to determine whether the distributions of “acoustic”, “semantic”, and “neither” were similar between the two populations, for each of the two stimulus types separately. Results show that homeland and heritage speakers had significantly different distributions of cues used for Type 5B, χ²(2) = 749.85, p < .001, as well as for Type 6B, χ²(2) = 1268.94, p < .001.

Table 5.27: Counts and percentages of different cues used by homeland and heritage speakers for Type 5B (normal, incongruous sentences) and Type 6B stimuli (the last word of the incongruous sentence has no segments)

Cue | Type 5B homeland | Type 5B heritage | Type 6B homeland | Type 6B heritage
acoustic | 5333 (87.14%) | 4058 (66.31%) | 5224 (85.36%) | 3471 (56.72%)
semantic | 403 (6.58%) | 1175 (19.20%) | 402 (6.57%) | 1596 (26.08%)
neither | 384 (6.27%) | 887 (14.49%) | 494 (8.07%) | 1053 (17.21%)
total | 6120 (100%) | 6120 (100%) | 6120 (100%) | 6120 (100%)

Figure 5.26: Comparison of cues used by homeland and heritage speakers for Type 5B and Type 6B stimuli
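The χ² values reported above can be checked from the counts in Table 5.27. The minimal sketch below (Python, standard library only; the function names are illustrative assumptions, not part of the original analysis) treats each stimulus type as a 2 × 3 contingency table of group by cue and computes Pearson’s statistic, with expected counts taken from the row and column totals. The closed-form p value exp(−χ²/2) is used only because df = 2 here.

```python
import math
from typing import Sequence, Tuple


def chi2_homogeneity(table: Sequence[Sequence[int]]) -> Tuple[float, int]:
    """Pearson's chi-squared statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under the hypothesis that both groups share one distribution
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df


def p_value_df2(chi2: float) -> float:
    """Upper-tail p value; the closed form exp(-x/2) is valid only for df = 2."""
    return math.exp(-chi2 / 2)


# Counts from Table 5.27: rows = (homeland, heritage), columns = (acoustic, semantic, neither).
TYPE_5B = [[5333, 403, 384], [4058, 1175, 887]]
TYPE_6B = [[5224, 402, 494], [3471, 1596, 1053]]

if __name__ == "__main__":
    for name, table in (("Type 5B", TYPE_5B), ("Type 6B", TYPE_6B)):
        chi2, df = chi2_homogeneity(table)
        print(f"{name}: chi2({df}) = {chi2:.2f}, p = {p_value_df2(chi2):.3g}")
```

Run as a script, this reproduces χ²(2) = 749.85 for Type 5B and about 1268.95 for Type 6B, which agrees with the reported 1268.94 to within rounding. Because both groups contributed 6120 responses, each group contributes exactly half of the statistic.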
Two-sample binomial tests confirmed that the two groups’ ratios for each type of cue and for each stimulus type were significantly different: p < .001 for “acoustic”, p < .001 for “semantic”, and p < .001 for “neither”.

In Figure 5.26, homeland and heritage speakers seem to have similar error patterns. In the “homeland” bars for both Type 5B and Type 6B, the ratio of “semantic” (in black) to “neither” (in light grey) is around 50:50. In the “heritage” bar for Type 5B, the ratio of “semantic” to “neither” also looks close to 50:50. This raises the question of whether the two groups in fact had similar error patterns: when they did make mistakes, 50% of the time the error was due to the use of semantic cues, and 50% of the time they made random errors, which means there was an equal chance of both. Would it be the case that these results were evidence for similarities rather than differences between the two groups?

To answer this question, the two populations’ ratios of “semantic” to “neither” (i.e. black to light grey) were compared. Table 5.27 and Table 5.28 were based on the same data, but the “acoustic” row was removed from the latter and the percentages were redistributed. For Type 5B, homeland speakers used semantic cues in 403 out of 787 erroneous responses, while heritage speakers used semantic cues in 1175 out of 2062 errors. A two-sample binomial test confirmed that 403/787 is significantly smaller than 1175/2062, p = .001.
The test was also performed the other way round and confirmed that 1175/2062 is significantly greater than 403/787, p < .001. As for Type 6B, the two populations’ ratios were significantly different as well, p < .001 (both ways).

To conclude, although heritage speakers were able to use acoustic cues to achieve an accuracy rate above chance when incongruous sentences were presented, they used semantic cues relatively more often than homeland speakers did.

Table 5.28: Counts and percentages of two types of incorrect responses for Type 5B (normal, incongruous sentences) and Type 6B stimuli (the last word of the incongruous sentence has no segments)

Cue | Type 5B homeland | Type 5B heritage | Type 6B homeland | Type 6B heritage
semantic | 403 (51.21%) | 1175 (56.98%) | 402 (44.87%) | 1596 (60.25%)
neither | 384 (48.79%) | 887 (43.02%) | 494 (55.13%) | 1053 (39.75%)
total | 787 (100%) | 2062 (100%) | 896 (100%) | 2649 (100%)

Chapter 6
Discussion and conclusion

This final chapter summarizes major findings, discusses their implications, and concludes the dissertation.

6.1 Summary of research findings

Major findings of this dissertation are summarized below as answers to the three research questions introduced in Chapter 1.

Research Question 1:
Do homeland and heritage speakers behave differently in terms of their ability to identify tonally contrastive words?

Answer to Research Question 1:
On average, homeland speakers outperformed heritage speakers in the word-identification task, but the magnitude of their difference was not constant across stimulus types. The accuracy gap between the two populations depended on the variables being manipulated in the stimuli. First, when tonal information was the only type of information available (as in Type 3, monosyllabic words with no segments), the accuracy gap between the two groups was the largest (80.54% for homeland and 49.41% for heritage; chance level was 25%).
This suggests that homeland speakers had a significantly greater ability to distinguish tonally contrastive words by relying solely on tonal information.

Second, when semantic context was available along with tonal information (as in Types 5A, 5B, 6A, and 6B, which were all sentences), the accuracy gap was smaller, although the difference between the two groups was still significant. This suggests that semantic context helped heritage speakers distinguish tonally contrastive words, but this additional information was not enough to close the accuracy gap.

Lastly, when tonal information was not available at all, the two populations’ accuracy rates were on par: they performed equally poorly (as in Type 2, monosyllabic words with no tone) or almost equally well (as in Type 4, sentences with no tone). This was crucial to show that the accuracy gap observed in other stimulus types was not merely a reflection of heritage speakers’ lower Cantonese proficiency, but was due to the two groups’ different abilities to make use of tonal information in word identification.

Research Question 2:
Do homeland and heritage speakers exhibit similar confusion patterns with respect to lexical tone perception?

Answer to Research Question 2:
In general, the confusion patterns of homeland and heritage speakers were similar. Although the levels of similarity varied across stimulus types, they all fell within the range between “moderately correlated” and “strongly correlated”.

Error patterns of both groups exhibited trends of tone merger for the two rising tones, namely T2 [25] (high rising) and T5 [23] (low rising). For both homeland and heritage speakers, T5 was mistaken for T2 more often than T2 was mistaken for T5, which can be a sign that T5 is merging into the T2 category in perception.
Although both groups exhibited confusion for T2 and T5, heritage speakers had a higher error percentage for this tone pair than homeland speakers, which suggests that heritage speakers are ahead of homeland speakers in this ongoing sound change.

Apart from confusion between the two rising tones, confusion between two level tones, namely T3 [33] (mid level) and T6 [22] (low level), was also observed in both groups. Unlike the asymmetrical pattern of T2-T5 confusion, neither direction of T3-T6 confusion was consistently more frequent than the other across the board. Although both groups showed confusion for this tone pair, heritage speakers had a higher error percentage than homeland speakers.

Lastly, confusion between T4 [21] (low falling) and T6 [22] (low level) was found among heritage speakers, but it was very rare among homeland speakers. This is a qualitative difference between the two groups.

To sum up, there were more similarities than differences between the two groups’ confusion patterns. Their differences were more quantitative than qualitative, in that both groups had confusion for overlapping subsets of categories, but the heritage group showed a higher degree of confusion than the homeland group.

Research Question 3:
Do homeland and heritage speakers make use of the same type of information when identifying a word from a tonally contrastive set? In particular, are acoustic and semantic information equally useful?

Answer to Research Question 3:
Overall, homeland speakers were better at using acoustic information than heritage speakers, while heritage speakers had a relatively higher tendency to rely on semantic information than homeland speakers. This was most clearly shown when the target word was semantically incongruous with the carrier phrase (e.g. sap6 ji6 dim2 zung1 hou2 soeng5 cong4 fan2 “at twelve o’clock you’d better go to bed and powder”).
Homeland speakers were significantly better at actively rejecting the word that would make sense in that sentence (e.g. fan3 “sleep”) and selecting the actual word presented (e.g. fan2 “powder”) by attending to tonal information. The fact that heritage speakers achieved accuracy rates well above chance (66.31% for Type 5B and 56.72% for Type 6B; chance level was 25%) suggests that they were not completely unable to attend to acoustic information. Compared with homeland speakers, they made relatively more errors due to relying on semantic information (e.g. they chose fan3 “sleep” as the answer). This is evidence that the two groups have different listening strategies during word identification, even when they were told explicitly to focus on what they heard instead of what made sense.

6.2 Discussion and implications

This section discusses how findings of the present study can address unanswered questions in the existing literature identified in Chapter 2 and Chapter 3.

6.2.1 Sound change trends in heritage Cantonese

In Chapter 3 it was pointed out that, as participants from Canada were at an average age of 20.78 years at the time of the experiment, their parents very likely had migrated from Hong Kong to Canada before the 2000s, when T2-T5 merger started to be documented in the literature (Bauer et al., 2003; Kej et al., 2002). Their migration to a geographically remote country across the Pacific Ocean conveniently allowed a natural experiment of sound change: will a merger-free variety of Cantonese be passed on to their children who grew up in Canada?

Results in Section 5.3 suggest that homeland and heritage speakers have similar tone merger trends in perception; moreover, heritage speakers’ higher error percentage is a sign that they are ahead of homeland speakers in these trends. There are at least four possible reasons for this.
First, although T2-T5 confusion started to catch the attention of researchers in the 2000s, it is possible that the trend had actually been under way for some time before it was documented in the literature. If the parents had merged the tones in production before migrating to Canada, it would follow that their children acquired the same variety of Cantonese with signs of sound change.

Another possible reason is that although Hong Kong and Canada are geographically separated, socio-culturally there are strong ties between the two regions. Heritage speakers can easily access Cantonese popular culture such as film and music from Hong Kong through the internet, and they may also visit their family and relatives in Hong Kong regularly. As a result, even if their parents do not merge the tones, they may have acquired the sound change from other speakers residing in Hong Kong.

The two reasons mentioned so far, however, are not adequate to explain all observations. If heritage speakers merely acquired the sound change from homeland speakers, the two groups would be expected to show no quantitative difference in accuracy for the tone pairs in question. Why did heritage speakers have a higher level of confusion?

The third possible explanation is that tone merger was induced by language-internal factors rather than idiosyncratic factors unique to speakers in Hong Kong. As such, sound change would take place regardless of external factors like the speakers’ geographic location. These language-internal factors may include inherent characteristics of the tones in question, such as acoustic similarity, or information-theoretic properties of the tones, such as functional load. Tsui (2012) proposes that it takes both factors to motivate tone merger. According to his analysis, the pair T2-T5 has the smallest acoustic distance and carries the second lowest functional load among all tone pairs. The combination of small acoustic distance and low functional load makes T2-T5 more susceptible to tone merger.
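Functional load, mentioned above as an information-theoretic property of a tone pair, can be illustrated with a short sketch. Tsui’s (2012) exact measure is not described in this section; the code below assumes one common operationalization, the proportion of word-level entropy lost when two tone categories are collapsed, computed over a frequency-weighted lexicon written in Jyutping-style romanization. The function names and the input format are hypothetical.

```python
import math
from collections import Counter
from typing import Dict


def entropy(freqs: Dict[str, float]) -> float:
    """Shannon entropy (bits) of a word-frequency distribution."""
    total = sum(freqs.values())
    return -sum((f / total) * math.log2(f / total) for f in freqs.values() if f > 0)


def merge_tones(word: str, keep: str, drop: str) -> str:
    """Rewrite tone `drop` as tone `keep` in a Jyutping-style word.

    Assumes space-separated syllables that each end in one tone digit,
    e.g. "fan2" or "sap6 ji6".
    """
    return " ".join(s[:-1] + keep if s.endswith(drop) else s for s in word.split())


def functional_load(lexicon: Dict[str, float], keep: str, drop: str) -> float:
    """Proportion of word-level entropy lost when tones `keep` and `drop` merge."""
    merged: Counter = Counter()
    for word, freq in lexicon.items():
        # Words that differ only in the merged tones collapse into one homophone.
        merged[merge_tones(word, keep, drop)] += freq
    h_before = entropy(lexicon)
    return (h_before - entropy(merged)) / h_before
```

Under this measure, a tone pair has a low functional load when merging it creates few new homophones among frequent words. No values from Tsui (2012) are reproduced here; applying the sketch to a real Cantonese lexicon would require an actual word-frequency list.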
If this is the case, two geographically separated speaker communities could ultimately end up on the same path towards merger, and it is possible for either community to go faster than the other. However, this still does not explain why heritage speakers had moved further, other than as a result of chance.

To account for their faster sound change trends, the reason must be something that lies within the heritage speaker population, which led to their lower sensitivity to the relevant tonal contrasts for word identification. Could this population-internal factor be the incomplete acquisition of tonal perception, or language attrition resulting from cross-language effects from a dominant language, namely English? This question will be addressed in the next subsection through a comparison among heritage speakers of the current study, Cantonese-learning children discussed in Section 3.2, and Cantonese-naïve English speakers discussed in Section 3.4.

6.2.2 Tonal perception and heritage bilingualism

Several important questions raised in the literature review chapters should be revisited. Previous works such as Benmamoun et al. (2013b), Montrul (2013), and Polinsky & Kagan (2007) comment that phonetics and phonology are the most stable domains in heritage speakers’ grammar compared with morphology, syntax, and semantics. While this claim is supported by studies like Tees & Werker (1984), there are also cases like Celata & Cancila (2010) where heritage speakers have lost sensitivity to certain sound contrasts of their non-dominant L1 that are not phonemic in their dominant L2. What is common among these studies is that the sound contrasts in question are mostly consonants or vowels. What happens if one’s non-dominant L1 has the contrastive dimension of lexical tone, but the dominant L2 lacks such a dimension altogether?
Can an intonational suprasegmental phonology (as in English) affect a tonal suprasegmental phonology (as in Cantonese)?

The statement that heritage speakers have lost sensitivity to certain sound contrasts presupposes that they once possessed such sensitivity. What if these contrasts were not acquired in the first place, and so they cannot really be lost? In Ciocca & Lui (2003) and Wong & Leung (2018), six-year-olds residing in Hong Kong had not reached adult accuracy with respect to tone identification. For children residing in Canada, if the onset of schooling is around five years of age, the switch of dominant language may happen before they fully acquire all tonal contrasts. Extensive L2 input and reduced L1 exposure together are a common cause of incomplete acquisition of L1 in immigrant communities (Levine, 2015; Montrul, 2008). If the tonal grammar of adult heritage speakers is a fossilized form of what they had acquired up to that point in early childhood, the tone pairs with the lowest accuracy for heritage speakers in the current study should match those in previous studies on young children. As discussed in Section 5.3, heritage speakers showed the most confusion for T2-T5 and T3-T6. In Ciocca & Lui (2003), these were the last two tonal contrasts acquired in perception by Cantonese-learning children in Hong Kong. From the matching results it can be concluded that incomplete acquisition is a possible reason for heritage speakers’ lower sensitivity to T2-T5 and T3-T6, which may also explain why they appear to have moved further than homeland speakers in sound change trends. Further research on the longitudinal language development of Cantonese-learning children in Canada is necessary to provide empirical evidence to support this hypothesis.

To tap into potential cross-language effects from English, the linguistic behaviour of heritage speakers in the present study should be compared with that of Cantonese-naïve English speakers in Francis et al. (2008) and Qin & Mok (2013).
In their studies, English speakers found tones with a similar average pitch height (T4-T5, T4-T6, T5-T6) harder to distinguish than tones with the same direction of pitch change (T2-T5 and T3-T6). In both studies’ multidimensional scaling analyses, T5 [23] and T6 [22] are very close in the perceptual space of English speakers. The authors of both studies discuss potential influences from English intonation, in which “high” or “low” boundary tones can be found at the edge of a phrase or sentence. T4 [21], T5 [23], and T6 [22] in Cantonese are therefore perceptually assimilated to the “low” category in English prosody. This is quite different from the error patterns in the current study. Although heritage speakers did sometimes exhibit T4-T6 confusion, they hardly made errors for T4-T5 or T5-T6. In other words, there was little evidence that heritage speakers’ tonal perception was affected by intonational categories in English.

Although incomplete acquisition of tonal contrasts may better account for heritage speakers’ specific confusion patterns, it still cannot explain their overall lower ability to distinguish tonally contrastive words by relying solely on tonal information in the acoustic signal, as shown in the large accuracy gap for Type 3 stimuli (monosyllabic words with no segments) compared with other stimulus types. If incomplete acquisition were the sole difference between homeland and heritage speakers, T2-T5 and T3-T6 would have been the most confused tone pairs across stimulus types. In other words, the cells [2,5], [5,2], [3,6], and [6,3] would have been the darkest non-diagonal cells in heritage speakers’ confusion matrices across the board. However, in the actual results for Type 3, the confusion matrix of heritage speakers had relatively evenly shaded non-diagonal cells (see Figure 5.19), which suggests a global decline of accuracy regardless of the tone of the target word.
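The reasoning about which cells “would have been the darkest” can be expressed as a small check on a confusion matrix. The sketch below is a hypothetical illustration, not the dissertation’s analysis: it row-normalizes a 6×6 count matrix (rows = target tone, columns = response tone), returns the k most frequent error cells as 1-based tone pairs, and summarizes how evenly errors are spread using the coefficient of variation of the off-diagonal proportions. The evenness measure is an assumption; a value near 0 corresponds to the “relatively evenly shaded” pattern described for Type 3.

```python
import math
from typing import List, Tuple

# Rows = target tone T1..T6, columns = response tone T1..T6 (raw counts).
Matrix = List[List[float]]


def error_cells(counts: Matrix) -> List[Tuple[float, Tuple[int, int]]]:
    """Row-normalized error proportions paired with 1-based (target, response) tones."""
    cells = []
    for i, row in enumerate(counts):
        total = sum(row)
        for j, c in enumerate(row):
            if i != j:
                cells.append((c / total if total else 0.0, (i + 1, j + 1)))
    return cells


def darkest_off_diagonal(counts: Matrix, k: int = 4) -> List[Tuple[int, int]]:
    """The k most frequent confusions, i.e. the darkest non-diagonal cells."""
    ranked = sorted(error_cells(counts), key=lambda cell: cell[0], reverse=True)
    return [pair for _, pair in ranked[:k]]


def off_diagonal_cv(counts: Matrix) -> float:
    """Coefficient of variation of error proportions (near 0 = evenly shaded errors)."""
    values = [p for p, _ in error_cells(counts)]
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return sd / mean if mean else 0.0
```

Under an incomplete-acquisition account alone, calling darkest_off_diagonal on a (hypothetical) heritage-speaker count matrix would be expected to return the pairs (2, 5), (5, 2), (3, 6), and (6, 3) for every stimulus type, whereas an evenly shaded matrix would show a low off_diagonal_cv and no such concentration.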
At least two speculations can be made with regard to heritage speakers’ overall lower ability to rely on tonal information alone in the word-identification experiment.

The first speculation pertains to the different quantities of linguistic input received by homeland and heritage speakers during their acquisition of Cantonese. In the current study, to ensure that both tested populations were exposed to the same baseline variety of Cantonese, the screening process excluded heritage speakers whose parents were also heritage speakers, and only included individuals who were born to parents originally from Hong Kong. This was an attempt to control the quality, but not the quantity, of the input. Homeland speakers grew up in an environment where Cantonese is used in a variety of domains, including education and public communication. This environment allowed homeland speakers to be exposed to Cantonese spoken by a large number of talkers with both familiar and unfamiliar voices. However, heritage speakers grew up in an English-dominant environment, where the use of Cantonese is mostly restricted to the family setting. The Cantonese input that they had received came mainly from a limited number of family members with familiar voices. This difference may affect the two groups’ ability to perceptually normalize Cantonese tones produced by an unfamiliar voice in the experiment. In a study of talker familiarity effects on speech intelligibility, Nygaard & Pisoni (1998) found that sensitivity to talker-specific indexical information in a voice facilitates the extraction of meaningful linguistic units from a speech signal. Although there was only one talker in the current study and her voice was unfamiliar to all participants, homeland speakers had more prior experience of extracting tonal categories from speech produced with unfamiliar voices.
In other words, their exposure to Cantonese spoken by a large number of talkers may have allowed them to perceptually adapt to tones produced with a novel voice more effectively. As a result, homeland speakers may be less affected by talker familiarity effects. Heritage speakers, on the other hand, are used to processing Cantonese tones produced by a smaller number of talkers. Therefore, their performance in the word-identification task may be more affected by talker familiarity effects, especially when tone was the only available type of information in the speech signal. Further research on heritage speakers’ perceptual flexibility is necessary to confirm this speculation.

The second speculation concerns cross-language effects from English, which lacks tone as a contrastive dimension. Although specific confusion patterns of heritage speakers showed little evidence of effects from English intonational phonology, English as a non-tonal language may still have a global impact on heritage speakers’ ability to make use of tonal information in Cantonese word identification. After the onset of schooling, heritage speakers received regular and extensive English input, and their Cantonese input was relatively reduced. To optimize the perceptual system for a frequently used non-tonal dominant language, reorganization of cognitive resources and adjustment of listening strategies may occur. In Bruggeman (2016), Dutch emigrants in Australia ignored stress cues that are useful for processing Dutch but not for processing English. In Celata & Cancila (2010), English-dominant heritage speakers of Lucchese became insensitive to the singleton-geminate consonant distinction in Lucchese, because this cue is not useful for distinguishing English words.
It is possible that heritage speakers of Cantonese in the current study have experienced a similar switch of perception strategy, from one that favours the processing of Cantonese lexical tones (as in Vancouver-based Cantonese-learning infants in Yeung et al., 2013) to one that can improve the efficiency of processing English. Further research on tonal perception by school-age Cantonese-learning children or teenagers in Canada will help to confirm whether there is a switch of strategy and, if so, when it happens.

If a re-allocation of cognitive resources in favour of a dominant L2 has occurred, this would have interesting implications for the permanence vs. contingency debate in the language acquisition literature. It has been established that perceptual narrowing happens early in life during a critical period, when infants and children show a decline of sensitivity to non-native phonetic contrasts (Maurer & Werker, 2014; Tees & Werker, 1984; Werker & Hensch, 2015) and exhibit signs of a neural commitment to language-specific auditory patterns (Kuhl et al., 2006; Zhang et al., 2005). As a result, the way individuals listen to language in general is tailored to their L1 (Cutler, 2012). There is, however, not yet a consensus on the reversibility of this neural commitment to L1. On the one hand, the permanence hypothesis posits that resources dedicated to a specific language cannot be re-allocated, and therefore speech perception strategies established during infancy or early childhood will be permanent throughout life (Benmamoun et al., 2013b). This hypothesis is supported by studies on phoneme discrimination by international adoptees, such as Oh et al. (2010) and Pierce, Klein, Chen, Delcenserie & Genesee (2014). On the other hand, the contingency hypothesis posits that the persistence of linguistic knowledge depends on continuous input; neural commitments can be altered in favour of another language when exposure to the L1 decreases or stops (Benmamoun et al., 2013b).
This hypothesis is supported by the aforementioned studies on immigrant communities (Bruggeman, 2016; Celata & Cancila, 2010).

Empirical facts from the current study suggest that the two hypotheses are not necessarily mutually exclusive. Indeed, heritage Cantonese speakers have maintained a lexical tone system in their perception despite being English-dominant. In this regard, it may be true that the phonological knowledge of tone as a dimension for lexical contrast persists even after reduced exposure to Cantonese. However, this study has also shown that heritage speakers were less adept at using tonal cues than homeland speakers. It could be that the process of mapping listening cues to phonological categories in the lexical tone system was altered for efficient processing of a non-tonal dominant language. The change may not result in complete abandonment of Cantonese-specific listening strategies, but it may favour English and give Cantonese a lower priority. Further studies on different aspects of heritage speakers’ perceptual grammar may provide a more nuanced view of the permanence and contingency hypotheses.

6.2.3 Language pedagogy for heritage learners of Cantonese

This study also has implications for Cantonese language teaching, especially in the context of heritage language maintenance in North America, where a growing number of universities offer Cantonese courses, such as the University of British Columbia (Pai, 2016), New York University (NYU College of Arts and Science, 2018), the Ohio State University (OSU East Asian Studies Center, 2014), and Stanford University (Stanford Language Center, 2018). Existing works on Cantonese pedagogy mostly focus on teaching Cantonese as a foreign language (Lee, 2004; Pai, 2016).
As the present study has shown, heritage speakers do not have the same linguistic behaviour as homeland Cantonese speakers or Cantonese-naïve English speakers, and as a result they are expected to have different learning needs in the language classroom. Below are a few take-home messages for educators and curriculum developers.

The case of heritage speakers challenges the traditional practice of using a monolingual native speaker norm as the standard for evaluating the linguistic competence of any language user. In an increasingly bilingual or multilingual world, it is imperative to recognize that bilingual language users, as Grosjean (1989) puts it, are never two monolinguals living in one person. Therefore, it is unreasonable to expect or require that bilinguals have the exact same linguistic behaviour as monolingual native speakers. In Samuel & Larraza (2015), L1 Basque speakers who are fluent in Spanish failed to reject non-words with deviant pronunciations half of the time in a picture-name matching task in Basque, even though they did very well in an AXB discrimination task. The authors interpret this as perceptual adaptation to their linguistic environment, where Basque speakers often interact with people who speak Spanish-accented Basque. In this dissertation, heritage Cantonese speakers in Canada were good at using semantic context as a listening cue but less adept at using acoustic information. A comment by Lynch (2003) offers a good explanation: the nature of L1 acquisition by heritage speakers is “dialogic, discursive and absolutely contextual from the beginning” (p. 11). Hence, it is no surprise that they are accustomed to using top-down processing strategies to extract the message of a Cantonese utterance in a discursive context. In both Samuel & Larraza (2015) and the present study, low accuracy in one listening task does not always imply overall perceptual “impairment”.
Instead, bilinguals with a particular language background may have different but not necessarily poorer language processing strategies compared with prototypical native speakers.

As for curriculum design, a heritage learner-oriented approach should be meaning-focussed and communicative. Cummins (1979) made a distinction between two pedagogical styles: cognitive/academic language proficiency, and interpersonal communication skills. The former is a more formal, grammar-based approach that emphasizes drills on conscious knowledge, such as pronunciation and rules. The latter emphasizes what learners can do using their language in a social setting in the real world. In this dissertation, heritage speakers of Cantonese clearly have tone as a contrastive dimension in their perceptual grammar, which means tone is already part of their unconscious linguistic knowledge. As such, the cognitive/academic approach may not be suitable for heritage speakers, especially when there are second language learners in the same classroom. As Krashen (2000) describes, non-heritage language learners often outperform heritage learners when linguistic competence is assessed as declarative knowledge. This could be psychologically devastating and traumatizing for heritage speakers, who may not understand that they do possess procedural knowledge of the language. Therefore, the interpersonal approach that focuses on communicative content is more suitable for heritage learners.

This communicative style of pedagogy can be supplemented with listening practice targeting commonly confused tone pairs. Although heritage bilinguals should not be expected or required to behave exactly like prototypical native speakers, it is undeniable that tonal confusion may obstruct everyday communication.
To strengthen heritage learners’ tone discrimination ability, listening materials with multiple unfamiliar voices may help learners get used to extracting auditory cues from novel voices and facilitate intertalker normalization in tonal processing. In general, it is important to offer ample opportunities for re-exposure to the heritage language, which may serve as a triggering experience for linguistic knowledge that has not been accessed for an extended period of time.

6.3 Conclusions

To conclude, it is certain that heritage speakers’ perceptual grammar of Cantonese has tone as a contrastive dimension. In the word-identification task, heritage speakers’ accuracy was significantly above chance across stimulus types (except for Type 2, words with no tone, which was intended to be an impossible task). This provides evidence that they were able to maintain the contrastive dimension of tone after becoming English-dominant.

Heritage speakers’ perceptual grammar of Cantonese is, however, not identical to that of homeland speakers. On average, homeland speakers achieved a higher accuracy than heritage speakers, but the difference between the two groups varied among stimulus types, depending on what type of information was available. Although tonal information alone was not very useful for heritage speakers, semantic context did help to decrease the accuracy gap. Two speculations were made with regard to heritage speakers’ lower ability to use tonal cues, namely the challenge of perceiving pitch attributes from an unfamiliar voice due to limited Cantonese input, and the change of listening strategy in favour of English, a non-tonal dominant language.

When it comes to specific confusion patterns, heritage speakers were more similar to homeland speakers than to Cantonese-naïve English speakers. Both homeland and heritage speakers found tones with the same direction of pitch change confusing, while Cantonese-naïve English speakers tended to find tones with similar overall pitch heights confusing.
Heritage speakers’ higher level of confusion for tone pairs with the same direction of pitch change indicates that they are ahead of homeland speakers in sound change trends. Such a difference may be owing to incomplete acquisition of tonal perception by heritage speakers, though further research is required to confirm this.

All in all, findings of the current study agree with previous research, in that heritage speakers have unique linguistic behaviour that is different from that of homeland speakers. In the context of heritage language maintenance through language education, the uniqueness of this population implies that a heritage learner-specific curriculum will be more effective than a typical curriculum for teaching Cantonese as a foreign language.

References

Ali, D. (2015, January 14). Language proficiency on LinkedIn. Retrieved from http://www.linkedin.com/pulse/language-proficiency-linkedin-duaa-ali

Amengual, M. (2017). Type of early bilingualism and its effect on the acoustic realization of allophonic variants: Early sequential and simultaneous bilinguals. International Journal of Bilingualism, 1–17. doi:10.1177/1367006917741364

Amengual Watson, M. (2013). An experimental approach to phonetic transfer in the production and perception of early Spanish-Catalan bilinguals. PhD thesis, The University of Texas at Austin.

Antoniou, M., Best, C. T., Tyler, M. D., & Kroos, C. (2010). Language context elicits native-like stop voicing in early bilinguals’ productions in both L1 and L2. Journal of Phonetics, 38(4), 640–653. doi:10.1016/j.wocn.2010.09.005

Antoniou, M., Best, C. T., Tyler, M. D., & Kroos, C. (2011). Inter-language interference in VOT production by L2-dominant bilinguals: Asymmetries in phonetic code-switching. Journal of Phonetics, 39(4), 558–570. doi:10.1016/j.wocn.2011.03.001

Barnes, T. & Hutton, T. (2016). Dynamics of economic change in Metro Vancouver: Networked economies and globalizing urban regions. Retrieved from http://www.mvprosperity.org/Documents/DynamicsofEconomicChangeinMetroVancouver.pdf

Baron, J. & Strawson, C. (1976). Use of orthographic and word-specific knowledge in reading words aloud. Journal of Experimental Psychology: Human Perception and Performance, 2(3), 386. doi:10.1037/0096-1523.2.3.386

Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278. doi:10.1016/j.jml.2012.11.001

Barrie, M. (2003). Contrast in Cantonese vowels. Toronto Working Papers in Linguistics, 20.

Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. doi:10.18637/jss.v067.i01

Bauer, R. S. (1985). The expanding syllabary of Hong Kong Cantonese. Cahiers de Linguistique Asie Orientale, 14(1), 99–111. doi:10.1163/19606028_014_01-05

Bauer, R. S. (2016). The Hong Kong Cantonese language: Current features and future prospects. Global Chinese, 2(2), 115–161. doi:10.1515/glochi-2016-0007

Bauer, R. S. & Benedict, P. K. (1997). Modern Cantonese phonology. Berlin, Germany: Walter de Gruyter.

Bauer, R. S., Cheung, K.-H., & Cheung, P.-M. (2003). Variation and merger of the rising tones in Hong Kong Cantonese. Language Variation and Change, 15(02), 211–225. doi:10.1017/S0954394503152039

Benmamoun, E., Montrul, S., & Polinsky, M. (2010). Prolegomena to heritage linguistics [white paper]. University of Illinois at Urbana-Champaign and Harvard University. Retrieved from www.nhlrc.ucla.edu/pdf/HL-whitepaper.pdf

Benmamoun, E., Montrul, S., & Polinsky, M. (2013a). Defining an “ideal” heritage speaker: Theoretical and methodological challenges. Reply to peer commentaries. Theoretical Linguistics, 39(3-4), 259–294. doi:10.1515/tl-2013-0018

Benmamoun, E., Montrul, S., & Polinsky, M. (2013b). Heritage languages and their speakers: Opportunities and challenges for linguistics. Theoretical Linguistics, 39(3-4), 129–181. doi:10.1515/tl-2013-0009

Birdsong, D., Gertken, L. M., & Amengual, M. (2012). Bilingual Language Profile: An easy-to-use instrument to assess bilingualism. COERLL, University of Texas at Austin.

Blicher, D. L., Diehl, R. L., & Cohen, L. B. (1990). Effects of syllable duration on the perception of the Mandarin Tone 2/Tone 3 distinction: Evidence of auditory enhancement. Journal of Phonetics.

Bloomfield, L. (1933). Language. London: George Allen & Unwin Ltd.

Boersma, P. (2002). Praat, a system for doing phonetics by computer. Glot International, 5(9/10), 341–345. Retrieved from http://hdl.handle.net/11245/1.200596

Bolton, K. (2011). Language policy and planning in Hong Kong: Colonial and post-colonial perspectives. Applied Linguistics Review, 2, 51–74. doi:10.1515/9783110239331.51

Boyle, J. (1997). Success and failure in learning Cantonese. Language Learning Journal, 16(1), 82–86. doi:10.1080/09571739785200341

Brinton, D., Kagan, O., & Bauckus, S. (2008). Heritage language education: A new field emerging. New York, NY: Routledge.

Bruggeman, L. (2016). Nativeness, dominance, and the flexibility of listening to spoken language. PhD thesis, Western Sydney University.

Burnham, D., Ciocca, V., Lauw, C., Lau, S., & Stokes, S. (2000). Perception of visual information for Cantonese tones. Proceedings of the 8th Australian International Conference on Speech Science and Technology, 86–91. Retrieved from http://www.assta.org/sst/SST-00/cache/SST-00-Chapter4-p2.pdf

Burnham, D., Lau, S., Tam, H., & Schoknecht, C. (2001). Visual discrimination of Cantonese tone by tonal but non-Cantonese speakers, and by non-tonal language speakers. AVSP 2001 International Conference on Auditory-Visual Speech Processing, 155–160. Retrieved from https://www.isca-speech.org/archive_open/archive_papers/avsp01/av01_155.pdf

Carreira, M. & Kagan, O. (2011). The results of the National Heritage Language Survey: Implications for teaching, curriculum design, and professional development. Foreign Language Annals, 44(1), 40–64. doi:10.1111/j.1944-9720.2010.01118.x

Casillas, J. (2015). Production and perception of the /i/-/ɪ/ vowel contrast: The case of L2-dominant early learners of English. Phonetica, 72(2-3), 182–205. doi:10.1159/000431101

Celata, C. & Cancila, J. (2010). Phonological attrition and the perception of geminate consonants in the Lucchese community of San Francisco (CA). International Journal of Bilingualism, 14(2), 185–209. doi:10.1177/1367006910363058

Chang, C. B. & Yao, Y. (2016). Toward an understanding of heritage prosody: Acoustic and perceptual properties of tone produced by heritage, native, and second language speakers of Mandarin. Heritage Language Journal, 13(2), 134–160.

Chang, C. B., Yao, Y., Haynes, E. F., & Rhodes, R. (2011). Production of phonetic and phonological contrast by heritage speakers of Mandarin. The Journal of the Acoustical Society of America, 129(6), 3964–3980. doi:10.1121/1.3569736

Chang, Y.-h. S., Yao, Y., & Huang, B. H. (2017). Effects of linguistic experience on the perception of high-variability non-native tones. The Journal of the Acoustical Society of America, 141(2), EL120–EL126. doi:10.1121/1.4976037

Chao, Y. R. (1947). Cantonese primer. Cambridge, MA: The Harvard-Yenching Institute [by] Harvard University Press.

Chau, W. (2011). The influence of Dutch on Cantonese in the Netherlands: Rural versus urban areas. Master’s thesis, Leiden University.

Cheng, S.-P. & Tang, S.-W. (2014). Languagehood of Cantonese: A renewed front in an old debate. Open Journal of Modern Linguistics, 4(03), 389–398. doi:10.4236/ojml.2014.43032

Cheng, S.-P. & Tang, S.-W. (2016a). Cantonese. In S.-W. Chan (Ed.), The Routledge Encyclopedia of the Chinese Language (pp. 18–34). Milton Park, Abingdon, Oxon: Routledge.

Cheng, S.-P. & Tang, S.-W. (2016b). Cantonese romanization. In S.-W. Chan (Ed.), The Routledge Encyclopedia of the Chinese Language (pp. 35–50). Milton Park, Abingdon, Oxon: Routledge.

Cheong, E. & Lee, L. (2015). Cantonese: Passing. Short movie screened at the Vancouver Asian Film Festival 2015.

Cheung, S. H.-n. (2007). 香港粵語語法的研究 Cantonese as spoken in Hong Kong (Revised Edition). The Chinese University of Hong Kong.

Chiong, R., Lau, W. T., Ng, W. M., Sun, Y., Wong, T. H., & Xie, H. (2017). Have you eaten yet? Investigating language and identity [online video]. Retrieved August 5, 2017, from https://www.youtube.com/watch?v=2C6IgDNZpDo

Choi, T.-M., Liu, S.-C., Pang, K.-M., & Chow, P.-S. (2008). Shopping behaviors of individual tourists from the Chinese Mainland to Hong Kong. Tourism Management, 29(4), 811–820. doi:10.1016/j.tourman.2007.07.009

Chung, F. H.-K. & Leung, M.-T. (2008). Data analysis of Chinese characters in primary school corpora of Hong Kong and mainland China: Preliminary theoretical interpretations. Clinical Linguistics & Phonetics, 22(4-5), 379–389. doi:10.1080/02699200701776757

Chung, L. M. V. (2009). Perception and production of Cantonese tones by Thai and Filipino students in Hong Kong secondary school. Master’s thesis, The Chinese University of Hong Kong.

Ciocca, V. & Ip, V. W.-K. (2008). Development of tone perception and tone production in Cantonese-learning children aged 2 to 5 years. Proceedings of the 9th Annual Conference of the International Speech Communication Association. Retrieved from https://www.isca-speech.org/archive/archive_papers/interspeech_2008/i08_0623.pdf

Ciocca, V. & Lui, J. (2003). The development of the perception of Cantonese lexical tones. Journal of Multilingual Communication Disorders, 1(2), 141–147. doi:10.1080/1476967031000090971

City of Richmond. (2017). Languages hot facts. Retrieved from https://www.richmond.ca/__shared/assets/Languages6251.pdf

City of Vancouver. (2014). First peoples: A guide for newcomers. Retrieved from https://vancouver.ca/files/cov/First-Peoples-A-Guide-for-Newcomers.pdf

Clyne, M. & Kipp, S. (1997). Linguistic diversity in Australia. People and Place, 5(3), 6.

Cohen, J. (1988). Statistical power analysis for the behavioral sciences. New York, NY: Lawrence Erlbaum Associates.

Connine, C. M., Mullennix, J., Shernoff, E., & Yelen, J. (1990). Word familiarity and frequency in visual and auditory word recognition. Journal of Experimental Psychology: Learning, Memory, and Cognition, 16(6), 1084–1096. doi:10.1037/0278-7393.16.6.1084

Crissman, L. W. (2012). Digital language atlas of China. Harvard Dataverse. doi:1902.1/18939

Crosswhite, K. (2009). Praat script for adjusting intensity. Retrieved August 31, 2016, from http://phonetics.linguistics.ucla.edu/facilities/acoustic/adjust_intensity_whole_file.txt

Crystal, D. (2012). English as a global language. Cambridge, England: Cambridge University Press.

Cummins, J. (1979). Cognitive/academic language proficiency, linguistic interdependence, the optimum age question and some other matters. Working Papers on Bilingualism, 19, 121–129.

Cummins, J. (1992).
Heritage language teaching in Canadian schools.Journal of Curriculum Studies, 24(3), 281–286.doi:10.1080/0022027920240306 → pages 26Cummins, J. (2005). A proposal for action: Strategies for recognizingheritage language competence as a learning resource within themainstream classroom. Modern Language Journal, 89(4), 585–592. →pages 26Cummins, J. & Danesi, M. (1990). Heritage languages: The development anddenial of Canada’s linguistic resources. Toronto, ON: James Lorimer &Company. → pages 2, 26214Cutler, A. (2012). Native listening: Language experience and the recognitionof spoken words. Cambridge, MA: MIT Press. → pages 204Cutler, C. L. (2000). O brave new words!: Native American loanwords incurrent English. Norman, OK: University of Oklahoma Press. → pages 1Darrow, B. (2017). LinkedIn claims half a billion users. Fortune. RetrievedApril 1, 2018, from http://fortune.com/2017/04/24/linkedin-users →pages 29De Geer, B. (1992). Internationally adopted children in communication: Adevelopmental study. Working Papers in Linguistics, 39, 1–200. Retrievedfrom http://journals.lub.lu.se/index.php/LWPL/issue/view/2526/388. →pages 47DeFrancis, J. (1986). The Chinese language: Fact and fantasy. Honolulu,HI: University of Hawai’i Press. → pages 13Dijkstra, T. (2005). Bilingual visual word recognition and lexical access. InJ. F. Kroll & A. M. B. De Groot (Eds.), Handbook of bilingualism:Psycholinguistic approaches (pp. 179–201). New York, NY: OxfordUniversity Press. → pages 36Dodson, C. J. (1981). A reappraisal of bilingual development andeducation: Some theoretical and practical considerations. In H. B.Beardsmore (Ed.), Elements of Bilingual Theory (pp. 14–27). Brussels:Vrije Universiteit Brussel. → pages 30Dornic, S. (1978). The bilingual’s performance: Language dominance,stress, and individual differences. In D. Gerver & H. Sinaiko (Eds.),Language Interpretation and Communication (pp. 259–271). New York,NY: Plenum Press. → pages 29Duff, P. (2008). 
Heritage language education in Canada. In D. Brinton,O. Kagan, & S. Bauckus (Eds.), Heritage language education: A new fieldemerging (pp. 70–91). New York, NY: Routledge. → pages 26Duff, P. A. & Li, D. (2009). Indigenous, minority, and heritage languageeducation in canada: Policies, contexts, and issues. Canadian ModernLanguage Review, 66(1), 1–8. doi::10.3138/cmlr.66.1.001 → pages 26Eilers, R. E., Wilson, W. R., & Moore, J. M. (1977). Developmental changesin speech discrimination in infants. Journal of Speech, Language, and215Hearing Research, 20(4), 766–780. doi:10.1044/jshr.2004.766 → pages54Elder, C. (2005). Evaluating the effectiveness of heritage languageeducation: What role for testing? International Journal of BilingualEducation and Bilingualism, 8(2-3), 196–212.doi:10.1080/13670050508668607 → pages 26Elder, C. (2009). Reconciling accountability and development needs inheritage language education: A communication challenge for theevaluation consultant. Language Teaching Research, 13(1), 15–33.doi:10.1177/1362168808095521 → pages 26Fishman, J. A. (2001). Three hundred-plus years of heritage languageeducation in the United States. In J. K. Peyton, D. A. Ranard, &S. McGinnis (Eds.), Heritage languages in America: Preserving a nationalresource (pp. 81–97). McHenry, Illinois: Center for Applied Linguisticsand Delta Systems Co. Inc. → pages 27Fishman, J. A. (2014). Three hundred-plus years of heritage languageeducation in the United States. In Handbook of heritage, community, andnative American languages in the United States (pp. 50–58). New York,NY: Routledge. → pages 26Flege, J. E. (1987). The production of “new” and “similar” phones in aforeign language: Evidence for the effect of equivalence classification.Journal of Phonetics, 15(1), 47–65. → pages 35, 36, 37, 49Flege, J. E. & Eefting, W. (1988). Imitation of a VOT continuum by nativespeakers of English and Spanish: Evidence for phonetic categoryformation. 
The Journal of the Acoustical Society of America, 83(2),729–740. doi:10.1121/1.396115 → pages 35FluidSurveys (2017). UBC FluidSurveys. Retrieved February 20, 2017,from http://survey.ubc.ca → pages 86, 92, 108Fok-Chan, Y.-Y. (1974). A perceptual study of tones in Cantonese. Universityof Hong Kong, Centre of Asian studies. → pages 57, 58, 79, 81, 98Fox, R. A. & Qi, Y.-Y. (1990). Context effects in the perception of lexicaltone. Journal of Chinese Linguistics, 18(2), 261–284. → pages 56216Francis, A. L., Ciocca, V., Ma, L., & Fenn, K. (2008). Perceptual learning ofCantonese lexical tones by tone and non-tone language speakers.Journal of Phonetics, 36(2), 268–294. doi:10.1016/j.wocn.2007.06.005→ pages 61, 62, 63, 81, 188, 189, 201Francis, A. L., Ciocca, V., & Ng, B. K. C. (2003). On the (non)categoricalperception of lexical tones. Perception & Psychophysics, 65(7),1029–1044. doi:10.3758/BF03194832 → pages 57, 58Francis, A. L., Ciocca, V., Wong, N. K. Y., Leung, W. H. Y., & Chu, P. C. Y.(2006). Extrinsic context affects perceptual normalization of lexicaltone. The Journal of the Acoustical Society of America, 119(3),1712–1726. doi:10.1121/1.2149768 → pages 56Frank, M. C., Braginsky, M., Yurovsky, D., & Marchman, V. A. (2017).Wordbank: An open repository for developmental vocabulary data.Journal of Child Language, 44(3), 677–694.doi:10.1017/S0305000916000209 → pages 82Fulcher, G. (2014). Testing second language speaking. Milton Park,Abingdon, Oxon: Routledge. → pages 29Fung, R. S. & Wong, C. S. (2011). Acoustic analysis of the new rising tonein Hong Kong Cantonese. Proceedings of the 17th International Congressof Phonetic Sciences, 716–718. Retrieved fromhttps://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2011/OnlineProceedings/RegularSession/Fung/Fung.pdf → pages18, 59Fung, R. S., Wong, C. S., & Law, S. (2011). The mechanism of rising tonemerger in Hong Kong Cantonese: An acoustic approach. Phonetics &Phonology In Iberia. 
Retrieved from http://hdl.handle.net/10722/136295→ pages 59, 121Fung, R. S.-Y. (2000). Final particles in Standard Cantonese: Semanticextension and pragmatic inference. PhD thesis, The Ohio State University.→ pages 12Gandour, J. (1981). Perceptual dimensions of tone: Evidence fromCantonese. Journal of Chinese Linguistics, 20–36. Retrieved fromhttps://www.jstor.org/stable/23753516 → pages 57, 58217Gandour, J. T. (1978). The perception of tone. In V. A. Fromkin (Ed.),Tone: A linguistic review (pp. 41–76). New York, NY: Academic Press. →pages 52Gandour, J. T. & Krishnan, A. (2015). Processing tone languages. InG. Hickok & S. Small (Eds.), Neurobiology of language (pp. 1095–1107).London, England: Academic Press. doi:10.1016/C2011-0-07351-9 →pages 52Garc´ıa, O. (2005). Positioning heritage languages in the United States. TheModern Language Journal, 89(4), 601–605. Retrieved fromhttps://www.jstor.org/stable/3588631 → pages 26Gauthier, K. & Genesee, F. (2011). Language development ininternationally adopted children: A special case of early second languagelearning. Child Development, 82(3), 887–901.doi:10.1111/j.1467-8624.2011.01578.x → pages 47Gernsbacher, M. A. (1984). Resolving 20 years of inconsistent interactionsbetween lexical familiarity and orthography, concreteness, and polysemy.Journal of Experimental Psychology: General, 113(2), 256–281.doi:10.1037/0096-3445.113.2.256 → pages xii, 84Gertken, L. M., Amengual, M., & Birdsong, D. (2014). Assessing languagedominance with the Bilingual Language Profile. In P. Leclercq,A. Edmonds, & H. Hilton (Eds.), Measuring L2 proficiency: Perspectivesfrom SLA (pp. 208–225). Multilingual Matters. → pages 29, 30, 31, 121Gollan, T. H., Weissberger, G. H., Runnqvist, E., Montoya, R. I., & Cera,C. M. (2012). Self-ratings of spoken language dominance: AMultilingual Naming Test (MINT) and preliminary norms for young andaging Spanish–English bilinguals. Bilingualism: Language and Cognition,15(3), 594–615. 
doi:10.1017/S1366728911000332 → pages 118Government of Canada. (1991). Canadian Heritage Languages InstituteAct. Retrieved fromhttp://laws-lois.justice.gc.ca/eng/acts/C-17.6/20050401/P1TT3xt3.html→ pages 27Government of Manitoba. (2018). Policy for heritage language instruction.Retrieved March 20, 2018, fromhttp://www.edu.gov.mb.ca/k12/docs/policy/heritage/index.html → pages27218Grosjean, F. (1982). Life with two languages: An introduction tobilingualism. Cambridge, MA: Harvard University Press. → pages 28Grosjean, F. (1989). Neurolinguists, beware! The bilingual is not twomonolinguals in one person. Brain and Language, 36(1), 3–15.doi:10.1016/0093-934X(89)90048-5 → pages 36, 205Grosjean, F. (1998). Studying bilinguals: Methodological and conceptualissues. Bilingualism: Language and Cognition, 1(2), 131–149.doi:10.1017/S136672899800025X → pages 29, 30Grosjean, F. (2001). The bilingual’s language modes. In J. L. Nicol (Ed.),One mind, two languages: Bilingual Language Processing (pp. 1–22).Oxford, England: Blackwell. → pages 34Gu, W. & Lee, T. (2007). Effects of tonal context and focus on Cantonesef0. Proceedings of the 16th International Congress of Phonetic Sciences,1033–1036. Retrieved fromhttp://www.icphs2007.de/conference/Papers/1689/1689.pdf → pages 56Gui, M. C. (2005). The phonology of Guangzhou Cantonese. Munich,Germany: Lincom Europa. → pages 12Harris, C. L., Gleason, J. B., & Aycicegi, A. (2006). When is a first languagemore emotional? Psychophysiological evidence from bilingual speakers.In A. Pavlenko (Ed.), Bilingual education and bilingualism (pp. 257–283).Multilingual Matters. → pages 30, 31Hashimoto, O.-k. Y. (1972). Studies in Yue dialects 1: Phonology ofCantonese. New York, NY: Cambridge University Press. → pages 9, 10, 14Holm, J. A. (1989). Pidgins and creoles: Volume 2, Reference survey.Cambridge, England: Cambridge University Press. → pages 1Hornberger, N. H. (2005). Heritage/community language education: USand Australian perspectives. 
International Journal of Bilingual Educationand Bilingualism, 8(2-3), 101–108. doi:10.1080/13670050508668599→ pages 26Hornberger, N. H. & Wang, S. C. (2008). Who are our heritage languagelearners? Identity and biliteracy in heritage language education in theUnited States. In D. Brinton, O. Kagan, & S. Bauckus (Eds.), Heritagelanguage education: A new field emerging (pp. 3–35). New York, NY:Routledge. → pages 26219Hsiar, O. Y. (2007). Phonological elision in Malaysian Cantonese casualspeech. Master’s thesis, National University of Singapore, Singapore. →pages 119Iacoponi, L. (2012). Synchronic and diachronic variation of Cantonesetone change in Optimality Theory. Master’s thesis, Universita` degli Studidi Pisa, Italy. → pages 11Kagan, O. & Dillon, K. (2001). A new perspective on teaching Russian:Focus on the heritage learner. The Slavic and East European Journal,45(3), 507–518. doi:10.2307/3086367 → pages 2Kao, D. L. (1971). Structure of the syllable in Cantonese. The Hague:Mouton & Co. → pages 16Kej, J., Smyth, V., So, L. K., Lau, C., & Capell, K. (2002). Assessing theaccuracy of production of Cantonese lexical tones: a comparisonbetween perceptual judgement and an instrumental measure. Asia PacificJournal of Speech, Language and Hearing, 7(1), 25–38.doi:10.1179/136132802805576535 → pages 59, 198Kelleher, A. (2010). What is a heritage language? Heritage Briefs, 1–3.Retrieved fromhttp://www.cal.org/heritage/pdfs/briefs/What-is-a-Heritage-Language.pdf→ pages 2, 27Khouw, E. & Ciocca, V. (2007). Perceptual correlates of Cantonese tones.Journal of Phonetics, 35(1), 104–117. doi:10.1016/j.wocn.2005.10.003→ pages 57, 58, 81Krashen, S. (2000). Bilingual education, the acquisition of English, and theretention and loss of Spanish. In A. Roca (Ed.), Research on Spanish inthe US: Linguistic Issues and Challenges (pp. 432–444). Somerville, MA:Cascadilla Press. → pages 206Kuhl, P. K., Stevens, E., Hayashi, A., Deguchi, T., Kiritani, S., & Iverson, P.(2006). 
Infants show a facilitation effect for native language phoneticperception between 6 and 12 months. Developmental Science, 9(2),F13–F21. doi:10.1111/j.1467-7687.2006.00468.x → pages 204Kung, C., Chwilla, D. J., & Schriefers, H. (2014). The interaction of lexicaltone, intonation and semantic context in on-line spoken wordrecognition: An ERP study on Cantonese Chinese. Neuropsychologia, 53,293–309. doi:10.1016/j.neuropsychologia.2013.11.020 → pages 81220Kwok, B.-C., Chin, A. C., & Tsou, B. K. (2016). Grammatical diversityacross the Yue dialects. Journal of Chinese Linguistics, 44(1), 109–152.doi:10.1353/jcl.2016.0002 → pages 10La Heij, W. (2005). Selection processes in monolingual and bilinguallexical access. In J. F. Kroll & A. M. B. De Groot (Eds.), Handbook ofbilingualism: Psycholinguistic approaches (pp. 289–307). New York, NY:Oxford University Press. → pages 36Ladd, D. R. (1984). Declination: A review and some hypotheses.Phonology, 1, 53–74. doi:10.1017/S0952675700000294 → pages 17Lai, C., Li, Z., & Gong, Y. (2016). Teacher agency and professional learningin cross-cultural teaching contexts: Accounts of Chinese teachers frominternational schools in Hong Kong. Teaching and Teacher Education, 54,12–21. doi:10.1016/j.tate.2015.11.007 → pages 118Lam, Z. W.-M., Hall, K. C., & Pulleyblank, D. (2016). Temporal location ofperceptual cues for Cantonese tone identification. In The 3rd Workshopon Innovations in Cantonese Linguistics (WICL-3), Columbus, OH. TheOhio State University. → pages 58, 81Law, I. K.-Y., Ma, E. P.-M., & Yiu, E. M.-L. (2009). Speech intelligibility,acceptability, and communication-related quality of life in Chinesealaryngeal speakers. Archives of Otolaryngology–Head & Neck Surgery,135(7), 704–711. doi:10.1001/archoto.2009.71 → pages 74Law, S.-P., Fung, R. S.-Y., & Bauer, R. S. (2001). Perception and productionof Cantonese consonant endings. Asia Pacific Journal of Speech, Languageand Hearing, 6(3), 179–195. 
doi:10.1179/136132801805576590 →pages 14Law, S.-P., Fung, R. S.-Y., & Kung, C. (2013). An ERP study of goodproduction vis-a`-vis poor perception of tones in Cantonese: Implicationsfor top-down speech processing. PLoS One, 8(1), e54396.doi:10.1371/journal.pone.0054396 → pages 56Lee, G. M. (1993). Comparative, diachronic and experimental perspectives onthe interaction between tone and vowel in Standard Cantonese. PhD thesis,The Ohio State University. → pages 12Lee, K. Y., Chan, K. T., Lam, J. H., Van Hasselt, C., & Tong, M. C. (2015).Lexical tone perception in native speakers of Cantonese. International221Journal of Speech-Language Pathology, 17(1), 53–62.doi:10.3109/17549507.2014.898096 → pages 57Lee, S. L. (2004). History and current trends of teaching Cantonese as aforeign Language: Investigating approaches to teaching and learningCantonese. EdD thesis, University of Leicester. → pages 205Lee, T., Lau, W., Wong, Y. W., & Ching, P. (2002). Using tone informationin Cantonese continuous speech recognition. ACM Transactions on AsianLanguage Information Processing, 1(1), 83–102.doi:10.1145/595576.595581 → pages 56Lee, Y.-S., Vakoch, D. A., & Wurm, L. H. (1996). Tone perception inCantonese and Mandarin: A cross-linguistic comparison. Journal ofPsycholinguistic Research, 25(5), 527–542. doi:10.1007/BF01758181 →pages 61Lee, Y. S. K., Chiu, S. N., & van Hasselt, C. A. (2002). Tone perceptionability of Cantonese-speaking children. Language and Speech, 45(4),387–406. doi:10.1177/00238309020450040401 → pages 54Lei, M. K.-Y. (2007). Discrimination of level tones in Cantonese-learninginfants. Proceedings of the 16th International Congress of PhoneticSciences, 1313–1316. Retrieved fromhttp://www.icphs2007.de/conference/Papers/1620/1620.pdf → pages 54Levine, G. S. (2015). Incomplete L1 acquisition in the immigrant situation:Yiddish in the United States. Max Niemeyer Verlag Tu¨bingen. → pages200Lewis, M. P., Simons, G. F., & Fennig, C. D. (2009). 
Ethnologue: Languagesof the world, volume 16. SIL international. → pages 9Li, P. S. (2005). The rise and fall of Chinese immigration to Canada:Newcomers from Hong Kong Special Administrative Region of China andMainland china, 1980–2002. International Migration, 43(3), 9–34.doi:10.1111/j.1468-2435.2005.00324.x → pages 2Li, W. (2016, September 20). New spin on Chinese school focuses onChinatown’s Cantonese conversations. Metronews Vancouver. RetrievedAugust 5, 2017, from http://www.metronews.ca/news/vancouver/2016/09/20/chinese-school-in-chinatown-focuses-on-survival-cantonese.html →pages 5222Li, Y., Lee, T., & Qian, Y. (2002). Acoustical f0 analysis of continuousCantonese speech. Proceedings of the International Symposium on ChineseSpoken Language Processing. Retrieved from https://www.isca-speech.org/archive open/archive papers/iscslp2002/clp2 072.pdf → pages 17, 56,57, 91Liberman, M. Y. (1975). The intonational system of English. PhD thesis,Massachusetts Institute of Technology. → pages 63, 188Lieberman, P. (1960). Some acoustic correlates of word stress in AmericanEnglish. The Journal of the Acoustical Society of America, 32(4), 451–454.doi:10.1121/1.1908095 → pages 63Lieberman, P. (1966). Intonation, perception, and language. PhD thesis,Massachusetts Institute of Technology. → pages 17Linck, J. A., Kroll, J. F., & Sunderman, G. (2009). Losing access to thenative language while immersed in a second language: Evidence for therole of inhibition in second-language learning. Psychological Science,20(12), 1507–1515. Retrieved fromhttps://www.jstor.org/stable/40575218 → pages 36, 49Luke, K.-K. & Wong, M. L. (2015). The Hong Kong Cantonese Corpus:Design and uses. Journal of Chinese Linguistics, 25(2015), 309–330. →pages xii, 84, 85Lynch, A. (2003). The relationship between second and heritage languageacquisition: Notes on research and theory building. Retrieved July 1,2018, from http://international.ucla.edu/institute/article/3615 → pages205Ma, J. 
K.-Y., Ciocca, V., & Whitehill, T. L. (2006). Effect of intonation onCantonese lexical tones. The Journal of the Acoustical Society of America,120(6), 3978–3987. doi10.1121/1.2363927 → pages 56Ma, J. K.-Y., Ciocca, V., & Whitehill, T. L. (2011). The perception ofintonation questions and statements in Cantonese. The Journal of theAcoustical Society of America, 129(2), 1012–1023.doi:10.1121/1.3531840 → pages 36, 56Macnamara, J. (1967). The bilingual’s linguistic performance—Apsychological overview. Journal of Social Issues, 23(2), 58–77.doi:10.1111/j.1540-4560.1967.tb00576.x → pages 30223Mair, V. H. (1991). What is a Chinese “dialect/topolect”?: Reflections onsome key Sino-English linguistic terms. Sino-Platonic Papers, 29, 2–30.Retrieved fromhttp://sino-platonic.org/complete/spp029 chinese dialect.pdf → pages 13Mantel, N. (1967). The detection of disease clustering and a generalizedregression approach. Cancer Research, 27(2 Part 1), 209–220. → pagesxiii, 161, 164, 167Marian, V. & Kaushanskaya, M. (2004). Self-construal and emotion inbicultural bilinguals. Journal of Memory and Language, 51(2), 190–201.doi:10.1016/j.jml.2004.04.003 → pages 31Matthews, S. & Yip, V. (2013). Cantonese: A comprehensive grammar.Milton Park, Abingdon, Oxon: Routledge. → pages 13, 15Maurer, D. & Werker, J. F. (2014). Perceptual narrowing during infancy: Acomparison of language and faces. Developmental Psychobiology, 56(2),154–178. doi:10.1002/dev.21177 → pages 203McCutchen, D. & Perfetti, C. A. (1982). The visual tongue-twister effect:Phonological activation in silent reading. Journal of Verbal Learning andVerbal Behavior, 21(6), 672–687. → pages 102Ming, T. & Tao, H. (2008). Developing a Chinese heritage language corpus:Issues and a preliminary report. In A. W. He & Y. Xiao (Eds.), Chinese asa heritage language: Fostering rooted world citizenry (pp. 167–188).Honolulu, HI: University of Hawai’i, National Foreign Language ResourceCenter Honolulu. → pages 42Mok, P. P.-K. & Wong, P. W.-Y. 
(2010). Perception of the merging tones inHong Kong Cantonese: Preliminary data on monosyllables. Proceedingsof the 5th International Conference on Speech Prosody. Retrieved fromhttps://www.isca-speech.org/archive/sp2010/papers/sp10 916.pdf →pages 58, 60Mok, P. P.-K., Zuo, D., & Wong, P. W.-Y. (2013). Production and perceptionof a sound change in progress: Tone merging in Hong Kong Cantonese.Language Variation and Change, 25(3), 341–370.doi:10.1017/S0954394513000161 → pages 18, 59, 60, 65Montrul, S. (2009). Knowledge of tense-aspect and mood in Spanishheritage speakers. International Journal of Bilingualism, 13(2), 239–269.doi:10.1177/1367006909339816 → pages 42224Montrul, S. (2013). Bilingualism and the heritage language speaker. InT. K. Bhatia & W. C. Ritchie (Eds.), The handbook of bilingualism (pp.168–189). Malden, MA: Blackwell. → pages 25, 42, 200Montrul, S. A. (2008). Incomplete Acquisition in Bilingualism: Re-examiningthe age factor. Amsterdam: John Benjamins. → pages 200Montrul, S. A. (2012). Is the heritage language like a second language?Eurosla Yearbook, 12(1), 1–29. doi:10.1075/eurosla.12.03mon → pagesxv, 7Myers-Scotton, C. (2005). Multiple voices: An introduction to bilingualism.Malden, MA: Blackwell. → pages 29Nagy, N. (2009). Heritage language variation and change. RetrievedAugust, 31, 2017, from http://projects.chass.utoronto.ca/ngn/HLVC →pages 41Nagy, N. (2015). A sociolinguistic view of null subjects and VOT in Torontoheritage languages. Lingua, 164, 309–327.doi:10.1016/j.lingua.2014.04.012 → pages 2, 41Newman, R. L. & Connolly, J. F. (2004). Determining the role of phonologyin silent reading using event-related brain potentials. Cognitive BrainResearch, 21(1), 94–105. doi:10.1016/j.cogbrainres.2004.05.006 →pages 102Nygaard, L. C. & Pisoni, D. B. (1998). Talker-specific learning in speechperception. Perception & Psychophysics, 60(3), 355–376.doi:10.3758/BF03206860 → pages 202NYU College of Arts and Science. (2018). Language courses. 
RetrievedJuly 1, 2018, fromhttps://as.nyu.edu/sca/current-students/language-courses.html → pages205Oh, J. S., Au, T. K.-f., & Jun, S.-A. (2010). Early childhood languagememory in the speech perception of international adoptees. Journal ofChild Language, 37(05), 1123–1132. doi:10.1017/S0305000909990286→ pages 48, 49, 204Oh, J. S., Jun, S.-A., Knightly, L. M., & Au, T. K.-f. (2003). Holding on tochildhood language memory. Cognition, 86(3), B53–B64.doi:10.1016/S0010-0277(02)00175-0 → pages 46, 49225Oksanen, J., Blanchet, F. G., Friendly, M., Kindt, R., Legendre, P., McGlinn,D., Minchin, P. R., O’Hara, R. B., Simpson, G. L., Solymos, P., Henry, M.,Stevens, H., Szoecs, E., & Wagner, H. (2018). vegan: Communityecology package. Retrieved fromhttps//CRAN.R-project.org/package=vegan → pages 167Olson, J. S. (1998). An ethnohistorical dictionary of China. Westport, CT:Greenwood Publishing Group. → pages 13Ontario Ministry of Education. (1991). Heritage languages: Kindergarten toGrade 8. Toronto, ON: Queen’s Printer. → pages 26Ormsby, M. A. (1958). British Columbia: A history. Toronto, ON:Macmillan. → pages 4OSU East Asian Studies Center. (2014). Focus on Cantonese: OSU’sCantonese program grows into language, area studies courses. RetrievedJuly 1, 2018, from https://cpb-us-w2.wpmucdn.com/u.osu.edu/dist/7/1615/files/2014/05/EASC-expOSUre-Sp2014-pp4-5-2hpy01c.pdf → pages205Ou, J. (2012). Tone merger in Guangzhou Cantonese. Master’s thesis, TheHong Kong Polytechnic University, Hong Kong. → pages 119Pai, R. (2016). Cantonese as a foreign language (CFL) curriculum designbased on learner needs in North America. In The 3rd Workshop onInnovations in Cantonese Linguistics (WICL-3), Columbus, OH. The OhioState University. → pages 205Peng, G. & Wang, W. S.-Y. (2005). Tone recognition of continuousCantonese speech based on support vector machines. SpeechCommunication, 45(1), 49–62. doi:10.1016/j.specom.2004.09.004 →pages 56Perfetti, C. A., Bell, L. C., & Delaney, S. M. (1988). 
Automatic (prelexical)phonetic activation in silent word reading: Evidence from backwardmasking. Journal of Memory and Language, 27(1), 59–70.doi:10.1016/0749-596X(88)90048-4 → pages 102Peyton, J. K., Ranard, D. A., & McGinnis, S. (2001). Charting a new course:Heritage language education in the United States. In J. K. Peyton, D. A.Ranard, & S. McGinnis (Eds.), Heritage languages in America: Preservinga national resource (pp. 3–26). McHenry, Illinois: Center for AppliedLinguistics and Delta Systems Co. Inc. → pages 2, 26, 27226Pierce, L. J., Klein, D., Chen, J.-K., Delcenserie, A., & Genesee, F. (2014).Mapping the unconscious maintenance of a lost first language.Proceedings of the National Academy of Sciences of the United States ofAmerica, 111(48), 17314–17319. doi:10.1073/pnas.1409411111 →pages 204Pierrehumbert, J. B. (1980). The phonology and phonetics of Englishintonation. PhD thesis, Massachusetts Institute of Technology. → pages63, 188Polinsky, M. (2008). Relative clauses in heritage Russian: Fossilization ordivergent grammar. Formal Approaches to Slavic Linguistics #17: The YaleMeeting 2008, 333–358. → pages 42Polinsky, M. & Kagan, O. (2007). Heritage languages: In the ‘wild’ and inthe classroom. Language and Linguistics Compass, 1(5), 368–395.doi:10.1111/j.1749-818X.2007.00022.x → pages 25, 42, 200Qin, Z. & Mok, P. P.-K. (2011). Perception of Cantonese tones by Mandarin,English and French speakers. Proceedings of the 17th InternationalCongress of Phonetic Sciences, 1654–1657. Retrieved from http://www.phonetics.ucla.edu/voiceproject/Publications/Shue-etal 2011 ICPhS.pdf →pages 61, 62, 63, 189Qin, Z. & Mok, P. P.-K. (2013). Discrimination of Cantonese tones byspeakers of tone and non-tone languages. Kansas Working Papers inLinguistics, 34. doi:10.17161/KWPL.1808.12864 → pages 61, 201R Core Team (2013). R: A Language and Environment for StatisticalComputing. Vienna, Austria: R Foundation for Statistical Computing. 
→pages 130, 167Rafat, Y., Mohaghegh, M., & Stevenson, R. (2017). Geminate attritionacross three generations of Farsi-English bilinguals living in Canada: Anacoustic study. Ilha do Desterro, 70(3), 151–168.doi:10.5007/2175-8026.2017v70n3p151 → pages 47Ramsey, S. R. (1987). The languages of China. Princeton, NJ: PrincetonUniversity Press. → pages 12Rosch, E. H. (1973). On the internal structure of perceptual and semanticcategories. In T. E. Moore (Ed.), Cognitive development and acquisition oflanguage (pp. 111–144). New York, NY: Academic Press.doi:10.1016/B978-0-12-505850-6.50010-4 → pages 32227Rothman, J. & Treffers-Daller, J. (2014). A prolegomenon to the constructof the native speaker: Heritage speaker bilinguals are natives too!Applied Linguistics, 35(1), 93–98. doi:10.1093/applin/amt049 → pages33Saadah, E. (2011). The production of Arabic vowels by English L2 learnersand heritage speakers of Arabic. PhD thesis, University of Illinois atUrbana-Champaign. → pages 43, 49Samuel, A. G. & Larraza, S. (2015). Does listening to non-native speechimpair speech perception? Journal of Memory and Language, 81, 51–71.doi:10.1016/j.jml.2015.01.003 → pages 38, 49, 205, 206Sawilowsky, S. S. (2009). New effect size rules of thumb. Journal ofModern Applied Statistical Methods, 8(2), 597–599. Retrieved fromhttps://digitalcommons.wayne.edu/coe tbf/4/ → pages xiii, 141Schmeißer, A., Hager, M., Gil, L. A., Jansen, V., Geveler, J., Eichler, N.,Patuto, M., & Mu¨ller, N. (2016). Related but different: The two conceptsof language dominance and language proficiency. In J. Treffers-Daller &C. Silva-Corvala´n (Eds.), Language dominance in bilinguals: Issues ofoperationalization and measurement (pp. 36–65). Cambridge, England:Cambridge University Press. → pages 31, 123Schneider, E. W. (2007). Postcolonial English: Varieties around the world.New York, NY: Cambridge University Press. → pages 1Schneider, W., Eschman, A., & Zuccolotto, A. (2002). 
E-Prime: User’s guide.Psychology Software Incorporated. → pages 103Sebastia´n-Galle´s, N., Echeverr´ıa, S., & Bosch, L. (2005). The influence ofinitial exposure on lexical representation: Comparing early andsimultaneous bilinguals. Journal of Memory and Language, 52(2),240–255. doi:10.1016/j.jml.2004.11.001 → pages 40Shepard, R. N. (1972). Psychological representation of speech sounds. InE. David & P. Denes (Eds.), Human Communication: A Unified View (pp.67–113). New York, NY: McGraw-Hill. → pages 168, 171, 172, 173Sherkina-Lieber, M., Pe´rez-Leroux, A. T., & Johns, A. (2011). Grammarwithout speech production: The case of Labrador Inuttitut heritagereceptive bilinguals. Bilingualism: Language and Cognition, 14(3),301–317. doi:10.1017/S1366728910000210 → pages 42228Snow, D. (2004). Cantonese as written language: The growth of a writtenChinese vernacular. Hong Kong: Hong Kong University Press. → pages 22So, K. L. C. (2000). Tonal production and perception patterns of Canadianraised Cantonese speakers. Master’s thesis, Simon Fraser University,Burnaby, Canada. → pages 63, 64, 65, 66, 81Sona Systems Ltd. (2017). UBC Linguistics Sign-up System. RetrievedFebruary 20, 2017, from https://ubclinguistics.sona-systems.com → pages114Soo, R. & Monahan, P. J. (2017). Language exposure modulates the role oftone in perception and long-term memory: Evidence from Cantonesenative and heritage speakers. Proceedings of the 43rd Annual Meeting ofthe Berkeley Linguistics Society, 2, 47–54. Retrieved fromhttp://linguistics.berkeley.edu/bls/previous proceedings/bls43 2.pdf →pages 64, 65Stanford Language Center. (2018). Cantonese language program.Retrieved July 1, 2018, from https://cantonese.stanford.edu/courses →pages 205Statistics Canada. (2009). Top 10 countries of birth of recent immigrants,1981 to 2006. Retrieved February 13, 2017, from http://www12.statcan.ca/census-recensement/2006/as-sa/97-557/table/t1-eng.cfm → pages xi,2, 3Statistics Canada. (2012). 
Ethnic diversity of immigration. Retrieved February 13, 2017, from http://www.statcan.gc.ca/pub/11-402-x/2012000/pdf/ethnic-ethnique-eng.pdf

Statistics Canada. (2017a). 2016 census: Immigrant languages in Canada. Retrieved August 5, 2017, from http://www.statcan.gc.ca/pub/11-627-m/11-627-m2017025-eng.htm

Statistics Canada. (2017b). Census in brief: Linguistic diversity and multilingualism in Canadian homes. Retrieved September 1, 2017, from http://www12.statcan.gc.ca/census-recensement/2016/as-sa/98-200-x/2016010/98-200-x2016010-eng.cfm

Statistics Canada. (2017c). An increasingly diverse linguistic profile: Corrected data from the 2016 Census. Retrieved July 1, 2018, from https://www150.statcan.gc.ca/n1/daily-quotidien/170817/dq170817a-eng.htm

Statistics Canada. (2017d). Previous standard - Visible minority. Retrieved September 1, 2017, from https://www.statcan.gc.ca/eng/concepts/definitions/previous/preminority

Statistics Canada. (2017e). Proportion of mother tongue responses for various regions in Canada, 2016 Census. Retrieved August 5, 2018, from http://www12.statcan.gc.ca/census-recensement/2016/dp-pd/dv-vd/lang/index-eng.cfm

Sweet, S. A. & Grace-Martin, K. (1999). Data analysis with SPSS. Boston, MA: Allyn & Bacon.

Szeto, C. (2000). Testing intelligibility among Sinitic dialects. Proceedings of ALS2K, the 2000 conference of the Australian Linguistic Society. Retrieved from http://www.als.asn.au/proceedings/als2000/szeto.pdf

Tan, C.-B. (2005). Chinese in Malaysia. In M. Ember, C. R. Ember, & I. Skoggard (Eds.), Encyclopedia of Diasporas: Immigrant and Refugee Cultures Around the World (pp. 697–706). New York, NY: Springer.

Tang, K. (2015). Naturalistic speech misperception. PhD thesis, University College London.

Tang, S.-W. & Cheng, S.-P. (2014). Aspects of Cantonese grammar. In C.-T. J. Huang, Y.-H. A. Li, & A.
Simpson (Eds.), The Handbook of Chinese Linguistics (pp. 599–628). West Sussex, England: Blackwell. doi:10.1002/9781118584552.ch23

Tang, S.-W., Kwok, F., Lee, T. H.-T., Lun, C., Luke, K. K., Tung, P., & Cheung, K. H. (2002). Guide to LSHK Cantonese romanization of Chinese characters. Retrieved from https://www.lshk.org/jyutping

Tardif, T., Fletcher, P., Liang, W., & Kaciroti, N. (2009). Early vocabulary development in Mandarin (Putonghua) and Cantonese. Journal of Child Language, 36(5), 1115–1144. doi:10.1017/S0305000908009185

Tees, R. C. & Werker, J. F. (1984). Perceptual flexibility: Maintenance or recovery of the ability to discriminate non-native speech sounds. Canadian Journal of Psychology, 38(4), 579–590. doi:10.1037/h0080868

Thorndike, E. L. & Lorge, I. (1963). The Teacher’s Word Book of 30,000 Words. New York, NY: Teachers College, Columbia University.

Titze, I. R. (1994). Principles of Voice Production. Englewood Cliffs, NJ: Prentice Hall.

To, Y.-m. & Lau, T.-y. (1995). Global export of Hong Kong television: Television Broadcasts Limited. Asian Journal of Communication, 5(2), 108–121. doi:10.1080/01292989509364726

Tong, X., Lee, S. M. K., Lee, M. M. L., & Burnham, D. (2015). A tale of two features: Perception of Cantonese lexical tone and English lexical stress in Cantonese-English bilinguals. PLoS ONE, 10(11), e0142896. doi:10.1371/journal.pone.0142896

Tong, X., McBride, C., & Burnham, D. (2014). Cues for lexical tone perception in children: Acoustic correlates and phonetic context effects. Journal of Speech, Language, and Hearing Research, 57(5), 1589–1605. doi:10.1044/2014_JSLHR-S-13-0145

Tsui, T.-H. (2012). Tonal variation in Hong Kong Cantonese acoustic distance and functional load. Proceedings from the Annual Meeting of the Chicago Linguistic Society, 48(1), 579–588.

UBC Department of Linguistics. (2014). Linguistics Outside the Classroom (LOC).
Retrieved February 20, 2017, from http://linguistics.ubc.ca/undergrad/current-students/linguistics-outside-the-classroom-loc

Valdés, G. (2001). Heritage language students: Profiles and possibilities. In J. K. Peyton, D. A. Ranard, & S. McGinnis (Eds.), Heritage languages in America: Preserving a national resource (pp. 37–80). McHenry, IL: Center for Applied Linguistics and Delta Systems Co. Inc.

Van Deusen-Scholl, N. (2003). Toward a definition of heritage language: Sociopolitical and pedagogical considerations. Journal of Language, Identity, and Education, 2(3), 211–230. doi:10.1207/S15327701JLIE0203_4

Vance, T. J. (1976). An experimental investigation of tone and intonation in Cantonese. Phonetica, 33(5), 368–392. doi:10.1159/000259793

Vance, T. J. (1977). Tonal distinctions in Cantonese. Phonetica, 34(2), 93–107. doi:10.1159/000259872

von Békésy, G. (1960). Experiments in hearing. New York, NY: McGraw-Hill.

Wei, L. & Lee, S. (2001). The use of Cantonese classifiers and quantifiers by young British-born Chinese in Tyneside. International Journal of Bilingual Education and Bilingualism, 4(6), 359–382.

Welsh Government and Welsh Language Commissioner. (2015). National survey for Wales, 2013-14: Welsh language use survey. Retrieved from http://www.comisiynyddygymraeg.cymru/English/Publications%20List/20150129%20DG%20S%20Welsh%20Language%20Use%20Survey%202013-14%20-%20Main%20report.pdf

Werker, J. F., Gilbert, J. H., Humphrey, K., & Tees, R. C. (1981). Developmental aspects of cross-language speech perception. Child Development, 52(1), 349–355. Retrieved from https://www.jstor.org/stable/1129249

Werker, J. F. & Hensch, T. K. (2015). Critical periods in speech perception: New directions. Annual Review of Psychology, 66, 173–196. doi:10.1146/annurev-psych-010814-015104

Wiley, T. G. (2001). On defining heritage languages and their speakers.
In J. K. Peyton, D. A. Ranard, & S. McGinnis (Eds.), Heritage languages in America: Preserving a national resource (pp. 29–36). McHenry, IL: Center for Applied Linguistics and Delta Systems Co. Inc.

Witten, I. H. & Bell, T. C. (1991). The zero-frequency problem: Estimating the probabilities of novel events in adaptive text compression. IEEE Transactions on Information Theory, 37(4), 1085–1094. doi:10.1109/18.87000

Wolfram, W. & Schilling-Estes, N. (2003). Language change in “conservative” dialects: The case of past tense be in Southern enclave communities. American Speech, 78(2), 209–228.

Wong, C. S. P., Bauer, R. S., & Lam, Z. W. M. (2009). The integration of English loanwords in Hong Kong Cantonese. Journal of the Southeast Asian Linguistics Society, 1, 251–266. Retrieved from http://hdl.handle.net/10397/5824

Wong, P. & Leung, C. T.-T. (2018). Suprasegmental features are not acquired early: Perception and production of monosyllabic Cantonese lexical tones in 4- to 6-year-old preschool children. Journal of Speech, Language, and Hearing Research, 61(5), 1070–1085. Retrieved from https://jslhr.pubs.asha.org/article.aspx?articleid=2680411

Wong, P. C. & Diehl, R. L. (2003). Perceptual normalization for inter- and intratalker variation in Cantonese level tones. Journal of Speech, Language, and Hearing Research, 46(2), 413–421. doi:10.1044/1092-4388(2003/034)

Wong, S.-L. (1999). Deciding to stay, deciding to move, deciding not to decide. In G. G. Hamilton (Ed.), Cosmopolitan capitalists: Hong Kong & the Chinese diaspora at the end of the 20th century (pp. 135–151). Seattle, WA: University of Washington Press.

Wong, Y. W. (2006). Contextual tonal variations and pitch targets in Cantonese. Proceedings of the 3rd International Conference on Speech Prosody, 317–320. Retrieved from https://www.isca-speech.org/archive/sp2006/papers/sp06_199.pdf

Wong, Y. W. (2007).
Production and perception of tones in Cantonese continuous speech. Master’s thesis, The Chinese University of Hong Kong.

Wong, Y. W. (2011). Sound changes in Hong Kong Cantonese: A multi-perspective study. PhD thesis, The Chinese University of Hong Kong.

Wurm, S. A., Li, R., & Baumann, T. (1987). Language Atlas of China. Australian Academy of the Humanities; Longman Group (Far East).

Xiao, Y. (2006). Heritage learners in the Chinese language classroom: Home background. Heritage Language Journal, 4(1), 47–56.

Yeung, H. H., Chen, K. H., & Werker, J. F. (2013). When does native language input affect phonetic perception? The precocious case of lexical tone. Journal of Memory and Language, 68(2), 123–139. doi:10.1016/j.jml.2012.09.004

Yip, M. (2002). Tone. Cambridge, England: Cambridge University Press.

Yip, V. (2013). Simultaneous language acquisition. In F. Grosjean & P. Li (Eds.), The psycholinguistics of bilingualism (pp. 119–144). Oxford, England: Blackwell.

Yip, V. & Matthews, S. (2007). The bilingual child: Early development and language contact. New York, NY: Cambridge University Press.

Yiu, E. M.-L. & Fok, A. Y.-Y. (1995). Lexical tone disruption in Cantonese aphasic speakers. Clinical Linguistics & Phonetics, 9(1), 79–92. doi:10.3109/02699209508985326

Yiu, S. S. (2013). Cantonese tones and musical intervals. Proceedings of the International Conference on Phonetics of the Languages in China, ICPLC-2013, 155–158. Retrieved from https://hub.hku.hk/bitstream/10722/205625/1/Content.pdf

Yu, A. C. L. (2007). Understanding near mergers: The case of morphological tone in Cantonese. Phonology, 24(1), 187–214. doi:10.1017/S0952675707001157

Yu, H. (2011). The intermittent rhythms of the Cantonese Pacific. In D. R. Gabaccía & D. Hoerder (Eds.), Connecting Seas and Connected Ocean Rims (pp. 393–414). Leiden, The Netherlands: Brill.

Yu, K. M.
(2017). The role of time in phonetic spaces: Temporal resolution in Cantonese tone perception. Journal of Phonetics, 65, 126–144. doi:10.1016/j.wocn.2017.06.004

Yu, K. M. & Lam, H. W. (2014). The role of creaky voice in Cantonese tonal perception. The Journal of the Acoustical Society of America, 136(3), 1320–1333. doi:10.1121/1.4887462

Zee, E. (1991). Chinese (Hong Kong Cantonese). Journal of the International Phonetic Association, 21(1), 46–48. doi:10.1017/S0025100300006058

Zee, E. (1999). Change and variation in the syllable-initial and syllable-final consonants in Hong Kong Cantonese. Journal of Chinese Linguistics, 120–167.

Zhang, C., Peng, G., & Wang, W. S. (2011). Inter-talker variation as a source of confusion in Cantonese tone perception. Proceedings of the 17th International Congress of Phonetic Sciences, 2276–2279. Retrieved from https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2011/OnlineProceedings/RegularSession/Zhang,%20Caicai/Zhang,%20Caicai.pdf

Zhang, C., Peng, G., Wang, X., & Wang, W. S. (2015). Cumulative effects of phonetic context on speech perception. Proceedings of the 18th International Congress of Phonetic Sciences. Retrieved from https://www.internationalphoneticassociation.org/icphs-proceedings/ICPhS2015/Papers/ICPHS0085.pdf

Zhang, Y., Kuhl, P. K., Imada, T., Kotani, M., & Tohkura, Y. (2005). Effects of language experience: Neural commitment to language-specific auditory patterns. NeuroImage, 26(3), 703–720. doi:10.1016/j.neuroimage.2005.02.040

Zheng, H., Peng, G., Tsang, P. W., & Wang, W. S. (2006). Perception of Cantonese level tones influenced by context position. Proceedings of the 3rd International Conference on Speech Prosody. Retrieved from https://www.isca-speech.org/archive/sp2006/papers/sp06_178.pdf

Zhu, Y. (1987).
Analysis of cuing functions of the phonetic in modern China. Unpublished manuscript, East China Normal University. (In Chinese).

Zyzik, E. (2016). Toward a prototype model of the heritage language learner: Understanding strengths and needs. In M. Fairclough & S. M. Beaudrie (Eds.), Innovative strategies for heritage language teaching: A practical guide for the classroom (pp. 19–38). Washington, DC: Georgetown University Press.

Appendix A
Materials Used in the Experiment

This appendix contains materials used in the word-identification experiment described in Chapter 4. Target words listed in Section A.1 were selected based on results of Pilot Study 1 described in Section 4.2. Sentences listed in Section A.2 were selected based on results of Pilot Study 2 described in Section 4.3. Pictures presented in Section A.3 were discussed in Section 4.4.1.2. Instructions shown in Section A.4 were discussed in Section 4.4.1.4. Lastly, the story listening task in Section A.5 was described in Section 4.4.2.1.

A.1 Words

Table A.1 lists the 27 target words.

Table A.1: Words used in the main study

Written form | Word | Meaning
分 | fan1 | share
粉 | fan2 | powder
瞓 | fan3 | sleep
墳 | fan4 | tomb
份 | fan6 | portion
呼 | fu1 | exhale
虎 | fu2 | tiger
褲 | fu3 | pants
扶 | fu4 | help by holding another person’s arm
婦 | fu5 | woman
負 | fu6 | negative
醫 | ji1 | cure
椅 | ji2 | chair
兒 | ji4 | child
耳 | ji5 | ear
二 | ji6 | two
寫 | se2 | write
瀉 | se3 | diarrhea
蛇 | se4 | snake
社 | se5 | society
射 | se6 | shoot
獅 | si1 | lion
屎 | si2 | poop
試 | si3 | try
匙 | si4 | key
市 | si5 | market
士 | si6 | nurse/trained person

Table A.2 summarizes the tonally contrastive quadruplets used in the main study and shows why the number of unique target words was 27.

Table A.2: Tonal quadruplets used in the current study (identical to Table 4.4)

Tone set | Syllable | T1 | T2 | T3 | T4 | T5 | T6
1 2 3 4 | si | lion | poop | try | key | - | -
1 2 3 5 | fu | exhale | tiger | pants | - | woman | -
1 2 3 6 | fan | share | powder | sleep | - | - | portion
1 2 4 5 | ji | cure | chair | - | child | ear | -
1 2 4 6 | si | lion | poop | - | key | - | nurse/trained person
1 2 5 6 | si | lion | poop | - | - | market | nurse/trained person
1 3 4 5 | fu | exhale | - | pants | help | woman | -
1 3 4 6 | fan | share | - | sleep | tomb | - | portion
1 3 5 6 | fu | exhale | - | pants | - | woman | negative
1 4 5 6 | ji | cure | - | - | child | ear | two
2 3 4 5 | se | - | write | diarrhea | snake | society | -
2 3 4 6 | fan | - | powder | sleep | tomb | - | portion
2 3 5 6 | se | - | write | diarrhea | - | society | shoot
2 4 5 6 | ji | - | chair | - | child | ear | two
3 4 5 6 | se | - | - | diarrhea | snake | society | shoot

Total number of unique words: 27

Note that fan5 “diligence”, ji3 “idea”, and se1 “some” were deliberately not used. For a detailed explanation of why they were avoided, see Section 4.2.

A.2 Sentences

The sentences below are grouped into sets by carrier phrase. In each set, all sentences share the same carrier phrase but differ in the lexical tone of the last word. Only one sentence in each set is semantically congruous with the carrier phrase. Incongruous sentences are marked with a pound sign (#). Note that the fu and si series have six sentences (a, b, c, d, e, f) in each set, but the fan, ji, and se series only have five sentences (a, b, c, d, e) in each set. This is because fan5 “diligence”, ji3 “idea”, and se1 “some” were not used for reasons stated in Section 4.2. In the main study, all sentences were randomized. For details, see Section 4.4.2.

(1) a.
叫 好 多 碟 餸 大家 分
giu3 hou2 do1 dip6 sung3 daai6gaa1 fan1
order very many plate food everyone share
‘(Let’s) order many dishes for everyone to share.’

b.
# 叫 好 多 碟 餸 大家 粉
giu3 hou2 do1 dip6 sung3 daai6gaa1 fan2
order very many plate food everyone powder
‘(Let’s) order many dishes for everyone to powder.’

c.
# 叫 好 多 碟 餸 大家 瞓
giu3 hou2 do1 dip6 sung3 daai6gaa1 fan3
order very many plate food everyone sleep
‘(Let’s) order many dishes for everyone to sleep.’

d.
# 叫 好 多 碟 餸 大家 墳
giu3 hou2 do1 dip6 sung3 daai6gaa1 fan4
order very many plate food everyone tomb
‘(Let’s) order many dishes for everyone to tomb.’

e.
# 叫 好 多 碟 餸 大家 份
giu3 hou2 do1 dip6 sung3 daai6gaa1 fan6
order very many plate food everyone portion
‘(Let’s) order many dishes for everyone to portion.’

(2) a.
# 幫 啤啤 搽 啲 爽 身 分
bong1 bi4bi1 caa4 di1 song2 san1 fan1
help baby apply DET dry body share
‘(Please) put some baby dry body share on the baby.’

b.
幫 啤啤 搽 啲 爽 身 粉
bong1 bi4bi1 caa4 di1 song2 san1 fan2
help baby apply DET dry body powder
‘(Please) put some baby powder on the baby.’

c.
# 幫 啤啤 搽 啲 爽 身 瞓
bong1 bi4bi1 caa4 di1 song2 san1 fan3
help baby apply DET dry body sleep
‘(Please) put some dry body sleep on the baby.’

d.
# 幫 啤啤 搽 啲 爽 身 墳
bong1 bi4bi1 caa4 di1 song2 san1 fan4
help baby apply DET dry body tomb
‘(Please) put some dry body tomb on the baby.’

e.
# 幫 啤啤 搽 啲 爽 身 份
bong1 bi4bi1 caa4 di1 song2 san1 fan6
help baby apply DET dry body portion
‘(Please) put some dry body portion on the baby.’

(3) a.
# 十 二 點 鐘 好 上 床 分
sap6 ji6 dim2 zung1 hou2 soeng5 cong4 fan1
ten two point clock better up bed share
‘At twelve o’clock (you’d) better go to bed and share.’

b.
# 十 二 點 鐘 好 上 床 粉
sap6 ji6 dim2 zung1 hou2 soeng5 cong4 fan2
ten two point clock better up bed powder
‘At twelve o’clock (you’d) better go to bed and powder.’

c.
十 二 點 鐘 好 上 床 瞓
sap6 ji6 dim2 zung1 hou2 soeng5 cong4 fan3
ten two point clock better up bed sleep
‘At twelve o’clock (you’d) better go to bed and sleep.’

d.
# 十 二 點 鐘 好 上 床 墳
sap6 ji6 dim2 zung1 hou2 soeng5 cong4 fan4
ten two point clock better up bed tomb
‘At twelve o’clock (you’d) better go to bed and tomb.’

e.
# 十 二 點 鐘 好 上 床 份
sap6 ji6 dim2 zung1 hou2 soeng5 cong4 fan6
ten two point clock better up bed portion
‘At twelve o’clock (you’d) better go to bed and portion.’

(4) a.
# 有 啲 賊 仔 專 掘 山 分
jau5 di1 caak6 zai2 zyun1 gwat6 saan1 fan1
have CL thief DIM specialize dig hill share
‘There are some thieves who specialize in digging up share on the hills.’

b.
# 有 啲 賊 仔 專 掘 山 粉
jau5 di1 caak6 zai2 zyun1 gwat6 saan1 fan2
have CL thief DIM specialize dig hill powder
‘There are some thieves who specialize in digging up powder on the hills.’

c.
# 有 啲 賊 仔 專 掘 山 瞓
jau5 di1 caak6 zai2 zyun1 gwat6 saan1 fan3
have CL thief DIM specialize dig hill sleep
‘There are some thieves who specialize in digging up sleep on the hills.’

d.
有 啲 賊 仔 專 掘 山 墳
jau5 di1 caak6 zai2 zyun1 gwat6 saan1 fan4
have CL thief DIM specialize dig hill tomb
‘There are some thieves who specialize in digging up tombs on the hills.’

e.
# 有 啲 賊 仔 專 掘 山 份
jau5 di1 caak6 zai2 zyun1 gwat6 saan1 fan6
have CL thief DIM specialize dig hill portion
‘There are some thieves who specialize in digging up portion on the hills.’

(5) a.
# 你 幫 我 食 埋 我 嗰 分
nei5 bong1 ngo5 sik6 maai4 ngo5 go2 fan1
2.SG help 1.SG eat ASP 1.SG DEM share
‘You help me eat my share.’

b.
# 你 幫 我 食 埋 我 嗰 粉
nei5 bong1 ngo5 sik6 maai4 ngo5 go2 fan2
2.SG help 1.SG eat ASP 1.SG DEM powder
‘You help me eat my powder.’

c.
# 你 幫 我 食 埋 我 嗰 瞓
nei5 bong1 ngo5 sik6 maai4 ngo5 go2 fan3
2.SG help 1.SG eat ASP 1.SG DEM sleep
‘You help me eat my sleep.’

d.
# 你 幫 我 食 埋 我 嗰 墳
nei5 bong1 ngo5 sik6 maai4 ngo5 go2 fan4
2.SG help 1.SG eat ASP 1.SG DEM tomb
‘You help me eat my tomb.’

e.
你 幫 我 食 埋 我 嗰 份
nei5 bong1 ngo5 sik6 maai4 ngo5 go2 fan6
2.SG help 1.SG eat ASP 1.SG DEM portion
‘You help me eat my portion.’

(6) a.
游 水 換 氣 要 慢 慢 呼
jau4 seoi2 wun6 hei3 jiu3 maan6 maan2 fu1
swim water exchange air should slow slow exhale
‘When (you) swim (you) should exhale slowly.’

b.
# 游 水 換 氣 要 慢 慢 虎
jau4 seoi2 wun6 hei3 jiu3 maan6 maan2 fu2
swim water exchange air should slow slow tiger
‘When (you) swim (you) should tiger slowly.’

c.
# 游 水 換 氣 要 慢 慢 褲
jau4 seoi2 wun6 hei3 jiu3 maan6 maan2 fu3
swim water exchange air should slow slow pant
‘When (you) swim (you) should pant (as in trousers) slowly.’

d.
# 游 水 換 氣 要 慢 慢 扶
jau4 seoi2 wun6 hei3 jiu3 maan6 maan2 fu4
swim water exchange air should slow slow lift
‘When (you) swim (you) should lift (as in helping someone balance by holding his/her arm) slowly.’

e.
# 游 水 換 氣 要 慢 慢 婦
jau4 seoi2 wun6 hei3 jiu3 maan6 maan2 fu5
swim water exchange air should slow slow woman
‘When (you) swim (you) should woman slowly.’

f.
# 游 水 換 氣 要 慢 慢 負
jau4 seoi2 wun6 hei3 jiu3 maan6 maan2 fu6
swim water exchange air should slow slow negative
‘When (you) swim (you) should negative slowly.’

(7) a.
# 動物園 有 兩 隻 老 呼
dung6mat6jyun4 jau5 loeng5 zek3 lou5 fu1
zoo have two CL old exhale
‘There are two old exhale in the zoo.’

b.
動物園 有 兩 隻 老 虎
dung6mat6jyun4 jau5 loeng5 zek3 lou5 fu2
zoo have two CL old tiger
‘There are two tigers in the zoo.’

c.
# 動物園 有 兩 隻 老 褲
dung6mat6jyun4 jau5 loeng5 zek3 lou5 fu3
zoo have two CL old pant
‘There are two old pants in the zoo.’

d.
# 動物園 有 兩 隻 老 扶
dung6mat6jyun4 jau5 loeng5 zek3 lou5 fu4
zoo have two CL old lift
‘There are two old lift (as in helping someone balance by holding his/her arm) in the zoo.’

e.
# 動物園 有 兩 隻 老 婦
dung6mat6jyun4 jau5 loeng5 zek3 lou5 fu5
zoo have two CL old woman
‘There are two old women in the zoo.’

f.
# 動物園 有 兩 隻 老 負
dung6mat6jyun4 jau5 loeng5 zek3 lou5 fu6
zoo have two CL old negative
‘There are two old negatives in the zoo.’

(8) a.
# 佢 唔 鍾意 太 闊 嘅 呼
keoi5 m4 zung1ji3 taai3 fut3 ge3 fu1
3.SG NEG like too wide REL exhale
‘S/he does not like exhale that is too wide.’

b.
# 佢 唔 鍾意 太 闊 嘅 虎
keoi5 m4 zung1ji3 taai3 fut3 ge3 fu2
3.SG NEG like too wide REL tiger
‘S/he does not like tigers that are too wide.’

c.
佢 唔 鍾意 太 闊 嘅 褲
keoi5 m4 zung1ji3 taai3 fut3 ge3 fu3
3.SG NEG like too wide REL pant
‘S/he does not like pants that are too loose.’

d.
# 佢 唔 鍾意 太 闊 嘅 扶
keoi5 m4 zung1ji3 taai3 fut3 ge3 fu4
3.SG NEG like too wide REL lift
‘S/he does not like lift (as in helping someone balance by holding his/her arm) that is too wide.’

e.
# 佢 唔 鍾意 太 闊 嘅 婦
keoi5 m4 zung1ji3 taai3 fut3 ge3 fu5
3.SG NEG like too wide REL woman
‘S/he does not like women that are too wide.’

f.
# 佢 唔 鍾意 太 闊 嘅 負
keoi5 m4 zung1ji3 taai3 fut3 ge3 fu6
3.SG NEG like too wide REL negative
‘S/he does not like negative that is too wide.’

(9) a.
# 婆婆 行 唔 到 要 人 呼
po4po2 haang4 m4 dou2 jiu3 jan4 fu1
grandma walk NEG able need person exhale
‘Grandma cannot walk and needs someone to exhale.’

b.
# 婆婆 行 唔 到 要 人 虎
po4po2 haang4 m4 dou2 jiu3 jan4 fu2
grandma walk NEG able need person tiger
‘Grandma cannot walk and needs someone tiger.’

c.
# 婆婆 行 唔 到 要 人 褲
po4po2 haang4 m4 dou2 jiu3 jan4 fu3
grandma walk NEG able need person pant
‘Grandma cannot walk and needs someone’s pants.’

d.
婆婆 行 唔 到 要 人 扶
po4po2 haang4 m4 dou2 jiu3 jan4 fu4
grandma walk NEG able need person lift
‘Grandma cannot walk and needs someone’s lift (as in helping someone balance by holding his/her arm).’

e.
# 婆婆 行 唔 到 要 人 婦
po4po2 haang4 m4 dou2 jiu3 jan4 fu5
grandma walk NEG able need person woman
‘Grandma cannot walk and needs someone’s woman.’

f.
# 婆婆 行 唔 到 要 人 負
po4po2 haang4 m4 dou2 jiu3 jan4 fu6
grandma walk NEG able need person negative
‘Grandma cannot walk and needs someone’s negative.’

(10) a.
# 要 做 個 有 錢 嘅 貴 呼
jiu3 zou6 go3 jau5 cin2 ge3 gwai3 fu1
need be CL have money GE3 elegant exhale
‘(I) need to be a rich, elegant exhale.’

b.
# 要 做 個 有 錢 嘅 貴 虎
jiu3 zou6 go3 jau5 cin2 ge3 gwai3 fu2
need be CL have money GE3 elegant tiger
‘(I) need to be a rich, elegant tiger.’

c.
# 要 做 個 有 錢 嘅 貴 褲
jiu3 zou6 go3 jau5 cin2 ge3 gwai3 fu3
need be CL have money GE3 elegant pant
‘(I) need to be a rich, elegant pant.’

d.
# 要 做 個 有 錢 嘅 貴 扶
jiu3 zou6 go3 jau5 cin2 ge3 gwai3 fu4
need be CL have money GE3 elegant lift
‘(I) need to be a rich, elegant lift (as in helping someone balance by holding his/her arm).’

e.
要 做 個 有 錢 嘅 貴 婦
jiu3 zou6 go3 jau5 cin2 ge3 gwai3 fu5
need be CL have money GE3 elegant woman
‘(I) need to be a rich, elegant woman.’

f.
# 要 做 個 有 錢 嘅 貴 負
jiu3 zou6 go3 jau5 cin2 ge3 gwai3 fu6
need be CL have money GE3 elegant negative
‘(I) need to be a rich, elegant negative.’

(11) a.
# 磁 場 都 有 分 正 同 呼
ci4 coeng4 dou1 jau5 fan1 zing3 tung4 fu1
magnetic field also have divide positive and exhale
‘There are positive and exhale magnetic fields.’

b.
# 磁 場 都 有 分 正 同 虎
ci4 coeng4 dou1 jau5 fan1 zing3 tung4 fu2
magnetic field also have divide positive and tiger
‘There are positive and tiger magnetic fields.’

c.
# 磁 場 都 有 分 正 同 褲
ci4 coeng4 dou1 jau5 fan1 zing3 tung4 fu3
magnetic field also have divide positive and pant
‘There are positive and pant magnetic fields.’

d.
# 磁 場 都 有 分 正 同 扶
ci4 coeng4 dou1 jau5 fan1 zing3 tung4 fu4
magnetic field also have divide positive and lift
‘There are positive and lift (as in helping someone balance by holding his/her arm) magnetic fields.’

e.
# 磁 場 都 有 分 正 同 婦
ci4 coeng4 dou1 jau5 fan1 zing3 tung4 fu5
magnetic field also have divide positive and woman
‘There are positive and woman magnetic fields.’

f.
磁 場 都 有 分 正 同 負
ci4 coeng4 dou1 jau5 fan1 zing3 tung4 fu6
magnetic field also have divide positive and negative
‘There are positive and negative magnetic fields.’

(12) a.
呢 種 怪 病 係 冇 得 醫
lei1 zung2 gwaai3 beng6 hai6 mou5 dak1 ji1
DEM type strange disease be NEG able cure
‘There is no cure for this type of strange disease.’

b.
# 呢 種 怪 病 係 冇 得 椅
lei1 zung2 gwaai3 beng6 hai6 mou5 dak1 ji2
DEM type strange disease be NEG able chair
‘There is no chair for this type of strange disease.’

c.
# 呢 種 怪 病 係 冇 得 兒
lei1 zung2 gwaai3 beng6 hai6 mou5 dak1 ji4
DEM type strange disease be NEG able child
‘There is no child for this type of strange disease.’

d.
# 呢 種 怪 病 係 冇 得 耳
lei1 zung2 gwaai3 beng6 hai6 mou5 dak1 ji5
DEM type strange disease be NEG able ear
‘There is no ear for this type of strange disease.’

e.
# 呢 種 怪 病 係 冇 得 二
lei1 zung2 gwaai3 beng6 hai6 mou5 dak1 ji6
DEM type strange disease be NEG able two
‘There is no two for this type of strange disease.’

(13) a.
# 媽咪 話 要 坐 嗰 張 醫
maa1mi4 waa6 jiu3 co5 go2 zoeng1 ji1
Mommy say want sit DEM CL cure
‘Mommy said she wanted to sit on that cure.’

b.
媽咪 話 要 坐 嗰 張 椅
maa1mi4 waa6 jiu3 co5 go2 zoeng1 ji2
Mommy say want sit DEM CL chair
‘Mommy said she wanted to sit on that chair.’

c.
# 媽咪 話 要 坐 嗰 張 兒
maa1mi4 waa6 jiu3 co5 go2 zoeng1 ji4
Mommy say want sit DEM CL child
‘Mommy said she wanted to sit on that child.’

d.
# 媽咪 話 要 坐 嗰 張 耳
maa1mi4 waa6 jiu3 co5 go2 zoeng1 ji5
Mommy say want sit DEM CL ear
‘Mommy said she wanted to sit on that ear.’

e.
# 媽咪 話 要 坐 嗰 張 二
maa1mi4 waa6 jiu3 co5 go2 zoeng1 ji6
Mommy say want sit DEM CL two
‘Mommy said she wanted to sit on that two.’

(14) a.
# 兩 歲 以下 就 算 幼 醫
loeng5 seoi3 ji5haa6 zau6 syun3 jau3 ji1
two year below then consider young cure
‘(Babies) below two years of age are considered young cure.’

b.
# 兩 歲 以下 就 算 幼 椅
loeng5 seoi3 ji5haa6 zau6 syun3 jau3 ji2
two year below then consider young chair
‘(Babies) below two years of age are considered young chairs.’

c.
兩 歲 以下 就 算 幼 兒
loeng5 seoi3 ji5haa6 zau6 syun3 jau3 ji4
two year below then consider young child
‘(Babies) below two years of age are considered infants.’

d.
# 兩 歲 以下 就 算 幼 耳
loeng5 seoi3 ji5haa6 zau6 syun3 jau3 ji5
two year below then consider young ear
‘(Babies) below two years of age are considered young ears.’

e.
# 兩 歲 以下 就 算 幼 二
loeng5 seoi3 ji5haa6 zau6 syun3 jau3 ji6
two year below then consider young two
‘(Babies) below two years of age are considered young two.’

(15) a.
# 小 心 清潔 對 眼 同 醫
siu2 sam1 cing1git3 deoi3 ngaan5 tung4 ji1
small heart clean pair eye and cure
‘Clean (your) eyes and cure carefully.’

b.
# 小 心 清潔 對 眼 同 椅
siu2 sam1 cing1git3 deoi3 ngaan5 tung4 ji2
small heart clean pair eye and chair
‘Clean (your) eyes and chairs carefully.’

c.
# 小 心 清潔 對 眼 同 兒
siu2 sam1 cing1git3 deoi3 ngaan5 tung4 ji4
small heart clean pair eye and child
‘Clean (your) eyes and children carefully.’

d.
小 心 清潔 對 眼 同 耳
siu2 sam1 cing1git3 deoi3 ngaan5 tung4 ji5
small heart clean pair eye and ear
‘Clean (your) eyes and ears carefully.’

e.
# 小 心 清潔 對 眼 同 二
siu2 sam1 cing1git3 deoi3 ngaan5 tung4 ji6
small heart clean pair eye and two
‘Clean (your) eyes and two carefully.’

(16) a.
# 一 加 一 結果 等如 醫
jat1 gaa1 jat1 git3gwo2 dang2jyu4 ji1
one add one result equal cure
‘One plus one equals cure.’

b.
# 一 加 一 結果 等如 椅
jat1 gaa1 jat1 git3gwo2 dang2jyu4 ji2
one add one result equal chair
‘One plus one equals chair.’

c.
# 一 加 一 結果 等如 兒
jat1 gaa1 jat1 git3gwo2 dang2jyu4 ji4
one add one result equal child
‘One plus one equals child.’

d.
# 一 加 一 結果 等如 耳
jat1 gaa1 jat1 git3gwo2 dang2jyu4 ji5
one add one result equal ear
‘One plus one equals ear.’

e.
一 加 一 結果 等如 二
jat1 gaa1 jat1 git3gwo2 dang2jyu4 ji6
one add one result equal two
‘One plus one equals two.’

(17) a.
我 有 好 多 字 唔 識 寫
ngo5 jau5 hou2 do1 zi6 m4 sik1 se2
1.SG have very many word NEG know write
‘There are many words that I don’t know how to write.’

b.
# 我 有 好 多 字 唔 識 瀉
ngo5 jau5 hou2 do1 zi6 m4 sik1 se3
1.SG have very many word NEG know diarrhea
‘There are many words that I don’t know how to diarrhea.’

c.
# 我 有 好 多 字 唔 識 蛇
ngo5 jau5 hou2 do1 zi6 m4 sik1 se4
1.SG have very many word NEG know snake
‘There are many words that I don’t know how to snake.’

d.
# 我 有 好 多 字 唔 識 社
ngo5 jau5 hou2 do1 zi6 m4 sik1 se5
1.SG have very many word NEG know society
‘There are many words that I don’t know how to society.’

e.
# 我 有 好 多 字 唔 識 射
ngo5 jau5 hou2 do1 zi6 m4 sik1 se6
1.SG have very many word NEG know shoot
‘There are many words that I don’t know how to shoot.’

(18) a.
# 前 日 我 食 錯 嘢 肚 寫
cin4 jat6 ngo5 sik6 co3 je5 tou5 se2
before day 1.SG eat wrong thing belly write
‘The day before yesterday I ate something bad and got belly write.’

b.
前 日 我 食 錯 嘢 肚 瀉
cin4 jat6 ngo5 sik6 co3 je5 tou5 se3
before day 1.SG eat wrong thing belly diarrhea
‘The day before yesterday I ate something bad and got diarrhea.’

c.
# 前 日 我 食 錯 嘢 肚 蛇
cin4 jat6 ngo5 sik6 co3 je5 tou5 se4
before day 1.SG eat wrong thing belly snake
‘The day before yesterday I ate something bad and got belly snake.’

d.
# 前 日 我 食 錯 嘢 肚 社
cin4 jat6 ngo5 sik6 co3 je5 tou5 se5
before day 1.SG eat wrong thing belly society
‘The day before yesterday I ate something bad and got belly society.’

e.
# 前 日 我 食 錯 嘢 肚 射
cin4 jat6 ngo5 sik6 co3 je5 tou5 se6
before day 1.SG eat wrong thing belly shoot
‘The day before yesterday I ate something bad and got belly shoot.’

(19) a.
# 行 山 時 小 心 有 毒 寫
haang4 saan1 si4 siu2 sam1 jau5 duk6 se2
walk hill time small heart have poison write
‘Beware of poisonous write when (you) go hiking.’

b.
# 行 山 時 小 心 有 毒 瀉
haang4 saan1 si4 siu2 sam1 jau5 duk6 se3
walk hill time small heart have poison diarrhea
‘Beware of poisonous diarrhea when (you) go hiking.’

c.
行 山 時 小 心 有 毒 蛇
haang4 saan1 si4 siu2 sam1 jau5 duk6 se4
walk hill time small heart have poison snake
‘Beware of poisonous snakes when (you) go hiking.’

d.
# 行 山 時 小 心 有 毒 社
haang4 saan1 si4 siu2 sam1 jau5 duk6 se5
walk hill time small heart have poison society
‘Beware of poisonous society when (you) go hiking.’

e.
# 行 山 時 小 心 有 毒 射
haang4 saan1 si4 siu2 sam1 jau5 duk6 se6
walk hill time small heart have poison shoot
‘Beware of poisonous shoot when (you) go hiking.’

(20) a.
# 我 公公 有 個 粵 劇 寫
ngo5 gung1gung1 jau5 go3 jyut6 kek6 se2
1.SG grandpa have CL Cantonese opera write
‘My grandpa has a Cantonese opera write.’

b.
# 我 公公 有 個 粵 劇 瀉
ngo5 gung1gung1 jau5 go3 jyut6 kek6 se3
1.SG grandpa have CL Cantonese opera diarrhea
‘My grandpa has a Cantonese opera diarrhea.’

c.
# 我 公公 有 個 粵 劇 蛇
ngo5 gung1gung1 jau5 go3 jyut6 kek6 se4
1.SG grandpa have CL Cantonese opera snake
‘My grandpa has a Cantonese opera snake.’

d.
我 公公 有 個 粵 劇 社
ngo5 gung1gung1 jau5 go3 jyut6 kek6 se5
1.SG grandpa have CL Cantonese opera society
‘My grandpa has a Cantonese opera society.’

e.
# 我 公公 有 個 粵 劇 射
ngo5 gung1gung1 jau5 go3 jyut6 kek6 se6
1.SG grandpa have CL Cantonese opera shoot
‘My grandpa has a Cantonese opera shoot.’

(21) a.
# 開 槍 要 對 準 目標 寫
hoi1 coeng1 jiu3 deoi3 zeon2 muk6biu1 se2
open gun need aim accurate target write
‘(You) should aim at the target and write when using a gun.’

b.
# 開 槍 要 對 準 目標 瀉
hoi1 coeng1 jiu3 deoi3 zeon2 muk6biu1 se3
open gun need aim accurate target diarrhea
‘(You) should aim at the target and diarrhea when using a gun.’

c.
# 開 槍 要 對 準 目標 蛇
hoi1 coeng1 jiu3 deoi3 zeon2 muk6biu1 se4
open gun need aim accurate target snake
‘(You) should aim at the target and snake when using a gun.’

d.
# 開 槍 要 對 準 目標 社
hoi1 coeng1 jiu3 deoi3 zeon2 muk6biu1 se5
open gun need aim accurate target society
‘(You) should aim at the target and society when using a gun.’

e.
開 槍 要 對 準 目標 射
hoi1 coeng1 jiu3 deoi3 zeon2 muk6biu1 se6
open gun need aim accurate target shoot
‘(You) should aim at the target and shoot when using a gun.’

(22) a.
過年 唐人街 有 舞 獅
gwo3nin4 tong4jan4gaai1 jau5 mou5 si1
Lunar.New.Year Chinatown have dance lion
‘There is lion dance in Chinatown during the Lunar New Year.’

b.
# 過年 唐人街 有 舞 屎
gwo3nin4 tong4jan4gaai1 jau5 mou5 si2
Lunar.New.Year Chinatown have dance poop
‘There is poop dance in Chinatown during the Lunar New Year.’

c.
# 過年 唐人街 有 舞 試
gwo3nin4 tong4jan4gaai1 jau5 mou5 si3
Lunar.New.Year Chinatown have dance try
‘There is try dance in Chinatown during the Lunar New Year.’

d.
# 過年 唐人街 有 舞 匙
gwo3nin4 tong4jan4gaai1 jau5 mou5 si4
Lunar.New.Year Chinatown have dance key
‘There is key dance in Chinatown during the Lunar New Year.’

e.
# 過年 唐人街 有 舞 市
gwo3nin4 tong4jan4gaai1 jau5 mou5 si5
Lunar.New.Year Chinatown have dance market
‘There is market dance in Chinatown during the Lunar New Year.’

f.
# 過年 唐人街 有 舞 士
gwo3nin4 tong4jan4gaai1 jau5 mou5 si6
Lunar.New.Year Chinatown have dance trained.person
‘There is trained person dance in Chinatown during the Lunar New Year.’

(23) a.
# 放 狗 記住 執 番 啲 獅
fong3 gau2 gei3zyu6 zap1 faan1 di1 si1
release dog remember pick.up back DET lion
‘When you walk your dog, remember to pick up the lion.’

b.
放 狗 記住 執 番 啲 屎
fong3 gau2 gei3zyu6 zap1 faan1 di1 si2
release dog remember pick.up back DET poop
‘When you walk your dog, remember to pick up the poop.’

c.
# 放 狗 記住 執 番 啲 試
fong3 gau2 gei3zyu6 zap1 faan1 di1 si3
release dog remember pick.up back DET try
‘When you walk your dog, remember to pick up the try.’

d.
# 放 狗 記住 執 番 啲 匙
fong3 gau2 gei3zyu6 zap1 faan1 di1 si4
release dog remember pick.up back DET key
‘When you walk your dog, remember to pick up the key.’

e.
# 放 狗 記住 執 番 啲 市
fong3 gau2 gei3zyu6 zap1 faan1 di1 si5
release dog remember pick.up back DET market
‘When you walk your dog, remember to pick up the market.’

f.
# 放 狗 記住 執 番 啲 士
fong3 gau2 gei3zyu6 zap1 faan1 di1 si6
release dog remember pick.up back DET trained.person
‘When you walk your dog, remember to pick up the trained person.’

(24) a.
# 今 次 唔 得 下 次 再 獅
gam1 ci3 m4 dak1 haa6 ci3 zoi3 si1
this time NEG okay next time again lion
‘If you fail this time, lion again next time.’

b.
# 今 次 唔 得 下 次 再 屎
gam1 ci3 m4 dak1 haa6 ci3 zoi3 si2
this time NEG okay next time again poop
‘If you fail this time, poop again next time.’

c.
今 次 唔 得 下 次 再 試
gam1 ci3 m4 dak1 haa6 ci3 zoi3 si3
this time NEG okay next time again try
‘If you fail this time, try again next time.’

d.
# 今 次 唔 得 下 次 再 匙
gam1 ci3 m4 dak1 haa6 ci3 zoi3 si4
this time NEG okay next time again key
‘If you fail this time, key again next time.’

e.
# 今 次 唔 得 下 次 再 市
gam1 ci3 m4 dak1 haa6 ci3 zoi3 si5
this time NEG okay next time again market
‘If you fail this time, market again next time.’

f.
# 今 次 唔 得 下 次 再 士
gam1 ci3 m4 dak1 haa6 ci3 zoi3 si6
this time NEG okay next time again trained.person
‘If you fail this time, trained person again next time.’

(25) a.
# 出 門 口 記得 帶 鎖 獅
ceot1 mun4 hau2 gei3dak1 daai3 so2 si1
exit door mouth remember bring lock lion
‘Remember to bring lock lion when leaving the house.’

b.
# 出 門 口 記得 帶 鎖 屎
ceot1 mun4 hau2 gei3dak1 daai3 so2 si2
exit door mouth remember bring lock poop
‘Remember to bring lock poop when leaving the house.’

c.
# 出 門 口 記得 帶 鎖 試
ceot1 mun4 hau2 gei3dak1 daai3 so2 si3
exit door mouth remember bring lock try
‘Remember to bring lock try when leaving the house.’

d.
出 門 口 記得 帶 鎖 匙
ceot1 mun4 hau2 gei3dak1 daai3 so2 si4
exit door mouth remember bring lock key
‘Remember to bring (your) key when leaving the house.’

e.
# 出 門 口 記得 帶 鎖 市
ceot1 mun4 hau2 gei3dak1 daai3 so2 si5
exit door mouth remember bring lock market
‘Remember to bring lock market when leaving the house.’

f.
# 出 門 口 記得 帶 鎖 士
ceot1 mun4 hau2 gei3dak1 daai3 so2 si6
exit door mouth remember bring lock trained.person
‘Remember to bring lock trained person when leaving the house.’

(26) a.
# 啲 經紀 睇 實 個 股 獅
di1 ging1gei2 tai2 sat6 go3 gu2 si1
DET broker watch closely CL stock lion
‘The brokers are watching the stock lion closely.’

b.
# 啲 經紀 睇 實 個 股 屎
di1 ging1gei2 tai2 sat6 go3 gu2 si2
DET broker watch closely CL stock poop
‘The brokers are watching the stock poop closely.’

c.
# 啲 經紀 睇 實 個 股 試
di1 ging1gei2 tai2 sat6 go3 gu2 si3
DET broker watch closely CL stock try
‘The brokers are watching the stock try closely.’

d.
# 啲 經紀 睇 實 個 股 匙
di1 ging1gei2 tai2 sat6 go3 gu2 si4
DET broker watch closely CL stock key
‘The brokers are watching the stock key closely.’

e.
啲 經紀 睇 實 個 股 市
di1 ging1gei2 tai2 sat6 go3 gu2 si5
DET broker watch closely CL stock market
‘The brokers are watching the stock market closely.’

f.
#di1DET經紀ging1gei2broker睇tai2watch實sat6closely個go3CL股gu2stock士si6trained person‘The brokers are watching the stock trained person closely.’259(27) a. # 手術sau2seot6operation室sat1room入jap6in面min6side有jau5have護wu6care獅si1lion‘There are care lions in the operating theatre.’b. # 手術sau2seot6operation室sat1room入jap6in面min6side有jau5have護wu6care屎si2poop‘There is care poop in the operating theatre.’c. # 手術sau2seot6operation室sat1room入jap6in面min6side有jau5have護wu6care試si3try‘There is care try in the operating theatre.’d. # 手術sau2seot6operation室sat1room入jap6in面min6side有jau5have護wu6care匙si4key‘There are care keys in the operating theatre.’e. # 手術sau2seot6operation室sat1room入jap6in面min6side有jau5have護wu6care市si5market‘There are care markets in the operating theatre.’f. 手術sau2seot6operation室sat1room入jap6in面min6side有jau5have護wu6care士si6trained person‘There are nurses in the operating theatre.’260A.3 PicturesThe following pictures were used in the main study.Figure A.1: fan1 “share” Figure A.2: fan2 “powder”Figure A.3: fan3 “sleep” Figure A.4: fan4 “tomb”261Figure A.5: fan6 “portion” Figure A.6: fu1 “exhale”Figure A.7: fu2 “tiger” Figure A.8: fu3 “pants”262Figure A.9: fu4 “help by holdinganother person’s arm”Figure A.10: fu5 “woman”Figure A.11: fu6 “negative” Figure A.12: ji1 “cure”263Figure A.13: ji2 “chair” Figure A.14: ji4 “infant/child”Figure A.15: ji5 “ear” Figure A.16: ji6 “two”264Figure A.17: se2 “write” Figure A.18: se3 “diarrhea”Figure A.19: se4 “snake” Figure A.20: se5 “society”265Figure A.21: se6 “shoot” Figure A.22: si1 “lion”Figure A.23: si2 “poop” Figure A.24: si3 “try”266Figure A.25: si4 “key” Figure A.26: si5 “market”Figure A.27: si6 “nurse”267The following pictures were used for practice trials.Figure A.28: zoeng1 “piece (ofpaper)”Figure A.29: zoeng2 “prize”Figure A.30: zoeng3 “sauce” Figure A.31: zoeng6 “elephant”268The following pictures were used for the story listening task.Figure A.32: jan4 “human”Figure A.33: taai3 joeng4 “thesun”Figure A.34: bak1 fung1 “thenorth 
wind”269A.4 InstructionsNote that all instructions were given in spoken Cantonese in the experiment.The text in written Cantonese, romanization, and English translation beloware for readers’ reference only.A.4.1 Written CantoneseInstructions for the story listening taskInstructions for the picture learning taskInstructions for the first experimental block (Types 1, 2, 3)270Instructions for the second experimental block (Type 4)Instructions for the third experimental block (Types 5A, 5B, 6A, 6B)A.4.2 RomanizationInstructions for the story listening taskJi4 gaa1 nei5 wui5 teng1 jat1 go3 gu2 zai2. Jyu4 gwo2 ji5 tung2 jau5 man6tai4, ho2 ji5 waa6 bei2 jin4 gau3 jyun4 zi1. Teng1 jyun4 gu2 zai2 zi1 hau6ngo5 wui5 man6 nei5 jat1 go3 man6 tai4. Tau4 sin1 go3 gu2 zai2 leoi5 min6bin1 go3 jeng4 zo2 le1? Cing2 gam6 button box soeng6 min6 ge3 sou3 zi6wui6 daap3.Instructions for the picture learning taskGan1 zyu6 lok6 lai4 ngo5 wui5 gaai3 siu6 zan6 gaan1 sat6 jim6 seoi1 jiu3jung6 ge3 tou4 waa2. Cing2 nei5 lau4 sam1 teng1, teng1 jyun4 zi1 hau6gam6 button box soeng6 min6 jam6 ho4 jat1 go3 zai3 gai3 zuk6 zau6 ho2 ji5laa3. Jyu4 gwo2 zeon2 bei6 hou2, cing2 nei5 ji4 gaa1 zau6 gam6 button boxsoeng6 min6 jam6 ho4 jat1 go3 zai3.271Instructions for the first experimental block (Types 1, 2, 3)Gan1 zyu6 lok6 lai4 nei5 wui5 teng1 dou2 di1 zi6. Bin1 fuk1 tou4 waa2doi6 biu2 nei5 teng1 dou2 ge3 zi6 le1? Nei5 zi2 ho2 ji5 gaan2 jat1 fuk1 tou4waa2, gam6 button box soeng6 min6 ge3 sou3 zi6 wui4 daap3. Cing2 lau4ji6, jau5 si4 nei5 ho2 nang4 wui5 teng1 m4 cing1 co2 mau5 di1 zi6, hai6dak6 dang1 ge2. Zi2 jiu3 zeon6 nei5 ge3 nang4 lik6 wui4 daap3 zau6 dak1gaa3 laa3. Zeon2 bei6 hou2 ge3 waa2 cing2 nei5 gam6 button box soeng6min6 jam6 ho4 jat1 go3 zai3 hoi1 ci2.Instructions for the second experimental block (Type 4)Gan1 zyu6 lok6 lai4 nei5 wui5 teng1 dou2 di1 geoi3 zi2. Geoi3 zi2 zeoi3 hau6go2 go3 hai6 mat1 je5 zi6 le1? 
Cing2 lau4 ji3, geoi3 zi2 teng1 lok6 heoi3 ho2 nang4 jau5 siu2 siu2 m4 zi6 jin6, hai6 dak6 dang1 ge2. Nei5 zi2 jiu3 zeon6 nei5 ge3 nang4 lik6 wui4 daap3 zau6 dak1 laa3. Jyu4 gwo2 m4 ming4 ho2 ji5 man6 jin4 gau3 jyun4. Ming4 baak3 ge3 waa2 ho2 ji5 gam6 button box soeng6 min6 jam6 ho4 jat1 go3 zai3 hoi1 ci2.

Instructions for the third experimental block (Types 5A, 5B, 6A, 6B)

Gan1 zyu6 lok6 lai4 nei5 wui5 teng1 dou2 di1 geoi3 zi2. Geoi3 zi2 zeoi3 hau6 go2 go3 hai6 mat1 je5 zi6 le1? Tung4 soeng6 go3 bou6 fan6 jat1 joeng6, jau5 di1 zi6 m4 hai6 gam3 cing1 co2. Nei5 zeon6 lik6 wui4 daap3 zau6 dak1 gaa3 laa3. Cing2 lau4 ji3, jau5 di1 geoi3 zi2 ge3 ji3 si1 hai6 gwaai3 gwaai2 dei2, hai6 dak6 dang1 ge2. Nei5 zi2 jiu3 wui4 daap3 nei5 teng1 dou2 ge3 zeoi3 hau6 go2 go3 zi6 hai6 mat1 zau6 dak1 laa3. Lai6 jyu4 jyu4 gwo2 nei5 teng1 dou2 “gam6 do1 hok6 saang1 ge3 meng2 hou2 naan4 gei3”, gam2 nei5 jing1 goi1 gaan2 “gei3 dak1” go3 “gei3”. Daan6 hai6 jyu4 gwo2 nei5 teng1 dou2 “gam6 do1 hok6 saang1 ge3 meng2 hou2 naan4 gei1”, gam2 nei5 jing1 goi1 gaan2 “fei1 gei1” go3 “gei1”, ji4 m4 hai6 “gei3 dak1” go3 “gei3”. Jyu4 gwo2 m4 ming4, ho2 ji5 man6 jin4 gau3 jyun4. Zeon2 bei6 hou2 ge3 waa2 cing2 nei5 gam6 button box soeng6 min6 jam6 ho4 jat1 go3 zai3 hoi1 ci2.

A.4.3 English translation

Instructions for the story listening task

Now you are going to listen to a story. If your headphones are not working, please let the experimenter know. After the story, I will ask you a question. Who won in the story? Please respond by pressing a number on the button box.

Instructions for the picture learning task

I am going to introduce the pictures to be used in the experiment. Please listen carefully, and press any button on the box to continue. When you are ready, press any button on the box to start.

Instructions for the first experimental block (Types 1, 2, 3)

You are going to listen to some words. Which picture represents the word that you heard? You can only choose one picture and respond by using the button box.
Note that sometimes the word may be unclear; this is intentional. Just try your best to answer. When you are ready, press any button on the box to start.

Instructions for the second experimental block (Type 4)

You are going to listen to some sentences. What is the last word of the sentence? Note that the sentences may sound unnatural; this is intentional. Just try your best to answer. If you have questions, please let the experimenter know. When you are ready, press any button on the box to start.

Instructions for the third experimental block (Types 5A, 5B, 6A, 6B)

You are going to listen to some sentences. What is the last word of the sentence? As in the previous section, some words may be unclear, and you just need to try your best. Note that some sentences may not make sense; this is intentional. All you need to do is identify the last word that you heard. For example, if you hear “there are too many student names to remember”, then you should respond with “remember”. However, if you hear “there are too many student names to airplane”, then you should respond with “airplane”, not “remember”. If you have questions, please let the experimenter know. When you are ready, press any button on the box to start.

A.5 Story used for the story listening task

Note that participants in the study only heard the audio file and were not given any written text.

A.5.1 Written Cantonese

A.5.2 Romanization

Jau5 jat1 jat6, bak1 fung1 tung4 taai3 joeng4 hai2 dou6 cou4 gau3 ging2 bin1 go3 sai1 lei di1. Ni1 go3 si4 hau6, gam3 aam1 jau5 go3 zeok3 zyu6 jat1 gin6 daai6 lau1 ge3 lou6 jan4 ging1 gwo3. Jyu1 si6, keoi5 dei6 kyut3 ding6 bin1 go3 nang4 gau3 ling6 dou3 go2 go3 jan4 mok1 zo2 keoi5 gin6 lau1 ge3 waa2, bin1 go3 zau6 jeng4.

Bak1 fung1 zeon6 lik6 gam2 ceoi1, daan6 hai6 jyut6 ceoi1 dak1 daai6 lik6, lou6 jan4 faan2 ji4 zoeng1 gin6 lau1 meng1 dak1 jyut6 gan2. Bak1 fung1 wai4 jau5 fong3 hei3.
Leon4 dou3 taai3 joeng4 ceot1 maa5 ge3 si4 hau6, taai3 joeng4 maang5 lik6 gam2 saai3, saai3 dou3 lou6 jan4 lau4 saai3 daai6 hon6, ji4 ce2 zik1 hak1 zoeng1 gin6 lau1 ceoi4 zo2 lok6 lai4. Zeoi3 hau6, bak1 fung1 wai4 jau5 sing4 jing6 taai3 joeng4 bei2 keoi5 sai1 lei.

A.5.3 English Translation

The Wind and the Sun were disputing which was the stronger. Suddenly they saw a traveller coming down the road. They decided that whichever of the two could make the traveller take off his cloak would be regarded as the stronger.

The Wind began to blow as hard as he could upon the traveller. But the harder he blew, the more closely did the traveller wrap his cloak round him, till at last the Wind had to give up in despair. Then the Sun came out and shone in all his glory upon the traveller, who soon found it too hot to walk with his cloak on. In the end, the Wind had to admit that the Sun was the stronger.

Appendix B
Language background questionnaire

Participants were asked to fill out this questionnaire on a computer at the end of the experiment session on Day 1.

The following questions were from the Bilingual Language Profile (Birdsong et al., 2012): 1, 2, 5, 7, 9, 10, 11, 12, 24, 25, 26, 27, 28.

The following questions were added for the purpose of this study: 3, 4, 6, 8, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 29, 30.

For a detailed discussion of this questionnaire, see Section 4.4.3.2.

1. What is your age?¹
¹ This is a drop-down menu with options from “17 or below”, “18”, “19” ... up to “60+”.

2. What gender do you identify yourself with?
   ( ) Male
   ( ) Female
   ( ) Other: ______

3. Do you have any hearing disorder?
   ( ) Yes (Please specify: ______)
   ( ) No

4. What is your dominant language?
   ( ) Cantonese
   ( ) English
   ( ) Mandarin
   ( ) A Chinese dialect (e.g. Hokkien, Hakka); Please specify: ______
   ( ) Other (e.g. French, Japanese); Please specify: ______

5. How would you rate your proficiency of the following?
   0 = not at all; 6 = very well
   Cantonese - listening   0 1 2 3 4 5 6
   Cantonese - speaking    0 1 2 3 4 5 6
   Cantonese - reading     0 1 2 3 4 5 6
   Cantonese - writing     0 1 2 3 4 5 6
   English - listening     0 1 2 3 4 5 6
   English - speaking      0 1 2 3 4 5 6
   English - reading       0 1 2 3 4 5 6
   English - writing       0 1 2 3 4 5 6

6. “A heritage speaker of Cantonese is someone who in early childhood grew up with Cantonese (and possibly other languages) in the environment; however, by school age or shortly after, English was the primary language used in day-to-day situations.”
   Based on this definition, would you describe yourself as a heritage speaker of Cantonese?
   ( ) Yes, I consider myself a heritage speaker of Cantonese.
   ( ) No, I do not consider myself a heritage speaker of Cantonese.

7. What is your current place of residence?
   City: ______
   State (optional): ______
   Country: ______

8. Please indicate the cities and countries that you have lived in along with how old you were when you lived there. List first the place where you were born, and list each town/city you have lived in.
   Example:
   Toronto, Canada - birth to 3
   Hong Kong - 4 to 10
   Vancouver, Canada - 11 to 19

9. At what age did you...?²
   Start learning Cantonese
   Start learning English
² Options on this drop-down menu are “Never”, “Since birth”, “1”, “2” ... up to “20+”.

10. At what age did you...?³
   Start to feel comfortable using Cantonese
   Start to feel comfortable using English
³ Options on this drop-down menu are “Not yet”, “As early as I can remember”, “1”, “2” ... up to “20+”.

11. How many years have you...?⁴
   Had classes (history, math, literature, etc.) in Cantonese (primary school through university, including Saturday School)
   Had classes (history, math, literature, etc.) in English (primary school through university)
   Spent in a country/region where Cantonese is spoken
   Spent in a country/region where English is spoken
   Spent in a family where Cantonese is spoken
   Spent in a family where English is spoken
   Spent in a work environment where Cantonese is spoken
   Spent in a work environment where English is spoken
⁴ Options on this drop-down menu are “0”, “1”, “2” ... up to “20+”.

12. What is your highest level of formal education?
   If you are a current undergraduate student, choose “Undergraduate degree”.
   ( ) Kindergarten
   ( ) Elementary/primary school
   ( ) Junior high/middle school
   ( ) High school diploma or equivalent
   ( ) Undergraduate degree
   ( ) Graduate degree

13. What was/is the main language of instruction in your...?⁵
   Kindergarten
   Elementary/primary school
   Junior high/middle school
   High school diploma or equivalent
   Undergraduate degree
   Graduate degree
⁵ Options on this drop-down menu are: “Cantonese”, “English”, “Mandarin”, “Other”, and “Not applicable”.

14. Have you ever attended a Chinese school in North America?
   ( ) Yes⁶
   ( ) No⁷
⁶ Participants who answered “yes” would be directed to a page that contained Questions 15 to 17, and then continue with Question 18.
⁷ Participants who answered “no” would be directed to Question 18.

15. Where was the Chinese school that you attended? (City, State, Country)

16. From what age to what age did you attend the Chinese school?
   Example: 6 to 14

17. What was the language of instruction of the Chinese school that you attended?
   ( ) Cantonese
   ( ) Mandarin

18. In an average week, what percentage of time do you use the following languages with family? The percentages should add up to 100.⁸
   Cantonese
   Another Chinese dialect (e.g. Hoisan/Toisan/Taishan, Zhongshan, Hakka, Hokkien, Teochew)
   Mandarin
   English
   Other languages
⁸ Options on this drop-down menu are “0%”, “10%”, “20%” ... up to “100%”.

19. Is your father an immigrant to Canada?
   ( ) Yes. He is originally from: ______
   ( ) No, he was born in Canada.
   ( ) No, he does not live in Canada.
   ( ) Not applicable. I was adopted.

20. Is your mother an immigrant to Canada?
   ( ) Yes. She is originally from: ______
   ( ) No, she was born in Canada.
   ( ) No, she does not live in Canada.
   ( ) Not applicable. I was adopted.

21. What was/is the native language of your...?⁹
   Father
   Mother
   Paternal grandfather
   Paternal grandmother
   Maternal grandfather
   Maternal grandmother
⁹ Options on this drop-down menu are “Cantonese”, “Another Chinese dialect (e.g. Hoisan/Toisan/Taishan, Zhongshan, Hakka, Hokkien, Teochew)”, “English”, “Mandarin”, “Other”, and “I don’t know”.

22. If you have chosen “Other” or “Another Chinese dialect” in the previous question, please specify below which language your family member speaks. Leave this question blank if it is not applicable to you.
   Example: My father speaks Toisan. My mother speaks Tagalog.

23. What language(s) is/are used in the following contexts? Please specify the percentage. If not applicable, leave that box blank.
   Example: 10% Cantonese, 90% English
   You speaking to your father
   Your father speaking to you
   You speaking to your mother
   Your mother speaking to you
   Your parents speaking to each other
   You speaking to your sibling(s)
   Your sibling(s) speaking to you
   You speaking to your grandparent(s)
   Your grandparent(s) speaking to you

24. In an average week, what percentage of time do you use the following languages with friends? The percentages should add up to 100.¹⁰
   Cantonese
   Another Chinese dialect (e.g. Hoisan/Toisan/Taishan, Zhongshan, Hakka, Hokkien, Teochew)
   Mandarin
   English
   Other languages
¹⁰ Options on the drop-down menus of Questions 24-27 are “0%”, “10%”, “20%” ... up to “100%”.

25. In an average week, what percentage of time do you use the following languages at school or work? The percentages should add up to 100.
   Cantonese
   Another Chinese dialect (e.g. Hoisan/Toisan/Taishan, Zhongshan, Hakka, Hokkien, Teochew)
   Mandarin
   English
   Other languages

26. When you talk to yourself, how often do you talk to yourself in...? The percentages should add up to 100.
   Cantonese
   Another Chinese dialect (e.g. Hoisan/Toisan/Taishan, Zhongshan, Hakka, Hokkien, Teochew)
   Mandarin
   English
   Other languages

27. When you count, how often do you count in...? The percentages should add up to 100.
   Cantonese
   Another Chinese dialect (e.g. Hoisan/Toisan/Taishan, Zhongshan, Hakka, Hokkien, Teochew)
   Mandarin
   English
   Other languages

28. To what extent do you agree with the following statements?
   0 = not agree at all; 6 = totally agree
   I feel like myself when I speak Cantonese.   0 1 2 3 4 5 6
   I feel like myself when I speak English.   0 1 2 3 4 5 6
   I identify with a Cantonese-speaking culture.   0 1 2 3 4 5 6
   I identify with an English-speaking culture.   0 1 2 3 4 5 6
   It is important to me to use (or eventually use) Cantonese like a native speaker.   0 1 2 3 4 5 6
   It is important to me to use (or eventually use) English like a native speaker.   0 1 2 3 4 5 6
   I want others to think I am a native speaker of Cantonese.   0 1 2 3 4 5 6
   I want others to think I am a native speaker of English.   0 1 2 3 4 5 6

29. What do you think the experiment is about?

30. Is there anything else that you want to tell us?
   ( ) Yes, I think you might want to know that ______
   ( ) No
