Open Collections

UBC Undergraduate Research

The Benefit of One’s Own Voice in Word Recognition in Cantonese-English Bilinguals Cheung, Sarah 2021



Full Text

Running Head: BENEFIT OF OWN VOICE IN CANTONESE-ENGLISH BILINGUALS

The Benefit of One’s Own Voice in Word Recognition in Cantonese-English Bilinguals

Sarah Cheung
LING 449: Honours Thesis
Supervisor: Dr. Molly Babel
University of British Columbia

Table of Contents

1.0 Introduction
1.1 Self-perception
1.2 Self-voice Benefit
1.2.1 Adverse Listening Conditions
1.2.2 Social Weighting
1.3 Talker Identification
1.4 Research Question
2.0 Methods
2.1 Participants
2.2 Materials
2.2.1 Multilingual Language Questionnaire
2.2.2 Production Stimuli
2.2.3 Perception Stimuli
2.3 Production Task
2.3.1 Procedure
2.3.2 Segmentation
2.3.3 Grouping Voices
2.4 Perception Task
2.4.1 Audio Manipulation
2.4.2 Procedure
3.0 Results
4.0 Discussion
4.1 Self-voice Recognition
4.2 Theories Behind the Self-voice Benefit
4.2.1 Familiarity Benefit
4.2.2 Common Coding Theory
4.2.3 Prototype Theory
4.3 Limitations and Future Directions
5.0 Conclusion
Acknowledgements
References
Appendix A
Appendix B

Abstract

In the domain of voice processing, a large body of evidence suggests that people process their own voices differently from others’ voices (Hughes & Harrison, 2013; Mitterer et al., 2020; Peng et al., 2019), and a recent study suggests this difference in perception may translate into an advantage in word recognition when listening to one’s own voice (Eger & Reinisch, 2019). This self-voice benefit has yet to be examined in a bilingual population, so the current study provides a first look into the perception performance of Cantonese-English bilinguals hearing their own voice compared to others’ voices. Moreover, this study aims to investigate whether this advantage exists when listeners lack the expected cues for self-identification. In this virtual experiment, female Cantonese-English bilinguals recorded themselves producing a set of minimal pairs containing difficult Cantonese contrasts. A subset of these minimal pairs was selected as stimuli for the subsequent perception task.
Speakers were grouped according to how acoustically contrastive their productions of each minimal pair were, and these groupings were used to design personalized experiments for each subject, featuring their own voice and the voices of their group members. The perception task was a two-alternative forced-choice lexical identification paradigm in which participants heard isolated Cantonese words and were required to select the picture corresponding to the word they heard. The audio stimuli for this task were manipulated using the Change Gender function in Praat (Boersma & Weenink, 2020), which was intended to disguise speaker identity. Participants’ language background information was collected through a multilingual language questionnaire. The results of this study provide support for the presence of a benefit for self-produced speech regardless of whether or not participants explicitly recognized their own voices.

1.0 Introduction

Self-recognition, the ability to distinguish between the self and others, is a fundamental human capability. The idea that self-referential information is processed differently from stimuli associated with others is well supported. This extends to the domain of voice processing, as researchers have not only observed that people process their own voices differently from others’ voices (Hughes & Harrison, 2013; Mitterer et al., 2020; Peng et al., 2019), but also that this difference in perception may translate into an advantage in recognizing words in self-produced speech (Eger & Reinisch, 2019). The perceptual benefit of hearing one’s own voice has yet to be examined in a bilingual population, so the current study provides a first investigation of the presence of the self-voice benefit in Cantonese-English bilinguals.
Additionally, this study aims to investigate the listening conditions in which this advantage exists, specifically whether it is present when listeners lack the expected cues for self-identification.

1.1 Self-perception

People perceive stimuli pertaining to themselves differently from information related to others. For example, subjects have shown unique electrophysiological responses to the auditory or visual presentation of their own names compared to others’ names (Liu et al., 2019; Zhao et al., 2011). Liu and colleagues (2019) showed that hearing one's own name elicits different brain responses (studied using electroencephalography) than hearing someone else's name, regardless of whether it was spoken by oneself or another talker. Furthermore, much evidence for distinct mechanisms for processing information relating to oneself comes from studies of self-face processing (Devue & Brédart, 2011; Keenan et al., 2000; Keyes et al., 2010; Platek et al., 2004, 2006; Uddin et al., 2005). Uddin et al. (2005) used functional magnetic resonance imaging to examine the brain regions involved in self-face processing. Participants viewed photos of their own faces morphed with a familiar other of the same gender and indicated whether the face looked like themselves or a familiar other. This neuroimaging study showed selective activation of right-hemisphere structures, including the inferior frontal gyrus, suggesting a unique mechanism for self-face recognition (Uddin et al., 2005). Numerous other studies have also found a right-hemisphere specialization for processing one's own face compared to the faces of familiar and unfamiliar others (Kaplan et al., 2008; Keenan et al., 2000; Platek et al., 2004); however, some studies have found an advantage in the left hemisphere (Devue & Brédart, 2011). Nevertheless, these studies minimally agree that a difference exists between visual recognition of oneself and others.
In the domain of voice processing, a large body of evidence suggests that people process their own voices differently from others’ voices (Candini et al., 2014; Hughes & Harrison, 2013; Hughes & Nicholson, 2010; Kaplan et al., 2008; Liu et al., 2019; Peng et al., 2019; Rosa et al., 2008). Two studies investigating the self-enhancement bias in voice perception found that participants rate their own voices as more attractive than those of other same-sex participants and tend to give themselves higher ratings than they were given by others, regardless of whether or not they recognized their own voices in the experiments (Hughes & Harrison, 2013; Peng et al., 2019). In addition, neuroimaging shows selective brain activation in response to listeners’ own voices. For example, when listening to self-produced auditory stimuli, participants showed more activity in the right inferior frontal gyrus than when listening to a friend’s voice (Kaplan et al., 2008). This brain area is also involved in the processing of one’s own face (Uddin et al., 2005), which is consistent with other studies suggesting a right-hemisphere advantage in self-voice processing (Hughes & Nicholson, 2010; Rosa et al., 2008). In one such study, participants were presented with continua of auditory morphs combining their own voices and famous voices, in addition to morphs combining famous voices with familiar voices of friends or co-workers whom participants had known for at least one year. Results showed higher sensitivity to self-voice, but not to the voices of familiar others, when subjects responded with their left hands (Rosa et al., 2008). Together, these studies demonstrate a difference in the perception of self-voice and other-voice.

Self-voice perception differs from listening to others’ voices because of the different mediums through which sound is physically conducted during perception.
When listeners hear their own voices as they speak, sound is transmitted via both air and bone conduction (Reinfeldt et al., 2010; Shuster & Durrant, 2003). In air conduction, vibrations exit the oral cavity, travel through the air, and enter the ear canal, whereas in bone conduction, vibrations move through the skull bone directly to the cochlea (Stenfelt & Goode, 2005). Comparatively, when listeners hear others speak or hear their own voice in recordings, sound is conducted solely via air conduction. Despite these differences, listeners are very successful at recognizing their own productions in recordings (Xu et al., 2013). Xu and colleagues (2013) presented listeners with recordings of their own voices and the voices of other, familiar speakers in normal and difficult listening conditions. They found that even in acoustically challenging conditions, when formant information above the third formant was removed by filtering, listeners were good at identifying their own voices. The researchers explained that auditory familiarity with one's own voice and the association between auditory self-representation and motor representations may contribute to this self-recognition advantage (Xu et al., 2013).

Much research has been conducted on the differences between identifying self-voice and other-voice, and several studies point to a self-advantage in recognizing one’s own voice. However, less is known about how this potential self-benefit applies to word recognition during perception. This study aims to investigate whether a self-voice advantage in word recognition exists when listeners are unable to recognize their own voices. In adverse listening conditions, monolinguals have shown no such self-benefit (Schuerman et al., 2019; Schuerman et al., 2015).
In contrast, a study of second language learners in natural listening conditions has provided evidence of better word recognition for listeners’ own voices compared to the voices of others matched in proficiency (Eger & Reinisch, 2019). We will be the first to examine the self-voice benefit in word recognition in bilinguals, who show more phonetic variability in production compared to monolinguals (Bosch & Ramon-Casas, 2009; Bosch & Ramon-Casas, 2011). Bilinguals are a population of interest because their naturally variable productions can reveal the presence or absence of a self-benefit if speakers’ familiarity with their own unique speech patterns does in fact facilitate word recognition.

1.2 Self-voice Benefit

The idea that listeners may perceive their own voices better is related to the interlanguage speech intelligibility benefit, which describes a perceptual advantage experienced by a listener sharing the same native language background as the speaker (Bent & Bradlow, 2003). Native and non-native speakers of English participated in a sentence recognition task involving English sentences produced by native speakers of Chinese, Korean, and English. The researchers found a “matched interlanguage speech intelligibility benefit”: non-native listeners rated the speech of a native speaker as just as intelligible as the speech of a non-native speaker with the same native language as the listener. Bent and Bradlow (2003) attributed this benefit to the shared phonetic and phonological knowledge arising from the speaker and listener’s common language background. Additionally, a similar result was observed when non-native listeners heard sentences produced by other language learners with a different native language, suggesting a “mismatched interlanguage speech intelligibility benefit”.
This benefit could be explained by language learners sharing similar strategies for attending to certain cues during comprehension that differ from those used by native listeners. While other studies have found mixed evidence for the interlanguage speech intelligibility benefit (Stibbard & Lee, 2006; Xiao et al., 2019), the rationale behind it extends to other areas of investigation, namely self-voice perception. Just as non-native interlocutors sharing native language backgrounds may show better perception, speakers listening to themselves may have a self-benefit in comprehension due to their own linguistic knowledge and familiarity with the cues important to understanding their own speech.

Recent research on the self-voice benefit suggests that second language (L2) learners perform better on a word recognition task when listening to their own voices than to other voices (Eger & Reinisch, 2019). In this study, native German learners of English heard isolated English words containing difficult contrasts produced by themselves or by other, equally proficient L2 speakers identified as using similar acoustic cues to produce the contrasts. Participants demonstrated better word recognition for self-produced speech, providing support for a perceptual self-benefit. A second experiment showed that when listening to speech containing rich cues for differentiating contrasts (produced by high-proficiency speakers), low-proficiency speakers no longer showed a self-benefit. Nonetheless, this proposed self-advantage in word recognition for proficiency-matched L2 learners is motivated by the fact that speakers are highly familiar with their own voices and therefore may adapt to their own accented productions. Listeners have been shown to adapt quickly to the speech patterns of non-native speakers, and ease of comprehension increases with familiarity (Adank et al., 2009; Bradlow & Bent, 2008; Eger & Reinisch, 2019).
For example, Bradlow and Bent (2008) studied perceptual adaptation to foreign-accented speech by exposing native American English listeners to Chinese-accented speech. Performance on a task requiring listeners to transcribe productions showed that perceptual adaptation to foreign-accented English improved with exposure to the accented talker, regardless of baseline intelligibility. While L2 learners have ample exposure to accented speech from interactions with their peers, they have even more experience with their own accented productions. The frequent auditory and articulatory feedback (Guenther, 2006) received by L2 learners when they hear themselves speak allows them to adapt to their own unique speech patterns, potentially driving the self-benefit in word recognition.

Furthermore, this is consistent with the large body of research showing that familiarity with a speaker’s voice eases perception (Newman et al., 2001; Nygaard et al., 1994; Perry et al., 2018). For instance, Nygaard and Pisoni (1998) showed that listeners who successfully learned the voices and names of speakers were better at identifying words produced by the speakers they were trained on compared to unfamiliar speakers. Evidence of this familiar-talker advantage in perception has been found for young and old listeners (Johnsrude et al., 2013; Yonan & Sommers, 2000), for older listeners with hearing impairments (Souza et al., 2013), with explicit (Nygaard & Pisoni, 1998) and implicit training (Kreitewolf et al., 2017), and in listening conditions with a competing talker in the background (Holmes et al., 2018; Holmes & Johnsrude, 2020). While indexical information in the speech signal can aid perception, the ability to explicitly recognize familiar voices may be independent of the familiar-voice benefit. Holmes et al.
(2018) manipulated the fundamental frequency (f0) and formant frequencies of productions using the Change Gender function in Praat (Boersma & Weenink, 2020) to prevent listeners from successfully recognizing the familiar voices of their friends or significant others. In the presence of a competing talker, speech produced by a familiar talker was more intelligible than that of a stranger, even when listeners did not recognize the familiar voice. The researchers speculate that voice recognition and speech perception rely on separate systems (Holmes et al., 2018), which motivates the current study.

Another study using the same vocal disguise technique found that speakers perceive their own voices differently from those of other speakers, even when they do not recognize their own voices (Mitterer et al., 2020). In this study, female German learners of English were recorded producing English sentences that contained sounds typically difficult for German learners. These recordings were subsequently manipulated to conceal speaker identity by lowering the f0 and formant frequencies to change the female voices into male voices. Although subjects were unaware they were listening to their own voices, they rated their own productions as more target-like compared to those of other L2 learners. Mitterer, Eger and Reinisch (2020) propose that the comprehension advantage for a speaker’s own voice (Eger & Reinisch, 2019) could be a potential explanation for the higher ratings for self-produced sentences. This implies that speakers may show better comprehension of their own voices even when they do not recognize the voice as being their own. According to this view, we would predict that subjects in the current study would show a self-benefit in word recognition even when lacking cues to speaker identification.
In other words, the advantage in comprehending one’s own voice may not require the listener to explicitly recognize their own voice. Alternatively, this benefit could be due to a speaker’s familiarity with their own idiosyncrasies in the speech signal.

1.2.1 Adverse Listening Conditions

Several researchers have found no evidence for an advantage in recognizing words spoken by oneself in adverse listening conditions. Schuerman et al. (2015) tested word recognition with speech produced by participants themselves compared to an average speaker representative of their speech community. This average speaker was selected from the participant pool based on comparisons of average phonetic distances from each speaker to all other speakers on various linguistic variables. Their word identification task featured noise-vocoded speech from monolingual Dutch participants, which lacked spectral cues important for talker identification. Results revealed higher accuracy for words produced by the model speaker than by the listeners themselves. Similarly, Schuerman et al. (2019) found no self-benefit in word recognition when stimuli were noise-vocoded, embedded in speech-shaped noise, or filtered to simulate self-perception via air and bone conduction. Results showed that across all three difficult listening conditions, word recognition was better for speech produced by speakers who better approximated the statistical average (an averaged measure of phonetic distance from a single participant to all other participants) of their speech community, even when compared to self-produced speech. In the noise-vocoded speech condition, spectral cues to speaker identity were diminished, so participants were unable to recognize their productions as their own.
Moreover, even in the speech-in-noise and filtered speech conditions, when cues important to talker identification were retained, participants did not show an advantage in perceiving their own voices.

1.2.2 Social Weighting

A socially weighted encoding approach emphasizes the simultaneous mapping of the incoming speech signal to both linguistic and social representations, rather than to linguistic representations alone. During this process, acoustic cues signal social features affecting social weighting, which in turn influences the encoding of the speech signal (Sumner et al., 2014). This dual-route approach to speech perception predicts that the social perception of the speaker, cued by phonetic variation in the speech signal, can change the way a listener attends to speech. A potential self-voice benefit in word recognition could be observed if listeners perceive the ways in which their own voices manifest the relevant linguistic contrasts as more socially salient or prestigious, resulting in stronger encoding and better perception. Additionally, phonetic variation cueing unreliability can change the way the listener attends to stimuli, as social weighting introduces bias and discrimination early on in the perception process (Sumner, 2020; Sumner et al., 2014). While the current study presents listeners with stimuli lacking some of the cues used for speaker identification, individual variation and acoustic cues for contrasting sounds are retained. If speakers’ own productions involve low contrastiveness or variation cueing unreliability (Sumner et al., 2014), the social weighting approach may predict the absence of a self-benefit in word recognition, in favour of better perception for a prototypical or statistically average speaker (Schuerman et al., 2019; Schuerman et al., 2015).
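The "statistically average speaker" selection described by Schuerman and colleagues can be sketched as follows. This is an illustrative reconstruction, not their code: the speaker IDs and distance values are hypothetical, and the multi-variable phonetic distance measure is reduced to a single number per speaker pair.

```python
def average_speaker(distances, speakers):
    """Pick the speaker with the smallest mean phonetic distance
    to all other speakers, i.e. the 'statistical average' of the pool.

    distances: dict mapping frozenset({a, b}) -> phonetic distance
    speakers: list of speaker IDs
    """
    def mean_dist(s):
        others = [sp for sp in speakers if sp != s]
        return sum(distances[frozenset({s, o})] for o in others) / len(others)
    return min(speakers, key=mean_dist)

# Hypothetical pairwise distances between three speakers
speakers = ["s01", "s02", "s03"]
distances = {
    frozenset({"s01", "s02"}): 1.0,
    frozenset({"s01", "s03"}): 2.0,
    frozenset({"s02", "s03"}): 1.5,
}
# s02's mean distance (1.25) is the smallest, so s02 is the model speaker
print(average_speaker(distances, speakers))
```

The same routine would apply per linguistic variable in practice, with the per-variable distances averaged before ranking speakers.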
1.3 Talker Identification

When considering a self-benefit in perception, it is important to acknowledge that cues to talker identity influence speech perception (Schuerman et al., 2015). Listeners use acoustic cues to access knowledge about a specific speaker, which constrains processing (Creel & Tumlin, 2011). For example, a listener hearing a voice with an f0 indicative of a female voice may form semantic predictions about the topics the speaker will talk about based on gender (Creel & Tumlin, 2011). In another study, listeners presented with a continuum of stimuli between two vowels identified different locations for phoneme boundaries depending on whether they were shown a male or a female face (Johnson et al., 1999). This pattern of results was replicated when listeners were simply asked to imagine a male or female speaker. These studies indicate that indexical cues affect perception. Therefore, it is of interest to examine how speech perception is affected when listeners lack cues to talker identity. Research suggests that listeners encode acoustically specific information about words, which can result in more efficient processing if it is similar to existing representations, even without awareness of speaker identity (Creel & Tumlin, 2011).

Studies investigating the importance of various talker identification cues reveal inconsistencies, and some researchers propose that different acoustic cues are important for different speakers (Creel & Bregman, 2011; Van Lancker et al., 1985). In other words, a parameter that is critical to the identification of one voice may not be critical for another if that voice is distinctive on another dimension (Van Lancker et al., 1985). However, f0 and formant spacing are two parameters that have been shown to differ reliably between speakers and are important to talker identity (Holmes et al., 2018). In the current study, some cues to talker
In the current study, some cues to talker BENEFIT OF OWN VOICE IN CANTONESE-ENGLISH BILINGUALS  14 identity are eliminated by using the Change Gender function in Praat (Boersma & Weenink, 2020) to ensure listeners are not able to recognize their own voices. This methodology draws on the observation that manipulating fundamental frequency (f0) and formant frequencies greatly affects the success of self-voice recognition (Xu et al., 2013) and is based on the success of other studies disguising voices (Holmes et al., 2018; Mitterer et al., 2020). 1.4 Research Question The current study examines whether Cantonese-English bilinguals show a benefit in word recognition when perceiving their own voices compared to other voices, when lacking the expected cues to speaker identity. Based on the research conducted on monolingual participants who were unable to recognize their own voices, one hypothesis is that bilinguals in the current study will not show an advantage in recognizing words produced by themselves. Alternatively, the abundance of literature on the familiarity benefit and the recent study on L2 learners predict that subjects will perform better with self-produced speech, even without the expected cues for self-identification. Finally, the socially weighted encoding approach can motivate both possible outcomes of the current study, as the processing of the speech signal is dependent on social perception of the speaker. 2.0 Methods  This experiment consisted of three parts: a questionnaire, production task and perception task, all of which were completed remotely on participants’ own electronic devices. All participants gave informed consent and completed the production task and questionnaire. Several months later, the same participants took part in the perception task. All written and verbal instructions were presented in English. 
2.1 Participants

To be eligible for this study, participants were required to be female, to have been exposed to both Cantonese and English at an early age (before age six), and to minimally have the ability to carry out a basic conversation in Cantonese. Only female subjects were invited to participate, to minimize between-speaker variation and to allow a more consistent vocal disguise technique (see the description of audio manipulation for the perception task below). Thirty-six female Cantonese-English bilinguals participated in the experiment. While all participants completed the multilingual questionnaire and the production task, the recordings of three participants obtained during the production task were excluded from the perception task due to poor recording quality and interference from background noise. In addition, two participants who completed the production task and questionnaire did not participate in the perception task, resulting in a total of 31 subjects who completed all three parts of the study.

Participants were compensated C$5 for the production task, C$5 for the questionnaire, and C$10 for the perception task, in the form of Amazon gift cards. Three participants were compensated with gift cards of equivalent value in US dollars. Participants were recruited through the UBC Linguistics Sign-up System (Sona Systems Ltd., 2017), announcements in undergraduate classes, postings on social media, and an ad on UBC’s Paid Participant Studies List hosted by the Psychology Graduate Student Council.

2.2 Materials

2.2.1 Multilingual Language Questionnaire

Participants completed an online survey that presented questions from the Language Experience and Proficiency Questionnaire (LEAP-Q) (Marian et al., 2007) and the Bilingual Language Profile (BLP) (Birdsong et al., 2012).
Both resources were designed to gain a better understanding of the language profiles of bilingual and multilingual speakers, and include questions relating to individuals’ language history, usage, attitudes, and self-rated proficiency. Additionally, general questions pertaining to participants’ biographical information were included in this questionnaire. This survey was administered in English.

2.2.2 Production Stimuli

Stimuli for the production task included 41 monosyllabic Cantonese words, presented as pictures accompanied by English translations. This provided a total of 22 minimal pairs targeting seven segmental contrasts (see Appendix A.1 for the complete production word list). Of these, two were consonant contrasts, word-initial /ts/ versus /tsʰ/ and word-initial /s/ versus /tsʰ/, and five were vowel contrasts in word-medial or word-final position: /ɐi/ and /ei/, /ɔː/ and /ou/, /ɐi/ and /aːi/, /ɐu/ and /aːu/, and /ɐ/ and /aː/. Target sounds were selected with differences between English and Cantonese phonology in mind, supplemented by the linguistic experience of a heritage speaker of Cantonese. For example, three of the vowel contrasts chosen are distinguished by vowel length, a feature that is not lexically contrastive in English. The stimuli were designed to consist entirely of high level tone (T1) words, to control for differences in tone that might cause unwanted variability in production or confusion in perception task performance. The words were chosen to be familiar to Cantonese speakers with limited vocabularies and had meanings that could be easily represented in pictures. To allow for a measure of speaker proficiency, a picture depicting a busy park scene was designed to elicit a short speech sample from each speaker. Pictures, as opposed to Chinese characters, were used in both the production and perception tasks to accommodate participants who have limited literacy skills.
All pictures were hand-drawn by the researcher and presented in black and white so that no single picture was especially salient to subjects (see Appendix A.2 for the complete set of visual stimuli).

2.2.3 Perception Stimuli

A subset of the stimulus words used in the production task was featured in the perception task. These consisted of 13 minimal pairs featuring five vowel contrasts: /ɐi/ and /ei/, /ɔː/ and /ou/, /ɐi/ and /aːi/, /ɐu/ and /aːu/, and /ɐ/ and /aː/ (see Table 1 below). The same pictures corresponding to these target words from the production experiment were used in the perception task. The manipulation of the audio stimuli for the perception experiment is described below.

Table 1
Perception Stimuli

Chinese Character   English Gloss     Jyutping Romanization
雞                  chicken           gai1
機                  machine           gei1
街                  street            gaai1
揮                  to wave           fai1
飛                  to fly            fei1
多                  many              do1
刀                  knife             dou1
歌                  song              go1
高                  tall              gou1
梳                  comb              so1
鬚                  beard/mustache    sou1
波                  ball              bo1
煲                  pot               bou1
踎                  to squat          mau1
貓                  cat               maau1
秋                  autumn            cau1
抄                  to copy           caau1
咯                  cough             kat1
咭                  card              kaat1
心                  heart             sam1
衫                  shirt             saam1
西                  west              sai1
嘥                  to waste          saai1
龜                  turtle            gwai1
乖                  well-behaved      gwaai1

2.3 Production Task

2.3.1 Procedure

For the production task, participants first watched a video tutorial (made by the author) on how to record themselves producing the list of target words. This video included a familiarization phase for participants to learn the intended referents of the pictorial stimuli. For each target word, participants would hear a Cantonese word and see its corresponding picture and English translation. Afterwards, participants were instructed to download Praat (Boersma & Weenink, 2020) and record themselves using the built-in microphone of their personal electronic devices at a sampling frequency of 44100 Hz.
Participants accessed a .pdf file containing the picture stimuli and were asked to verbally label the target words in Cantonese, given the picture and English translation, as they proceeded through the randomized list at their own pace. Each picture was shown twice to elicit two productions of each word, for a total of 82 productions. Lastly, participants were asked to verbally describe a picture of a busy park scene in Cantonese, in as much detail as they wanted. Participants saved their recordings according to their anonymous participant ID number and uploaded their recordings to Dropbox.

2.3.2 Segmentation

Words of the minimal pairs were segmented from recordings using Praat (Boersma & Weenink, 2020). Recordings from three participants were excluded from this process due to poor recording quality. From the productions of the remaining 33 speakers, nine speakers had at least one word excluded, for a total of 15 words excluded from analyses due to incorrect labelling of the picture stimuli. The removal of one item entailed the removal of two, as the whole minimal pair was removed from that individual's set. Because stimulus words were produced in isolation, word-initial stops /b/, /d/, /g/, /k/ and /kw/ were identified as beginning with the stop burst, seen as an abrupt change in amplitude in the waveform, and ending with the onset of quasi-periodic activity of the following vowel. The offset of the labialized voiceless velar stop /kw/ was identified as a change in the waveform from a simpler periodic pattern to the more complex periodic pattern of a vowel. In this set of stimuli, the only word-final stop was /t̚/. The end boundary of this unreleased stop was identified as the same point as the end of its preceding vowel. Fricatives /s/ and /f/ were identified in waveforms as aperiodic or random patterns indicating frication noise.
Affricates /ts/ and /tsʰ/ were identified as beginning with a stop burst and ending with the offset of frication noise, signalling the end of the fricative. Aspirated alveolar affricates showed a period of high amplitude frication followed by a period of lower amplitude frication, and the boundaries for aspiration were annotated using low amplitude frication as a cue. One participant produced target words intended to contain word-initial aspirated alveolar affricates with voiceless fricatives instead. For these productions, the onset and offset of the aspirated alveolar affricate /tsʰ/ were marked at the same points as the beginning and end of aspiration shown in the waveform. The onsets of nasals /m/, /n/ and /ŋ/ were identified at the point of the most distinct change in amplitude in the waveform. The offsets of nasal consonants in word-initial position were indicated by a sudden increase in intensity at the beginning of the following vowel. Another cue used to identify this boundary was the change from a simple waveform pattern with lower frequencies, characteristic of nasal consonants, to a more complex pattern with both high and low frequencies, characteristic of vowels. Likewise, the opposite change in intensity and the opposite shift in waveform patterns indicated the boundary of the word-final nasal /ŋ/. All word and sound boundaries were placed as closely as possible to zero crossings to prevent auditory distortions resulting from discontinuities at the beginnings and ends of sound intervals. Words in all 22 minimal pairs were segmented, although only the subset of words comprising 13 minimal pairs was used in the perception task. Target words were saved into their own files, while target sounds were trimmed into files with 25 ms buffers at the onset and offset of sounds in preparation for acoustic analysis.
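The zero-crossing constraint on boundary placement described above can be illustrated with a short sketch. Python is used purely for illustration; the study's segmentation was done manually in Praat (which offers "move to nearest zero crossing"), and the function name here is mine:

```python
def nearest_zero_crossing(samples, index):
    """Return the sample index of the zero crossing nearest to `index`.

    A zero crossing is a point where consecutive samples change sign
    (or a sample is exactly zero). Placing segment boundaries at such
    points avoids audible clicks from amplitude discontinuities when
    intervals are excised and played back.
    """
    n = len(samples)

    def is_crossing(i):
        if samples[i] == 0:
            return True
        return i + 1 < n and (samples[i] < 0) != (samples[i + 1] < 0)

    # Search outward from the annotated boundary position.
    for offset in range(n):
        for i in (index - offset, index + offset):
            if 0 <= i < n and is_crossing(i):
                return i
    return index  # no crossing found (e.g., a DC-offset signal)
```

For a boundary annotated at sample 0 of a waveform that only crosses zero between samples 2 and 3, the function returns index 2, the last sample before the sign change.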
2.3.3 Grouping Voices

Acoustic analyses served to sort voices into five groups (Groups A, B, C, D and E) reflecting how discretely speakers produced the contrast between the two words of each minimal pair. We will refer to this measure as "contrastiveness", as it denotes the acoustic difference between target sounds in minimal pairs but does not necessarily imply speaker proficiency or production accuracy. Because of the considerable amount of individual variation observed between minimal pairs within vowel contrasts, group assignments were done separately for each minimal pair. First, we took the files segmented with 25 ms buffers at the onset and offset of target vowels and used Fast Track (Barreda, 2021), a formant-tracking plug-in for Praat (Boersma & Weenink, 2020), to estimate formant trajectories, with measurements every two milliseconds for each vowel. The frequency range was set at 5,000 to 7,000 Hz to reflect a speaker of "medium height" (Barreda, 2021), as all participants in our study were female adults. Secondly, we converted the frequencies of the formant trajectories from Hertz into the Bark scale to better reflect auditory processing (Traunmüller, 1990). With the obtained Bark-scaled formant trajectories, we then performed a discrete cosine transform (DCT), which yielded three primary coefficients each for the F1 and F2 of every vowel analyzed. The three coefficients corresponded to the mean of the formant, the slope of the formant and the curvature of the slope. In addition to these six dimensions, we also measured vowel duration as a seventh dimension in which speakers could potentially show distinctiveness in production. While not all seven dimensions may be used to contrast the target vowels in our minimal pairs, we did not exclude any particular parameter, to avoid making a priori claims about the relative importance of these cues for contrastiveness in this bilingual population.
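The Hz-to-Bark conversion (Traunmüller, 1990) and the DCT step just described can be sketched as follows. This is a simplified stand-in for the actual Fast Track/Praat pipeline; the DCT-II normalization shown is one common convention and may differ in detail from the implementation used in the study:

```python
import math

def hz_to_bark(f):
    """Traunmüller's (1990) formula for converting Hz to the Bark scale."""
    return 26.81 * f / (1960.0 + f) - 0.53

def dct_coefficients(track, k_max=3):
    """First `k_max` DCT-II coefficients of a (Bark-scaled) formant track.

    With this normalization, coefficient 0 equals the mean of the
    contour, coefficient 1 tracks its slope, and coefficient 2 its
    curvature — the three "primary coefficients" described in the text.
    """
    n = len(track)
    coeffs = []
    for k in range(k_max):
        c = sum(x * math.cos(math.pi * k * (i + 0.5) / n)
                for i, x in enumerate(track)) / n
        coeffs.append(c)
    return coeffs
```

For a flat contour the coefficients reduce to (mean, 0, 0); a rising contour yields a nonzero slope coefficient.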
We centered, scaled and calculated Euclidean distances for each talker's minimal pair along all seven dimensions. Lastly, we organized speakers according to the contrastiveness of their productions. This was done by ranking the Euclidean distances and using the rankings to form group assignments, in which a greater Euclidean distance indicated a more distinctive production. Within each minimal pair, we formed five groups, ranging from A (most contrastive) to E (least contrastive), consisting of five, six or seven different voices. The groups were manually adjusted to be approximately equally sized, as some talkers were missing tokens and therefore would not be presented with that particular minimal pair in their individualized perception experiment. Each subject was presented with a perception experiment featuring their own productions and the productions of other members of their contrastiveness group, for each minimal pair. Therefore, the number of unfamiliar voices heard by each participant varied according to their group memberships.

2.4 Perception Task

2.4.1 Audio Manipulation

For the perception experiment, recordings segmented into isolated words were altered to change female voices into male-like voices using the Change Gender function in Praat (Boersma & Weenink, 2020). This function lowered the fundamental frequency (f0) and formant frequencies of the original productions by multiplying these dimensions by factors specific to each speaker. Modulation of these parameters has been shown to influence the accuracy of self-voice recognition (Xu et al., 2013), and previous studies have successfully disguised voices using the Change Gender function (Holmes et al., 2018; Mitterer et al., 2020). For speakers in the current study, the multiplication factors for f0 and formant frequencies ranged from 0.55 to 0.75 (mean = 0.62) and 0.79 to 0.83 (mean = 0.81), respectively.
Pitch range parameters were adjusted as necessary to ensure accurate pitch tracking (see Appendix B.1). Finally, the target stimuli were amplitude normalized to 65 dB and mixed with continuous speech-shaped noise at a signal-to-noise ratio (SNR) of +5 dB to increase the difficulty of the task.

2.4.2 Procedure

The same speakers who completed the production task were invited several months later to complete the perception task, which was administered online using jsPsych (de Leeuw, 2015). This perception experiment was a two-alternative forced choice lexical identification task featuring the acoustically altered recordings described above. For each trial, participants heard an isolated Cantonese word, produced either by themselves or by another speaker, along with two pictures on the left and right sides of the screen representing the appropriate Cantonese minimal pair. Participants were required to choose the picture corresponding to the word they heard by pressing the keys "F" or "J" for the left and right sides of the screen, respectively. Participants had up to 5 seconds to respond to each item. Three practice trials were provided (see Appendix A.3). Audio stimuli were presented at a comfortable listening level and participants completed a headphone check prior to beginning the experiment (Woods et al., 2017). There were four repetitions of each token, for a total of 560 to 688 trials in each participant's personalized experiment (up to 26 items, i.e., 13 minimal pairs, x 5 to 7 speakers in each by-minimal-pair group x 4 repetitions of each token). Trials were fully randomized across four blocks, between which participants were offered a self-paced break. At the end of the experiment, participants were asked if they recognized their own voice throughout the experiment, to which they selected "yes" or "no" on the screen.
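The noise-mixing step from the audio manipulation described above (target stimuli mixed with noise at +5 dB SNR) can be sketched as follows. This is an illustrative simplification: any noise samples stand in for the speech-shaped noise used in the study, and the function names are mine:

```python
import math

def rms(x):
    """Root-mean-square amplitude of a sample sequence."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def mix_at_snr(signal, noise, snr_db):
    """Mix `signal` with `noise` scaled so the result has the given SNR.

    The noise gain is chosen so that
        20 * log10(rms(signal) / rms(scaled_noise)) == snr_db,
    i.e., +5 dB means the speech is 5 dB more intense than the noise.
    """
    gain = rms(signal) / (rms(noise) * 10 ** (snr_db / 20.0))
    return [s + gain * n for s, n in zip(signal, noise)]
```

Subtracting the original signal from the mixture recovers the scaled noise, whose level can be checked against the requested SNR.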
The perception experiment was completed on participants' own electronic devices and took approximately 35 to 40 minutes to complete.

3.0 Results

Participants' responses on the perception task were scored as either correct or incorrect depending on whether listeners chose the picture corresponding to the intended word. We analyzed the proportion of correct responses as the dependent variable in a generalized linear mixed effects model fitted by maximum likelihood via Laplace approximation. Models were implemented in R (R Core Team, 2020). The main variable of interest was Voice Match, referring to whether listeners heard their own voice or not. There was a significant effect of Voice Match (B = -0.0357, SE = 0.0134, z = -2.657, p = 0.008), which indicates that listeners were significantly less accurate with other voices compared to their own. These results are shown by listener in Figure 1. This visualization indicates that, despite the statistical reliability of this result, some participants were more accurate on voices other than their own.

To compare the performance of listeners in adjacent acoustic contrastiveness groups, models were fitted with Group as a fixed factor. The comparison of Groups A and B showed a significant difference (B = -0.14978, SE = 0.0701, z = -2.135, p = 0.033), along with the comparisons of Groups B and C (B = 0.2046, SE = 0.0654, z = 3.130, p = 0.002), Groups C and D (B = 0.2428, SE = 0.0625, z = 3.882, p = 0.0001) and Groups D and E (B = 0.2562, SE = 0.05977, z = 4.286, p = 1.82e-05). These results can be seen in Figure 2 and suggest that adjacent group levels are significantly different from one another. To see whether the effect of listeners hearing their own voice was present in all five acoustic contrastiveness groups, we examined the interaction between Voice Match and Group.
This interaction was not significant (B (Voice Match:Groups A and B) = -0.0452, SE = 0.0301, z = -1.502, p = 0.133; B (Voice Match:Groups B and C) = 0.0536, SE = 0.0286, z = 1.877, p = 0.06; B (Voice Match:Groups C and D) = -0.038, SE = 0.02767, z = -1.372, p = 0.17; B (Voice Match:Groups D and E) = 0.0134, SE = 0.02559, z = 0.522, p = 0.602), indicating that the effect of Voice Match was consistent across all contrastiveness groups. A model with the proportion of correct responses as the dependent variable and trial index as the independent variable showed a significant effect (B = 0.0873, SE = 0.0311, z = 2.804, p = 0.005), suggesting that listeners' performance on the task improved as the task went on. This was true for all voices, not only listeners' own, indicating that the observed self-voice advantage did not simply emerge over the course of the experiment and could not be accounted for by listeners hearing their own voice in more trials than any single other voice. Lastly, one fourth of the participants in the perception task reported hearing their own voice throughout the experiment. Therefore, the self-voice benefit appears to exist regardless of whether or not listeners recognized their speech as self-produced.

Figure 1. Proportion of correct responses in the perception task for other voice and own voice, shown by listener.

Figure 2. Proportion of correct responses in the perception task for the five acoustic contrastiveness groups, with Group A as the most contrastive group and Group E as the least. Responses to both own voice and other voice are included.

4.0 Discussion

This experiment set out to examine whether Cantonese-English bilinguals show an advantage when listening to their own voices compared to others' voices, while lacking the expected cues to speaker identity.
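To make the logit estimates reported in the Results concrete, a coefficient such as the Voice Match effect (B = -0.0357) can be converted to an odds ratio. The sketch below is an interpretive aid, not part of the study's analysis, and the helper names are mine:

```python
import math

def logit_to_odds_ratio(b):
    """Convert a logistic-regression coefficient to an odds ratio."""
    return math.exp(b)

def apply_logit_effect(p_base, b):
    """Probability after shifting the log-odds of `p_base` by `b`.

    Useful for reading a fitted coefficient as a change in accuracy
    relative to some baseline probability of a correct response.
    """
    logit = math.log(p_base / (1 - p_base)) + b
    return 1 / (1 + math.exp(-logit))
```

With B = -0.0357, exp(B) ≈ 0.965: hearing a voice other than one's own multiplies the odds of a correct response by roughly 0.96 relative to hearing one's own voice.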
Our results demonstrate that, regardless of whether or not listeners reported hearing their own voices, they were better at recognizing words containing difficult vowel contrasts if they had produced the words themselves than if the words had been produced by other, acoustically similar speakers in their group. This effect of Voice Match, referring to whether listeners heard their own voice or not, did not differ between contrastiveness groups. In other words, speakers who produced contrasts more distinctly and heard other similarly contrastive speakers were more accurate with their own voice, just as speakers whose productions were less contrastive and who heard less contrastive stimuli perceived self-produced speech better.

In the analysis comparing the proportion of correct responses of listeners in adjacent acoustic contrastiveness groups, we found that adjacent groups are significantly different from one another. This suggests that acoustic contrastiveness may predict perception accuracy, such that the more contrastive participants' productions were, the higher the proportion of correct responses they gave. We defined contrastiveness as the acoustic difference between target sounds in minimal pairs, without implying speaker proficiency or production accuracy with the use of this term. Nevertheless, the degree of distinctiveness of speakers' productions appears to relate to speaker proficiency. Our understanding of this relationship will be further elucidated with future analysis of data from the multilingual language questionnaire, including participants' self-reported proficiency scores.

Results of the current study also confirm that this observed self-advantage was not simply due to listeners hearing their own voice more than other voices throughout the experiment.
Performance on the perception task improved over the course of the experiment both when participants heard their own voice and when they heard other voices. Although subjects heard their own voice more often than any single other voice in the experiment, the proportion of correct responses increased with trial number for all voices. This was likely due to participants becoming familiarized with the picture stimuli, with the paradigm itself and with perceiving speech in noise as the trials went on. Altogether, these data provide statistically robust evidence for an advantage in perceiving self-produced speech regardless of whether or not participants recognized their own voice.

4.1 Self-voice Recognition

Despite the success of previous studies employing the Change Gender function in Praat (Boersma & Weenink, 2020) to disguise speaker identity (Holmes et al., 2018; Mitterer et al., 2020), one fourth of the participants in the current study reported hearing their own voice throughout the experiment (see limitations and future directions below). Modulation of the fundamental frequency (f0) and formant frequencies of the audio stimuli for the perception task was intended to remove some of the expected cues to speaker identity, in order to examine whether recognition of speech as self-produced was required for the self-benefit to be observed. Previous experiments finding no support for a self-voice advantage have featured noise-vocoded speech, in which spectral cues important to talker identification are eliminated (Schuerman et al., 2019; Schuerman et al., 2015). The removal of these expected cues to speaker identity could explain the absence of a self-benefit in those studies, but researchers posit that voice recognition and speech perception rely on separate systems (Holmes et al., 2018).
One study supporting the latter interpretation found that, in the presence of a competing talker, a familiar voice was more intelligible than a stranger's, even when listeners did not recognize the familiar voice (Holmes et al., 2018). Thus, the ability to explicitly recognize familiar voices or one's own voice may be independent of their benefit in perception.

Existing literature supports the view that listeners use different acoustic cues for speaker identification (Lavner et al., 2000; Van Lancker et al., 1985). The consequences of modifying acoustic features cueing the identity of speakers differed for each speaker, in that the same modifications caused some speakers to become unidentifiable and left others unaffected. Researchers speculate that individual speakers possess unique acoustic features critical for identification, so a dimension that may be expendable for one voice may be essential for another (Lavner et al., 2000; Van Lancker et al., 1985). In the current study, voices that were sufficiently distinctive on dimensions other than those modified by our vocal disguise technique may have been more easily recognizable, providing an explanation for why some participants reported recognizing their own voices and some did not. As the aforementioned studies investigated the acoustic cues listeners use to identify familiar others, an examination of the dimensions important to the identification of one's own voice would make a fascinating comparison.

4.2 Theories Behind the Self-voice Benefit

4.2.1 Familiarity Benefit

Bilingual participants in the current study showed a self-perception benefit, which may be accounted for by the high level of familiarity listeners have with their own voices. Existing literature has found strong support for the familiarity benefit, which refers to a positive relationship between familiarity with a talker's voice and the ease of perception.
Nygaard and Pisoni's (1998) experiment demonstrated that listeners were better at identifying words spoken by a familiar talker, whom they had been trained to recognize, compared to an unfamiliar talker. Additionally, subjects who were implicitly trained on a talker showed a familiarity benefit for novel material produced by that speaker (Kreitewolf et al., 2017). In this case, subjects were simply exposed to the talker without explicitly trying to identify them. As a familiarity benefit can be observed after only four training sessions in the aforementioned study, a greater benefit should be expected for a listener's own voice, considering the abundance of feedback a listener receives. Exposure to one's own productions causes speakers to be highly familiar with the unique patterns, phonetic variability and acoustic cues present in their own speech, which can aid perception. Researchers have also postulated that in adapting to their own speech, language learners have adapted to their own accented or non-native productions, posing a disadvantage for second language acquisition and accent reduction. In this regard, researchers assume that perception and production are linked, as the listener's ability to perceive errors and fine distinctions relates to their ability to produce those contrasts.

4.2.2 Common Coding Theory

Another explanation for the observed self-voice advantage is the common coding theory of perception, which posits that there exists a shared representation for both perception and action (Prinz, 1990). According to this theory, during the perception of an action, the same mental representations activated when producing the action are accessed. Assuming a shared mental representation for perception and production, the common coding theory predicts that listeners compare incoming speech signals to their own productions.
Therefore, in perceiving one's own voice, perception is facilitated because the auditory signal matches the listener's own productions to a greater degree. A study that provides evidence in support of common coding found that participants were better at lipreading silent videos of themselves compared to others (Tye-Murray et al., 2013). A subsequent study found that subjects benefitted from seeing themselves more than other participants benefitted from seeing them, even when individual differences were controlled for (Tye-Murray et al., 2015). The researchers explain that speech gestures correspond to unique motor signatures that are more easily activated when the performer and observer are the same person (Tye-Murray et al., 2013), implying that self-produced stimuli should yield a more robust advantage.

While common coding explains perceptual facilitation for self-generated actions, this view does not account for perceptual learning or socially weighted encoding. Perceptual learning refers to a listener's ability to change category boundaries or phonetic cue weighting following exposure to novel input (Norris et al., 2003; Schertz & Clare, 2020). If perception and production relied on the same mental representations, the reorganization of phonetic space or changes in the weighting of acoustic cues due to perceptual learning should also be observed in that individual's productions, but this is not well supported in the existing literature (Schertz & Clare, 2020). The common coding view is also inconsistent with a socially weighted encoding approach, which emphasizes the role of acoustic cues signalling social features that influence the encoding of the speech signal (Sumner, 2020; Sumner et al., 2014). Under this account, listeners process the speech signal with regard to social perception of the speaker, not according to listeners' own mental representations for production.
4.2.3 Prototype Theory

Voice prototype research can also elucidate our understanding of the benefit in self-voice perception. According to prototype theory, each stimulus is compared to a representative or central member of its category; stimuli that better approximate the prototype will be more easily perceived as belonging to the category (Lavner et al., 2001). Under this interpretation, talker identification relies on the storage and retrieval of identities based on a set of features deviating from the prototype. As previous studies have shown, the acoustic dimensions used to characterize different voices are often talker-specific (Lavner et al., 2000; Van Lancker et al., 1985). Voices that deviate more from the prototype are perceived as more distinct; thus, the more distant a speaker's acoustic features are from the central model, the easier the speaker is to identify (Latinus et al., 2013; Lavner et al., 2001). This may partially explain the variance in participants' self-reports of hearing their own voices in the current study despite our attempt to disguise vocal identity. Those who successfully identified themselves may have had voices that deviated more from the average template and were therefore easier to recognize.

This begs the question of what the prototype is derived from. Researchers have proposed that the prototype is an average, commonly encountered, yet attractive voice (Latinus et al., 2013; Lavan et al., 2019; Lavner et al., 2001). Accordingly, this voice should be representative of the listener's language input and environment, and people of the same linguistic community would be expected to share a similar template (Lavner et al., 2001). The implications of having a voice that approximates listeners' prototypes with regard to a benefit in perception need to be explored further.
In their 2015 and 2019 experiments, Schuerman et al. selected a statistically average speaker among the subjects in their studies to represent the average of the linguistic community. This average speaker had the smallest average distance from all other subjects on several acoustic variables. Under degraded listening conditions, the monolingual listeners in their studies showed better perception of words produced by the selected average speaker than of words they had produced themselves. This implies that the benefit of a prototypical voice may extend beyond the benefit of hearing one's own voice for word recognition.

The finding of a self-voice benefit in the current study is statistically robust; however, some participants did appear to perform less accurately on their own voices. Along the lines of prototype theory, our results may be explained by how distant a particular individual is from the prototype. For example, participants exemplifying a self-benefit may better approximate the prototype, while those performing worse with their own voices may deviate greatly from the prototype relative to other speakers in their group. While an average speaker in a population of bilinguals would be more difficult to extract, a future study examining the perception accuracy of bilinguals hearing their own voices and a prototypical voice would allow for a better comparison of the self-voice benefit in monolinguals and bilinguals. Likewise, a study on monolinguals featuring a paradigm similar to the current study and Eger and Reinisch's (2019) experiment would reveal differences due to language background and the effect of degraded listening conditions on the presence of a self-benefit.
4.3 Limitations and Future Directions

Interpretations of the current results are limited by the experimental design, in which the perception task lacked an objective measure of whether participants truly recognized their own voice in the study. Participants were only required to self-report whether they had heard their own voice throughout the experiment, whereas an added task of having to choose their voice from a lineup of acoustically altered voices would provide a more reliable report. Therefore, the current results provide evidence for a self-voice advantage in perception regardless of whether or not participants report having recognized their own voices, but generalizations cannot be made regarding the mechanisms behind this perceptual benefit and self-recognition.

Stimuli were chosen based on the differences between English and Cantonese phonology, presuming that the productions of bilinguals would show English influence. There was a lack of previous research on Cantonese-English bilinguals confirming the difficulty of the chosen contrasts and whether certain contrasts were more challenging to differentiate than others. The difficulty of sound contrasts has been shown to affect the self-benefit: when recognizing words containing difficult contrasts (not present in the listeners' first language), the self-benefit was more apparent in low-proficiency L2 learners compared to high-proficiency ones, while no such difference was observed for contrasts present in the listeners' first language (Eger & Reinisch, 2019). Furthermore, the stimuli chosen were not all equally frequent or familiar due to constraints on the number of minimal pairs featuring these contrasts and the requirement that the stimuli be easily depicted, high level tone words. For this same reason, the number of minimal pairs selected for each of the five vowel contrasts was limited and unequal.
We observed a considerable amount of individual variation between minimal pairs within vowel contrasts during acoustic analysis. While this was unexpected, it can potentially be explained by the varying levels of word frequency among our chosen stimuli. This variation prompted our decision to form acoustic contrastiveness groups by minimal pair. Nevertheless, our data show that our groupings do reflect performance accuracy.

A question that will be answered by further analysis of data from our multilingual questionnaire is whether our acoustically motivated groupings of contrastiveness are comparable to participants' self-ratings of accentedness. Given the research on speakers becoming familiarized with their own accented productions and on listeners perceiving their own voices as more target-like or attractive than others' (Hughes & Harrison, 2013; Mitterer et al., 2020; Peng et al., 2019), participants may have biased self-ratings of accentedness that do not resemble measured contrastiveness. Self-reported measures of language dominance will reveal any patterns present in the language profiles of participants who performed more poorly on their own voice than on other voices. Additionally, language dominance may predict overall accuracy on the perception task and modulate the extent to which hearing one's own voice benefits perception.

5.0 Conclusion

The current study provides a first look into the self-voice benefit in perception in a population of Cantonese-English bilinguals. Previous experiments involving various linguistic populations have found support for an advantage in perceiving one's own voice, but not in adverse listening conditions. Performance by bilinguals in the current study reveals the presence of a statistically robust benefit for self-produced speech regardless of whether or not participants explicitly recognized their own voices.
The specific listening conditions required for such an advantage remain unclear, and the success of the vocal disguise technique applied in the current study needs further investigation. Future directions include analysis of the accompanying multilingual language questionnaire data and further exploration of the acoustic cues used to differentiate contrasts in Cantonese, as well as the features necessary for speaker self-identification.

Acknowledgements

This research was funded by a grant from the Natural Sciences and Engineering Research Council awarded to the PI. I am truly appreciative of all the participants who took the time to participate in this research. I am also grateful to my family and friends for their unlimited encouragement. Lastly, I want to thank the members of the Speech in Context Lab, particularly Rachel Soo and Fion Fung, and my supervisor Molly Babel for all of their support and guidance in every step of this project.

References

Adank, P., Evans, B. G., Stuart-Smith, J., & Scott, S. K. (2009). Comprehension of familiar and unfamiliar native accents under adverse listening conditions. Journal of Experimental Psychology: Human Perception and Performance, 35(2), 520–529. https://doi.org/10.1037/a0013552

Barreda, S. (2021). Fast Track: Fast (nearly) automatic formant-tracking using Praat. Linguistics Vanguard, 7(1), 1379–1393. https://doi.org/10.1515/lingvan-2020-0051

Bent, T., & Bradlow, A. R. (2003). The interlanguage speech intelligibility benefit. The Journal of the Acoustical Society of America, 114(3), 1600–1610. https://doi.org/10.1121/1.1603234

Boersma, P., & Weenink, D. (2020). Praat: Doing phonetics by computer (6.1.21). http://www.praat.org/

Bosch, L., & Ramon-Casas, M. (2011). Variability in vowel production by bilingual speakers: Can input properties hinder the early stabilization of contrastive categories?
Journal of Phonetics, 39(4), 514–526. https://doi.org/10.1016/j.wocn.2011.02.001

Bosch, L., & Ramon-Casas, M. (2009). Phonetic variability in bilinguals' acquisition of native-vowel category contrasts. The Journal of the Acoustical Society of America, 125(4), 2770. https://doi.org/10.1121/1.4784720

Bradlow, A. R., & Bent, T. (2008). Perceptual adaptation to non-native speech. Cognition, 106(2), 707–729. https://doi.org/10.1016/j.cognition.2007.04.005

Candini, M., Zamagni, E., Nuzzo, A., Ruotolo, F., Iachini, T., & Frassinetti, F. (2014). Who is speaking? Implicit and explicit self and other voice recognition. Brain and Cognition, 92, 112–117. https://doi.org/10.1016/j.bandc.2014.10.001

Creel, S. C., & Bregman, M. R. (2011). How talker identity relates to language processing. Language and Linguistics Compass, 5(5), 190–204. https://doi.org/10.1111/j.1749-818X.2011.00276.x

Creel, S. C., & Tumlin, M. A. (2011). On-line acoustic and semantic interpretation of talker information. Journal of Memory and Language, 65(3), 264–285. https://doi.org/10.1016/j.jml.2011.06.005

de Leeuw, J. R. (2015). jsPsych: A JavaScript library for creating behavioral experiments in a web browser. Behavior Research Methods, 47(1), 1–12. https://doi.org/10.3758/s13428-014-0458-y

Devue, C., & Brédart, S. (2011). The neural correlates of visual self-recognition. Consciousness and Cognition, 20(1), 40–51. https://doi.org/10.1016/j.concog.2010.09.007

Eger, N. A., & Reinisch, E. (2019). The impact of one's own voice and production skills on word recognition in a second language. Journal of Experimental Psychology: Learning, Memory, and Cognition, 45(3), 552–571. https://doi.org/10.1037/xlm0000599

Guenther, F. H. (2006). Cortical interactions underlying the production of speech sounds. Journal of Communication Disorders, 39(5), 350–365. https://doi.org/10.1016/j.jcomdis.2006.06.013

Holmes, E., Domingo, Y., & Johnsrude, I. S. (2018).
Familiar voices are more intelligible, even if they are not recognized as familiar. Psychological Science, 29(10), 1575–1583. https://doi.org/10.1177/0956797618779083

Holmes, E., & Johnsrude, I. S. (2020). Speech spoken by familiar people is more resistant to interference by linguistically similar speech. Journal of Experimental Psychology: Learning, Memory, and Cognition, 46(8), 1465–1476. https://doi.org/10.1037/xlm0000823

Hughes, S. M., & Harrison, M. A. (2013). I like my voice better: Self-enhancement bias in perceptions of voice attractiveness. Perception, 42(9), 941–949. https://doi.org/10.1068/p7526

Hughes, S. M., & Nicholson, S. E. (2010). The processing of auditory and visual recognition of self-stimuli. Consciousness and Cognition, 19(4), 1124–1134. https://doi.org/10.1016/j.concog.2010.03.001

Johnson, K., Strand, E. A., & D'Imperio, M. (1999). Auditory-visual integration of talker gender in vowel perception. Journal of Phonetics, 27(4), 359–384. https://doi.org/10.1006/jpho.1999.0100

Johnsrude, I. S., Mackey, A., Hakyemez, H., Alexander, E., Trang, H. P., & Carlyon, R. P. (2013). Swinging at a cocktail party: Voice familiarity aids speech perception in the presence of a competing voice. Psychological Science, 24(10), 1995–2004. https://doi.org/10.1177/0956797613482467

Kaplan, J. T., Aziz-Zadeh, L., Uddin, L. Q., & Iacoboni, M. (2008). The self across the senses: An fMRI study of self-face and self-voice recognition. Social Cognitive and Affective Neuroscience, 3(3), 218–223. https://doi.org/10.1093/scan/nsn014

Keenan, J. P., Ganis, G., Freund, S., & Pascual-Leone, A. (2000). Self-face identification is increased with left hand responses. Laterality: Asymmetries of Body, Brain and Cognition, 5(3), 259–268. https://doi.org/10.1080/713754382

Keyes, H., Brady, N., Reilly, R. B., & Foxe, J. J. (2010). My face or yours? Event-related potential correlates of self-face processing.
Brain and Cognition, 72(2), 244–254. https://doi.org/10.1016/j.bandc.2009.09.006

Kreitewolf, J., Mathias, S. R., & von Kriegstein, K. (2017). Implicit talker training improves comprehension of auditory speech in noise. Frontiers in Psychology, 8, 1–8. https://doi.org/10.3389/fpsyg.2017.01584

Latinus, M., McAleer, P., Bestelmeyer, P. E. G., & Belin, P. (2013). Norm-based coding of voice identity in human auditory cortex. Current Biology, 23(12), 1075–1080. https://doi.org/10.1016/j.cub.2013.04.055

Lavan, N., Knight, S., & McGettigan, C. (2019). Listeners form average-based representations of individual voice identities. Nature Communications, 10(1). https://doi.org/10.1038/s41467-019-10295-w

Lavner, Y., Gath, I., & Rosenhouse, J. (2000). Effects of acoustic modifications on the identification of familiar voices speaking isolated vowels. Speech Communication, 30(1), 9–26. https://doi.org/10.1016/S0167-6393(99)00028-X

Lavner, Y., Rosenhouse, J., & Gath, I. (2001). The prototype model in speaker identification by human listeners. International Journal of Speech Technology, 4(1), 63–74. https://doi.org/10.1023/A:1009656816383

Liu, L., Li, W., Li, J., Lou, L., & Chen, J. (2019). Temporal features of psychological and physical self-representation: An ERP study. Frontiers in Psychology, 10. https://doi.org/10.3389/fpsyg.2019.00785

Mitterer, H., Eger, N. A., & Reinisch, E. (2020). My English sounds better than yours: Second-language learners perceive their own accent as better than that of their peers. PLoS ONE, 15(2), 1–12. https://doi.org/10.1371/journal.pone.0227643

Newman, R. S., Clouse, S. A., & Burnham, J. L. (2001). The perceptual consequences of within-talker variability in fricative production. The Journal of the Acoustical Society of America, 109(3), 1181–1196. https://doi.org/10.1121/1.1348009

Norris, D., McQueen, J. M., & Cutler, A. (2003).
Perceptual learning in speech. Cognitive Psychology, 47(2), 204–238. https://doi.org/10.1016/S0010-0285(03)00006-9

Nygaard, L. C., & Pisoni, D. B. (1998). Talker-specific learning in speech perception. Perception and Psychophysics, 60(3), 355–376. https://doi.org/10.3758/BF03206860

Nygaard, L. C., Sommers, M. S., & Pisoni, D. B. (1994). Speech perception as a talker-contingent process. Psychological Science, 5(1), 42–46. https://doi.org/10.1111/j.1467-9280.1994.tb00612.x

Peng, Z., Wang, Y., Meng, L., Liu, H., & Hu, Z. (2019). One's own and similar voices are more attractive than other voices. Australian Journal of Psychology, 71(3), 212–222. https://doi.org/10.1111/ajpy.12235

Perry, L. K., Mech, E. N., MacDonald, M. C., & Seidenberg, M. S. (2018). Influences of speech familiarity on immediate perception and final comprehension. Psychonomic Bulletin and Review, 25(1), 431–439. https://doi.org/10.3758/s13423-017-1297-5

Platek, S. M., Keenan, J. P., Gallup, G. G., & Mohamed, F. B. (2004). Where am I? The neurological correlates of self and other. Cognitive Brain Research, 19(2), 114–122. https://doi.org/10.1016/j.cogbrainres.2003.11.014

Platek, S. M., Loughead, J. W., Gur, R. C., Busch, S., Ruparel, K., Phend, N., Panyavin, I. S., & Langleben, D. D. (2006). Neural substrates for functionally discriminating self-face from personally familiar faces. Human Brain Mapping, 27(2), 91–98. https://doi.org/10.1002/hbm.20168

Prinz, W. (1990). A common coding approach to perception and action. In O. Neumann & W. Prinz (Eds.), Relationships between perception and action: Current approaches (pp. 167–201). Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-75348-0_7

R Core Team. (2020). R: A language and environment for statistical computing. R Foundation for Statistical Computing. https://www.r-project.org/

Reinfeldt, S., Östli, P., Håkansson, B., & Stenfelt, S. (2010).
Hearing one's own voice during phoneme vocalization—Transmission by air and bone conduction. The Journal of the Acoustical Society of America, 128(2), 751–762. https://doi.org/10.1121/1.3458855

Rosa, C., Lassonde, M., Pinard, C., Keenan, J. P., & Belin, P. (2008). Investigations of hemispheric specialization of self-voice recognition. Brain and Cognition, 68(2), 204–214. https://doi.org/10.1016/j.bandc.2008.04.007

Schertz, J., & Clare, E. J. (2020). Phonetic cue weighting in perception and production. Wiley Interdisciplinary Reviews: Cognitive Science, 11(2), 1–24. https://doi.org/10.1002/wcs.1521

Schuerman, W. L., Meyer, A., & McQueen, J. M. (2015). Do we perceive others better than ourselves? A perceptual benefit for noise-vocoded speech produced by an average speaker. PLoS ONE, 10(7), 1–18. https://doi.org/10.1371/journal.pone.0129731

Schuerman, W., McQueen, J. M., & Meyer, A. (2019). Speaker statistical averageness modulates word recognition in adverse listening conditions. Proceedings of the 19th International Congress of Phonetic Sciences (ICPhS 2019), 1203–1207.

Shuster, L. I., & Durrant, J. D. (2003). Toward a better understanding of the perception of self-produced speech. Journal of Communication Disorders, 36(1), 1–11. https://doi.org/10.1016/S0021-9924(02)00132-6

Souza, P., Gehani, N., Wright, R., & McCloy, D. (2013). The advantage of knowing the talker. Journal of the American Academy of Audiology, 24(8), 689–700. https://doi.org/10.3766/jaaa.24.8.6

Stenfelt, S., & Goode, R. L. (2005). Bone-conducted sound: Physiological and clinical aspects. Otology and Neurotology, 26(6), 1245–1261. https://doi.org/10.1097/01.mao.0000187236.10842.d5

Stibbard, R. M., & Lee, J.-I. (2006). Evidence against the mismatched interlanguage speech intelligibility benefit hypothesis. The Journal of the Acoustical Society of America, 120(1), 433–442. https://doi.org/10.1121/1.2203595

Sumner, M. (2020). The social weight of spoken words.
Trends in Cognitive Sciences, 19(5), 238–239. https://doi.org/10.1016/j.tics.2015.03.007

Sumner, M., Kim, S. K., King, E., & McGowan, K. B. (2014). The socially weighted encoding of spoken words: A dual-route approach to speech perception. Frontiers in Psychology, 4, 1–13. https://doi.org/10.3389/fpsyg.2013.01015

Traunmüller, H. (1990). Analytical expressions for the tonotopic sensory scale. The Journal of the Acoustical Society of America, 88(1), 97–100. https://doi.org/10.1121/1.399849

Tye-Murray, N., Spehar, B. P., Myerson, J., Hale, S., & Sommers, M. S. (2013). Reading your own lips: Common-coding theory and visual speech perception. Psychonomic Bulletin and Review, 20(1), 115–119. https://doi.org/10.3758/s13423-012-0328-5

Tye-Murray, N., Spehar, B. P., Myerson, J., Hale, S., & Sommers, M. S. (2015). The self-advantage in visual speech processing enhances audiovisual speech recognition in noise. Psychonomic Bulletin and Review, 22(4), 1048–1053. https://doi.org/10.3758/s13423-014-0774-3

Uddin, L. Q., Kaplan, J. T., Molnar-Szakacs, I., Zaidel, E., & Iacoboni, M. (2005). Self-face recognition activates a frontoparietal "mirror" network in the right hemisphere: An event-related fMRI study. NeuroImage, 25(3), 926–935. https://doi.org/10.1016/j.neuroimage.2004.12.018

Van Lancker, D., Kreiman, J., & Emmorey, K. (1985). Familiar voice recognition: Patterns and parameters. Part I: Recognition of backward voices. Journal of Phonetics, 13(1), 19–38. https://doi.org/10.1016/s0095-4470(19)30723-5

Woods, K. J. P., Siegel, M. H., Traer, J., & McDermott, J. H. (2017). Headphone screening to facilitate web-based auditory experiments. Attention, Perception, and Psychophysics, 79(7), 2064–2072. https://doi.org/10.3758/s13414-017-1361-2

Xiao, H., Wang, H., Jin, N., & Van De Weijer, J. (2019). The interlanguage speech intelligibility benefit disappears under noisy conditions.
Proceedings of the 3rd World Conference on Smart Trends in Systems, Security and Sustainability (WorldS4 2019), 339–343. https://doi.org/10.1109/WorldS4.2019.8903953

Xu, M., Homae, F., Hashimoto, R., & Hagiwara, H. (2013). Acoustic cues for the recognition of self-voice and other-voice. Frontiers in Psychology, 4, 1–7. https://doi.org/10.3389/fpsyg.2013.00735

Yonan, C. A., & Sommers, M. S. (2000). The effects of talker familiarity on spoken word identification in younger and older listeners. Psychology and Aging, 15(1), 88–99.

Zhao, K., Wu, Q., Zimmer, H. D., & Fu, X. (2011). Electrophysiological correlates of visually processing subject's own name. Neuroscience Letters, 491(2), 143–147. https://doi.org/10.1016/j.neulet.2011.01.025

Appendix A

Materials

Table A.1
Production Word List

Chinese Character   English Gloss    Jyutping Romanization
雞   chicken   gai1
機   machine   gei1
街   street   gaai1
揮   to wave   fai1
飛   to fly   fei1
多   many   do1
刀   knife   dou1
歌   song   go1
高   tall   gou1
梳   comb   so1
鬚   beard/mustache   sou1
波   ball   bo1
煲   pot   bou1
踎   to squat   mau1
貓   cat   maau1
秋   autumn   cau1
抄   to copy   caau1
咯   cough   kat1
咭   card   kaat1
心   heart   sam1
衫   shirt   saam1
西   west   sai1
嘥   to waste   saai1
龜   turtle   gwai1
乖   well-behaved   gwaai1
揸   drive   zaa1
叉   fork   caa1
遮   umbrella   ze1
車   car   ce1
鐘   clock   zung1
蔥   onion   cung1
追   to chase   zeoi1
吹   to blow   ceoi1
尖   sharp   zim1
簽   to sign   cim1
獅   lion   si1
黐   to stick   ci1
星   star   sing1
蜻   dragonfly   cing1
沙   sand   saa1
鬆   loose   sung1

Figure A.2 Picture Stimuli

The following pictures were used in the production task. A subset of these were used in the perception task.
gai1 "chicken"   gei1 "machine"
fai1 "to wave (one's hand)"   fei1 "to fly"
do1 "many"   dou1 "knife"
go1 "song"   gou1 "tall"
so1 "comb"   sou1 "beard/mustache"
bo1 "ball"   bou1 "pot"
mau1 "to squat"   maau1 "cat"
cau1 "autumn"   caau1 "to copy"
kat1 "to cough"   kaat1 "card"
sam1 "heart"   saam1 "shirt"
sai1 "west"   saai1 "to waste"
gwai1 "turtle"   gwaai1 "well-behaved"
zaa1 "to drive"   caa1 "fork"
ze1 "umbrella"   ce1 "car"
zung1 "clock"   cung1 "green onion"
zeoi1 "to chase"   ceoi1 "to blow"
zim1 "sharp"   cim1 "to sign"
si1 "lion"   ci1 "to stick"
sing1 "star"   cing1 "dragonfly"
saa1 "sand"   sung1 "loose"
gaai1 "street"

Figure A.3
The following pictures were used for practice trials in the perception task.

syu1 "book"   gau2 "dog"
saan1 "mountain"   zi2 "paper"
seoi2 "water"   man1 "mosquito"

Appendix B

Perception Task Audio Manipulation

Table B.1
Modification parameters and pitch measurement parameters used with the Change gender function in Praat.
Participant   Pitch range factor   Formant shift ratio   Pitch floor   Pitch ceiling
5848    0.57   0.8    150 Hz   300 Hz
11452   0.75   0.83   150 Hz   300 Hz
11458   0.65   0.79   150 Hz   300 Hz
12550   0.6    0.83   150 Hz   300 Hz
14458   0.65   0.8    150 Hz   500 Hz
14953   0.6    0.82   150 Hz   300 Hz
15544   0.65   0.8    150 Hz   500 Hz
15841   0.75   0.83   150 Hz   300 Hz
17155   0.6    0.82   150 Hz   300 Hz
19822   0.6    0.8    150 Hz   300 Hz
21094   0.55   0.8    150 Hz   300 Hz
21121   0.65   0.82   150 Hz   300 Hz
22165   0.55   0.8    150 Hz   300 Hz
24778   0.6    0.8    150 Hz   300 Hz
26236   0.6    0.83   150 Hz   500 Hz
27637   0.55   0.79   150 Hz   300 Hz
28138   0.6    0.82   150 Hz   300 Hz
28267   0.55   0.79   150 Hz   300 Hz
28408   0.6    0.83   150 Hz   300 Hz
28480   0.65   0.8    150 Hz   300 Hz
28513   0.57   0.81   150 Hz   300 Hz
28534   0.58   0.8    150 Hz   300 Hz
28567   0.7    0.83   150 Hz   300 Hz
28591   0.69   0.83   150 Hz   300 Hz
28627   0.57   0.79   150 Hz   300 Hz
28642   0.56   0.79   150 Hz   300 Hz
28675   0.58   0.79   150 Hz   300 Hz
28678   0.6    0.82   150 Hz   300 Hz
28693   0.65   0.83   150 Hz   300 Hz
28702   0.68   0.8    150 Hz   300 Hz
28729   0.6    0.8    150 Hz   300 Hz
28732   0.63   0.81   150 Hz   300 Hz
28756   0.7    0.81   150 Hz   300 Hz
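Praat's built-in Change gender command takes six arguments in a fixed order: pitch floor, pitch ceiling, formant shift ratio, new pitch median, pitch range factor, and duration factor. Each row of Table B.1 supplies four of these. The sketch below is a hedged reconstruction, not the study's actual script; it assumes the pitch median was left unchanged (Praat's convention of 0 Hz) and the duration factor was 1, neither of which is listed in Table B.1, and it shows only three participants' rows.

```python
# Table B.1 rows: participant -> (pitch range factor, formant shift ratio,
# pitch floor in Hz, pitch ceiling in Hz). Only three rows are shown here.
PARAMS = {
    5848: (0.57, 0.8, 150, 300),
    14458: (0.65, 0.8, 150, 500),
    28756: (0.7, 0.81, 150, 300),
}

def change_gender_args(participant):
    """Order the Table B.1 values as Praat's Change gender command expects:
    pitch floor, pitch ceiling, formant shift ratio, new pitch median,
    pitch range factor, duration factor. The median (0 = unchanged) and the
    duration factor (1) are assumptions not listed in Table B.1."""
    range_factor, shift_ratio, floor_hz, ceiling_hz = PARAMS[participant]
    return (floor_hz, ceiling_hz, shift_ratio, 0, range_factor, 1)
```

With the parselmouth Python bindings for Praat, these arguments could then be applied to a loaded Sound as, for example, parselmouth.praat.call(sound, "Change gender", *change_gender_args(5848)).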
