SpiCE: Speech in Cantonese and English - UBC Library Open Collections

SpiCE: Speech in Cantonese and English
Johnson, Khia A.

Description

This is the Speech in Cantonese and English (SpiCE) corpus. SpiCE is an audio corpus of conversational Cantonese-English bilingual speech recorded in Vancouver, Canada, between 2018 and 2020. The corpus includes high-quality recordings of 34 early bilinguals in both English and Cantonese. Participants completed a sentence reading task, a storyboard narration, and a conversational interview in each language. For each talker, the three speech tasks are combined in a single audio file per language. A Praat TextGrid file accompanies each audio file. The TextGrids provide hand-corrected orthographic transcription and phoneme-level forced alignment in Cantonese and English. As an open-access language resource, SpiCE will promote bilingualism research for a typologically distinct pair of languages, of which Cantonese remains understudied despite having millions of speakers around the world. The SpiCE corpus is especially well suited for phonetic research on conversational speech, and it enables researchers to study cross-language, within-speaker phenomena for a diverse group of early Cantonese-English bilinguals. These are areas with few existing high-quality resources. Corpus documentation is available at https://spice-corpus.readthedocs.io/.
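
As a rough illustration of how the time-aligned annotations in a Praat TextGrid might be read, the following minimal sketch extracts labelled intervals from TextGrid text in Praat's long text format. It is not part of the corpus tooling. The tier names "word" and "phone" and the sample content are hypothetical, and the sketch assumes interval tiers only; the corpus documentation describes the actual tier layout.

```python
import re
from typing import List, NamedTuple


class Interval(NamedTuple):
    tier: str
    start: float
    end: float
    label: str


# Matches the tier name line inside one tier block.
_TIER_NAME = re.compile(r'name = "(?P<name>[^"]*)"')

# Matches one interval entry; Praat escapes a quote inside a label as "".
_INTERVAL = re.compile(
    r'intervals \[\d+\]:\s*'
    r'xmin = (?P<xmin>[\d.eE+-]+)\s*'
    r'xmax = (?P<xmax>[\d.eE+-]+)\s*'
    r'text = "(?P<text>(?:[^"]|"")*)"'
)


def read_intervals(textgrid_text: str) -> List[Interval]:
    """Return all non-empty intervals from long-format TextGrid text."""
    intervals = []
    # Each tier starts at an 'item [n]:' header; drop the file header chunk.
    for chunk in re.split(r'item \[\d+\]:', textgrid_text)[1:]:
        name_match = _TIER_NAME.search(chunk)
        if name_match is None:
            continue
        for m in _INTERVAL.finditer(chunk):
            label = m.group("text").replace('""', '"')
            if label.strip():  # skip silent or unlabelled stretches
                intervals.append(
                    Interval(
                        name_match.group("name"),
                        float(m.group("xmin")),
                        float(m.group("xmax")),
                        label,
                    )
                )
    return intervals


# Hypothetical sample; tier names and labels are assumptions for illustration.
SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 1
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "word"
        xmin = 0
        xmax = 1
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.5
            text = "hello"
        intervals [2]:
            xmin = 0.5
            xmax = 1
            text = ""
    item [2]:
        class = "IntervalTier"
        name = "phone"
        xmin = 0
        xmax = 1
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 0.2
            text = "HH"
'''

if __name__ == "__main__":
    for interval in read_intervals(SAMPLE):
        print(interval)
```

In practice the text would come from one of the corpus TextGrid files rather than an inline string. The file encoding and naming scheme are not stated in this record, so they should be checked against the corpus documentation.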

Item Metadata

Title	SpiCE: Speech in Cantonese and English
Alternate Title	A transcribed audio corpus of conversational Cantonese-English bilingual speech
Creator	Johnson, Khia A.
Contributor	Johnson, Khia A.; Yiu, Nancy; Fong, Ivan; Lee, Katherine; Chan, Kristy; Oliveira Ferreira, Natália; To, Michelle; Wong, Rachel Ching Fung; Sen, Christina; Zattera, Ariana; Soo, Rachel; Babel, Molly
Date Issued	2021-05-20
Subject	Arts and Humanities; Computer and Information Science; Social Sciences; yue; yuec1235; eng; cana1268; hong1245
Type	Dataset
Language	Chinese; English
Date Available	2021-01-27
Provider	University of British Columbia Library
License	CC-BY 4.0
DOI	10.14288/1.0398086
URI	https://doi.org/10.5683/SP2/MJOXP3
Publisher DOI	https://doi.org/10.5683/SP2/MJOXP3
Grant Funding Agency	Social Sciences and Humanities Research Council
Rights URI	http://creativecommons.org/licenses/by/4.0
Aggregated Source Repository	Dataverse
