The voice box : a fast coupled vocal fold model for articulatory speech synthesis

UBC Theses and Dissertations

Featured Collection

UBC Theses and Dissertations

The voice box : a fast coupled vocal fold model for articulatory speech synthesis Vasudevan, Arvind

Abstract

Speech is unique to human beings as a means of communication and many efforts have been made towards understanding and characterizing speech. In particular, articulatory speech synthesis is a critical field of study as it works towards simulating the fundamental physical phenomena that underlines speech. Of the various components that constitute an articulatory speech synthesizer, vocal fold models play an important role as the source of the acoustic simulation. A balance between the simplicity and speed of lumped-element vocal fold models and the completeness and complexity of continuum-models is required to achieve time-efficient high-quality speech synthesis. In addition, most models of the vocal folds are seen in a vacuum without any coupling to the vocal tract model. This thesis aims to fill these lacunae in the field through two major contributions. We develop and implement a novel self-oscillating vocal-fold model, composed of an 1D unsteady fluid model loosely coupled with a 2D finite-element structural model. The flow model is capable of handling irregular geometries, different boundary conditions, closure of the glottis and unsteady flow states. A method for a fast decoupled solution of the flow equations that does not require the computation of the Jacobian matrix is provided. The simulation results are shown to agree with existing data in literature, and give realistic glottal pressure-velocity distributions, glottal width and glottal flow values. In addition, the model is more than order of magnitude faster than comparable 2D Navier-Stokes fluid solvers while better capturing transitional flow than simple Bernoulli-based flow models. Secondly, as an illustrative case study, we implement a complete articulatory speech synthesizer using our vocal fold model. This includes both lumped-element and continuum vocal fold models, a 2D finite-difference time-domain solver of the vocal tract, and a 1D tracheal model. A clear work flow is established to derive model components from experimental data or user-specified meshes, and run fully-coupled acoustic simulations. This leads to one of the few complete articulatory speech synthesizers in literature and a valuable tool for speech research to run time-efficient speech simulations, and thoroughly study the acoustic outcomes of model formulations.

Item Metadata

Title	The voice box : a fast coupled vocal fold model for articulatory speech synthesis
Creator	Vasudevan, Arvind
Publisher	University of British Columbia
Date Issued	2017
Description	Speech is unique to human beings as a means of communication and many efforts have been made towards understanding and characterizing speech. In particular, articulatory speech synthesis is a critical field of study as it works towards simulating the fundamental physical phenomena that underlines speech. Of the various components that constitute an articulatory speech synthesizer, vocal fold models play an important role as the source of the acoustic simulation. A balance between the simplicity and speed of lumped-element vocal fold models and the completeness and complexity of continuum-models is required to achieve time-efficient high-quality speech synthesis. In addition, most models of the vocal folds are seen in a vacuum without any coupling to the vocal tract model. This thesis aims to fill these lacunae in the field through two major contributions. We develop and implement a novel self-oscillating vocal-fold model, composed of an 1D unsteady fluid model loosely coupled with a 2D finite-element structural model. The flow model is capable of handling irregular geometries, different boundary conditions, closure of the glottis and unsteady flow states. A method for a fast decoupled solution of the flow equations that does not require the computation of the Jacobian matrix is provided. The simulation results are shown to agree with existing data in literature, and give realistic glottal pressure-velocity distributions, glottal width and glottal flow values. In addition, the model is more than order of magnitude faster than comparable 2D Navier-Stokes fluid solvers while better capturing transitional flow than simple Bernoulli-based flow models. Secondly, as an illustrative case study, we implement a complete articulatory speech synthesizer using our vocal fold model. This includes both lumped-element and continuum vocal fold models, a 2D finite-difference time-domain solver of the vocal tract, and a 1D tracheal model. A clear work flow is established to derive model components from experimental data or user-specified meshes, and run fully-coupled acoustic simulations. This leads to one of the few complete articulatory speech synthesizers in literature and a valuable tool for speech research to run time-efficient speech simulations, and thoroughly study the acoustic outcomes of model formulations.
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2017-07-10
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0348751
URI	http://hdl.handle.net/2429/62195
Degree (Theses)	Master of Applied Science - MASc
Program (Theses)	Electrical and Computer Engineering
Affiliation	Applied Science, Faculty of; Electrical and Computer Engineering, Department of
Degree Grantor	University of British Columbia
Graduation Date	2017-09
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace

Open Collections

UBC Theses and Dissertations

UBC Theses and Dissertations

The voice box : a fast coupled vocal fold model for articulatory speech synthesis Vasudevan, Arvind

Abstract

Item Metadata

Item Media

Item Citations and Data

Rights