UBC Theses and Dissertations


Dynamic vocal tract acoustic modeling using the immersed boundary method

Wu, Rongshuai

Abstract

Synthesizing human speech using physics-based acoustic models is a complex task that involves simulating wave propagation and the interactions between airflow and the dynamic vocal tract. Finite-difference time-domain (FDTD) models can synthesize static vowels but face significant limitations in representing the vocal tract's complex boundaries and in maintaining stability for dynamic, moving vocal tract structures. While dynamic finite element (FE) models can capture the movement of the vocal tract, they carry substantial computational costs because the entire domain must be remeshed frequently as the geometry updates. This thesis introduces a two-dimensional immersed boundary FDTD (2D IB-FDTD) model to address these challenges. The proposed model combines the FDTD numerical scheme with the immersed boundary approach to represent the vocal tract's complex geometry and dynamic movement more precisely. Unlike traditional FDTD models, which approximate boundaries with stair-stepped grid cells, the IB-FDTD model uses Lagrangian points to create a smooth, flexible boundary that adapts to the vocal tract's shape. The interaction between the boundary and the flow equations is mediated by additional forcing terms characterized by boundary immittances. This approach not only accommodates detailed vocal tract geometries but also supports boundary interpolation, enabling seamless transitions and maintaining stability when simulating continuous movements such as diphthongs. To validate the IB-FDTD model, we evaluated it on both static vowels and dynamic diphthongs, comparing its results with those of existing 3D FEM and waveguide models. The comparisons demonstrate that the IB-FDTD model replicates acoustic characteristics similar to those produced by more computationally intensive 3D models, with only minor discrepancies. This study presents a robust tool for synthesizing human speech sounds, with valuable applications in speech synthesis and articulatory research.
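To make the abstract's core idea concrete, the sketch below shows the kind of update an IB-FDTD scheme performs: a leapfrog acoustic FDTD step on a staggered grid, with Lagrangian boundary points coupled to the grid through a discrete delta kernel, plus linear interpolation between two boundary shapes for diphthong-like transitions. This is an illustrative simplification, not the thesis's actual formulation: the forcing here is a rigid-wall velocity penalty (coefficient `alpha`), whereas the thesis characterizes its forcing terms by boundary immittances; the collocation of `vx` and `vy` at the same Lagrangian points and the 2-point "hat" kernel are further simplifying assumptions.

```python
import numpy as np

def hat_delta(r):
    # 2-point "hat" kernel; a simplification of the smoother
    # discrete delta functions commonly used in immersed boundary methods
    r = np.abs(r)
    return np.where(r < 1.0, 1.0 - r, 0.0)

def interpolate_boundary(pts_a, pts_b, s):
    # Linear interpolation between two Lagrangian boundary shapes
    # (e.g. the outlines of a diphthong's start and end vowels),
    # with s moving from 0 to 1 over the transition
    return (1.0 - s) * pts_a + s * pts_b

def ib_fdtd_step(p, vx, vy, lag_pts, dt, dx, rho=1.2, c=343.0, alpha=1e4):
    """One leapfrog update of a 2-D acoustic FDTD grid with a simple
    penalty-style immersed-boundary forcing that damps the velocity
    toward zero (a rigid wall) near each Lagrangian point (y, x)."""
    # velocity update from the pressure gradient (staggered grid;
    # the outermost velocity samples stay zero -> rigid outer walls)
    vx[:, 1:-1] -= dt / (rho * dx) * (p[:, 1:] - p[:, :-1])
    vy[1:-1, :] -= dt / (rho * dx) * (p[1:, :] - p[:-1, :])
    # spread the boundary forcing onto nearby grid velocities;
    # for simplicity vx and vy are forced at the same cell indices
    for (yb, xb) in lag_pts:
        i0, j0 = int(yb), int(xb)
        for di in (0, 1):
            for dj in (0, 1):
                w = hat_delta(yb - (i0 + di)) * hat_delta(xb - (j0 + dj))
                damp = max(0.0, 1.0 - dt * alpha * w)
                vx[i0 + di, j0 + dj] *= damp
                vy[i0 + di, j0 + dj] *= damp
    # pressure update from the velocity divergence
    p -= dt * rho * c**2 / dx * (
        (vx[:, 1:] - vx[:, :-1]) + (vy[1:, :] - vy[:-1, :])
    )
    return p, vx, vy
```

Because the boundary lives in the Lagrangian points rather than in the grid itself, moving the vocal tract between time steps only means updating `lag_pts` (for example via `interpolate_boundary`); the Eulerian grid is never remeshed, which is the stability and cost advantage the abstract describes over dynamic FE models.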


Rights

Attribution-NonCommercial-NoDerivatives 4.0 International