UBC Theses and Dissertations

3D subject-specific biomechanical modeling and simulation of the oral region and airway with application to speech production

Mohaghegh Harandi, Negar


The oropharynx is involved in a number of complex neurological functions, such as chewing, swallowing, and speech. Disorders associated with these functions, if not treated properly, can dramatically reduce the sufferer's quality of life. When tailored to individual patients, biomechanical models can augment imaging data to enable computer-assisted diagnosis and treatment planning. The present dissertation develops a framework for 3D, subject-specific biomechanical modeling and simulation of the oropharynx. The underlying data consist of magnetic resonance (MR) images, as well as audio signals, recorded while healthy speakers repeated specific phonetic utterances in time with a metronome. Based on these data, we perform simulations that demonstrate motor control commonalities and variations of the /s/ sound across speakers, in front and back vowel contexts. The results agree well with theories of speech motor control in predicting the primary muscles responsible for tongue protrusion/retraction, jaw advancement, and hyoid positioning, and in suggesting independent activation units along the genioglossus muscle. We augment the simulations with real-time acoustic synthesis to generate sound. Spectral analysis of the resulting sounds against the recorded audio signals reveals discrepancies between their formant frequencies. Experiments using 1D and 3D acoustical models demonstrate that these discrepancies arise from the low resolution of the MR images, generic parameter tuning in the acoustical models, and ambiguity in the 1D vocal tract representation. Our models prove beneficial for vowel synthesis based on biomechanics derived from image data. Our modeling approach is designed for time-efficient creation of subject-specific models. We develop methods that streamline the delineation of articulators from MR images and significantly reduce expert interaction time (≈ 5 min per image volume for the tongue).
Our approach also exploits muscular and joint information embedded in state-of-the-art generic models, while providing consistent mesh quality, and the affordances to adjust mesh resolution and muscle definitions.
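The abstract traces part of the formant discrepancy to the 1D vocal tract representation and to MR image resolution. To illustrate why tract geometry drives formants in a 1D model, here is a minimal sketch of a lossless concatenated-tube (chain-matrix) model. It assumes a closed glottis, an ideal open end at the lips (zero radiation impedance), no wall or viscous losses, and textbook values for air density and sound speed. It is not the acoustical model used in the dissertation, and the constants `RHO` and `C` and the functions `chain_d` and `formants` are hypothetical names introduced here.

```python
import math

# Assumed physical constants (textbook values, not taken from the dissertation).
RHO = 1.14   # density of warm, humid air in kg/m^3
C = 350.0    # speed of sound in m/s


def chain_d(freq_hz, areas_m2, lengths_m):
    """Return entry (2,2) of the lossless glottis-to-lips chain matrix.

    Each section maps [P_in, U_in] = K [P_out, U_out] with
    K = [[cos(kl), j*Z*sin(kl)], [j*sin(kl)/Z, cos(kl)]], Z = RHO*C/A.
    Products of such matrices keep real diagonals and imaginary
    off-diagonals, so the running matrix is stored as [[a, j*b], [j*c, d]].
    """
    k = 2.0 * math.pi * freq_hz / C
    a, b, c, d = 1.0, 0.0, 0.0, 1.0
    for area, length in zip(areas_m2, lengths_m):
        z = RHO * C / area  # characteristic acoustic impedance of this section
        cs, sn = math.cos(k * length), math.sin(k * length)
        sb, sc = z * sn, sn / z
        # Right-multiply the running matrix by this section's matrix.
        a, b, c, d = (a * cs - b * sc,
                      a * sb + b * cs,
                      c * cs + d * sc,
                      d * cs - c * sb)
    return d


def formants(areas_m2, lengths_m, n=3, f_max=5000.0, step=1.0):
    """Estimate the first n resonances (Hz) of the tube.

    With an ideal open end at the lips (P_out = 0), U_in = d * U_out, so the
    volume-velocity transfer function is 1/d and resonances sit at zeros of d.
    Zeros are located by sign changes on a frequency grid plus linear
    interpolation between the two bracketing grid points.
    """
    found = []
    f_prev = step
    d_prev = chain_d(f_prev, areas_m2, lengths_m)
    f = f_prev + step
    while f <= f_max and len(found) < n:
        d_cur = chain_d(f, areas_m2, lengths_m)
        if d_prev != 0.0 and d_prev * d_cur <= 0.0:
            found.append(f_prev + step * d_prev / (d_prev - d_cur))
        f_prev, d_prev = f, d_cur
        f += step
    return found


if __name__ == "__main__":
    # Hypothetical 17.5 cm uniform (neutral) tract split into 35 sections of 5 mm.
    n_sec = 35
    areas = [3e-4] * n_sec
    lengths = [0.175 / n_sec] * n_sec
    print("uniform 17.5 cm:", [round(f) for f in formants(areas, lengths)])
    # Same tube 5 mm shorter: a hypothetical length error from coarse voxels.
    short = [0.170 / n_sec] * n_sec
    print("uniform 17.0 cm:", [round(f) for f in formants(areas, short)])
```

For a uniform tube the sketch reproduces the quarter-wavelength resonances F_n = (2n-1)c/(4L), about 500, 1500 and 2500 Hz for L = 17.5 cm. Because every formant scales as 1/L in that case, a few millimetres of error in the segmented tract length shifts all formants by a few percent. A nonuniform area function, such as a narrow back cavity and a wide front cavity, moves F1 well away from 500 Hz, which shows how sensitive a 1D model is to the area function extracted from the images.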

Attribution-NonCommercial-NoDerivatives 4.0 International