UBC Theses and Dissertations
Bayesian models of learning and generating inflectional morphology
Allen, Blake H.
In many of the world’s languages, the form of an individual word can vary systematically to express concepts such as tense, gender, and relative social status. Accurate models of these inflectional systems, such as verb conjugation and noun declension, are indispensable for both language research and language technology development. This dissertation presents a theoretical framework for understanding and predicting native speakers’ use of their languages’ inflectional systems. I propose a probabilistic interpretation of the task that speakers face when inferring unfamiliar inflected forms, and I argue for a Bayesian approach to modeling this task. Specifically, I develop the theory of sublexical morphology, which augments the Bayesian approach with intuitive methods for calculating the necessary probabilities. Sublexical morphology also has the virtue of being computationally implementable: this dissertation defines all data structures used in sublexical morphology and specifies the procedures needed to use a model for morphological inference. Along with this dissertation, I provide a Python package that implements all the classes and methods necessary to perform inference with a sublexical morphology model. I also describe an implemented learning algorithm that induces sublexical morphology models from labeled but unparsed training data. As empirical support for my core claims, I describe the outcomes of two behavioral experiments. Evidence from a test of Icelandic speakers’ inflection of novel words demonstrates that speakers can additively combine information from multiple provided inflected forms of a word, and evidence from a similar test on Polish speakers suggests that speakers may be limited to this additive way of combining such information.
In clear support of a Bayesian interpretation of morphological inference, both experiments also demonstrate that prior probabilities, understood as reflecting the lexical frequencies of different groupings of words, play a major role in speakers’ use of their inflectional systems. This holds even when the influence of prior probabilities leads speakers to apparently deviate from exceptionless lexical patterns in those systems.
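To make the Bayesian framing above concrete, the following is a minimal toy sketch of inferring which grouping (inflectional class) a novel word belongs to. It is not the sublexical morphology model or its accompanying Python package. The class names, observed-form labels, counts, and probabilities are all hypothetical. The prior is taken from each grouping's lexical frequency, as the abstract describes. Treating observed forms as conditionally independent, so that their evidence adds in log space, is an assumption made here for illustration and is only one possible reading of "additive" combination.

```python
import math
from typing import Dict, Iterable

# Hypothetical lexicon counts per grouping; normalized, they serve as the
# prior P(class), i.e. priors reflect lexical frequencies of groupings.
CLASS_COUNTS: Dict[str, int] = {"class1": 60, "class2": 30, "class3": 10}

# Hypothetical P(observed inflected form | class) for two inflected cells.
LIKELIHOOD: Dict[str, Dict[str, float]] = {
    "class1": {"gen.sg=-s": 0.9, "nom.pl=-ar": 0.1},
    "class2": {"gen.sg=-s": 0.5, "nom.pl=-ar": 0.9},
    "class3": {"gen.sg=-s": 0.1, "nom.pl=-ar": 0.9},
}


def posterior(observed: Iterable[str]) -> Dict[str, float]:
    """Return P(class | observed forms), assuming the forms are
    conditionally independent given the class (a toy assumption)."""
    observed = list(observed)  # allow iterating once per class
    total = sum(CLASS_COUNTS.values())
    log_scores: Dict[str, float] = {}
    for cls, count in CLASS_COUNTS.items():
        # Log prior from the grouping's lexical frequency.
        score = math.log(count / total)
        # Each observed form adds its log-likelihood to the score.
        for form in observed:
            score += math.log(LIKELIHOOD[cls][form])
        log_scores[cls] = score
    # Normalize with log-sum-exp for numerical stability.
    m = max(log_scores.values())
    z = sum(math.exp(s - m) for s in log_scores.values())
    return {cls: math.exp(s - m) / z for cls, s in log_scores.items()}


if __name__ == "__main__":
    print(posterior(["gen.sg=-s"]))                # class1 is most probable
    print(posterior(["gen.sg=-s", "nom.pl=-ar"]))  # class2 is most probable
```

With a single observed form, the frequent class1 wins largely because of its prior. Adding a second form shifts the posterior toward class2, showing how evidence from multiple inflected forms and the prior jointly determine the inference in this toy setup.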
Attribution 4.0 International