"Arts, Faculty of"@en . "Linguistics, Department of"@en . "DSpace"@en . "UBCV"@en . "Mackie, James Scott"@en . "2017-01-23T15:38:03Z"@en . "2017"@en . "Doctor of Philosophy - PhD"@en . "University of British Columbia"@en . "A major question in phonology concerns the role of historical changes in shaping the typology of languages. This dissertation explores the effect of sound change on consonant inventories. Historical reconstruction is mainly done by comparing cognate words across languages, making it difficult to track how inventories change specifically. Additionally, few languages have historical written records that can be directly examined. For this dissertation, the main research tool is computer simulation, using bespoke software called PyILM, which is based on the Iterated Learning Model (Kirby 2011, Smith et al. 2003). This allows for the simulation of sound change from arbitrary starting points, controlling for a multitude of variables. PyILM is an agent-based model, where a 'speaking' agent transmits a set of words to a 'listening' agent. The speaking agent is then removed, the learner becomes the speaker, and a new learner is introduced. The cycle repeats any number of times, roughly simulating the transmission of language over many generations. Sound change in a simulation is due to channel bias (Moreton 2008), the result of which is that agents occasionally misinterpret some aspect of speech, and internalize sound categories that differ from the previous generation (Ohala 1981, Blevins 2004). Three typological generalizations are examined, none of which have previously been studied from an evolutionary perspective:\r\n(1) The total number of consonants in a language. This is shown to be related to syllable structure, such that languages with simple syllables develop smaller inventories than languages with complex syllables. This mirrors a positive correlation between inventory size and syllable structure in natural languages, as reported by Maddieson (2007).\r\n(2) The correlation reported by Lindblom and Maddieson (1988) between the size of an inventory and the complexity of its segments. This effect emerges in simulations when context-free changes are introduced, since these changes produce similar outcomes in inventories of all sizes.\r\n(3) Feature economy (Clements 2003), which refers to the way that consonants within a language tend to make use of a minimal number of distinctive features. Economy emerges over time when sound changes take scope over classes of sounds, rather than targeting individual sounds."@en . "https://circle.library.ubc.ca/rest/handle/2429/60389?expand=metadata"@en . "SIMULATING THE EVOLUTION OF CONSONANT INVENTORIESbyJAMES SCOTT MACKIEB.A., University of Ottawa, 2006M.A., University of Ottawa, 2007A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTSFOR THE DEGREE OFDOCTOR OF PHILOSOPHYinTHE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES(Linguistics)THE UNIVERSITY OF BRITISH COLUMBIA(Vancouver)January, 2017\u00C2\u00A9James Scott Mackie 2017AbstractA major question in phonology concerns the role of historical changes in shaping the ty-pology of languages. This dissertation explores the e\u001Bect of sound change on consonantinventories.Historical reconstruction is mainly done by comparing cognate words across languages,making it di\u001Ecult to track how inventories change speci\u001Ccally. Additionally, few languageshave historical written records that can be directly examined. 
For this dissertation, the main research tool is computer simulation, using bespoke software called PyILM, which is based on the Iterated Learning Model (Kirby 2011, Smith et al. 2003). This allows for the simulation of sound change from arbitrary starting points, controlling for a multitude of variables.

PyILM is an agent-based model, where a 'speaking' agent transmits a set of words to a 'listening' agent. The speaking agent is then removed, the learner becomes the speaker, and a new learner is introduced. The cycle repeats any number of times, roughly simulating the transmission of language over many generations.

Sound change in a simulation is due to channel bias (Moreton 2008), the result of which is that agents occasionally misinterpret some aspect of speech, and internalize sound categories that differ from the previous generation (Ohala 1981, Blevins 2004). Three typological generalizations are examined, none of which have previously been studied from an evolutionary perspective:

(1) The total number of consonants in a language. This is shown to be related to syllable structure, such that languages with simple syllables develop smaller inventories than languages with complex syllables. This mirrors a positive correlation between inventory size and syllable structure in natural languages, as reported by Maddieson (2007).

(2) The correlation reported by Lindblom and Maddieson (1988) between the size of an inventory and the complexity of its segments. This effect emerges in simulations when context-free changes are introduced, since these changes produce similar outcomes in inventories of all sizes.

(3) Feature economy (Clements 2003), which refers to the way that consonants within a language tend to make use of a minimal number of distinctive features. Economy emerges over time when sound changes take scope over classes of sounds, rather than targeting individual sounds.

Preface

This dissertation is the original work of the author. I wrote all of the computer code for PyILM, including the accompanying GUI, from the ground-up using Python 3.4. The Feature Economist algorithm used for the results in Chapter 5 was originally designed in collaboration with Jeff Mielke. It has previously been used for the research in Mackie and Mielke (2011), and it is included with the software P-base (Mielke 2008). The implementation used for this dissertation is one that I wrote myself.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Algorithms
Acknowledgments
1 Consonant inventories and sound change
1.1 Introduction
1.2 Iterated learning models
1.3 Sound change models
1.3.1 Summary
1.4 An ILM for phonology
1.4.1 Overview
1.4.2 One turn of a PyILM simulation
1.4.2.1 Production
1.4.2.2 Misperception
1.4.2.3 Learning
1.4.3 Some notes on design
1.4.3.1 Social factors
1.4.3.2 Single-agent transmission change
1.4.3.3 Discrete learning period
1.4.3.4 No teleology
1.4.3.5 Phonemes and allophones
1.4.4 Expected outcomes and inventory structure
1.5 Summary
2 PyILM
2.1 Introduction
2.1.1 Iterated Learning Models
2.2 Objects
2.2.1 Overview
2.2.2 Simulation
2.2.2.1 Overview
2.2.2.2 generations
2.2.2.3 initial_lexicon_size
2.2.2.4 initial_inventory
2.2.2.5 minimum_repetitions
2.2.2.6 min_word_length
2.2.2.7 max_word_length
2.2.2.8 phonotactics
2.2.2.9 features_file
2.2.2.10 max_lexicon_size
2.2.2.11 invention_rate
2.2.2.12 max_inventions
2.2.2.13 misperceptions
2.2.2.14 minimum_activation_level
2.2.2.15 auto_increase_lexicon_size
2.2.2.16 initial_words
2.2.2.17 allow_unmarked
2.2.2.18 seed
2.2.2.19 seg_specific_misperceptions
2.2.3 Words
2.2.3.1 string
2.2.3.2 meaning
2.2.4 Segments
2.2.4.1 symbol
2.2.4.2 features
2.2.4.3 envs
2.2.4.4 distribution
2.2.5 Features
2.2.6 FeatureSpace
2.2.7 Sounds
2.2.8 Tokens
2.2.8.1 name
2.2.8.2 value
2.2.8.3 label
2.2.8.4 env
2.2.9 Agents
2.2.9.1 lexicon
2.2.9.2 inventory
2.2.9.3 feature_space
2.2.9.4 distributions
2.2.10 Misperception
2.2.10.1 name
2.2.10.2 target
2.2.10.3 feature
2.2.10.4 salience
2.2.10.5 env
2.2.10.6 p
2.2.10.7 How misperception happens
2.2.10.8 A note on misperception definitions
2.3 Algorithms
2.3.1 Learning algorithm
2.3.1.1 Parsing a Word
2.3.1.2 Creating new segment categories
2.3.2 Updates
2.3.2.1 The lexicon
2.3.2.2 The inventory
2.3.3 Determining phonological feature values
2.3.4 Production algorithm
2.3.4.1 Initialization
2.3.4.2 Step 1: Word selection
2.3.4.3 Step 2: Transforming Segments into Sounds
2.3.5 Invention algorithm
2.4 Using PyILM
2.4.1 Obtaining PyILM
2.4.2 Configuration files
2.4.3 Running a simulation
2.4.4 Viewing results
2.5 Other notes
2.5.1 Limitations
2.5.1.1 No social contact
2.5.1.2 No deletion or epenthesis
2.5.1.3 No morphology or syntax
2.5.1.4 No long distance changes
2.5.2 Running time
3 Sample simulations
3.1 Introduction
3.2 Simulation 1 - A single abrupt change
3.3 Simulation 2 - A single gradual change
3.4 Misperceptions and phonetic similarity
3.5 Simulation 3 - Interactions between sound changes
3.6 Simulation 4 - CVC language
3.7 Simulation 5 - Invention and the spread of new segments
3.8 Summary
4 Natural language consonant inventories
4.1 Inventory size
4.1.1 Overview
4.1.2 Population size
4.1.3 Hypothesis #1: Phonotactics and inventory size
4.2 Inventory contents
4.2.1 Overview
4.2.2 Hypothesis #2 - Common consonants
4.3 Inventory organization
4.3.1 Overview
4.3.2 Feature economy
4.3.2.1 Measuring feature economy
4.3.3 Cross-linguistic tendencies
4.3.4 Explaining economy
4.3.4.1 A computational model
4.3.4.2 Whistle experiments
4.3.5 Hypothesis #3 - Sound change and feature economy
4.4 Summary
5 Simulating inventory evolution
5.1 Introduction
5.2 Inventory size
5.2.1 Simulation results
5.3 Common consonants
5.3.1 Misperception vs. bias
5.3.2 Simulation results
5.4 Feature economy
5.4.1 How economy can change over time
5.4.2 An illustrative example
5.4.3 Segment-specific misperceptions vs. class-level misperceptions
5.4.4 Calculating feature economy
5.4.5 Simulation results
6 Conclusion
Bibliography

List of Tables

3.1 Configuration for Simulation 1
3.2 Comparison of inventories in Simulation 1
3.3 Comparison of inventories in Simulation 2
3.4 Comparisons of several generations in Simulation 3
3.5 Configuration for Simulation 4
3.6 Comparison of several generations in Simulation 4
3.7 Configuration for Simulation 5
3.8 Comparison of several generations in Simulation 5
4.1 Co-occurrence of V and Z in UPSID (from Clements (2003, p. 303))
4.2 The inventory of West Greenlandic
4.3 Feature economy effects in Pater and Staubs (2013)
5.1 Configuration for testing phonotactic effects on inventory size
5.2 Configuration for simulations comparing simple misperceptions and biases
5.3 Example of individual inventories in a simulation with misperception and bias, starting from only voiceless stops
5.4 Misperceptions and biases for testing Hypothesis #2
5.5 Results of two-way ANOVA with inventory size and misperception type as predictors and economy score as dependent variable

List of Figures

1.1 Model of sound change through listener misperception, Ohala (1981, p. 182)
1.2 Emergent stops, from Ohala (1997)
2.1 The objects of PyILM
2.2 The transmission of a phonological segment
2.3 Sample feature file
2.4 Example configuration file
2.5 Screen shot of PyILM Visualizer
3.1 Change in inventory size for Simulation 1
3.2 Results for various values of minimum_activation_level
3.3 Varying misperception salience across three different values for minimum_activation_level. Misperception salience is shown in the legend. Simulation (a) uses a value of 0.2, Simulation (b) uses a value of 0.5 and Simulation (c) uses a value of 1.0
3.4 Change in inventory size for Simulation 3
3.5 Change in total inventory size with five different random seeds
4.1 The inventories of Palauan, from Morén-Duolljá (2005)
4.2 Summary of the distribution of velar stops in Palauan, with data from Morén-Duolljá (2005)
4.3 Phonemic inventory of Central Rotokas, based on Firchow and Firchow (1969)
4.4 The inventory of !Xóõ, based on Traill (1985)
4.5 Correlations between speaker population size (individual languages) and inventory size, from Hay and Bauer (2007)
4.6 Correlations between speaker population size (language families) and inventory size, from Hay and Bauer (2007)
4.7 Relationship between population size (log scale) and inventory size for several language families, from Donohue and Nichols (2011)
4.8 Population size and inventory size, from Wichmann, Rama and Holman (2011)
4.9 Predicted magnitude of the effect of population size on inventory size, from Moran et al. (2012, p. 18)
4.10 IPA chart warped to show consonant frequency in P-base (Mielke 2008)
4.11 Consonant inventory size and number of superset inventories in P-base
4.12 Consonant inventory size in P-base and number of unique consonants
4.13 Segment complexity plotted against inventory size for the inventories of P-base
4.14 Consonant inventories from P-base with "reversed" segment complexity
4.15 Consonant inventories of Noon and Tamazight
4.16 Randomly generated consonant inventory
4.17 Three sound systems differing in symmetry and economy, from Clements (2003, p. 292)
4.18 Inventories of Hawaiian, French and Nepali (from Clements (2003, p. 288))
4.19 Ranges of feature economy scores in the inventories of P-base (Mielke 2008)
4.20 Feature economy scores of natural languages and randomly generated inventories
4.21 Example of whistle recombinations from Verhoef and de Boer (2011, p. 2)
5.1 Average inventory size for 50 simulations over 50 generations, across 3 different phonotactic conditions
5.2 State diagram for word-final obstruents in a simulation with final devoicing
5.3 Change in inventory size for two simulations, one starting with voiceless stops, one with voiced stops
5.4 Biased and non-biased sounds in the final simulated inventories
5.5 Change in economy score for a hypothetical language
5.6 Range of possible Simple Ratio scores
5.7 Range of possible Frugality scores
5.8 Change in feature economy for a simple simulation
5.9 Change in average feature economy for simulations run with class-level changes
5.10 Change in average feature economy for simulations run with segment-specific changes

List of Algorithms

1.1 Generalized Iterated Learning Model
1.2 Generalized Iterated Learning Model for phonology
2.1 Main simulation loop
2.2 Misperception function
2.3 Learning algorithm
2.4 Activation function
2.5 K-means algorithm
2.6 Distribution estimation
2.7 Production algorithm
2.8 Invention algorithm

Acknowledgments

This project has been a long time in the making, and it could not have been completed without the help of many people. First and foremost is my family. My partner Cameron and our daughter Autumn have shown incredible patience, love, and understanding over the years, and I could not have finished this without their support. My parents, Margaret and Craig, and my brothers, Simon and Alan, have been extremely supportive and encouraging, even if they don't always understand what I'm up to. I love you all.

I'd also like to thank my committee, Gunnar, Molly, and Alex, who have all been extremely helpful over the course of this project. I'm grateful for all that they have done for me. Gunnar was my first contact at UBC when I applied to be a PhD student, and I'm extremely happy that he was able to direct my dissertation as well.

There are many other individuals who made my graduate school experience special, and who deserve a mention too. In no particular order, I'd like to thank...

Jeff Mielke and Ana Arregui, at the University of Ottawa, for encouraging me to pursue further studies in linguistics.

Xiahou Dun, Kraus, a foolish pianist, Robo Kitty, Cingulate, Mortley, !amicable and others on the SA forums who have really helped me refine my ideas about language and linguistics.

The PCT team: Kathleen Hall, Blake Allen, Michael Fry, and Michael McAuliffe, who taught me a great deal about programming, and helped me see how much fun it is.

Johnathon Jones, who has given me amazing opportunities to apply linguistics outside of a university setting.

Everyone I've ever TAed for (Susannah Kirby, Lisa Matthewson, Bryan Gick, Douglas Pulleyblank, Henry Davis, and especially Strang Burton). Teaching has been one of the most meaningful and exciting parts of my PhD, and I've enjoyed every class with every one of you.

Chapter 1
Consonant inventories and sound change

1.1 Introduction

Each human language uses only a finite subset of all possible consonants and vowels, and this collection of sounds is known as the inventory of the language. This dissertation is a study of consonant inventories. I investigate three different aspects of consonant inventories, and how they change over time.

The first is the total size of an inventory. I propose that a main factor that influences inventory size is the phonotactics (syllable structure) of a language. Languages with more restrictive phonotactics (e.g. only CV syllables, and hence no word-internal consonant clusters, nor any final consonants) will tend to develop small inventories over time, while languages with more permissive phonotactics (e.g. maximally CCVCC syllables, and hence the possibility of word-internal consonant clusters as well as final consonants) will develop larger inventories. This is supported by a correlation reported in Maddieson (2007), which shows that syllable structure complexity is positively related to inventory size in the inventories of UPSID (Maddieson and Precoda 1989).
The second aspect of inventories to be studied is the frequency of consonants across languages. Certain sounds are extremely common (such as /p/ and /m/) while other sounds are less common (such as /q'/ and /Ð/). This frequency distribution is related to inventory size: small inventories tend to have only the most common sounds, while large inventories have all of the most common sounds, as well as rare or even unique ones (Maddieson (2011), Lindblom and Maddieson (1988), see also Section 4.2). Put another way, small inventories look similar to each other, and inventories diversify as they grow.

Lindblom and Maddieson (1988) propose that this effect is due to the way that inventories grow over time. Inventories first saturate a small set of all possible sounds, before expanding into other areas of phonetic space. A metaphorical rubber band draws inventories back toward this basic set, accounting for the contents of small inventories, while a metaphorical magnet pushes sounds apart from each other, resulting in the increasing diversity of large inventories.

This metaphor is appealing but Lindblom and Maddieson do not offer any historical evidence to support it, nor do they point to any specific types of sound changes that might underlie the rubber band and magnet effect. I propose that the basis for a common set of sounds across languages is the existence of context-free sound changes, which can affect inventories of any size. Large inventories have more unique sounds because such languages also have a wider variety of phonetic contexts (due to the correlation with phonotactics discussed above).

The third aspect of inventories is something known as "feature economy" (Clements 2003, 2009). This describes the tendency for inventories to be organized around the re-use of a small number of distinctive features. For example, many languages have a six-stop system /p, b, t, d, k, g/, where the feature [voice] is re-used for contrast at each place of articulation. In comparison, a six-stop system /p, b, t, t', kʷ, k/, one that makes use of a different feature for contrast at each place of articulation, is extremely rare, if not actually unattested. Mackie and Mielke (2011) showed that natural languages exhibit higher feature economy scores than randomly generated sets of segments, but did not offer an explanation for why.

I propose that feature economy is the emergent result of sound change. This is because the phonetic biases underlying sound change are such that they can affect classes of sounds, rather than individual sounds. This makes it possible for a new set of sounds to emerge in an inventory, all of the members of which are minimally different from another set, differing only by whatever phonetic feature was affected by sound change. Over time, this creates the appearance of economy in an inventory. Randomly generated inventories are less economical than natural languages because they have never undergone sound change.
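To make the notion of feature re-use concrete, the short Python sketch below compares the two six-stop systems above using a simple segments-per-feature ratio. It is only an illustration: the feature labels are simplified assumptions chosen for this example, and the function is not the Feature Economist algorithm or the economy measures used later in the dissertation.

# Illustrative only: toy feature specifications, not the feature system used by PyILM.
def economy(inventory):
    """Segments-per-feature ratio: higher values mean more feature re-use."""
    features_used = set()
    for feature_set in inventory.values():
        features_used.update(feature_set)
    return len(inventory) / len(features_used)

# [voice] is re-used for contrast at every place of articulation.
economical = {'p': {'labial'}, 'b': {'labial', 'voice'},
              't': {'coronal'}, 'd': {'coronal', 'voice'},
              'k': {'dorsal'}, 'g': {'dorsal', 'voice'}}

# Each contrast recruits a brand-new feature.
uneconomical = {'p': {'labial'}, 'b': {'labial', 'voice'},
                't': {'coronal'}, "t'": {'coronal', 'ejective'},
                'k': {'dorsal'}, 'kw': {'dorsal', 'round'}}

print(economy(economical))    # 6 segments / 4 features = 1.5
print(economy(uneconomical))  # 6 segments / 6 features = 1.0

On this simple measure the attested-style system scores higher than the hypothetical one, which is the intuition behind feature economy.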
These proposals about consonant inventories will be tested through computer simulation. In Chapter 2, I present PyILM, a Python package for running sound change simulations. Broadly speaking, PyILM is an implementation of a listener-based theory of sound change. In contemporary linguistics, this approach to sound change is probably most well-known through the work of John Ohala (1981, 1983, 1991, 1997, et seq.) and more recently Evolutionary Phonology (Blevins 2004, 2006b, Blevins and Wedel 2009). The computational framework used is the Iterated Learning Model (Brighton et al. 2005, Kirby et al. 2008, Smith and Kirby 2008, Cornish 2010). This is an agent-based model where agents are arranged in a transmission chain. The nth agent receives information from agent n−1, formulates a hypothesis about it, and then transmits new information to agent n+1.

Sound change in a simulation is due to events known as "misperceptions" in the terminology of PyILM. Misperceptions are defined as probabilistic, context-sensitive rules (which includes context-free rules, i.e. rules sensitive to any context). When a misperception occurs, the phonetic value of a speech sound is altered. The shift in phonetic value creates the potential for the agent at generation n to acquire a different set of sound categories compared to the agent at generation n−1.

Simulations have a large number of parameters that can be set, which makes it possible to study how sound systems evolve under different conditions. For instance, the hypothesis that inventory size is connected to phonotactic complexity can be tested by running several simulations with identical starting conditions, varying only the syllable types permitted, and measuring the size of the inventories after N generations of transmission.

The dissertation is organized as follows: The remainder of Chapter 1 discusses issues of language transmission and sound change. Chapter 2 provides technical details of PyILM. Chapter 3 gives some toy simulations to illustrate how PyILM works. Chapter 4 returns to the topic of natural languages with an overview of cross-linguistic tendencies in inventories. Chapter 5 provides the results of PyILM simulations demonstrating how these cross-linguistic tendencies can emerge through iterated learning of sound systems.

1.2 Iterated learning models

Languages can only survive over time if they are continually re-learned at each generation. This continuation of people learning from others, who in turn learned from others, is referred to as "cultural transmission" (Brighton et al. 2005). A simulation of language change should be at least in part a simulation of cultural transmission. The actual process of cultural transmission is extremely complex, as it includes an uncountable number of interactions between an enormous network of people, often with intricate social relationships. Language use and acquisition is also tied to the physical environment, conversational context, and various other socio-linguistic factors. There may even be more than one language being transmitted at a time. This makes computational modeling of cultural transmission challenging, and it is common to abstract away from this complexity and focus on simpler situations.

Cultural transmission is modeled in this dissertation as an Iterated Learning Model (Kirby 1998, Kirby et al. 2008, Kalish et al. 2007, Griffiths and Kalish 2007, Smith et al. 2003). This is a simple model of information transmission where individuals are arranged in a chain. The nth individual receives input from the (n−1)th individual, formulates a hypothesis about the input, then uses that hypothesis to produce output for the (n+1)th individual.
In terms of language change, each pair of individuals is intended to represent one generation of language transmission.

In such a model, there are only ever two agents interacting at a time. There is always one agent who already knows a language, referred to hereafter as the speaker, and one agent who is learning from the speaker, referred to hereafter as the listener or the learner. These are relative terms. Every agent spends some time in both roles, with the exception of the first agent, who is seeded with some kind of language to get the simulation started, and hence is never a learner. The agent at Generation 2, for example, is a learner with respect to the agent in Generation 1, but a speaker with respect to the agent in Generation 3.

The nature of iterated learning is such that the information being transmitted can change over time. Any errors that occur in transmission can continue to get propagated through the chain of agents. It is possible that the language acquired by the final generation is extremely different from what the first generation knew. This is a desirable outcome in terms of modeling natural language change, since languages (eventually) become mutually unintelligible with their ancestral forms.

The amount of change that occurs in an iterated learning model, and how often it occurs, depends on how reliably information can be re-learned by each agent. Information that is difficult to reliably re-transmit (for whatever reason) will tend to change or disappear from a language. This is known as "selection for learnability" (Kirby et al. 2008, Brighton et al. 2005, Smith and Kirby 2008).

"In order for linguistic forms to persist from one generation to the next, they must repeatedly survive the processes of expression and induction. That is, the output of one generation must be successfully learned by the next if these linguistic forms are to survive. We say that those forms that repeatedly survive cultural transmission are adaptive in the context of cultural transmission: they will be selected for due to the combined pressures of cultural transmission and learning." (Brighton et al. 2005, p. 10; emphasis added)

One goal of this chapter is to apply this concept to the study of sound change. This will not be difficult, since it already has much in common with popular models of sound change through misperception (e.g. Ohala 1981, Blevins 2004). The concept of selection for learnability originally grew out of research on syntax and morphology, so it is useful to start with a brief overview of that literature, even though it takes us somewhat far afield from the topic of phonological inventories. I will focus on the theoretical and computational aspects, and pay less attention to the implications for syntax.

Early work in this area was done by Simon Kirby (Kirby 1996, 1998, 2000, 2001), who has focused on how compositional syntax can emerge in a language that is initially non-compositional, through the process of iterated learning. A compositional language in this case is defined as one where the meaning of an utterance is a function of the meaning of its parts and how they are put together. A non-compositional language, on the other hand, is defined as one where every meaning is expressed through a holistic arbitrary pairing of meanings and sound strings. This is not a strictly binary division, and a language could have some meanings expressed by compositional structures while others are non-compositional. This is in fact the case with natural languages, where we observe both compositional patterns (e.g. regular word order and morphological paradigms) and non-compositional forms (idioms, irregular forms).

Algorithm 1.1 gives an outline of a typical iterated learning simulation from Kirby's work. This would simulate g generations of agent interactions, and each agent learns from d utterances.

Algorithm 1.1 Generalized Iterated Learning Model
Generate a speaking agent with a grammar
Generate a learning agent with no grammar
Loop g times:
    Loop d times:
        The speaking agent produces an utterance.
        The learning agent tries to parse the utterance with her grammar.
        If she cannot, she memorizes it as an unanalyzable whole.
    The speaker is removed from the simulation.
    The learner becomes the new speaker.
    A new learner is added into the simulation.
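For readers who prefer running code, the loop in Algorithm 1.1 can be rendered in Python roughly as follows. All of the names (Agent, learn, speak, iterated_learning) are placeholders invented for this illustration; this is not the code of Kirby's simulations or of PyILM, and the learning step here is pure memorization with no rule induction.

# A runnable sketch of Algorithm 1.1; all names are illustrative placeholders.
import random

class Agent:
    def __init__(self):
        self.memory = []                     # holistically memorized (meaning, string) pairs

    def learn(self, utterance):
        if utterance not in self.memory:     # no rule induction: memorize as an unanalyzable whole
            self.memory.append(utterance)

    def speak(self, meaning):
        for m, s in self.memory:
            if m == meaning:
                return (m, s)
        # The meaning was never heard, so a random form is invented on the spot.
        # (For simplicity the first speaker starts empty and invents everything,
        # rather than being seeded with a grammar as in Algorithm 1.1.)
        return (meaning, ''.join(random.choice('ptkaiu') for _ in range(4)))

def iterated_learning(meanings, g=10, d=5):
    speaker = Agent()
    for _ in range(g):                       # g generations
        learner = Agent()
        for _ in range(d):                   # the learner hears only d utterances
            learner.learn(speaker.speak(random.choice(meanings)))
        speaker = learner                    # the learner becomes the next speaker
    return speaker

print(iterated_learning(['fox eats bird', 'fox eats mouse', 'fox eats pig']).memory)

Because d can be smaller than the number of meanings, some forms simply never reach the next agent, which is the transmission bottleneck discussed below.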
Kirby argues that compositionality emerges as languages adapt to a specific constraint imposed by cultural transmission, namely the fact that a learner cannot hear an example of every sentence that she will potentially want to express as a speaker. This constraint is referred to as the "transmission bottleneck", and it is similar to the concept of the Poverty of the Stimulus in generative linguistics (Legate and Yang (2002), Berwick et al. (2011); see Zuidema (2003) for a comparison of the transmission bottleneck to the poverty of the stimulus). This constraint is built into simulations by ensuring that d is less than the total number of utterances an agent could possibly produce.

In Kirby's models, languages are said to "survive" transmission if the learner at generation n acquires a grammar such that she would produce the same utterance for the same meaning as the speaker at generation n−1. If the grammar changes between generation n and n+1, then the language of generation n did not survive transmission. It is important to treat the word "survive" as a technical term to be understood in the context of simulations. There is a single language in a simulation, and a single pair of users at a time.

Non-compositional languages cannot survive when there is a bottleneck on transmission. This is because a learner-turned-speaker will, at some point, want to express a meaning she has never heard expressed. She will not be able to guess how this meaning would be expressed by the previous generation, due to the lack of compositionality. Instead, she will need to invent a new way of expressing this meaning, which will serve as input to the following generation. Because of this change, the older language does not survive the entire simulation. Changes like this are guaranteed to occur at each generation, due to the constant bottleneck. A slightly different language will appear at each generation throughout a simulation.

Compositional languages, on the other hand, can survive transmission even with a bottleneck. A learner need not hear an example of every single sentence. As long as a learner knows the component parts, and knows rules for putting parts together, she can construct a novel utterance that has a high chance of being the same as what the previous generation would have constructed. This increases the chances of a language surviving transmission many generations in a row.

Kirby's simulated agents all have the capability of learning compositional grammars, but the initial agent is intentionally seeded with a grammar that lacks compositionality entirely. The key step in the emergence of compositionality is the first time an agent invents a novel utterance. The way in which the invention algorithm works is crucial. The algorithm constructs a new utterance by looking for other meanings an agent already knows that are similar to the one she wants to express, selecting some random sub-string from there, and then adding on a new random string. The result of this invention is the introduction of evidence for compositionality into the input of the following generation. There are now two similar meanings with shared sub-strings that a learner can infer are connected by a rule.

For example, suppose Learner 1 acquired afxaba for "fox eats bird". Later, this agent becomes Speaker 2 and invents afxagatam for "fox eats mouse" by taking the substring afxa from a known word and adding on a randomly generated string gatam to the end. The following Learner 2 hears both of these utterances, and infers that afxa means "fox eats", ba means "bird" and gatam means "mouse". Learner 2 could then posit a rule where either ba or gatam can follow afxa, as opposed to memorizing each meaning independently. Learner 2 now has the first partially compositional grammar in the simulation. (See Kirby (2000) for specific details of the invention and rule-induction algorithm.)
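A schematic version of this invention-by-analogy step is sketched below. It is a simplification for illustration only: Kirby's (2000) actual algorithm operates over grammar rules rather than raw strings, and the helper names here are invented for this example.

# Toy illustration of invention by analogy; not Kirby's (2000) actual algorithm.
import random

def shared_prefix(a, b):
    """Length of the longest common prefix of two meaning strings."""
    n = 0
    while n < min(len(a), len(b)) and a[n] == b[n]:
        n += 1
    return n

def invent(target_meaning, known):
    """Build a form for an unseen meaning by recycling material from the
    most similar known meaning and appending a new random string."""
    best_meaning = max(known, key=lambda m: shared_prefix(m, target_meaning))
    recycled = known[best_meaning][:4]                       # e.g. 'afxa'
    new_part = ''.join(random.choice('abgmt') for _ in range(5))
    return recycled + new_part

known_words = {'fox eats bird': 'afxaba'}
print(invent('fox eats mouse', known_words))                 # e.g. an 'afxa...'-type form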
Learner 2 will eventually become Speaker 3, and will make use of this rule to produce utterances. Any utterances that Speaker 3 invents containing the meaning "fox eats" will contain the string afxa, and this is information that Learner 3 can use to acquire a grammar similar to Speaker 3's.

Over the following generations, more and more compositional rules enter into the language through this cycle of invention and rule-induction. Since compositional languages can be learned in spite of a bottleneck, transmission has a lower error rate, and eventually the language comes to be dominated entirely, or almost entirely, by compositionality.

It is important to note that there is no single factor that can explain the emergence of compositionality. It is the result of a combination of agent behaviour and cultural transmission. Changing either of these changes the resulting languages. Trivially, if agents were cognitively incapable of using compositional structure (if they could do neither rule induction nor compositional invention), then of course compositionality would never arise. If agents were content to learn strictly from the input, and never invent new utterances, then non-compositional languages would have a higher chance of surviving.

If the transmission model did not involve iterated learning, but agents at each generation received input from the same external source instead, then invention and rule-induction would have no long-term impact, and the language would not evolve toward compositionality. The specific size of the bottleneck can change the outcome (Smith et al. 2003), potentially favouring non-compositional languages. Frequency in the input makes a difference as well, and non-compositional forms can survive if they are highly frequent (Kirby 2001).

It is also important to emphasize that compositionality appears entirely through non-teleological means. While it is true that agents do introduce compositional utterances "on purpose" through the invention algorithm, this does not represent a teleological element of the model. The reason for inventing an utterance is not so that the language can, in the future, be compositional. Utterances are invented to solve in-the-moment needs of communication, with no regard to long-term consequences.
Language change itself is not directed towards the goal of achieving compositionality, so the model is not teleological, even if the individual agent interactions could be said to have a goal. Instead, compositionality is achieved through selection for learnability.

This is not just an effect that occurs in computer simulations. It has also been demonstrated in laboratory experiments with human participants. Kirby et al. (2008) and Cornish et al. (2009) discuss experiments where compositional lexicons can emerge from an initially randomly-generated set of words through iterated learning. The first set of participants in an experiment were shown pictures of objects, each of which was paired with a randomly generated string of CV syllables. The objects were constructed out of three features: shape (square, circle, triangle), colour (black, blue, red), and movement (horizontal, spiral, bouncing). Participants were not made explicitly aware of these features.

Participants were instructed to learn the names, and they were then tested on their ability to recall those names. The answers they provided in the recall test were then given as the labels for those objects to the next set of participants in the experiment, and the cycle repeated. Unlike actual cultural transmission, participants in these experiments never met each other, and were not made aware that their answers would be given to other participants.

The lexicons at the end of the experiment appeared to be organized around certain features of the objects, rather than having each word arbitrarily matched to an object. Kirby et al. (2008) provide an example of a final lexicon with consistent pairing of morphemes and colours (ne is "black", la is "blue", ra is "red") as well as motion (ki for horizontal movement and pilu for spiral). Shape was less consistent. Blue and black triangles were encoded as ke if moving horizontally, but as ki if moving in a spiral. Red horizontal triangles were called he and the red spiral triangles were ho.

This is a result of the lexicon adapting to the learning requirements of the participants. It is difficult to remember nine random strings of sounds, so the first participants in the experiment tend to have a high error rate. Even if they could not remember the name of an object, they still had to supply a word of some kind for the recall test, so they invented a new word based on whatever other words they actually could remember. This immediately decreased the difficulty for the second participant, since the lexicon now contained words for similar objects that have substrings in common, which helps with learning. Some "irregular" forms still exist in the lexicon by the end, because while participants may not be able to remember nine random strings, they can remember two or three.

In summary, the idea that languages adapt to how they are being transmitted, what Brighton et al. (2005) call selection for learnability, is useful for understanding how patterns can emerge in languages over long periods of time. Changes happen as agents each make their own small adjustments to the language to meet their needs at a particular time. Agents do not consider what effects their change will have on the future state of the language. Agents are not even aware that they are changing anything. They do not know what the underlying forms of the previous generation looked like, so they cannot know if they are deviating from them. Patterns tend to emerge because all agents learn under similar conditions.
Changes that make it easier for one agent to learn to use the language will also make it easier for future agents to use the language, though no agents are aware of this.

1.3 Sound change models

How do phonological systems adapt to transmission? The facts relevant to the transmission of sound systems are of course very different from syntax or morphology. Learners-turned-speakers do not face the same problems with phonology as they might with syntax. Speakers may have to express novel propositions or construct unique arrangements of words and phrases, but they are never in a place where they need to invent a new sound they did not hear in their input, and only rarely is there a need to construct a unique sequence of consonants and vowels.

The notion of a transmission bottleneck may still apply to some properties of sound systems, however. Stanton (2016) argues that certain patterns in stress systems are unattested because they are too difficult to learn from input data. It is also likely that the nature of the input affects the learnability of long-distance harmony patterns (McMullin 2016). As far as phonological inventories are concerned, however, the transmission bottleneck is not an important factor, since it is highly unlikely that a sound is so rare in the input that a learner does not acquire it.

In order for a phonological inventory to "survive" transmission, a learner must acquire the same set of categories as the speaker. The main obstacle to successful transmission of an inventory is channel bias, which Moreton (2008, p. 87) describes as "phonetically-systematic errors in transmission between speaker and hearer, caused largely by subtle phonetic interactions".

Moreton contrasts channel bias with analytic bias, a term he uses to refer to cognitive factors that make learning certain patterns easier or harder. Moreton argues that some patterns are not explainable without reference to analytic bias, and a complete understanding of phonology requires considering it along with channel bias. Although I agree in general with Moreton on this issue, for the purposes of this dissertation, I will focus only on the effects of channel bias. This is because analytic bias seems to be more relevant for learning phonological patterns involving the interaction of sounds. Moreton himself demonstrates the need for analytic bias by discussing vowel-to-vowel height dependencies and vowel-height-to-consonant-voicing dependencies. On the other hand, I am interested in the transmission of individual sound categories, which are more likely to be affected by channel bias.

There are many phonetic effects that could be considered channel bias. Co-articulation can change some characteristics of a sound, such as nasal consonants taking on the place of articulation of following consonants (Kochetov and Colantoni 2011). Speakers may fail to reach an articulatory target, such as when vowels become more centralized in unstressed syllables, a phenomenon known as "undershoot" (Mooshammer and Geng 2008). Sometimes, two sounds might just be acoustically very similar and easily confusable, such as [f] and [θ] (Jongman et al. 2000).

The most important consequence of channel bias is that it introduces variability into the input of a learner. This variability is the precursor to sound change. Due to channel bias, a particular phonological category will have multiple possible pronunciations, and some of these may be different enough from each other that a learner of a language incorrectly infers that they are, in fact, representative of different categories. When the learner becomes the speaker, these different categories are re-transmitted to the following generation, cementing them into the language.

One important aspect of channel bias is that it is context-sensitive. Pronunciation does not randomly vary from utterance to utterance. The way that a particular sound category manifests itself phonetically is influenced by the kinds of sounds that occur before and after it. For this reason, a sound change such as n → k' / _[+continuant] is not expected to occur in any language, because there is no obvious relationship between the continuant nature of the environment in which the change occurs and the nasal-to-ejective change that is the outcome. It is highly unlikely that a listener would mistake [ans] for [ak's], for example. On the other hand, a change like n → m / _p is a more natural change, because there is a phonetic connection between the environment (a labial stop) and the outcome of the change (a coronal becoming a labial).
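As an illustration of what such a probabilistic, context-sensitive misperception might look like when implemented, the sketch below applies an n → m / _p rule to word strings with some probability. It is a toy, string-based example; the rule format and names are not those of PyILM's actual Misperception objects (which are defined over features and environments, as described in Chapter 2).

# Toy context-sensitive misperception: /n/ heard as [m] before /p/, with probability p.
# Illustrative only; PyILM's actual misperception machinery differs.
import random

def misperceive(word, target='n', replacement='m', context='p', p=0.25):
    """Return the word as a listener might categorize it after channel bias."""
    heard = []
    for i, seg in enumerate(word):
        next_seg = word[i + 1] if i + 1 < len(word) else None
        if seg == target and next_seg == context and random.random() < p:
            heard.append(replacement)      # the listener internalizes [m], not /n/
        else:
            heard.append(seg)
    return ''.join(heard)

random.seed(1)
print([misperceive('anpa') for _ in range(5)])   # a mix of 'anpa' and 'ampa'

A listener who happens to hear mostly 'ampa' tokens may set up /m/ in this word, and will then pass that categorization on to the next generation.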
A main claim of this dissertation is that inventories, and their typological characteristics, are the result of adaptation to channel bias over many generations of cultural transmission. In other words, channel bias is to phonological inventories what the bottleneck is to morpho-syntax. The sounds that appear in an inventory are those which are the most likely to be successfully retransmitted, given the set of environments found in the lexicon, and given any channel bias that might apply in these environments. Since all humans have roughly the same articulatory and perceptual systems, it is expected that unrelated languages will be subject to the same kinds of channel bias, and hence similar patterns can arise in languages all around the world.

This type of approach to phonology is sometimes referred to as a "diachronic" approach, since the main locus of explanation is in the transmission of the language from generation to generation. In an overview article on diachronic explanations in phonology, Hansson (2008) describes them this way:

"[R]ecurrent sound patterns are the product of recurrent diachronic events (sound changes), which have their ultimate causes in the physical conditions under which speaker-listener interactions take place in language use and language transmission across generations. On this view, voicing is neutralized in preconsonantal (as opposed to prevocalic) position not because some constraint to this effect is part of the innate endowment of humans, nor because learners are predisposed to posit only such constraints as are grounded in phonetics. Rather, languages will show some tendency to acquire such neutralization patterns for the simple reason that, in positions where distinctive voicing is hard for listeners (including learners) to detect, listeners / learners will be liable not to detect it, erroneously interpreting a preconsonantal voiced obstruent as being voiceless and encoding it as such in their mental representation of the word-form in question. If and when the pattern caused by such recurring misinterpretations becomes entrenched, the result is a language with systematic voicing neutralization precisely in those positions where such neutralization is phonetically motivated." (Hansson 2008, pp. 4-5)
The most influential line of work in this area comes from John Ohala (1981, 1983, 1991, 1997, et seq.). Ohala's theory of sound change is based on the idea of listener misperceptions, which occur when listeners acquire something from the speech signal other than what the speaker intended. Listeners at some point become speakers, and the misperceived information serves as the basis for producing speech, which in turn becomes input for future learners. Ohala's models have much in common with models of iterated learning. This is very clear in the diagram from Ohala (1981) shown in Figure 1.1, which predates any of the modern formal literature on iterated learning.

Figure 1.1: Model of sound change through listener misperception, Ohala (1981, p. 182)

Note that this model of change crucially relies on a third generation. The learner who initially misinterpreted the signal has to re-transmit this misinterpretation to a new generation. Ohala calls the change without re-transmission a "mini-sound change". This is because:

"it would so far only involve one speaker-hearer. However, if this person's speech is copied by other speakers, this mini-sound change could become a regular sound change, i.e. characteristic of a well-defined speech community." (Ohala 1981, p. 184)

One of Ohala's primary arguments is that listener misperceptions arise in the first place because speech is inherently ambiguous. Figure 1.2 is a diagram from Ohala (1997) that illustrates how stops can emerge from the ambiguity created by co-articulation. In the transition between two consonants, total or near-total obstruction of the oral tract may occur. The speaker has met the condition to produce a stop at this point, even though this was not the intention of the speaker and there is no underlying stop in their mental representation at this position in the word. Listeners may interpret this transient closure as belonging to a true stop, and assume that one really does exist in the word, leading to a sound change.

Figure 1.2: Emergent stops, from Ohala (1997)

Ohala discusses several different environments where stops can appear through this process. I mention only two here. The first is when a nasal is followed by a fricative. This is noticeable in English words such as warmth, which may be pronounced as [wɔrmθ] or [wɔrmpθ], or length, which may be pronounced as [lɛŋθ] or [lɛŋkθ].

The stop emerges as follows, using the word warmth as an example and focusing on the transition from [m] to [θ]. First the oral tract is closed at the lips for [m], but the velum is lowered. This is the initial state in Figure 1.2, where line B represents the closed lips and line A represents the open velar port.

From this initial state, the velum has to rise and the closure at the lips has to be released, with a new constriction formed at the teeth for [θ]. In between these two stages, there is the possibility for the velum to have raised before the labial closure is released. This is the transitional state in Figure 1.2. This creates the conditions for an oral stop, and once the lips do open, there is a release of air into the fricative which can be mistaken for a stop burst.
This is represented by the final state in Figure 1.2.

Ohala (1997) points to a few cases of historical change that could potentially have resulted from listeners misperceiving the burst as being a true stop, for example, the introduction of /b/ between /m/ and /r/ in French, e.g. Latin /kamera/ > French /ʃɑ̃bʁ/.

A second environment where stops might emerge is between a fricative and a lateral, e.g. [l] and [s]. During the transition between manners of articulation there may be total closure formed by the tongue against the sides or roof of the mouth. This would produce the right conditions for a [t] to be perceived. Ohala gives an example from English, where else has come to be pronounced [ɛlts], and another example from Kwak'wala k'weɬtsoʔ "to be feasted" from k'weɬ + soʔ.

Velar palatalization is another sound change that can potentially be explained through listener misperception. This is a common change k > tʃ before front vowels. Guion (1997) investigated the question of why the /k/ is fronted all the way to the post-alveolar place of articulation (see also Chang et al. (2001) and Wilson (2006)).

Guion (1997) notes that the peak spectral frequency of the burst of a velar stop is related to the frontness of the following vowel. Specifically, higher vowels result in higher burst peaks, and the peaks are highest before /i/. Guion compared these to the peak spectral frequencies of /tʃ/, which are relatively constant across different vowel contexts, but also higher than the velar peak in general. The burst of a /k/ is highest before /i/, and so it is in that environment that it is most spectrally similar to /tʃ/, which is exactly the environment where the sound change tends to occur. Thus, this sound change could have its origin in misperception.

Guion conducted an experiment to further establish whether these sounds are indeed perceptually similar. Participants heard examples of a CV syllable consisting of one of [k, g, tʃ, dʒ] followed by one of [i, a, u], and were given a forced-choice identification task. As expected, [ki] was identified as [tʃi] more often than [ka] was identified as [tʃa].

James Kirby has proposed a listener-based explanation for the recent development of tone in Phnom Penh Khmer. Kirby reports that the trill /r/ is being lost in onset clusters, and is replaced by aspiration and a change in f0 contour. Kirby argues that the previous contrast between CV and CrV has transphonologized into a contrast based on the f0 of the vowel. This is supported by perception experiments in Kirby (2014a) where listeners were able to use f0 as a cue for distinguishing words that are underlyingly /CrV/ from those which are /CV/.

Kirby (2014b) further demonstrates how this kind of listener-led sound change can occur, this time with a series of computational simulations. Agents in a simulation receive examples of words, and their task is to assign each segment in the word to a category. Segments are classified based on four phonetic dimensions.

In addition, there is a channel bias that alters the input to agents. The bias has two simultaneous effects: in a sequence CrV, it reduces the duration of /r/, and it lengthens the onset of the vowel.
This bias has a cumulative e\u001Bect, so the perceptibility of /r/ slowlydecreases over the course of a simulation, while the length of the onset increases.Early in the simulations, agents were able to distinguish /CrV/ words from /CV/ usingthe length of /r/, but as the simulation ran on this became impossible because of the bias.Instead, agents begin using information about the vowel onset because that information hasbecome more salient to them, which is similar to what Kirby (2014a) describes for Khmer.Another in\u001Duential model of diachronic phonology is the Evolutionary Phonology model(Blevins 2004, 2006a, Blevins and Wedel 2009), which builds on Ohala's work. The basicpremise behind Evolutionary Phonology is that \u0010[p]rincipled diachronic explanations forsound patterns have priority over competing synchronic explanations unless independentevidence demonstrates, beyond reasonable doubt, that a synchronic account is warranted.\u0011(Blevins 2006b, p. 23). Common sound patterns are common because they result from com-mon sound changes. Sound changes themselves are the result of articulatory and perceptualfactors hindering perfect language transmission.Much of Blevins' terminology is borrowed from biological evolution, and shares some-thing in common with the iterated learning literature, although her Evolutionary Phonologybook does not cite any of that work. For instance, she has a discussion of \u0010adaptations\u0011(Blevins 2004, p. 54), which is reminiscent of the concept of selection for learnability,namely that sounds are selected for on the basis of their ability to survive transmission ina particular context:\u0010If a contrast between two sounds is just barely perceptible in a particularphonetic environment, its chances of survival in a noisy world are slight. ... Inreconsidering the case of change where [anpa] is heard as [ampa] it makes verylittle sense to compare the sounds [n] and [m] outside the speci\u001Cc environmentin which they occur. In the same sense that the usefulness of claws and toe-pads cannot be assessed outside particular physical environments in which theyoccur, there is no sense in which /n/ is a better or more useful nasal consonantthan /m/ or vice versa. Adaptation occurs with respect to a speci\u001Cc phoneticcontext.\u0011Ohala and Blevins present slightly di\u001Berent typologies of misperception. Ohala dividesmisperceptions into two types, called \u0010hypercorrection\u0011 and \u0010hypocorrection\u0011 (e.g. Ohala(1992)). Hypocorrection occurs when a listener assumes that a phonetic e\u001Bect, such asco-articulation, is an intended part of the signal, and internalizes it as such.For example, the amount of aspiration that occurs on a stop depends on the height ofthe vowel that follows it (Hansson 2008, Ohala 1983). This is because in order for voicingto occur, the vocal folds need to vibrate, and this requires a suitable pressure di\u001Berentialbetween the oral cavity and subglottal cavity. During the closure phase of a stop, thepressure in the oral cavity builds to become equal with the subglottal pressure, and whenthe stop is released, pressure drops in the oral cavity. How fast this drop happens, and how13long it takes to achieve the right pressure di\u001Berential, depends on the height of the vowel, i.e.the size of the oral cavity through which air can escape. 
Higher vowels make for narrower openings, which slows the drop in pressure, and also increases the turbulent, noisy quality of the stop burst, which can make the stop sound more affricate-like.

If learners hypocorrect, they may infer an underlying affricate in this position, possibly as an allophonic variant of the stop. In fact, numerous languages have phonological processes converting stops to affricates before high vowels, e.g. Japanese t → tʃ / _i.

The other kind of change, hypercorrection, occurs if a listener erroneously tries to "factor out" a part of the speech signal. The main example of hypercorrection seems to be dissimilation. For example, in Classical Greek a change occurred such that labialized consonants became unlabialized adjacent to rounded vowels, e.g. *lukʷos > lukos 'wolf'. If hypercorrection is at play, then this change was caused by listeners who assumed the labialization of the consonants was due to the adjacent rounded vowel, and removed it, creating unlabialized consonants. Hypercorrection is expected to target phonetic characteristics which have a relatively long duration (e.g. palatalization, glottalization, but not continuancy or affrication). Unlike a hypocorrection, dissimilation and hypercorrection are not likely to eliminate the original triggering environment for the change. This is because the listener has to notice this environment in the first place in order to even make the hypercorrection.

Blevins has a three-way typology of misperceptions in her model, calling them "choice", "chance", and "change". Change occurs when the phonetic signal is misperceived by the listener due to acoustic similarities between the actual utterance and the perceived utterance. For instance, a listener might misperceive [θ] as [f]. The misperception occurs on the surface, and there is no correction taking place on the part of the listener. In fact, the listener's underlying form remains entirely faithful to the surface form, but the surface form was not a good representation of what the speaker intended.

Chance is a term for when the phonetic signal is accurately perceived by the listener but is intrinsically phonologically ambiguous. The listener associates a phonological form with the utterance which differs from the phonological form in the speaker's grammar. Blevins' example here is that the speaker says /aʔ/ → [ʔa̰ʔ] and the listener hears [ʔa̰ʔ] → /ʔa/.

Choice describes a situation where there are multiple phonetic variants of a single phonological form which are accurately perceived by the listener. The listener (a) acquires a prototype or best exemplar which differs from that of the speaker; and/or (b) associates a phonological form with the set of variants which differs from the phonological form in the speaker's grammar. In Blevins' example, the speaker has an underlying form /tuʔəlaŋ/ which is variously pronounced as [tuʔəlaŋ], [tuʔᵊlaŋ], or [tuʔlaŋ], i.e. varying amounts of schwa actually appear on the surface. The listener has a choice about whether to include the schwa in the underlying form, or factor it out as an irrelevant transition between the glottal stop and the lateral.

Although these diachronic models tend to focus on the role of the listener, the speaker is equally important because the speaker produces the listener's input.
The diachronic14model presented in Garrett and Johnson (2012), for example, is one that more explicitlyincorporates the role of the speaker. Their model di\u001Bers from Blevins and Ohala in that itattempts to categorize sound changes based on their underlying mechanisms, rather thanon their outcome. Garrett et al. focus on four speci\u001Cc factors: motor planning, speechaerodynamics, gestural mechanics, and speech perception. Two of these, motor planningand gestural mechanics, are clearly speaker-oriented.Certain kinds of sound changes are more obviously speaker-initiated than others. Oneexample is assimilatory change, which occurs when there is overlap in articulation betweentwo sounds, causing one sound to acquire the features of the other. Consider the nasalizationof vowels before nasal consonants, for instance. To produce the vowel, the oral tract needsto be open to some degree and relatively free of obstruction, and there should be no nasalair\u001Dow, i.e. the velum should be raised. The postvocalic nasal consonant has con\u001Dictingrequirements: the speaker needs to close o\u001B the oral tract at some place of articulation, andlower the velum for nasal air\u001Dow. Since the velum cannot be instantaneously displaced, andsince oral closure cannot happen immediately, there is the possibility that the speaker willspend some time with the velum lowered and the oral tract open, which e\u001Bectively resultsin a nasal vowel.Vowel nasalization tends to occur more often when the nasal consonant follows thevowel, compared to when the consonant precedes it (Chen et al. 2007). This is again dueto articulatory e\u001Bects. When the consonant follows the vowel, the potential co-articulationhappens as the speaker attempts to open the velar port and close o\u001B the oral tract. Whenthe consonant comes before the vowel, however, the potential period of co-articulationhappens as the speaker attempts to close the velar port and open the oral tract. As itturns out, the movement required for velic opening in post-vowel nasals is about 1.6 timesfaster than the movement required for oral opening in post-nasal vowels (Krakow (1994)).The faster speed of the velic movement means there is a greater probability of producing anasal vowel in a VN sequence, compared to a NV sequence, because the velar port is goingto open for the nasal before the oral tract can be closed.This co-articulatory e\u001Bect has been argued to be the source of historical changes whereoral vowels nasalize before nasal consonants, becoming full-\u001Dedged phonemes, such as oc-curred in some Romance languages (Recasens 2014). It is also common for many languagesto have allophonic nasalization of vowels before nasal consonants (Schourup 1973), and thistoo probably developed from misperceptions arising from co-articulation.However, this articulatory timing is not universal, so nasalization of vowels adjacentto nasals is not universal either. Butcher (1999) studied the articulation of speakers ofAustralian languages in the Arandic, Lake Eyre, and Yura groups. He found that thesespeakers have systematically di\u001Berent timing in the raising and lowering of their velum,compared to speakers of English. In particular, the Australian speakers showed much moresudden changes in the state of their velum, which meant that nasality hardly spread at allinto adjacent vowels. 
Butcher further suggests that this particularity in articulation is theorigin of pre-stopped nasals in these languages.15The aerodynamic voicing constraint (Ohala 1983) is another example of how articulationcan play a role in sound change. The constraint refers to the requirements for modal voicing:there must be air \u001Dowing through the vocal folds, which need to be tensed. This presents aproblem for voiced stops. By their nature, stop consonants cause air to accumulate in theoral cavity, and the di\u001Berence in air pressure above and below the glottis begins to equalize.At a certain point voicing becomes impossible. Voiced fricatives are also a\u001Bected by thisconstraint because frication requires the air pressure in the oral cavity to be greater thanatmospheric pressure. This creates a con\u001Dict: for voicing oral air pressure needs to be low,for frication oral air pressure needs to be high.How does the aerodynamic voicing constraint factor into an explanation of sound changethrough misperception? The argument would run as follows: the production of any voicedstop or voiced fricative inherently puts it in con\u001Dict with this constraint. To maintain voic-ing, oral air pressure needs to be reduced somehow, and Ohala discusses several ways thiscan be achieved, including expansion of the cheeks, lowering of the larynx, or venting someof the accumulated air through the nose. On some occasions, these \u0010strategies\u0011 can lead tospeakers producing speech with characteristics di\u001Berent than intended, which listeners will(wrongly) assume are intended characteristics. This makes the constraint di\u001Berent fromco-articulation, because it is not entirely context-dependent. There is an inherent di\u001Ecultyin voicing an obstruent, regardless of its position in a word.Ohala (1983) provides a list of 12 potential implications this has for sound change andinventory typology, such as the fact that voiceless stops and fricatives are more commoncross-linguistically than their voiced counterparts. In P-base (Mielke 2008), for example,97% of the languages have at least one of /p,t,c,k,q,f,s,S/ whereas only 83% of languages inthe database have a voiced version of any of those.As another example, Ohala argues that implosives developed in Sindhi from geminatevoiced stops. The length of a geminate means it is even more at risk of becoming voicelessthan a singleton voiced stops. Oral air pressure must be kept low for an even longer periodof time through some means. Ohala proposes that this was done through larynx lowering,which listeners misinterpreted as implosion.Another articulatory e\u001Bect that plays a role is gestural reduction. Lin et al. (2014)looked at the role of gestural reduction in the production of the English lateral /l/. Alveolarlaterals have two lingual constrictions, one anterior and one dorsal. The degree of anteriorconstriction in laterals varies with the phonetic context and between speakers, in some casesachieving no apical contact at all. In English, /l/ is especially likely to be reduced in thecontext of V_C, where C is non-alveolar. For instance, the /l/ in help or elk undergoesmore reduction than the /l/ in melt. This may be partly due to homorganicity, and Lin etal. 
find that the anterior constriction is less reduced when the tongue tip would be making a contact at that place for the following sound anyway.

This reduction is one reason underlying a change currently underway in some varieties of English, where /l/ loses its anterior constriction entirely and vocalizes to /w/ or /u/ (due to the dorsal constriction). Lin et al. report that in dialects where this change is underway, it is more advanced in pre-labial and pre-velar contexts, which is expected given the greater likelihood of gestural reduction in those environments. The loss of /l/ has already occurred before /k/ in some words, though it is still preserved in the orthography, e.g. walk, talk, balk, etc.

1.3.1 Summary

To summarize the general model of sound change that has just been presented: languages must be successfully transmitted from speaker to learner through the medium of physical speech in order to survive over time. There are numerous factors involved in articulation and perception, so-called channel bias (Moreton 2008), that impede successful transmission. In addition, the listener cannot necessarily know the intentions of the speaker, or whether the signal has been changed in any way. This creates the possibility that learners can misperceive some aspect of the speech signal. When learners eventually become speakers, this misperceived element is then re-transmitted to the following generation, making it part of the language (Ohala 1981, Blevins 2004).

The term "misperception" in this context is intended to be neutral with respect to the actual source of the change. It could be due to perceptual or articulatory factors. The key point is that learners have acquired a language from the input that differs from the language that generated the input, but they do not realize they have done so.

The next step of the dissertation is to describe a computer simulation based on this framework of sound change.

1.4 An ILM for phonology

1.4.1 Overview

Simulating the evolution of consonant inventories requires simulating multiple, potentially interacting, sound changes over multiple generations. The following is an overview of PyILM, the simulation software designed for this dissertation.¹ Algorithm 1.2 represents g generations of transmission, with each agent learning from d words.

At the core, this is just a model of the transmission of sound strings. There are no morphological or phonological processes, and agents can always recover the intended meaning of a word. Full technical details, including descriptions of various algorithms, are given in Chapter 2. In the following sections, I will focus more on the higher-level conceptual details and theoretical assumptions that went into building PyILM.

¹ The simulation is an Iterated Learning Model (ILM) written in the Python programming language. The name PyILM follows a convention of the Python community of prepending "py" onto the names of packages or programs.
The intended pronunciation is [paɪ.ɛl.ˈɛm].

Algorithm 1.2 Generalized Iterated Learning Model for phonology

Generate a speaking agent with a lexicon
Generate a learning agent with no lexicon
Loop g times:
    Loop d times:
        The speaking agent produces a word from the lexicon
        Misperception may alter some phonetic values in the word
        The learning agent assigns each sound in the word to a known phonological category
        If no known categories match, then a new one is created.
    The speaker is removed from the simulation.
    The learner becomes the new speaker.
    A new learner is added into the simulation.

1.4.2 One turn of a PyILM simulation

To understand how the simulation is intended to work, it is useful to give an overview of one iteration of the simulation. This consists of the speaker producing a word, misperceptions possibly occurring, and the listener learning something.

1.4.2.1 Production

The turn begins with the speaking agent selecting a word to produce from the lexicon. Words in the lexicon are represented as strings of phonological categories (i.e. segments). These categories are, in turn, represented as a list of binary features of length F. For each category (segment) in a word, a production algorithm generates an array of length F, where the nth element is a real number in [0,1], representing a phonetic value for the nth phonological feature. For example, assuming a very simple simulation with four features, [continuant, nasal, voice, sonorant], an instance of the category /b/ might be represented as [0.05, 0.30, 0.95, 0.04]. In an actual simulation, these numbers are determined by sampling a (truncated) Gaussian distribution for each feature. The distributions are inferred during the learning phase, except for the initial agent in a simulation, who is seeded with a set of distributions.

This process of generating lists of phonetic values is done for each sound category in the word, and then the resulting list is sent to the misperception function.

1.4.2.2 Misperception

Misperceptions in PyILM are modeled as probabilistic context-sensitive rules (which includes the null context, i.e. context-free rules). They target sounds that have particular phonological features and which exist in specific contexts, which are themselves defined in terms of features. Misperceptions may also refer to word boundaries. The effect of a misperception is to change phonetic values. An example of a final-devoicing misperception would be represented as:

[+voice, -son] → [-.15voice] / _#, p=.3

In prose, this reads as "on any given utterance, there is a 30% chance that voiced obstruents have their [voice] value reduced by .15 when they occur in word-final position". The idea is that a misperception changes the surface phonetic value of a sound such that it becomes more likely the listener will categorize it as having the opposite underlying feature value.
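To make the production and misperception steps more concrete, the following is a minimal sketch of one production of a word followed by an application of the final-devoicing rule above. The feature set, the specific distributions, and the function names are invented for illustration, and the logic is simplified relative to PyILM's actual implementation (for instance, clamping stands in for sampling a truncated Gaussian):

    import random

    # Toy per-feature Gaussian (mean, sd) pairs standing in for learned categories.
    a_category = {"continuant": (0.95, 0.03), "nasal": (0.05, 0.03),
                  "voice": (0.95, 0.03), "sonorant": (0.96, 0.02)}
    b_category = {"continuant": (0.05, 0.03), "nasal": (0.30, 0.05),
                  "voice": (0.95, 0.03), "sonorant": (0.04, 0.02)}

    def produce(category):
        # Sample a phonetic value in [0,1] for each feature; clamping approximates truncation.
        return {f: min(1.0, max(0.0, random.gauss(mu, sd))) for f, (mu, sd) in category.items()}

    def final_devoicing(word):
        # [+voice, -son] -> [-.15voice] / _#, p=.3
        # Thresholding at 0.5 stands in for checking the underlying [+voice, -son] specification.
        last = word[-1]
        if last["voice"] > 0.5 and last["sonorant"] < 0.5 and random.random() < 0.3:
            last["voice"] = max(0.0, last["voice"] - 0.15)
        return word

    word = [produce(a_category), produce(b_category)]   # one utterance of a word like /ab/
    word = final_devoicing(word)                        # the final /b/ may now sound less voiced
    print(word)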
Here I wish to emphasize again that the term "misperception" is intended as a cover term for any kind of effects that could occur during either production or perception, which might substantially affect what a learner infers about a language. Determining the origin of a sound change is important in understanding specific changes in actual natural languages, of course, but in a simulation the distinction is irrelevant. The key point is that something disrupts transmission and a learner has the potential to infer a sound that the speaker did not intend.

A list of misperceptions must be provided as input to the simulation. If none are provided, then no sound changes will occur, and the simulation is pointless. The number and type of sound changes that can occur in a particular simulation, therefore, is limited. This is unrealistic, but it is an unavoidable constraint that comes with computer simulation; there must be a finite set of parameters to simulate. In any case, it is probably unfeasible to draw up a list of all possible misperceptions. PyILM could be considered an "ideal world" simulation (cf. the "ideal observer" of James Kirby (2014b)) where we know everything there is to know about what kinds of misperceptions are possible.

An alternative, and more complex, way of modeling misperceptions would be to simulate the vocal tract and auditory-perceptual systems of the agents in detail, and allow misperceptions to arise naturally from the way these systems work. Such a simulation would most certainly be useful in an effort to support the position that misperceptions arise from phonetic factors. My aim for this dissertation, however, is not to explain how or why misperceptions occur. I simply assume that they do occur, and I am interested in their long-term consequences for the structure of inventories. It suffices for these purposes to simulate the effect of misperception, rather than the cause. For an example of more complex modeling of physiology, see Oudeyer (2005a, 2005b, 2005c) on the evolution of phonotactic constraints.

In other words, the sound-changing rules of PyILM ("misperceptions") are intended as useful abstractions that capture the spirit of how sound change is thought to work. They allow for a wide range of different sound changes to be simulated (including context-free ones). All elements of misperception are open to modification by the user, and any number of misperceptions can be active in a given simulation run.

For example, both hypercorrection and hypocorrection can be simulated through the use of these rules. An example that Ohala (1983) gives of hypercorrection, where the learner erroneously factors out some aspect of the signal, is the unrounding of stops before rounded vowels in Greek, /lukʷos/ > /lukos/. To simulate the possibility that agents might hypercorrect in this situation, a rule such as [-continuant, +round] → [-.15round] / _[+voc, +round], p=.1 would be included in the simulation.

A hypocorrection, where a learner fails to account for a phonetic effect and assumes it is inherent to the signal, would be exemplified by pre-consonantal neutralization of a voicing contrast. This could be represented in PyILM with a misperception such as [-sonorant, -vocalic, +voice] → [-.15voice] / _[-vocalic, -sonorant], p=.1.

1.4.2.3 Learning

In the final stage of a simulation turn, the misperception function sends the list of phonetic values to the learner. The learner receives this as a list, so the ability to parse speech into segment-sized units is assumed. Learning is done using an exemplar-based model, where agents keep detailed representations of experienced events (Pierrehumbert 2001, Johnson 2007).
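Before going through the details of the learning step, here is a rough sketch of the kind of threshold-based exemplar categorization being described. The distance measure, the threshold value, and all of the names are invented for illustration; PyILM's actual learning algorithm (section 2.3.1) may differ in its specifics.

    from statistics import mean, stdev

    def distance(token, exemplars):
        # Mean absolute difference between a token and the average of a category's exemplars.
        avg = [mean(vals) for vals in zip(*exemplars)]
        return sum(abs(t - a) for t, a in zip(token, avg)) / len(token)

    def categorize(token, categories, threshold=0.25):
        # Assign the token to the most similar known category, or create a new one.
        if categories:
            best = min(categories, key=lambda name: distance(token, categories[name]))
            if distance(token, categories[best]) < threshold:
                categories[best].append(token)
                return categories
        categories["cat" + str(len(categories))] = [token]   # new category, sole exemplar
        return categories

    def infer_distributions(categories):
        # After learning, a Gaussian (mean, sd) is inferred per feature, per category.
        return {name: [(mean(f), stdev(f) if len(f) > 1 else 0.05) for f in zip(*tokens)]
                for name, tokens in categories.items()}

    categories = {}
    for token in [[0.05, 0.30, 0.95, 0.04], [0.07, 0.28, 0.92, 0.05], [0.90, 0.10, 0.20, 0.10]]:
        categories = categorize(token, categories)
    print(len(categories), "categories:", infer_distributions(categories))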
For each sound in the word, a learner stores the phonetic values in memory, thenattempts to categorize the sound by comparing these values to all other known categories. Ifany of them meet a threshold for similarity, then the input sound is placed in that category.Otherwise, a new category is created, the input sound becomes its sole exemplar, and futurelearning can be in\u001Duenced by this category.At the beginning of the learning phase, agents do not know any categories at all, so the\u001Crst sound that is experienced becomes the \u001Crst category, and all others are built up fromthere. Categorization is done on the basis of phonetic similarity. Two sounds are consideredto be instances of the same category if they di\u001Ber by less than some value (the particularthreshold is determined by a simulation parameter that can be set by the user). At the endof learning, agents infer a Gaussian distribution from the observed phonetic values, for eachfeature, for each category they have created. This distribution is what will be sampled bythe production algorithm when the agent becomes the speaker.1.4.3 Some notes on design1.4.3.1 Social factorsPyILM is not intended to model all possible types of sound change. It focuses speci\u001Ccallyon changes related to production and perception. Another major cause of change, which isnot simulated, is contact between dialects or languages.Contact can lead to one language borrowing words which contain sounds not foundin the native inventory. For instance, some Bantu languages are known to have acquiredclicks by borrowing from neighbouring Khoe-San languages (G\u00C3\u00BCldemann and Stoneking2008). Other times, however, borrowing words leads to no changes at all: English has notacquired uvular fricatives or front rounded vowels, despite borrowing numerous words from20French which contain these sounds (e.g. hors d'oeuvre, objet d'art, ma\u00C3\u00AEtre d' ). Instead,those words have undergone adaptation to the English sound system.Borrowing is, in a sense, more arbitrary than sound change based on phonetic factors.Borrowing depends on coincidences of contact between cultures, and how the borrowingactually plays out depends on similarities between the sound systems of two languages,as well as various socio-linguistic factors. The frequency of borrowing, and its e\u001Bect oninventories, over the history of a language is sporadic.It is e\u001Bectively impossible for languages to avoid phonetically-based change, but changethrough contact is avoidable. The people living on North Sentinel Island represent theextreme case of contact-avoidance. Inhabitants of the island are hostile to outsiders, andoccasionally kill people who come too close (McDougall 2006). It is unlikely that theSentinelese have borrowed many words (at least not recently), but it is quite likely thattheir language has undergone some kind of sound change in the last several generations.Phonetic changes are more of a constant factor across time and across languages. Theydepend on factors related to human speech production and perception. We can expectphonetic factors to in\u001Duence unrelated languages in similar ways, leading to cross-linguistictendencies.1.4.3.2 Single-agent transmission changeAnother type of change not simulated is what Labov (2007) calls \u0010di\u001Busion\u0011. This refers tolanguage changes that occur when mature speakers of a language adopt the speech habitsof a di\u001Berent group of speakers. 
Labov contrasts this with the term \u0010transmission\u0011 to referto language changes that occur as language is being passed from mature speaker to learner.PyILM is strictly a transmission chain, with a single learning agent and a single speakingagent at each generation.Di\u001Busion-chain versions of cultural transmission models do exist (Mesoudi and Whiten2008, Whiten and Mesoudi 2008), and have even been applied speci\u001Ccally to the study oflanguage (Smith and Wonnacott 2010). Along the same lines, Gri\u001Eths and Kalish (2007,Section 7, p.470) show how the mathematics of their single-agent model generalizes to largerpopulations (though this is not strictly related to di\u001Busion chains).Modeling transmission or di\u001Busion requires di\u001Berent design choices, because the fun-damental factors driving language change are di\u001Berent in each case. Transmission-chainmodels represent an acceptable level of simpli\u001Ccation, given the goals of this dissertation.A more nuanced outcome could be achieved by combining di\u001Busion and transmission in asingle simulation. This would allow the nature of the input to the learner to vary more asthe speaking agents possibly change their behaviour throughout the speaking phase.211.4.3.3 Discrete learning periodIt is common in agent-based simulations for there to be a speci\u001Cc learning phase, after whichagents can no longer learn anything new. This approximately simulates the real-worldsequence of events where one's ability to learn language is greatest as a child, commonlyknown as the \u0010critical period\u0011 (e.g. Newport et al. 2001, Pallier 2007), and slows down withage. Having a sharp, and arbitrary, cut-o\u001B point is a simpli\u001Ccation for the purposes ofcomputer simulation.More broadly, research has found that the way people speak continues to change over thecourse of their life. For instance, Harrington et al. (2000a, 2000b, 2005, 2006) studied theChristmas broadcasts of Queen Elizabeth II taken from a period of roughly 30 years. Theyfound evidence of a change in vowel pronunciation, in particular that the Queen's vowelswere becoming more like those of Standard Southern British speakers. In another case,Sanko\u001B and Blondeau (2007) found a change in the pronunciations of the rhotic consonantof Montreal French, with some adult speakers moving from an apical /r/ to a dorsal /;R/.1.4.3.4 No teleologySound change occurs for entirely non-teleological reasons in this model. The only thingthat happens is that learners learn from the input. They make no assumptions about whatsound systems should look like. In contrast, it is common in other models to give agentsadvance knowledge about the sound system. For example, James Kirby (2014b) describesa computational ILM for simulating a change currently underway in Khmer, where anaspiration contrast (e.g. /ka/ vs. /kha/) is being replaced by a tonal contrast (/ka/ vs./k\u00C3\u00A1/). The overall design of Kirby's simulation shares much in common with PyILM, butthere is a crucial di\u001Berence in that Kirby's agents are modeled as \u0010ideal observers\u0011 which isa type of Bayesian classi\u001Cer (see Geisler 2003). These classi\u001Cers require a prior probabilityfor each phonetic category, in order to compute the probability that an input sound belongsto that category. This e\u001Bectively means that agents have foreknowledge about which andhow many possible sound categories could exist in the language.Feldman et al. 
(2009) also use a Bayesian model for learning sound categories by learninga lexicon. In this model, learners know in advance that there are exactly 4 sound categoriesin the inventory. Kirby and Sonderegger (2013) consider the iterated learning of 2 vowels(not full words). Vowels are represented simply by an F1 value, and the distribution of F1values is known to all agents, and does not change over time.Along the same lines, in some models learning is done by selecting between a limitednumber of choices. Tupper (2015) gives a mathematical model for the conditions underwhich two vowels might merge over time, but considers only a case where agents knowinglychoose between /i/ and /e/.Wedel's (2007) computational model has a small set of underlying sounds that map toexactly one surface sound, and the special category /x/ which has two possible allophones.22The learner must select one of the allophones as an underlying form for /x/, and Wedelshowed how di\u001Berent types of learning error resulted in di\u001Berent outcomes.Advance knowledge of possible sound categories might be useful for a simulation of thechange from one speci\u001Cc inventory state to another, but it is undesirable for a general modelof inventory change. Sound inventories need to be free to change, grow, and shrink withina relatively large space of possibilities. In PyILM simulations, there are no pre-determinedsound categories at all. Agents simply build up categories based on the the informationavailable to them in the input.In this respect, the learning algorithm has more in common with de Boer (2000, 2001,2002). Agents in his simulations play an \u0010imitation game\u0011, where a speaking agent producesa single vowel and a listening agent tries to imitate it. de Boer's simulations use supervisedlearning, meaning that listeners are given feedback about how well they imitated, and theyuse this information to place prototypes into a vowel space. PyILM, in contrast, usesunsupervised learning where no feedback is given. Agents in de Boer's simulations make noassumptions in advance about how many vowels there might be in a language. The \u001Cnalnumber and type of vowels depends on the interactions between agents and the success ofindividual imitations. Vowel systems ranging from three to nine vowels emerged from deBoer's simulations.1.4.3.5 Phonemes and allophonesSounds are represented at two levels in PyILM. At a surface/phonetic level, sounds arevectors of real numbers. At an underlying/phonological level, sounds are lists of binaryfeatures. A simulation starts by generating a set of these categories for the \u001Crst agent.First underlying categories are created, and then a distribution of phonetic values, forsampling during the production phase, is generated for each of these categories.New sounds that are introduced through sound change are, at least initially, limited toa particular context (due to the context-sensitive nature of the misperceptions that giverise to them). These new sounds are considered as allophonic variants of whichever soundthey grew out of. For example, if /b/ lenites to [v] between vowels, and this is the onlyinstance of [v] anywhere in the lexicon, then [v] will be considered an allophone of /b/. Astime goes on, these allophones eventually cease to vary with another category, and attainthe status of a phoneme. 
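The dissertation does not spell out the classification criterion at this point, but purely as an illustration of the kind of post-hoc distributional check involved, one could ask whether the new sound ever occurs outside the conditioning context of its parent. Everything in the sketch below (the toy lexicon, the vowel set, the intervocalic test) is hypothetical rather than PyILM's actual routine:

    VOWELS = set("aiu")

    def intervocalic_only(sound, lexicon):
        # True if every occurrence of `sound` in the toy lexicon is flanked by vowels.
        for word in lexicon:
            for i, seg in enumerate(word):
                if seg != sound:
                    continue
                if i == 0 or i == len(word) - 1:
                    return False
                if word[i - 1] not in VOWELS or word[i + 1] not in VOWELS:
                    return False
        return True

    lexicon = ["avi", "uva", "aba", "bat"]   # [v] appears only between vowels here
    if intervocalic_only("v", lexicon):
        print("[v] is distributionally restricted: a plausible allophone of /b/")
    else:
        print("[v] occurs elsewhere too: treat it as an independent phoneme")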
This transition, from misperception to allophone to phoneme, isintended to parallel the real-world process of phonologization, where an initially phonetice\u001Bect eventually becomes a \u001Cxed part of a phonological system (Berm\u00C3\u00BAdez-Otero 2007).These categories have no bearing on the outcome of a simulation. Agents are not awareof what is a phoneme and what is an allophone, and neither the production algorithm northe learning algorithm ever speci\u001Ccally reference these categories. Within a simulation,everything is considered to be a \u0010segment\u0011. The categorization of a sound as a phoneme orallophone is done at the end of a simulation, as a tool for understanding how sounds aredistributed in a lexicon.23This issue will be discussed in more detail in Chapters 3 and 4 along with speci\u001Ccsimulation results.1.4.4 Expected outcomes and inventory structureThe set of misperceptions that is supplied to a simulation acts like the bottleneck in Kirby's(1998, 2000, 2002) simulations of syntax. They are the main constraints preventing thesuccessful transmission of sounds over time. By the end of a simulation, the expectationis for an inventory to have whatever set of sounds is least likely to be a\u001Bected by thesemisperceptions, given the set of phonetic contexts in the lexicon.In simple cases, it is even possible to predict what the outcome will be before runninga simulation. Consider the simplest situation when only a single misperception is actingon transmission. For discussion purposes, assume there is a \u001Cnal-devoicing misperceptionwhich sometimes makes word-\u001Cnal voiced obstruents less voiced than they are in otherpositions. Suppose that two simulations are run, each starting with a lexicon generatedfrom the sounds /b, d, g, i, a/. The di\u001Berence now is that one simulation has a lexiconwith only V or CV syllables, while the other allows up to CVC syllables.The \u001Cnal inventory of the CV language is easy to predict in this case: it will be /b, d, g,i, a/, i.e. it will not have changed, since the relevant environment for devoicing does not existin the lexicon. On the other hand, the CVC language might develop an inventory as largeas /p, b, t, d, k, g, i, a/, depending on the speci\u001Cc contexts in the lexicon and how oftenmisperceptions actually occurred during the simulation. For instance, if the distribution of/b/ in the initial lexicon was restricted to \u001Cnal position, then all instances of /b/ are proneto misperception, and there is a low probability of /b/ still existing in the \u001Cnal lexicon, andhigh probability of /p/ existing in at least one word. If no words in the initial lexicon endedin /g/, then there is no reason for /k/ to be in the \u001Cnal lexicon. Predicting the outcomebecomes more di\u001Ecult, or impossible, with a larger number of interacting misperceptionsadded to the simulation.Selection for learnability occurs as sounds from the original inventory of the languagechange due to misperception. If a sound cannot reliably be learned, given the environmentsin the lexicon and the set of misperceptions acting on transmission, it will probably notsurvive the entire simulation. The \u001Cnal inventory is one that is in a sense optimized for itsown transmissibility.This simple example also demonstrates how patterns in inventories can be derived non-teleologically by modeling only individual sound changes. 
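This prediction can be mimicked in a few lines of code. The sketch below is a deliberately crude stand-in for a real PyILM run (no agents, no phonetic values, and a made-up probability), but it shows the asymmetry between the two toy lexicons:

    import random

    DEVOICE = {"b": "p", "d": "t", "g": "k"}

    def transmit(lexicon, generations=50, p=0.3):
        # Each generation, a word-final voiced stop has probability p of being
        # re-learned as its voiceless counterpart (a crude final-devoicing misperception).
        for _ in range(generations):
            lexicon = [w[:-1] + DEVOICE[w[-1]]
                       if w[-1] in DEVOICE and random.random() < p else w
                       for w in lexicon]
        return lexicon

    random.seed(1)
    cv_lexicon = ["ba", "di", "ga", "bi"]       # open syllables only: no devoicing context
    cvc_lexicon = ["bag", "did", "gab", "bid"]  # closed syllables: final devoicing can apply
    print(sorted(set("".join(transmit(cv_lexicon)))))   # inventory is unchanged
    print(sorted(set("".join(transmit(cvc_lexicon)))))  # voiceless stops have crept in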
If we could watch the CVClanguage generation-by-generation, we would observe what Martinet (1952, 1955) called\u0010gap \u001Clling\u0011. Assume that all voiced stops appear in both initial and \u001Cnal position inthe very \u001Crst lexicon. After some number of generations, one of them would devoice dueto misperception in \u001Cnal position, creating a stop inventory of, say, /b, t, d, g/. Laterdevoicing misperceptions could change the inventory to /p, b, t, d, g/, and this createsan apparent gap at the velar place (although in this hypothetical example the gap is the24opposite of the one normally found in natural languages, where /g/ is more likely to bemissing). Eventually, misperception will \u001Cll in that gap for a full stop system of /p, b, t,d, k, g/, but at no point did agents intend for this to happen. They simply learn from theinput, and a full suite of consonant pairs is a side-e\u001Bect of the way that misperception isa\u001Becting the learning data.1.5 SummaryThis chapter introduced the concept of selection for learnability. This refers to the ideathat certain linguistic patterns tend to persist over time because they are more likely tobe successfully transmitted to the following generation. This concept was \u001Crst developedfor the study of syntax and morphology by Kirby (2000, 2001, 2002) who showed thatcompositional morpho-syntax can emerge over time because it is more learnable, given alimit on the number of sentences that a learner can have access to.I proposed that phonological inventories are also selected for learnability, although in adi\u001Berent way. Learners are not constrained by any bottleneck on transmission (since thenumber of phonological categories in a language is \u001Cnite and relatively small in number).Instead, the strongest in\u001Duence on inventories comes from channel bias (Moreton 2008),which refers to phonetic e\u001Bects like co-articulation or acoustic similarity. Channel biasintroduces variability into the input of a learner, and this variability is the precursor tosound change. If learners misperceive some aspect of speech due to channel bias, thenthis misperception gets retransmitted to the following generation when learners becomespeakers.For the purposes of this dissertation, I use the term \u0010misperception\u0011 very broadly, andit refers to any kind of sound change that occurs because a learner infers a sound systemthat di\u001Bers from the one used by the speakers of the language at the previous generation.This notion of misperception is in turn taken from the models developed by Ohala (1981,1983, 1992, et seq.) and Blevins (Blevins 2004, 2007).I implemented the basic assumptions of these diachronic models into a computer simu-lation called PyILM, which will be used to investigate three aspects of phonological inven-tories: their size, the relative frequency of their segments, and their feature economy. Thefollowing chapter provides more speci\u001Cc technical details of how PyILM works.25Chapter 2PyILM2.1 IntroductionThis chapter details PyILM, a computer program written in Python for simulating lan-guage transmission, with a focus on phonology. PyILM's design is informed by theoriesof sound change through misperception (e.g. Ohala (1983), Blevins (2004)), and its formalimplementation is based on the Iterated Learning Model (e.g. Smith et al. (2003), Kirby(2001)). These topics were discussed in detail in the previous chapter.PyILM simulates the transmission of a lexicon over multiple generations. 
It creates agents arranged in a chain, each of which learns a phonological system from the output of the previous agent. PyILM allows users to manipulate numerous parameters of this process and run iterated learning simulations to explore how sound change happens under different conditions. Section 2.2.2 of this chapter gives a complete list of the parameters that a user can set. Section 2.4 gives some more details on how to use and configure a simulation.

2.1.1 Iterated Learning Models

Computational models of iterated learning, e.g. Kirby (1999, 2001), follow this basic pattern of nested loops:

Generate a speaking agent
Generate a learning agent
Loop x times:
    Loop y times:
        The speaker produces an utterance
        The learner learns from this utterance
    Remove the speaker from the simulation
    Make the learner the speaker
    Create a new learner

This pattern represents x generations of language transmission with y learning items at each generation. The corresponding loops for PyILM are given in Algorithm 2.1. These loops are explained in "pseudo-code", which I will continue to use throughout the chapter to explain the logic of PyILM. Pseudo-code is text consisting of valid Python expressions, most of which appears as actual lines of code in PyILM. However, some of the code has been changed to make it more readable in the context of a dissertation, hence the name pseudo-code.

Algorithm 2.1 Main simulation loop

1   Simulation.load("config.ini")
2   speaker = BaseAgent()
3   Simulation.initialize(speaker)
4   listener = Agent()
5   for generation in range(Simulation.generations):
6       for j in range(Simulation.words_per_turn):
7           word = speaker.talk()
8           word = Simulation.transmit(word)
9           listener.listen(word)
10      Simulation.record(generation)
11      speaker.clean_up()
12      speaker = listener
13      listener = Agent()
14  Simulation.generate_output()

Line 1 loads user-provided details about the simulation and configures PyILM appropriately. Line 2 creates a new agent for the first generation of a simulation, and Line 3 seeds it with an initial lexicon and inventory. Line 4 creates a new "blank" listener who will learn her language from the speaker. To be clear, this initialization phase is not intended to represent any actual events in language transmission. PyILM only simulates the "evolution" of language in the sense that it simulates how languages change over time; it does not simulate the emergence of language from non-language. The first speaker in the simulation represents some speaker at some point in the history of some language. Note that the first speaker is formally a different kind of object in the program than the other speakers, since the first speaker requires a set of initialization functions, while later generations rely on learning algorithms.

Line 5 starts a loop that runs once for each generation being simulated (see section 2.2.2.2). Line 6 starts a loop that runs once for each word a learner hears.

In line 7 a speaker chooses a word to say (see the production algorithm, section 2.3.4). Line 8 simulates misperception by changing some of the segments of the word (see section 2.2.10). Misperceptions are context-sensitive, and probabilistic, so sometimes nothing
On line 9 the learner learns from this new word (see the learningalgorithm, section 2.3.1).On line 10, PyILM keeps a record of what the speaker's inventory and lexicon look likebefore the speaker is removed from the simulation on line 11. On line 12, the new speakercreates some probability distributions to be used during the next production phase. Line13 generates a new listener for the next loop to start over again.Line 14 is executed only after the \u001Cnal generation of the simulation has \u001Cnished learning.It prints a report of what happened during the simulation, using the information logged ateach generation on line 10. The program then terminates.2.2 Objects2.2.1 OverviewPyILM was written using an object oriented approach to programming. An \u0010object\u0011 is wayof representing a concept in a computer program. In the case of PyILM, objects representconcepts relevant to sound change or phonology. Objects have attributes which representproperties or characteristics of the objects and their values may be \u001Cxed or mutable. Objectsalso have methods which describe what the object can do. An example of an object is theFeature object, which represents the concept of a distinctive phonological feature. Featureobjects have two attributes: name and sign. These are both strings, with name having avalue of something like \u0010voice\u0011 or \u0010continuant\u0011 and sign having a value of either \u0010+\u0011 or \u0010-\u0011.They also have an equal_to method, which is used to decide if two Feature objects are thesame or not.This example also illustrates two typographical conventions adopted in this chapter.First, the names of objects in the computer program are written with an initial capitalletter to distinguish them from the use of the same words to refer to concepts in linguistics,e.g. the Feature object vs. distinctive feature. Second, the typewriter font is used whenreferring to object attributes and methods.This section explains the main objects in the simulation, as well as their relevant at-tributes. Details about object methods are, for the most part, omitted here because theyare generally not of relevance for understanding how the simulation works. One exceptionto this is the Agent object, which has methods for speech production and learning thatare important to understanding the simulation. Examples of omitted methods include:methods for making equal to and not equal to comparisons, methods for generating stringrepresentations, and methods for reading and writing to \u001Cles.There are nine objects discussed in this section: Simulation, Word, Segment, Feature,FeatureSpace, Sound, Token, Agent and Misperception. To understand the relationshipbetween them, it is useful to think of the objects in PyILM as being \u0010stacked\u0011 inside oneanother. The diagram in Figure 2.1 is a visualization of this. Note that the \u001Cgure is not28Figure 2.1: The objects of PyILMa description of object inheritance - it is a visualization of how the objects \u001Ct togetherconceptually.Every run of the simulation creates a new Simulation object, inside of which there areAgent objects that talk and listen to each other. Agents all have a lexicon attribute whichcontains Words, which are made up of Segments, which are made up of Features.A speaker uses a production algorithm to transform Segments into a di\u001Berent kind ofobject called a Sound, representing an actual speech sound instead of a unit in the mentallexicon. 
The phonetic characteristics of speech sounds are represented by objects calledTokens.The simulation passes Sounds through a misperception function, which may alter someof their Tokens' values, depending on the environment in which they occur. Then theseSounds are sent to the listener's learning algorithm which creates new Features and Seg-ments.The transmission of a single segment is illustrated in more detail in Figure 2.2. Aspeaker starts by selecting a word from the lexicon. The lexicon is a list of meanings, eachassociated with a string of segment symbols, each of which are \u0010translated\u0011 into a set ofphonological features. All possible features that can be discriminated are kept together ina FeatureSpace object, where each feature is represented as an interval [0,1]. Figure 2.229shows only three feature dimensions (F1, F2, and F3), with each dimension represented asa number line. The points circled with solid lines are the range of values that represent[\u00E2\u0088\u0092feature] segments, and the points circled with dashed lines are the [+feature] values. Thelines in black represent the values a speaker experienced during their time as a learner. Thelines in red represent the particular phonetic values that the production algorithm choseon this occasion. Supposing that F1is [voice], F2is [continuant] and F3is [nasal], then thesegment is [+voice, \u00E2\u0088\u0092continuant, \u00E2\u0088\u0092nasal]. This could be represented as /b/ (among othersymbols).Figure 2.2: The transmission of a phonological segmentThe values chosen by the production algorithm are then sent to the misperceptionfunction. In this example the environment was right for a misperception to occur, and thelistener is going to hear the F1value - the voicing value - of this segment as lower thanintended by the speaker.The red lines in the learner's FeatureSpace represent where the values were stored. In\u001Cgure 2.2 the learner has already heard a number of words, and has formed some phono-logical categories. The boundaries will continue to shift over the course of learning.The values for F2and F3are interpreted \u0010correctly\u0011 by the listener, that is, her learningalgorithm categorizes them as examples of the same phonological feature value as in thespeaker's FeatureSpace. Due to the misperception, F1is categorized as an example of theopposite feature value.302.2.2 Simulation2.2.2.1 OverviewEach time PyILM is run, a single Simulation object is created, and the entire simulationruns \u0010inside\u0011 of this object. The Simulation object has methods for initializing simulationdetails, creating and removing agents from the simulation, causing agents to speak or listen,causing misperceptions to occur, and writing the results of the simulation to \u001Cle. None ofthese methods are discussed here. Details about Agents (2.2.9) and Misperceptions (2.2.10)can be found in their own sections.The Simulation object has 10 attributes representing factors relevant to cultural trans-mission. These are discussed here, and it is possible for users to set the value of any ofthese attributes. See section 2.4 for details on how to do this.2.2.2.2 generationsThe value of this attribute determines how many generations of listeners the simulationshould run for. The default value is 30. This means the simulation stops after the 30thlearner has \u001Cnished learning.2.2.2.3 initial_lexicon_sizeThis value of this attribute sets the number of words in the initial lexicon. The defaultvalue is 30. 
The words of the initial lexicon are created using the invention algorithm (see section 2.3.5).

2.2.2.4 initial_inventory

This attribute controls the size and contents of the initial segment inventory of the simulation. The value supplied should be a list of segment symbols separated by commas. For instance, p,t,k,b,d,g,f,s,h,m,a,i,e would be an acceptable starting inventory. The set of symbols used should correspond with symbols in a feature file (see the features_file attribute in section 2.2.2.9).

If some degree of randomness is desired, then the value supplied to initial_inventory should instead be two numbers separated by a comma. The first number represents the number of consonants and the second the number of vowels. There must be at least one consonant and at least one vowel in every simulation. The segments are randomly selected from the feature file (section 2.2.2.9). The simulation determines what is a consonant and what is a vowel by checking the value of the [vocalic] feature: [+voc] segments are called "vowels" and [-voc] segments are called "consonants".

If no initial inventory is supplied, then the default value is 10 random consonants and 3 random vowels.

2.2.2.5 minimum_repetitions

During the production phase, the speaking agent will produce every word in the lexicon at least minimum_repetitions times. For example, if this is set to 2, then every word will be produced at least twice. The default value is 1, and it cannot be set any lower.

Words in a lexicon are grouped into "frequency blocks", with the first block containing words that are produced exactly minimum_repetitions times. Each successive block that is created has a frequency of twice the previous block. Doubling the frequency approximates a Zipfian distribution of words in natural language (Yang 2010). This blocking is done randomly for the first generation in a simulation. If any words are invented during the simulation (see section 2.2.2.11) they go into the least-frequent block.

In natural language, a similar Zipfian distribution holds for individual words: the most frequent word in a language is about twice as frequent as the next one, which is twice as frequent as the next, and so on. Early testing with PyILM found that implementing frequency distributions on a per-item basis resulted in simulations that ran for far too long. By grouping words into frequency blocks, each of which is twice as frequent as the next, the running time of the simulation is greatly improved while still maintaining a Zipf-like distribution.

2.2.2.6 min_word_length

This value sets the minimum number of syllables a word must have. The default value is 1, and it cannot be set lower.

2.2.2.7 max_word_length

This value sets the upper bound for the number of syllables a word can have. The default value is 3, and it must be equal to or greater than min_word_length. These min and max values are used by the agent's invention algorithm when generating new words (see section 2.3.5).

2.2.2.8 phonotactics

The phonotactics attribute is used by an agent's invention algorithm (see section 2.3.5) for creating new words. Invention happens in every simulation run to generate the lexicon of the very first agent. Invention otherwise only occurs if Simulation.invention_rate is set greater than zero (see section 2.2.2.11).

The value supplied to this attribute should be a single string that consists of only the letters "C" and "V". This string represents the maximal syllable structure.
By default,PyILM assumes that all possible sub-syllables should be allowed. For instance, if the valuesupplied is \u0010CCVC\u0011, then the set {V,CV,CCV,VC,CVC,CCVC} will serve as the set ofpossible syllables. The simulation determines which segments are consonants and which32are vowels by looking at the segment's value of [vocalic]. [\u00E2\u0088\u0092voc] are treated as \u0010C\u0011 and[+voc] are treated as \u0010V\u0011.It is possible to exclude a subset of syllables by listing them after the maximal form sepa-rated by commas. For instance, if the string supplied to this attribute is \u0010CCVCC,VC,VCC\u0011then the maximal form is CCVCC, and all of its sub-syllables are allowed except VC andVCC. In other words, simulation will use the set {CV,CVC,CCVC,V}. All languages inPyILM must allow a syllable consisting of at least a vowel, so the syllable V cannot beexcluded.The phonotactics of a language are \u001Cxed for a given simulation. This is whyphonotactics is an attribute of the Simulation object, as opposed to being an attribute ofthe Agent object. The phonotactics play a role in the outcome of a simulation (since sound-changing misperceptions are context-sensitive and phonotactics de\u001Cnes the set of possiblecontexts) so it is useful to hold them invariant for a simulation to understand their e\u001Becton sound change. The following chapter describes the output of a simulation, and there ismore discussion of the speci\u001Cc e\u001Bect of phonotactics.The default value is \u0010CVC\u0011.2.2.2.9 features_fileThe value supplied for this attribute should be the name of a text \u001Cle which gives a phono-logical feature description for possible segments. PyILM comes with a default features \u001Clethat describes several hundred segments using more than a dozen features, and will besu\u001Ecient in most cases. This \u001Cle is a based on the ipa2spe \u001Cle available in P-base (Mielke2008). However, users can modify this \u001Cle or write their own. Each line of the \u001Cle must have\u001Crst a symbol, then a list of phonological features, all separated by commas. An exampleof such a \u001Cle, with a very small feature space, is given in Figure 2.3.33i,+voice,+cont,+voc,+highA,+voice,+cont,+voc,-highG,+voice,+cont,-voc,+highiP,+voice,-cont,+voc,+highi\u00CB\u009A,-voice,+cont,+voc,+highv,+voice,+cont,-voc,-highg,+voice,-cont,-voc,+highi\u00CB\u009AP,-voice,-cont,+voc,+highb,+voice,-cont,-voc,-highk,-voice,-cont,-voc,+highp,-voice,-cont,-voc,-highA\u00CB\u009AP,-voice,-cont,+voc,-highf,-voice,+cont,-voc,-highA\u00CB\u009A,-voice,+cont,+voc,-highx,-voice,+cont,-voc,+highAP,+voice,-cont,+voc,-highFigure 2.3: Sample feature \u001CleThe features provided in this \u001Cle de\u001Cne the dimensions of phonetic and phonologicalspace that can be used by a language. Every Segment in an agent's lexicon in the simulationconsists of some set of phonological features (see section 2.2.4.2). Every Sound uttered byan agent consists of a set of multi-valued phonetic features (see 2.2.7). The names ofthe features used in both cases are the same, and they are determined by the names infeatures_file.The symbols in this \u001Cle largely serve as a kind of user interface. Sounds are actuallyrepresented in PyILM as numbers in a list, but this representation is unhelpful for a human.If PyILM has to print a symbol, for example when producing a report of the simulation, ituses these symbols as a more readable representation.2.2.2.10 max_lexicon_sizeThis sets a maximum size for the lexicon of the language. 
If the lexicon has reached maximum size and invention is required, then one of the least frequent words in the lexicon is selected (at random) and removed from the language, to be replaced by the newly invented word. The default value is 30.

2.2.2.11 invention_rate
This represents the probability that a speaker will produce words that were not in her input. This could represent new words that have come into fashion during the speaker's life, coinages that she created herself, borrowings, or any other source of new words. Note that invention never introduces new sounds, so if inventions are considered to be like borrowings, then it would be a case of complete adaptation to the native sound system. Further, invented words always match the phonotactics of the language (see section 2.2.2.8). The number of invented words is determined by the max_inventions parameter (see section 2.2.2.12).
The value of this attribute must be between 0 and 1. If the value is set to 0, then agents never invent words and only the words used in the first generation will be the ones used throughout (though, of course, their phonological and phonetic properties may change). If the value is set to 1, then new words will join the lexicon each generation. See section 2.3.5 for more details on the invention algorithm. The default value is 0.

2.2.2.12 max_inventions
The invention phase only happens once for each agent, at the beginning of the production phase. Pseudo-code for this is given below. During the invention phase, the simulation will make max_inventions attempts to generate a new word and add it to the lexicon. The default value for this attribute is 0 (i.e. no inventions ever occur). The probability of a word actually being invented is set by the attribute invention_rate (see section 2.2.2.11).

for j in range(Simulation.max_inventions):
    n = random.random()  # generate a random number in [0,1]
    if n <= Simulation.invention_rate:
        word = agent.create_new_word()
        agent.update_lexicon(word)

In a simulation where max_inventions has been set to X and invention_rate is set to Y, each of the X invention attempts succeeds independently with probability Y, so the probability that no new words are invented at a given generation is (1 - Y)^X. Suppose that max_inventions is 3 and invention_rate is 0.2. This means that for any generation, no more than 3 new words can enter the lexicon. The probability that two particular attempts both succeed, producing 2 new words, is 0.2 x 0.2 = 0.04, and the probability that no new words enter the lexicon is 0.8 x 0.8 x 0.8 = 0.512.

2.2.2.13 misperceptions
This should be the name of a text file, in the same directory as PyILM, that contains a list of misperceptions that could occur over the course of a simulation. Each line should have six arguments separated by semi-colons. The PyILM Visualizer also contains an option for creating misperception files with a more intuitive user interface. See section 2.2.10 for more details on Misperception objects.
The first argument is the name of a misperception. The second argument is a list of phonological features, separated by commas, that describe segments that can be altered by the misperception. The third argument is the name of a feature which undergoes change if the misperception occurs. Multiple values for the third argument are not permitted, and misperceptions can only change one feature at a time. The fourth argument is a number in the interval (0,1) representing how much the phonetic feature changes if the misperception happens.
The \u001Cfth argument is the environment in which the misperception can occur.Environments should be speci\u001Ced at the level of phonological features. A special value of* can be used to mean \u0010context-free\u0011. The sixth and \u001Cnal argument should be a numberin (0,1) representing the probability that the misperception occurs. The default set ofmisperceptions are in \u001Cle called \u0010misperceptions.txt\u0011 that is bundled with PyILM and userscan also consult this \u001Cle for an example of the format to follow.pre-nasal nasalization bias;-nasal,+voc;nasal;.05;_+nasal,-voc;.15word \u001Cnal devoicing;+voice,-nasal;voice;-.1;_#;.2intervocalic lenition;-cont,-son;cont;-.1;+voc_+voc;.1default vowel voicing;+voc;voc;.05;*;.75In the \u001Crst example, the misperception targets non-nasal vowels. The nasal value ofthese vowels is raised by 0.05 when these vowels appear before nasal consonants, and onany given utterance where such a context appears there is a 0.15 chance this happens. Thesecond example targets voiced non-nasal segments. These sounds have their voicing valuesdecreased by 0.1 when they appear word \u001Cnally, and this happens with a 0.2 probability.The \u001Cnal example shows a context-free misperception, which is referred to as a \u0010bias\u0011.In this case, vowels have their voicing values raised slightly under all circumstances. Thisallows vowels to be a\u001Bected by the \u001Cnal-devoicing misperception, but not to the sameextent as consonants because their voicing value gets raised a little bit anyway. (Of course,vowels could be completely excluded from the \u001Cnal devoicing misperception by changing thetargeted segments to be include [\u00E2\u0088\u0092voc] only.) In Chapter 5, the e\u001Bects of misperceptionsand biases are explored in more detail.The amount by which a misperception changes a sound cannot be greater than 1 or lessthan -1. This is because phonetic values in PyILM must be numbers between 0 and 1. Ifthe e\u001Bect of misperception would push a value above 1, then PyILM will force the valueback down to 1. Similarly if a misperception were to push a value below 0, PyILM willraise the value back up to 0.2.2.2.14 minimum_activation_levelThis attribute should be a number in [0,1] representing how close to an existing categoryan input examplar must be, in order to be considered for membership in that category.Technical details of the exemplar learning algorithm are given in section 2.3.1.36Setting this number closer to 1 sharpens an agent's discrimination, and permits smallerdistances between categories. This leads to inventories with more segments and less vari-ation in pronunciation. Setting it all the way to 1.0 means that an input exemplars onlycount as a member of an existing category if they have phonetic values that match exactlyto all other exemplars in the category. This is a rare occurrence, so inventories tend togrow rapidly with this setting.Setting this number closer to 0 blurs an agent's discrimination and increases the distancerequired between categories. Setting it all the way to 0 means that there is no distancebetween categories, and every input sound after the \u001Crst counts as an example of whateverthe learner \u001Crst heard. This leads to an immediate collapse in the segmental inventory andit reduces to a single segment within a generation.2.2.2.15 auto_increase_lexicon_sizeThe attribute initial_lexicon_size (see section 2.2.2.3), is used to determine the numberof words in the lexicon of the initial generation. 
These words are all randomly generated,using the set of sounds supplied to the initial_inventory (section 2.2.2.4) attribute.However, in this randomness, it sometimes happens that not all of the initial sounds actuallymake it into a lexical item. If auto_increase_lexicon_size is set to True, then PyILMwill continue to generate words for the initial lexicon until every sound occurs at least once,even if it means surpassing initial_lexicon_size. If auto-increasing is set to False, thenthe lexicon size remained capped at the the initial value.2.2.2.16 initial_wordsThis parameter allows the user to submit a list of words, separated by commas, that shouldappear in the lexicon of the initial generation. It is the user's responsibility to ensure thatthe words contain symbols which appear in the initial lexicon of the language. If a wordsupplied to initial_words contains a symbol not in the initial inventory, then PyILM willraise an exception and stop running. In short, this parameter cannot safely be used incombination with a randomly selected started inventory (see section 2.2.2.4).It is possible for a user to create words that do not conform to the phonotactics of thelanguage with this parameter, although this is not recommended as it may cause unexpectedbehaviour during the simulation.The initial lexicon is guaranteed to include every word supplied toinitial_words, even if this means going beyond the lexicon size supplied toinitial_lexicon_size (section 2.2.2.3). If this occurs, PyILM will also not enforcethe auto_increase_lexicon_size (section 2.2.2.15) parameter, and the initial inven-tory will consist only of the sounds found in the initial_words list. If, on the otherhand, the number of initial words is smaller than the initial lexicon size, PyILM willcontinue to randomly generate words until the lexicon is an appropriate size, and the37auto_increase_lexicon_size parameter works as expected.By default, this option is not turned on and the initial lexicon will consist of randomlygenerated words.2.2.2.17 allow_unmarkedNormally, sounds in a PyILM simulation are represented as lists of binary features, markedas either [+feature] or [\u00E2\u0088\u0092feature]. If the allow_unmarked option is set to \u0010True\u0011, then a thirdfeature value \u0010n\u0011 is allowed (this is the \u0010unmarked\u0011 value). If a sound is marked [nfeature],it means that every instance of that sound experienced by a learner had a phonetic value of0 on a given feature dimension (see sections 2.2.5 and 2.2.6 for more details on how featureswork in a simulation). In practice, [nfeature] would be used in cases where a feature doesnot apply at all to a sound, e.g. a glottal stop could be marked [nlateral] because the tonguebody is not involved whatsoever in the articulation of a glottal stop.Note that [nfeature] is not equivalent to [\u00E2\u0088\u0092feature], even though sounds with eitherfeature value will have low phonetic values on particular dimension. If the allow_unmarkedoption is used, it is important to ensure that misperceptions are formatted properly (seesection 2.2.10), and speci\u001Ccally make reference to [nfeatures] if desired.The default value of allow_unmarked is \u0010False\u0011, meaning that only binary features areused in a simulation. Every simulation reported in this dissertation was run with the defaultvalue for this option.2.2.2.18 seedThis attribute controls the random seed used in PyILM. Its value can be any number orstring. 
By default is it a randomly selected integer in [1,10000]2.2.2.19 seg_specific_misperceptionsThis parameter used to create a special kind of misperception, for the purposes of a testinga hypothesis about feature economy to be presented later in this dissertation. Details aregiven in Chapter 5, Section 5.4. This parameter takes a value of either True or False, andthe default is False.2.2.3 WordsWords are generalized objects that represent either an entry in a mental lexicon or anutterance of one of these lexical items. Words have two attributes: string, which is a listrepresenting the segmental melody of the word, and meaning, which is an integer.382.2.3.1 stringIf a Word is in a lexicon, then the string attribute is an ordered list of Segments (seesection 2.2.4) representing the segmental content of a word, plus two word boundary symbols\u0010#\u0011. The value of string in this case is learned, and updated, as part of the learningalgorithm (see section 2.3.1).If a Word represents an utterance, then string is an ordered list of Sounds (see 2.2.7). Inthis case, the value of string is determined by the production algorithm (see section 2.3.4).2.2.3.2 meaningThe meaning attribute is just an integer. The meaning attribute in used when the learningalgorithm checks if an input word means the same thing as a known lexical item. Twowords are considered to \u0010mean the same thing\u0011 if their meaning attributes compare equal.This only very roughly models the concept of \u0010meaning\u0011, and is certainly not representativeof what a real human speaker knows about the meanings of words. This simpli\u001Ccation issu\u001Ecient for the purposes of modeling sound change.The \u001Crst word invented in the simulation is assigned a meaning of 0. Then a counter isstarted, and each new word is assigned the next integer. One consequence of this process ofgenerating meanings is that there can be no poly-morphemic words in any language. Everyword has a single meaning, so any language generated by PyILM is completely isolating.Another consequence is that no synonyms will ever appear in the language. If a listenerhears two words that mean the same thing, it will always be two instances of the sameword. The counter doesn't run backwards, so speakers will never invent a new way ofsaying something they can already say. There can be variations in the pronunciation of aword, due to phonetic e\u001Bects or misperceptions, but there can be no completely unrelatedforms with the same meaning.2.2.4 SegmentsSegments represent mental categories, themselves representing speech sounds, that agentscan learn. Segments are the units that make up the words in an agent's lexicon. They arenot the actual speech sounds that agents produce and perceive. In other words, they arelike phonemes, not phones. The objects representing speech productions are called Sounds,and they are described in section 2.2.7.Segments are not atomic objects. They are represented by a set of phonological fea-tures (see 2.2.5). Features in turn are abstractions representing a ranges of values along aparticular dimension of phonetic space.The ability to segment speech is taken for granted in PyILM and learners are assumed tobe able to portion out the speech signal in such a way as to form some kind of segment-likeunit. 
The particular phonetic and phonological characteristics of segments are, of course,learned during the simulation and not pre-determined.39Segments have three attributes: symbol, which is an identi\u001Cer for the segment,features, which is a list of distinctive phonological features, and envs, which lists allthe environments the segment appears in.2.2.4.1 symbolSegments all have a symbol attribute, which is a string of Unicode characters (normally butnot necessarily of length 1). The symbols are drawn from those provided to the simulation'sfeatures_file variable (see section 2.2.2.9).Symbols are just a convenience for the simulation, and the actual choice of a symbolcan be entirely arbitrary. However, experience has found that choosing segment labels atrandom makes it very di\u001Ecult to interpret the simulation results. The natural intuition ofa linguist is to assume that IPA characters are used meaningfully, so if instead they arerandomly associated with features, then reading simulation results becomes a frustratingpuzzle of trying to remember what, e.g. /p/ stands for this time. Instead, PyILM triesto choose a \u0010reasonable\u0011 symbol for a given segment by selecting a symbol for a segmentwhose feature description most closely matches the segment under consideration. Thismeans that in most cases, the segment symbol will match the phonetic values in a way thatmakes linguistic sense, although it is always safer to inspect the actual feature values andnot rely on the symbol.In some cases, sounds can appear in a simulation that have a feature speci\u001Ccation notfound in the user's feature \u001Cle. For instance, a user might include a misperception thatnasalizes stops under some conditions. Normally, stops are [\u00E2\u0088\u0092sonorant] and nasals are[+sonorant], but this nasalization misperception can create sounds that are [\u00E2\u0088\u0092sonorant,+nasal]. PyILM needs a symbol for such a sound, and if it cannot \u001Cnd a perfect matchin the user's feature \u001Cle, it will take a sound that matches most of the features. This canoccasionally lead to unexpected results when visualized (e.g. PyILM might pick a symbolfor a plosive to represent the non-sonorant nasal). Again, it is always safer to check featurevalues than to rely on the assigned symbol, especially in simulations with a large numberof misperceptions and less-predictable outcomes.2.2.4.2 featuresThis attribute is a list of distinctive Features (see 2.2.5) that uniquely characterize thesegment. These values are inferred by the agent's learning algorithm (see section 2.3.1),and may change over the course of learning. They are \u001Cxed after learning ends, and donot change once the agent becomes the speaker. The actual distribution of phonetic valuescorresponding to a particular segment is recorded in a FeatureSpace object. These objectsare described in more detail in section 2.2.6.402.2.4.3 envsThe envs attribute keeps track of all the environments in which a segment appears. Theenvironment of a Segment is de\u001Cned as as the immediately adjacent Segments to the leftand right. More formally, the environment of Segment in position j of a Word's string is atuple consisting of the Segment at position j -1 and the Segment at position j+1. Words inan agent's lexicon begin and end with word boundaries, and these provide the appropriateenvironment for word-initial and word-\u001Cnal segments. 
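As a minimal illustration of this definition (the function name get_environment is used here for exposition only and is not necessarily PyILM's own method):

def get_environment(string, j):
    # The environment of the segment at position j is the pair of its
    # immediate neighbours. A lexical Word's string already includes
    # the word-boundary symbol '#' at both ends, so word-initial and
    # word-final segments receive '#' as a neighbour.
    return (string[j - 1], string[j + 1])

word = ['#', 'b', 'a', 'd', '#']
print(get_environment(word, 1))   # ('#', 'a'): word-initial /b/
print(get_environment(word, 3))   # ('a', '#'): word-final /d/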
Although word boundaries are for-mally treated as Segments, they di\u001Ber from the Segments described in this section in thatthey lack phonological features and they are not considered part of an agent's inventory.They only exist as part of an agent's lexicon; the objects transmitted to learners do notcontain word boundaries and must be re-constructed by the learner. This has no practicale\u001Bect on the simulation at the moment because speakers only ever utter one word at a time.2.2.4.4 distributionThe distribution attribute is a normal probability distribution representing the distributionover possible phonetic feature values for the segment. An agent keeps in memory onedistribution for each feature dimension. These are created at the end of the learning phase.2.2.5 FeaturesA feature-dimension is a one-dimensional space, a number line, representing some salient,gradable property of speech that listeners are aware of and can use to categorize speechsounds. Feature objects represent phonologically signi\u001Ccant ranges of values, and are cre-ated by the learning algorithm (see section 2.3.1). These ranges represent values for phono-logical features, and there are three possible feature values: \u0010+\u0011, \u0010\u00E2\u0088\u0092\u0011, \u0010n\u0011. The [+feature]category is for ranges of higher values, and the [\u00E2\u0088\u0092feature] category is for ranges of lowervalues. The actual values depend on the details of the simulation in question. A third valueis [nfeature] which is assigned to segments that are always expressed with a value of 0 forthe feature (e.g. a glottal stop would be [nlateral]). By default, the \u0010n\u0011 value is not used insimulations, and all features are binary. To use it, set Simulation.allow_unmarked to Truein the con\u001Cguration \u001Cle.The number of feature dimensions is determined by the features listed inSimulation.features_file (see 2.2.2.9). The clustering into features is also illustrated in\u001Cgure 2.2 on page 30.Feature objects have two attributes: sign and name. The sign attribute can havevalue of \u0010+\u0011 or \u0010-\u0011, (and possibly \u0010n\u0011) and name is drawn from those provided toSimulation.features_file. The value of \u0010n\u0011 is only available if the allow_unmarkedoption is turned on (see section 2.2.2.17). This option is turned o\u001B by default.The set of feature-dimensions is \u001Cxed at the beginning of a simulation and does notchange throughout (see section 2.2.2.9). This set is represented as a FeatureSpace object41(see section 2.2.6). Having a \u001Cxed set of features does not, of course, mean that every oneof them will participate in a contrast for a given simulation. For instance, it is possiblefor a language to have no lateral consonants in the initial generation, meaning no segmentsmarked [+lateral], and to not acquire any over the simulation run. So long as every soundhas relatively low values along the [lateral] dimension, they will all get classi\u001Ced as [\u00E2\u0088\u0092lateral],and the feature will not be contrastive.2.2.6 FeatureSpaceA FeatureSpace object is a multidimensional space representing all possible phonetic values.Every utterable sound in the simulation can be represented by some point in this space.FeatureSpaces have f feature-dimensions, where f is the size of the set of features providedto the Simulation.features_file attribute (see section 2.2.2.9). Points in any givendimension fall somewhere in the interval [0,1]. 
Formally a FeatureSpace is just a Pythondictionary (a hash table) where the keys are the names of a feature-dimension and valuesare lists of Token objects (see section 2.2.8) that are stored along that feature-dimension.This has the consequence that one phonological feature in a language corresponds toonly one kind of phonetic feature along a single dimension. This is unlike natural languagewhere a range of phonetic characteristics may be related to a phonological feature. Forinstance, [voice] may correspond with VOT, burst amplitude, and F0 (e.g. Lisker (1986),Raphael (2005)).2.2.7 SoundsIterated learning models place a heavy emphasis on the fact that language exists in both amental form and a physical form. The Segment objects discussed in section 2.2.4 representpart of mental language. The corresponding object representing physical speech is calleda Sound. The di\u001Berence between a Segment and a Sound is analogous to the di\u001Berencebetween a phoneme and a phone. Sounds are created by the production algorithm (sec-tion 2.3.4), further manipulated by the misperception function (section 2.2.10), and serveas input to the learning algorithm (section 2.3.1).Sounds as objects are similar to Segments. Sounds only have symbol and features at-tributes, although features is a list of Token objects (see section 2.2.8), rather than Featureobjects which is the case with Segments. A Sound exists for only a single event of trans-mission, then is removed from the simulation. The environment in which a Sound occursis calculated in place by the misperception function, or the listener's learning algorithm, asneeded.2.2.8 TokensTokens represent the spoken values of Features (section 2.2.5), much like Sounds (sec-tion 2.2.7) represent spoken Segments (section 2.2.4). Features actually represents ranges42of phonetic values, and a Token object represents one value from that range. A Token hasfour attributes: name, value, label, and env.2.2.8.1 nameThe name attribute is the name of whichever phonetic feature this Token represents (seesection 2.2.2.9).2.2.8.2 valueThe value attribute is a number in [0,1]. In a sense, value represents how \u0010strongly\u0011 agiven sound expresses a particular feature (whichever feature is given for the name attribute).Larger values are intended to represent increasingly salient or prominent information, al-though what this means would depend on the feature in question. What makes somethingmore [nasal] in actual speech would depend on, e.g. nasal air\u001Dow, degree of closure, nasalityof adjacent segments, etc.2.2.8.3 labelThe label attribute is a reference to the symbol attribute of one of the Segment objectsin the agent's inventory, indicating that a Token counts as an exemplar of that particularsegment. The values along a given feature dimension that are associated with a particularsegment will change over the course of learning, and Token labels are continually updatedto keep in line with changes to the inventory.2.2.8.4 envThe attribute env represent the environment in which the Token was perceived. Thisconsists of a tuple of two references, the \u001Crst a reference to the Segment on the left andthe second a reference to Segment on the right. These references allow a dynamic updatingof the FeatureSpace as the learner's inventory changes over the course of learning. Forinstance, suppose a learner has perceived a word-initial Token before a vowel which getslabeled /e/. The env attribute of this Token would be the tuple (#,e). 
If the category /e/later gets merged with another vowel category and has its label changed, perhaps to thelabel /i/, then the env of this Token would be automatically updated to (#,i).2.2.9 AgentsAgent objects represent the people who learn and transmit a language. Agents have fourimportant attributes: lexicon, inventory, feature_space, and distributions. Thereare also three Agent methods described in their own section: a production algorithm (2.3.4),a learning algorithm (2.3.1), and an invention algorithm (2.3.5).43In addition to the Agent object, there is also a BaseAgent object, which is used foragents in the 0th generation of the simulation. The two objects share many attributesand methods, and formally speaking Agent inherits from BaseAgent. These distinctionsare largely unnecessary for understanding how the simulation works, however, so they areignored here and I present all of the relevant information under the general heading of\u0010Agent\u0011.2.2.9.1 lexiconEntries in the lexicon are represented by another kind of object called a Word (see sec-tion 2.2.3). Words are learned and stored in the lexicon as part of the learning algorithm(see section 2.3.1). Lexicons are essentially just \u0010storehouses\u0011 of words. The lexicon is amapping between meanings and lists of Words that can be used to convey that meaning.Each possible Word is stored alongside the raw count of how many times it appeared in anagent's input. Multiple Words can become associated with the same meaning through mis-perception. For instance, in a simulation with a \u001Cnal devoicing misperception, the meaning17 might be associated with the list /pad (6), pat (4)/, which would indicate that [pad] washeard 6 times during an agent's learning phase, while the word [pat] was heard 4 times.Put another way, meanings are analogous to lexical items, and Words are phonologicalrepresentations of these lexical items.2.2.9.2 inventoryThe inventory of an agent is a list of all the Segments that appear in at least one Word inthe agent's lexicon. The inventory is used in learning to make comparisons between words(see section 2.3.1). The inventory is also used by the invention algorithm (see section 2.3.5),which creates new arrangements of known segments.2.2.9.3 feature_spaceThis attribute just serves to point to a FeatureSpace object (see section 2.2.6). This objectrepresents a multidimensional phonetic space, and every sound that an Agent can hearor produce is represented as a point in this space. The feature_space of an Agent isinitially empty, and it gets \u001Clled with points, which are then clustered, during learning.The production algorithm makes use of these clusters for deciding on what phonetic featurevalues to assign to di\u001Berent phonological values.2.2.9.4 distributionsAn agent's distributions attribute is a dictionary organized \u001Crst by segment, then byfeature. Each feature is mapped to a probability distribution, which is sampled by the44production algorithm when it needs phonetic values. This is described in more detail insection 2.3.4.1.2.2.10 MisperceptionThe idea behind misperceptions is that some sounds, in some phonetic environments, aresusceptible to being perceived by the learner in a di\u001Berent way than the speaker intended.For instance, there is a tendency for word \u001Cnal voiced obstruents to be pronounced in sucha way as to be perceived as voiceless (Blevins (2006b)). 
This can lead to instances of misperception where a speaker intends /bab/ and the learner understands /bap/.
Misperception objects are intended to represent factors inherent to human communication that affect perception of sounds probabilistically, in well-defined environments. These are factors that could potentially affect speech perception at every utterance, and so become relevant to the cultural transmission of sounds. For instance, speakers produce oral vowels with more nasality before nasal consonants (Chen (1997)). This fact about the pronunciation of vowels in certain environments means that in the transmission of any language with words that contain a sequence of an oral vowel followed by a nasal consonant, there is some small probability that learners will mistakenly interpret these vowels as inherently nasal, leading to a sound change where vowels articulated as oral vowels at one generation are articulated as nasal vowels in a later generation (cf. Ohala 1983).
On the other hand, Misperception objects are not intended to represent instances of misperception caused by e.g. the conversation happening at a loud concert, or peanut butter in the speaker's mouth. These factors certainly affect production and perception of speech sounds, but they do not occur with enough regularity to be worth including in a simulation of cultural transmission.
Formally speaking, a misperception is a probabilistic, context-sensitive change to a Token object's value attribute. Here are two examples:

[+vocalic] -> [nasal +.1] / _[+nasal], .2 ("pre-nasal nasalization")
[+voice, -son] -> [voice -.15] / _#, .3 ("final-devoicing")

The first example reads as "There is a .2 chance that Tokens representing [+vocalic] Segments have their [nasal] value increased by 0.1 if they occur in the environment before a Segment marked [+nasal]". The second example reads as "There is a .3 chance that Tokens representing voiced obstruents have their [voice] value decreased by 0.15 if they occur in word-final position".
The probabilities are arbitrary and chosen for illustration. The probability of any misperception actually occurring is an empirical question, and not one that PyILM can be used to answer. Instead, users can set this value and run multiple simulations to understand how higher and lower values affect the overall course of sound change. In fact, all aspects of a misperception are open to modification by the user (see the Simulation.misperceptions attribute, section 2.2.2.13).
Misperceptions have six attributes: name, which identifies the misperception, target, which describes the segments susceptible to misperception, feature, which names the feature that is altered, salience, which is a number representing units of change, env, which describes when the change happens, and p, which represents the probability of a change happening. These are described in the subsections below, and section 2.2.10.7 gives the pseudo-code for how misperceptions are handled in PyILM.

2.2.10.1 name
The name attribute is a string used for referring to the Misperception. It has no role in the outcome of a simulation. In fact, its only use is for printing the report at the end of a simulation. PyILM lists misperceptions that applied during the simulation so that users can more easily understand why certain sound changes happened.
Keeping this inmind, name should be something descriptive, such as \u0010pre-nasal vowel nasalization\u0011 or \u0010\u001Cnalobstruent devoicing\u0011.2.2.10.2 targetThe target attribute is one or more phonological features representing the class of soundsa\u001Bected by the misperceptions. In the case of \u001Cnal devoicing, this attribute would probablybe set to \u0010+voice, -son, -voc\u0011.2.2.10.3 featureThis attribute is the name of the feature that changes if the misperception occurs. In thecase of \u001Cnal devoicing, the value of this attribute would be \u0010voice\u0011. The feature attributeis often, but not necessarily, one of the features listed in the target attribute. Only onename is allowed for this attribute.2.2.10.4 salienceThe salience attribute represents the magnitude of a change caused by misperception. Theattribute can be any real number in [\u00E2\u0088\u00921,1]. If a misperception actually happens, then itssalience is added directly to the value of the a\u001Bected Token (see section 2.2.8). However,Token values must remain in the range [0,1]. If the salience would drive a Token's valuebeyond those bounds, the value is rounded back to 0 or to 1.2.2.10.5 envThe env attribute is a string representing the environment in which a misperception takesplace. There are three possible formats for this string: \u0010X_\u0011, \u0010_X\u0011, \u0010X_Y\u0011, where X and Yare strings consisting of the names of one or more features separated by commas, and the46underscore represents the position of the sound that might be misperceived. For instance,the following are acceptable values:+voice_\u00E2\u0088\u0092nasal__+voice,\u00E2\u0088\u0092son+voc_+voc2.2.10.6 pThe p attribute is a number in (0,1) that represents the probability of a misperceptionoccurring.2.2.10.7 How misperception happensMisperception applies to the output of the production algorithm (see section 2.3.4, see also\u001Cgure 2.1 on page 29). The output of the misperception function is sent as input to thelearning algorithm. This means there are no further changes that can apply to any soundsin a word, once the word is received by the learning algorithm.The misperception function loops through the utterance, and checks to see if any of thesegments are in a position where they might be a\u001Bected by misperception. This done bycomparing the environment of that segment to the env attribute of the Misperception. Ifthey match, then PyILM \u0010rolls the dice\u0011, so to speak, and there is some probability, basedon the Misperception's p attribute, that change happens. The pseudo code for this is givenbelow.Algorithm 2.2 Misperception function1 for sound in utte rance :2 e = Simulat ion . get_environment ( sound , ut t e rance )3 for mis in Simulat ion . mi spe rcept i ons :4 i f Simulat ion . check_for_misperception (mis , e ) :5 #i f a mispercep t ion i s a p p l i c a b l e in t h i s environment6 i f set (mis . f i l t e r ) . i s s u b s e t ( set ( sound . f e a t u r e s ) ) :7 #and i f i t i s a p p l i c a b l e to t h i s sound8 i f random . random ( ) <= mis . p :9 #and i f i t a p p l i e s on t h i s occas ion10 sound . f e a t u r e s [ mis . t a r g e t ]+=mis . s a l i e n c e11 #change the f e a t u r e va l u e s o f the sound472.2.10.8 A note on misperception de\u001CnitionsThe way that misperceptions are de\u001Cned can a\u001Bect the outcome of a simulation. Misper-ceptions target phonological features. 
What this means is that when a misperception hashad its full e\u001Bect, and a sound has switched categories, then the misperception will stopapplying. For example, suppose that a simulation has a word-\u001Cnal devoicing misperceptionthat targets sounds marked [+voice, \u00E2\u0088\u0092son, \u00E2\u0088\u0092cont], and suppose further that /b/ appearsword-\u001Cnally in the initial generation. Eventually a word like /ab/ will become /ap/. Atthis point, the devoicing misperception no longer applies to the \u001Cnal consonant, because ithas become [\u00E2\u0088\u0092voice] and the misperception targets only [+voice] sounds.This is a design choice for PyILM, and it is not a claim about the way that soundchange operates. It is certainly not the case that the phonetic e\u001Bects underlying soundchange suddenly stop occurring just because of the way that some people have organizedtheir mental grammar. The idea in PyILM is that after a sound has changed categories(e.g. from [+voice] to [\u00E2\u0088\u0092voice]), then it is irrelevant if any further phonetic e\u001Bects occur.If a speaking agent has recategorized /b/ as /p/ then it does not matter if \u001Cnal devoicingapplies to /p/ any more, since it is already voiceless.If, on the other hand, the \u001Cnal devoicing misperception had been designed to target only[\u00E2\u0088\u0092son, \u00E2\u0088\u0092cont] sounds, without reference to [voice], then even after the switch from /b/ to/p/ the misperception will continue to apply. This leads to a \u0010polarization\u0011 e\u001Bect, wherethe phonetic values for a sound in\u001Duenced by misperception will continue to get pushedto the extreme ends of a feature dimension. For example, tokens of a sound a\u001Bected bythe \u001Cnal devoicing misperception will eventually all have a [voice] value of 0. (This mayalso cause a further recategorization if the allow_unmark option is enabled; see section2.2.2.17.) A misperception that raises a feature value will likewise eventually push all tokenof a category to have a phonetic value of 1.For this dissertation, all misperceptions were designed in such a way as to avoid thepolarization e\u001Bect.2.3 AlgorithmsThis section details three kinds of algorithms used by agents in the simulation: learning,production, and invention.Information about an agent's phonological system is represented using an exemplarmodel (Johnson 2007, Pierrehumbert 2001). These are models of memory where learn-ers keep \u0010copies\u0011 of every experienced speech event. These copies are known as exem-plars. Exemplars are stored in a multidimensional space, and can in principle be storedat any level of detail. In PyILM, this space is a FeatureSpace object (see section 2.2.6),and the exemplars are stored at the level of phonetic features as Token objects (see sec-tion 2.2.8). The number of dimension in this space is equal to the number of features listedin Simulation.features_file.48Both the learning and production algorithms are in\u001Duenced by the exemplar model.The learning algorithm works by comparing input values to the exemplars in memory. Theproduction algorithm generates phonetic values from a distribution that is created basedon the exemplar space.This section on Algorithms is organized from the perspective of a newly created agentin the simulation. 
The first thing an agent does is learn, followed by an update of their lexicon and inventory, and finally the agent reaches the production phase.

2.3.1 Learning algorithm
There are two phases to the learning algorithm: parsing and updating. In the parsing phase, the learner assigns a category to each incoming sound. The results of categorization are used in the second phase to update the lexicon and inventory. After these steps have been run for each input word, the simulation runs a phonological feature clustering algorithm.

2.3.1.1 Parsing a Word
The goal of this phase of learning is to assign each Sound of the incoming word to a Segment category. This is done by comparing the phonetic similarity of the input with all the previous inputs that are stored in memory. If the input is sufficiently similar to any segment in the learner's inventory, then it is assigned to that category. Otherwise, a new category is created. The overall learning process for a word is described in Algorithm 2.3.
Learning starts with the input of a Word object (see section 2.2.3) consisting of Sounds (see section 2.2.7). Sounds have a features attribute, although this actually consists of Token objects (see section 2.2.8), not Feature objects. Tokens have phonetic values, which are real numbers in [0,1]. For each phonetic dimension, the new token is first stored into the exemplar space. Then it is compared to every other token in the space, and an activation value is returned for each such comparison.
The activation function called on line 6 of Algorithm 2.3 is based on Pierrehumbert (2001). It is described in Algorithm 2.4. Each of the existing categories is assigned an "activation" value, with higher activation values representing greater phonetic similarity. Activation of an exemplar is measured as e raised to the power of the negative difference between the input token and the exemplar. Activation of a segment category is the sum of the activation of its exemplars.
Agents have a threshold for similarity, which is controlled by a simulation parameter called minimum_activation_level (see section 2.2.2.14). It is a number in [0,1] that represents the degree to which a segment category must be activated in order for agents to consider an input token to be a member of that category. A value of 0.8, for example, means that in order for an input token to be considered a member of an existing segment category, the averaged activation of all exemplars for that category must be at least 80% of the maximum possible activation.

Algorithm 2.3 Learning algorithm
1  def learning(input_word):
2      best_matches = list()
3      for sound in input_word:
4          activation_matrix = dict()
5          for token in sound.features:
6              activations = calculate_activations(agent.inventory, token)
7              for seg, value in activations.items():
8                  activation_matrix[seg].append(value)
9                  # activation_matrix[seg][j] equals
10                 # how much seg is activated
11                 # on the jth feature
12         for seg in activation_matrix:
13             total = sum(activation_matrix[seg])
14             activation_matrix[seg].append(total)
15         activation_matrix.sort(key=lambda x: x[-1])
16         best_matches.append(activation_matrix[-1])
17         # best_matches[j] equals
18         # the category with the highest activation
19         # for the jth position in the word
20     category = None
21     new_word = Word()
22     for seg, activation in best_matches:
23         if activation >= 0:
24             category = agent.inventory[seg]
25         else:
26             category = create_new_category(seg)
27         new_word.string.append(category)
28         agent.update_feature_space(category)
29
30     agent.update_inventory(new_word)
31     agent.update_lexicon(new_word)

Algorithm 2.4 Activation function
1  def calculate_activation(input_token):
2
3      actual_activation = sum(math.e**(-1*(input_token - exemplar))
                               for exemplar in feature_dimension)
4
5      min_activation = sum(math.e**-(1 - Simulation.minimum_activation_level)
                            for exemplar in feature_dimension)
6
7      distance = scipy.integrate.quad(lambda x: math.e**-x,
                                       min_activation, actual_activation)
8
9      return distance

The activation function uses this minimum_activation_level parameter to calculate the specific minimum activation level for the given feature dimension and segment category. Then it calculates the difference between the actual activation and the minimum by treating these values as points on the curve y = e^(-x) and calculating their distance. If this distance is greater than or equal to 0, the input token meets the similarity threshold for this segment category (at least on this feature dimension) and might be considered as a potential match. If the distance is a negative number, then the actual activation level is lower than the minimum and the input token is not similar enough to this segment category on this dimension.
These distances are returned to the main algorithm, and they are summed and added to the activation matrix. Then the distances on each phonetic dimension are summed, and if any of these total distances is greater than or equal to 0, then the input token is assigned to the category with the highest value. Otherwise a new segment category is created.
After learning, another algorithm searches for any "spurious" categories that might have been created. A spurious category is one where the interval of exemplar values representing the category is a sub-interval of some other category, along every dimension.
Spurious categories crop up early in the learning phase when the exemplar space is still sparsely populated, and they do not occur in every learning phase. To illustrate this, consider the following hypothetical simulation where the speaker's inventory has two fricatives /s/ and /z/. For simplicity, assume there are only three features: [nasal, continuant, voice]. The speaker produces an example of /s/, which has values [0.01, 0.9, 0.1], i.e. the sound has low nasality, high continuancy, and low voicing. This is the first sound the learner has heard, and it is assigned to the category labeled /s/, which matches the category in the speaker's inventory (although the learner does not know this, of course).
Then the speaker produces an example of /z/, with values [0.02, 0.8, 0.6]. This is nearly identical to /s/ on the nasality and continuancy dimensions, but differs quite a lot on the voicing dimension. Assume that in this case this difference is sufficient for the learner to decide that this sound is not an example of /s/.
Since /s/ is the only categorythe leaner knows yet, a new category has to be created for this new sound, and it islabeled /z/ (whether the learner actually does make a new category depends on the valueof minimal_activation_level, see section 2.2.2.14).Next, the speaker produces another example of /s/, this time with values [0.01, 0.85,0.35]. This sound is similar to both /s/ and /z/ on the nasality dimension, and relativelyclose to both on the continuancy dimension. On the voicing dimension, the new sound isquite distant from both /s/ and /z/. Assume the learning algorithm considers this soundto be close to neither /s/ nor /z/, and assigns to its own category, labeled /Z/ (again, in asimulation this would depend on minimal_activation_level). This category will becomethe spurious one. By the end of the learning phase, the range of values that the learnerassociates with /Z/ are going to be a indistinguishable from those associated with /s/, sinceboth of these sets of values were drawn from the same underlying distribution, namely theone the speaker associates with /s/.As learning progresses, the learning agent hears more and more examples of the frica-tives, and the exemplar space begins to \u001Cll up. For the purposes of this example, supposethat by the end of learning, there are exemplars of /s/ with nasality values ranging from 0to 0.03, continuancy values ranging from 0.8 to 1.0, and voicing values ranging from 0.05 to0.4. If the exemplar(s) associated with the (spurious) category /Z/ were to be fed back intothe learning algorithm at the end of the learning phase, they would surely be categorizedas /s/To check for spurious categories, an algorithm does a pairwise comparison of everysegment in the inventory. For each pair of sounds A and B, it checks if the minimumexemplar value of sound A is greater than or equal to the minimum value sound B, andalso if the maximum value of sound A is less than or equal to the maximum value of soundB, across every feature dimension. If both conditions are true, on every dimension, thensound A is considered to be spurious. In this case, all exemplars labeled A are relabeled B,and A is removed from the inventory.2.3.1.2 Creating new segment categoriesWhen an input Sound has phonetic values that are too di\u001Berent from any known category,a new Segment object is created. Its Token values are analyzed, and phonological featuresare assigned, as described in section 2.3.3. A symbol is then chosen for the segment basedon these phonological feature values.The symbol is chosen in a fairly simplistic way. The program consults the possiblesegments in the list provided to Simulation.feature_file (see section 2.2.2.9), and assignsa score to each of them by comparing distinctive feature values. A symbol scores 1 point52per feature match. The highest ranked symbols are put in a set and the rest discarded.This remaining set is further \u001Cltered to remove any symbols that are already in use inthe inventory. A symbol is randomly chosen from the remaining set members. A randomselection of this sort is safe, since the symbol has no e\u001Bect on the outcome of the simulation,and exists purely to increase the readability of the output.2.3.2 Updates2.3.2.1 The lexiconOnce the input word has now been transformed into a list of Segment objects, the learnercan add it into the lexicon. 
If a word with this meaning has never been encountered before,the agent creates a new entry for this meaning in her lexicon, and adds the input word witha frequency of 1.If this meaning has been encountered before, the agent checks to see if this particularpronunciation is known, i.e. checks to see if there is a match between the input Word'sstring attribute and the string attribute of any Word already in the lexicon. If so, thefrequency of that Word is increased by 1, if not the input Word is added to the list ofpossible pronunciations with a frequency of 1.2.3.2.2 The inventoryIn the \u001Cnal phase of learning, the inventory is updated. This may involve one of two things.If the input word contained phonetic values such that a new segment was created, then theinventory needs to have that segment added. Even if the input word matched entirely toknown segments, the speci\u001Cc values associated with each of those segments must now beupdated.2.3.3 Determining phonological feature valuesPhonological categories are determined using a k-means algorithm that clusters exemplarvalues along each feature dimension.The algorithm begins by creating k points in the space representing the objects that arebeing clustered. These initial points are called \u0010centroids\u0011 and they represent a potentialcenter of a cluster. Then every point in the data is added to the cluster with the closestcentroid. After all the data has been classi\u001Ced this way, new centroids are chosen bycalculating an average point for each of the existing clusters. The data is then reclassi\u001Cedby clustering it based on the new centroids. This process of averaging and reclassifying isrepeated until the point where the algorithm chooses the same centroids two loops in a row.In the case of the simulation, the clustering function takes two arguments: feature,which is the name of the feature dimension to cluster, and k, which is the number of clusters.By default k=2, since phonological features are typically modeled as binary.53Algorithm 2.5 K-means algorithm1 def kmeans ( f ea ture , k=2) :2 o ld_centro ids = agent . l ea rned_cent ro ids3 new_centroids = [ random . uniform ( . 1 , . 2 5 ) , random . uniform( . 7 5 , . 9 ) ]4 while not o ld_centro ids == new_centroids :5 c l u s t e r s = dict ( )6 tokens = agent . f eature_space [ f e a t u r e ]7 for token in tokens :8 c l o s e s t = 9999 for c in new_centroids :10 i f abs ( token . value\u00E2\u0088\u0092c ) < abs ( token . value\u00E2\u0088\u0092c l o s e s t ) :11 c l o s e s t = c12 c l u s t e r s [ c l o s e s t ] . append ( token . va lue )13 o ld_centro ids = new_centroids14 new_centroids = l i s t ( )15 for k in c l u s t e r s :16 new_centroids . append (sum( c l u s t e r s [ k ] ) / len (c l u s t e r s [ k ] ) )17 agent . l ea rned_cent ro ids = o ld_centro ids18 return c l u s t e r s54On Line 4 the algorithm chooses two centroids that are relatively far apart from eachother, which is typical for the initial choice of centroids. Then the main loop is enteredon Line 5. The loop exits when the choice of centroids doesn't change across loops. Thenfrom Lines 6-12, the tokens for a given feature dimension are selected, and for each token,it is compared to the most recently selected set of centroids, called new_centroids. On the\u001Crst loop these are random choices, on further loops they are calculated averages.The clusters dictionary assignment on Line 13 creates a mapping from centroid valuesto a list of Tokens assigned to that centroids cluster. 
Then the new_centroids values aresaved into old_centroids and the new_centroids is emptied out (Lines 13-15). Finally,on Lines 16 and 17, new_centroids is \u001Clled with new centroid values calculated as theaverage of the Tokens in the clusters dictionary.The loop then returns to the beginning. If the most recent run of Line 17 calculatedthe same average values as the last time Line 17 was run, then the loop breaks and theprogram jumps to Line 18 where agent saves the values from new_centroids and thefunction returns the new cluster centroids.There are two possible phonological feature values: + and \u00E2\u0088\u0092. After the k-means clus-tering is done, the cluster with the higher centroid is designated the [+feature] cluster, andthe one with the lower centroid is designated the [\u00E2\u0088\u0092feature] cluster. If all tokens fall into asingle cluster, then a feature value is chosen based on the values of the tokens. If most ofthe tokens have a value above .5, then [+feature] is assigned to the entire cluster, otherwise[\u00E2\u0088\u0092feature] is assigned. If the allow_unmarked option is set to True (see section 2.2.2.17),and there is a category where every token value is 0, then [nfeature] is assigned.2.3.4 Production algorithmThe production algorithm selects a Word from an agent's lexicon to produce, and transformseach of the Segments in the Word into a Sound. While Segments in an Agent's lexicon aremade up of phonological features, Sounds are made up of phonetic Tokens which havereal-valued features. There are three steps in production described here.2.3.4.1 InitializationThis step occurs at the end of the main simulation loop, just after a learner has been\u0010promoted\u0011 to speaker (see 2.1). The new speaker looks through their inventory, and foreach segment it estimates a distribution of phonetic values along each dimension. Agentsassume the distribution is Gaussian. Pseudo code is given below. This code runs once foreach segment in an agent's inventory. During testing, it was found that the distributionswere better estimated using distance from the median, rather than from the mean. TheGaussian distribution is implemented using the normalvariate function of Python's builtin random module.55Algorithm 2.6 Distribution estimation1 def es t imate ( segment ) :2 for f e a t u r e in segment . f e a t u r e s :3 c loud = [ token . va lue for token in agent . f eature_space [f e a tu r e ] i f token . l a b e l == segment . symbol ]4 median = agent . calculate_median ( c loud )5 mad = agent . calculate_median ( [ abs ( value\u00E2\u0088\u0092median ) for valuein c loud ] )6 agent . d i s t r i b u t i o n s [ segment ] [ f e a t u r e ] = random .normalvar iate (median , mad)2.3.4.2 Step 1: Word selectionProduction begins with a decision: select a word from the lexicon or invent a new word. Theprobability with which a new word is invented is given by the simulation's invention_rateattribute (see section 2.2.2.11). If a new word is required, the speaker uses the inventionalgorithm described in section 2.3.5 to create one. Otherwise, one is chosen from the lexicon.Rather than choosing a word directly, agents actually \u001Crst select a meaning, then choosewhich word to produce for that meaning. 
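As described in the next paragraph, the choice among the stored pronunciations is weighted by how often each one was heard during learning. A rough sketch of such a weighted choice follows; the data layout and names are illustrative only, not PyILM's internal API:

import random

def choose_pronunciation(lexicon, meaning):
    # lexicon maps a meaning to a list of (word, count) pairs, where count
    # is how many times that pronunciation was heard during learning,
    # e.g. 17 -> [('pad', 6), ('pat', 4)].
    entries = lexicon[meaning]
    total = sum(count for word, count in entries)
    pick = random.uniform(0, total)
    running = 0
    for word, count in entries:
        running += count
        if pick <= running:
            return word

lexicon = {17: [('pad', 6), ('pat', 4)]}
print(choose_pronunciation(lexicon, 17))   # 'pad' about 60% of the time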
2.3.4.3 Step 2: Transforming Segments into Sounds

The word selected in the first step consists of Segments (see section 2.2.4), but these are not the objects transmitted to the learner. In the second step of the production algorithm, Segments are transformed into different objects known as Sounds (see section 2.2.7), which represent an instance of a segment being pronounced. Agents pass through each feature of each segment in the word. For each feature, agents sample a value from the appropriate probability distribution for that segment. Pseudo-code for this algorithm is shown in Algorithm 2.7.

Algorithm 2.7 Production algorithm

1  def produce(lexical_item):
2      utterance = Word(list(), lexical_item.meaning)
3      for segment in lexical_item:
4          sound = Sound()
5          for feature in segment.features:
6              median, mad = agent.distributions[segment][feature]
7              phonetic_value = random.normalvariate(median, mad)
8              sound.features[feature] = phonetic_value
9          utterance.append(sound)
10     return utterance

The utterance returned at the end of the algorithm represents what the speaker intends to produce for the listener. It is not necessarily what the listener hears. This utterance is subsequently sent through a misperception algorithm (see section 2.2.10) which may change the utterance in some way.

2.3.5 Invention algorithm

The invention algorithm serves two purposes. It is used to generate a lexicon for the initial generation of the simulation, and it is used by agents at later generations if they choose to create a new word. The words constructed by this algorithm always conform to the existing syllable shapes of the language. The following pseudo-code outlines the algorithm.

Algorithm 2.8 Invention algorithm

1  def invent(agent, phonotactics):
2      word = Word()
3      syl_length = random.randint(Simulation.min_word_length,
4                                  Simulation.max_word_length)
5      for j in range(syl_length):
6          syl_type = random.choice(Simulation.phonotactics)
7          for x in syl_type:
8              if x == 'V':
9                  seg = random.choice(agent.inventory.vowels)
10             elif x == 'C':
11                 seg = random.choice(agent.inventory.cons)
12             word.string.append(seg)
13     return word

Line 2 creates a new "empty" word object (see section 2.2.3). Lines 3-4 randomly determine the length of the word in syllables (see sections 2.2.2.6 and 2.2.2.7).

The loop that begins on Line 5 runs once for each syllable in the word. Line 6 selects one of the syllable types permitted by the phonotactics. For example, if (C)V(C) was supplied to the algorithm, then it selects randomly from the set {CVC, CV, VC, V}.

The loop that begins on Line 7 runs through each slot in the chosen syllable type. For each C or V "slot", PyILM randomly selects a segment of the appropriate type. Once the entire word has been constructed, it is assigned a new meaning and then stored in the lexicon.

The invention algorithm does not check to see if there is an existing word with the same segmental material as the new one. In other words, it is possible for homophones to appear in the language. However, this has no effect on production or learning of these words, so it is basically irrelevant to the outcome of the simulation.
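Before turning to the practical details of running PyILM, the misperception step that every produced utterance passes through (section 2.2.10) can be sketched roughly as follows. This is a simplified, hypothetical illustration rather than PyILM's actual implementation: it hard-codes a single final-devoicing-style change and skips the checks on the target's other features and on its context.

import random

def apply_misperception(sound, salience=-0.5, probability=0.25):
    # With some probability, shift the [voice] value by the misperception's salience,
    # then clamp the result so that it stays within the [0, 1] range.
    if random.random() < probability:
        shifted = sound['voice'] + salience
        sound['voice'] = min(1.0, max(0.0, shifted))
    return sound

# A hypothetical word-final /b/ with a high [voice] value before transmission:
print(apply_misperception({'voice': 0.9, 'cont': 0.1}))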
2.4 Using PyILM

2.4.1 Obtaining PyILM

The source code for PyILM is available for download from https://www.github.com/jsmackie/PyILM. It is recommended to run PyILM using Python 3.4. Some third-party libraries are also needed: NumPy and SciPy are necessary to run the basic PyILM code, and the Visualizer requires Matplotlib and PIL (the Python Imaging Library). All of these can be obtained from the Python Package Index at https://pypi.python.org.

2.4.2 Configuration files

PyILM simulations require a configuration file. These files should be saved into a folder called "config", which must be a subfolder of the main PyILM directory. A configuration file is a text file which must conform to a particular structure, described below, and its file extension must be .ini. Configuration files are broken up into sections, each indicated by a header in square brackets. Each line in a section may contain a parameter name, followed by an equals sign, followed by a value. (This is the standard INI file format used on Windows.) An example is given in Figure 2.4, with some discussion following.

[simulation]
initial_lexicon_size=30
generations=30
phonotactics=CCVCC
invention_rate=0.05
minimum_repetitions=2
min_word_length=1
max_word_length=3

[misperceptions]
#"misperceptions"
stop lenition=-voc,-cont,-son;cont;.5;+voc_+voc;.25
nasalization=-cont,-son,-nasal,-voc;nasal;.5;_+nasal,+son,-voc;.25
initial fortition=-voc,+cont,-nasal;cont;-.5;#_;.25
stop aspiration=-voc,-son,-voice,-cont;hisubglpr;.5;_+voc,+high;.25
obstruent glottalization=-voc,-son,-cont,-voice;mvglotcl;.5;_-voc,+glotcl,-mvglotcl;.25
#"biases"
ejectives are marked=-voc,-son,-cont,+glot_cl,+mvglotcl;mvglotcl;-.1;*;.5
retroflex is marked=-ant,-distr,-cont,+cor,-son,-voc;ant;.1;*;.5

[inventory]
start=p,t,k,b,d,g,m,n,f,s,z,a,i,u
#start=10,3

[lexicon]
words=kapa,mufu,tiki,matk,bziafm

Figure 2.4: Example configuration file

There are four section headers recognized by PyILM: [simulation], [misperceptions], [inventory], and [lexicon]. The order of the parameters within a section is not important. The [simulation] section is mandatory. The parameter names which can be used are listed in Section 2.2.2. Any parameter that is not mentioned in the configuration file will be given a default value; these defaults are likewise described in Section 2.2.2.

The [misperceptions] section is also mandatory. Each line in this section can include the name of a misperception as a parameter (any name is allowed, and spaces are permitted), and the remainder of the misperception's details follow the equals sign. See section 2.2.10 for more information on how to structure a misperception.

The [inventory] section is optional, and only allows the single parameter name start, which takes the same possible values as the [simulation] parameter initial_inventory (see section 2.2.2.4). The [inventory] section exists to make it conceptually easier to manage the inventory separately from the other simulation parameters, and because future versions of PyILM are anticipated to have more possible parameters in this section.

The [lexicon] section is also optional, and can take the single parameter words, which is equivalent to using the initial_words parameter in the [simulation] section (see section 2.2.2.16).

The [simulation] and [misperceptions] sections should come first in a configuration file. The [inventory] and [lexicon] sections, if present, should come at the end.

If a line in the file begins with either the symbol "#" or ";", then PyILM will ignore the entire line. This allows users to include comments to themselves about parameters. It also provides a convenient way of flipping parameter values between simulations without keeping multiple copies of a configuration file with minor differences. The use of the # symbol is demonstrated in Figure 2.4, where the words "misperceptions" and "biases" are included as comments, and there is an alternative possible starting inventory. If these symbols are encountered in the middle of a line, they are treated normally, which is what allows misperceptions to make use of both symbols without any problems.
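Because the configuration files follow the standard INI format, they can be inspected with Python's built-in configparser module. The short sketch below is purely illustrative (the file name is hypothetical, and this is not necessarily how PyILM itself parses its files); it reads two values from the [simulation] section and splits each misperception definition into its five fields.

import configparser

config = configparser.ConfigParser()
config.read('config/example.ini')

generations = config.getint('simulation', 'generations', fallback=30)
invention_rate = config.getfloat('simulation', 'invention_rate', fallback=0.0)

# Each misperception value has the form: target;feature;salience;environment;probability
for name, definition in config['misperceptions'].items():
    target, feature, salience, environment, probability = definition.split(';')
    print(name, target, feature, salience, environment, probability)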
2.4.3 Running a simulation

There are two ways to run simulations. From a command line, navigate to the PyILM directory and then type

python pyilm.py filename

where filename is the name of your configuration file. If no filename is provided, then all the defaults are used.

To run a simulation from within another Python script, use the following code, replacing the string 'config.txt' with the name of the appropriate configuration file.

import pyilm
sim = pyilm.Simulation('config.txt')
sim.main()

There is also a secondary program that can be downloaded for running multiple simulations, called pyilm_batch.py. To run a batch of N simulations, type the following in a command line

python pyilm_batch.py filename N

where filename is again the name of a configuration file. Supply the string None for the filename to use all defaults. For example, to run a batch of 25 simulations from within a Python script, use the following code:

import pyilm_batch
batch = pyilm_batch.Batch('config.txt', 25)
batch.run()

When running in batch mode, the user-supplied value for the random seed is ignored, and a different random seed is generated for each simulation, while keeping all other configuration details the same.

2.4.4 Viewing results

After running the first simulation, PyILM will create a new folder called "Simulation Results", which will be placed in the same folder as pyilm.py. Each simulation is given its own subfolder inside of the Simulation Results folder. These subfolders are named "Simulation output (X)", where X is a number automatically assigned by PyILM.

This output folder contains a copy of the configuration file, as well as files detailing the state of the simulation at the end of each generation. Information about the exemplar space is written to files with the name feature_distributionsX.txt, where X is the generation number. Information about the inventory and lexicon is written to files with the name temp_outputX.txt, with X again standing in for a generation number. It should be noted that PyILM starts counting at 0, not 1.
Generation 0 is the initial generation seeded withinformation from the con\u001Cguration \u001Cle. Generation 1 is the \u001Crst generation to learn fromthe output of another agent. A side-e\u001Bect of this is that the \u001Crst simulation you run willbe in the folder \u0010Simulation output (0)\u0011.The output \u001Cles can be opened and inspected, but they are not formatted to be human-readable. They are intended for use with the PyILM Visualizer, which is an independentprogram that displays the information in a graphical interface. As such, it is not recom-mended that you change any of the names of the \u001Cles, or alter any of the contents, becausethis can cause unusual behaviour in the Visualizer.The Visualizer can be opened by double-clicking the \u001Cle visualizer.py which comeswith PyILM. It will be located in the same folder as the main PyILM program. Whenthe program launches, select the Data menu, then input the simulation and generationnumber that you wish to see. Blank lines are interpreted as the number `0'. From there,it is possible to navigate between simulations and generations using the \u0010Forward\u0011 and\u0010Backward\u0011 buttons on the top right, or by returning to the Data menu.Each generation shows the segment inventory as a table of buttons. Clicking on a buttonbrings up more details about that segment, including its distribution in the lexicon, phoneticand phonological properties. More information about the simulation can be viewed underthe Synchrony and Diachrony menus. Synchrony options include anything speci\u001Cc to thegeneration currently displayed, such as the lexicon. Diachrony options include the ability toplot changes over time. Misperceptions, which do not change, are listed under Synchrony.61Figure 2.5: Screen shot of PyILM Visualizer2.5 Other notes2.5.1 LimitationsPyILM cannot do everything. The program is designed largely to explore the long-termconsequences of misperception-based sound change for segment inventories. There are aseveral other ways in which sound systems can change over time that are not modeled.2.5.1.1 No social contactOne of the limitations of PyILM is that there is only ever a single speaker and a singlelistener, so sound changes that rely on contact between speakers of di\u001Berent languages isnot possible.Human cultures speaking di\u001Berent language often live nearby and interact with eachother. This often leads to languages borrowing words or morphemes from the other lan-guage. Occasionally, entire paradigms are borrowed. This can lead to changes in a soundsystem if the borrowed items contain phonemes that are not part of the borrowing language.For example, click consonants have entered into some Bantu language through borrowing(G\u00C3\u00BCldemann and Stoneking 2008). There is no guarantee of this occurring, of course, so62it is also quite common for languages to change the sounds of loanwords so that they \u001Ctnative patterns (Peperkamp 2004).The focus of the dissertation is on how phonetic e\u001Bects in\u001Duence the evolution of soundinventories, so no borrowing is simulated. It would be possible, however, to implementa simple form of borrowing in PyILM with a few additions. At arbitrary points in asimulation, generate new words that contain one or more sounds not \u001Cguring already inthe simulation, and add them to the speaking agent's lexicon. 
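A minimal sketch of this idea is given below. The attribute names (word.string, agent.lexicon) follow the objects described earlier in this chapter, but the function itself and its helpers are hypothetical and not part of PyILM.

import random

def maybe_borrow(agent, donor_segments, probability=0.01):
    # Occasionally build a word with the regular invention algorithm, then swap
    # one of its segments for a segment drawn from an outside "donor" inventory.
    if random.random() < probability:
        word = invent(agent, Simulation.phonotactics)
        position = random.randrange(len(word.string))
        word.string[position] = random.choice(donor_segments)
        agent.lexicon.add(word)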
Loanword adaptation canbe simulated by running these words through a speaking agent's learning algorithm to seewhich categories any novel sounds might be assigned to.2.5.1.2 No deletion or epenthesisChanges are also limited to those that a\u001Bect feature values. Deletion and epenthesis donot occur. The main reason for excluding these changes is because they can change thephonotactics of a language, and phonotactics will play a relevant role in the simulationsreported later in this dissertation.In fact, deletion is technically possible in PyILM, but simply not implemented for anysimulations that I report for the dissertation. Epenthesis is considerably harder to imple-ment, and is currently not possible.Suppose that we want to implement an epenthesis rule that inserts a vowel between twonon-continuants. The e\u001Bect of the epenthesis rule should be that a phonetic vowel appears;there is no underlying vowel in the lexical item that corresponds to the epenthesized vowel.Suppose it is a mid-central schwa-like vowel. Because it is a phonetic epenthesis, we cannotsimply use a schwa symbol - it must be represented by a column of numbers. How do wegenerate these numbers?There are three options for this. One is to generate numbers for a mid-central vowelbased on the speaker's exemplar space. This is easy if the speaker happens to have such avowel already in their inventory. If there is no such vowel, then it is di\u001Ecult to come upwith a general solution for which other vowel would be the \u0010closest\u0011, since any arbitraryvowel system is possible in PyILM. In any case, whatever vowel is chosen will not be amid-central vowel, so it will not correspond to the description of the epenthesis rule. Thismakes the behaviour of the simulation unpredictable from a user's perspective, and is nota good design choice.The second option of generating numbers using the listener's exemplar space has thesame problems. It is further complicated by the problem that their exemplar space continuesto change throughout the learning phase so the type of epenthesized vowel would, again,vary unpredictably.The third option is to include in a PyILM a generic vowel \u0010generator\u0011 that can be usedto epenthesize a vowel of a predictable quality in every case. This option feels extremelyarti\u001Ccial compared to the \u001Crst two, where at least there was some semblance of changesbeing related to either articulation or perception. On the other hand, it does make it easier63to follow the changes that occur over the course of the simulation.2.5.1.3 No morphology or syntaxWords always convey a single meaning, and agents never utter more than one word at atime, so there is e\u001Bectively no morphology or syntax in the simulation. Since some soundchange might emerge from interactions at word or morpheme boundaries, this limitationdoes prevent modeling certain kinds of change. However, the changes that occur at amorpheme boundary are essentially of the same type as change that might occur withina morpheme. The root cause of the change is still a phonetic interaction of two adjacentsounds.2.5.1.4 No long distance changesMisperceptions that occur in PyILM can only target adjacent sounds. It is not possible tosimulate the emergence of any types of harmony patterns, for example. Although consonantharmony is rare, it does exist, and plausible historical routes for its development have beenproposed (e.g. Hansson 2007). 
However, the types of consonants that emerge from long-distance changes are a subset of those that might emerge from local changes. Since the goalof this dissertation is to understand how inventories change over time, there is no particulargain to be made by including long-distance changes.2.5.2 Running timeThe running time of a simulation is determined by a number of di\u001Berent factors.The most important are the lexicon_size and min_repetition parameters. Together,they determine the total number of words that a speaking agent will produce in a givengeneration. If there is no maximum lexicon size, and invention_rate > 0, then runningtime can increase for each generation if new words are added to the lexicon.Another factor is word length, since PyILM has to check every segment in each wordfor possible misperceptions, and the learner has to analyze each segment. The parameterscontrolling word length are, of course, min_word_length and max_word_length. Phono-tactics also plays a role here too, since the average word length in a CV language is goingto be shorter than the average word in a CCVCC language, other things being equal.The number of misperceptions seems to have no signi\u001Ccant e\u001Bect on total running time.Checking if a misperception applies is trivial and, in most cases, nothing happens. Thenumber of contexts where misperceptions apply is much smaller than the total number ofcontexts in the entire lexicon. When a misperception does apply, the operation is, again,trivial since changing phonetic values consists of adding two numbers together, followed bya check to ensure no phonetic value goes below 0 or above 1.A single generation of a CV language with 30 initial words and a maximum lexiconsize of 30 takes less than a second. Setting the phonotactics to CCVCC increases the time64signi\u001Ccantly, and a single generation may take 10 seconds. The recording phase, wherePyILM generates an output \u001Cle for use with the visualizer, also contributes to runningtime. The length of time it takes for a generation to be recorded depends on the numberof changes that occurred in the simulation.Another factor is the time taken in labeling segments for human readability. Duringthe simulation, segments are simply numbered, rather than being assigned IPA symbols.This is because there is no way to know which symbol will be appropriate until the end ofthe learning phase, when the phonological feature values are assigned. Searching the list ofall possible symbols, and comparing feature values to see which would be best, can be timeconsuming. PyILM looks for a short-cut by comparing against the previous generation, andwhere sounds have not changed feature values it simply re-uses the old symbol.65Chapter 3Sample simulations3.1 IntroductionIn this chapter I will give some examples of simulation output, using relatively simpleparameter settings. This will help to clarify how the various parameters contribute theoutcome of a simulation, and how various historical changes can be simulated.Con\u001Cguration \u001Cles will be presented throughout this chapter. They are presented astables, rather than being formatted as actual con\u001Cguration \u001Cles, for purposes of readability.Similarly, parameter names in these tables have been somewhat changed to employ regulartypeface and formatting conventions, e.g. max_lexicon_size is written here as Max lexiconsize. 
Not all simulation parameters are indicated, due to the large number of parameters.Each simulation is presented to illustrate a point, and only parameters relevant to the topicunder discussion are indicated. The features \u001Cle used is the default one, which is based onthe Sound Pattern of English (Chomsky and Halle 1968) feature speci\u001Ccations available inP-base (Mielke 2008).3.2 Simulation 1 - A single abrupt changeThis \u001Crst simulation is very simple. The con\u001Cguration \u001Cle for the simulation is show inTable 3.1.The initial inventory is obviously not a natural inventory, but by keeping it arti\u001Cciallysmall, it is easier to understand what is happening. There is only a single misperceptionthat can occur, \u001Cnal devoicing, and there is intentionally a single voiced stop /b/, so thatonly one sound is susceptible to change in the simulation.The misperception is shown in Table 3.1. The \u0010target\u0011 column shows the features thata sound must have in order for the misperception to apply. The \u0010feature\u0011 column showsthe feature which changes if the misperception applies. The \u0010salience\u0011 column shows thedirection and magnitude of a change. The \u0010environment\u0011 column shows the context where66Table 3.1: Con\u001Cguration for Simulation 1a sound must occur in order for the misperception to apply. Finally, the \u0010probability\u0011column shows, of course, the probability that the misperception applies. The salience ofthe change in this case, .5, makes it very likely that a learner will assign tokens a\u001Bected bythe misperception to a di\u001Berent category than those not a\u001Bected. In other words, it makesit likely that sound change will happen.Phonotactics and word length are tightly regulated so that all words will have VC orV shape. This is an extremely unnatural pattern not found in human languages, so this isfor the purposes of illustration (although see Breen and Pensal\u001Cni (1999) for an argumentthat Arrernte is a language without onsets). The phonotactic settings ensure that /b/ willoccur only in \u001Cnal position, which will in turn guarantee that the misperception occurs atsome point. The lexicon of the initial generation will be limited and repetitive, consistingof 30 random draws from the set {iq, is, ib, i}. Figure 3.2 shows how the inventory of thelanguage changes over the course of the simulation. Segments shown in parentheses areallophones (the precise meaning of \u0010allophone\u0011 in the context of PyILM will be describedbelow).67Table 3.2: Comparison of inventories in Simulation 1Note that all simulations start with a generation 0, which is the initial generation thatis seeded with the information from the con\u001Cguration \u001Cle. The inventory of generation 1 isthe \u001Crst that could potentially have undergone sound change. As Figure 3.2 shows, rightaway in generation 1 some of the tokens of /b/ have been misperceived as belonging to adi\u001Berent category /p/ and a voiceless stop has entered the language. Initially, this /p/ isjust a variant of /b/. Certain words in the lexicon are always pronounced as [ib], whileothers vary between [ib] and [ip]. Eventually, after enough generations of the simulation, afew words come to be pronounced as [ip] all of the time. At this point /p/ is no longer justa variant of /b/, and is now a full member of the inventory of the language.PyILM keeps track of how sounds are changing in this respect. 
In the Visualizer the\u0010total\u0011 inventory is a count of the number of sounds that occur anywhere in the lexicon.The \u0010core\u0011 inventory is the set of sounds that all occur in at least one word where they donot vary with anything else. In generation 1 in Figure 3.2, the total count for the inventoryis given as 5 (four total consonants plus one vowel), while the core count is given as only4 (three core consonants plus one vowel), since at this point in the simulation /p/ occursonly as a variant of /b/. In generation 3, the core count rises to 5 as there are now somewords with a /p/ that does not vary with /b/. Sounds in the core inventory are also shownin the Visualizer with raised button backgrounds, while the variants are shown with sunkenbutton backgrounds.This is analogous to the distinction between \u0010phonemes\u0011 and \u0010allophones\u0011 in phonolog-ical theory: the core inventory is all the phonemes, and the total inventory includes boththe phonemes and all of their allophones. More speci\u001Ccally, a phoneme in PyILM is anysound that occurs in at least one word where it does not vary with any other sound. Anallophone is a sound that occurs uniquely as a variant of another sound. I will continueto use the terms phoneme and allophone throughout this chapter as convenient labels for68these types of simulated sound categories, but with the understanding this is not the usualsense of these terms.In particular, it is normal to de\u001Cne phonemes in terms of contrast: sounds that contrastwith each other (i.e. participate in minimal pairs) are assigned to di\u001Berent phoneme cate-gories, while sounds that do not contrast (either due to complementary distribution or freevariation) are analyzed as allophones of a single phoneme. Minimal pairs or overlappingdistribution are not necessary for phonemic status in PyILM.The initial lexicon of a simulation is generated to include minimal pairs, but it is notalways possible to ensure that every phoneme has a minimal pair with every other. Thisis because there also is a parameter controlling for the size of the initial lexicon, and thenumber of minimal pairs required for all sounds to have a pair can exceed the lexiconmaximum. In this very small example, there are in fact many minimal pairs in the initiallexicon because it only consists of the words {ib,in,iq,i}. Additionally, when /p/ enters thecore inventory, it immediately participates in a minimal pair with all the other consonants,making it more obviously a new phoneme in the language. With larger inventories, largerlexicons are required to get the full number of possible minimal pairs.Another common criterion for determining allophones is complementary distribution.This is usually balanced with a requirement that the allophone be phonetically similar tothe underlying phoneme category, since accidental complementary distribution can occur,e.g. in English [h] is only ever in initial position and [N] is in non-initial position, yet theseare not considered allophones of the same underlying category. Neither of these criteriaare considered for determining allophones in PyILM. This is largely due to the di\u001Eculty ofimplementing algorithms in PyILM that can accomplish this.It is not impossible - there do exist algorithms for estimating the probability that twosounds are allophones. For example Peperkamp et al. 
(2006) use the Kullback-Leiblermeasure of the di\u001Berence between probability distributions, and Hall (2009) uses entropy.In principle, such algorithms could be applied to the languages simulated by PyILM, butthere are some complications that make this di\u001Ecult. Speci\u001Ccally, these algorithms, or anyother similar ones, require strong assumptions about what counts as an \u0010environment\u0011 forthe purposes of complementary distribution.Environments can be de\u001Cned at any arbitrary level - which should be considered? Forexample, suppose sound A occurs in the environments {t_i, a_a, s_o}, and sound B occursin the environments {z_u, d_u, u_u}. Are these sounds in complementary distribution?If we consider just the segmental level, then the answer could be yes: Sound B only occursbefore /u/, and Sound A occurs elsewhere.If we think about features instead of whole sounds, then the situation becomes morecomplex. There are thousands of possible feature combinations to consider, depending onthe feature system in use. On one analysis, both sound A and sound B have the samedistribution: they can occur between vowels and they both follow coronal obstruents. Ona di\u001Berent analysis, Sound A occurs between low vowels and after voiceless obstruents,while Sound B occurs between high vowels, or perhaps round vowels, and it follows voiced69Figure 3.1: Change in inventory size for Simulation 1obstruents.In dealing with a natural language, a linguist can make use of general knowledge aboutsound patterns, information from elsewhere in the language or related dialects, and intu-itions about what constitutes a \u0010natural\u0011 pattern, in trying to determine allophonic varia-tion. For instance, if Sound B is labial(ized) and A is not, then an analysis of \u0010B occursbefore /u/, A occurs elsewhere\u0011 would be natural, since /u/ is also a round vowel. On theother hand, if A is voiceless and B is voiced, it might make more sense to refer to the factthat they only occur next to obstruents that match in voicing.In the simulated languages of PyILM an algorithm searching for complementary dis-tribution would have an enormous search space of all possible feature combinations toconsider, as well as the problem of determining whether it is the left or right hand side (orboth) of an environment that is most relevant. Therefore, as a way of avoiding some ofthese complications, I will make use of a much weaker de\u001Cnition of phoneme and allophone,where phonemic status is achieved by a sound when it exists in at least one word where itdoes not vary with another sound. Allophones are sounds that only exist as variants.Maintaining this conceptual distinction between phonemes and allophones is very usefulfor interpreting simulation results, in particular when it comes to questions of inventory size.Counts of inventory sizes of natural languages tend to be counts of phonemic inventories,so it is useful to do this in PyILM as well. Figure 3.1 shows change inventory size for thissimulation. The dotted line shows the core (phoneme) inventory, and the solid line showthe total inventory (phonemes and allophones).The \u001Cgure shows that the size of the total inventory rises immediately, since [p] appearsthrough misperception in the \u001Crst generation. However, it is not yet a member of the coreinventory, since it appears only as a variant of /b/. In generation 3, the size of the phonemeinventory rises as [p] has fully overtaken [b] somewhere in the lexicon. 
It now occurs in at70least one word where it does not vary with [b], though there are still many words where itremains a variant.Immediately in the next generation, the phoneme inventory size drops again. Lookingback at Table 3.2, this is because there has been a complete reversal in the language, and[b] has now become an allophone of /p/, that is, [b] only exists in words where it varies with[p]. This persists until Generation 5, and then [b] disappears completely. The consonantinventory for the remainder of the simulation is /p,q,n/.The reason that /b/ disappears entirely, rather than continuing to co-exist with /p/, isthat the phonotactics are restricted to VC syllables for the purposes of this simple illustra-tion. If the language allowed onsets, then any onset [b] would remain a /b/ forever, sinceno misperceptions target that environment. The length of time that a language spendsin the \u0010doublet\u0011 stage of having alternative pronunciations depends on the frequency withwhich misperceptions occur, and the frequency of the words containing segments subject tomisperception, which are parameters that will be analyzed in more detail throughout thischapter.This transition from /b/ to /p/ in PyILM, or any other change like it, is a simpli\u001Ccationof the real-world phenomenon of phonemicization, where phonetic e\u001Bects eventually resultin the appearance of a new phoneme in the inventory. Berm\u00C3\u00BAdez-Otero (2007) describesfour phases to this process. In the \u001Crst phase, a new sound is introduced through \u0010somephysical or physiological phenomenon\u0011 (Berm\u00C3\u00BAdez-Otero 2007, p. 7), and the languagegains a phonetic variant of an existing sound. In the second phase, this variation becomesmore categorical and what was once mostly a phonetic e\u001Bect becomes a conditioned phono-logical alternation. The third phase is called re-analysis, where the domain of applicationfor a phonological rule starts to change. It may become conditioned to a morphologicalenvironment, and lexical exceptions may appear. In the \u001Cnal phase, the original phoneticconditions become opaque, and the sounds become lexicalized, or the phonological rulebecomes a morphological one. PyILM does not simulate all four phases, but there are clearparallels: a sound emerges in one context through misperception, varies with another soundfor a period of time, then \u001Cnally lexicalizes (since there is no morphology in PyILM).3.3 Simulation 2 - A single gradual changeIn the example above, sound change occurred when a learner misperceived certain tokensof a devoiced /b/ to be di\u001Berent enough from the \u0010normally\u0011 voiced /b/, that a new cat-egory was assigned to these tokens. This new category existed alongside /b/ for a shortperiod, then eventually dominated the lexicon, replacing /b/ in every instance. This isrepresentative of scenarios in natural language where a sound \u001Crst enters a language asan allophone, then becomes a phoneme. It is not necessary that an allophone completelyreplace a phoneme in a simulation, but the phonotactics of Simulation 1 were so restrictivethat there was no other possible outcome. With more complex phonotactics, both /b/ and71Table 3.3: Comparison of inventories in Simulation 2/p/ would have been in the language at the end of the simulation.The appearance of /p/ or the disappearance of /b/ was abrupt, occurring suddenlyin some lexical items at some generations. 
It is also possible for a sound to change moregradually, by lowering the salience of a misperception, but increasing its frequency. This willhave the e\u001Bect of slowly pushing category boundaries in a particular direction, rather thangenerating a new category at any point. Eventually, the value of this phonetic dimensionfor a particular sound category will be considerably di\u001Berent from when the simulationstarted, and the phonetic properties of a sound category will have shifted far enough thatthe feature values will have \u001Dipped.For this simulation, the same con\u001Cguration \u001Cle was used as in the previous section, butwith two small changes. The simulation ran for 20 generations, instead of 10. There was achange to the misperception so that the devoicing misperception is twice as likely to occur,but its e\u001Bect is only half as strong.The same end-state inventory is achieved in both Simulation 1 and Simulation 2: thelanguage has /p/ but not /b/. The main di\u001Berence is that inventory size never changesin Simulation 2. The sound that is originally a /b/ has a voicing value that drops slowlyover time. In generation 2, it has fallen enough to be classed as a [\u00E2\u0088\u0092voice] sound, butit straddles a perceptual boundary so in generation 3, it bounces back up slightly to the[+voice] side. In generation 4 it drops down to [\u00E2\u0088\u0092voice], where it stays for the remainderof the simulation.3.4 Misperceptions and phonetic similarityHaving a high misperception salience means that learners are more likely to assign tokensa\u001Bected by misperception to a di\u001Berent segmental category than those not a\u001Bected bymisperception. If this is combined with a high probability of misperceptions occurring,then the inventory will undergo more abrupt, categorical changes, as in Simulation 1. Lowersalience values combined with higher probabilities of misperceptions leads to gradual soundchange, as in Simulation 2.72Misperception salience interacts with another parameter, called minimum_activation_level (see section 2.2.2.14). This parameter is used during the learning phase,and it acts like a threshold for phonetic similarity in sound categorization. It controls howsimilar a token must be to a given category in order for the learner to consider includingthat token in that category. If a learner hears a sound that fails to meet this threshold, thenthe sound will be assigned to a new category. This parameter must have a value between 0and 1. Setting it all the way to 1 means that input sounds must match existing categoriesexactly. This tends to lead to a proliferation of segment categories, since it is quite rarefor exemplar tokens to be exact matches. Setting it to 0 means that nothing is ever toodissimilar, and all input tokens after the \u001Crst will count as exemplars of whatever the \u001Crstwas categorized as.These extreme values lead to unusual results, with segment inventories that look nothinglike those of natural languages. More \u0010normal\u0011 looking inventories emerge with values inthe range of .5 to .7. Some results with di\u001Berent values are shown in Figure 3.2. Each ofthese simulations was run with the same con\u001Cguration \u001Cle.Figure 3.2: Results for various values of minimum_activiation_level.Figure 3.3 illustrates the interaction between misperception salience andminimum_activation_level. 
The \u001Cgure shows the results of using a di\u001Berent valueof minimum_activation_level, with each plot displaying change in inventory size for \u001Cvedi\u001Berent simulation runs, all using the same initial conditions, varying only the salience ofmisperceptions.73Figure 3.3: Varying misperception salience across three di\u001Berent values forminimum_activation_level. Misperception salience is shown in the legend. Simulation(a) uses a value of 0.2, Simulation (b) uses a value of 0.5 and Simulation (c) uses a valueof 1.0When the minimum_activation_level parameter is very low, as in Simulation (a), thesalience of misperceptions hardly matters. The learning algorithm collapses all the segmentsinto a single category. Even highly salient misperceptions cannot create segment categorieswith enough perceptual di\u001Berence from the one existing category.In Simulation (b), the minimum activation level is .5, so there is greater potentialfor misperception to create new categories. Growth in inventory size can be used as anindicator for when this occurs. Lower salience values produced smaller inventories whilegreater salience led to the creation of new categories quite quickly. Finally in Simulation(c), the high salience of misperceptions only speeds up growth in inventory size.3.5 Simulation 3 - Interactions between sound changesIn the previous simulations, there was only a single sound change that could occur. Thisexample gives a slightly more complex simulation in which sound changes can interactwith each other. Simulation 3 uses the same con\u001Cguration \u001Cle as Simulation 1: the initialinventory is /b, q, n/ and the phonotactics are set to VC. The only di\u001Berence is that thereare now two misperceptions:Devoicing [+voice, \u00E2\u0088\u0092son, \u00E2\u0088\u0092cont] segments have their [voice] value reduced by .5 inthe environment of _# (p=.25)Lenition [\u00E2\u0088\u0092voice, \u00E2\u0088\u0092son, \u00E2\u0088\u0092cont] segments have their [cont] value increased by .5 inthe environment of _# (p=.25)The \u001Crst is the same \u001Cnal devoicing change used in Simulation 1. The second is a lenitionprocess where voiceless stops become fricatives, also in \u001Cnal position. This means it ispossible for /q/ to be a\u001Bected by Lenition from the initial generation. On the other hand,/b/ is not a\u001Bected, since it is voiced.74Once \u001Cnal devoicing has had an e\u001Bect on /b/, however, the resultant [p] sound will beavailable for misperception as a labial fricative, perhaps /f/, creating a feeding relationshipbetween the changes. Figure 3.4 shows some of the inventories that appeared over thecourse of Simulation 3.Table 3.4: Comparisons of several generations in Simulation 3As the output shows, in generation 1 some changes have already happened. The /b/ hasdevoiced to [p] on some occasions, adding [p] as a new allophone of /b/. The /q/, which isalready voiceless, has also been a\u001Bected by the lenition misperception, and has also gainedan allophone, and adds the \u001Crst fricative to the inventory.In generation 2, some instances of [p] have lenited to [F], which is now actually countedas a second possible variant of /b/. PyILM will not consider it to be an allophone of /p/,since /p/ has no independent existence yet in the simulation. Another way of thinkingabout this is that the language has only a single labial sound at this point, with threepossible pronunciations. 
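In the configuration-file notation of section 2.4.2 (target;feature;salience;environment;probability), these two misperceptions would be written roughly as follows. The exact feature labels depend on the features file in use, so the lines below are an approximation rather than the literal configuration used for Simulation 3.

[misperceptions]
final devoicing=+voice,-son,-cont;voice;-.5;_#;.25
final lenition=-voice,-son,-cont;cont;.5;_#;.25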
By generation 6 /p/ has become the phoneme for the labial set:it has completely replaced /b/ in a certain number of words, and now [b] only ever appearsas a variant of /p/. At generation 10, the voiced labial has completely disappeared, andboth /p/ and /F/ are considered phonemes. The uvular stop has also disappeared at thispoint, replaced in every instance by a fricative. By generation 15, the language is back toan inventory of the original size, with two fricatives replacing the two original stops. Thenasal, meanwhile, has remained completely una\u001Bected the entire time.75Figure 3.4 depicts how the total and core inventories are changing. Between generation0 and 10, there is a large degree of allophonic variation. This settles down at generation10 as [b] and [q] disappear as possible variants. The total inventory then drops once morewhen /p/ is eventually replaced by /F/.Figure 3.4: Change in inventory size for Simulation 3Setting up a feeding or bleeding relationship such as this can be quite di\u001Ecult withoutcareful manipulation of parameters. In this case, there is sure to be feeding that will happenbecause the initial /b/ is not subject to the lenition, only /p/ is. And /p/ is, by design,the outcome of the other misperception. If this simulation had been run with a randomselection of segments, or with more complex phonotactics, there would be no guarantee thiswould occur. One misperception might never be triggered because the particular segmenttype or context is lacking in the language.This simulation also illustrates why it is di\u001Ecult to use PyILM to simulate the evolutionof any speci\u001Cc natural language. For instance, suppose a simulation was seeded with alexicon of Old English, and the set of misperceptions was con\u001Cgured to include all knownsound changes from Old English to Modern English. There is no guarantee that all of themisperceptions would occur in the same order in a PyILM simulation as they did in the realworld. So long as the conditions for a misperception are met in the lexicon, it is possiblefor a sound change to occur. It is not possible to force a particular ordering, at least notwithout introducing an unnecessary amount of teleology.76Table 3.5: Con\u001Cguration for Simulation 43.6 Simulation 4 - CVC languageThis simulation uses the con\u001Cguration details shown in Table 3.5. For this simulation, thephonotactics are slightly more complex, allowing CVC, CV, V, and VC syllables. Wordsare still capped at a single syllable, however. The initial inventory is speci\u001Ced in a \u001Cle as/b, d, t, k, f, z, m, n, a, i, e/. The misperception \u001Cle is the same as Simulation 2, that is,there is a chance of \u001Cnal devoicing and of \u001Cnal lenition of voiceless stops.We know what is in the initial inventory, so we can make a few guesses about how thesimulation will turn out, given enough time. Misperceptions only target \u001Cnal position, so ifthe lexicon contains at least one word with each segment in initial position, then all of theinitial phonemes are guaranteed to survive until the end of the simulation. As it stands,the vowels and the nasals are sure to survive anyway, so long as they appear in at least oneword, since no misperceptions target them at all.Assume that in fact all the segments appear in initial position, so the inventory will notshrink over time. How can the inventory grow? Consider \u001Crst the labial set. The segment/b/ will permit the creation of /p/ through devoicing. 
This /p/ is transient, though, andinevitably all examples of it will lenite, just as in the last simulation. No /p/ can \u0010survive\u0011because all instances of this category are found in \u001Cnal position, which is exactly wherelenition applies. This extremely restricted distribution is due to the fact that there wasno /p/ in the initial inventory; /p/ appears through devoicing which occurs only in \u001Cnalposition. If /p/ had been part of the initial inventory, then it might have occurred in initialposition, which would have protected it from the lenition change.Once /p/ has lenited to a fricative, there are no more changes that can take place. The\u001Cnal labial inventory will include /b, p*, f/ where /p*/ is either a labial stop or a labialfricative, depending on how long the simulation has been running. This fricative may ormay not merge with the original /f/. A merger will take place only if /p*/ and /f/ areidentical on every dimension except [continuant]. This also implies that /b/ and /f/ areidentical on every dimension except [voice] and [continuant], since /p/ descends from /b/.If, for any reason, /b/ and /f/ di\u001Ber on some other feature, then when /p/ lenites, theresulting category will be considered a di\u001Berent category from /f/.77For example, in the feature system of Hayes (2011), [labiodental] is feature. Since theoriginal /b/ is [\u00E2\u0088\u0092labiodental] in this system1, then /p*/ will be [\u00E2\u0088\u0092labiodental] becausethe Lenition misperception only a\u001Bects the feature [continuant]. In this case, /p*/ willdi\u001Ber from /f/ by both [continuant] and [labiodental], and will therefore be categorized assomething other than /f/.The coronals will have a somewhat di\u001Berent evolution. There is /z/ but not /s/, so itis expected that /s/ will eventually appear through misperception. In fact some kind ofvoiceless coronal fricative is almost certain to appear because the original /d/ should devoiceto /t/, which is subject to the lenition misperception. It is possible that the voiceless coronalfricative that ultimately descends from /z/ will merge with the one that descends from /t/,but it will depend on speci\u001Ccally how the exemplar token values are distributed in a givensimulation, and this process is partly random. The coronal inventory could eventually growto /d, t, z, s1, s2/, where the two /s/ segments represent potentially di\u001Berent descendantsof original /t/ and original /z/.The single dorsal /k/, if it only appears in \u001Cnal position, is doomed. It is certain toundergo lenition, and there are no original voiced dorsals that could create a \u0010replacement\u0011/k/ through devoicing. The \u001Cnal dorsal inventory is therefore going to be /x, k/, if /k/appears in initial position, otherwise it will simply be /x/.Again, these predictions depend on some assumptions about the initial lexicon of thelanguage, and whether or not the relevant segments all appear in the relevant environments.Running this same simulation multiple times with di\u001Berent random seeds will producedi\u001Berent outcomes. The plot in Figure 3.5 shows the change in (total) inventory size over\u001Cve simulations using the same initial conditions, but with di\u001Berent random seeds. In thesimulations with larger \u001Cnal inventories, sounds appear in a greater variety of environments,increasing the probability that they survive the entire simulation, since they are more likelyto appear in an environments not targeted by a misperception. 
In the simulations withsmaller \u001Cnal inventories, there was less diversity in sound distributions, and some soundsdisappeared because they occurred only in environments targeted by misperception.Note that not all simulations started with exactly the same number of sounds: somestarted with 10 and some with 11. All languages were given the same con\u001Cguration \u001Cle withthe same inventory, but on some random seeds, not all of these sounds were actually sam-pled during the construction of the initial lexicon. There is a simulation parameter calledauto_increase_lexicon_size (see section 2.2.2.15) which would force every simulation touse all 11 sounds, but it was set to False for these cases.Table 3.6 gives some snapshots of a language actually generated by one of these randomseeds. The labials turned out as predicted. The phoneme /b/ \u001Crst develops an allophone[p], which then becomes a phoneme, and which then lenites and becomes a fricative. Inthis case, it did not merge with the existing /f/, and there are two labial fricatives in the1In Hayes' system, [labiodental] is actually a unary feature, so /b/ would simply not have this featureat all. However, since features in PyILM cannot be unary, /b/ would be considered [-labiodental].78Figure 3.5: Change in total inventory size with \u001Cve di\u001Berent random seeds\u001Cnal inventory. Inspecting the simulation, it appears that /f/ is [\u00E2\u0088\u0092distributed] while /F/ is[+distributed], which is a feature it inherits from the original /p/.The coronals evolved more or less as expected as well. The phoneme /d/ devoiced to[t], which gave rise to [T], which eventually achieved phonemic status. The phoneme /t/rises and falls throughout the simulation, as some tokens of [d] devoice, then lenite. Someinstances of /d/ still remain in \u001Cnal position, so they allow for new devoicing which leads tonew lenitions (that all merged with the \u001Crst /T/). If the simulation were run long enough,eventually /t/ would completely overtake /d/ in all \u001Cnal positions, and then eventuallylenite to /T/.The dorsal stop was lost, and replaced by a fricative, which was predicted. However,this actually did not happen entirely due to misperception. This segment had a curiousevolution. The original /k/ appeared in three words: /ki/, /ik/ and /ak/. It early onacquired an allophone [x] in \u001Cnal position. This [x] became an increasingly common variantuntil it was the dominant pronunciation in two out of three words: /ik/ > /ix/ and /ak/> /ax/. Then the learner in generation 10 decided to group /x/ and /k/ into a singlecategory. Even the /k/ in initial position, not a\u001Bected by misperception, merged with /x/.Why did this occur?Inspecting the simulation more closely revealed that the initial /k/ category had beenseeded with exemplars that happened to have extremely low values on the [continuant]dimension, so that most tokens produced had a value of 0.1 or less. The \u001Cnal lenitionmisperception boosted production values by +.5, which created tokens that only barelypassed the threshold for a learning agent to categorize something as [+continuant], so thenew /x/ category had values that straddled a perceptual boundary.At generation 10 it appears as though the learner failed to notice any signi\u001Ccant di\u001Ber-79ence between any /k/ or /x/ tokens produced by the previous generation, and categorizedthem all as [\u00E2\u0088\u0092continuant], that is, /k/ became the new phoneme. 
This created a categorywith a large degree of variation in [continuant] values. Misperception continues to act ontransmission to the next generation, which pushed average [continuant] token values higher.The learner at generation 11 also only learned a single velar category, but this time [+con-tinuant], that is, /x/ became the phoneme. Since there is no misperception that makesword-initial tokens any less continuant, there is no way for /k/ to return to the inventory,and this collapse of categories is essentially permanent.Table 3.6: Comparison of several generation in Simulation 43.7 Simulation 5 - Invention and the spread of new segmentsIn the previous examples, the new segments that are created by misperceptions are incompetition with existing segments, and only one of them can win. Inevitably, it will bethe one that is \u0010preferred\u0011 by the misperception. These newly created segments, however,are more like replacements for the older segments, rather than truly new additions to thelanguage. They never leave their original environments, because the invention rate has beenset to 0.0 for the previous simulations. In this next example, the invention rate is raised todemonstrate how this a\u001Bects the evolution of an inventory. The con\u001Cguration details areshown in Table 3.7.Agents inventing new words will draw from the total inventory of sounds, not just fromthe phoneme inventory. This makes it possible for allophones to become phonemes, becausethey can appear in an invented word in an environment where they are not in variation80Table 3.7: Con\u001Cguration for Simulation 5with another sound. This is analogous to a process that is known to happen in naturallanguage where words are borrowed containing an allophone in a novel environment, whichcan lead to that allophone taking on phonemic status. For example, in Old English [f] and[v] were allophones of a single phoneme, with [v] occurring intervocalically and [f] occurringelsewhere. Over time, English borrowed French words that contained a [v] in positionsother than between vowels (McMahon 2002). This created overlapping distributions of [f]and [v], which resulting in [v] eventually taking on phonemic status.The misperceptions are the same as the previous simulations: a 25% chance of word-\u001Cnal devoicing and word-\u001Cnal lenition. The combination of misperceptions and inventionscreates di\u001Berent outcomes than the previous simulations without invention. For instance,consider just the coronals. In the initial inventory there is a voiced coronal stop /d/, but novoiceless counterpart. The voiced one appears in both word-initial and word-\u001Cnal positionin the initial lexicon. After several generations of the simulation, all of the /d/ in \u001Cnalposition have devoiced, and there is now a voiceless coronal stop /t/ in the inventory. Thisnewly created stop is now subject to \u001Cnal lenition, and eventually all instances of it becomevoiceless fricatives, returning the language to a state of only having the one (voiced) coronalstop.81Table 3.8: Comparison of several generations in Simulation 5This voiced stop will then have a restricted distribution - it will only be found in word-initial position, because no misperceptions operate there. 
In previous simulations, no morechange would be possible at this point, since misperceptions can have no more e\u001Bects.However, in this simulation the invention rate is greater than 0, so there is the possibilitythat an agent can create new words and put the voiced stop back into \u001Cnal position. Thismakes it now a target of \u001Cnal devoicing, and a voiceless stop will eventually re-join theinventory. It is also possible that during the period of time where /t/ exists as a phoneme,an agent will invent a new word that contains a /t/ in initial position, shielding at leastsome instances of /t/ from lenition, making it a more permanent member of the inventory.3.8 SummaryIn this chapter, I demonstrated how inventories evolve in PyILM, and how various sim-ulation parameters can a\u001Bect this evolution. The notion of phonemes and allophones inPyILM were introduced, as they di\u001Ber somewhat from the common use of these terms inphonological theory. A sound is considered to be a phoneme in PyILM if it occurs in atleast one word in the lexicon where it does not vary with another sound. A sound is con-sidered to be an allophone if it only ever occurs as a variant of other sounds. There were\u001Cve simulations presented in this chapter.Simulation 1 showed how inventories can change through the abrupt introduction of anew sound, and Simulation 2 showed how categories can shift slowly over time. The di\u001Berentoutcomes depended on the values of di\u001Berent simulation parameters. When misperceptionshave a high salience, this tends to lead to the emergence of allophones. For instance,suppose a simulation has an intervocalic lenition misperception with a high salience, and82a lexicon has /b/ between vowels. A word such as /aba/ will quickly obtain two possiblepronunciations: [aba] and [ava]. Initially, the [v] sound will be a variant of /b/, but somenumber of generations, the word will come to be pronounced uniquely as [ava] and /v/ willenter the inventory as a phoneme.When misperceptions have a lower salience, sounds in an inventory tend to graduallychange categories, without the appearance of an intermediate phoneme. For example,suppose a simulation has a low-salience intervocalic lenition misperception. A word like/aba/ will continue to be pronounced as [aba] for a few generations, but the e\u001Bect of themisperception will slowly drag the [continuant] values of the /b/ segment (in this word)higher. Eventually, some learner will acquire the word as /ava/, and it will have a unique[ava] pronunciation. In contrast to the high-salience simulation, it is less likely that asituation will arise where both [aba] and [ava] are possible pronunciations in the low-saliencesimulation.There is also an interaction between the misperception saliencetheminimum_activation_level paramater. When this parameter is set very low(close to 0) then all segments in a simulation will tend to collapse into a single category.If the parameter is set very high (close to 1) then there is an extreme proliferation ofsegment categories. These e\u001Bects are very strong, and will occur regardless of the salienceand frequency of any misperceptions.Simulation 3 increased the number of misperceptions and included some feeding re-lationships, for example a lenition process that only a\u001Bect voiceless sounds which werethemselves the product of a devoicing misperceptions.Simulation 4 demonstrated how phonotactics can in\u001Duence the outcome. This is dueto the context-sensitive nature of sound changes. 
A language with only CV syllables hasexactly two contexts for consonants: word-initial or intervocalic (assuming a word of at leasttwo syllables). This limits the number and type of misperceptions that could potentiallyapply. On the other hand, a languages with CVCC syllables has a greater variety ofenvironments in its lexicon, which means that a greater variety of sound changes couldpotentially take place. The issue of phonotactics will be discussed in much more detail inthe Chapter 5.Simulation 5 introduced the concept of inventions. Invention has two major e\u001Bects onthe outcome of a simulation. One is that invention creates new words with new environ-ments, allowing misperceptions to apply to sounds that might not apply in other words.The second possibility is that allophones can be selected by the invention algorithm andplaces into new contexts where they do not vary with any other sounds, instantly achievingthe status of phonemes.This builds up the basic foundations of simulations in PyILM. Now more complex simu-lations can be considered, with the aim of trying to model the evolution of natural languageinventories. In Chapter 4, I will review the typology of natural language inventories, beforereturning again to PyILM in Chapter 5, with the aim of simulating these typological facts.83Chapter 4Natural language consonantinventories4.1 Inventory size4.1.1 OverviewSound inventories are extremely diverse. One of the most obvious ways in which they di\u001Beris in the number of sounds they contain. Counts of inventory size depend partly on whatis being counted. It is common in linguistics to make the distinction between the \u0010surface\u0011or \u0010phonetic\u0011 inventory of a language, which consists of the sounds that are physicallyarticulated, and the \u0010underlying\u0011 or \u0010phonemic\u0011 inventory, which consists of abstract mentalcategories assumed to be acquired by a learner of a language.Collecting a complete phonetic inventory, a set of all the speech sounds in a language, isactually not feasible, since no two speech productions are exactly alike, and this collectionwould be in\u001Cnite in size. Speech sounds are instead grouped into a \u001Cnite set of categories,with categorization typically done through the use of articulatory or acoustic features. TheInternational Phonetic Alphabet, for example, is a very widely used system for categorizingspeech sounds based on articulation. Major category features for consonants in the IPAinclude place of articulation, manner of articulation, voicing, and airstream mechanism.The phoneme inventory of the language is based on an analysis of the lexicon. The dis-tribution of a sound in the lexicon determines its phonemic status. Phonemes are usuallyargued for on the basis of contrast, with minimal pairs being the best evidence. Sounds thatnever appear in the same environment, i.e. have complementary distribution, are consideredallophones of a single phoneme. There is sometimes an additional requirement that allo-phones bear some phonetic resemblance to each other. For instance, in English the sounds[h] and [N] are in complementary distribution, with [h] appearing only in syllable initialposition, and [N] appearing in non-initial position. 
Despite this, the two sounds are notanalyzed as allophones of a single phoneme because they are phonetically quite di\u001Berent.84Figure 4.1: The inventories of Palauan, from Mor\u00C3\u00A9n-Duollj\u00C3\u00A1 (2005)The inventory of Palauan is a good example of how some of these decisions can a\u001Bectwhat gets counted in an inventory. The tables combined in Figure 4.1 come from Mor\u00C3\u00A9n-Duollj\u00C3\u00A1 (2005). The top table gives the approximate phonetic inventory, which is somethinglike the set of all articulatorily distinct sounds found in Palauan speech. The bottomtable gives what Mor\u00C3\u00A9n-Duollj\u00C3\u00A1 calls the \u0010contrastive\u0011 consonants. Each box is a phonemiccategory and the symbol \u00E2\u0088\u00BC is used to indicate the multiple possible pronunciations for asound in that category.There is a di\u001Berence of 8 sounds between the two tables. There are three kinds of velarstops that are articulated in Palauan - voiceless unaspirated, voiceless aspirated, and voiced- but they are all considered variants of a single velar phoneme. There are two reasons forgrouping them together as allophones: (1) they are phonetically similar, (2) they appearin complementary distribution in the lexicon. Speci\u001Ccally, [kh] occurs in \u001Cnal position, [g]appears between vowels, and [k] appears elsewhere. Figure 4.2 provides a word list andsummary of this distribution.Since [k] is the least predictable of the allophones, it is also assumed to be the under-lying phoneme. In constructing a phonemic inventory of Palauan, the velar stop categorywould be represented using the symbol /k/. The aspirated and voiced velars would not berepresented.85Figure 4.2: Summary of the distribution of velar stops in Palauan, with data from Mor\u00C3\u00A9n-Duollj\u00C3\u00A1 (2005))This can be compared to another, rather more simple case, which is the bilabial nasal.The sound [m], according to Mor\u00C3\u00A9n-Duollj\u00C3\u00A1 (2005) is found in a variety of environments,and there are no noticeable variations in pronunciation. There is one other nasal in thelanguage, which appears nearly everywhere that [m] does, so there is no complementarydistribution that might suggest an allophonic relationship. Palauan is therefore assumedto include an underlying category /m/ which would appear in a phonemic inventory.The focus of this chapter will be phoneme inventories. One major reason for this is thatthere exist several large databases of information about phoneme inventories. Additionally,the abstract categorical nature of phoneme inventories makes them somewhat easier to col-lect and analyze, compared to the more gradient nature of phonetic data. Major databasesthat will be frequently referenced in this chapter are UPSID, P-base, and WALS, which aredescribed below.UPSID is the UCLA Phonological Segment Inventory Database. UPSID was the \u001Crstmajor database of inventories, and is extremely widely used. It was \u001Crst published asMaddieson (1984) with 317 languages. In Maddieson and Precoda (1989), it was ex-panded to 451 inventories. The database attempts to be genetically balanced, to representan even spread of the world's languages. UPSID has a a very simple web interface at:http://web.phonetik.uni-frankfurt.de/upsid.html.P-base was created as part of Je\u001B Mielke's dissertation work (Mielke 2008). It containsthe inventories of 628 varieties of 548 spoken languages. 
The languages in the database arethose that Mielke could \u001Cnd in grammars available at the Ohio State University and Michi-gan State University libraries (Library of Congress PA-PM). In addition to the inventories,P-base also includes any information about the patterning of sounds that was availablein the grammars. P-base has a graphical user interface with functions for \u001Cnding natural86classes, calculating feature economy, and comparing inventories. It can be downloaded athttp://pbase.phon.chass.ncsu.edu/WALS is the World Atlas of Language Structures (Dryer and Haspelmath 2013), andcontains information from more than 1,000 languages. WALS is not limited to phonologicalinventories, unlike the previous two resources, but rather contains information about nu-merous aspects of language, including morphological and syntactic information. It is alsonot a single database, rather it is a collection of individual chapters written by di\u001Berentauthors, and each chapter may sample a di\u001Berent set of languages. An interesting featureof WALS is the ability to display a map of the world, with individual languages taggedand colour-coded for particular features. The information available in WALS comes froma variety of sources, and each language has its sources listed. It is not possible to look ata speci\u001Cc phoneme inventory of a language in WALS. Instead, the information is packagedin a more coarse-grained way, by grouping languages into categories. For example, Feature6A in WALS (Maddieson 2013b) is titled \u0010uvular consonants\u0011 which categorizes languagesinto four categories: those with uvular stops, those with uvular continuants, those withboth, and those with neither. This makes it more useful for broad, typological studies,and somewhat less useful for the study of individual languages. It is available online atwww.wals.info.Large databases like these are created from a diverse array of sources, and constructedwith di\u001Berent goals in mind, so it is inevitable that there will be disagreements. Oneexample of this is the way that Jacaltec (Mayan, Mexico) is described in UPSID and P-base. In UPSID the stop series for this languages is listed as three voiceless aspirated stops/ph, th, kh/, two ejective stops /t',k'/, and two implosives /b<, q ph,th,khexcept after nasals where *p,*t,*k > p', t', k'. Ohala(1997) argues that ejectives can emerge from a sequence of a plosive and a glottal stop,when the closure for the glottal stop overlaps with closure for the plosive, e.g. the sequence[k] + [P] can result in [k'].These are contexts that cannot occur at all in a language with simple CV phonotactics,but can occur at syllable boundaries in CVC languages, and even within a single syllablein CCVCC languages. Therefore, ejectives are more likely to emerge in CCVCC or CVClanguages, which are also more likely to have larger inventories, by Hypothesis #1.This only addresses half of the issue, which is the question of why inventories diversifyas they grow. It does not explain why small inventories tend to look similar to each other,or what happens as inventories shrink. To address this, it will be necessary to make somemodi\u001Ccations to the way that misperception is modeled in PyILM.5.3.1 Misperception vs. biasOne problem with the current misperception model is that simulations can reach a pointwhere languages cease changing. 
After running a simulation su\u001Eciently long it will be thecase that for any given context in the lexicon, the sounds that appear in that context willeither be (a) sounds from the 0th generation of the simulation that are una\u001Bected by anymisperception, or (b) sounds that are the result of any misperception that can occur in thatcontext.This can be visualized as a state diagram, as shown in Figure 5.2. This representspossible states of the inventory in an extremely simpli\u001Ced simulation with only three startingconsonants /b, d, g/. Assume there exists a single misperception of \u001Cnal devoicing, andassume these sounds appear in word-\u001Cnal position in the 0th generation. Each circle is apossible inventory, and arrows represent directions of change. The top green circle is theinitial state, and the bottom red circle is the only possible \u001Cnal state. A change in stateoccurs when the devoicing misperception has changed a voiced stop into a voiceless one.Figure 5.2 is only a partial state diagram, since in an actual simulation run, the inventorycan enter some \u0010in-between\u0011 states where the voiceless and voiced obstruent both existtogether, e.g. /p, b, d, g/. The in-between states inevitably end with the voiceless consonantwinning out over the voiced one, so I have excluded these states from the diagram for clarity.Additionally, the state diagram only represents how the stop system in \u001Cnal position evolves.Outside of this context, the assumed misperception does not apply, so at any given state,the inventory will also contain whichever voiced stops are not in \u001Cnal position.There is a single terminal state that, once reached, cannot be exited. There are nomisperceptions that can change a /p, t, k/ inventory back into a /b, d, g/ inventory. Sucha state is called an \u0010absorbing state\u0011. If the simulation is run for long enough, the languagewill eventually reach this state. In this simple example, there is a single misperception, sothere is a single absorbing state.133Figure 5.2: State diagram for word-\u001Cnal obstruents in a simulation with \u001Cnal devoicingIn a simulation with a richer set of misperceptions, the state diagram becomes morecomplicated. More than one absorbing state might exist. Given a large enough set ofmisperceptions, it may even be technically possible to have a feeding loop. Consider thesefour changes:A > B / _ CC > D / B _B > A / _ DD > C / A _When A changes to B before C, it creates the right conditions for C to become D. Thisin turn creates the right conditions for B to go back to being A. This causes D to turn backinto C, and we return to the original state, ready to loop again. However, this requiresan extremely speci\u001Cc set of misperceptions and an extremely speci\u001Cc lexicon, and theremust be no other misperceptions that could break the loop. Such a set of changes is notlikely to arise in natural language, or at least not commonly enough to play any signi\u001Ccantrole in modeling sound change. That is, we should not use PyILM to construct loops likethis because the outcome of such simulations will tell us little about the way that naturallanguages evolve.The possiblity of absorbing states seems very unnatural. All human languages areconstantly undergoing sound change. It would be desirable that languages in a simulationdo the same. Of course, at some point language change has to \u0010stop\u0011 in a simulation because134simulations are \u001Cnite. 
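As a concrete illustration of the absorbing state in Figure 5.2, the sketch below (illustrative only, not PyILM code) treats word-final devoicing as a simple Markov process: each generation, every voiced stop still occurring word-finally devoices with probability .25, and once the set of word-final voiced stops is empty, no further change is possible.

import random

def final_devoicing_run(p=0.25, max_generations=200, seed=1):
    random.seed(seed)
    final_voiced = {"b", "d", "g"}        # voiced stops still found word-finally
    history = [sorted(final_voiced)]
    for _ in range(max_generations):
        # each remaining voiced stop independently survives with probability 1 - p
        final_voiced = {s for s in final_voiced if random.random() > p}
        history.append(sorted(final_voiced))
        if not final_voiced:              # absorbing state: only /p, t, k/ word-finally
            break
    return history

print(final_devoicing_run())
# Every run ends with the empty set, i.e. a word-final /p, t, k/ system, and no
# later generation can ever leave that state, because nothing re-voices final stops.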
A more realistic goal is to have a simulation where languages at leasthave the potential to continue changing right until the \u001Cnal generation.Absorbing states also make it impossible to simulate the right conditions for the phe-nomenon under discussion in this section. When inventories shrink over time, they mustshrink back towards a common set of sounds, otherwise the superset relations found inUPSID and P-base would not exist.Why do absorbing states occur in a simulation? It is because misperceptions are bothcontext-sensitive and asymmetrical. The probability of A > B / _C is not the same asthat of B > A / _ C. One of those probabilities is usually equal to zero, while the other isnon-zero. This pushes languages in a particular direction, without giving any way for thelanguage to return to the state that it used to be in.One way of addressing this, and allowing inventories to return to former states, wouldbe to make misperceptions symmetrical. For every A > B change, ensure there also existsa B > A change, in the same environment. Symmetrical misperceptions would be veryeasy to model in PyILM. A rounding misperception, such as [\u00E2\u0088\u0092round, \u00E2\u0088\u0092voc] \u0019 +.5round/ _[+round, +voc], could be coupled with an unrounding hypercorrection [+round, \u00E2\u0088\u0092voc]\u0019 -.5round / _[+round, +voc]. With these misperceptions at play, the inventory of alanguage can potentially continue to change \u0010forever\u0011 (i.e. until the very last generation ofa simulation), possibly bouncing back and forth between states.This may solve the problem of absorbing states, but it lacks empiral support. Thereare many kinds of sound changes that are clearly not symmetrical. For example, intervo-calic voicing of stops has been observed in numerous languages, but the reverse pattern ofintervocalic devoicing is vanishingly rare.Instead of modifying the symmetry of misperception, a better approach is to balanceout their context-sensitive nature by introducing context-free changes. As useful terms fordiscussion, context-sensitive changes will continue to be referred to as misperceptions, andcontext-free changes will be referred to as biases. The idea behind a bias is essentially thesame as a misperception: it is an articulatory or perceptual e\u001Bect that interferes with thetransmission of sounds, and creates the potential for a learner to acquire a di\u001Berent set ofsounds than the speaker intended to transmit.This idea was proposed in the previous chapter as Hypothesis #2, repeated here:Hypothesis #2Common sounds exist because of context-free biases in transmission that a\u001Bect all lan-guages, regardless of phonotactics. Rarer segments are rare because they require morespeci\u001Cc phonetic environments to appear, and these are more likely to exist in larger in-ventories, because larger inventories have more, and more di\u001Berent, phonetic contexts (byHypothesis #1).Biases, in contrast to misperceptions, are factors that always a\u001Bect the production or135Table 5.2: Con\u001Cguration for simulations comparing simple misperceptions and biasesperception of certain classes of speech sounds, regardless of where they occur. For instance,stops are biased towards being voiceless. This is because the conditions for voicing require acertain di\u001Berence between subglottal and supraglottal air pressures, which is more di\u001Ecultto maintain when air\u001Dow is stopped (Ohala 1983). 
This of course does not make voicingof stops impossible, it is just more likely that any stop, regardless of where it is produced,could be articulated as voiceless.This means that, all else being equal, voiced stops are constantly at risk of being misper-ceived as voiceless, because speakers might, at any point, fail to reach the right air pressuredi\u001Berential for voicing to occur. The fact that certain conditions seem to encourage thiseven more (e.g. word or utterance \u001Cnal position, Blevins (2006b)) compounds the likelihoodof voiceless stops being in any given inventory.Formally speaking, biases can be modeled in almost exactly the same was as mispercep-tions within PyILM. They have the same e\u001Bect of changing the phonetic values of certainsounds, but they can occur in any context, rather than in a speci\u001Cc one. In the notation ofPyILM, a context-free misperception is indicated with a * for the environment.Just as with misperceptions in PyILM, biases are abstractions, and using them in asimulation is not intended to be an argument for the existence of any particular kind ofbias in real language transmission. Any proposal for a bias would need to be argued foron its own merits. Here, I will simply be assuming the general existence of biases in orderto demonstrate a point: simulated inventories that are both small and large will sharesounds (namely, those which emerge through context-free bias), while larger inventorieswill additionally have rare or more complex sounds (namely, those which emerge throughcontext-sensitive misperception).To demonstrate, I will \u001Crst describe two simple simulations, each with a single misper-ception, and a single bias, so that their interaction is easier to follow. Table 5.2 shows thecon\u001Cguration for this example.For one simulation, the starting inventory was set to /p, t, k, i, u, a/ and for theother simulation the starting inventory was /b, d, g, i, u, a/. In other words, each of the136simulation conditions started with either a set of stops preferred by bias or the set of stopspreferred by misperceptions.It took several test runs of PyILM to decide on probabilities and salience values for thechanges. The context-free nature of biases means that there are more opportunities perutterance for a bias to in\u001Duence speech than for a misperception to do so. If the bias proba-bility is set too high, it can overpower a misperception, and lead to an absorbing state. Theidea, then, is to model biases as frequent but weak e\u001Bects, while misperceptions are strongbut less frequent. In this particular case, the bias is twice a likely as the misperception,but its e\u001Bect is small enough on a given utterance that it probably will not change whichcategory of sound is understood by the listener. The e\u001Bect is felt only over many exposuresto tokens of a category a\u001Bected by bias. Misperceptions occur half as often as biases, buttheir e\u001Bect is strong enough that a listener will probably perceive a categorically distinctsound.With the right balance, inventories produced by this kind of simulation will never stopchanging, unlike inventories in simulations with only misperceptions. The speci\u001Cc proba-bilities and salience values for biases and misperceptions will have a strong e\u001Bect on howfrequent and how abrupt the changes are (see Chapter 3, sections 3.2 and 3.3 for morediscussion on these parameters). Table 5.3 depicts the consonant inventories for a selectnumber of generations from one simulation. 
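The following sketch shows one way this balance could be implemented for a single token. The feature representation, probabilities, and salience values are illustrative assumptions rather than PyILM's actual settings, and the environments are simplified to plain strings; as in the PyILM notation described above, '*' marks a context-free change.

import random

# (target feature values, feature affected, shift, environment, probability)
# The bias devoices voiced stops everywhere ('*'); the misperception voices
# voiceless stops between vowels only. All values are invented for illustration.
BIAS = ({"voice": 1.0, "son": 0.0, "cont": 0.0}, "voice", -0.1, "*", 0.5)
MISPERCEPTION = ({"voice": 0.0, "son": 0.0, "cont": 0.0}, "voice", +0.5, "V_V", 0.25)

def matches(target, token):
    """A change can target a token if the token is close to the target values."""
    return all(abs(token.get(f, 0.0) - v) < 0.5 for f, v in target.items())

def transmit(token, environment):
    """Return the phonetic values the listener is exposed to for one token."""
    heard = dict(token)
    for target, feature, shift, env, probability in (BIAS, MISPERCEPTION):
        applies_here = (env == "*") or (env == environment)
        if applies_here and matches(target, token) and random.random() < probability:
            heard[feature] = min(1.0, max(0.0, heard[feature] + shift))
    return heard

# A voiced stop in final position may be nudged slightly toward voicelessness;
# a voiceless stop between vowels may shift far enough to be heard as voiced.
print(transmit({"voice": 1.0, "son": 0.0, "cont": 0.0}, "_#"))
print(transmit({"voice": 0.0, "son": 0.0, "cont": 0.0}, "V_V"))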
Figure 5.3 shows continual changes in inven-tory size over 100 generations of two simulations, starting with either voiced or voicelessconsonants.The constant change in inventory size comes from the interplay of the bias and themisperception. Assume an inventory that begins with only /p, t, k/. The misperceptionwill create voiced stops between vowels, which increases the size of the inventory. The biasreduces the voicing in all contexts, so some of these intervocalic voiced stops can mergeback with voiceless stops, decreasing the inventory size. The reverse occurs in a simulationthat begins with /b, d, g/. Some of these will devoice due to the bias, increasing inventorysize, while the intervocalic voicing misperception can cause the voiceless sounds to mergeback with the original voiced categories. This can be seen speci\u001Ccally for /g/ in Table 5.3.No /g/ exists in Generation 10. By generation 25, /g/ has emerged as an allophone of /k/.It becomes a full phoneme by Generation 50, then disappears again by Generation 70.5.3.2 Simulation resultsIn order to test Hypothesis #2, I ran 120 simulations, with both biases and misperceptions.Some, but not all, balance each other out. For example, I created a labialization misper-ception, then an anti-labialization bias. These ideas were roughly based on the descriptionsof Set 2 and Set 3 consonants in Lindblom and Maddieson (1988). In addition, I createda small number of misperceptions and biases that do not counteract each other. The fullset is listed in Table 5.4. As with the Hypothesis #1 test, simulations were divided evenlybetween CV, CVC, and CCVCC phonotactics (40 each), each with randomly generated137Table 5.3: Example of individual inventories in a simulation with misperception and bias,starting from only voiceless stopsFigure 5.3: Change in inventory size for two simulations, one starting with voiceless stops,one with voiced stops138Table 5.4: Misperceptions and biases for testing Hypothesis #2starting inventories, consisting of anywhere from 8-12 consonants, and 3-5 vowels, selectedat random from P-base.Each simulation ran for 50 generations, and the \u001Cnal inventory of each was collected.The expectation is that context-free biases will be responsible for a set of sounds foundin most inventories, while the context-sensitive misperceptions are what lead to more raresounds in larger inventories.The segments of the \u001Cnal inventories were therefore roughly categorized this way: soundsthat are the possible outcome of a bias were counted separately from the others. Forinstance, there is a bias against retro\u001Dex stop consonants. Retro\u001Dex is represented as[\u00E2\u0088\u0092ant, \u00E2\u0088\u0092distr, \u00E2\u0088\u0092cont, +con, \u00E2\u0088\u0092son, \u00E2\u0088\u0092voc] in PyILM. The bias a\u001Bects anything with thatfeature set, and raises the [ant] value, thus a segment marked [+ant, \u00E2\u0088\u0092distr, \u00E2\u0088\u0092cont,...] is apossible outcome of a bias, and would be \u001Dagged.The initial inventories of the simulation were generated by sampling uniformly at ran-dom from the set of all possible \u0010biased\u0011 and \u0010other\u0011 (non-biased) sounds. The total numberof biased segments is much smaller than the total number of other segments and so the ini-tial inventories tended to have a high proportion of non-biased sounds. 
If sound change had no effect on the relative proportion of segment types, then the expectation would be for the inventories of the final generation to also have a greater proportion of non-biased sounds. As Figure 5.4 shows, however, this is not what happens. Instead, in smaller inventories, the number of biased segments is sometimes equal to or greater than the number of other segments.

Figure 5.4: Biased and non-biased sounds in the final simulated inventories.

This parallels the relationship found in natural language inventories. Lindblom and Maddieson (1988) used the metaphor of a magnet and a rubber band to explain this. These simulations suggest that the metaphor can be replaced by more concrete notions: the context-free biases are the rubber bands drawing languages back toward common sounds, and the context-sensitive misperceptions are the magnets, drawing inventories outward into different regions of phonetic space.

5.4 Feature economy

This final section of the chapter turns to the topic of feature economy (Clements 2003, Hall 2007, Mackie and Mielke 2011; see also the discussion in section 4.3.2). Economy can be calculated in several ways, but it is essentially a measurement of how many phonemes exist in an inventory, relative to the number of phonological features required to keep all phonemes distinct. A discussion of various economy metrics was presented in Section 4.3.2.1. Natural language inventories have been shown to be more economical than randomly generated sets of segments (Mackie and Mielke 2011). In the previous chapter, I introduced a hypothesis about the diachronic origin of feature economy, which will be tested through simulation in this section. I repeat the hypothesis here:

Hypothesis #3
Feature economy effects are emergent from the fact that sound change affects phonetic features, rather than whole sounds. This creates the possibility that a new set of sounds will emerge in an inventory, all of the members of which differ from an older set of sounds by one feature. This in turn creates the appearance of economy in an inventory.

This hypothesis does not predict that economical inventories are favoured by cultural transmission. Greater economy does not mean greater learnability, and misperceptions and biases are a more powerful force. For instance, it is common to find voicing contrasts among obstruents, but rare among sonorants, even though it would be more economical to use the [voice] feature across the entire consonant inventory. The articulatory and acoustic-perceptual difficulties associated with voiceless sonorants outweigh any increase in economy that might result from adding them to the inventory. In other words, my claim is that economy emerges as a side-effect of sound change, and since all natural languages undergo sound change, all of them display some degree of economy.

This section is organized as follows. First, I describe the general relationship between sound change and economy, and provide a simple simulation as illustration. Next, I test the hypothesis more directly. This will be done by comparing the results of simulations run with misperceptions that affect classes of sounds against simulations run with misperceptions that target individual sounds.
If the hypothesis is correct, then economy willbe ultimately higher in cases where misperceptions are de\u001Cned over classes.5.4.1 How economy can change over timeEconomy scores are all calculated using two values: S, the number of segments in the inven-tory, and F, the minimum number of features needed to contrast the inventory. Economychanges as S and F change. Economy scores are raised if either S increases without in-creasing F or else F decreases without decreasing S. Increasing both S and F an equalamount will result in a lower economy value, while decreasing both S and F and equalamount will result in an increase in economy. This is true regardless of which of the foureconomy metrics are used. Let us consider in turn how each of these values can change.S changes whenever a segment is added or lost. Sound are added to the inventorythrough phonemic split, when misperception changes an existing sound in a particularcontext. In most cases, splits result in an increase to S. If a sound happens to have its(lexical) distribution strictly limited to contexts a\u001Bected by a misperception, then the newsound completely replaces the old one, and S does not change. It is possible for S todecrease due to merger, when all instances of one category become instances of anotherexisting category.Note that sound change can occur without any change to S at all. For instance, supposethat an inventory consists of /b, t, d/, and all instances of /b/ devoice to /p/. This is nota merger, since /p/ was not already in the inventory, nor is it a split, because there areno instances of /b/ left behind. The resulting inventory /p, t, d/ is the same size as theoriginal inventory, so S is unchanged.How about changes to F? They come along with changes to the inventory. As far asthe simulation is concerned, it would be impossible for F to change without a change in141inventory, since F is determined using the Feature Economist algorithm (see section 4.3.3).The same input inventory will always result in the same number of output features. Fincreases if a segment is added to the inventory that requires a feature which previouslywas not necessary. For instance, if an inventory has only stops contrasting in voicing andplace, and a fricative is added, then F will increase as [continuant] becomes a necessaryfeature.Losing a segment may result in a decrease of F if this segment was in minimal contrastwith another segment that now stands alone. For instance, imagine an inventory with aseries of voiceless obstruents, only one of which has a voiced counterpart, e.g. /p, t, k, s,z/. The feature [voice] is necessary to contrast /s, z/ only. If /z/ were to disappear, thenthe need for [voice] as a feature is also lost, and so F and S both decrease (and the samewould be true if /s/ were lost instead).There is an e\u001Bect that occasionally occurs in PyILM simulations, call it \u0010feature carry-over\u0011, which can result in a decrease in feature economy. Suppose there is an inventorywith six segments /p, b, tj, d, f, s/. The feature economy of this inventory is relativelyhigh, as only three features are required to contrast everything: [voice, continuant, coro-nal]. The palatalization on /tj/ obviously involves another feature too, and realistically, aspeaker of such a language would need to have this additional articulatory information rep-resented somehow. 
However, for the purposes of simply working out the smallest number of phonological features necessary to contrast these segments, only three are required. Suppose there is a sound change that increases the [continuant] value of a segment, e.g. stops become fricatives, and that through this sound change the /tj/ lenites to /sj/. Now it is necessary to introduce a feature for palatalization in order to contrast /s/ and /sj/. This has the overall effect of lowering economy, because although S increases by 1, F also increases by 1 (this assumes that some instances of /tj/ do not undergo the change; if every /tj/ becomes /sj/ then S would not increase, but F would, and economy would decrease even more). Changes that increase both S and F by the same amount result in an overall decrease in economy. This is simply due to the fact that for any inventory S > F, so an increase in S is proportionally smaller than an equal increase in F. Consider a series of sound changes that each add one segment and one feature. Using the Simple Ratio measurement of economy, we would get this series of shrinking values: 3/2 = 1.5, 4/3 = 1.333..., 5/4 = 1.25, 6/5 = 1.2, and so on.

The Simple Ratio economy of /p, b, tj, d, f, s/ is E = 6/3 = 2, whereas adding in /sj/ means E = 7/4 = 1.75. This effect of feature carry-over can occur in natural language changes; for instance, a rounded back vowel may front, carrying its roundedness with it and potentially creating a contrast with an existing unrounded front vowel.

Change in economy over time is illustrated in Figure 5.5 for a hypothetical simulation. That is, all the values are constructed for the purposes of illustrating how change in economy happens; no actual simulations were run to obtain these numbers. The top of the figure shows change in Simple Ratio, while the other three metrics (Frugality, Exploitation, and Relative Efficiency) are shown at the bottom. This is because the latter three metrics are bounded between 0 and 1, while Simple Ratio has no upper limit, so they cannot easily be shown on the same scale.

Figure 5.5: Change in economy score for a hypothetical language

Each step along the x-axis is comparable to a generation in a simulation. At a glance, one can see that the metrics do not increase or decrease in a uniform way over time. Consider just the change from the first to the second generation. In the first generation, the inventory is quite small, with only 5 segments and 3 features. At the second generation, both the number of segments and the number of features have grown. This results in Simple Ratio and Frugality both increasing, while Exploitation drops, and nothing happens to Relative Efficiency. These differences make sense when we consider what each metric is actually measuring.

Simple Ratio, as the name implies, is simply the ratio of segments to features. Since 9/4 > 5/3, Simple Ratio goes up in the second generation. In fact, Simple Ratio increases steadily until the 8th generation, because each subsequent ratio is higher. In the eighth generation, Simple Ratio falls because, although the inventory size grew, it was not enough to make up for the change in features. The inventory at the 9th generation would need to have 38 segments in order to see a gain in Simple Ratio compared to the previous generation.

Frugality is a measurement of how close an inventory comes to having the minimum number of features it could have, given its size. For S segments, this minimum number is log2 S, rounded up to the next whole number.
Frugality barely changes between the \u001Crst andsecond generation because in both cases this minimum number is achieved. For 5 segments,at minimum 3 features are needed, and for 9 segments at minimum 4 features are needed.Frugality is not a strict measure of this ratio however, since these inventories would score1.0 otherwise. Larger inventories actually receive slightly higher Frugality scores. This canbe observed by the way that Frugality changes between the second and third generation. In143both cases, the inventory requires 4 features and in both cases this is the minimum possiblefor an inventory of that size. These 4 features could potentially be used to contrast as manyas 16 segments, and so the third generation inventory with 15 segments scores higher thanthe second generation inventory with only 9 segments.Exploitation is, in a sense, the opposite of Frugality, and it measures how close aninventory comes to having the maximum number of segments for a given number of features.The maximum inventory size for F binary features is 2F, so for an inventory to remain highlyeconomical on the Exploitation metric, its size must increase by a power of 2 every timea feature is added. In the \u001Crst generation, with 3 features, it is possible to have as manyas 8 segments. The \u001Cnal generation shown in Figure 5.5 has 8 features, so the inventorywould by then need to have 256 segments to reach a perfect Exploitation score. This meansthat generally speaking, sound changes that result in an increase in the number of featureswill tend to result in a decrease in Exploitation scores. There are only two examples ofExploitation increasing in Figure 5.5: in the third generation when the number of segmentsincreases without any change to the features, and in the eight generation when the numberof segments and the number of features both decrease.Relative E\u001Eciency looks at the minimum and maximum number of features required fora given inventory size, and assigns a score based on where an inventory falls in that range.The \u001Crst four generation all score perfectly on Relative E\u001Eciency because they all havethe minimum possible features. This should be contrasted with Frugality, which assigneddi\u001Berent scores to these inventories based on their size. Relative E\u001Eciency falls in the \u001Cfththrough seventh generations because the number of features rises to 5, while the inventorysize stays in a range that could potentially require as few as 4 features. When the numberof features drops in the ninth generation, Relative E\u001Eciency once again goes up.Another issue to consider is that not only does economy change over time, the rangeof possible scores changes as well. To understand why this is so, it is helpful to plot allpossible economy scores for a range of features and segments. This is shown in Figure 5.6for Simple Ratio and Figure 5.7 for Frugality. The \u001Cgures show inventory size and thenumber of features on the x and y axes, while the z-axis shows the feature economy scorethat a language would have with that combination of segments and features. Not everypoint in space is \u001Clled, because it is not possible for certain combinations to occur. Aninventory with S segments needs at minimum log1b features (rounded up).One important di\u001Berence to note between Frugality and Simple Ratio is where theminimum scores lie for each feature value. 
For Simple Ratio, the minimum is the same.An inventory of S segments cannot possibly need more than S -1 features, so the minimumSimple Ratio lies just above 1.0 for all values of F.For Frugality, the lowest possible score actually varies with the number of features. Ifan inventory requires two features, then Frugality cannot go below 0.79. If an inventoryrequires three features, then the minimum score is now 0.6. At four features the minimumscore drops to 0.58, and so on.This means that sound changes requiring the addition of a new feature to the inventory144Figure 5.6: Range of possible Simple Ratio scoresFigure 5.7: Range of possible Frugality scores145have a greater impact on the Frugality score than Simple Ratio. For example, an inventorywith 25 segments and 5 features has a Simple Ratio score of 5.0 and a Frugality score of0.928. If a sound change occurred increasing the inventory to 26 segments, but at the costof adding a 6th feature, then both scores will drop. If the inventory can add just four moresegments, for a total of 30, then it would regain its old Simple Ratio score of 5. In the caseof Frugality, the inventory would need to balloon to 48 segments to equal its old score.5.4.2 An illustrative exampleIn this section, change in economy scores is illustrated using a simple simulation. The initialinventory was selected to be a simple set of consonants, mainly obstruents, with just threevowels: /b,d,g,q,f,z,x,m,n,i,u,a/. There were four context-sensitive misperceptions includedfor this simulation, and they only a\u001Bect the obstruents in the language:Devoicing [+voice, \u00E2\u0088\u0092son, \u00E2\u0088\u0092cont] segments have their [voice] value reduced by .5 inthe environment of _# (p=.25)Lenition [\u00E2\u0088\u0092son, \u00E2\u0088\u0092cont] segments have their [cont] value increased by .5 in the envi-ronment of +voc_+voc (p=.25)Fortition [\u00E2\u0088\u0092son, +cont] segments have their [cont] value reduced by .5 in the envi-ronment of #_ (p=.25)Assimilation [\u00E2\u0088\u0092son, \u00E2\u0088\u0092voc, \u00E2\u0088\u0092voice] segments have their [voice] value increased by .5in the environment of +voice, \u00E2\u0088\u0092voc_ (p=.25)Every sound change was assigned a .25 probability, meaning that on any utterance of asound in an appropriate context, there is .25 probability that a given misperception occurs.Each misperception alters a token's feature values by .5, which is a large enough change invalue, given the parameters of the learning algorithm, to practically ensure that the tokensa\u001Bected by sound change will be categorized as something di\u001Berent than tokens of the samesegments that go una\u001Bected by change.The phonotactics of the language are set to be maximally CVCC, and words can be oneor two syllables long. The phonotactics are such that it is possible for all four misperceptionsto occur at some point. Devoicing can occur in any C-\u001Cnal word, Assimilation can occur inany CC-\u001Cnal word. Lenition requires two vowels, which means that it requires a two-syllableword to be triggered. Fortition could happen in any word that starts with a fricative.At the end of the simulation, the feature economy of the language was calculated at eachgeneration, using the four metrics described in Figure 4.3.2.1. Feature economy, as originallyde\u001Cned by Clements (2003), measured the organization of the phonological inventory, notthe full surface inventory. Similarly in simulations with PyILM, only the core or underlyinginventory is considered. 
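Given S and F for that core inventory, the four scores can be computed directly. The formulas below are my reconstruction from the descriptions in this chapter and in Mackie and Mielke (2011); they are intended only as a reading aid for the discussion that follows, not as PyILM's actual code.

import math

def simple_ratio(S, F):
    return S / F

def frugality(S, F):
    # how close F comes to the theoretical minimum of log2(S) features
    return math.log2(S) / F

def exploitation(S, F):
    # how close S comes to the theoretical maximum of 2**F segments
    return S / 2 ** F

def relative_efficiency(S, F):
    # where F falls between the minimum (ceil(log2 S)) and maximum (S - 1)
    f_min = math.ceil(math.log2(S))
    f_max = S - 1
    return 1.0 if f_max == f_min else (f_max - F) / (f_max - f_min)

# First three generations of the hypothetical trajectory in Figure 5.5:
# S=5/F=3, then S=9/F=4, then S=15/F=4.
for S, F in [(5, 3), (9, 4), (15, 4)]:
    print(S, F,
          round(simple_ratio(S, F), 3), round(frugality(S, F), 3),
          round(exploitation(S, F), 3), round(relative_efficiency(S, F), 3))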
Allophones, that is, sounds which occur only as variants of others and never on their own, are not counted. See section 3.2 in Chapter 2 for discussion of how allophones are identified in PyILM.

Change in feature economy for this simulation is shown in Figure 5.8. Economy rises gradually over the course of the simulation, reaching a maximum of around 3.6 on the Simple Ratio measurement. Growth is not consistent: there are a few areas where economy hits a plateau, and there are also times when it goes down.

Figure 5.8: Change in feature economy for a simple simulation

Economy continues to rise, on all metrics, until just past the 20th generation. The initial increase is due to pairs of segments emerging from sound change. Generation 17 represents the peak of the Simple Ratio scores, at 3.6. From here, economy dips and rises, but never falls lower than 3.2.

The reason that economy never settles at a particular score is that some segments are in environments where they are subject to more than one misperception. For example, some voiced stops appear in final position following another voiced stop, which means that both devoicing and voicing assimilation can apply. A voiceless sound can therefore appear through one misperception, only to be wiped out by the other.

Relative Efficiency evolves differently from all of the others. In some cases it stays at 1.0, a perfect score, while the other scores drop (especially around the 20th generation). This is because, for a given number of features, there is a range of inventory sizes that will always get a Relative Efficiency score of 1.0. Specifically, for an inventory of size S requiring F features for contrast, Relative Efficiency is 1.0 whenever 2^(F-1) + 1 ≤ S ≤ 2^F. For example, an inventory of 28 segments that requires 5 features for contrast will get a perfect Relative Efficiency score because 2^4 + 1 ≤ 28 ≤ 2^5. In fact, any inventory between 17 and 32 segments will receive the same score.

Practically speaking, this means that an inventory can shrink in size without affecting the Relative Efficiency score, so long as the number of features does not change. This is just what happens in the simulation shortly after generation 20. The inventory had evolved a voicing distinction at every place of articulation for both stops and fricatives. There were 18 total phonemes (16 obstruents and 2 nasals), and the inventory required 5 features for contrast. That is the minimum possible, so Relative Efficiency was 1.0. At a later generation, one of the voicing contrasts collapsed as a voiceless sound merged with a voiced sound. This reduced the inventory size to 17, but without any change to the contrastive features. Simple Ratio therefore dropped, because 17/5 < 18/5, but Relative Efficiency was unaffected because 2^4 + 1 ≤ 17 ≤ 2^5.

5.4.3 Segment-specific misperceptions vs. class-level misperceptions

Hypothesis #3 proposes that feature economy emerges over time because sound changes affect phonetic features, and hence classes of sounds, rather than targeting individual sounds. Importantly, the claim here is not that sound change affects all members of a class simultaneously. At any given generation, any number of members of a class might be affected, or perhaps none are, but the net result, over many generations, is that a class of sounds will have undergone sound change.
In other words, given enough time, sound change can cre-ate a new class of segments out of an old one, with the two classes di\u001Bering by whateverfeature was a\u001Bected in the sound change. This results in inventories with classes of soundscontrasting along particular feature dimensions, i.e. feature economy.To test this hypothesis, it is necessary to run simulations under two di\u001Berent conditions.In one condition, changes are de\u001Cned over broad classes of sounds (the \u0010class-level\u0011 condi-tion), i.e. the familiar kinds of misperceptions and biases already used in this dissertation.In the second condition, changes target individual sounds in an inventory (the \u0010segment-speci\u001Cc\u0011 condition). This is in a sense like simulating two possible worlds, where soundchange operates in di\u001Berent ways. The class-level condition represents the actual world,while the segment-speci\u001Cc condition represents a hypothetical alternative world that wecan compare against. If Hypothesis #3 is correct, then inventories in the class-level condi-tion should generally have higher economy scores than those in segment-speci\u001Cc condition.It is important to distinguish between segment-speci\u001Cc changes fabricated for thesesimulations, and multiple instances of class-level changes a\u001Becting individual segments.Even though it is fairly clear that real sound changes a\u001Bect broad classes of sounds, itdoes not mean that every sound in a class is equally likely to be a\u001Bected. Consider thecase of a lenition-type change, which increases the continuancy value of a stop betweenvowels. Hualde et al. (2011) discuss how lenition of intervocalic stops in Romance languagesvaries with context, and between types of stops. From these \u001Cndings, we might decide toimplement several \u0010segment-speci\u001Cc\u0011 lenition misperceptions, one for every plosive in aninventory. Each misperception would have a di\u001Berent salience and probability, and perhapseven somewhat di\u001Berent environments. That way, /b/ would be a\u001Bected by lenition in itsown way, slightly di\u001Berent from how /d/ or /g/ might be a\u001Bected.However, this would not be the right approach. All of these lenition changes have148identical outcomes, even if their triggering conditions are di\u001Berent. It would not matterwhether we de\u001Cne lenition for each segment, or whether we de\u001Cne lenition over a class ofsegments, because after a simulation has been running long enough, all stops between vowelswill have become fricatives (assuming there is no counter-acting bias). In other words, theseare not really segment-speci\u001Cc changes. They are instances of the same class-level change,with minor variations.Instead, to make something truly segment-speci\u001Cc, then not only would each mispercep-tion have to be de\u001Cned separately over each individual segment, but the outcome of eachmisperception needs to be di\u001Berent as well. For example, all stops in intervocalic positioncan still be subject to change, but each stop would have a di\u001Berent feature a\u001Bected: /b/might have its [nasal] value increased, and change into /m/ while /d/ might devoice to/t/. In this way sound change is actually targeting individual segments, and not (natu-ral) classes. To be clear, such changes are extremely unnatural, and they are only beingintroduced as a way of testing Hypothesis #3.In order to generate these kinds of misperceptions, I made a modi\u001Ccation to PyILM. 
Thesimulations in the segment-speci\u001Cc condition are initialized with the same set of class-levelmisperceptions used in the other simulations. At the beginning of each generation, PyILMlooks through the inventory of the speaking agent, and checks to see if any of the class-levelmisperceptions could potentially apply. A misperception is considered to potentially applyif, for any segment, the set of features targeted by the misperception is a subset of thesegment's full feature speci\u001Ccation.If a class-level misperception would apply, then PyILM generates a unique segment-speci\u001Cc one. The new misperception will have the same environment as the class-level one,and it will have the same probability of occurring, but it targets a set of features equalto the full speci\u001Ccation of the segment in question. The outcome of the segment-speci\u001Ccmisperception has the same salience value as the class-level one, but it applies to a new,randomly-selected, feature. After checking the entire inventory, the simulation runs asnormal, but using these segment-speci\u001Cc misperceptions instead of the class-level ones.At the beginning of each generation, the old set of segment-speci\u001Cc misperceptions isdeleted, and a new set is created. This is done to ensure that class-level and segment-speci\u001Cc misperceptions have the same chances of applying in a given lexicon. Due to thehighly-restricted nature of segment-speci\u001Cc changes, it is generally going to be the case thatthey only apply once. Suppose there there is a /b/-speci\u001Cc misperception that changes /b/to /m/. Once this occur, that /b/-speci\u001Cc misperception will never do anything else in thesimulation, unless by sheer chance another segment-speci\u001Cc change has created a /b/ fromsomething else. In practice, this means that inventories will change only during the \u001Crstfew generations, and then never again. By creating new segment-speci\u001Cc misperceptionsat each generation, based on how class-level misperceptions would apply, the outcomes ofsimulations in either condition should be comparable.For example, suppose that a simulation has a starting inventory of /p, t, k, z, i, a/, andthe following misperceptions.149Devoicing [+voice, \u00E2\u0088\u0092son] \u00E2\u0086\u0092 [\u00E2\u0088\u0092.5voice] / _#, p=.25Lenition [\u00E2\u0088\u0092son, \u00E2\u0088\u0092cont] \u00E2\u0086\u0092 [+.5cont] / +voc_+voc, p=.25The Devoicing misperception could apply to /z/ and the Lenition misperception can applyto /p, t, k/ (whether they actually apply depends, of course, on the lexicon having the rightenvironments). In this case, PyILM would create one new misperception for each of /z/,/p/, /t/ and /k/. 
The segment-speci\u001Cc misperceptions could look something like this:z-Change [\u00E2\u0088\u0092voc, \u00E2\u0088\u0092son, +cont, +voice, \u00E2\u0088\u0092nasal, +cor, \u00E2\u0088\u0092ant, \u00E2\u0088\u0092strid, \u00E2\u0088\u0092lat, \u00E2\u0088\u0092back,\u00E2\u0088\u0092low, \u00E2\u0088\u0092high, \u00E2\u0088\u0092round, \u00E2\u0088\u0092distr, \u00E2\u0088\u0092glot_cl, \u00E2\u0088\u0092hi_subgl_pr] \u00E2\u0086\u0092 [\u00E2\u0088\u0092.5back] / _#, p=.25p-Change [\u00E2\u0088\u0092voc, \u00E2\u0088\u0092son, \u00E2\u0088\u0092cont, \u00E2\u0088\u0092voice, \u00E2\u0088\u0092nasal, \u00E2\u0088\u0092cor, +ant, \u00E2\u0088\u0092strid, \u00E2\u0088\u0092back, \u00E2\u0088\u0092low,\u00E2\u0088\u0092high, \u00E2\u0088\u0092round, +distr, \u00E2\u0088\u0092glot_cl, \u00E2\u0088\u0092hi_subgl_pr]\u00E2\u0086\u0092 [+.5nasal] / +voc_+voc, p=.25t-Change [\u00E2\u0088\u0092voc, \u00E2\u0088\u0092son, \u00E2\u0088\u0092cont, \u00E2\u0088\u0092voice, \u00E2\u0088\u0092nasal, +cor, \u00E2\u0088\u0092ant, \u00E2\u0088\u0092strid, \u00E2\u0088\u0092lat, \u00E2\u0088\u0092back,\u00E2\u0088\u0092low, \u00E2\u0088\u0092high, \u00E2\u0088\u0092round, \u00E2\u0088\u0092distr, \u00E2\u0088\u0092glot_cl, \u00E2\u0088\u0092hi_subgl_pr]\u00E2\u0086\u0092 [+.5round] / +voc_+voc,p=.25k-Change [\u00E2\u0088\u0092voc, \u00E2\u0088\u0092son, \u00E2\u0088\u0092cont, \u00E2\u0088\u0092voice, \u00E2\u0088\u0092nasal, \u00E2\u0088\u0092cor, \u00E2\u0088\u0092ant, \u00E2\u0088\u0092strid, \u00E2\u0088\u0092lat, +back,\u00E2\u0088\u0092low, \u00E2\u0088\u0092high, \u00E2\u0088\u0092round, \u00E2\u0088\u0092distr, \u00E2\u0088\u0092glot_cl, \u00E2\u0088\u0092hi_subgl_pr] \u00E2\u0086\u0092 [+.5cont] / +voc_+voc,p=.25These misperceptions have the same environment and probability as the original Devoicingand Lenition, but they target a set of features that are speci\u001Cc to one particular segment.They all have the same salience as Devoicing and Lenition, altering a feature value by \u00C2\u00B1.5,but which feature they a\u001Bect is di\u001Berent in each case. As mentioned above, the a\u001Bectedfeature is randomly selected.5.4.4 Calculating feature economyAfter running a simulation, the feature economy of the inventory at each generation wascalculated using the Feature Economist algorithm. The results are reported in the nextsection, and this section describes the details of how the algorithm works. In brief, ittakes a set of segments with full feature speci\u001Ccations as input, and it returns the smallestnumber of features necessary to contrast every member of that set. The algorithm has beendescribed in Mackie and Mielke (2011), and a version of it appears in P-base (Mielke 2008). For this dissertation I wrote my own implementation.Calculating feature economy requires two numbers: S, the number of segments in theinventory I, and F, the smallest number of features from a feature set \u00CF\u0086 required to contrastall of the segments in I. For the purposes of calculating feature economy, two segments arecontrastive with respect to a feature set \u00CF\u0086 if they di\u001Ber by at least one feature in \u00CF\u0086. Themost useful mathematical tool for this is the concept of a \u0010combination\u0011. A k -combinationof a set A is an unordered subset consisting of k elements of A. 
The problem of finding the smallest number of features from φ that are necessary to contrast the segments in I becomes the problem of finding the largest k-combination of features that are unnecessary for contrast. The number of k-combinations that can be drawn from a set of size N is written (N choose k), and is equal to N!/(k!(N-k)!) if N ≥ k, and 0 otherwise.

The Feature Economist algorithm begins with a pre-processing step where non-contrastive features are removed from φ. These are features for which every segment in the inventory shares a value. For instance, if there are no laterals in the inventory, then every segment will be [-lateral]. This means there are no contrasts based on [lateral], so this feature can be discarded immediately.

The algorithm then goes through a loop of creating larger and larger k-combinations of features. For each k-combination, the algorithm removes that combination from φ, creating a new subset φ′. A pairwise comparison of the segments in I is done to check whether each pair is still contrastive with respect to φ′. If not, that is, if two segments become identical without this particular k-combination of features, then the combination is added to a special list of crucial features. If contrast is still possible without this k-combination, then the set φ′ is designated the final set (replacing any previous final set), and the algorithm carries on to the next k-combination. When all k-combinations have been tried for some value of k, then k is increased by 1, and the process of removing k-combinations repeats.

If at any time the size of the final set and the value of k differ by 2, i.e. k = |final| + 2, then the algorithm terminates. It terminates at this point because this difference means that the algorithm has attempted to remove every single k-combination for some value of k and did not succeed. Whenever some k-combination can be removed and contrast is maintained, the contents of the final set are updated, and its cardinality becomes equal to |φ| - k (the full set of features, minus the combination currently removed). After trying all k-combinations, the value of k is increased by one. If no k-combination can be removed for this new value of k, then the contents of the final set will not change, and k will again increase. At this point k = |final| + 2 and the algorithm halts.

For each value of k, all possible k-combinations are generated. If a combination is found to be a superset of any element of the crucial list, then it is skipped and the inventory is not checked for contrast. For example, if it was previously found that removing [nasal, son] left two segments identical, then there is no point in trying to remove [nasal, son, voice]. At first, checking every combination slows down the algorithm, and it often needs to do a pairwise comparison of the inventory for every k-combination up to k = 4. Around this point the crucial list begins to fill in, and after this the algorithm runs much faster, as it can immediately reject k-combinations without running the pairwise comparison across the inventory. A minimal sketch of this procedure is given below, followed by a worked example that illustrates some of the numbers involved.
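The sketch is an illustrative reimplementation rather than the code used for the dissertation results. It uses the equivalent stopping rule that the search ends as soon as no k-combination at all can be removed (removing fewer features can only preserve more contrasts, so nothing larger could be removed either). Segments are represented as plain dictionaries, and the toy inventory at the end is invented for illustration.

from itertools import combinations

def contrastive(inventory, features):
    """True if every pair of segments differs on at least one feature in `features`."""
    specs = [tuple(seg[f] for f in features) for seg in inventory]
    return len(set(specs)) == len(specs)

def minimal_features(inventory, all_features):
    # Pre-processing: drop features on which every segment agrees.
    features = [f for f in all_features
                if len({seg[f] for seg in inventory}) > 1]
    crucial = []           # combinations whose removal destroys some contrast
    best = list(features)  # smallest sufficient feature set found so far
    k = 1
    while k <= len(features):
        removed_any = False
        for combo in combinations(features, k):
            if any(set(c) <= set(combo) for c in crucial):
                continue   # already known to be unremovable
            remaining = [f for f in features if f not in combo]
            if contrastive(inventory, remaining):
                best = remaining
                removed_any = True
            else:
                crucial.append(combo)
        if not removed_any:
            break          # no k-combination can be removed, so stop
        k += 1
    return best

# Hypothetical toy inventory: /p b t d/ described with three features.
inventory = [
    {"voice": 0, "cor": 0, "cont": 0},   # p
    {"voice": 1, "cor": 0, "cont": 0},   # b
    {"voice": 0, "cor": 1, "cont": 0},   # t
    {"voice": 1, "cor": 1, "cont": 0},   # d
]
print(minimal_features(inventory, ["voice", "cor", "cont"]))
# prints ['voice', 'cor']; F is the length of this set, here 2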
To help understand some of the numbers involved, here is an example of the algorithm at work. The consonant inventory [ʔʲ, ǀʰ, bˡ, n, k͡p, d̤ː, j̃, pʃʷ, kʷ, ʜ, f, ⁿdr] was randomly generated and given as input. The initial set of features consisted of 19 features. The pre-processing step removed 2 non-contrastive features. Early in the calculation, most of the feature possibilities need to be considered. At k=5, for example, the number of 5-combinations is (17 choose 5) = 6,188, and the algorithm tried a pairwise comparison of the inventory for 5,374 of those combinations (87%). At the point where k=10, the number of 10-combinations is (17 choose 10) = 19,448, but the algorithm only checked the inventory for contrast using 3,165 of them (16%), because the rest were supersets of combinations that failed earlier. This saves more than 10,000 comparisons at this step. Eventually, the algorithm removed 12 more features, so a minimum of 5 features is needed to contrast this inventory.

5.4.5 Simulation results

For each of the class-level and segment-specific conditions, I ran 90 simulations, for a total of 180 simulations. Each condition was broken into three phonotactic groups: 30 simulations with CV phonotactics, 30 with CVC phonotactics, and 30 with CCVCC phonotactics. This ensures that a variety of different inventories will emerge, both in terms of inventory size and contents.

Starting inventory sizes varied as well, and each phonotactic group was broken into three size categories: 10 simulations started with small consonant inventories, ranging from 8–15 consonants, 10 started with medium-sized inventories of 20–40 consonants, and 10 started with large inventories of 60–80 consonants. Every inventory had between 3 and 5 vowels, although no misperceptions affect vowels, so this set remained constant for the entire simulation. Each simulation ran for 30 generations.

The starting inventories were generated by sampling uniformly at random from P-base. Only 90 inventories were created, and these were shared across the two conditions. Additionally, the 90 starting lexicons from the class-level condition were re-used in the segment-specific condition. In other words, for each class-level simulation, there was a segment-specific simulation that had identical starting conditions.

Figure 5.9 shows the change in average feature economy scores for the 90 simulations with class-level changes, for all four metrics. Results for simulations with segment-specific misperceptions are given in Figure 5.10.

Figure 5.9: Change in average feature economy for simulations run with class-level changes
Figure 5.10: Change in average feature economy for simulations run with segment-specific changes

Overall, there is an increase in economy on the Simple Ratio metric, in both types of simulations. The Exploitation metric, on the other hand, goes down in both. Changes in Simple Ratio and Exploitation can mostly be explained by changes in inventory size. Inventories generally grow in size over the course of a simulation. Larger inventories tend to score higher on Simple Ratio, and they tend to score lower on Exploitation. This relationship between size and economy score was discussed in more detail in Section 4.3.2.1.

To see if the differences between the two simulation types were significant, I fit the economy scores from generation 30 to a linear regression model. The independent variables were inventory size and misperception type (class-level or segment-specific). The dependent variable was economy score.
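The model itself was fit in R, as noted below. Purely as an illustration of the design (economy score predicted by inventory size, misperception type, and their interaction), an analogous analysis could be set up in Python with pandas and statsmodels; the data frame and column names here are hypothetical.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical rows: one simulation each, measured at generation 30.
scores = pd.DataFrame({
    "economy": [2.1, 2.4, 2.9, 1.8, 2.0, 2.2],
    "size":    [12, 25, 63, 14, 30, 58],
    "mtype":   ["class", "class", "class", "segment", "segment", "segment"],
})

# Economy score as a function of inventory size, misperception type,
# and their interaction (the same structure as the analysis in the text).
model = smf.ols("economy ~ size * mtype", data=scores).fit()
print(sm.stats.anova_lm(model, typ=2))
```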
It is important to note that my choice to use the economy scores from generation 30 is somewhat arbitrary. It is not clear that the economy scores have completely stabilized at this point, nor is it clear how many generations would be sufficient. Ideally, one would calculate a stationary distribution of inventories, but this is challenging given the extremely large space of possible inventories that could evolve in a simulation.

The model was calculated using the anova function from the R programming language (R Core Team 2016). Results are shown in Table 5.5.

Table 5.5: Results of two-way ANOVA with inventory size and misperception type as predictors and economy score as dependent variable.

Misperception type has a significant effect on economy scores when using Simple Ratio, Frugality, and Relative Efficiency. Exploitation is the only metric that does not show a significant effect of misperception type. Inventory size is a significant factor for both Simple Ratio and Exploitation, but it is not significant for Frugality or for Relative Efficiency. There is no significant interaction between misperception type and inventory size for any of the metrics except Simple Ratio.

These results show that when languages are transmitted, via iterated learning, under conditions where sound change influences classes of sounds, the resulting inventories are more economical than they would be if sound change targeted individual segments. This is consistent with Hypothesis #3, which said that feature economy is an emergent consequence of the way sound change operates, as opposed to it being an inherent property of phonological systems.

In simulations with class-level misperceptions, economy scores generally rise because sound changes are affecting classes of sounds, as predicted by Hypothesis #3. For example, suppose there is a simulation with an inventory containing a set of three voiceless stops /p, t, k/ in the initial generation. Suppose further that a class-level intervocalic voicing misperception is active. If all three voiceless stops occur between vowels somewhere in the lexicon, then eventually a set of three voiced stops will appear and the inventory will be /p, b, t, d, k, g/. Since these new stops are minimally different from the old ones, differing only by [voice], economy will increase because 3 new sounds are added at the cost of only a single feature. If the feature [voice] is already in use for sounds outside of this stop series, then the increase in economy is even greater, because three new sounds are added "for free", without the cost of an additional feature.
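The arithmetic behind this example can be checked with the minimal_feature_count sketch given earlier, using hypothetical specifications in which two place features plus [voice] are enough to keep the six stops apart; the feature values below are illustrative, not taken from the simulations.

```python
# Hypothetical specifications: two place features plus [voice].
p = {"cor": -1, "back": -1, "voice": -1}
t = {"cor":  1, "back": -1, "voice": -1}
k = {"cor": -1, "back":  1, "voice": -1}
b = {"cor": -1, "back": -1, "voice":  1}
d = {"cor":  1, "back": -1, "voice":  1}
g = {"cor": -1, "back":  1, "voice":  1}

features = ["cor", "back", "voice"]
before = [p, t, k]
after  = [p, b, t, d, k, g]

# Reusing minimal_feature_count from the earlier sketch (an assumption, not PyILM's API):
# /p t k/ needs 2 features (S/F = 3/2), while /p b t d k g/ needs 3 (S/F = 6/3 = 2),
# so adding the voiced series raises a Simple Ratio-style score.
print(minimal_feature_count(before, features),   # -> 2
      minimal_feature_count(after, features))    # -> 3
```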
In a simulation with segment-specific changes, the evolution of the inventory will be different. Assuming the same voiceless stop inventory of /p, t, k/, three new misperceptions will be generated, each of which affects a different feature. It might be that /p/ becomes /f/ between vowels, while /t/ becomes /tʰ/ and /k/ becomes /k'/ in the same environment. If all three segment-specific changes occur, then the inventory will not achieve greater economy. Three new sounds will enter the inventory at the cost of three new features.

Toward the end of the simulations, Frugality and Relative Efficiency rise slightly in the segment-specific condition. This rise probably has to do with growth in inventory size. When the inventory grows large enough, the feature space is saturated, and even changes targeting randomly selected features will end up creating sounds that share a feature with an existing sound simply by chance.

Additionally, it is possible for multiple segment-specific changes to have the cumulative effect of a class-level change. For example, a class-level misperception that would have affected /p, t, k/ might turn into a segment-specific misperception that changes the [cont] value of /p/ in the first generation, but not the [cont] values of /t/ and /k/. In the second generation, a segment-specific misperception might be generated which changes the [cont] value of /t/ (but not /p/ or /k/), and in the third generation another might be generated changing the [cont] value of /k/ (but not /p/ or /t/). A class-level misperception has effectively occurred over the course of three generations. This is probably a rare event, especially in three consecutive generations, but it is plausible that it happens by chance to at least some sound class over a long number of generations.

Another factor is that the randomly generated segment-specific misperceptions can target any arbitrary feature, creating combinations of features that would never occur in the real world, and giving these changes perhaps an "unfair" advantage over the class-level ones, which were more carefully crafted to avoid these unnatural outcomes. For example, it is possible for a segment-specific change to raise the [high] value of a [−high, +low] sound. This could result in a [+high, +low] sound, which is a highly unrealistic sound that could nonetheless increase the economy of an inventory.

Overall, however, these results seem to generally support Hypothesis #3, because (a) economy has been shown to increase due to the way sound change operates, and (b) class-level changes lead to inventories with higher economy scores, compared to segment-specific changes.

Chapter 6

Conclusion

In this dissertation, I introduced three hypotheses about how sound change shapes consonant inventories, and tested these hypotheses through computer simulation.

The first hypothesis was that inventory size is related to phonotactic complexity. Languages with more complex phonotactics will tend to develop larger inventories, while languages with more restrictive phonotactics tend to develop smaller inventories. This is because sound change is (mostly) context-sensitive, and phonotactics define the set of possible contexts in a language. Having more possible contexts in a language means that there is a greater diversity of sound changes that could occur. As sounds introduced through change become phonologized, the inventory grows.

This hypothesis was tested by running a large set of simulations grouped into different phonotactic categories. All simulations were initialized with randomly generated inventories of the same size. The same set of potential misperceptions was used for each simulation.
The outcome was that inventories of languages restricted to maximally CV syllables grew the least. Languages with maximally CVC syllables grew into larger inventories, and the largest inventories were found among languages with CCVCC syllables. These results support Hypothesis #1.

The second hypothesis concerned the frequency of consonants across languages. Sounds are not evenly distributed, and some are far more common than others. Small inventories tend to be made up of just the most common sounds, while large inventories contain rare or unique sounds (Lindblom and Maddieson 1988, Maddieson 2011; see Section 4.2).

The existence of cross-linguistically common sounds was hypothesized to be due to the existence of context-free sound changes, which, by definition, apply in inventories of any size. Smaller inventories tend to be made up primarily of the most common sounds, because they have limited phonotactic contexts (by Hypothesis #1). Large inventories have more, and more diverse, phonetic contexts, leading to the evolution of a more diverse array of sounds.

This hypothesis was tested in a way similar to the first one, by running a large number of simulations. Simulations were initialized with randomly generated sets of segments, and all simulations used the same set of biases and misperceptions. The outcome was that smaller inventories tended to contain mostly those sounds favoured by bias, and inventories diversify as they grow.

The third hypothesis was about feature economy, which is the tendency for inventories to maximize the ratio between the number of segments and the (minimal) number of features required to contrast them (Clements 2003, Mackie and Mielke 2011). Hypothesis #3 states that feature economy emerges in inventories because sound change is defined over classes of sounds, rather than individual sounds. Over time, this produces inventories with sets of sounds differing by only one feature, which is essentially what feature economy measures.

Testing this hypothesis was done by running two kinds of simulations. In one, the probabilistic biases that underlie sound change were defined to take scope over broad classes of sounds; in the other, they were defined such that they could only affect specific segments. Feature economy was calculated at each generation of these simulations. A linear regression model, with inventory size and misperception type as predictors and economy scores as dependent variable, showed that misperception type had a significant effect on economy scores for all metrics except Exploitation.

These results lend support to the theory that typology is shaped by diachronic forces (e.g. Blevins 2004). However, in contrast to most of the existing research, which tends to focus on specific sound changes, this dissertation has taken a higher-level approach by simulating multiple interacting changes over many generations of language transmission.

The results also demonstrate how the concept of selection for learnability (Brighton et al. 2005) can be applied to the study of phonological inventories. The sounds that an inventory has are those which are most likely to be successfully retransmitted over time. This gives us a way of understanding certain properties of inventories in a non-teleological framework.

The simulation software designed for this dissertation, PyILM, was built to be open-ended, and could be modified to study other phenomena. There are several changes that could be made to the code to increase its utility.
For instance, PyILM was constructed with the intention of studying the evolution of consonant inventories, but it could be extended to vowel inventories. Vowel systems were not included in the current study because the way that they change over time seems to be quite different from consonants. In particular, vowel systems show an effect of dispersion (e.g. de Boer 2002), where vowels tend to spread out over the available phonetic space. This is the opposite of the feature economy effect seen in consonant inventories, where a small number of features are re-used. Vowels are also known to undergo chain shifts, which occur much less often in consonant inventories. It would be ideal to update PyILM so that both vowel and consonant evolution can be simulated.

There are also several improvements that could be made to the way that misperceptions are modeled. Currently, only one feature at a time can be affected by misperception, but it would be useful to increase that number. Additionally, it would be convenient to have features "linked" in some way, such that misperceptions targeting one feature would naturally include another feature (e.g. a misperception that makes a consonant more or less nasal should also make it more or less sonorant).

It would also be useful to diversify the changes that can be modeled as misperceptions. For instance, changes that result in metathesis may be the result of misperception (e.g. Blevins and Garrett 2004) and could be modeled. Changes could also be non-local, and target segments further away, in order to simulate the evolution of harmony patterns.

Modeling epenthesis and deletion would be very useful, and would have the biggest impact on the results reported in this dissertation because of the potential to disrupt the phonotactics. In particular, the results reported for the tests of Hypothesis #1 and Hypothesis #2 both rely heavily on phonotactics as a main factor, and things may come out differently if phonotactic patterns are not frozen. For example, suppose there is a language with strictly CV syllables, and suppose there is the possibility for vowel reduction/deletion. This means that a CVCV sequence could become a CCV sequence. This puts two consonants adjacent to each other, something that is normally impossible given CV-only phonotactics, and it creates the potential for misperceptions which would otherwise only apply in languages with more complex syllable structures.

Many of the parameters in the simulations are fixed ahead of time, and it would be an improvement if at least some of these values could be more flexibly adjusted over the course of a simulation. In particular, it would be good to have misperception probabilities and salience values be affected by other factors. For instance, functional load plays a role in change, such that sounds which carry a higher functional load are less likely to undergo change (Bouchard-Côté et al. 2013), and avoidance of homophones may be a factor in inhibiting change (Blevins and Wedel 2009). The learning algorithm is another place for improvement. Agents have a few parameters that are set by hand, such as the threshold for deciding if two sounds are distinct or not. Ideally, this is information that agents could learn from data.

Finally, an improvement of a different kind would be to have more of a social environment in the simulations. Currently, PyILM has only one speaker agent and one listening agent per generation.
Having a larger population would make it possible for other kinds of sound changes to be modeled. For instance, in a larger population, pronunciations can be considered to be more or less prestigious, and agents can adopt or reject certain pronunciations based on social relationships.

Despite the limitations of PyILM, it still produced interesting and useful results. This, to some extent, makes the results even more interesting. While it might be expected that realistic outcomes would require a simulation that includes linguistically significant effects such as notions of contrast, true allophony, or social factors, PyILM shows that even with very simple assumptions it is possible to simulate the emergence of phonological patterns in inventories.

Bibliography

Abdel-Massih, E. T.: 1971, Tamazight verb structure: A generative approach, Indiana University, USA.
Atkinson, Q. D.: 2011, Phonemic diversity supports a serial founder effect model of language expansion from Africa, Science 332, 346–349.
Bandhu, C., Dahal, B., Holzhausen, A. and Hale, A.: 1971, Nepali segmental phonology, Tribhuvan University, Kirtipur.
Bauer, L.: 2007, The linguistics student's handbook, Edinburgh University Press, Edinburgh.
Bermúdez-Otero, R.: 2007, Diachronic phonology, in P. de Lacy (ed.), The Cambridge handbook of phonology, Cambridge University Press, Cambridge, pp. 497–517.
Berwick, R. C., Pietroski, P., Yankama, B. and Chomsky, N.: 2011, Poverty of the stimulus revisited, Cognitive Science 35(7), 1207–1242.
Blevins, J.: 2004, Evolutionary Phonology: The Emergence of Sound Patterns, Cambridge University Press.
Blevins, J.: 2006a, New perspectives on English sound patterns: "natural" and "unnatural" in Evolutionary Phonology, Journal of English Linguistics 34(1), 6–25.
Blevins, J.: 2006b, A theoretical synopsis of Evolutionary Phonology, Theoretical Linguistics 32(2), 117–166.
Blevins, J.: 2007, Interpreting Misperception: Beauty is in the Ear of the Beholder, Oxford Linguistics, Oxford, pp. 144–154.
Blevins, J.: 2009, Another universal bites the dust: Northwest Mekeo lacks coronal phonemes, Oceanic Linguistics 48(1), 264–273.
Blevins, J. and Garrett, A.: 2004, The evolution of metathesis, in B. Hayes, R. M. Kirchner and D. Steriade (eds), Phonetically based phonology, Cambridge University Press, Cambridge, pp. 117–156.
Blevins, J. and Wedel, A.: 2009, Inhibited sound change: An evolutionary approach to lexical competition, Diachronica 26(2), 143–183.
Bostoen, K. and Sands, B.: 2012, Clicks in south-western Bantu languages: Contact-induced vs. language-internal lexical change, in M. Brenzinger and A.-M. Fehn (eds), Proceedings of the 6th world congress of African linguistics, Vol. 5 of World Congress of African Linguistics, Köppe Verlag, pp. 129–140.
Bouchard-Côté, A., Hall, D., Griffiths, T. L. and Klein, D.: 2013, Automated reconstruction of ancient languages using probabilistic models of sound change, Proceedings of the National Academy of Sciences 110(11), 4224–4229.
Breen, G. and Pensalfini, R.: 1999, Arrernte: A language with no syllable onsets, Linguistic Inquiry 30(1), 1–25.
Brighton, H., Kirby, S. and Smith, K.: 2005, Cultural selection for learnability: Three principles underlying the view that language adapts to be learnable, in M. Tallerman (ed.), Language origins: Perspectives on evolution, Oxford University Press, Oxford, pp. 291–309.
Butcher, A.: 1999, What speakers of Australian aboriginal languages do with their velums and why: The phonetics of the nasal/oral contrast, in J. Ohala, Y. Hasegawa, M. Ohala, D. Granville and A. Bailey (eds), Proceedings of the International Congress of Phonetic Sciences, University of California, San Francisco, pp. 479–482.
Chang, S., Plauché, M. and Ohala, J.: 2001, Markedness and consonant confusion asymmetries, in E. Hume and K. Johnson (eds), The Role of Speech Perception in Phonology, Academic Press, London/San Diego, pp. 79–101.
Chen, M. Y.: 1997, Acoustic correlates of English and French nasalized vowels, The Journal of the Acoustical Society of America 102(4), 2360–2370.
Chen, N. F., Slifka, J. L. and Stevens, K. N.: 2007, Vowel nasalization in American English: Acoustic variability due to phonetic context, in J. Trouvain and W. J. Barry (eds), Proceedings of the 16th International Congress of Phonetic Sciences, Saarbrücken, Germany, pp. 905–908.
Chomsky, N. and Halle, M.: 1968, The sound pattern of English, Harper and Row, New York.
Choudhury, M., Mukherjee, A., Basu, A. and Ganguly, N.: 2006, Analysis and synthesis of the distribution of consonants over languages: a complex network approach, Proceedings of the COLING/ACL, COLING-ACL '06, Association for Computational Linguistics, Stroudsburg, PA, USA.
Clements, G.: 2003, Feature economy in sound systems, Phonology 20(3), 287–333.
Clements, G.: 2009, The role of features in phonological inventories, in E. Raimy and C. Cairns (eds), Contemporary views on architecture and representations in phonology, MIT Press, Cambridge, Massachusetts, pp. 19–76.
Colarusso, J.: 1988, The Northwest Caucasian Languages: A Phonological Survey, Garland Publishing, New York.
Cornish, H.: 2010, Investigating how cultural transmission leads to the appearance of design without a designer in human communication systems, Interaction Studies 11(1), 112–137.
Cornish, H., Tamariz, M. and Kirby, S.: 2009, Complex adaptive systems and the origins of adaptive structure: What experiments can tell us, Language Learning 59, 187–205.
Coupé, C., Marsico, E. and Philippson, G.: 2011, How economical are phonological inventories?, in F. Pellegrino, E. Marsico, I. Chitoran and C. Coupé (eds), Proceedings of the ICPhS XVII, Walter de Gruyter, pp. 524–528.
Craig, C.: 1977, The Structure of Jacaltec, University of Texas, Bloomington.
Cysouw, M., Dediu, D. and Moran, S.: 2012, Comment on 'Phonemic Diversity Supports a Serial Founder Effect Model of Language Expansion from Africa', Science 335(6069), 657.
da Silva, G. R.: 2014, Squib: A feature geometric analysis of Pirahã phonology and tonology (Mura), Rivista Linguistica 10(2), 1–20.
Dantsuji, M.: 1984, A study on voiceless nasals in Burmese, Studia Phonologica 18, 1–14.
Day, C.: 1972, The Jacaltec Language, Indiana University, Bloomington.
Day, C.: 1973, The Jacaltec Language, Indiana University, Bloomington.
de Boer, B.: 2000, Self-organization in vowel systems, Journal of Phonetics 28(4), 441–465.
de Boer, B.: 2001, The origins of vowel systems, Oxford University Press, Oxford.
de Boer, B.: 2002, Evolving sound systems, in A. Cangelosi and D. Parisi (eds), Simulating the evolution of language, Springer-Verlag, London, pp. 79–97.
de Groot, A. W.: 1931, Phonologie und Phonetik als Funktionswissenschaften, Travaux du Cercle Linguistique de Prague 4, 116–147.
Dell, F.: 1985, Les règles et les sons, revised 2nd edn, Hermann, Paris.
Donohue, M. and Nichols, J.: 2011, Does phoneme inventory size correlate with population size?, Linguistic Typology 15, 161–170.
Dresher, B. E.: 2003, The contrastive hierarchy in phonology, Toronto Working Papers in Linguistics 20, 47–62.
Dryer, M. S. and Haspelmath, M.: 2013, WALS Online, Max Planck Institute for Evolutionary Anthropology, Leipzig.
Elbert, S. and Pukui, M.: 1979, Hawaiian Grammar, University Press of Hawaii, Honolulu.
Everett, D. L.: 1986, Pirahã, in D. C. Derbyshire and G. K. Pullum (eds), Handbook of Amazonian Languages 1, Mouton de Gruyter, Berlin, pp. 200–325.
Feldman, N. H., Griffiths, T. L. and Morgan, J. L.: 2009, Learning phonetic categories by learning a lexicon, Proceedings of the 31st Annual Conference of the Cognitive Science Society, pp. 2208–2213.
Firchow, I. B. and Firchow, J.: 1969, An abbreviated phoneme inventory, Anthropological Linguistics 11, 271–276.
Fortescue, M.: 1984, West Greenlandic, Croom Helm Descriptive Grammars, Croom Helm, London.
Garrett, A. and Johnson, K.: 2012, Phonetic bias in sound change, in A. Yu (ed.), Origins of sound change: Approaches to phonologization, Oxford University Press, Oxford, pp. 51–97.
Geisler, W. S.: 2003, Ideal observer analysis, The Visual Neurosciences, pp. 825–837.
Griffiths, T. L. and Kalish, M. L.: 2007, Language evolution by iterated learning with Bayesian agents, Cognitive Science 31(3), 441–480.
Guion, S. G.: 1997, The role of perception in the sound change of velar palatalization, Phonetica 55(1-2), 18–52.
Güldemann, T. and Stoneking, M.: 2008, A historical appraisal of clicks: a linguistic and genetic population perspective, Annual Review of Anthropology 37, 93–109.
Hall, D. C.: 2007, The role and representation of contrast in phonological theory, PhD thesis, University of Toronto.
Hall, K. C.: 2009, A probabilistic model of phonological relationships from contrast to allophony, PhD thesis, Ohio State University.
Hansson, G. Ó.: 2007, On the evolution of consonant harmony: The case of secondary articulation agreement, Phonology 24(1), 77–120.
Hansson, G. Ó.: 2008, Diachronic explanations of sound patterns, Language and Linguistics Compass 2(5), 859–893.
Harrington, J.: 2006, An acoustic analysis of 'happy-tensing' in the Queen's Christmas broadcasts, Journal of Phonetics 34(4), 439–457.
Harrington, J., Palethorpe, S. and Watson, C.: 2000a, Monophthongal vowel changes in Received Pronunciation: An acoustic analysis of the Queen's Christmas broadcasts, Journal of the International Phonetic Association 30, 63–78.
Harrington, J., Palethorpe, S. and Watson, C.: 2005, Deepening or lessening the divide between diphthongs? An analysis of the Queen's annual Christmas broadcasts, in W. J. Hardcastle and J. M. Beck (eds), A Figure of Speech: A Festschrift for John Laver, Lawrence Erlbaum Associates, Mahwah, NJ, pp. 227–261.
Harrington, J., Palethorpe, S. and Watson, C. I.: 2000b, Does the Queen speak the Queen's English?, Nature 408, 927–928.
Haudricourt, A.: 1961, Richesse en phonèmes et richesse en locuteurs, L'Homme 1, 5–10.
Hay, J. and Bauer, L.: 2007, Phoneme inventory size and population size, Language 83, 388–400.
Hayes, B.: 2011, Introductory Phonology, Blackwell Textbooks in Linguistics, Wiley-Blackwell, United Kingdom.
Herbert, R. K.: 1985, The puzzle of Bantu ejectives and aspirates, in J. Fisiak (ed.), Papers from the 6th International Conference on Historical Linguistics, John Benjamins Publishing, pp. 251–267.
Hock, H. H.: 1991, Principles of Historical Linguistics, revised 2nd edn, Mouton de Gruyter, Berlin.
Hualde, J. I., Simonet, M. and Nadeu, M.: 2011, Consonant lenition and phonological recategorization, Laboratory Phonology 2(2), 301–329.
Huttar, G. and Huttar, M.: 1994, Ndyuka, Routledge, London.
Hyman, L. M.: 2008, Universals in phonology, The Linguistic Review 25(1-2), 83–137.
Johnson, K.: 2007, Decisions and mechanisms in exemplar-based phonology, in M.-J. Solé, P. S. Beddor and M. Ohala (eds), Experimental approaches to phonology, Oxford University Press, Oxford, pp. 25–40.
Jongman, A., Wayland, R. and Wong, S.: 2000, Acoustic characteristics of English fricatives, The Journal of the Acoustical Society of America 108(3), 1252–1263.
Kalish, M., Griffiths, T. and Lewandowsky, S.: 2007, Iterated learning: Intergenerational knowledge transmission reveals inductive biases, Psychonomic Bulletin and Review 14, 288–294.
Kirby, J.: 2014a, Incipient tonogenesis in Phnom Penh Khmer: Acoustic and perceptual studies, Journal of Phonetics 43, 69–85.
Kirby, J.: 2014b, Incipient tonogenesis in Phnom Penh Khmer: Computational studies, Laboratory Phonology 5(1), 195–230.
Kirby, J. and Sonderegger, M.: 2013, A model of population dynamics applied to phonetic change, in M. Knauff, M. Pauen, N. Sebanz and I. Wachsmuth (eds), Proceedings of the 35th Annual Conference of the Cognitive Science Society, Cognitive Science Society, Austin, TX, pp. 776–781.
Kirby, S.: 1996, Function, Selection and Innateness: The Emergence of Language Universals, PhD thesis, University of Edinburgh, Scotland.
Kirby, S.: 1998, Language evolution without natural selection: From vocabulary to syntax in a population of learners, Technical Report EOPL-98-1, Department of Linguistics, University of Edinburgh, Edinburgh.
Kirby, S.: 2000, Syntax without natural selection: How compositionality emerges from vocabulary in a population of learners, in C. Knight (ed.), The Evolutionary Emergence of Language: Social Function and the Origins of Linguistic Form, Cambridge University Press, Cambridge, pp. 303–323.
Kirby, S.: 2001, Spontaneous evolution of linguistic structure - an iterated learning model of the emergence of regularity and irregularity, IEEE Transactions on Evolutionary Computation 5(2), 102–110.
Kirby, S.: 2002, Learning, bottlenecks, and the evolution of recursive syntax, in T. Briscoe (ed.), Linguistic Evolution through Language Acquisition: Formal and Computational Models, Cambridge University Press, pp. 173–204.
Kirby, S., Cornish, H. and Smith, K.: 2008, Cumulative cultural evolution in the laboratory: An experimental approach to the origins of structure in human language, PNAS 105(31), 10681–10686.
Klatt, D.: 1975, Voice onset time, frication, and aspiration in word-initial consonant clusters, Journal of Speech and Hearing Research 8(4), 686–706.
Klein, T.: 2006, Creole phonology typology: Phoneme inventory size, vowel quality distinctions and stop consonant series, in P. Bhatt and I. Plag (eds), The Structure of Creole Words: Segmental, Syllabic and Morphological Aspects, Walter de Gruyter, pp. 3–23.
Kochetov, A. and Colantoni, L.: 2011, Spanish nasal assimilation revisited: A cross-dialect electropalatographic study, Laboratory Phonology 2(2), 487–523.
Krakow, R. A.: 1994, Nonsegmental influences on velum movement patterns: Syllables, sentences, stress, and speaking rate, Haskins Laboratories Status Report on Speech Research SR-117/118, pp. 31–48.
Labov, W.: 2007, Transmission and diffusion, Language 83(2), 344–387.
Ladefoged, P.: 1995, Voiceless approximants in Tee, UCLA Working Papers in Phonetics, pp. 85–88.
Lahiri, A. and Reetz, H.: 2010, Distinctive features: Phonological underspecification in representation and processing, Journal of Phonetics 38(1), 44–59.
Legate, J. A. and Yang, C. D.: 2002, Empirical re-assessment of stimulus poverty arguments, The Linguistic Review 18(1-2), 151–162.
Lin, S., Speeter, P. and Coetzee, A.: 2014, Gestural reduction, lexical frequency, and sound change: A study of post-vocalic /l/, Laboratory Phonology 5, 9–36.
Lindblom, B. and Maddieson, I.: 1988, Phonetic universals in consonant systems, in C. Li and L. Hyman (eds), Language, Speech, and Mind, Routledge, London, pp. 62–78.
Lisker, L.: 1986, "Voicing" in English: A catalogue of acoustic features signaling /b/ versus /p/ in trochees, Language and Speech 29(1), 3–11.
Lorenzino, G.: 1998, The Angolar Creole Portuguese of São Tomé: Its grammar and sociolinguistic history, LINCOM Studies in Pidgin and Creole Languages, City University, New York.
Mackie, S. and Mielke, J.: 2011, Feature economy in real, random, and synthetic inventories, in G. N. Clements and R. Ridouane (eds), Where Do Phonological Features Come From?: Cognitive, Physical and Developmental Bases of Distinctive Speech Categories, John Benjamins Publishing Company, pp. 43–63.
Maddieson, I.: 1984, Patterns of Sounds, Cambridge Series in Speech Science and Communication, Cambridge University Press, Cambridge.
Maddieson, I.: 2005, Issues of phonological complexity: Statistical analysis of the relationship between syllable structures, segment inventories and tone contrasts, UC Berkeley Phonology Lab Annual Report, pp. 259–268.
Maddieson, I.: 2007, Issues of phonological complexity: Statistical analysis of the relationship between syllable structures, segment inventories and tone contrasts, Oxford University Press, pp. 93–103.
Maddieson, I.: 2011, Phonological complexity in linguistic patterning, in W.-S. Lee and E. Zee (eds), Proceedings of the 17th International Congress of Phonetic Sciences, Hong Kong, China, pp. 28–35.
Maddieson, I.: 2013a, Consonant inventories, in M. S. Dryer and M. Haspelmath (eds), The World Atlas of Language Structures Online, Max Planck Institute for Evolutionary Anthropology, Leipzig. URL: http://wals.info/chapter/1
Maddieson, I.: 2013b, Uvular Consonants, Max Planck Institute for Evolutionary Anthropology, Leipzig. URL: http://wals.info/chapter/6
Maddieson, I. and Precoda, K.: 1989, Updating UPSID, UCLA Working Papers in Phonetics 74, 104–111.
Martinet, A.: 1952, Function, structure, and sound change, Word 8(1), 1–32.
Martinet, A.: 1955, Economie des changements phonétiques, Francke, Berne.
McDougall, D.: 2006, Survival comes first for the last Stone Age tribe. Accessed: 2016-01-10. URL: http://www.theguardian.com/world/2006/feb/12/theobserver.worldnews12
McMahon, A.: 2002, An introduction to English phonology, Edinburgh University Press, Edinburgh.
McMullin, K. J.: 2016, Tier-based locality in long-distance phonotactics: learnability and typology, PhD thesis, University of British Columbia. URL: https://oc-web.library.ubc.ca/cIRcle/collections/24/items/1.0228114
Mesoudi, A. and Whiten, A.: 2008, The multiple roles of cultural transmission experiments in understanding human cultural evolution, Philosophical Transactions of the Royal Society B: Biological Sciences 363(1509), 3489–3501.
Mielke, J.: 2008, The emergence of distinctive features, Oxford University Press.
Mooshammer, C. and Geng, C.: 2008, Acoustic and articulatory manifestations of vowel reduction in German, Journal of the International Phonetic Association 38(2), 117–136.
Moran, S., McCloy, D. and Wright, R.: 2012, Revisiting population size vs. phoneme inventory size, Language 88, 877–893.
Moran, S., McCloy, D. and Wright, R.: 2014, PHOIBLE Online, Max Planck Institute for Evolutionary Anthropology, Leipzig. Accessed: 2016-05-05. URL: http://phoible.org/
Morén-Duolljá, B.: 2005, The segmental sound pattern of Palauan. Accessed: 2015-06-22. URL: http://www.hum.uit.no/a/moren/palauannew.pdf
Moreton, E.: 2008, Analytic bias and phonological typology, Phonology 25(1), 83–127.
Newport, E., Bavelier, D. and Neville, H.: 2001, Critical thinking about critical periods: Perspectives on a critical period for language acquisition, in E. Dupoux (ed.), Language, brain and cognitive development: Essays in honor of Jacques Mehler, Citeseer, pp. 481–502.
Nichols, J.: 1992, Linguistic Diversity in Space and Time, University of Chicago Press, Chicago.
Ohala, J.: 1981, The listener as a source of sound change, in C. S. Masek, R. A. Hendrick and M. F. Miller (eds), Chicago Linguistic Society, Parasession on Language and Behavior, Chicago Linguistics Society, Chicago, pp. 178–203.
Ohala, J.: 1983, The origin of sound patterns in vocal tract constraints, in P. F. MacNeilage (ed.), The production of speech, Springer-Verlag, New York, pp. 189–216.
Ohala, J.: 1992, What's cognitive, what's not, in sound change, in G. Kellermann and M. D. Morrissey (eds), Diachrony within synchrony: Language history and cognition, Peter Lang Verlag, Frankfurt, pp. 309–355.
Ohala, J.: 1997, Emergent stops, Proceedings of the 4th Seoul International Conference on Linguistics [SICOL], Seoul, pp. 84–91.
Ohala, M. and Ohala, J.: 1991, Epenthetic nasals in the historical phonology of Hindi, Proceedings of the XIIth International Congress of Phonetic Sciences 3.
Oudeyer, P.-Y.: 2005a, How phonological structures can be culturally selected for learnability, Adaptive Behavior 13(4), 269–280.
Oudeyer, P.-Y.: 2005b, The self-organization of combinatoriality and phonotactics in vocalization systems, Connection Science 17(3-4), 325–341.
Oudeyer, P.-Y.: 2005c, The self-organization of speech sounds, Journal of Theoretical Biology 233(3), 435–449.
Pallier, C.: 2007, Critical periods in language acquisition and language attrition, in B. Köpke, M. S. Schmid, M. Keijzer and S. Dostert (eds), Language Attrition: Theoretical perspectives, John Benjamins, Amsterdam/Philadelphia, pp. 155–168.
Pape, D., Mooshammer, C., Hoole, P. and Fuchs, S.: 2003, Devoicing of word-initial stops: A consequence of the following vowel?, in J. Harrington and M. Tabain (eds), Speech Production, Macquarie Monographs in Cognitive Science, Psychology Press, Sydney, Australia.
Pater, J. and Staubs, R.: 2013, Feature economy and iterated grammar learning, Paper presented at the 21st Manchester Phonology Meeting.
Peperkamp, S.: 2004, A psycholinguistic theory of loanword adaptations, in M. Ettlinger, N. Fleisher and M. Park-Doob (eds), Annual Meeting of the Berkeley Linguistics Society, Vol. 30, Berkeley Linguistics Society, Berkeley, CA, pp. 341–352.
Peperkamp, S., Le Calvez, R., Nadal, J.-P. and Dupoux, E.: 2006, The acquisition of allophonic rules: Statistical learning with linguistic constraints, Cognition 101(3), B31–B41.
Pierrehumbert, J.: 2001, Exemplar dynamics, word frequency, lenition, and contrast, in J. Bybee and P. Hopper (eds), Frequency effects and the emergence of linguistic structure, John Benjamins Publishing Company, Amsterdam, pp. 137–157.
R Core Team: 2016, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria. Accessed: 2016-05-19. URL: https://www.R-project.org
Raphael, L. J.: 2005, Acoustic cues to the perception of segmental phonemes, in D. Pisoni and R. Remez (eds), The handbook of speech perception, Blackwell Publishers, Oxford, pp. 182–206.
Recasens, D.: 2014, Coarticulation and Sound Change in Romance, Current Issues in Linguistic Theory, John Benjamins, Amsterdam.
Robinson, S.: 2006, The phoneme inventory of Aita Rotokas, Oceanic Linguistics 45(1), 206–209.
Sankoff, G. and Blondeau, H.: 2007, Language change across the lifespan: /r/ in Montreal French, Language 83(3), 560–588.
Schourup, L. C.: 1973, A cross-language study of vowel nasalization, in A. Malikouti-Drachman, G. Drachman, M. L. Edwards, J. E. Geis and L. C. Schourup (eds), Working Papers in Linguistics 15, Ohio State University, Columbus, Ohio, USA.
Smith, K. and Kirby, S.: 2008, Cultural evolution: implications for understanding the human language faculty and its evolution, Philosophical Transactions of the Royal Society B: Biological Sciences 363(1509), 3591–3603.
Smith, K., Kirby, S. and Brighton, H.: 2003, Iterated learning: A framework for the emergence of language, Artificial Life 9(4), 371–386.
Smith, K. and Wonnacott, E.: 2010, Eliminating unpredictable variation through iterated learning, Cognition 116(3), 444–449.
Snyman, J.: 1969, An Introduction to the !Xu Language, Communications from the School of African Studies, University of Cape Town, Balkema, Cape Town.
Soukka, M.: 2000, A descriptive grammar of Noon: A Cangin language of Senegal, Vol. 40 of LINCOM Studies in African Linguistics, LINCOM Europe.
Stanton, J.: 2016, Learnability shapes typology: the case of the midpoint pathology, Language 92(4), 753–791.
Steriade, D.: 1995, Underspecification and markedness, in J. Goldsmith (ed.), Handbook of Phonological Theory, Blackwell, Oxford/Cambridge MA, pp. 114–174.
Swadesh, M.: 1952, Lexico-Statistic Dating of Prehistoric Ethnic Contacts: With Special Reference to North American Indians and Eskimos, Proceedings of the American Philosophical Society 96(4), 452–463.
Tesar, B. and Smolensky, P.: 1998, Learnability in Optimality Theory, Linguistic Inquiry 29(2), 229–268.
Traill, A.: 1985, Phonetic and Phonological Studies of !Xoo Bushman, John Benjamins Publishing Company, Amsterdam.
Trudgill, P.: 2004, Linguistic and social typology: The Austronesian migrations and phoneme inventories, Linguistic Typology 8, 305–320.
Trudgill, P.: 2011, Social structure and phoneme inventories, Linguistic Typology 15, 155–160.
Tupper, P.: 2015, Exemplar dynamics and sound merger in language, SIAM Journal on Applied Mathematics 75(4), 1469–1492.
Verhoef, T.: 2012, The origins of duality of patterning in artificial whistled languages, Language and Cognition 4, 357–380.
Verhoef, T. and de Boer, B.: 2011, Cultural emergence of phonemic combinatorial structure in an artificial whistled language, Proceedings of the 17th International Congress of Phonetic Sciences (ICPhS XVII), pp. 2066–2069.
Verhoef, T., Kirby, S. and de Boer, B.: 2013, Combinatorial structure and iconicity in artificial whistled languages, in M. Knauff, M. Pauen, N. Sebanz and I. Wachsmuth (eds), Cooperative Minds: Social Interaction and Group Dynamics: Proceedings of the 35th Annual Meeting of the Cognitive Science Society, Cognitive Science Society, Austin, TX, pp. 3669–3674.
Verhoef, T., Kirby, S. and Padden, C.: 2011, Cultural emergence of combinatorial structure in an artificial whistled language, in L. Carlson, C. Hoelscher and T. Shipley (eds), Proceedings of the 33rd Annual Conference of the Cognitive Science Society, Cognitive Science Society, Austin, TX, pp. 483–488.
Wedel, A. B.: 2007, Feedback and regularity in the lexicon, Phonology 24(1), 147–185.
Whiten, A. and Mesoudi, A.: 2008, Establishing an experimental science of culture: animal social diffusion experiments, Philosophical Transactions of the Royal Society B: Biological Sciences 363(1509), 3477–3488.
Wichmann, S., Muller, A., Velupillai, V., Brown, C., Holman, E., Brown, P., Sauppe, S., Belyaev, O., Urban, M., Molochieva, Z., Wett, A., Bakker, D., List, J.-M., Egorov, D., Mailhammer, R., Beck, D. and Geyer, H.: 2011, The ASJP database (version 13). Accessed: 2016-05-31. URL: http://asjp.clld.org/
Wichmann, S., Rama, T. and Holman, E.: 2011, Phonological diversity, word length, and population sizes across languages: The ASJP evidence, Linguistic Typology 15(2).
Wilson, C.: 2006, Learning phonology with substantive bias: An experimental and computational study of velar palatalization, Cognitive Science 30(5), 945–982.
Yang, C.: 2010, Who's afraid of George Kingsley Zipf? URL: http://www.ling.upenn.edu/~ycharles/papers/zipfnew.pdf
Zuidema, W.: 2003, How the poverty of the stimulus solves the poverty of the stimulus, Advances in Neural Information Processing, pp. 51–58.