Simulating the evolution of consonant inventories. Mackie, James Scott (2017)

Full Text

SIMULATING THE EVOLUTION OF CONSONANT INVENTORIES

by

JAMES SCOTT MACKIE

B.A., University of Ottawa, 2006
M.A., University of Ottawa, 2007

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Linguistics)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

January, 2017

© James Scott Mackie 2017

Abstract

A major question in phonology concerns the role of historical changes in shaping the typology of languages. This dissertation explores the effect of sound change on consonant inventories.

Historical reconstruction is mainly done by comparing cognate words across languages, making it difficult to track how inventories change specifically. Additionally, few languages have historical written records that can be directly examined. For this dissertation, the main research tool is computer simulation, using bespoke software called PyILM, which is based on the Iterated Learning Model (Kirby 2011, Smith et al. 2003). This allows for the simulation of sound change from arbitrary starting points, controlling for a multitude of variables.

PyILM is an agent-based model, where a 'speaking' agent transmits a set of words to a 'listening' agent. The speaking agent is then removed, the learner becomes the speaker, and a new learner is introduced. The cycle repeats any number of times, roughly simulating the transmission of language over many generations.

Sound change in a simulation is due to channel bias (Moreton 2008), the result of which is that agents occasionally misinterpret some aspect of speech and internalize sound categories that differ from those of the previous generation (Ohala 1981, Blevins 2004). Three typological generalizations are examined, none of which have previously been studied from an evolutionary perspective:

(1) The total number of consonants in a language. This is shown to be related to syllable structure, such that languages with simple syllables develop smaller inventories than languages with complex syllables. This mirrors a positive correlation between inventory size and syllable structure in natural languages, as reported by Maddieson (2007).

(2) The correlation reported by Lindblom and Maddieson (1988) between the size of an inventory and the complexity of its segments. This effect emerges in simulations when context-free changes are introduced, since these changes produce similar outcomes in inventories of all sizes.

(3) Feature economy (Clements 2003), which refers to the way that consonants within a language tend to make use of a minimal number of distinctive features. Economy emerges over time when sound changes take scope over classes of sounds, rather than targeting individual sounds.

Preface

This dissertation is the original work of the author. I wrote all of the computer code for PyILM, including the accompanying GUI, from the ground up using Python 3.4. The Feature Economist algorithm used for the results in Chapter 5 was originally designed in collaboration with Jeff Mielke. It has previously been used for the research in Mackie and Mielke (2011), and it is included with the software P-base (Mielke 2008). The implementation used for this dissertation is one that I wrote myself.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Algorithms
Acknowledgments
1 Consonant inventories and sound change
  1.1 Introduction
  1.2 Iterated learning models
  1.3 Sound change models
    1.3.1 Summary
  1.4 An ILM for phonology
    1.4.1 Overview
    1.4.2 One turn of a PyILM simulation
      1.4.2.1 Production
      1.4.2.2 Misperception
      1.4.2.3 Learning
    1.4.3 Some notes on design
      1.4.3.1 Social factors
      1.4.3.2 Single-agent transmission change
      1.4.3.3 Discrete learning period
      1.4.3.4 No teleology
      1.4.3.5 Phonemes and allophones
    1.4.4 Expected outcomes and inventory structure
  1.5 Summary
2 PyILM
  2.1 Introduction
    2.1.1 Iterated Learning Models
  2.2 Objects
    2.2.1 Overview
    2.2.2 Simulation
      2.2.2.1 Overview
      2.2.2.2 generations
      2.2.2.3 initial_lexicon_size
      2.2.2.4 initial_inventory
      2.2.2.5 minimum_repetitions
      2.2.2.6 min_word_length
      2.2.2.7 max_word_length
      2.2.2.8 phonotactics
      2.2.2.9 features_file
      2.2.2.10 max_lexicon_size
      2.2.2.11 invention_rate
      2.2.2.12 max_inventions
      2.2.2.13 misperceptions
      2.2.2.14 minimum_activation_level
      2.2.2.15 auto_increase_lexicon_size
      2.2.2.16 initial_words
      2.2.2.17 allow_unmarked
      2.2.2.18 seed
      2.2.2.19 seg_specific_misperceptions
    2.2.3 Words
      2.2.3.1 string
      2.2.3.2 meaning
    2.2.4 Segments
      2.2.4.1 symbol
      2.2.4.2 features
      2.2.4.3 envs
      2.2.4.4 distribution
    2.2.5 Features
    2.2.6 FeatureSpace
    2.2.7 Sounds
    2.2.8 Tokens
      2.2.8.1 name
      2.2.8.2 value
      2.2.8.3 label
      2.2.8.4 env
    2.2.9 Agents
      2.2.9.1 lexicon
      2.2.9.2 inventory
      2.2.9.3 feature_space
      2.2.9.4 distributions
    2.2.10 Misperception
      2.2.10.1 name
      2.2.10.2 target
      2.2.10.3 feature
      2.2.10.4 salience
      2.2.10.5 env
      2.2.10.6 p
      2.2.10.7 How misperception happens
      2.2.10.8 A note on misperception definitions
  2.3 Algorithms
    2.3.1 Learning algorithm
      2.3.1.1 Parsing a Word
      2.3.1.2 Creating new segment categories
    2.3.2 Updates
      2.3.2.1 The lexicon
      2.3.2.2 The inventory
    2.3.3 Determining phonological feature values
    2.3.4 Production algorithm
      2.3.4.1 Initialization
      2.3.4.2 Step 1: Word selection
      2.3.4.3 Step 2: Transforming Segments into Sounds
    2.3.5 Invention algorithm
  2.4 Using PyILM
    2.4.1 Obtaining PyILM
    2.4.2 Configuration files
    2.4.3 Running a simulation
    2.4.4 Viewing results
  2.5 Other notes
    2.5.1 Limitations
      2.5.1.1 No social contact
      2.5.1.2 No deletion or epenthesis
      2.5.1.3 No morphology or syntax
      2.5.1.4 No long distance changes
    2.5.2 Running time
3 Sample simulations
  3.1 Introduction
  3.2 Simulation 1 - A single abrupt change
  3.3 Simulation 2 - A single gradual change
  3.4 Misperceptions and phonetic similarity
  3.5 Simulation 3 - Interactions between sound changes
  3.6 Simulation 4 - CVC language
  3.7 Simulation 5 - Invention and the spread of new segments
  3.8 Summary
4 Natural language consonant inventories
  4.1 Inventory size
    4.1.1 Overview
    4.1.2 Population size
    4.1.3 Hypothesis #1: Phonotactics and inventory size
  4.2 Inventory contents
    4.2.1 Overview
    4.2.2 Hypothesis #2 - Common consonants
  4.3 Inventory organization
    4.3.1 Overview
    4.3.2 Feature economy
      4.3.2.1 Measuring feature economy
    4.3.3 Cross-linguistic tendencies
    4.3.4 Explaining economy
      4.3.4.1 A computational model
      4.3.4.2 Whistle experiments
    4.3.5 Hypothesis #3 - Sound change and feature economy
  4.4 Summary
5 Simulating inventory evolution
  5.1 Introduction
  5.2 Inventory size
    5.2.1 Simulation results
  5.3 Common consonants
    5.3.1 Misperception vs. bias
    5.3.2 Simulation results
  5.4 Feature economy
    5.4.1 How economy can change over time
    5.4.2 An illustrative example
    5.4.3 Segment-specific misperceptions vs. class-level misperceptions
    5.4.4 Calculating feature economy
    5.4.5 Simulation results
6 Conclusion
Bibliography

List of Tables

3.1 Configuration for Simulation 1
3.2 Comparison of inventories in Simulation 1
3.3 Comparison of inventories in Simulation 2
3.4 Comparisons of several generations in Simulation 3
3.5 Configuration for Simulation 4
3.6 Comparison of several generations in Simulation 4
3.7 Configuration for Simulation 5
3.8 Comparison of several generations in Simulation 5
4.1 Co-occurrence of V and Z in UPSID (from Clements (2003, p. 303))
4.2 The inventory of West Greenlandic
4.3 Feature economy effects in Pater and Staubs (2013)
5.1 Configuration for testing phonotactic effects on inventory size
5.2 Configuration for simulations comparing simple misperceptions and biases
5.3 Example of individual inventories in a simulation with misperception and bias, starting from only voiceless stops
5.4 Misperceptions and biases for testing Hypothesis #2
5.5 Results of two-way ANOVA with inventory size and misperception type as predictors and economy score as dependent variable

List of Figures

1.1 Model of sound change through listener misperception, Ohala (1981, p. 182)
1.2 Emergent stops, from Ohala (1997)
2.1 The objects of PyILM
2.2 The transmission of a phonological segment
2.3 Sample feature file
2.4 Example configuration file
2.5 Screen shot of PyILM Visualizer
3.1 Change in inventory size for Simulation 1
3.2 Results for various values of minimum_activation_level
3.3 Varying misperception salience across three different values for minimum_activation_level. Misperception salience is shown in the legend. Simulation (a) uses a value of 0.2, Simulation (b) uses a value of 0.5 and Simulation (c) uses a value of 1.0
3.4 Change in inventory size for Simulation 3
3.5 Change in total inventory size with five different random seeds
4.1 The inventories of Palauan, from Morén-Duolljá (2005)
4.2 Summary of the distribution of velar stops in Palauan, with data from Morén-Duolljá (2005)
4.3 Phonemic inventory of Central Rotokas, based on Firchow and Firchow (1969)
4.4 The inventory of !Xóõ, based on Traill (1985)
4.5 Correlations between speaker population size (individual languages) and inventory size, from Hay and Bauer (2007)
4.6 Correlations between speaker population size (language families) and inventory size, from Hay and Bauer (2007)
4.7 Relationship between population size (log scale) and inventory size for several language families, from Donohue and Nichols (2011)
4.8 Population size and inventory size, from Wichmann, Rama and Holman (2011)
4.9 Predicted magnitude of the effect of population size on inventory size, from Moran et al. (2012, p. 18)
4.10 IPA chart warped to show consonant frequency in P-base (Mielke 2008)
4.11 Consonant inventory size and number of superset inventories in P-base
4.12 Consonant inventory size in P-base and number of unique consonants
4.13 Segment complexity plotted against inventory size for the inventories of P-base
4.14 Consonant inventories from P-base with reversed segment complexity
4.15 Consonant inventories of Noon and Tamazight
4.16 Randomly generated consonant inventory
4.17 Three sound systems differing in symmetry and economy, from Clements (2003, p. 292)
4.18 Inventories of Hawaiian, French and Nepali (from Clements (2003, p. 288))
4.19 Ranges of feature economy scores in the inventories of P-base (Mielke 2008)
4.20 Feature economy scores of natural languages and randomly generated inventories
4.21 Example of whistle recombinations from Verhoef and de Boer (2011, p. 2)
5.1 Average inventory size for 50 simulations over 50 generations, across 3 different phonotactic conditions
5.2 State diagram for word-final obstruents in a simulation with final devoicing
5.3 Change in inventory size for two simulations, one starting with voiceless stops, one with voiced stops
5.4 Biased and non-biased sounds in the final simulated inventories
5.5 Change in economy score for a hypothetical language
5.6 Range of possible Simple Ratio scores
5.7 Range of possible Frugality scores
5.8 Change in feature economy for a simple simulation
5.9 Change in average feature economy for simulations run with class-level changes
5.10 Change in average feature economy for simulations run with segment-specific changes

List of Algorithms

1.1 Generalized Iterated Learning Model
1.2 Generalized Iterated Learning Model for phonology
2.1 Main simulation loop
2.2 Misperception function
2.3 Learning algorithm
2.4 Activation function
2.5 K-means algorithm
2.6 Distribution estimation
2.7 Production algorithm
2.8 Invention algorithm

Acknowledgments

This project has been a long time in the making, and it could not have been completed without the help of many people. First and foremost is my family. My partner Cameron and our daughter Autumn have shown incredible patience, love, and understanding over the years, and I could not have finished this without their support. My parents, Margaret and Craig, and my brothers, Simon and Alan, have been extremely supportive and encouraging, even if they don't always understand what I'm up to. I love you all.

I'd also like to thank my committee, Gunnar, Molly, and Alex, who have all been extremely helpful over the course of this project. I'm grateful for all that they have done for me. Gunnar was my first contact at UBC when I applied to be a PhD student, and I'm extremely happy that he was able to direct my dissertation as well.

There are many other individuals who made my graduate school experience special, and who deserve a mention too. In no particular order, I'd like to thank...

Jeff Mielke and Ana Arregui, at the University of Ottawa, for encouraging me to pursue further studies in linguistics.

Xiahou Dun, Kraus, a foolish pianist, Robo Kitty, Cingulate, Mortley, !amicable and others on the SA forums who have really helped me refine my ideas about language and linguistics.

The PCT team: Kathleen Hall, Blake Allen, Michael Fry, and Michael McAuliffe, who taught me a great deal about programming, and helped me see how much fun it is.

Johnathon Jones, who has given me amazing opportunities to apply linguistics outside of a university setting.

Everyone I've ever TAed for (Susannah Kirby, Lisa Matthewson, Brian Gick, Douglas Pulleyblank, Henry Davis, and especially Strang Burton). Teaching has been one of the most meaningful and exciting parts of my PhD, and I've enjoyed every class with every one of you.

Chapter 1

Consonant inventories and sound change

1.1 Introduction

Each human language uses only a finite subset of all possible consonants and vowels, and this collection of sounds is known as the inventory of the language. This dissertation is a study of consonant inventories. I investigate three different aspects of consonant inventories, and how they change over time.
The first is the total size of an inventory. I propose that a main factor that influences inventory size is the phonotactics (syllable structure) of a language. Languages with more restrictive phonotactics (e.g. only CV syllables, and hence no word-internal consonant clusters, nor any final consonants) will tend to develop small inventories over time, while languages with more permissive phonotactics (e.g. maximally CCVCC syllables, and hence the possibility of word-internal consonant clusters as well as final consonants) will develop larger inventories. This is supported by a correlation reported in Maddieson (2007), which shows that syllable structure complexity is positively related to inventory size in the inventories of UPSID (Maddieson and Precoda 1989).

The second aspect of inventories to be studied is the frequency of consonants across languages. Certain sounds are extremely common (such as /p/ and /m/) while other sounds are less common (such as /q'/ and /Ð/). This frequency distribution is related to inventory size: small inventories tend to have only the most common sounds, while large inventories have all of the most common sounds, as well as rare or even unique ones (Maddieson (2011), Lindblom and Maddieson (1988), see also Section 4.2). Put another way, small inventories look similar to each other, and inventories diversify as they grow.

Lindblom and Maddieson (1988) propose that this effect is due to the way that inventories grow over time. Inventories first saturate a small set of all possible sounds, before expanding into other areas of phonetic space. A metaphorical rubber band draws inventories back toward this basic set, accounting for the contents of small inventories, while a metaphorical magnet pushes sounds apart from each other, resulting in the increasing diversity of large inventories.

This metaphor is appealing, but Lindblom and Maddieson do not offer any historical evidence to support it, nor do they point to any specific types of sound changes that might underlie the rubber band and magnet effect. I propose that the basis for a common set of sounds across languages is the existence of context-free sound changes, which can affect inventories of any size. Large inventories have more unique sounds because such languages also have a wider variety of phonetic contexts (due to the correlation with phonotactics discussed above).

The third aspect of inventories is something known as feature economy (Clements 2003, 2009). This describes the tendency for inventories to be organized around the re-use of a small number of distinctive features. For example, many languages have a six-stop system /p, b, t, d, k, g/, where the feature [voice] is re-used for contrast at each place of articulation. In comparison, a six-stop system /p, b, t, t', kʷ, k/, one that makes use of a different feature for contrast at each place of articulation, is extremely rare, if not actually unattested. Mackie and Mielke (2011) showed that natural languages exhibit higher feature economy scores than randomly generated sets of segments, but did not offer an explanation for why.

I propose that feature economy is the emergent result of sound change. This is because the phonetic biases underlying sound change are such that they can affect classes of sounds, rather than individual sounds. This makes it possible for a new set of sounds to emerge in an inventory, all of the members of which are minimally different from another set, differing only by whatever phonetic feature was affected by sound change. Over time, this creates the appearance of economy in an inventory. Randomly generated inventories are less economical than natural languages because they have never undergone sound change.
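As a concrete illustration of the difference between the two six-stop systems just mentioned, the short Python sketch below compares them using a simple segments-per-feature ratio. This is only a toy calculation with simplified, privative feature specifications that I have assumed for the example; it is not the Feature Economist algorithm used in Chapter 5.

    # Toy feature specifications: each segment is listed with the features
    # it is positively specified for (a simplification for illustration).
    economical = {
        'p': {'labial'},   'b': {'labial', 'voice'},
        't': {'coronal'},  'd': {'coronal', 'voice'},
        'k': {'dorsal'},   'g': {'dorsal', 'voice'},
    }
    uneconomical = {
        'p': {'labial'},   'b': {'labial', 'voice'},
        't': {'coronal'},  "t'": {'coronal', 'ejective'},
        'kw': {'dorsal', 'round'},  'k': {'dorsal'},
    }

    def economy(inventory):
        # Segments per contrastive feature: higher values = more economical,
        # in the spirit of the simple ratio measures discussed in Chapter 4.
        features = set().union(*inventory.values())
        return len(inventory) / len(features)

    print(economy(economical))    # 6 segments / 4 features = 1.5
    print(economy(uneconomical))  # 6 segments / 6 features = 1.0

The economical system distinguishes six segments by re-using [voice] at every place of articulation, while the hypothetical system needs an extra feature for nearly every contrast, so the same number of segments yields a lower score.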
These proposals about consonant inventories will be tested through computer simulation. In Chapter 2, I present PyILM, a Python package for running sound change simulations. Broadly speaking, PyILM is an implementation of a listener-based theory of sound change. In contemporary linguistics, this approach to sound change is probably most well-known through the work of John Ohala (1981, 1983, 1991, 1997, et seq.) and more recently Evolutionary Phonology (Blevins 2004, 2006b, Blevins and Wedel 2009). The computational framework used is the Iterated Learning Model (Brighton et al. 2005, Kirby et al. 2008, Smith and Kirby 2008, Cornish 2010). This is an agent-based model where agents are arranged in a transmission chain. The nth agent receives information from agent n-1, formulates a hypothesis about it, and then transmits new information to agent n+1.

Sound change in a simulation is due to events known as misperceptions in the terminology of PyILM. Misperceptions are defined as probabilistic, context-sensitive rules (which includes context-free rules, i.e. rules sensitive to any context). When a misperception occurs, the phonetic value of a speech sound is altered. The shift in phonetic value creates the potential for the agent at generation n to acquire a different set of sound categories compared to the agent at generation n-1.
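To give a sense of what such a rule might look like, here is a schematic Python sketch of a probabilistic, context-sensitive misperception. The attribute names (name, target, feature, salience, env, p) echo the Misperception properties listed in Chapter 2, but the code is only an illustrative stand-in, not PyILM's actual implementation, and the word-final devoicing example and its numbers are invented for the purpose.

    import random

    class Misperception:
        # A probabilistic, context-sensitive rule: with probability p,
        # shift one phonetic feature of a matching sound when that sound
        # occurs in the triggering environment.
        def __init__(self, name, target, feature, salience, env, p):
            self.name = name          # label for the change
            self.target = target      # predicate picking out affected sounds
            self.feature = feature    # phonetic dimension that is shifted
            self.salience = salience  # size of the shift (toy interpretation)
            self.env = env            # predicate over contexts; lambda c: True is context-free
            self.p = p                # probability of applying to any one token

        def maybe_apply(self, sound, context):
            if self.target(sound) and self.env(context) and random.random() < self.p:
                sound[self.feature] -= self.salience
            return sound

    # Invented example: a word-final devoicing bias on obstruents.
    final_devoicing = Misperception(
        name='final devoicing',
        target=lambda s: s['sonorant'] < 0.5,
        feature='voice',
        salience=0.3,
        env=lambda c: c == 'word-final',
        p=0.25,
    )
    token = {'sonorant': 0.0, 'voice': 0.8}
    print(final_devoicing.maybe_apply(token, 'word-final'))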
Simulations have a large number of parameters that can be set, which makes it possible to study how sound systems evolve under different conditions. For instance, the hypothesis that inventory size is connected to phonotactic complexity can be tested by running several simulations with identical starting conditions, varying only the syllable types permitted, and measuring the size of the inventories after N generations of transmission.

The dissertation is organized as follows: the remainder of Chapter 1 discusses issues of language transmission and sound change. Chapter 2 provides technical details of PyILM. Chapter 3 gives some toy simulations to illustrate how PyILM works. Chapter 4 returns to the topic of natural languages with an overview of cross-linguistic tendencies in inventories. Chapter 5 provides the results of PyILM simulations demonstrating how these cross-linguistic tendencies can emerge through iterated learning of sound systems.

1.2 Iterated learning models

Languages can only survive over time if they are continually re-learned at each generation. This continuation of people learning from others, who in turn learned from others, is referred to as cultural transmission (Brighton et al. 2005). A simulation of language change should be at least in part a simulation of cultural transmission. The actual process of cultural transmission is extremely complex, as it includes an uncountable number of interactions between an enormous network of people, often with intricate social relationships. Language use and acquisition is also tied to the physical environment, conversational context, and various other socio-linguistic factors. There may even be more than one language being transmitted at a time. This makes computational modeling of cultural transmission challenging, and it is common to abstract away from this complexity and focus on simpler situations.

Cultural transmission is modeled in this dissertation as an Iterated Learning Model (Kirby 1998, Kirby et al. 2008, Kalish et al. 2007, Griffiths and Kalish 2007, Smith et al. 2003). This is a simple model of information transmission where individuals are arranged in a chain. The nth individual receives input from the n-1th individual, formulates a hypothesis about the input, then uses that hypothesis to produce output for the n+1th individual. In terms of language change, each pair of individuals is intended to represent one generation of language transmission.

In such a model, there are only ever two agents interacting at a time. There is always one agent who already knows a language, referred to hereafter as the speaker, and one agent who is learning from the speaker, referred to hereafter as the listener or the learner. These are relative terms. Every agent spends some time in both roles, with the exception of the first agent, who is seeded with some kind of language to get the simulation started, and hence is never a learner. The agent at Generation 2, for example, is a learner with respect to the agent in Generation 1, but a speaker with respect to the agent in Generation 3.

The nature of iterated learning is such that the information being transmitted can change over time. Any errors that occur in transmission can continue to get propagated through the chain of agents. It is possible that the language acquired by the final generation is extremely different from what the first generation knew. This is a desirable outcome in terms of modeling natural language change, since languages (eventually) become mutually unintelligible with their ancestral forms.

The amount of change that occurs in an iterated learning model, and how often it occurs, depends on how reliably information can be re-learned by each agent. Information that is difficult to reliably re-transmit (for whatever reason) will tend to change or disappear from a language. This is known as selection for learnability (Kirby et al. 2008, Brighton et al. 2005, Smith and Kirby 2008).

    In order for linguistic forms to persist from one generation to the next, they must repeatedly survive the processes of expression and induction. That is, the output of one generation must be successfully learned by the next if these linguistic forms are to survive. We say that those forms that repeatedly survive cultural transmission are adaptive in the context of cultural transmission: they will be selected for due to the combined pressures of cultural transmission and learning. (Brighton et al. 2005, p. 10; emphasis added)

One goal of this chapter is to apply this concept to the study of sound change. This will not be difficult, since it already has much in common with popular models of sound change through misperception (e.g. Ohala 1981, Blevins 2004). The concept of selection for learnability originally grew out of research on syntax and morphology, so it is useful to start with a brief overview of that literature, even though it takes us somewhat far afield from the topic of phonological inventories. I will focus on the theoretical and computational aspects, and pay less attention to the implications for syntax.
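A toy numerical example, not drawn from the literature cited above, helps show why small per-generation differences in learnability compound dramatically over a transmission chain. The forms and probabilities below are invented purely for illustration.

    import random

    random.seed(1)

    # Two holistic forms that differ only in how reliably a learner
    # re-acquires them from one generation of input.
    forms = {'easy-to-learn form': 0.99, 'hard-to-learn form': 0.80}

    def survives_chain(p_learn, generations=50):
        # True if the form is successfully re-learned at every generation.
        return all(random.random() < p_learn for _ in range(generations))

    trials = 1000
    for name, p_learn in forms.items():
        rate = sum(survives_chain(p_learn) for _ in range(trials)) / trials
        print(name, rate)
    # Roughly 0.6 of 50-generation chains keep the easy form (0.99 ** 50),
    # while the hard form is almost always lost (0.80 ** 50 is near zero).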
Early work in this area was done by Simon Kirby (Kirby 1996, 1998, 2000, 2001), who has focused on how compositional syntax can emerge in a language that is initially non-compositional, through the process of iterated learning. A compositional language in this case is defined as one where the meaning of an utterance is a function of the meaning of its parts and how they are put together. A non-compositional language, on the other hand, is defined as one where every meaning is expressed through a holistic arbitrary pairing of meanings and sound strings. This is not a strictly binary division, and a language could have some meanings expressed by compositional structures while others are non-compositional. This is in fact the case with natural languages, where we observe both compositional patterns (e.g. regular word order and morphological paradigms) and non-compositional forms (idioms, irregular forms).

Algorithm 1.1 gives an outline of a typical iterated learning simulation from Kirby's work. This would simulate g generations of agent interactions, and each agent learns from d utterances.

Algorithm 1.1 Generalized Iterated Learning Model
    Generate a speaking agent with a grammar
    Generate a learning agent with no grammar
    Loop g times:
        Loop d times:
            The speaking agent produces an utterance.
            The learning agent tries to parse the utterance with her grammar.
            If she cannot, she memorizes it as an unanalyzable whole.
        The speaker is removed from the simulation.
        The learner becomes the new speaker.
        A new learner is added into the simulation.
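The following is a minimal Python rendering of the loop in Algorithm 1.1. The Agent class and its methods are placeholders that I have assumed for illustration; they stand in for Kirby's grammar-induction, production, and invention procedures rather than reproducing them.

    import random

    class Agent:
        # Placeholder agent: learn_from() and produce() stand in for
        # Kirby's grammar-induction and production procedures.
        def __init__(self, utterances=None):
            self.grammar = list(utterances) if utterances else []

        def learn_from(self, utterance):
            # Try to parse with the current grammar; otherwise memorize
            # the utterance as an unanalyzable whole.
            if utterance not in self.grammar:
                self.grammar.append(utterance)

        def produce(self):
            # Produce a known utterance, or invent one if nothing is known.
            if self.grammar:
                return random.choice(self.grammar)
            return ''.join(random.choice('bdgaiu') for _ in range(5))

    def iterated_learning(initial_speaker, g=10, d=2):
        # Algorithm 1.1: g generations, d utterances heard per learner.
        # Keeping d small relative to what a speaker may need to express
        # is what creates the transmission bottleneck.
        speaker = initial_speaker
        for _ in range(g):
            learner = Agent()
            for _ in range(d):
                learner.learn_from(speaker.produce())
            speaker = learner      # the learner becomes the new speaker
        return speaker

    final = iterated_learning(Agent(['afxaba', 'afxagatam', 'tulipo']))
    print(final.grammar)           # with d=2, holistic forms are steadily lost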
Kirby argues that compositionality emerges as languages adapt to a specific constraint imposed by cultural transmission, namely the fact that a learner cannot hear an example of every sentence that she will potentially want to express as a speaker. This constraint is referred to as the transmission bottleneck, and it is similar to the concept of the Poverty of the Stimulus in generative linguistics (Legate and Yang (2002), Berwick et al. (2011); see Zuidema (2003) for comparison of the transmission bottleneck to poverty of the stimulus). This constraint is built into simulations by ensuring that d is less than the total number of utterances an agent could possibly produce.

In Kirby's models, languages are said to survive transmission if the learner at generation n acquires a grammar such that she would produce the same utterance for the same meaning as the speaker at generation n-1. If the grammar changes between generation n and n+1, then the language of generation n did not survive transmission. It is important to treat the word survive as a technical term to be understood in the context of simulations. There is a single language in a simulation, and a single pair of users at a time.

Non-compositional languages cannot survive when there is a bottleneck on transmission. This is because a learner-turned-speaker will, at some point, want to express a meaning she has never heard expressed. She will not be able to guess how this meaning would be expressed by the previous generation, due to the lack of compositionality. Instead, she will need to invent a new way of expressing this meaning, which will serve as input to the following generation. Because of this change, the older language does not survive the entire simulation. Changes like this are guaranteed to occur at each generation, due to the constant bottleneck. A slightly different language will appear at each generation throughout a simulation.

Compositional languages, on the other hand, can survive transmission even with a bottleneck. A learner need not hear an example of every single sentence. As long as a learner knows the component parts, and knows rules for putting parts together, she can construct a novel utterance that has a high chance of being the same as what the previous generation would have constructed. This increases the chances of a language surviving transmission many generations in a row.

Kirby's simulated agents all have the capability of learning compositional grammars, but the initial agent is intentionally seeded with a grammar that lacks compositionality entirely. The key step in the emergence of compositionality is the first time an agent invents a novel utterance. The way in which the invention algorithm works is crucial. The algorithm constructs a new utterance by looking for other meanings an agent already knows that are similar to the one she wants to express, selecting some random sub-string from there, and then adding on a new random string. The result of this invention is the introduction of evidence for compositionality into the input of the following generation. There are now two similar meanings with shared sub-strings that a learner can infer are connected by a rule.

For example, suppose Learner 1 acquired afxaba for fox eats bird. Later, this agent becomes Speaker 2 and invents afxagatam for fox eats mouse by taking the substring afxa from a known word and adding on a randomly generated string gatam to the end. The following Learner 2 hears both of these utterances, and infers that afxa means fox eats, ba means bird and gatam means mouse. Learner 2 could then posit a rule where either ba or gatam can follow afxa, as opposed to memorizing each meaning independently. Learner 2 now has the first partially compositional grammar in the simulation. (See Kirby (2000) for specific details of the invention and rule-induction algorithm.)

Learner 2 will eventually become Speaker 3, and will make use of this rule to produce utterances. Any utterances that Speaker 3 invents containing the meaning fox eats will contain the string afxa, and this is information that Learner 3 can use to acquire a grammar similar to Speaker 3.

Over the following generations, more and more compositional rules enter into the language through this cycle of invention and rule-induction. Since compositional languages can be learned in spite of a bottleneck, transmission has a lower error rate, and eventually the language comes to be dominated entirely, or almost entirely, by compositionality.

It is important to note that there is no single factor that can explain the emergence of compositionality. It is the result of a combination of agent behaviour and cultural transmission. Changing either of these changes the resulting languages. Trivially, if agents were cognitively incapable of using compositional structure (if they could do neither rule induction nor compositional invention) then of course compositionality would never arise. If agents were content to learn strictly from the input, and never invent new utterances, then non-compositional languages would have a higher chance of surviving.

If the transmission model did not involve iterated learning, but agents at each generation received input from the same external source instead, then invention and rule-induction would have no long-term impact, and the language would not evolve toward compositionality. The specific size of the bottleneck can change the outcome (Smith et al. 2003), potentially favouring non-compositional languages. Frequency in the input makes a difference as well, and non-compositional forms can survive if they are highly frequent (Kirby 2001).

It is also important to emphasize that compositionality appears entirely through non-
While it is true that agents do introduce compositional utterances onpurpose through the invention algorithm, this does not represent a teleological elementof the model. The reason for inventing an utterance is not so that the language can, inthe future, be compositional. Utterances are invented to solve in-the-moment needs ofcommunication, with no regard to long-term consequences. Language change itself is notdirected towards the goal of achieving compositionality, so the model is not teleological, evenif the individual agent interactions could be said to have a goal. Instead, compositionalityis achieved through selection for learnability.This is not just an eect that occurs in computer simulations. It has also been demon-strated in laboratory experiments with human participants. Kirby et al. (2008) and Cornishet al. (2009) discuss experiments where compositional lexicons can emerge from an initiallyrandomly-generated set of words through iterated learning. The rst set of participants inan experiment were shown pictures of objects, each of which was paired with a randomlygenerated string of CV syllables. The objects were constructed out of three features: shape(square, circle, triangle), colour (red, blue, green), movement (horizontal, spiral, bouncing).Participants were not made explicitly aware of these features.Participants were instructed to learn the names, and they were then tested on theirability to recall those names. The answers they provided in the recall test were then givenas the labels for those objects to next set of participants in the experiment, and the cyclerepeated. Unlike actual cultural transmission, participants in these experiments never meteach other, and were not made aware that their answers would be given to other participants.The lexicons at the end of the experiment appeared to be organized around certain fea-tures of the objects, rather than having each word arbitrarily matched to an object. Kirbyet al. (2008) provide an example of a nal lexicon with consistent pairing of morphemes andcolours (ne is black, la is blue, ra is red) as well as motion (ki for horizontal movementand pilu for spiral). Shape was less consistent. Blue and back triangles were encoded as keif moving horizontally, but as ki if moving in a spiral. Red horizontal triangles were calledhe and the red spiral triangles were ho.This is a result of the lexicon adapting to the learning requirements of the participants.It is dicult to remember nine random strings of sounds, so the rst participants in theexperiment tend to have a high error rate. Even if they could not remember the name of anobject, they still had to supply a word of some kind for the recall test, so they invented anew word based on whatever other words they actually could remember. This immediatelydecreased the diculty for the second participant, since the lexicon now contained words forsimilar objects that have substrings in common, which helps with learning. Some irregularforms still exist in the lexicon by the end, because while participants may not be able toremember nine random strings, they can remember two or three.In summary, the idea that languages adapt to how they are being transmitted, whatBrighton et al. (2005) call selection for learnability, is useful for understanding how patternscan emerge in languages over long periods of time. Changes happen as agents each maketheir own small adjustments to the language to meet their needs at a particular time. 
Agents7do not consider what eects their change will have on the future state of the language.Agents are not even aware that they are changing anything. They do not know whatthe underlying forms of the previous generation looked like, so they cannot know if theyare deviating from them. Patterns tend to emerge because all agents learn under similarconditions. Changes that make it easier for one agent to learn to use the language will alsomake it easier for future agents to use the language, though no agents are aware of this.1.3 Sound change modelsHow do phonological systems adapt to transmission? The facts relevant to the transmissionof sound systems are of course very dierent from syntax or morphology. Learners-turned-speakers do not face the same problems with phonology as they might with syntax. Speakersmay have to express novel propositions or construct unique arrangements of words andphrases, but they are never in a place where they need to invent a new sound they didnot hear in their input and only rarely is there a need to construct a unique sequence ofconsonants and vowels.The notion of a transmission bottleneck may still apply to some properties of sound sys-tems, however. Stanton (2016) argues that certain patterns in stress sytems are unattestedbecause they are too dicult to learn from input data. It is also likely that the natureof the input aects the learnability of long-distance harmony patterns (McMullin 2016).As far as phonological inventories are concerned, however, the transmission bottleneck notan important factor, since it is highly unlikely that a sound is so rare in the input that alearner does not acquire it.In order for a phonological inventory to survive transmission, a learner must acquirethe same set of categories as the speaker. The main obstacle to successful transmissionof an inventory is channel bias, which Moreton (2008, p. 87) describes as phonetically-systematic errors in transmission between speaker and hearer, caused largely by subtlephonetic interactions.Moreton contrasts channel bias with analytic bias, a term he uses to refer to cognitivefactors that make learning certain patterns easier or harder. Moreton argues that somepatterns are not explainable without reference to analytic bias, and a complete under-standing of phonology requires considering it along with channel bias. Although I agreein general with Moreton on this issue, for the purposes of this dissertation, I will focusonly on the eects of channel bias. This is because analytic bias seems to be more relevantfor learning phonological patterns involving the interaction of sounds. Moreton himselfdemonstrates the need for analytic bias by discussing vowel-to-vowel height dependenciesand vowel-height-to-consonant-voicing dependencies. On the other hand, I am interestedin the transmission of individual sound categories, which are more likely to be aected bychannel bias.There are many phonetic eects that could be considered channel bias. Co-articulation8can change some characteristics of a sound, such as nasal consonants taking on the placeof articulation of following consonants (Kochetov and Colantoni 2011). Speakers may failto reach an articulatory target, such as when vowels become more centralized in unstressedsyllables, a phenomenon known as undershoot (Mooshammer and Geng 2008). Some-times, two sounds might just be acoustically very similar and easily confusable, such as [f]and [T] (Jongman et al. 
2000).The most important consequence of channel bias is that it introduces variability into theinput of a learner. This variability is the precursor to sound change. Due to channel bias,a particular phonological category will have multiple possible pronunciations, and some ofthese may be dierent enough from each other that a learner of a language incorrectly infersthat they are, in fact, representative of dierent categories. When the learner becomes thespeaker, these dierent categories are re-transmitted to the following generation, cementingthem into the language.One important aspect of channel bias is that it is context-sensitive. Pronunciation doesnot randomly vary from utterance to utterance. The way that a particular sound categorymanifests itself phonetically is inuenced by the kinds of sounds that occur before and afterit. For this reason, a sound change such as nMk' / _[+continuant] is not expected to occurin any language because there is no obvious relationship between the continuant nature ofthe environment in which the change occurs, and the nasal-to-ejective change that is theoutcome. It is highly unlikely that a listener would mistake [ans] for [ak's], for example.On the other hand, a change like nMm / _p is a more natural change, because there is aphonetic connection between the environment (a labial stop) and the outcome of the change(a coronal becoming a labial).A main claim of this dissertation is that inventories, and their typological characteristics,are the result of adaptation to channel bias over many generations of cultural transmission.In other words, channel bias is to phonological inventories what the bottleneck is to morpho-syntax. The sounds that appear in an inventory are those which are the most likely to besuccessfully retransmitted, given the set of environments found in the lexicon, and given anychannel bias that might apply in these environments. Since all humans have roughly thesame articulatory and perceptual systems, it is expected that unrelated languages will besubject to the same kinds of channel bias, and hence similar patterns can arise in languagesall around the world.This type of approach to phonology is sometimes referred to as a diachronic approach,since the main locus of explanation is in the transmission of the language from generationto generation. In an overview article on diachronic explanations in phonology, Hansson(2008) describes them this way:[R]ecurrent sound patterns are the product of recurrent diachronic events(sound changes), which have their ultimate causes in the physical conditionsunder which speaker-listener interactions take place in language use and lan-guage transmission across generations. On this view, voicing is neutralized in9Figure 1.1: Model of sound change through listener misperception, Ohala (1981, p. 182)preconsonantal (as opposed to prevocalic) position not because some constraintto this eect is part of the innate endowment of humans, nor because learnersare predisposed to posit only such constraints as are grounded in phonetics.Rather, languages will show some tendency to acquire such neutralization pat-terns for the simple reason that, in positions where distinctive voicing is hardfor listeners (including learners) to detect, listeners / learners will be liablenot to detect it, erroneously interpreting a preconsonantal voiced obstruent asbeing voiceless and encoding it as such in their mental representation of theword-form in question. 
If and when the pattern caused by such recurring mis-interpretations becomes entrenched, the result is a language with systematicvoicing neutralization precisely in those positions where such neutralization isphonetically motivated. (Hansson 2008, pp.4-5)The most inuential line of work in this area comes from John Ohala (1981, 1983, 1991, 1997,et seq.). Ohala's theory of sound change is based on the idea of listener misperceptions,which occur when listeners acquire something from the speech signal other than whatthe speaker intended. Listeners at some point become speakers, and the misperceivedinformation serves as the basis for producing speech, which in turn becomes input forfuture learners. Ohala's models have much in common with models of iterated learning.This is very clear in the diagram from Ohala (1981) shown in Figure 1.1, which predatesany of the modern formal literature on iterated learning.Note that this model of change crucially relies on a third generation. The learner10Figure 1.2: Emergent stops, from Ohala (1997)who initially misinterpreted the signal has to re-transmit this misinterpretation to a newgeneration. Ohala calls the change without re-transmission a mini-sound change. This isbecause:it would so far only involve one speaker-hearer. However, if this person's speechis copied by other speakers, this mini-sound change could become a regularsound change, i.e. characteristic of a well-dened speech community. (Ohala1981, p. 184)One of Ohala's primary arguments is that listener misperceptions arise in the rst placebecause speech is inherently ambiguous. Figure 1.2 is a diagram from Ohala (1997) thatillustrates how stops can emerge from the ambiguity created by co-articulation. In thetransition between two consonants, total or near-total obstruction of the oral tract mayoccur. The speaker has met the condition to produce a stop at this point, even thoughthis was not the intention of the speaker and there is no underlying stop in their mentalrepresentation at this position in the word. Listeners may interpret this transient closureas belonging to a true stop, and assume that one really does exist in the word , leading toa sound change.Ohala discusses several dierent environments where stops can appear through thisprocess. I mention only two here. The rst is when a nasal is followed by a fricative. Thisis noticeable in English words such as warmth, which may be pronounced as [wOrmT] or[wOrmpT], or length which may be pronounced as [lENT] or [lENkT].The stop emerges as follows, using the word warmth as an example and focusing on thetransition from [m] to [T]. First the oral tract is closed at the lips for [m], but the velum islowered. This is the initial state in Figure 1.2, where line B represents the closed lips andline A represents the open velar port.From this initial state, the velum has to raise and the closure at the lips has to bereleased, with a new constriction formed at the teeth for [T]. In between these two stages,11there is the possibility for the velum to have raised before the labial closure is released.This is the transitional state in Figure 1.2. This creates the conditions for an oral stop, andonce the lips do open, there is a release of air into the fricative which can be mistaken fora stop burst. 
This is represented by the nal state in Figure 1.2.Ohala (1997) points to a few cases of historical change that could potentially be haveresulted from listeners misperceiving the burst as being a true stop, for example, the intro-duction of /b/ between /m/ and /r/ in French, e.g. Latin /kamera/ > French /SAmbK/.A second environment where stops might emerge is between a fricative and a lateral,e.g. [l] and [s]. During the transition between manners of articulation there may be totalclosure formed by the tongue against the sides or roof of the mouth. This would producethe right conditions for a [t] to be perceived. Ohala gives an example from English, whereelse has come to be pronounced [Elts], and another example from Kwak'wala k'weìtsoP tobe feasted from k'weì + soP.Velar palatalization is another sound change that can potentially be explained throughlistener misperception. This is a common change k > tS before front vowels. Guion (1997)investigated the question of why the /k/ is fronted all the way to the post-alveolar place ofarticulation (see also Chang et al. (2001) and Wilson (2006)).Guion (1997) notes that the peak spectral frequency of the burst of a velar stop is relatedto the frontness of the following vowel. Specically, higher vowels result in higher burstpeaks, and the peaks are highest before /i/. Guion compared these to the peak spectralfrequencies of /tS/, which is relatively constant across dierent vowel contexts, but alsohigher than the velar peak in general. The burst of a /k/ is highest before /i/, and soit is in that environment where it is most spectrally similar to /tS/ , which is exactly theenvironment where the sound change tends to occur. Thus, this sound change could haveits origin in misperception.Guion conducted an experiment to further establish whether these sounds are indeedperceptually similar. Participants heard examples of a CV syllable consisting of one of [k,g, tS, dZ] followed by one of [i,a,u], and were given a forced-choice identication task. Asexpected, [ki] was identied as [tSi] more often than [ka] was identied as [tSa].James Kirby has proposed a listener-based explanation for the recent development oftone in Phnom Penh Khmer. Kirby reports that the trill /r/ is being lost in onset clusters,and is replaced by aspiration and a change in f0 contour. Kirby argues that the previouscontrast of CV and CrV has transphonologized into a contrast based on f0 of the vowel.This is supported by perception experiments in Kirby (2014a) where listeners were able touse f0 as cue for distinguishing words that are underlyingly /CrV/ from those which are/CV/.Kirby (2014b) further demonstrates how this kind of listener-led sound change canoccur, this time with a series of computational simulations. Agents in a simulation receiveexamples of words, and their task is to assign each segment in the word to a category.Segments are classied based on four phonetic dimensions.In addition, there is a channel bias that alters the input to agents. The bias has two12simultaneous eects: in a sequence CrV, it reduces the duration of /r/, and it lengthensthe onset of the vowel. 
This bias has a cumulative eect, so the perceptibility of /r/ slowlydecreases over the course of a simulation, while the length of the onset increases.Early in the simulations, agents were able to distinguish /CrV/ words from /CV/ usingthe length of /r/, but as the simulation ran on this became impossible because of the bias.Instead, agents begin using information about the vowel onset because that information hasbecome more salient to them, which is similar to what Kirby (2014a) describes for Khmer.Another inuential model of diachronic phonology is the Evolutionary Phonology model(Blevins 2004, 2006a, Blevins and Wedel 2009), which builds on Ohala's work. The basicpremise behind Evolutionary Phonology is that [p]rincipled diachronic explanations forsound patterns have priority over competing synchronic explanations unless independentevidence demonstrates, beyond reasonable doubt, that a synchronic account is warranted.(Blevins 2006b, p. 23). Common sound patterns are common because they result from com-mon sound changes. Sound changes themselves are the result of articulatory and perceptualfactors hindering perfect language transmission.Much of Blevins' terminology is borrowed from biological evolution, and shares some-thing in common with the iterated learning literature, although her Evolutionary Phonologybook does not cite any of that work. For instance, she has a discussion of adaptations(Blevins 2004, p. 54), which is reminiscent of the concept of selection for learnability,namely that sounds are selected for on the basis of their ability to survive transmission ina particular context:If a contrast between two sounds is just barely perceptible in a particularphonetic environment, its chances of survival in a noisy world are slight. ... Inreconsidering the case of change where [anpa] is heard as [ampa] it makes verylittle sense to compare the sounds [n] and [m] outside the specic environmentin which they occur. In the same sense that the usefulness of claws and toe-pads cannot be assessed outside particular physical environments in which theyoccur, there is no sense in which /n/ is a better or more useful nasal consonantthan /m/ or vice versa. Adaptation occurs with respect to a specic phoneticcontext.Ohala and Blevins present slightly dierent typologies of misperception. Ohala dividesmisperceptions into two types, called hypercorrection and hypocorrection (e.g. Ohala(1992)). Hypocorrection occurs when a listener assumes that a phonetic eect, such asco-articulation, is an intended part of the signal, and internalizes it as such.For example, the amount of aspiration that occurs on a stop depends on the height ofthe vowel that follows it (Hansson 2008, Ohala 1983). This is because in order for voicingto occur, the vocal folds need to vibrate, and this requires a suitable pressure dierentialbetween the oral cavity and subglottal cavity. During the closure phase of a stop, thepressure in the oral cavity builds to become equal with the subglottal pressure, and whenthe stop is released, pressure drops in the oral cavity. How fast this drop happens, and how13long it takes to achieve the right pressure dierential, depends on the height of the vowel, i.e.the size of the oral cavity through which air can escape. 
Higher vowels make for narroweropenings, which slows the drop in pressure, and also increases the turbulence/noisy qualityof the stop burst, which can make the stop sound more aricate-like.If learners hypocorrect, they may infer an underlying aricate in this position, includingpossibly as an allophonic variant of the stop. In fact, numerous languages have phonologicalprocesses converting stops to aricates before high vowels, e.g. Japanese t → tS / _iThe other kind of change, hypercorrection, occurs if a listener erroneously tries to fac-tor out a part of the speech signal. The main example of hypercorrection seems to bedissimilation. For example, in Classical Greek a change has occurred such that labializedconsonants became unlabialized adjacent to rounded vowels, e.g. *lukwos < lukos `wolf'.If hypercorrection is at play, then this change was caused by listeners who assumed thelabialization of the consonants was due to the adjacent rounded vowel, and removed it,creating unlabialized consonants. Hypercorrection is expected to target phonetic charac-teristics which have a relatively long duration (e.g. palatalization, glottalization, but notcontinuancy or arication). Unlike a hypocorrection, dissimilation and hypercorrection arenot likely to eliminate the original triggering environment for the change. This is becausethe listener has to notice this environment in the rst place in order to even make thehypercorrection.Blevins has a three-way typology of misperceptions in her model, calling them choice,chance, and change. Change occurs when the phonetic signal is misperceived by thelistener due to acoustic similarities between the actual utterance and the perceived utter-ance. For instance, a listener might misperceive a [T] as a [f]. The misperception occurs onthe surface, and there is no correction taking place on the part of the listener. In fact, thelistener's underlying form is remaining entirely faithful to the surface form but the surfaceform was not a good representation of what the speaker intended.Chance is a term for when the phonetic signal is accurately perceived by the listener butis intrinsically phonologically ambiguous. The listener associates a phonological form withthe utterance which diers from the phonological form in the speaker's grammar. Blevins'example here is the speaker says /aP/ → [Pa˜P] and the listener hears [Pa˜P] → /Pa/.Choice describes a situation where there are multiple phonetic variants of a singlephonological form which are accurately perceived by the listener. The listener (a) ac-quires a prototype or best exemplar which diers from that of the speaker; and/or (b)associates a phonological form with the set of variants which diers from the phonologicalform in the speaker's grammar. In Blevins' example, the speaker has an underlying form/tuP@laN/ which is variously pronounced as [tuP@laN], [tuP@laN], or [tuPlaN], e.g. there arevarious amounts of schwa that actual appear on the surface. The listener has a choiceabout whether to include the schwa in the underlying form, or factor it out as in irrelevanttransition between the glottal stop and the lateral.Although these diachronic models tend to focus on the role of the listener, the speakeris equally important because the speaker produces the listener's input. The diachronic14model presented in Garrett and Johnson (2012), for example, is one that more explicitlyincorporates the role of the speaker. 
Their model diers from Blevins and Ohala in that itattempts to categorize sound changes based on their underlying mechanisms, rather thanon their outcome. Garrett et al. focus on four specic factors: motor planning, speechaerodynamics, gestural mechanics, and speech perception. Two of these, motor planningand gestural mechanics, are clearly speaker-oriented.Certain kinds of sound changes are more obviously speaker-initiated than others. Oneexample is assimilatory change, which occurs when there is overlap in articulation betweentwo sounds, causing one sound to acquire the features of the other. Consider the nasalizationof vowels before nasal consonants, for instance. To produce the vowel, the oral tract needsto be open to some degree and relatively free of obstruction, and there should be no nasalairow, i.e. the velum should be raised. The postvocalic nasal consonant has conictingrequirements: the speaker needs to close o the oral tract at some place of articulation, andlower the velum for nasal airow. Since the velum cannot be instantaneously displaced, andsince oral closure cannot happen immediately, there is the possibility that the speaker willspend some time with the velum lowered and the oral tract open, which eectively resultsin a nasal vowel.Vowel nasalization tends to occur more often when the nasal consonant follows thevowel, compared to when the consonant precedes it (Chen et al. 2007). This is again dueto articulatory eects. When the consonant follows the vowel, the potential co-articulationhappens as the speaker attempts to open the velar port and close o the oral tract. Whenthe consonant comes before the vowel, however, the potential period of co-articulationhappens as the speaker attempts to close the velar port and open the oral tract. As itturns out, the movement required for velic opening in post-vowel nasals is about 1.6 timesfaster than the movement required for oral opening in post-nasal vowels (Krakow (1994)).The faster speed of the velic movement means there is a greater probability of producing anasal vowel in a VN sequence, compared to a NV sequence, because the velar port is goingto open for the nasal before the oral tract can be closed.This co-articulatory eect has been argued to be the source of historical changes whereoral vowels nasalize before nasal consonants, becoming full-edged phonemes, such as oc-curred in some Romance languages (Recasens 2014). It is also common for many languagesto have allophonic nasalization of vowels before nasal consonants (Schourup 1973), and thistoo probably developed from misperceptions arising from co-articulation.However, this articulatory timing is not universal, so nasalization of vowels adjacentto nasals is not universal either. Butcher (1999) studied the articulation of speakers ofAustralian languages in the Arandic, Lake Eyre, and Yura groups. He found that thesespeakers have systematically dierent timing in the raising and lowering of their velum,compared to speakers of English. In particular, the Australian speakers showed much moresudden changes in the state of their velum, which meant that nasality hardly spread at allinto adjacent vowels. Butcher further suggests that this particularity in articulation is theorigin of pre-stopped nasals in these languages.15The aerodynamic voicing constraint (Ohala 1983) is another example of how articulationcan play a role in sound change. The constraint refers to the requirements for modal voicing:there must be air owing through the vocal folds, which need to be tensed. 
This presents aproblem for voiced stops. By their nature, stop consonants cause air to accumulate in theoral cavity, and the dierence in air pressure above and below the glottis begins to equalize.At a certain point voicing becomes impossible. Voiced fricatives are also aected by thisconstraint because frication requires the air pressure in the oral cavity to be greater thanatmospheric pressure. This creates a conict: for voicing oral air pressure needs to be low,for frication oral air pressure needs to be high.How does the aerodynamic voicing constraint factor into an explanation of sound changethrough misperception? The argument would run as follows: the production of any voicedstop or voiced fricative inherently puts it in conict with this constraint. To maintain voic-ing, oral air pressure needs to be reduced somehow, and Ohala discusses several ways thiscan be achieved, including expansion of the cheeks, lowering of the larynx, or venting someof the accumulated air through the nose. On some occasions, these strategies can lead tospeakers producing speech with characteristics dierent than intended, which listeners will(wrongly) assume are intended characteristics. This makes the constraint dierent fromco-articulation, because it is not entirely context-dependent. There is an inherent dicultyin voicing an obstruent, regardless of its position in a word.Ohala (1983) provides a list of 12 potential implications this has for sound change andinventory typology, such as the fact that voiceless stops and fricatives are more commoncross-linguistically than their voiced counterparts. In P-base (Mielke 2008), for example,97% of the languages have at least one of /p,t,c,k,q,f,s,S/ whereas only 83% of languages inthe database have a voiced version of any of those.As another example, Ohala argues that implosives developed in Sindhi from geminatevoiced stops. The length of a geminate means it is even more at risk of becoming voicelessthan a singleton voiced stops. Oral air pressure must be kept low for an even longer periodof time through some means. Ohala proposes that this was done through larynx lowering,which listeners misinterpreted as implosion.Another articulatory eect that plays a role is gestural reduction. Lin et al. (2014)looked at the role of gestural reduction in the production of the English lateral /l/. Alveolarlaterals have two lingual constrictions, one anterior and one dorsal. The degree of anteriorconstriction in laterals varies with the phonetic context and between speakers, in some casesachieving no apical contact at all. In English, /l/ is especially likely to be reduced in thecontext of V_C, where C is non-alveolar. For instance, the /l/ in help or elk undergoesmore reduction than the /l/ in melt. This may be partly due to homorganicity, and Lin etal. nd that the anterior constriction is less reduced when the tongue tip would be makinga contact at that place for the following sound anyway.This reduction is one reason underlying a change currently underway in some varietiesof English, where /l/ loses its anterior constriction entirely, and vocalizes to /w/ or /u/ (dueto the dorsal constriction). Lin et al. report that in dialects where this change is underway,16it is more advanced in pre-labial and pre-velar contexts, which is expected given the greaterlikelihood for gestural reduction in that environment. The loss of /l/ has already occurredbefore /k/ in some words, though it is still preserved in the orthography, e.g. 
walk, talk, balk, etc.

1.3.1 Summary

To summarize the general model of sound change that has just been presented: Languages must be successfully transmitted from speaker to learner through the medium of physical speech in order to survive over time. There are numerous factors involved in articulation and perception, so-called "channel bias" (Moreton 2008), that impede successful transmission. In addition, the listener cannot necessarily know the intentions of the speaker, or if the signal has been changed in any way. This creates the possibility that learners can misperceive some aspect of the speech signal. When learners eventually become speakers, this misperceived element is then re-transmitted to the following generation, making it part of the language (Ohala 1981, Blevins 2004).

The term misperception in this context is intended to be neutral with respect to the actual source of the change. It could be due to perceptual or articulatory factors. The key point is that learners have acquired a language from the input that differs from the language that generated the input, but they do not realize they have done so.

The next step of the dissertation is to describe a computer simulation based on this framework of sound change.

1.4 An ILM for phonology

1.4.1 Overview

Simulating the evolution of consonant inventories requires simulating multiple, potentially interacting, sound changes over multiple generations. The following is an overview of PyILM, the simulation software designed for this dissertation.[1] Algorithm 1.2 represents g generations of transmission, with each agent learning from d words.

At the core, this is just a model of the transmission of sound strings. There are no morphological or phonological processes, and agents can always recover the intended meaning of a word. Full technical details, including descriptions of various algorithms, are given in Chapter 2. In the following sections, I will focus more on the higher-level conceptual details and theoretical assumptions that went into building PyILM.

[1] The simulation is an Iterated Learning Model (ILM) written in the Python programming language. The name PyILM follows a convention of the Python community of prepending "py" onto the names of packages or programs. The intended pronunciation is [pai.El.'Em].

Algorithm 1.2 Generalized Iterated Learning Model for phonology

    Generate a speaking agent with a lexicon
    Generate a learning agent with no lexicon
    Loop g times:
        Loop d times:
            The speaking agent produces a word from the lexicon
            Misperception may alter some phonetic values in the word
            The learning agent assigns each sound in the word to a known phonological category
            If no known categories match, then a new one is created.
        The speaker is removed from the simulation.
        The learner becomes the new speaker.
        A new learner is added into the simulation.

1.4.2 One turn of a PyILM simulation

To understand how the simulation is intended to work, it is useful to give an overview of one iteration of the simulation. This consists of the speaker producing a word, misperceptions possibly occurring, and the listener learning something.

1.4.2.1 Production

The turn begins with the speaking agent selecting a word to produce from the lexicon. Words in the lexicon are represented as strings of phonological categories (i.e. segments). These categories are, in turn, represented as a list of binary features of length F.
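To make this representation concrete, the following is a minimal sketch, not PyILM's actual code, of one simple way such a lexicon could be laid out. The variable names and the use of a dictionary from feature names to "+"/"−" signs are assumptions made for illustration; only the four feature names are taken from the example in the next paragraph.

    # Illustrative sketch only: a word is a string of segments, and a segment
    # is a specification of binary features (here F = 4 features).
    FEATURES = ["continuant", "nasal", "voice", "sonorant"]

    # One segment = one mapping from feature names to "+" or "-".
    b_segment = {"continuant": "-", "nasal": "-", "voice": "+", "sonorant": "-"}
    a_segment = {"continuant": "+", "nasal": "-", "voice": "+", "sonorant": "+"}

    # A lexicon pairs arbitrary meanings with strings of such segments.
    lexicon = {
        "meaning_1": [b_segment, a_segment],             # something like /ba/
        "meaning_2": [a_segment, b_segment, a_segment],  # something like /aba/
    }

During production, each of these feature specifications is converted into concrete phonetic values, as described next.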
For each category (segment) in a word, a production algorithm generates an array of length F, where the nth element is a real number in [0,1], representing a phonetic value for the nth phonological feature. For example, assuming a very simple simulation with four features, [continuant, nasal, voice, sonorant], an instance of the category /b/ might be represented as [0.05, 0.30, 0.95, 0.04]. In an actual simulation, these numbers are determined by sampling a (truncated) Gaussian distribution for each feature. The distributions are inferred during the learning phase, except for the initial agent in a simulation, who is seeded with a set of distributions.

This process of generating lists of phonetic values is done for each sound category in the word, and then the resulting list is sent to the misperception function.

1.4.2.2 Misperception

Misperceptions in PyILM are modeled as probabilistic context-sensitive rules (which includes the null context, i.e. context-free rules). They target sounds that have particular phonological features and which exist in specific contexts, which are themselves defined in terms of features. Misperceptions may also refer to word boundaries. The effect of a misperception is to change phonetic values. An example of a final-devoicing misperception would be represented as:

    [+voice, −son] → [−.15voice] / _#, p=.3

In prose, this reads as "on any given utterance, there is a 30% chance that voiced obstruents have their [voice] value reduced by .15 when they occur in word-final position". The idea is that a misperception changes the surface phonetic value of a sound such that it becomes more likely the listener will categorize it as having the opposite underlying feature value.

Here I wish to emphasize again that the term "misperception" is intended as a cover term for any kind of effects that could occur during either production or perception, which might substantially affect what a learner infers about a language. Determining the origin of a sound change is important in understanding specific changes in actual natural languages, of course, but in a simulation the distinction is irrelevant. The key point is that something disrupts transmission and a learner has the potential to infer a sound that the speaker did not intend.

A list of misperceptions must be provided as input to the simulation. If none are provided, then no sound changes will occur, and simulation is pointless. The number and type of sound changes that can occur in a particular simulation, therefore, is limited. This is unrealistic, but it is an unavoidable constraint that comes with computer simulation; there must be a finite set of parameters to simulate. In any case, it is probably unfeasible to draw up a list of all possible misperceptions. PyILM could be considered an "ideal world" simulation (cf. the "ideal observer" of James Kirby (2014b)) where we know everything there is to know about what kinds of misperceptions are possible.

An alternative, and more complex, way of modeling misperceptions would be to simulate the vocal tract and auditory-perceptual systems of the agents in detail, and allow misperceptions to arise naturally from the way these systems work. Such a simulation would most certainly be useful in an effort to support the position that misperceptions arise from phonetic factors. My aim for this dissertation, however, is not to explain how or why misperceptions occur. I simply assume that they do occur, and I am interested in their long-term consequences on the structure of inventories.
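As a concrete illustration of the production and misperception steps just described, here is a minimal sketch. It is not PyILM's implementation: the function names, the toy feature set, and the use of simple clipping to keep sampled values in [0,1] are assumptions made for the example; only the rule [+voice, −son] → [−.15voice] / _#, p=.3 is taken from the text above.

    import random

    FEATURES = ["continuant", "nasal", "voice", "sonorant"]

    def produce(segment_means, sd=0.05):
        """Sample one phonetic value per feature from a Gaussian, clipped to [0,1]."""
        values = []
        for mean in segment_means:
            v = random.gauss(mean, sd)
            values.append(min(1.0, max(0.0, v)))  # crude stand-in for truncation
        return values

    def final_devoicing(word_tokens, p=0.3, amount=0.15):
        """[+voice, -son] -> [-.15voice] / _#, applied with probability p."""
        last = word_tokens[-1]
        voice = last[FEATURES.index("voice")]
        son = last[FEATURES.index("sonorant")]
        if voice > 0.5 and son < 0.5 and random.random() < p:  # word-final voiced obstruent
            last[FEATURES.index("voice")] = max(0.0, voice - amount)
        return word_tokens

    # Mean phonetic values for /b/ and /a/ on [cont, nasal, voice, son]
    b_means, a_means = [0.05, 0.05, 0.95, 0.05], [0.95, 0.05, 0.95, 0.95]
    word = [produce(b_means), produce(a_means), produce(b_means)]  # something like /bab/
    word = final_devoicing(word)  # the final segment may now sound closer to [p]

Nothing in this sketch depends on why the devoicing happens; it only models the effect on the signal that the learner receives.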
It suffices for these purposes to simulate the effect of misperception, rather than the cause. For an example of more complex modeling of physiology, see Oudeyer (2005a, 2005b, 2005c) on the evolution of phonotactic constraints.

In other words, the sound changing rules of PyILM (misperceptions) are intended as useful abstractions that capture the spirit of how sound change is thought to work. They allow for a wide range of different sound changes to be simulated (including context-free ones). All elements of misperception are open to modification by the user and any number of misperceptions can be active in a given simulation run.

For example, both hypercorrection and hypocorrection can be simulated through the use of these rules. An example that Ohala (1983) gives of hypercorrection, where the learner erroneously factors out some aspect of the signal, is the unrounding of stops before rounded vowels in Greek, /lukwos/ > /lukos/. To simulate the possibility that agents might hypercorrect in this situation, a rule such as [−continuant, +round] → [−.15round] / _[+voc, +round], p=.1 would be included in the simulation.

A hypocorrection, where a learner fails to account for a phonetic effect and assumes it is inherent to the signal, would be exemplified by pre-consonantal neutralization of a voicing contrast. This could be represented in PyILM with a misperception such as [−sonorant, −vocalic, +voice] → [−.15voice] / _[−vocalic, −sonorant], p=.1

1.4.2.3 Learning

In the final stage of a simulation turn, the misperception function sends the list of phonetic values to the learner. The learner receives this as a list, so the ability to parse speech into segment sized units is assumed. Learning is done using an exemplar-based model, where agents keep detailed representations of experienced events (Pierrehumbert 2001, Johnson 2007). For each sound in the word, a learner stores the phonetic values in memory, then attempts to categorize the sound by comparing these values to all other known categories. If any of them meet a threshold for similarity, then the input sound is placed in that category. Otherwise, a new category is created, the input sound becomes its sole exemplar, and future learning can be influenced by this category.

At the beginning of the learning phase, agents do not know any categories at all, so the first sound that is experienced becomes the first category, and all others are built up from there. Categorization is done on the basis of phonetic similarity. Two sounds are considered to be instances of the same category if they differ by less than some value (the particular threshold is determined by a simulation parameter that can be set by the user). At the end of learning, agents infer a Gaussian distribution from the observed phonetic values, for each feature, for each category they have created. This distribution is what will be sampled by the production algorithm when the agent becomes the speaker.

1.4.3 Some notes on design

1.4.3.1 Social factors

PyILM is not intended to model all possible types of sound change. It focuses specifically on changes related to production and perception. Another major cause of change, which is not simulated, is contact between dialects or languages.

Contact can lead to one language borrowing words which contain sounds not found in the native inventory. For instance, some Bantu languages are known to have acquired clicks by borrowing from neighbouring Khoe-San languages (Güldemann and Stoneking 2008).
Other times, however, borrowing words leads to no changes at all: English has notacquired uvular fricatives or front rounded vowels, despite borrowing numerous words from20French which contain these sounds (e.g. hors d'oeuvre, objet d'art, maître d' ). Instead,those words have undergone adaptation to the English sound system.Borrowing is, in a sense, more arbitrary than sound change based on phonetic factors.Borrowing depends on coincidences of contact between cultures, and how the borrowingactually plays out depends on similarities between the sound systems of two languages,as well as various socio-linguistic factors. The frequency of borrowing, and its eect oninventories, over the history of a language is sporadic.It is eectively impossible for languages to avoid phonetically-based change, but changethrough contact is avoidable. The people living on North Sentinel Island represent theextreme case of contact-avoidance. Inhabitants of the island are hostile to outsiders, andoccasionally kill people who come too close (McDougall 2006). It is unlikely that theSentinelese have borrowed many words (at least not recently), but it is quite likely thattheir language has undergone some kind of sound change in the last several generations.Phonetic changes are more of a constant factor across time and across languages. Theydepend on factors related to human speech production and perception. We can expectphonetic factors to inuence unrelated languages in similar ways, leading to cross-linguistictendencies.1.4.3.2 Single-agent transmission changeAnother type of change not simulated is what Labov (2007) calls diusion. This refers tolanguage changes that occur when mature speakers of a language adopt the speech habitsof a dierent group of speakers. Labov contrasts this with the term transmission to referto language changes that occur as language is being passed from mature speaker to learner.PyILM is strictly a transmission chain, with a single learning agent and a single speakingagent at each generation.Diusion-chain versions of cultural transmission models do exist (Mesoudi and Whiten2008, Whiten and Mesoudi 2008), and have even been applied specically to the study oflanguage (Smith and Wonnacott 2010). Along the same lines, Griths and Kalish (2007,Section 7, p.470) show how the mathematics of their single-agent model generalizes to largerpopulations (though this is not strictly related to diusion chains).Modeling transmission or diusion requires dierent design choices, because the fun-damental factors driving language change are dierent in each case. Transmission-chainmodels represent an acceptable level of simplication, given the goals of this dissertation.A more nuanced outcome could be achieved by combining diusion and transmission in asingle simulation. This would allow the nature of the input to the learner to vary more asthe speaking agents possibly change their behaviour throughout the speaking phase.211.4.3.3 Discrete learning periodIt is common in agent-based simulations for there to be a specic learning phase, after whichagents can no longer learn anything new. This approximately simulates the real-worldsequence of events where one's ability to learn language is greatest as a child, commonlyknown as the critical period (e.g. Newport et al. 2001, Pallier 2007), and slows down withage. Having a sharp, and arbitrary, cut-o point is a simplication for the purposes ofcomputer simulation.More broadly, research has found that the way people speak continues to change over thecourse of their life. 
For instance, Harrington et al. (2000a, 2000b, 2005, 2006) studied theChristmas broadcasts of Queen Elizabeth II taken from a period of roughly 30 years. Theyfound evidence of a change in vowel pronunciation, in particular that the Queen's vowelswere becoming more like those of Standard Southern British speakers. In another case,Sanko and Blondeau (2007) found a change in the pronunciations of the rhotic consonantof Montreal French, with some adult speakers moving from an apical /r/ to a dorsal /;R/.1.4.3.4 No teleologySound change occurs for entirely non-teleological reasons in this model. The only thingthat happens is that learners learn from the input. They make no assumptions about whatsound systems should look like. In contrast, it is common in other models to give agentsadvance knowledge about the sound system. For example, James Kirby (2014b) describesa computational ILM for simulating a change currently underway in Khmer, where anaspiration contrast (e.g. /ka/ vs. /kha/) is being replaced by a tonal contrast (/ka/ vs./ká/). The overall design of Kirby's simulation shares much in common with PyILM, butthere is a crucial dierence in that Kirby's agents are modeled as ideal observers which isa type of Bayesian classier (see Geisler 2003). These classiers require a prior probabilityfor each phonetic category, in order to compute the probability that an input sound belongsto that category. This eectively means that agents have foreknowledge about which andhow many possible sound categories could exist in the language.Feldman et al. (2009) also use a Bayesian model for learning sound categories by learninga lexicon. In this model, learners know in advance that there are exactly 4 sound categoriesin the inventory. Kirby and Sonderegger (2013) consider the iterated learning of 2 vowels(not full words). Vowels are represented simply by an F1 value, and the distribution of F1values is known to all agents, and does not change over time.Along the same lines, in some models learning is done by selecting between a limitednumber of choices. Tupper (2015) gives a mathematical model for the conditions underwhich two vowels might merge over time, but considers only a case where agents knowinglychoose between /i/ and /e/.Wedel's (2007) computational model has a small set of underlying sounds that map toexactly one surface sound, and the special category /x/ which has two possible allophones.22The learner must select one of the allophones as an underlying form for /x/, and Wedelshowed how dierent types of learning error resulted in dierent outcomes.Advance knowledge of possible sound categories might be useful for a simulation of thechange from one specic inventory state to another, but it is undesirable for a general modelof inventory change. Sound inventories need to be free to change, grow, and shrink withina relatively large space of possibilities. In PyILM simulations, there are no pre-determinedsound categories at all. Agents simply build up categories based on the the informationavailable to them in the input.In this respect, the learning algorithm has more in common with de Boer (2000, 2001,2002). Agents in his simulations play an imitation game, where a speaking agent producesa single vowel and a listening agent tries to imitate it. de Boer's simulations use supervisedlearning, meaning that listeners are given feedback about how well they imitated, and theyuse this information to place prototypes into a vowel space. PyILM, in contrast, usesunsupervised learning where no feedback is given. 
Agents in de Boer's simulations make noassumptions in advance about how many vowels there might be in a language. The nalnumber and type of vowels depends on the interactions between agents and the success ofindividual imitations. Vowel systems ranging from three to nine vowels emerged from deBoer's simulations.1.4.3.5 Phonemes and allophonesSounds are represented at two levels in PyILM. At a surface/phonetic level, sounds arevectors of real numbers. At an underlying/phonological level, sounds are lists of binaryfeatures. A simulation starts by generating a set of these categories for the rst agent.First underlying categories are created, and then a distribution of phonetic values, forsampling during the production phase, is generated for each of these categories.New sounds that are introduced through sound change are, at least initially, limited toa particular context (due to the context-sensitive nature of the misperceptions that giverise to them). These new sounds are considered as allophonic variants of whichever soundthey grew out of. For example, if /b/ lenites to [v] between vowels, and this is the onlyinstance of [v] anywhere in the lexicon, then [v] will be considered an allophone of /b/. Astime goes on, these allophones eventually cease to vary with another category, and attainthe status of a phoneme. This transition, from misperception to allophone to phoneme, isintended to parallel the real-world process of phonologization, where an initially phoneticeect eventually becomes a xed part of a phonological system (Bermúdez-Otero 2007).These categories have no bearing on the outcome of a simulation. Agents are not awareof what is a phoneme and what is an allophone, and neither the production algorithm northe learning algorithm ever specically reference these categories. Within a simulation,everything is considered to be a segment. The categorization of a sound as a phoneme orallophone is done at the end of a simulation, as a tool for understanding how sounds aredistributed in a lexicon.23This issue will be discussed in more detail in Chapters 3 and 4 along with specicsimulation results.1.4.4 Expected outcomes and inventory structureThe set of misperceptions that is supplied to a simulation acts like the bottleneck in Kirby's(1998, 2000, 2002) simulations of syntax. They are the main constraints preventing thesuccessful transmission of sounds over time. By the end of a simulation, the expectationis for an inventory to have whatever set of sounds is least likely to be aected by thesemisperceptions, given the set of phonetic contexts in the lexicon.In simple cases, it is even possible to predict what the outcome will be before runninga simulation. Consider the simplest situation when only a single misperception is actingon transmission. For discussion purposes, assume there is a nal-devoicing misperceptionwhich sometimes makes word-nal voiced obstruents less voiced than they are in otherpositions. Suppose that two simulations are run, each starting with a lexicon generatedfrom the sounds /b, d, g, i, a/. The dierence now is that one simulation has a lexiconwith only V or CV syllables, while the other allows up to CVC syllables.The nal inventory of the CV language is easy to predict in this case: it will be /b, d, g,i, a/, i.e. it will not have changed, since the relevant environment for devoicing does not existin the lexicon. 
On the other hand, the CVC language might develop an inventory as largeas /p, b, t, d, k, g, i, a/, depending on the specic contexts in the lexicon and how oftenmisperceptions actually occurred during the simulation. For instance, if the distribution of/b/ in the initial lexicon was restricted to nal position, then all instances of /b/ are proneto misperception, and there is a low probability of /b/ still existing in the nal lexicon, andhigh probability of /p/ existing in at least one word. If no words in the initial lexicon endedin /g/, then there is no reason for /k/ to be in the nal lexicon. Predicting the outcomebecomes more dicult, or impossible, with a larger number of interacting misperceptionsadded to the simulation.Selection for learnability occurs as sounds from the original inventory of the languagechange due to misperception. If a sound cannot reliably be learned, given the environmentsin the lexicon and the set of misperceptions acting on transmission, it will probably notsurvive the entire simulation. The nal inventory is one that is in a sense optimized for itsown transmissibility.This simple example also demonstrates how patterns in inventories can be derived non-teleologically by modeling only individual sound changes. If we could watch the CVClanguage generation-by-generation, we would observe what Martinet (1952, 1955) calledgap lling. Assume that all voiced stops appear in both initial and nal position inthe very rst lexicon. After some number of generations, one of them would devoice dueto misperception in nal position, creating a stop inventory of, say, /b, t, d, g/. Laterdevoicing misperceptions could change the inventory to /p, b, t, d, g/, and this createsan apparent gap at the velar place (although in this hypothetical example the gap is the24opposite of the one normally found in natural languages, where /g/ is more likely to bemissing). Eventually, misperception will ll in that gap for a full stop system of /p, b, t,d, k, g/, but at no point did agents intend for this to happen. They simply learn from theinput, and a full suite of consonant pairs is a side-eect of the way that misperception isaecting the learning data.1.5 SummaryThis chapter introduced the concept of selection for learnability. This refers to the ideathat certain linguistic patterns tend to persist over time because they are more likely tobe successfully transmitted to the following generation. This concept was rst developedfor the study of syntax and morphology by Kirby (2000, 2001, 2002) who showed thatcompositional morpho-syntax can emerge over time because it is more learnable, given alimit on the number of sentences that a learner can have access to.I proposed that phonological inventories are also selected for learnability, although in adierent way. Learners are not constrained by any bottleneck on transmission (since thenumber of phonological categories in a language is nite and relatively small in number).Instead, the strongest inuence on inventories comes from channel bias (Moreton 2008),which refers to phonetic eects like co-articulation or acoustic similarity. Channel biasintroduces variability into the input of a learner, and this variability is the precursor tosound change. 
If learners misperceive some aspect of speech due to channel bias, then this misperception gets retransmitted to the following generation when learners become speakers.

For the purposes of this dissertation, I use the term misperception very broadly, and it refers to any kind of sound change that occurs because a learner infers a sound system that differs from the one used by the speakers of the language at the previous generation. This notion of misperception is in turn taken from the models developed by Ohala (1981, 1983, 1992, et seq.) and Blevins (Blevins 2004, 2007).

I implemented the basic assumptions of these diachronic models into a computer simulation called PyILM, which will be used to investigate three aspects of phonological inventories: their size, the relative frequency of their segments, and their feature economy. The following chapter provides more specific technical details of how PyILM works.

Chapter 2

PyILM

2.1 Introduction

This chapter details PyILM, a computer program written in Python for simulating language transmission, with a focus on phonology. PyILM's design is informed by theories of sound change through misperception (e.g. Ohala (1983), Blevins (2004)), and its formal implementation is based on the Iterated Learning Model (e.g. Smith et al. (2003), Kirby (2001)). These topics were discussed in detail in the previous chapter.

PyILM simulates the transmission of a lexicon over multiple generations. It creates agents arranged in a chain, each of which learns a phonological system from the output of the previous agent. PyILM allows users to manipulate numerous parameters of this process and run iterated learning simulations to explore how sound change happens under different conditions. Section 2.2.2 of this chapter gives a complete list of the parameters that a user can set. Section 2.4 gives some more details on how to use and configure a simulation.

2.1.1 Iterated Learning Models

Computational models of iterated learning, e.g. Kirby (1999, 2001), follow this basic pattern of nested loops:

    Generate a speaking agent
    Generate a learning agent
    Loop x times:
        Loop y times:
            The speaker produces an utterance
            The learner learns from this utterance
        Remove the speaker from the simulation
        Make the learner the speaker
        Create a new learner

This pattern represents x generations of language transmission with y learning items at each generation. The corresponding loops for PyILM are given in Algorithm 2.1. These loops are explained in pseudo-code, which I will continue to use throughout the chapter to explain the logic of PyILM. Pseudo-code is text consisting of valid Python expressions, most of which appears as actual lines of code in PyILM. However, some of the code has been changed to make it more readable in the context of a dissertation, hence the name "pseudo-code".

Algorithm 2.1 Main simulation loop

     1  Simulation.load("config.ini")
     2  speaker = BaseAgent()
     3  Simulation.initialize(speaker)
     4  listener = Agent()
     5  for generation in range(Simulation.generations):
     6      for j in range(Simulation.words_per_turn):
     7          word = speaker.talk()
     8          word = Simulation.transmit(word)
     9          listener.listen(word)
    10      Simulation.record(generation)
    11      speaker.clean_up()
    12      speaker = listener
    13      listener = Agent()
    14  Simulation.generate_output()

Line 1 loads user-provided details about the simulation and configures PyILM appropriately.
Line 2 creates a new agent for the rst generation of a simulation, and Line 3seeds it with an initial lexicon and inventory. Line 4 creates a new blank listener who willlearn her language from the speaker. To be clear, this initialization phase is not intendedto represent any actual events in language transmission. PyILM only simulates the evolu-tion of language in the sense that it simulates how languages change over time; it does notsimulate the emergence of language from non-language. The rst speaker in the simulationrepresents some speaker at some point in the history of some language. Note that the rstspeaker is formally a dierent kind of object in the program than the other speakers, sincethe rst speaker requires a set of initialization functions, while later generations rely onlearning algorithms.Line 5 starts a loop that runs once for each generation being simulated (see sec-tion 2.2.2.2). Line 6 starts a loop that runs once for each word a learner hears.In line 7 a speaker chooses a word to say (see the production algorithm, section 2.3.4).Line 8 simulates misperception by changing some of the segments of the word (see sec-tion 2.2.10). Misperceptions are context-sensitive, and probabilistic, so sometimes nothing27at all happens on this line. On line 9 the learner learns from this new word (see the learningalgorithm, section 2.3.1).On line 10, PyILM keeps a record of what the speaker's inventory and lexicon look likebefore the speaker is removed from the simulation on line 11. On line 12, the new speakercreates some probability distributions to be used during the next production phase. Line13 generates a new listener for the next loop to start over again.Line 14 is executed only after the nal generation of the simulation has nished learning.It prints a report of what happened during the simulation, using the information logged ateach generation on line 10. The program then terminates.2.2 Objects2.2.1 OverviewPyILM was written using an object oriented approach to programming. An object is wayof representing a concept in a computer program. In the case of PyILM, objects representconcepts relevant to sound change or phonology. Objects have attributes which representproperties or characteristics of the objects and their values may be xed or mutable. Objectsalso have methods which describe what the object can do. An example of an object is theFeature object, which represents the concept of a distinctive phonological feature. Featureobjects have two attributes: name and sign. These are both strings, with name having avalue of something like voice or continuant and sign having a value of either + or -.They also have an equal_to method, which is used to decide if two Feature objects are thesame or not.This example also illustrates two typographical conventions adopted in this chapter.First, the names of objects in the computer program are written with an initial capitalletter to distinguish them from the use of the same words to refer to concepts in linguistics,e.g. the Feature object vs. distinctive feature. Second, the typewriter font is used whenreferring to object attributes and methods.This section explains the main objects in the simulation, as well as their relevant at-tributes. Details about object methods are, for the most part, omitted here because theyare generally not of relevance for understanding how the simulation works. One exceptionto this is the Agent object, which has methods for speech production and learning thatare important to understanding the simulation. 
Examples of omitted methods include:methods for making equal to and not equal to comparisons, methods for generating stringrepresentations, and methods for reading and writing to les.There are nine objects discussed in this section: Simulation, Word, Segment, Feature,FeatureSpace, Sound, Token, Agent and Misperception. To understand the relationshipbetween them, it is useful to think of the objects in PyILM as being stacked inside oneanother. The diagram in Figure 2.1 is a visualization of this. Note that the gure is not28Figure 2.1: The objects of PyILMa description of object inheritance - it is a visualization of how the objects t togetherconceptually.Every run of the simulation creates a new Simulation object, inside of which there areAgent objects that talk and listen to each other. Agents all have a lexicon attribute whichcontains Words, which are made up of Segments, which are made up of Features.A speaker uses a production algorithm to transform Segments into a dierent kind ofobject called a Sound, representing an actual speech sound instead of a unit in the mentallexicon. The phonetic characteristics of speech sounds are represented by objects calledTokens.The simulation passes Sounds through a misperception function, which may alter someof their Tokens' values, depending on the environment in which they occur. Then theseSounds are sent to the listener's learning algorithm which creates new Features and Seg-ments.The transmission of a single segment is illustrated in more detail in Figure 2.2. Aspeaker starts by selecting a word from the lexicon. The lexicon is a list of meanings, eachassociated with a string of segment symbols, each of which are translated into a set ofphonological features. All possible features that can be discriminated are kept together ina FeatureSpace object, where each feature is represented as an interval [0,1]. Figure 2.229shows only three feature dimensions (F1, F2, and F3), with each dimension represented asa number line. The points circled with solid lines are the range of values that represent[−feature] segments, and the points circled with dashed lines are the [+feature] values. Thelines in black represent the values a speaker experienced during their time as a learner. Thelines in red represent the particular phonetic values that the production algorithm choseon this occasion. Supposing that F1is [voice], F2is [continuant] and F3is [nasal], then thesegment is [+voice, −continuant, −nasal]. This could be represented as /b/ (among othersymbols).Figure 2.2: The transmission of a phonological segmentThe values chosen by the production algorithm are then sent to the misperceptionfunction. In this example the environment was right for a misperception to occur, and thelistener is going to hear the F1value - the voicing value - of this segment as lower thanintended by the speaker.The red lines in the learner's FeatureSpace represent where the values were stored. Ingure 2.2 the learner has already heard a number of words, and has formed some phono-logical categories. The boundaries will continue to shift over the course of learning.The values for F2and F3are interpreted correctly by the listener, that is, her learningalgorithm categorizes them as examples of the same phonological feature value as in thespeaker's FeatureSpace. Due to the misperception, F1is categorized as an example of theopposite feature value.302.2.2 Simulation2.2.2.1 OverviewEach time PyILM is run, a single Simulation object is created, and the entire simulationruns inside of this object. 
The Simulation object has methods for initializing simulationdetails, creating and removing agents from the simulation, causing agents to speak or listen,causing misperceptions to occur, and writing the results of the simulation to le. None ofthese methods are discussed here. Details about Agents (2.2.9) and Misperceptions (2.2.10)can be found in their own sections.The Simulation object has 10 attributes representing factors relevant to cultural trans-mission. These are discussed here, and it is possible for users to set the value of any ofthese attributes. See section 2.4 for details on how to do this.2.2.2.2 generationsThe value of this attribute determines how many generations of listeners the simulationshould run for. The default value is 30. This means the simulation stops after the 30thlearner has nished learning.2.2.2.3 initial_lexicon_sizeThis value of this attribute sets the number of words in the initial lexicon. The defaultvalue is 30. The words of the initial lexicon are created using the invention algorithm (seesection 2.3.5).2.2.2.4 initial_inventoryThis attribute controls the size and contents of the initial segment inventory of the simu-lation. The value supplied to should be a list of segment symbols separated by commas.For instance, p,t,k,b,d,g,f,s,h,m,a,i,e would be an acceptable starting inventory. The setof symbols used should correspond with symbols in a feature le (see the features_fileattribute in section 2.2.2.9).If some degree of randomness is desired, then the value supplied to initial_inventoryshould be two numbers separated by a comma. The rst number represents the number ofconsonants and the second the number of vowels. There must be at least one consonantand at least one vowel in every simulation. The segments are randomly selected from thefeature le (section 2.2.2.9). The simulation determines what is a consonant and what isa vowel by checking the value of the [vocalic] feature : [+voc] segments are called vowelsand [−voc] segments are called consonants.If no initial inventory is supplied, then the default value is 10 random consonants and3 random vowels.312.2.2.5 minimum_repetitionsDuring the production phase, the speaking agent will produce every word in the lexicon atleast minimum_repetition times. For example, if set to 2, then every word will producedat least twice. The default value is 1, and it cannot be set any lower.Words in a lexicon are grouped into frequency blocks, with the rst block containingwords that are produced exactly minimum_repetition times. Each successive block that iscreated has a frequency of twice the previous block. Doubling the frequency approximatesa Zipan distribution of words in natural language (Yang 2010). This blocking is donerandomly for the rst generation in a simulation. If any words are invented during thesimulation (see section 2.2.2.11) they go into the least-frequent block.In natural language, a similar Zipan distribution holds of individual words: the mostfrequent word in a language is about twice as frequent as the next one, which is is twiceas frequent as the next, and so on. Early testing with PyILM found that implementingfrequency distributions on a per-item basis resulted in simulations that ran for far too long.By grouping words into frequency blocks, each of which is twice as frequent as the next,the running time of the simulation is greatly improved while still maintaining a Zipf-likedistribution.2.2.2.6 min_word_lengthThis value sets the minimum number of syllables a word must have. 
The default value is 1, and it cannot be set lower.

2.2.2.7 max_word_length

This value sets the upper bound for the number of syllables a word can have. The default value is 3, and it must be equal to or greater than min_word_length. These min and max values are used by the agent's invention algorithm when generating new words (see section 2.3.5).

2.2.2.8 phonotactics

The phonotactics attribute is used by an agent's invention algorithm (see section 2.3.5) for creating new words. Invention happens in every simulation run to generate the lexicon of the very first agent. Invention otherwise only occurs if Simulation.invention_rate is set greater than zero (see section 2.2.2.11).

The value supplied to this attribute should be a single string that consists of only the letters "C" and "V". This string represents the maximal syllable structure. By default, PyILM assumes that all possible sub-syllables should be allowed. For instance, if the value supplied is "CCVC", then the set {V, CV, CCV, VC, CVC, CCVC} will serve as the set of possible syllables. The simulation determines which segments are consonants and which are vowels by looking at the segment's value of [vocalic]. [−voc] are treated as "C" and [+voc] are treated as "V".

It is possible to exclude a subset of syllables by listing them after the maximal form separated by commas. For instance, if the string supplied to this attribute is "CCVCC,VC,VCC" then the maximal form is CCVCC, and all of its sub-syllables are allowed except VC and VCC. In other words, the simulation will use the set {CV, CVC, CCVC, V}. All languages in PyILM must allow a syllable consisting of at least a vowel, so the syllable V cannot be excluded.

The phonotactics of a language are fixed for a given simulation. This is why phonotactics is an attribute of the Simulation object, as opposed to being an attribute of the Agent object. The phonotactics play a role in the outcome of a simulation (since sound-changing misperceptions are context-sensitive and phonotactics defines the set of possible contexts) so it is useful to hold them invariant for a simulation to understand their effect on sound change. The following chapter describes the output of a simulation, and there is more discussion of the specific effect of phonotactics.

The default value is "CVC".

2.2.2.9 features_file

The value supplied for this attribute should be the name of a text file which gives a phonological feature description for possible segments. PyILM comes with a default features file that describes several hundred segments using more than a dozen features, and will be sufficient in most cases. This file is based on the ipa2spe file available in P-base (Mielke 2008). However, users can modify this file or write their own. Each line of the file must have first a symbol, then a list of phonological features, all separated by commas. An example of such a file, with a very small feature space, is given in Figure 2.3.

    i,+voice,+cont,+voc,+high
    A,+voice,+cont,+voc,-high
    G,+voice,+cont,-voc,+high
    iP,+voice,-cont,+voc,+high
    i˚,-voice,+cont,+voc,+high
    v,+voice,+cont,-voc,-high
    g,+voice,-cont,-voc,+high
    i˚P,-voice,-cont,+voc,+high
    b,+voice,-cont,-voc,-high
    k,-voice,-cont,-voc,+high
    p,-voice,-cont,-voc,-high
    A˚P,-voice,-cont,+voc,-high
    f,-voice,+cont,-voc,-high
    A˚,-voice,+cont,+voc,-high
    x,-voice,+cont,-voc,+high
    AP,+voice,-cont,+voc,-high

Figure 2.3: Sample feature file

The features provided in this file define the dimensions of phonetic and phonological space that can be used by a language. Every Segment in an agent's lexicon in the simulation consists of some set of phonological features (see section 2.2.4.2).
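The following is a minimal sketch of how a file in the Figure 2.3 format could be read into a symbol-to-features mapping. It is only an illustration of the file format itself; the function name and the dictionary representation are assumptions for the example, not PyILM's actual reader.

    def load_features_file(path):
        """Read lines like 'b,+voice,-cont,-voc,-high' into {symbol: {feature: sign}}."""
        segments = {}
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                symbol, *features = line.split(",")
                # each feature looks like '+voice' or '-cont': the first character is the sign
                segments[symbol] = {feat[1:]: feat[0] for feat in features}
        return segments

    # e.g. load_features_file("sample_features.txt")["b"] would give
    # {'voice': '+', 'cont': '-', 'voc': '-', 'high': '-'}

Only the symbol-to-feature mapping matters for this illustration; PyILM's own handling of the file may differ in detail.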
Every Sound uttered by an agent consists of a set of multi-valued phonetic features (see 2.2.7). The names of the features used in both cases are the same, and they are determined by the names in features_file.

The symbols in this file largely serve as a kind of user interface. Sounds are actually represented in PyILM as numbers in a list, but this representation is unhelpful for a human. If PyILM has to print a symbol, for example when producing a report of the simulation, it uses these symbols as a more readable representation.

2.2.2.10 max_lexicon_size

This sets a maximum size for the lexicon of the language. If the lexicon has reached maximum size and invention is required, then one of the least frequent words in the lexicon is selected (at random) and removed from the language, to be replaced by the newly invented word. The default value is 30.

2.2.2.11 invention_rate

This represents the probability that a speaker will produce words that were not in her input. This could represent new words that have come into fashion during the speaker's life, coinages that she created herself, borrowings, or any other source of new words. Note that invention never introduces new sounds, so if inventions are considered to be like borrowings, then it would be a case of complete adaptation to the native sound system. Further, invented words always match the phonotactics of the language (see section 2.2.2.8). The number of invented words is determined by the max_inventions parameter (see section 2.2.2.12).

The value of this attribute must be between 0 and 1. If the value is set to 0, then agents never invent words and only the words used in the first generation will be the ones used throughout (though, of course, their phonological and phonetic properties may change). If the value is set to 1, then new words will join the lexicon each generation. See section 2.3.5 for more details on the invention algorithm. The default value is 0.

2.2.2.12 max_inventions

The invention phase only happens once for each agent, at the beginning of the production phase. Pseudo-code for this is given below. During the invention phase, the simulation will make max_inventions attempts to generate a new word and add it to the lexicon. The default value for this attribute is 0 (i.e. no inventions ever occur). The probability of a word actually being invented is set by the attribute invention_rate (see section 2.2.2.11).

    for j in range(Simulation.max_inventions):
        n = random.random()  # generate a random number in [0,1]
        if n <= Simulation.invention_rate:
            word = agent.create_new_word()
            agent.update_lexicon(word)

In a simulation where max_inventions has been set to X, and invention_rate is set to Y, the probability that x new words actually are invented at a given generation is Y^x (for x > 0). The probability that no new words are invented is (1 − Y)^X. Suppose that max_inventions is 3 and invention_rate is 0.2. This means that for any generation, no more than 3 new words can enter the lexicon. The probability that only 2 new words are invented is 0.2 × 0.2 = 0.04. The probability that no new words enter the lexicon is 0.8 × 0.8 × 0.8 = 0.512.

2.2.2.13 misperceptions

This should be the name of a text file, in the same directory as PyILM, that contains a list of misperceptions that could occur over the course of a simulation. Each line should have six arguments separated by semi-colons. The PyILM Visualizer also contains an option for creating misperception files with a more intuitive user interface.
See section 2.2.10 for moredetails on Misperception objects.The rst argument is the name of a misperception. The second argument is a list ofphonological features, separated by commas, that describe segments that can be altered by35the misperception. The third argument is the name of a feature which undergoes change ifthe misperception occurs. Multiple values for the third argument are not permitted, andmisperceptions can only change one feature at a time. The fourth argument is a number inthe interval (0,1) representing how much the phonetic feature changes if the misperceptionhappens. The fth argument is the environment in which the misperception can occur.Environments should be specied at the level of phonological features. A special value of* can be used to mean context-free. The sixth and nal argument should be a numberin (0,1) representing the probability that the misperception occurs. The default set ofmisperceptions are in le called misperceptions.txt that is bundled with PyILM and userscan also consult this le for an example of the format to follow.pre-nasal nasalization bias;-nasal,+voc;nasal;.05;_+nasal,-voc;.15word nal devoicing;+voice,-nasal;voice;-.1;_#;.2intervocalic lenition;-cont,-son;cont;-.1;+voc_+voc;.1default vowel voicing;+voc;voc;.05;*;.75In the rst example, the misperception targets non-nasal vowels. The nasal value ofthese vowels is raised by 0.05 when these vowels appear before nasal consonants, and onany given utterance where such a context appears there is a 0.15 chance this happens. Thesecond example targets voiced non-nasal segments. These sounds have their voicing valuesdecreased by 0.1 when they appear word nally, and this happens with a 0.2 probability.The nal example shows a context-free misperception, which is referred to as a bias.In this case, vowels have their voicing values raised slightly under all circumstances. Thisallows vowels to be aected by the nal-devoicing misperception, but not to the sameextent as consonants because their voicing value gets raised a little bit anyway. (Of course,vowels could be completely excluded from the nal devoicing misperception by changing thetargeted segments to be include [−voc] only.) In Chapter 5, the eects of misperceptionsand biases are explored in more detail.The amount by which a misperception changes a sound cannot be greater than 1 or lessthan -1. This is because phonetic values in PyILM must be numbers between 0 and 1. Ifthe eect of misperception would push a value above 1, then PyILM will force the valueback down to 1. Similarly if a misperception were to push a value below 0, PyILM willraise the value back up to 0.2.2.2.14 minimum_activation_levelThis attribute should be a number in [0,1] representing how close to an existing categoryan input examplar must be, in order to be considered for membership in that category.Technical details of the exemplar learning algorithm are given in section 2.3.1.36Setting this number closer to 1 sharpens an agent's discrimination, and permits smallerdistances between categories. This leads to inventories with more segments and less vari-ation in pronunciation. Setting it all the way to 1.0 means that an input exemplars onlycount as a member of an existing category if they have phonetic values that match exactlyto all other exemplars in the category. This is a rare occurrence, so inventories tend togrow rapidly with this setting.Setting this number closer to 0 blurs an agent's discrimination and increases the distancerequired between categories. 
Setting it all the way to 0 means that there is no distancebetween categories, and every input sound after the rst counts as an example of whateverthe learner rst heard. This leads to an immediate collapse in the segmental inventory andit reduces to a single segment within a generation.2.2.2.15 auto_increase_lexicon_sizeThe attribute initial_lexicon_size (see section 2.2.2.3), is used to determine the numberof words in the lexicon of the initial generation. These words are all randomly generated,using the set of sounds supplied to the initial_inventory (section 2.2.2.4) attribute.However, in this randomness, it sometimes happens that not all of the initial sounds actuallymake it into a lexical item. If auto_increase_lexicon_size is set to True, then PyILMwill continue to generate words for the initial lexicon until every sound occurs at least once,even if it means surpassing initial_lexicon_size. If auto-increasing is set to False, thenthe lexicon size remained capped at the the initial value.2.2.2.16 initial_wordsThis parameter allows the user to submit a list of words, separated by commas, that shouldappear in the lexicon of the initial generation. It is the user's responsibility to ensure thatthe words contain symbols which appear in the initial lexicon of the language. If a wordsupplied to initial_words contains a symbol not in the initial inventory, then PyILM willraise an exception and stop running. In short, this parameter cannot safely be used incombination with a randomly selected started inventory (see section 2.2.2.4).It is possible for a user to create words that do not conform to the phonotactics of thelanguage with this parameter, although this is not recommended as it may cause unexpectedbehaviour during the simulation.The initial lexicon is guaranteed to include every word supplied toinitial_words, even if this means going beyond the lexicon size supplied toinitial_lexicon_size (section 2.2.2.3). If this occurs, PyILM will also not enforcethe auto_increase_lexicon_size (section 2.2.2.15) parameter, and the initial inven-tory will consist only of the sounds found in the initial_words list. If, on the otherhand, the number of initial words is smaller than the initial lexicon size, PyILM willcontinue to randomly generate words until the lexicon is an appropriate size, and the37auto_increase_lexicon_size parameter works as expected.By default, this option is not turned on and the initial lexicon will consist of randomlygenerated words.2.2.2.17 allow_unmarkedNormally, sounds in a PyILM simulation are represented as lists of binary features, markedas either [+feature] or [−feature]. If the allow_unmarked option is set to True, then a thirdfeature value n is allowed (this is the unmarked value). If a sound is marked [nfeature],it means that every instance of that sound experienced by a learner had a phonetic value of0 on a given feature dimension (see sections 2.2.5 and 2.2.6 for more details on how featureswork in a simulation). In practice, [nfeature] would be used in cases where a feature doesnot apply at all to a sound, e.g. a glottal stop could be marked [nlateral] because the tonguebody is not involved whatsoever in the articulation of a glottal stop.Note that [nfeature] is not equivalent to [−feature], even though sounds with eitherfeature value will have low phonetic values on particular dimension. 
If the allow_unmarkedoption is used, it is important to ensure that misperceptions are formatted properly (seesection 2.2.10), and specically make reference to [nfeatures] if desired.The default value of allow_unmarked is False, meaning that only binary features areused in a simulation. Every simulation reported in this dissertation was run with the defaultvalue for this option.2.2.2.18 seedThis attribute controls the random seed used in PyILM. Its value can be any number orstring. By default is it a randomly selected integer in [1,10000]2.2.2.19 seg_specific_misperceptionsThis parameter used to create a special kind of misperception, for the purposes of a testinga hypothesis about feature economy to be presented later in this dissertation. Details aregiven in Chapter 5, Section 5.4. This parameter takes a value of either True or False, andthe default is False.2.2.3 WordsWords are generalized objects that represent either an entry in a mental lexicon or anutterance of one of these lexical items. Words have two attributes: string, which is a listrepresenting the segmental melody of the word, and meaning, which is an integer.382.2.3.1 stringIf a Word is in a lexicon, then the string attribute is an ordered list of Segments (seesection 2.2.4) representing the segmental content of a word, plus two word boundary symbols#. The value of string in this case is learned, and updated, as part of the learningalgorithm (see section 2.3.1).If a Word represents an utterance, then string is an ordered list of Sounds (see 2.2.7). Inthis case, the value of string is determined by the production algorithm (see section 2.3.4).2.2.3.2 meaningThe meaning attribute is just an integer. The meaning attribute in used when the learningalgorithm checks if an input word means the same thing as a known lexical item. Twowords are considered to mean the same thing if their meaning attributes compare equal.This only very roughly models the concept of meaning, and is certainly not representativeof what a real human speaker knows about the meanings of words. This simplication issucient for the purposes of modeling sound change.The rst word invented in the simulation is assigned a meaning of 0. Then a counter isstarted, and each new word is assigned the next integer. One consequence of this process ofgenerating meanings is that there can be no poly-morphemic words in any language. Everyword has a single meaning, so any language generated by PyILM is completely isolating.Another consequence is that no synonyms will ever appear in the language. If a listenerhears two words that mean the same thing, it will always be two instances of the sameword. The counter doesn't run backwards, so speakers will never invent a new way ofsaying something they can already say. There can be variations in the pronunciation of aword, due to phonetic eects or misperceptions, but there can be no completely unrelatedforms with the same meaning.2.2.4 SegmentsSegments represent mental categories, themselves representing speech sounds, that agentscan learn. Segments are the units that make up the words in an agent's lexicon. They arenot the actual speech sounds that agents produce and perceive. In other words, they arelike phonemes, not phones. The objects representing speech productions are called Sounds,and they are described in section 2.2.7.Segments are not atomic objects. They are represented by a set of phonological fea-tures (see 2.2.5). 
Features in turn are abstractions representing a ranges of values along aparticular dimension of phonetic space.The ability to segment speech is taken for granted in PyILM and learners are assumed tobe able to portion out the speech signal in such a way as to form some kind of segment-likeunit. The particular phonetic and phonological characteristics of segments are, of course,learned during the simulation and not pre-determined.39Segments have three attributes: symbol, which is an identier for the segment,features, which is a list of distinctive phonological features, and envs, which lists allthe environments the segment appears in.2.2.4.1 symbolSegments all have a symbol attribute, which is a string of Unicode characters (normally butnot necessarily of length 1). The symbols are drawn from those provided to the simulation'sfeatures_file variable (see section 2.2.2.9).Symbols are just a convenience for the simulation, and the actual choice of a symbolcan be entirely arbitrary. However, experience has found that choosing segment labels atrandom makes it very dicult to interpret the simulation results. The natural intuition ofa linguist is to assume that IPA characters are used meaningfully, so if instead they arerandomly associated with features, then reading simulation results becomes a frustratingpuzzle of trying to remember what, e.g. /p/ stands for this time. Instead, PyILM triesto choose a reasonable symbol for a given segment by selecting a symbol for a segmentwhose feature description most closely matches the segment under consideration. Thismeans that in most cases, the segment symbol will match the phonetic values in a way thatmakes linguistic sense, although it is always safer to inspect the actual feature values andnot rely on the symbol.In some cases, sounds can appear in a simulation that have a feature specication notfound in the user's feature le. For instance, a user might include a misperception thatnasalizes stops under some conditions. Normally, stops are [−sonorant] and nasals are[+sonorant], but this nasalization misperception can create sounds that are [−sonorant,+nasal]. PyILM needs a symbol for such a sound, and if it cannot nd a perfect matchin the user's feature le, it will take a sound that matches most of the features. This canoccasionally lead to unexpected results when visualized (e.g. PyILM might pick a symbolfor a plosive to represent the non-sonorant nasal). Again, it is always safer to check featurevalues than to rely on the assigned symbol, especially in simulations with a large numberof misperceptions and less-predictable outcomes.2.2.4.2 featuresThis attribute is a list of distinctive Features (see 2.2.5) that uniquely characterize thesegment. These values are inferred by the agent's learning algorithm (see section 2.3.1),and may change over the course of learning. They are xed after learning ends, and donot change once the agent becomes the speaker. The actual distribution of phonetic valuescorresponding to a particular segment is recorded in a FeatureSpace object. These objectsare described in more detail in section 2.2.6.402.2.4.3 envsThe envs attribute keeps track of all the environments in which a segment appears. Theenvironment of a Segment is dened as as the immediately adjacent Segments to the leftand right. More formally, the environment of Segment in position j of a Word's string is atuple consisting of the Segment at position j -1 and the Segment at position j+1. 
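As a small illustration, the environment of the Segment at index j can be read directly off the Word's string, assuming, as described immediately below, that the string is padded with word-boundary symbols. The helper below is a sketch for exposition, not PyILM's own code.

def get_environment(string, j):
    # string is a Word's string attribute, e.g. ['#', 'p', 'a', 't', '#'],
    # so word-initial and word-final segments have '#' as a neighbour
    return (string[j - 1], string[j + 1])

For example, get_environment(['#', 'p', 'a', 't', '#'], 1) returns ('#', 'a'), the environment of /p/ in /pat/.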
Words inan agent's lexicon begin and end with word boundaries, and these provide the appropriateenvironment for word-initial and word-nal segments. Although word boundaries are for-mally treated as Segments, they dier from the Segments described in this section in thatthey lack phonological features and they are not considered part of an agent's inventory.They only exist as part of an agent's lexicon; the objects transmitted to learners do notcontain word boundaries and must be re-constructed by the learner. This has no practicaleect on the simulation at the moment because speakers only ever utter one word at a time.2.2.4.4 distributionThe distribution attribute is a normal probability distribution representing the distributionover possible phonetic feature values for the segment. An agent keeps in memory onedistribution for each feature dimension. These are created at the end of the learning phase.2.2.5 FeaturesA feature-dimension is a one-dimensional space, a number line, representing some salient,gradable property of speech that listeners are aware of and can use to categorize speechsounds. Feature objects represent phonologically signicant ranges of values, and are cre-ated by the learning algorithm (see section 2.3.1). These ranges represent values for phono-logical features, and there are three possible feature values: +, −, n. The [+feature]category is for ranges of higher values, and the [−feature] category is for ranges of lowervalues. The actual values depend on the details of the simulation in question. A third valueis [nfeature] which is assigned to segments that are always expressed with a value of 0 forthe feature (e.g. a glottal stop would be [nlateral]). By default, the n value is not used insimulations, and all features are binary. To use it, set Simulation.allow_unmarked to Truein the conguration le.The number of feature dimensions is determined by the features listed inSimulation.features_file (see 2.2.2.9). The clustering into features is also illustrated ingure 2.2 on page 30.Feature objects have two attributes: sign and name. The sign attribute can havevalue of + or -, (and possibly n) and name is drawn from those provided toSimulation.features_file. The value of n is only available if the allow_unmarkedoption is turned on (see section 2.2.2.17). This option is turned o by default.The set of feature-dimensions is xed at the beginning of a simulation and does notchange throughout (see section 2.2.2.9). This set is represented as a FeatureSpace object41(see section 2.2.6). Having a xed set of features does not, of course, mean that every oneof them will participate in a contrast for a given simulation. For instance, it is possiblefor a language to have no lateral consonants in the initial generation, meaning no segmentsmarked [+lateral], and to not acquire any over the simulation run. So long as every soundhas relatively low values along the [lateral] dimension, they will all get classied as [−lateral],and the feature will not be contrastive.2.2.6 FeatureSpaceA FeatureSpace object is a multidimensional space representing all possible phonetic values.Every utterable sound in the simulation can be represented by some point in this space.FeatureSpaces have f feature-dimensions, where f is the size of the set of features providedto the Simulation.features_file attribute (see section 2.2.2.9). Points in any givendimension fall somewhere in the interval [0,1]. 
Formally a FeatureSpace is just a Pythondictionary (a hash table) where the keys are the names of a feature-dimension and valuesare lists of Token objects (see section 2.2.8) that are stored along that feature-dimension.This has the consequence that one phonological feature in a language corresponds toonly one kind of phonetic feature along a single dimension. This is unlike natural languagewhere a range of phonetic characteristics may be related to a phonological feature. Forinstance, [voice] may correspond with VOT, burst amplitude, and F0 (e.g. Lisker (1986),Raphael (2005)).2.2.7 SoundsIterated learning models place a heavy emphasis on the fact that language exists in both amental form and a physical form. The Segment objects discussed in section 2.2.4 representpart of mental language. The corresponding object representing physical speech is calleda Sound. The dierence between a Segment and a Sound is analogous to the dierencebetween a phoneme and a phone. Sounds are created by the production algorithm (sec-tion 2.3.4), further manipulated by the misperception function (section 2.2.10), and serveas input to the learning algorithm (section 2.3.1).Sounds as objects are similar to Segments. Sounds only have symbol and features at-tributes, although features is a list of Token objects (see section 2.2.8), rather than Featureobjects which is the case with Segments. A Sound exists for only a single event of trans-mission, then is removed from the simulation. The environment in which a Sound occursis calculated in place by the misperception function, or the listener's learning algorithm, asneeded.2.2.8 TokensTokens represent the spoken values of Features (section 2.2.5), much like Sounds (sec-tion 2.2.7) represent spoken Segments (section 2.2.4). Features actually represents ranges42of phonetic values, and a Token object represents one value from that range. A Token hasfour attributes: name, value, label, and env.2.2.8.1 nameThe name attribute is the name of whichever phonetic feature this Token represents (seesection 2.2.2.9).2.2.8.2 valueThe value attribute is a number in [0,1]. In a sense, value represents how strongly agiven sound expresses a particular feature (whichever feature is given for the name attribute).Larger values are intended to represent increasingly salient or prominent information, al-though what this means would depend on the feature in question. What makes somethingmore [nasal] in actual speech would depend on, e.g. nasal airow, degree of closure, nasalityof adjacent segments, etc.2.2.8.3 labelThe label attribute is a reference to the symbol attribute of one of the Segment objectsin the agent's inventory, indicating that a Token counts as an exemplar of that particularsegment. The values along a given feature dimension that are associated with a particularsegment will change over the course of learning, and Token labels are continually updatedto keep in line with changes to the inventory.2.2.8.4 envThe attribute env represent the environment in which the Token was perceived. Thisconsists of a tuple of two references, the rst a reference to the Segment on the left andthe second a reference to Segment on the right. These references allow a dynamic updatingof the FeatureSpace as the learner's inventory changes over the course of learning. Forinstance, suppose a learner has perceived a word-initial Token before a vowel which getslabeled /e/. The env attribute of this Token would be the tuple (#,e). 
If the category /e/later gets merged with another vowel category and has its label changed, perhaps to thelabel /i/, then the env of this Token would be automatically updated to (#,i).2.2.9 AgentsAgent objects represent the people who learn and transmit a language. Agents have fourimportant attributes: lexicon, inventory, feature_space, and distributions. Thereare also three Agent methods described in their own section: a production algorithm (2.3.4),a learning algorithm (2.3.1), and an invention algorithm (2.3.5).43In addition to the Agent object, there is also a BaseAgent object, which is used foragents in the 0th generation of the simulation. The two objects share many attributesand methods, and formally speaking Agent inherits from BaseAgent. These distinctionsare largely unnecessary for understanding how the simulation works, however, so they areignored here and I present all of the relevant information under the general heading ofAgent.2.2.9.1 lexiconEntries in the lexicon are represented by another kind of object called a Word (see sec-tion 2.2.3). Words are learned and stored in the lexicon as part of the learning algorithm(see section 2.3.1). Lexicons are essentially just storehouses of words. The lexicon is amapping between meanings and lists of Words that can be used to convey that meaning.Each possible Word is stored alongside the raw count of how many times it appeared in anagent's input. Multiple Words can become associated with the same meaning through mis-perception. For instance, in a simulation with a nal devoicing misperception, the meaning17 might be associated with the list /pad (6), pat (4)/, which would indicate that [pad] washeard 6 times during an agent's learning phase, while the word [pat] was heard 4 times.Put another way, meanings are analogous to lexical items, and Words are phonologicalrepresentations of these lexical items.2.2.9.2 inventoryThe inventory of an agent is a list of all the Segments that appear in at least one Word inthe agent's lexicon. The inventory is used in learning to make comparisons between words(see section 2.3.1). The inventory is also used by the invention algorithm (see section 2.3.5),which creates new arrangements of known segments.2.2.9.3 feature_spaceThis attribute just serves to point to a FeatureSpace object (see section 2.2.6). This objectrepresents a multidimensional phonetic space, and every sound that an Agent can hearor produce is represented as a point in this space. The feature_space of an Agent isinitially empty, and it gets lled with points, which are then clustered, during learning.The production algorithm makes use of these clusters for deciding on what phonetic featurevalues to assign to dierent phonological values.2.2.9.4 distributionsAn agent's distributions attribute is a dictionary organized rst by segment, then byfeature. Each feature is mapped to a probability distribution, which is sampled by the44production algorithm when it needs phonetic values. This is described in more detail insection 2.3.4.1.2.2.10 MisperceptionThe idea behind misperceptions is that some sounds, in some phonetic environments, aresusceptible to being perceived by the learner in a dierent way than the speaker intended.For instance, there is a tendency for word nal voiced obstruents to be pronounced in sucha way as to be perceived as voiceless (Blevins (2006b)). 
This can lead to instances ofmisperception where a speaker intends /bab/ and the learner understands /bap/.Misperception objects are intended to represent factors inherent to human communica-tion that aect perception of sounds probabilistically, in well-dened environments. Theseare factors that could potentially aect speech perception at every utterance, and so becomerelevant to the cultural transmission of sounds. For instance, speakers produce oral vowelswith more nasality before nasal consonants (Chen (1997)). This fact about the pronun-ciation of vowels in certain environments means that in the transmission of any languagewith words that contain a sequence of an oral vowel followed by a nasal consonant, thereis some small probability that learners will mistakenly interpret these vowels as inherentlynasal, leading to a sound change where vowels articulated as oral vowels at one generationare articulated as nasal vowels in a later generation (cf. Ohala 1983)On the other hand, Misperception objects are not intended to represent instances ofmisperception caused by e.g. the conversation happening at a loud concert, or peanutbutter in the speaker's mouth. These factors certainly aect production and perception ofspeech sounds, but they do not occur with enough regularity to be worth including in asimulation of cultural transmission.Formally speaking, a misperception is a probabilistic, context-sensitive change to aToken object's value attribute. Here are two examples:[+vocalic] → [nasal +.1] / _[+nasal], .2 (pre-nasal nasalization)[+voice, −son] → [voice -.15] / _#, .3 (nal-devoicing)The rst example reads as There is a .2 chance that Tokens representing [+vocalic]Segments have their [nasal] value increased by 0.1 if they occur in the environment before aSegment marked [+nasal]. The second example reads as There is a .3 chance that Tokensrepresenting voiced obstruents have their [voice] value decreased by 0.15 if they occur inword-nal position.The probabilities are arbitrary and chosen for illustration. The probability of any mis-perception actually occurring is an empirical question, and not one that PyILM can be usedto answer. Instead, users can set this value and run multiple simulations to understand howhigher and lower values aect the overall course of sound change. In fact, all aspects of amisperception are open to modication by the user (see the Simulation.misperceptionsattribute, section 2.2.2.13).45Misperceptions have six attributes: name, which identies the misperception, target,which describes the segment susceptible to misperception, salience, which is a numberrepresenting units of change, env which describes when the change happens, and p, whichrepresents the probability of a change happening. These are described in the subsectionsbelow and section 2.2.10.7 gives the pseudo-code for how misperceptions are handled inPyILM.2.2.10.1 nameThe name attribute is a string used to for referring to the Misperception. It has no rolein the outcome of a simulation. In fact, its only use is for printing the report at the endof a simulation. PyILM lists misperceptions that applied during the simulation so thatusers can more easily understand why certain sound changes happened. Keeping this inmind, name should be something descriptive, such as pre-nasal vowel nasalization or nalobstruent devoicing.2.2.10.2 targetThe target attribute is one or more phonological features representing the class of soundsaected by the misperceptions. 
In the case of final devoicing, this attribute would probably be set to +voice, -son, -voc.

2.2.10.3 feature

This attribute is the name of the feature that changes if the misperception occurs. In the case of final devoicing, the value of this attribute would be voice. The feature attribute is often, but not necessarily, one of the features listed in the target attribute. Only one name is allowed for this attribute.

2.2.10.4 salience

The salience attribute represents the magnitude of a change caused by misperception. The attribute can be any real number in [−1,1]. If a misperception actually happens, then its salience is added directly to the value of the affected Token (see section 2.2.8). However, Token values must remain in the range [0,1]. If the salience would drive a Token's value beyond those bounds, the value is rounded back to 0 or to 1.

2.2.10.5 env

The env attribute is a string representing the environment in which a misperception takes place. There are three possible formats for this string: X_, _X, X_Y, where X and Y are strings consisting of the names of one or more features separated by commas, and the underscore represents the position of the sound that might be misperceived. For instance, the following are acceptable values:

+voice_
−nasal_
_+voice,−son
+voc_+voc

2.2.10.6 p

The p attribute is a number in (0,1) that represents the probability of a misperception occurring.

2.2.10.7 How misperception happens

Misperception applies to the output of the production algorithm (see section 2.3.4, see also figure 2.1 on page 29). The output of the misperception function is sent as input to the learning algorithm. This means there are no further changes that can apply to any sounds in a word, once the word is received by the learning algorithm.

The misperception function loops through the utterance, and checks to see if any of the segments are in a position where they might be affected by misperception. This is done by comparing the environment of that segment to the env attribute of the Misperception. If they match, then PyILM rolls the dice, so to speak, and there is some probability, based on the Misperception's p attribute, that change happens. The pseudo-code for this is given below.

Algorithm 2.2 Misperception function

for sound in utterance:
    e = Simulation.get_environment(sound, utterance)
    for mis in Simulation.misperceptions:
        if Simulation.check_for_misperception(mis, e):
            # the misperception is applicable in this environment
            if set(mis.filter).issubset(set(sound.features)):
                # the sound belongs to the targeted class
                # (mis.filter holds the features given in the target attribute)
                if random.random() <= mis.p:
                    # the misperception applies on this occasion: shift the
                    # changing feature (the feature attribute) by salience
                    sound.features[mis.target] += mis.salience

2.2.10.8 A note on misperception definitions

The way that misperceptions are defined can affect the outcome of a simulation. Misperceptions target phonological features. What this means is that when a misperception has had its full effect, and a sound has switched categories, then the misperception will stop applying. For example, suppose that a simulation has a word-final devoicing misperception that targets sounds marked [+voice, −son, −cont], and suppose further that /b/ appears word-finally in the initial generation. Eventually a word like /ab/ will become /ap/.
Atthis point, the devoicing misperception no longer applies to the nal consonant, because ithas become [−voice] and the misperception targets only [+voice] sounds.This is a design choice for PyILM, and it is not a claim about the way that soundchange operates. It is certainly not the case that the phonetic eects underlying soundchange suddenly stop occurring just because of the way that some people have organizedtheir mental grammar. The idea in PyILM is that after a sound has changed categories(e.g. from [+voice] to [−voice]), then it is irrelevant if any further phonetic eects occur.If a speaking agent has recategorized /b/ as /p/ then it does not matter if nal devoicingapplies to /p/ any more, since it is already voiceless.If, on the other hand, the nal devoicing misperception had been designed to target only[−son, −cont] sounds, without reference to [voice], then even after the switch from /b/ to/p/ the misperception will continue to apply. This leads to a polarization eect, wherethe phonetic values for a sound inuenced by misperception will continue to get pushedto the extreme ends of a feature dimension. For example, tokens of a sound aected bythe nal devoicing misperception will eventually all have a [voice] value of 0. (This mayalso cause a further recategorization if the allow_unmark option is enabled; see section2.2.2.17.) A misperception that raises a feature value will likewise eventually push all tokenof a category to have a phonetic value of 1.For this dissertation, all misperceptions were designed in such a way as to avoid thepolarization eect.2.3 AlgorithmsThis section details three kinds of algorithms used by agents in the simulation: learning,production, and invention.Information about an agent's phonological system is represented using an exemplarmodel (Johnson 2007, Pierrehumbert 2001). These are models of memory where learn-ers keep copies of every experienced speech event. These copies are known as exem-plars. Exemplars are stored in a multidimensional space, and can in principle be storedat any level of detail. In PyILM, this space is a FeatureSpace object (see section 2.2.6),and the exemplars are stored at the level of phonetic features as Token objects (see sec-tion 2.2.8). The number of dimension in this space is equal to the number of features listedin Simulation.features_file.48Both the learning and production algorithms are inuenced by the exemplar model.The learning algorithm works by comparing input values to the exemplars in memory. Theproduction algorithm generates phonetic values from a distribution that is created basedon the exemplar space.This section on Algorithms is organized from the perspective of a newly created agentin the simulation. The rst thing an agent does is learn, followed by an update of theirlexicon and inventory, and nally an agent reaches the production phase.2.3.1 Learning algorithmThere are two phases to the learning algorithm: parsing and updating. In the parsing phase,the learner assigns a category to each incoming sound. The results of categorization areused in the second phase to update the lexicon and inventory. After these steps have beenrun for each input word the simulation runs a phonological feature clustering algorithm.2.3.1.1 Parsing a WordThe goal of this phase of learning is to assign each Sound of the incoming word to aSegment category. This is done by comparing the phonetic similarity of the input with allthe previous inputs that are stored in memory. 
If the input is suciently similar to anysegment in the learner's inventory, then it is assigned to that category. Otherwise, a newcategory is created. The overall learning process for a word is described in Algorithm 2.3.Learning starts with the input of a Word object (see section 2.2.3) consisting of Sounds(see section 2.2.7). Sounds have a features attribute, although this actually consists ofToken objects (see section 2.2.8), not Feature objects. Tokens have phonetic values, whichare real numbers in [0,1]. For each phonetic dimension, the new token is rst stored into theexemplar space. Then it is compared to every other token in the space, and an activationvalue is returned for each such comparison.The activation function referenced on line 5 is based on Pierrehumbert (2001). It isdescribed in Algorithm 2.4. Each of the existing categories is assigned an activationvalue, with higher activation values representing greater phonetic similarity. Activation ofan exemplar is measured as e raised to the power of the negative dierence between theinput token and the exemplar. Activation of a segment category is the sum of the activationof its exemplars.Agents have a threshold for similarity, which is controlled by a simulation parametercalled minimum_activation_level (see section 2.2.2.14). It is a number in [0,1] that rep-resents the degree to which a segment category must be activated in order for agents toconsider an input token to be a member of that category. A value of 0.8, for example,means that in order for a input token to be considered a member of an existing segmentcategory, the averaged activation of all exemplars for that category must be 80% of themaximum possible activation.49Algorithm 2.3 Learning algorithm1 def l e a r n i ng ( input_word ) :2 best_matches = l i s t ( )3 for sound in input_word :4 act ivat ion_matr ix = dict ( )5 for token in sound . f e a t u r e s :6 a c t i v a t i o n s = ca l c u l a t e_ac t i v a t i on s ( agent . inventory ,token )7 for seg , va lue in a c t i v a t i o n s . i tems ( ) :8 act ivat ion_matr ix [ seg ] . append ( value )9 #act i va t ion_matr i x [ seg ] [ j ] e qua l s10 #how much seg i s a c t i v a t e d11 #on the j t h f e a t u r e12 for seg in act ivat ion_matr ix :13 t o t a l = sum( act ivat ion_matr ix [ seg ] )14 act ivat ion_matr ix [ seg ] . append ( t o t a l )15 act ivat ion_matr ix . s o r t ( key=lambda x : x [−1])16 best_matches . append ( act ivat ion_matr ix [−1])17 #best_matches [ j ] e qua l s18 #the ca tegory wi th the h i g h e s t a c t i v a t i o n19 #for the j t h p o s i t i o n in the word20 category = None21 new_word = Word( )22 for seg , a c t i v a t i o n in best_matches :23 i f a c t i v a t i o n >= 0 :24 category = agent . inventory [ seg ]25 else :26 category = create_new_category ( seg )27 new_word . s t r i n g . append ( category )28 agent . update_feature_space ( category )2930 agent . update_inventory (new_word)31 agent . update_lexicon (new_word)50Algorithm 2.4 Activation function1 def c a l cu l a t e_ac t i v a t i on ( input_token ) :23 ac tua l_ac t iva t i on = sum(math . e**(−1*( input_token − exemplar ) )for exemplar in feature_dimension )45 min_activat ion = sum(math . e**−(1−Simulat ion .minimum_activation_level ) for exemplar infeature_dimension )67 d i s t ance = sc ipy . i n t e g r a t e . quad (lambda x : math . 
e**−x ,min_activation , ac tua l_ac t i va t i on )89 return d i s t anceThe activation function uses this minimum_activation_level parameter to calculatethe specic minimum activation level for the given feature dimension and segment cate-gory. Then it calculates the dierence between the actual activation and the minimum bytreating these values as points on the curve y 5 e−x and calculating their distance. If thisdistance is greater than or equal to 0 the input token meets the similarity threshold forthis segment category (at least on this feature dimension) and might be considered as apotential match. If the distance is a negative number, then the actual activation level islower than the minimum and the input token is not similar enough to this segment categoryon this dimension.These distances are returned to the main algorithm, and they are summed and addedto the activation matrix. Then the distances on each phonetic dimension are summed, andif any of these total distances is greater than or equal to 0, then the input token is assignedto the category with the highest value. Otherwise a new segment category is created.After learning, another algorithm searches for any spurious categories that might havebeen created. A spurious category is one where the interval of exemplar values representingthe category are a sub-interval of some other category, along every dimension.Spurious categories crop up early in the learning phase when the examplar space is stillsparsely populated, and they do not occur in every learning phase. To illustrate this, con-sider the following hypothetical simulation where the speaker's inventory has two fricatives/s/ and /z/. For simplicity, assume there are only three features: [nasal, continuant, voice].The speaker produces an example of /s/, which has values [0.01, 0.9, 0.1], i.e. the soundhas low nasality, high continuancy, and low voicing. This is the rst sound the learner hasheard, and it is assigned to the category labeled /s/, which matches the category in thespeaker's inventory (although the learner does not know this, of course).51Then the speaker produces an example of /z/, with values [0.02, 0.8, 0.6]. This isnearly identical to /s/ on the nasality and continuancy dimensions, but diers quite a loton the voicing dimension. Assume that in this case this dierence is sucient for thelearner to decide that this sound is not an example of /s/. Since /s/ is the only categorythe leaner knows yet, a new category has to be created for this new sound, and it islabeled /z/ (whether the learner actually does make a new category depends on the valueof minimal_activation_level, see section 2.2.2.14).Next, the speaker produces another example of /s/, this time with values [0.01, 0.85,0.35]. This sound is similar to both /s/ and /z/ on the nasality dimension, and relativelyclose to both on the continuancy dimension. On the voicing dimension, the new sound isquite distant from both /s/ and /z/. Assume the learning algorithm considers this soundto be close to neither /s/ nor /z/, and assigns to its own category, labeled /Z/ (again, in asimulation this would depend on minimal_activation_level). This category will becomethe spurious one. 
By the end of the learning phase, the range of values that the learnerassociates with /Z/ are going to be a indistinguishable from those associated with /s/, sinceboth of these sets of values were drawn from the same underlying distribution, namely theone the speaker associates with /s/.As learning progresses, the learning agent hears more and more examples of the frica-tives, and the exemplar space begins to ll up. For the purposes of this example, supposethat by the end of learning, there are exemplars of /s/ with nasality values ranging from 0to 0.03, continuancy values ranging from 0.8 to 1.0, and voicing values ranging from 0.05 to0.4. If the exemplar(s) associated with the (spurious) category /Z/ were to be fed back intothe learning algorithm at the end of the learning phase, they would surely be categorizedas /s/To check for spurious categories, an algorithm does a pairwise comparison of everysegment in the inventory. For each pair of sounds A and B, it checks if the minimumexemplar value of sound A is greater than or equal to the minimum value sound B, andalso if the maximum value of sound A is less than or equal to the maximum value of soundB, across every feature dimension. If both conditions are true, on every dimension, thensound A is considered to be spurious. In this case, all exemplars labeled A are relabeled B,and A is removed from the inventory.2.3.1.2 Creating new segment categoriesWhen an input Sound has phonetic values that are too dierent from any known category,a new Segment object is created. Its Token values are analyzed, and phonological featuresare assigned, as described in section 2.3.3. A symbol is then chosen for the segment basedon these phonological feature values.The symbol is chosen in a fairly simplistic way. The program consults the possiblesegments in the list provided to Simulation.feature_file (see section 2.2.2.9), and assignsa score to each of them by comparing distinctive feature values. A symbol scores 1 point52per feature match. The highest ranked symbols are put in a set and the rest discarded.This remaining set is further ltered to remove any symbols that are already in use inthe inventory. A symbol is randomly chosen from the remaining set members. A randomselection of this sort is safe, since the symbol has no eect on the outcome of the simulation,and exists purely to increase the readability of the output.2.3.2 Updates2.3.2.1 The lexiconOnce the input word has now been transformed into a list of Segment objects, the learnercan add it into the lexicon. If a word with this meaning has never been encountered before,the agent creates a new entry for this meaning in her lexicon, and adds the input word witha frequency of 1.If this meaning has been encountered before, the agent checks to see if this particularpronunciation is known, i.e. checks to see if there is a match between the input Word'sstring attribute and the string attribute of any Word already in the lexicon. If so, thefrequency of that Word is increased by 1, if not the input Word is added to the list ofpossible pronunciations with a frequency of 1.2.3.2.2 The inventoryIn the nal phase of learning, the inventory is updated. This may involve one of two things.If the input word contained phonetic values such that a new segment was created, then theinventory needs to have that segment added. 
Even if the input word matched entirely toknown segments, the specic values associated with each of those segments must now beupdated.2.3.3 Determining phonological feature valuesPhonological categories are determined using a k-means algorithm that clusters exemplarvalues along each feature dimension.The algorithm begins by creating k points in the space representing the objects that arebeing clustered. These initial points are called centroids and they represent a potentialcenter of a cluster. Then every point in the data is added to the cluster with the closestcentroid. After all the data has been classied this way, new centroids are chosen bycalculating an average point for each of the existing clusters. The data is then reclassiedby clustering it based on the new centroids. This process of averaging and reclassifying isrepeated until the point where the algorithm chooses the same centroids two loops in a row.In the case of the simulation, the clustering function takes two arguments: feature,which is the name of the feature dimension to cluster, and k, which is the number of clusters.By default k=2, since phonological features are typically modeled as binary.53Algorithm 2.5 K-means algorithm1 def kmeans ( f ea ture , k=2) :2 o ld_centro ids = agent . l ea rned_cent ro ids3 new_centroids = [ random . uniform ( . 1 , . 2 5 ) , random . uniform( . 7 5 , . 9 ) ]4 while not o ld_centro ids == new_centroids :5 c l u s t e r s = dict ( )6 tokens = agent . f eature_space [ f e a t u r e ]7 for token in tokens :8 c l o s e s t = 9999 for c in new_centroids :10 i f abs ( token . value−c ) < abs ( token . value−c l o s e s t ) :11 c l o s e s t = c12 c l u s t e r s [ c l o s e s t ] . append ( token . va lue )13 o ld_centro ids = new_centroids14 new_centroids = l i s t ( )15 for k in c l u s t e r s :16 new_centroids . append (sum( c l u s t e r s [ k ] ) / len (c l u s t e r s [ k ] ) )17 agent . l ea rned_cent ro ids = o ld_centro ids18 return c l u s t e r s54On Line 4 the algorithm chooses two centroids that are relatively far apart from eachother, which is typical for the initial choice of centroids. Then the main loop is enteredon Line 5. The loop exits when the choice of centroids doesn't change across loops. Thenfrom Lines 6-12, the tokens for a given feature dimension are selected, and for each token,it is compared to the most recently selected set of centroids, called new_centroids. On therst loop these are random choices, on further loops they are calculated averages.The clusters dictionary assignment on Line 13 creates a mapping from centroid valuesto a list of Tokens assigned to that centroids cluster. Then the new_centroids values aresaved into old_centroids and the new_centroids is emptied out (Lines 13-15). Finally,on Lines 16 and 17, new_centroids is lled with new centroid values calculated as theaverage of the Tokens in the clusters dictionary.The loop then returns to the beginning. If the most recent run of Line 17 calculatedthe same average values as the last time Line 17 was run, then the loop breaks and theprogram jumps to Line 18 where agent saves the values from new_centroids and thefunction returns the new cluster centroids.There are two possible phonological feature values: + and −. After the k-means clus-tering is done, the cluster with the higher centroid is designated the [+feature] cluster, andthe one with the lower centroid is designated the [−feature] cluster. If all tokens fall into asingle cluster, then a feature value is chosen based on the values of the tokens. 
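For readers who want to try this clustering step outside of PyILM, the following is a self-contained re-implementation of the same idea. The function names are invented here and this is a sketch under the assumptions stated in the comments, not the code distributed with PyILM; the single-cluster case mentioned just above is handled by the second function and is discussed further immediately after this sketch (the optional n value from section 2.2.2.17 is omitted).

import random

def kmeans_1d(values, max_iter=100):
    # Cluster a list of numbers in [0, 1] into two groups (binary features),
    # starting from two centroids that are relatively far apart.
    centroids = [random.uniform(0.1, 0.25), random.uniform(0.75, 0.9)]
    for _ in range(max_iter):
        clusters = {c: [] for c in centroids}
        for v in values:
            closest = min(centroids, key=lambda c: abs(v - c))
            clusters[closest].append(v)
        # a new centroid is the mean of its cluster; empty clusters keep theirs
        new_centroids = [sum(vs) / len(vs) if vs else c for c, vs in clusters.items()]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters

def assign_feature_sign(clusters):
    # The cluster with the higher centroid is [+feature], the lower one [-feature];
    # if only one cluster is populated, the sign follows the majority of its values.
    populated = {c: vs for c, vs in clusters.items() if vs}
    if len(populated) == 1:
        vs = next(iter(populated.values()))
        sign = '+' if sum(v > 0.5 for v in vs) > len(vs) / 2 else '-'
        return {sign: vs}
    high, low = max(populated), min(populated)
    return {'+': populated[high], '-': populated[low]}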
If most ofthe tokens have a value above .5, then [+feature] is assigned to the entire cluster, otherwise[−feature] is assigned. If the allow_unmarked option is set to True (see section 2.2.2.17),and there is a category where every token value is 0, then [nfeature] is assigned.2.3.4 Production algorithmThe production algorithm selects a Word from an agent's lexicon to produce, and transformseach of the Segments in the Word into a Sound. While Segments in an Agent's lexicon aremade up of phonological features, Sounds are made up of phonetic Tokens which havereal-valued features. There are three steps in production described here.2.3.4.1 InitializationThis step occurs at the end of the main simulation loop, just after a learner has beenpromoted to speaker (see 2.1). The new speaker looks through their inventory, and foreach segment it estimates a distribution of phonetic values along each dimension. Agentsassume the distribution is Gaussian. Pseudo code is given below. This code runs once foreach segment in an agent's inventory. During testing, it was found that the distributionswere better estimated using distance from the median, rather than from the mean. TheGaussian distribution is implemented using the normalvariate function of Python's builtin random module.55Algorithm 2.6 Distribution estimation1 def es t imate ( segment ) :2 for f e a t u r e in segment . f e a t u r e s :3 c loud = [ token . va lue for token in agent . f eature_space [f e a tu r e ] i f token . l a b e l == segment . symbol ]4 median = agent . calculate_median ( c loud )5 mad = agent . calculate_median ( [ abs ( value−median ) for valuein c loud ] )6 agent . d i s t r i b u t i o n s [ segment ] [ f e a t u r e ] = random .normalvar iate (median , mad)2.3.4.2 Step 1: Word selectionProduction begins with a decision: select a word from the lexicon or invent a new word. Theprobability with which a new word is invented is given by the simulation's invention_rateattribute (see section 2.2.2.11). If a new word is required, the speaker uses the inventionalgorithm described in section 2.3.5 to create one. Otherwise, one is chosen from the lexicon.Rather than choosing a word directly, agents actually rst select a meaning, then choosewhich word to produce for that meaning. Each meaning in the lexicon is associated with alist of Words, each stored alongside a raw count of how many times it appeared in the input.The production algorithm chooses a Word with probability proportional to it is frequency.2.3.4.3 Step 2: Transforming Segments into SoundsThe word selected by the rst step consists of Segments (see section 2.2.4), but these arenot the objects transmitted to the learner. In the second step of the production algorithm,Segments are transformed into dierent objects known as Sounds (see section 2.2.7), whichrepresent an instance of a segment being pronounced. Agents pass through each featureof each segment in the word. For each feature, agents sample a value from the appropri-ate probability distribution for that segment. Pseudo-code for this algorithm is shown inAlgorithm 2.7.56Algorithm 2.7 Production algorithm1 def produce ( l ex i ca l_ i t em ) :2 ut te rance = Word( l i s t ( ) , l ex i ca l_ i t em . meaning )3 for segment in l ex i ca l_ i t em :4 sound = Sound ( )5 for f e a t u r e in segment . f e a t u r e s :6 phonetic_value =7 agent . d i s t r i b u t i o n s [ segment ] [ f e a t u r e ] . sample ( )8 sound . f e a t u r e s [ f e a t u r e ] = phonet i c_feature9 ut te rance . 
append ( sound )10 return utte ranceThe utterance returned at the end of the algorithm represents what the speaker intendsto produce for the listener. It is not necessarily what the listener hears. This utterance issubsequently sent through a misperception algorithm (see section 2.2.10) which may changethe utterance in some way.2.3.5 Invention algorithmThe invention algorithm serves two purposes. It is used to generate a lexicon for the initialgeneration of the simulation, and it is used by agents at later generations if they choose tocreate a new word. The words constructed by this algorithm always conform to the existingsyllable shapes of the language. The following pseudo-code outlines the algorithm.Algorithm 2.8 Invention algorithm1 def invent ( agent , phonotac t i c s ) :2 word = Word( )3 sy l_length=random . rand int ( S imulat ion . min_word_length ,4 Simulat ion . max_word_length )5 for j in range ( sy l_length ) :6 syl_type = random . cho i c e ( S imulat ion . phonotac t i c s )7 for x in syl_type :8 i f x == `V ' :9 seg=random . cho i c e ( agent . inventory . vowels )10 e l i f x == `C ' :11 seg = random . cho i c e ( agent . inventory . cons )12 word . s t r i n g . append ( seg )13 return word57Line 4 creates a new empty word object (see section 2.2.3). Line 5 randomly determinesthe length of the word in syllables (see section 2.2.2.6 and 2.2.2.7).The loop that begins on line 8 will run once for each syllable in the word. Line 10 selectssome possible syllable for a given phonotactics. For example, if (C)V(C) was supplied tothe algorithm, then it selects randomly from the set {CVC, CV, VC, V}.The loop that begins on line 9 runs through each segment in the syllable type chosen.For each C or V slot, PyILM randomly selects a segment of the appropriate type. Oncethe entire word has been constructed, it is assigned a new meaning and then stored in thelexicon.The invention algorithm does not check to see if there is an existing word with the samesegmental material as the new one. In other words, it is possible for homophones to appearin the language. However, this has no eect on production or learning of these words, so itis basically irrelevant to the outcome of the simulation.2.4 Using PyILM2.4.1 Obtaining PyILMThe source code for PyILM is available for download fromhttps://www.github.com/jsmackie/PyILM. It is recommended to run PyILM usingPython 3.4. There are also some 3rd party libraries needed: Numpy and SciPy arenecessary to run the basic PyILM code, and the Visualizer requires Matplotlib and PIL(Python Image Library). All of these can be obtained from the Python Package Index athttps://pypi.python.org.2.4.2 Conguration lesPyILM simulations require a conguration le. These les should be saved into a foldercalled cong, which must be a subfolder of the main PyILM directory. A congurationle is a text le which must conform to a particular structure, described below, and its leextension must be .ini. Conguration les are broken up into sections, each indicated by aheader in square brackets. Each line in a section may contain a parameter name followed byan equals sign followed by a value. (This is the standard INI le format used on Windows.)An example is given in Figure 2.4, with some discussion following.There are four section headers recognized by PyILM: [simulation], [misperceptions],[inventory], and [lexicon]. The order of the parameters in a section is not important. The[simulation] section is mandatory. The parameter names which can be used are listed inSection 2.2.2. 
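Because the files follow the standard INI format, they can also be inspected with Python's built-in configparser module. The snippet below is only an illustration of the format, not how PyILM itself loads its settings, and the file name is made up for the example.

from configparser import ConfigParser

config = ConfigParser()
config.read('config/my_simulation.ini')  # hypothetical file name

# [simulation] holds ordinary parameters, e.g. the maximal syllable string
phonotactics = config.get('simulation', 'phonotactics', fallback='CVC')

# in [misperceptions] the name precedes the equals sign and the remaining
# details follow it, separated by semi-colons
for name, definition in config.items('misperceptions'):
    print(name, '->', definition.split(';'))

Run over the file in Figure 2.4, this would print, for instance, the name stop lenition followed by its target, feature, salience, environment, and probability fields.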
Any parameter that is not mentioned in the conguration le will be givena default value. These defaults are likewise described in Section 2.2.2.The [misperception] section is also mandatory. Each line in this section can include thename of a misperception as a parameter (any name is allowed, and spaces are permitted),58[simulation]initial_lexicon_size=30generations=30phonotactics=CCVCCinvention_rate=0.05minimum_repetitions=2min_word_length=1max_word_length=3[misperceptions]#"misperceptions"stop lenition=-voc,-cont,-son;cont;.5;+voc_+voc;.25nasalization=-cont,-son,-nasal,-voc;nasal;.5;_+nasal,+son,-voc;.25initial fortition=-voc,+cont,-nasal;cont;-.5;#_;.25stop aspiration=-voc,-son,-voice,-cont;hisubglpr;.5;_+voc,+high;.25obstruent glottalization=-voc,-son,-cont,-voice;mvglotcl;.5;_-voc,+glotcl,-mvglotcl;.25#"biases"ejectives are marked=-voc,-son,-cont,+glot_cl,+mvglotcl;mvglotcl;-.1;*;.5retroex is marked=-ant,-distr,-cont,+cor,-son,-voc;ant;.1;*;.5[inventory]start=p,t,k,b,d,g,m,n,f,s,z,a,i,u#start=10,3[lexicon]words=kapa,mufu,tiki,matk,bziafmFigure 2.4: Example conguration le59and the remainder of the misperception's details follow the equal sign. See section 2.2.10for more information on how to structure a misperception.The [inventory] section is optional, and only allows the single parameter name start,which takes the same possible values as the [simulation] parameter initial_inventory (seesection 2.2.2.4). The [inventory] section exists to make it conceptually easier to manage theinventory separately from the other simulation parameters, and because future versions ofPyILM are anticipated to have more possible parameters in this section.The [lexicon] section is also optional, and can take the single parameter words, whichis the same thing as using the initial_words parameter in the [simulation] section (seesection 2.2.2.16).The [simulation] and [misperception] sections should come rst in a conguration le.The [inventory] and [lexicon] sections, if present, should come at the end.If a line in the le begins with either the symbol # or ; then PyILM will ignore theentire line. This can allow users to include comments to themselves about parameters. Italso provides a convenient way of ipping parameter values between simulations withoutkeeping multiple copies of a conguration le with minor dierences. The use of the #symbol is demonstrated in Figure 2.4 where the words misperceptions and biases areincluded as comments, and there is an alternative possible starting inventory. If thesesymbols are encountered in the middle of a line, they are treated normally, which is whatallows misperceptions to make use of both symbols without any problems.2.4.3 Running a simulationThere are two ways to run simulations. From a command line, navigate to the PyILMdirectory and then typepython pyilm.py filenamewhere filename is the name of your conguration le. If no lename is provided, thenall the defaults are used.To run a simulation from within another python script, use the following code, replacingthe string cong.txt with the name of the appropriate conguration le.import pyilmsim = pyilm.Simulation('config.txt')sim.main()There is also a secondary program that can be downloaded for running multiple simu-lations, called pyilm_batch.py. To run a batch of N simulations, type the following in acommand linepython pyilm_batch.py filename Nwhere filename is again the name of a conguration le. Supply the string None forthe lename to use all defaults. 
For example, to run a batch of 25 simulations from within a Python script, use the following code:

import pyilm_batch
batch = pyilm_batch.Batch('config.txt', 25)
batch.run()

When running in batch mode, the user-supplied value for the random seed is ignored, and a different random seed is generated for each simulation, while keeping all other configuration details the same.

2.4.4 Viewing results

After running the first simulation, PyILM will create a new folder called Simulation Results, which will be placed in the same folder as pyilm.py. Each simulation is given its own subfolder inside of the Simulation Results folder. These subfolders are named Simulation output (X), where X is a number automatically assigned by PyILM.

This output folder contains a copy of the configuration file, as well as files detailing the state of the simulation at the end of each generation. Information about the exemplar space is written to files with the name feature_distributionsX.txt, where X is the generation number. Information about the inventory and lexicon is written to files with the name temp_outputX.txt, with X again standing in for a generation number. It should be noted that PyILM starts counting at 0, not 1. Generation 0 is the initial generation seeded with information from the configuration file. Generation 1 is the first generation to learn from the output of another agent. A side-effect of this is that the first simulation you run will be in the folder Simulation output (0).

The output files can be opened and inspected, but they are not formatted to be human-readable. They are intended for use with the PyILM Visualizer, which is an independent program that displays the information in a graphical interface. As such, it is not recommended that you change any of the names of the files, or alter any of their contents, because this can cause unusual behaviour in the Visualizer.

The Visualizer can be opened by double-clicking the file visualizer.py, which comes with PyILM. It will be located in the same folder as the main PyILM program. When the program launches, select the Data menu, then input the simulation and generation number that you wish to see. Blank lines are interpreted as the number '0'. From there, it is possible to navigate between simulations and generations using the Forward and Backward buttons on the top right, or by returning to the Data menu.

Each generation shows the segment inventory as a table of buttons. Clicking on a button brings up more details about that segment, including its distribution in the lexicon and its phonetic and phonological properties. More information about the simulation can be viewed under the Synchrony and Diachrony menus. Synchrony options include anything specific to the generation currently displayed, such as the lexicon. Diachrony options include the ability to plot changes over time. Misperceptions, which do not change, are listed under Synchrony.

Figure 2.5: Screen shot of PyILM Visualizer

2.5 Other notes

2.5.1 Limitations

PyILM cannot do everything. The program is designed largely to explore the long-term consequences of misperception-based sound change for segment inventories. There are several other ways in which sound systems can change over time that are not modeled.

2.5.1.1 No social contact

One of the limitations of PyILM is that there is only ever a single speaker and a single listener, so sound changes that rely on contact between speakers of different languages are not possible.

Human cultures speaking different languages often live nearby and interact with each other.
This often leads to languages borrowing words or morphemes from the other lan-guage. Occasionally, entire paradigms are borrowed. This can lead to changes in a soundsystem if the borrowed items contain phonemes that are not part of the borrowing language.For example, click consonants have entered into some Bantu language through borrowing(Güldemann and Stoneking 2008). There is no guarantee of this occurring, of course, so62it is also quite common for languages to change the sounds of loanwords so that they tnative patterns (Peperkamp 2004).The focus of the dissertation is on how phonetic eects inuence the evolution of soundinventories, so no borrowing is simulated. It would be possible, however, to implementa simple form of borrowing in PyILM with a few additions. At arbitrary points in asimulation, generate new words that contain one or more sounds not guring already inthe simulation, and add them to the speaking agent's lexicon. Loanword adaptation canbe simulated by running these words through a speaking agent's learning algorithm to seewhich categories any novel sounds might be assigned to.2.5.1.2 No deletion or epenthesisChanges are also limited to those that aect feature values. Deletion and epenthesis donot occur. The main reason for excluding these changes is because they can change thephonotactics of a language, and phonotactics will play a relevant role in the simulationsreported later in this dissertation.In fact, deletion is technically possible in PyILM, but simply not implemented for anysimulations that I report for the dissertation. Epenthesis is considerably harder to imple-ment, and is currently not possible.Suppose that we want to implement an epenthesis rule that inserts a vowel between twonon-continuants. The eect of the epenthesis rule should be that a phonetic vowel appears;there is no underlying vowel in the lexical item that corresponds to the epenthesized vowel.Suppose it is a mid-central schwa-like vowel. Because it is a phonetic epenthesis, we cannotsimply use a schwa symbol - it must be represented by a column of numbers. How do wegenerate these numbers?There are three options for this. One is to generate numbers for a mid-central vowelbased on the speaker's exemplar space. This is easy if the speaker happens to have such avowel already in their inventory. If there is no such vowel, then it is dicult to come upwith a general solution for which other vowel would be the closest, since any arbitraryvowel system is possible in PyILM. In any case, whatever vowel is chosen will not be amid-central vowel, so it will not correspond to the description of the epenthesis rule. Thismakes the behaviour of the simulation unpredictable from a user's perspective, and is nota good design choice.The second option of generating numbers using the listener's exemplar space has thesame problems. It is further complicated by the problem that their exemplar space continuesto change throughout the learning phase so the type of epenthesized vowel would, again,vary unpredictably.The third option is to include in a PyILM a generic vowel generator that can be usedto epenthesize a vowel of a predictable quality in every case. This option feels extremelyarticial compared to the rst two, where at least there was some semblance of changesbeing related to either articulation or perception. 
On the other hand, it does make it easier63to follow the changes that occur over the course of the simulation.2.5.1.3 No morphology or syntaxWords always convey a single meaning, and agents never utter more than one word at atime, so there is eectively no morphology or syntax in the simulation. Since some soundchange might emerge from interactions at word or morpheme boundaries, this limitationdoes prevent modeling certain kinds of change. However, the changes that occur at amorpheme boundary are essentially of the same type as change that might occur withina morpheme. The root cause of the change is still a phonetic interaction of two adjacentsounds.2.5.1.4 No long distance changesMisperceptions that occur in PyILM can only target adjacent sounds. It is not possible tosimulate the emergence of any types of harmony patterns, for example. Although consonantharmony is rare, it does exist, and plausible historical routes for its development have beenproposed (e.g. Hansson 2007). However, the types of consonants that emerge from long-distance changes are a subset of those that might emerge from local changes. Since the goalof this dissertation is to understand how inventories change over time, there is no particulargain to be made by including long-distance changes.2.5.2 Running timeThe running time of a simulation is determined by a number of dierent factors.The most important are the lexicon_size and min_repetition parameters. Together,they determine the total number of words that a speaking agent will produce in a givengeneration. If there is no maximum lexicon size, and invention_rate > 0, then runningtime can increase for each generation if new words are added to the lexicon.Another factor is word length, since PyILM has to check every segment in each wordfor possible misperceptions, and the learner has to analyze each segment. The parameterscontrolling word length are, of course, min_word_length and max_word_length. Phono-tactics also plays a role here too, since the average word length in a CV language is goingto be shorter than the average word in a CCVCC language, other things being equal.The number of misperceptions seems to have no signicant eect on total running time.Checking if a misperception applies is trivial and, in most cases, nothing happens. Thenumber of contexts where misperceptions apply is much smaller than the total number ofcontexts in the entire lexicon. When a misperception does apply, the operation is, again,trivial since changing phonetic values consists of adding two numbers together, followed bya check to ensure no phonetic value goes below 0 or above 1.A single generation of a CV language with 30 initial words and a maximum lexiconsize of 30 takes less than a second. Setting the phonotactics to CCVCC increases the time64signicantly, and a single generation may take 10 seconds. The recording phase, wherePyILM generates an output le for use with the visualizer, also contributes to runningtime. The length of time it takes for a generation to be recorded depends on the numberof changes that occurred in the simulation.Another factor is the time taken in labeling segments for human readability. Duringthe simulation, segments are simply numbered, rather than being assigned IPA symbols.This is because there is no way to know which symbol will be appropriate until the end ofthe learning phase, when the phonological feature values are assigned. Searching the list ofall possible symbols, and comparing feature values to see which would be best, can be timeconsuming. 
PyILM looks for a short-cut by comparing against the previous generation, andwhere sounds have not changed feature values it simply re-uses the old symbol.65Chapter 3Sample simulations3.1 IntroductionIn this chapter I will give some examples of simulation output, using relatively simpleparameter settings. This will help to clarify how the various parameters contribute theoutcome of a simulation, and how various historical changes can be simulated.Conguration les will be presented throughout this chapter. They are presented astables, rather than being formatted as actual conguration les, for purposes of readability.Similarly, parameter names in these tables have been somewhat changed to employ regulartypeface and formatting conventions, e.g. max_lexicon_size is written here as Max lexiconsize. Not all simulation parameters are indicated, due to the large number of parameters.Each simulation is presented to illustrate a point, and only parameters relevant to the topicunder discussion are indicated. The features le used is the default one, which is based onthe Sound Pattern of English (Chomsky and Halle 1968) feature specications available inP-base (Mielke 2008).3.2 Simulation 1 - A single abrupt changeThis rst simulation is very simple. The conguration le for the simulation is show inTable 3.1.The initial inventory is obviously not a natural inventory, but by keeping it articiallysmall, it is easier to understand what is happening. There is only a single misperceptionthat can occur, nal devoicing, and there is intentionally a single voiced stop /b/, so thatonly one sound is susceptible to change in the simulation.The misperception is shown in Table 3.1. The target column shows the features thata sound must have in order for the misperception to apply. The feature column showsthe feature which changes if the misperception applies. The salience column shows thedirection and magnitude of a change. The environment column shows the context where66Table 3.1: Conguration for Simulation 1a sound must occur in order for the misperception to apply. Finally, the probabilitycolumn shows, of course, the probability that the misperception applies. The salience ofthe change in this case, .5, makes it very likely that a learner will assign tokens aected bythe misperception to a dierent category than those not aected. In other words, it makesit likely that sound change will happen.Phonotactics and word length are tightly regulated so that all words will have VC orV shape. This is an extremely unnatural pattern not found in human languages, so this isfor the purposes of illustration (although see Breen and Pensalni (1999) for an argumentthat Arrernte is a language without onsets). The phonotactic settings ensure that /b/ willoccur only in nal position, which will in turn guarantee that the misperception occurs atsome point. The lexicon of the initial generation will be limited and repetitive, consistingof 30 random draws from the set {iq, is, ib, i}. Figure 3.2 shows how the inventory of thelanguage changes over the course of the simulation. Segments shown in parentheses areallophones (the precise meaning of allophone in the context of PyILM will be describedbelow).67Table 3.2: Comparison of inventories in Simulation 1Note that all simulations start with a generation 0, which is the initial generation thatis seeded with the information from the conguration le. The inventory of generation 1 isthe rst that could potentially have undergone sound change. 
As Figure 3.2 shows, rightaway in generation 1 some of the tokens of /b/ have been misperceived as belonging to adierent category /p/ and a voiceless stop has entered the language. Initially, this /p/ isjust a variant of /b/. Certain words in the lexicon are always pronounced as [ib], whileothers vary between [ib] and [ip]. Eventually, after enough generations of the simulation, afew words come to be pronounced as [ip] all of the time. At this point /p/ is no longer justa variant of /b/, and is now a full member of the inventory of the language.PyILM keeps track of how sounds are changing in this respect. In the Visualizer thetotal inventory is a count of the number of sounds that occur anywhere in the lexicon.The core inventory is the set of sounds that all occur in at least one word where they donot vary with anything else. In generation 1 in Figure 3.2, the total count for the inventoryis given as 5 (four total consonants plus one vowel), while the core count is given as only4 (three core consonants plus one vowel), since at this point in the simulation /p/ occursonly as a variant of /b/. In generation 3, the core count rises to 5 as there are now somewords with a /p/ that does not vary with /b/. Sounds in the core inventory are also shownin the Visualizer with raised button backgrounds, while the variants are shown with sunkenbutton backgrounds.This is analogous to the distinction between phonemes and allophones in phonolog-ical theory: the core inventory is all the phonemes, and the total inventory includes boththe phonemes and all of their allophones. More specically, a phoneme in PyILM is anysound that occurs in at least one word where it does not vary with any other sound. Anallophone is a sound that occurs uniquely as a variant of another sound. I will continueto use the terms phoneme and allophone throughout this chapter as convenient labels for68these types of simulated sound categories, but with the understanding this is not the usualsense of these terms.In particular, it is normal to dene phonemes in terms of contrast: sounds that contrastwith each other (i.e. participate in minimal pairs) are assigned to dierent phoneme cate-gories, while sounds that do not contrast (either due to complementary distribution or freevariation) are analyzed as allophones of a single phoneme. Minimal pairs or overlappingdistribution are not necessary for phonemic status in PyILM.The initial lexicon of a simulation is generated to include minimal pairs, but it is notalways possible to ensure that every phoneme has a minimal pair with every other. Thisis because there also is a parameter controlling for the size of the initial lexicon, and thenumber of minimal pairs required for all sounds to have a pair can exceed the lexiconmaximum. In this very small example, there are in fact many minimal pairs in the initiallexicon because it only consists of the words {ib,in,iq,i}. Additionally, when /p/ enters thecore inventory, it immediately participates in a minimal pair with all the other consonants,making it more obviously a new phoneme in the language. With larger inventories, largerlexicons are required to get the full number of possible minimal pairs.Another common criterion for determining allophones is complementary distribution.This is usually balanced with a requirement that the allophone be phonetically similar tothe underlying phoneme category, since accidental complementary distribution can occur,e.g. 
in English [h] is only ever in initial position and [N] is in non-initial position, yet theseare not considered allophones of the same underlying category. Neither of these criteriaare considered for determining allophones in PyILM. This is largely due to the diculty ofimplementing algorithms in PyILM that can accomplish this.It is not impossible - there do exist algorithms for estimating the probability that twosounds are allophones. For example Peperkamp et al. (2006) use the Kullback-Leiblermeasure of the dierence between probability distributions, and Hall (2009) uses entropy.In principle, such algorithms could be applied to the languages simulated by PyILM, butthere are some complications that make this dicult. Specically, these algorithms, or anyother similar ones, require strong assumptions about what counts as an environment forthe purposes of complementary distribution.Environments can be dened at any arbitrary level - which should be considered? Forexample, suppose sound A occurs in the environments {t_i, a_a, s_o}, and sound B occursin the environments {z_u, d_u, u_u}. Are these sounds in complementary distribution?If we consider just the segmental level, then the answer could be yes: Sound B only occursbefore /u/, and Sound A occurs elsewhere.If we think about features instead of whole sounds, then the situation becomes morecomplex. There are thousands of possible feature combinations to consider, depending onthe feature system in use. On one analysis, both sound A and sound B have the samedistribution: they can occur between vowels and they both follow coronal obstruents. Ona dierent analysis, Sound A occurs between low vowels and after voiceless obstruents,while Sound B occurs between high vowels, or perhaps round vowels, and it follows voiced69Figure 3.1: Change in inventory size for Simulation 1obstruents.In dealing with a natural language, a linguist can make use of general knowledge aboutsound patterns, information from elsewhere in the language or related dialects, and intu-itions about what constitutes a natural pattern, in trying to determine allophonic varia-tion. For instance, if Sound B is labial(ized) and A is not, then an analysis of B occursbefore /u/, A occurs elsewhere would be natural, since /u/ is also a round vowel. On theother hand, if A is voiceless and B is voiced, it might make more sense to refer to the factthat they only occur next to obstruents that match in voicing.In the simulated languages of PyILM an algorithm searching for complementary dis-tribution would have an enormous search space of all possible feature combinations toconsider, as well as the problem of determining whether it is the left or right hand side (orboth) of an environment that is most relevant. Therefore, as a way of avoiding some ofthese complications, I will make use of a much weaker denition of phoneme and allophone,where phonemic status is achieved by a sound when it exists in at least one word where itdoes not vary with another sound. Allophones are sounds that only exist as variants.Maintaining this conceptual distinction between phonemes and allophones is very usefulfor interpreting simulation results, in particular when it comes to questions of inventory size.Counts of inventory sizes of natural languages tend to be counts of phonemic inventories,so it is useful to do this in PyILM as well. Figure 3.1 shows change inventory size for thissimulation. 
The dotted line shows the core (phoneme) inventory, and the solid line showthe total inventory (phonemes and allophones).The gure shows that the size of the total inventory rises immediately, since [p] appearsthrough misperception in the rst generation. However, it is not yet a member of the coreinventory, since it appears only as a variant of /b/. In generation 3, the size of the phonemeinventory rises as [p] has fully overtaken [b] somewhere in the lexicon. It now occurs in at70least one word where it does not vary with [b], though there are still many words where itremains a variant.Immediately in the next generation, the phoneme inventory size drops again. Lookingback at Table 3.2, this is because there has been a complete reversal in the language, and[b] has now become an allophone of /p/, that is, [b] only exists in words where it varies with[p]. This persists until Generation 5, and then [b] disappears completely. The consonantinventory for the remainder of the simulation is /p,q,n/.The reason that /b/ disappears entirely, rather than continuing to co-exist with /p/, isthat the phonotactics are restricted to VC syllables for the purposes of this simple illustra-tion. If the language allowed onsets, then any onset [b] would remain a /b/ forever, sinceno misperceptions target that environment. The length of time that a language spendsin the doublet stage of having alternative pronunciations depends on the frequency withwhich misperceptions occur, and the frequency of the words containing segments subject tomisperception, which are parameters that will be analyzed in more detail throughout thischapter.This transition from /b/ to /p/ in PyILM, or any other change like it, is a simplicationof the real-world phenomenon of phonemicization, where phonetic eects eventually resultin the appearance of a new phoneme in the inventory. Bermúdez-Otero (2007) describesfour phases to this process. In the rst phase, a new sound is introduced through somephysical or physiological phenomenon (Bermúdez-Otero 2007, p. 7), and the languagegains a phonetic variant of an existing sound. In the second phase, this variation becomesmore categorical and what was once mostly a phonetic eect becomes a conditioned phono-logical alternation. The third phase is called re-analysis, where the domain of applicationfor a phonological rule starts to change. It may become conditioned to a morphologicalenvironment, and lexical exceptions may appear. In the nal phase, the original phoneticconditions become opaque, and the sounds become lexicalized, or the phonological rulebecomes a morphological one. PyILM does not simulate all four phases, but there are clearparallels: a sound emerges in one context through misperception, varies with another soundfor a period of time, then nally lexicalizes (since there is no morphology in PyILM).3.3 Simulation 2 - A single gradual changeIn the example above, sound change occurred when a learner misperceived certain tokensof a devoiced /b/ to be dierent enough from the normally voiced /b/, that a new cat-egory was assigned to these tokens. This new category existed alongside /b/ for a shortperiod, then eventually dominated the lexicon, replacing /b/ in every instance. This isrepresentative of scenarios in natural language where a sound rst enters a language asan allophone, then becomes a phoneme. It is not necessary that an allophone completelyreplace a phoneme in a simulation, but the phonotactics of Simulation 1 were so restrictivethat there was no other possible outcome. 
With more complex phonotactics, both /b/ and71Table 3.3: Comparison of inventories in Simulation 2/p/ would have been in the language at the end of the simulation.The appearance of /p/ or the disappearance of /b/ was abrupt, occurring suddenlyin some lexical items at some generations. It is also possible for a sound to change moregradually, by lowering the salience of a misperception, but increasing its frequency. This willhave the eect of slowly pushing category boundaries in a particular direction, rather thangenerating a new category at any point. Eventually, the value of this phonetic dimensionfor a particular sound category will be considerably dierent from when the simulationstarted, and the phonetic properties of a sound category will have shifted far enough thatthe feature values will have ipped.For this simulation, the same conguration le was used as in the previous section, butwith two small changes. The simulation ran for 20 generations, instead of 10. There was achange to the misperception so that the devoicing misperception is twice as likely to occur,but its eect is only half as strong.The same end-state inventory is achieved in both Simulation 1 and Simulation 2: thelanguage has /p/ but not /b/. The main dierence is that inventory size never changesin Simulation 2. The sound that is originally a /b/ has a voicing value that drops slowlyover time. In generation 2, it has fallen enough to be classed as a [−voice] sound, butit straddles a perceptual boundary so in generation 3, it bounces back up slightly to the[+voice] side. In generation 4 it drops down to [−voice], where it stays for the remainderof the simulation.3.4 Misperceptions and phonetic similarityHaving a high misperception salience means that learners are more likely to assign tokensaected by misperception to a dierent segmental category than those not aected bymisperception. If this is combined with a high probability of misperceptions occurring,then the inventory will undergo more abrupt, categorical changes, as in Simulation 1. Lowersalience values combined with higher probabilities of misperceptions leads to gradual soundchange, as in Simulation 2.72Misperception salience interacts with another parameter, called minimum_activation_level (see section 2.2.2.14). This parameter is used during the learning phase,and it acts like a threshold for phonetic similarity in sound categorization. It controls howsimilar a token must be to a given category in order for the learner to consider includingthat token in that category. If a learner hears a sound that fails to meet this threshold, thenthe sound will be assigned to a new category. This parameter must have a value between 0and 1. Setting it all the way to 1 means that input sounds must match existing categoriesexactly. This tends to lead to a proliferation of segment categories, since it is quite rarefor exemplar tokens to be exact matches. Setting it to 0 means that nothing is ever toodissimilar, and all input tokens after the rst will count as exemplars of whatever the rstwas categorized as.These extreme values lead to unusual results, with segment inventories that look nothinglike those of natural languages. More normal looking inventories emerge with values inthe range of .5 to .7. Some results with dierent values are shown in Figure 3.2. Each ofthese simulations was run with the same conguration le.Figure 3.2: Results for various values of minimum_activiation_level.Figure 3.3 illustrates the interaction between misperception salience andminimum_activation_level. 
The gure shows the results of using a dierent valueof minimum_activation_level, with each plot displaying change in inventory size for vedierent simulation runs, all using the same initial conditions, varying only the salience ofmisperceptions.73Figure 3.3: Varying misperception salience across three dierent values forminimum_activation_level. Misperception salience is shown in the legend. Simulation(a) uses a value of 0.2, Simulation (b) uses a value of 0.5 and Simulation (c) uses a valueof 1.0When the minimum_activation_level parameter is very low, as in Simulation (a), thesalience of misperceptions hardly matters. The learning algorithm collapses all the segmentsinto a single category. Even highly salient misperceptions cannot create segment categorieswith enough perceptual dierence from the one existing category.In Simulation (b), the minimum activation level is .5, so there is greater potentialfor misperception to create new categories. Growth in inventory size can be used as anindicator for when this occurs. Lower salience values produced smaller inventories whilegreater salience led to the creation of new categories quite quickly. Finally in Simulation(c), the high salience of misperceptions only speeds up growth in inventory size.3.5 Simulation 3 - Interactions between sound changesIn the previous simulations, there was only a single sound change that could occur. Thisexample gives a slightly more complex simulation in which sound changes can interactwith each other. Simulation 3 uses the same conguration le as Simulation 1: the initialinventory is /b, q, n/ and the phonotactics are set to VC. The only dierence is that thereare now two misperceptions:Devoicing [+voice, −son, −cont] segments have their [voice] value reduced by .5 inthe environment of _# (p=.25)Lenition [−voice, −son, −cont] segments have their [cont] value increased by .5 inthe environment of _# (p=.25)The rst is the same nal devoicing change used in Simulation 1. The second is a lenitionprocess where voiceless stops become fricatives, also in nal position. This means it ispossible for /q/ to be aected by Lenition from the initial generation. On the other hand,/b/ is not aected, since it is voiced.74Once nal devoicing has had an eect on /b/, however, the resultant [p] sound will beavailable for misperception as a labial fricative, perhaps /f/, creating a feeding relationshipbetween the changes. Figure 3.4 shows some of the inventories that appeared over thecourse of Simulation 3.Table 3.4: Comparisons of several generations in Simulation 3As the output shows, in generation 1 some changes have already happened. The /b/ hasdevoiced to [p] on some occasions, adding [p] as a new allophone of /b/. The /q/, which isalready voiceless, has also been aected by the lenition misperception, and has also gainedan allophone, and adds the rst fricative to the inventory.In generation 2, some instances of [p] have lenited to [F], which is now actually countedas a second possible variant of /b/. PyILM will not consider it to be an allophone of /p/,since /p/ has no independent existence yet in the simulation. Another way of thinkingabout this is that the language has only a single labial sound at this point, with threepossible pronunciations. By generation 6 /p/ has become the phoneme for the labial set:it has completely replaced /b/ in a certain number of words, and now [b] only ever appearsas a variant of /p/. At generation 10, the voiced labial has completely disappeared, andboth /p/ and /F/ are considered phonemes. 
The uvular stop has also disappeared at thispoint, replaced in every instance by a fricative. By generation 15, the language is back toan inventory of the original size, with two fricatives replacing the two original stops. Thenasal, meanwhile, has remained completely unaected the entire time.75Figure 3.4 depicts how the total and core inventories are changing. Between generation0 and 10, there is a large degree of allophonic variation. This settles down at generation10 as [b] and [q] disappear as possible variants. The total inventory then drops once morewhen /p/ is eventually replaced by /F/.Figure 3.4: Change in inventory size for Simulation 3Setting up a feeding or bleeding relationship such as this can be quite dicult withoutcareful manipulation of parameters. In this case, there is sure to be feeding that will happenbecause the initial /b/ is not subject to the lenition, only /p/ is. And /p/ is, by design,the outcome of the other misperception. If this simulation had been run with a randomselection of segments, or with more complex phonotactics, there would be no guarantee thiswould occur. One misperception might never be triggered because the particular segmenttype or context is lacking in the language.This simulation also illustrates why it is dicult to use PyILM to simulate the evolutionof any specic natural language. For instance, suppose a simulation was seeded with alexicon of Old English, and the set of misperceptions was congured to include all knownsound changes from Old English to Modern English. There is no guarantee that all of themisperceptions would occur in the same order in a PyILM simulation as they did in the realworld. So long as the conditions for a misperception are met in the lexicon, it is possiblefor a sound change to occur. It is not possible to force a particular ordering, at least notwithout introducing an unnecessary amount of teleology.76Table 3.5: Conguration for Simulation 43.6 Simulation 4 - CVC languageThis simulation uses the conguration details shown in Table 3.5. For this simulation, thephonotactics are slightly more complex, allowing CVC, CV, V, and VC syllables. Wordsare still capped at a single syllable, however. The initial inventory is specied in a le as/b, d, t, k, f, z, m, n, a, i, e/. The misperception le is the same as Simulation 2, that is,there is a chance of nal devoicing and of nal lenition of voiceless stops.We know what is in the initial inventory, so we can make a few guesses about how thesimulation will turn out, given enough time. Misperceptions only target nal position, so ifthe lexicon contains at least one word with each segment in initial position, then all of theinitial phonemes are guaranteed to survive until the end of the simulation. As it stands,the vowels and the nasals are sure to survive anyway, so long as they appear in at least oneword, since no misperceptions target them at all.Assume that in fact all the segments appear in initial position, so the inventory will notshrink over time. How can the inventory grow? Consider rst the labial set. The segment/b/ will permit the creation of /p/ through devoicing. This /p/ is transient, though, andinevitably all examples of it will lenite, just as in the last simulation. No /p/ can survivebecause all instances of this category are found in nal position, which is exactly wherelenition applies. This extremely restricted distribution is due to the fact that there wasno /p/ in the initial inventory; /p/ appears through devoicing which occurs only in nalposition. 
If /p/ had been part of the initial inventory, then it might have occurred in initialposition, which would have protected it from the lenition change.Once /p/ has lenited to a fricative, there are no more changes that can take place. Thenal labial inventory will include /b, p*, f/ where /p*/ is either a labial stop or a labialfricative, depending on how long the simulation has been running. This fricative may ormay not merge with the original /f/. A merger will take place only if /p*/ and /f/ areidentical on every dimension except [continuant]. This also implies that /b/ and /f/ areidentical on every dimension except [voice] and [continuant], since /p/ descends from /b/.If, for any reason, /b/ and /f/ dier on some other feature, then when /p/ lenites, theresulting category will be considered a dierent category from /f/.77For example, in the feature system of Hayes (2011), [labiodental] is feature. Since theoriginal /b/ is [−labiodental] in this system1, then /p*/ will be [−labiodental] becausethe Lenition misperception only aects the feature [continuant]. In this case, /p*/ willdier from /f/ by both [continuant] and [labiodental], and will therefore be categorized assomething other than /f/.The coronals will have a somewhat dierent evolution. There is /z/ but not /s/, so itis expected that /s/ will eventually appear through misperception. In fact some kind ofvoiceless coronal fricative is almost certain to appear because the original /d/ should devoiceto /t/, which is subject to the lenition misperception. It is possible that the voiceless coronalfricative that ultimately descends from /z/ will merge with the one that descends from /t/,but it will depend on specically how the exemplar token values are distributed in a givensimulation, and this process is partly random. The coronal inventory could eventually growto /d, t, z, s1, s2/, where the two /s/ segments represent potentially dierent descendantsof original /t/ and original /z/.The single dorsal /k/, if it only appears in nal position, is doomed. It is certain toundergo lenition, and there are no original voiced dorsals that could create a replacement/k/ through devoicing. The nal dorsal inventory is therefore going to be /x, k/, if /k/appears in initial position, otherwise it will simply be /x/.Again, these predictions depend on some assumptions about the initial lexicon of thelanguage, and whether or not the relevant segments all appear in the relevant environments.Running this same simulation multiple times with dierent random seeds will producedierent outcomes. The plot in Figure 3.5 shows the change in (total) inventory size overve simulations using the same initial conditions, but with dierent random seeds. In thesimulations with larger nal inventories, sounds appear in a greater variety of environments,increasing the probability that they survive the entire simulation, since they are more likelyto appear in an environments not targeted by a misperception. In the simulations withsmaller nal inventories, there was less diversity in sound distributions, and some soundsdisappeared because they occurred only in environments targeted by misperception.Note that not all simulations started with exactly the same number of sounds: somestarted with 10 and some with 11. All languages were given the same conguration le withthe same inventory, but on some random seeds, not all of these sounds were actually sam-pled during the construction of the initial lexicon. 
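Runs of this kind can be generated with the batch program described in section 2.4.3, which re-uses a single configuration file but assigns a fresh random seed to each run. A minimal sketch, assuming the Simulation 4 settings have been saved to a file named simulation4.ini (the file name is illustrative):

import pyilm_batch

# Five runs of the same configuration; batch mode ignores any user-supplied
# seed and generates a new random seed for each simulation.
batch = pyilm_batch.Batch('simulation4.ini', 5)
batch.run()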
There is a simulation parameter calledauto_increase_lexicon_size (see section 2.2.2.15) which would force every simulation touse all 11 sounds, but it was set to False for these cases.Table 3.6 gives some snapshots of a language actually generated by one of these randomseeds. The labials turned out as predicted. The phoneme /b/ rst develops an allophone[p], which then becomes a phoneme, and which then lenites and becomes a fricative. Inthis case, it did not merge with the existing /f/, and there are two labial fricatives in the1In Hayes' system, [labiodental] is actually a unary feature, so /b/ would simply not have this featureat all. However, since features in PyILM cannot be unary, /b/ would be considered [-labiodental].78Figure 3.5: Change in total inventory size with ve dierent random seedsnal inventory. Inspecting the simulation, it appears that /f/ is [−distributed] while /F/ is[+distributed], which is a feature it inherits from the original /p/.The coronals evolved more or less as expected as well. The phoneme /d/ devoiced to[t], which gave rise to [T], which eventually achieved phonemic status. The phoneme /t/rises and falls throughout the simulation, as some tokens of [d] devoice, then lenite. Someinstances of /d/ still remain in nal position, so they allow for new devoicing which leads tonew lenitions (that all merged with the rst /T/). If the simulation were run long enough,eventually /t/ would completely overtake /d/ in all nal positions, and then eventuallylenite to /T/.The dorsal stop was lost, and replaced by a fricative, which was predicted. However,this actually did not happen entirely due to misperception. This segment had a curiousevolution. The original /k/ appeared in three words: /ki/, /ik/ and /ak/. It early onacquired an allophone [x] in nal position. This [x] became an increasingly common variantuntil it was the dominant pronunciation in two out of three words: /ik/ > /ix/ and /ak/> /ax/. Then the learner in generation 10 decided to group /x/ and /k/ into a singlecategory. Even the /k/ in initial position, not aected by misperception, merged with /x/.Why did this occur?Inspecting the simulation more closely revealed that the initial /k/ category had beenseeded with exemplars that happened to have extremely low values on the [continuant]dimension, so that most tokens produced had a value of 0.1 or less. The nal lenitionmisperception boosted production values by +.5, which created tokens that only barelypassed the threshold for a learning agent to categorize something as [+continuant], so thenew /x/ category had values that straddled a perceptual boundary.At generation 10 it appears as though the learner failed to notice any signicant dier-79ence between any /k/ or /x/ tokens produced by the previous generation, and categorizedthem all as [−continuant], that is, /k/ became the new phoneme. This created a categorywith a large degree of variation in [continuant] values. Misperception continues to act ontransmission to the next generation, which pushed average [continuant] token values higher.The learner at generation 11 also only learned a single velar category, but this time [+con-tinuant], that is, /x/ became the phoneme. 
Since there is no misperception that makesword-initial tokens any less continuant, there is no way for /k/ to return to the inventory,and this collapse of categories is essentially permanent.Table 3.6: Comparison of several generation in Simulation 43.7 Simulation 5 - Invention and the spread of new segmentsIn the previous examples, the new segments that are created by misperceptions are incompetition with existing segments, and only one of them can win. Inevitably, it will bethe one that is preferred by the misperception. These newly created segments, however,are more like replacements for the older segments, rather than truly new additions to thelanguage. They never leave their original environments, because the invention rate has beenset to 0.0 for the previous simulations. In this next example, the invention rate is raised todemonstrate how this aects the evolution of an inventory. The conguration details areshown in Table 3.7.Agents inventing new words will draw from the total inventory of sounds, not just fromthe phoneme inventory. This makes it possible for allophones to become phonemes, becausethey can appear in an invented word in an environment where they are not in variation80Table 3.7: Conguration for Simulation 5with another sound. This is analogous to a process that is known to happen in naturallanguage where words are borrowed containing an allophone in a novel environment, whichcan lead to that allophone taking on phonemic status. For example, in Old English [f] and[v] were allophones of a single phoneme, with [v] occurring intervocalically and [f] occurringelsewhere. Over time, English borrowed French words that contained a [v] in positionsother than between vowels (McMahon 2002). This created overlapping distributions of [f]and [v], which resulting in [v] eventually taking on phonemic status.The misperceptions are the same as the previous simulations: a 25% chance of word-nal devoicing and word-nal lenition. The combination of misperceptions and inventionscreates dierent outcomes than the previous simulations without invention. For instance,consider just the coronals. In the initial inventory there is a voiced coronal stop /d/, but novoiceless counterpart. The voiced one appears in both word-initial and word-nal positionin the initial lexicon. After several generations of the simulation, all of the /d/ in nalposition have devoiced, and there is now a voiceless coronal stop /t/ in the inventory. Thisnewly created stop is now subject to nal lenition, and eventually all instances of it becomevoiceless fricatives, returning the language to a state of only having the one (voiced) coronalstop.81Table 3.8: Comparison of several generations in Simulation 5This voiced stop will then have a restricted distribution - it will only be found in word-initial position, because no misperceptions operate there. In previous simulations, no morechange would be possible at this point, since misperceptions can have no more eects.However, in this simulation the invention rate is greater than 0, so there is the possibilitythat an agent can create new words and put the voiced stop back into nal position. Thismakes it now a target of nal devoicing, and a voiceless stop will eventually re-join theinventory. 
It is also possible that, during the period of time where /t/ exists as a phoneme, an agent will invent a new word that contains a /t/ in initial position, shielding at least some instances of /t/ from lenition and making it a more permanent member of the inventory.

3.8 Summary

In this chapter, I demonstrated how inventories evolve in PyILM, and how various simulation parameters can affect this evolution. The notions of phoneme and allophone in PyILM were introduced, as they differ somewhat from the common use of these terms in phonological theory. A sound is considered to be a phoneme in PyILM if it occurs in at least one word in the lexicon where it does not vary with another sound. A sound is considered to be an allophone if it only ever occurs as a variant of other sounds. There were five simulations presented in this chapter.

Simulation 1 showed how inventories can change through the abrupt introduction of a new sound, and Simulation 2 showed how categories can shift slowly over time. The different outcomes depended on the values of different simulation parameters. When misperceptions have a high salience, this tends to lead to the emergence of allophones. For instance, suppose a simulation has an intervocalic lenition misperception with a high salience, and a lexicon has /b/ between vowels. A word such as /aba/ will quickly obtain two possible pronunciations: [aba] and [ava]. Initially, the [v] sound will be a variant of /b/, but after some number of generations, the word will come to be pronounced uniquely as [ava] and /v/ will enter the inventory as a phoneme.

When misperceptions have a lower salience, sounds in an inventory tend to gradually change categories, without the appearance of an intermediate phoneme. For example, suppose a simulation has a low-salience intervocalic lenition misperception. A word like /aba/ will continue to be pronounced as [aba] for a few generations, but the effect of the misperception will slowly drag the [continuant] values of the /b/ segment (in this word) higher. Eventually, some learner will acquire the word as /ava/, and it will have a unique [ava] pronunciation. In contrast to the high-salience simulation, it is less likely that a situation will arise where both [aba] and [ava] are possible pronunciations in the low-salience simulation.

There is also an interaction between misperception salience and the minimum_activation_level parameter. When this parameter is set very low (close to 0), then all segments in a simulation will tend to collapse into a single category. If the parameter is set very high (close to 1), then there is an extreme proliferation of segment categories. These effects are very strong, and will occur regardless of the salience and frequency of any misperceptions.

Simulation 3 increased the number of misperceptions and included some feeding relationships, for example a lenition process that only affects voiceless sounds, which were themselves the product of a devoicing misperception.

Simulation 4 demonstrated how phonotactics can influence the outcome. This is due to the context-sensitive nature of sound changes. A language with only CV syllables has exactly two contexts for consonants: word-initial or intervocalic (assuming a word of at least two syllables). This limits the number and type of misperceptions that could potentially apply. On the other hand, a language with CVCC syllables has a greater variety of environments in its lexicon, which means that a greater variety of sound changes could potentially take place.
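The difference can be made concrete with a small illustration that is independent of PyILM itself: classify each consonant's neighbours as a word boundary, a vowel, or another consonant, and count the distinct environment types found in a toy lexicon. The word lists below are invented for this purpose.

def context_types(words, vowels='aiu'):
    # Return the set of environment types (boundary/vowel/consonant)
    # in which consonants occur across a list of words.
    def kind(symbol):
        return '#' if symbol == '#' else ('V' if symbol in vowels else 'C')
    types = set()
    for word in words:
        padded = '#' + word + '#'
        for i in range(1, len(padded) - 1):
            if padded[i] not in vowels:
                types.add(kind(padded[i - 1]) + '_' + kind(padded[i + 1]))
    return types

print(context_types(['pati', 'tupa', 'kipa']))   # CV words: only #_V and V_V
print(context_types(['pat', 'tukt', 'kipst']))   # also includes V_#, V_C, C_C and C_#

A CV lexicon offers only the word-initial and intervocalic contexts, whereas the second word list, with codas and clusters, also supplies consonant-adjacent and word-final contexts for misperceptions to target.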
The issue of phonotactics will be discussed in much more detail inthe Chapter 5.Simulation 5 introduced the concept of inventions. Invention has two major eects onthe outcome of a simulation. One is that invention creates new words with new environ-ments, allowing misperceptions to apply to sounds that might not apply in other words.The second possibility is that allophones can be selected by the invention algorithm andplaces into new contexts where they do not vary with any other sounds, instantly achievingthe status of phonemes.This builds up the basic foundations of simulations in PyILM. Now more complex simu-lations can be considered, with the aim of trying to model the evolution of natural languageinventories. In Chapter 4, I will review the typology of natural language inventories, beforereturning again to PyILM in Chapter 5, with the aim of simulating these typological facts.83Chapter 4Natural language consonantinventories4.1 Inventory size4.1.1 OverviewSound inventories are extremely diverse. One of the most obvious ways in which they dieris in the number of sounds they contain. Counts of inventory size depend partly on whatis being counted. It is common in linguistics to make the distinction between the surfaceor phonetic inventory of a language, which consists of the sounds that are physicallyarticulated, and the underlying or phonemic inventory, which consists of abstract mentalcategories assumed to be acquired by a learner of a language.Collecting a complete phonetic inventory, a set of all the speech sounds in a language, isactually not feasible, since no two speech productions are exactly alike, and this collectionwould be innite in size. Speech sounds are instead grouped into a nite set of categories,with categorization typically done through the use of articulatory or acoustic features. TheInternational Phonetic Alphabet, for example, is a very widely used system for categorizingspeech sounds based on articulation. Major category features for consonants in the IPAinclude place of articulation, manner of articulation, voicing, and airstream mechanism.The phoneme inventory of the language is based on an analysis of the lexicon. The dis-tribution of a sound in the lexicon determines its phonemic status. Phonemes are usuallyargued for on the basis of contrast, with minimal pairs being the best evidence. Sounds thatnever appear in the same environment, i.e. have complementary distribution, are consideredallophones of a single phoneme. There is sometimes an additional requirement that allo-phones bear some phonetic resemblance to each other. For instance, in English the sounds[h] and [N] are in complementary distribution, with [h] appearing only in syllable initialposition, and [N] appearing in non-initial position. Despite this, the two sounds are notanalyzed as allophones of a single phoneme because they are phonetically quite dierent.84Figure 4.1: The inventories of Palauan, from Morén-Duolljá (2005)The inventory of Palauan is a good example of how some of these decisions can aectwhat gets counted in an inventory. The tables combined in Figure 4.1 come from Morén-Duolljá (2005). The top table gives the approximate phonetic inventory, which is somethinglike the set of all articulatorily distinct sounds found in Palauan speech. The bottomtable gives what Morén-Duolljá calls the contrastive consonants. 
Each box is a phonemiccategory and the symbol ∼ is used to indicate the multiple possible pronunciations for asound in that category.There is a dierence of 8 sounds between the two tables. There are three kinds of velarstops that are articulated in Palauan - voiceless unaspirated, voiceless aspirated, and voiced- but they are all considered variants of a single velar phoneme. There are two reasons forgrouping them together as allophones: (1) they are phonetically similar, (2) they appearin complementary distribution in the lexicon. Specically, [kh] occurs in nal position, [g]appears between vowels, and [k] appears elsewhere. Figure 4.2 provides a word list andsummary of this distribution.Since [k] is the least predictable of the allophones, it is also assumed to be the under-lying phoneme. In constructing a phonemic inventory of Palauan, the velar stop categorywould be represented using the symbol /k/. The aspirated and voiced velars would not berepresented.85Figure 4.2: Summary of the distribution of velar stops in Palauan, with data from Morén-Duolljá (2005))This can be compared to another, rather more simple case, which is the bilabial nasal.The sound [m], according to Morén-Duolljá (2005) is found in a variety of environments,and there are no noticeable variations in pronunciation. There is one other nasal in thelanguage, which appears nearly everywhere that [m] does, so there is no complementarydistribution that might suggest an allophonic relationship. Palauan is therefore assumedto include an underlying category /m/ which would appear in a phonemic inventory.The focus of this chapter will be phoneme inventories. One major reason for this is thatthere exist several large databases of information about phoneme inventories. Additionally,the abstract categorical nature of phoneme inventories makes them somewhat easier to col-lect and analyze, compared to the more gradient nature of phonetic data. Major databasesthat will be frequently referenced in this chapter are UPSID, P-base, and WALS, which aredescribed below.UPSID is the UCLA Phonological Segment Inventory Database. UPSID was the rstmajor database of inventories, and is extremely widely used. It was rst published asMaddieson (1984) with 317 languages. In Maddieson and Precoda (1989), it was ex-panded to 451 inventories. The database attempts to be genetically balanced, to representan even spread of the world's languages. UPSID has a a very simple web interface at:http://web.phonetik.uni-frankfurt.de/upsid.html.P-base was created as part of Je Mielke's dissertation work (Mielke 2008). It containsthe inventories of 628 varieties of 548 spoken languages. The languages in the database arethose that Mielke could nd in grammars available at the Ohio State University and Michi-gan State University libraries (Library of Congress PA-PM). In addition to the inventories,P-base also includes any information about the patterning of sounds that was availablein the grammars. P-base has a graphical user interface with functions for nding natural86classes, calculating feature economy, and comparing inventories. It can be downloaded athttp://pbase.phon.chass.ncsu.edu/WALS is the World Atlas of Language Structures (Dryer and Haspelmath 2013), andcontains information from more than 1,000 languages. WALS is not limited to phonologicalinventories, unlike the previous two resources, but rather contains information about nu-merous aspects of language, including morphological and syntactic information. 
It is alsonot a single database, rather it is a collection of individual chapters written by dierentauthors, and each chapter may sample a dierent set of languages. An interesting featureof WALS is the ability to display a map of the world, with individual languages taggedand colour-coded for particular features. The information available in WALS comes froma variety of sources, and each language has its sources listed. It is not possible to look ata specic phoneme inventory of a language in WALS. Instead, the information is packagedin a more coarse-grained way, by grouping languages into categories. For example, Feature6A in WALS (Maddieson 2013b) is titled uvular consonants which categorizes languagesinto four categories: those with uvular stops, those with uvular continuants, those withboth, and those with neither. This makes it more useful for broad, typological studies,and somewhat less useful for the study of individual languages. It is available online atwww.wals.info.Large databases like these are created from a diverse array of sources, and constructedwith dierent goals in mind, so it is inevitable that there will be disagreements. Oneexample of this is the way that Jacaltec (Mayan, Mexico) is described in UPSID and P-base. In UPSID the stop series for this languages is listed as three voiceless aspirated stops/ph, th, kh/, two ejective stops /t',k'/, and two implosives /b<, q</ (the < symbol marksimplosives in UPSID). In P-base, the same language is listed with three unaspirated stops/p, t, k/ and there is a set of voiced stops /b, d, g/ labeled marginal. No implosives arelisted at all. The uvular is included in the ejective series /t', k', q'/. The symbol /b'/also appears, which represents a glottalized /b/, but this is apparently distinct from theimplosive (which is not listed for Jacaltec, but does appear for other languages in P-base).The two databases even give similar sources: P-base cites (Day 1973), which appears nearlyidentical to the (Day 1972) reference given in UPSID. Perhaps P-base simply replaced theUPSID symbol < with the apostrophe, although UPSID additionally cites (Craig 1977),which could be another source of the disparity.In UPSID, the median inventory size is between 28 and 29, meaning that half of theinventories have 29 or more sounds, while the other half have 28 or fewer. A majority oflanguages (70%) have between 20 and 37 segments.The smallest known inventory is that of Central Rotokas. The inventory is described inFirchow and Firchow (1969), who divide it into two classes of sounds: voiceless and voiced.The voiceless category consists of three voiceless stops (labial, coronal, and velar). Thevoiced category consists of three sounds that Firchow and Firchow transcribe as a voicedbilabial fricative, a voiced coronal tap, and a voiced velar stop. The phonemic inventory isshown in Figure 4.3.87Figure 4.3: Phonemic inventory of Central Rotokas, based on Firchow and Firchow (1969)However, they note there is considerable amount of free variation in the realization ofthe voiced series. The bilabial is variously realized as a fricative, as a nasal, or even a fullplosive. The coronal may surface as as a nasal, a lateral, a tap, or a plosive. And the velarcan be a fricative, a nasal, or a plosive. The voiceless coronal has slightly more conditionedvariation, and appears as [s] or [ts] before [i] but [t] elsewhere. 
This means that voicing and place of articulation are contrastive in Central Rotokas, but manner of articulation is not.
More recent fieldwork by Robinson (2006) suggests that nasals actually are distinct in the Aita dialect of Rotokas. Robinson reports minimal pairs for all three places of articulation: buta 'time' vs. muta 'taste/feel', dito 'hole' vs. nito 'remove embers', and kati 'burn' vs. ŋati 'bend'. Other phonological alternations are similar between the dialects: the non-nasal voiced bilabial of Aita alternates between a stop and a fricative, the coronal alternates between a stop and a tap, and the voiceless coronal surfaces as [s] before [i].
Robinson posits that Proto-Rotokas must have had nasal consonants, and that they were lost in the Central dialect. He justifies this by it being the simpler analysis, suggesting a context-free historical change where all [+nasal] sounds in Proto-Rotokas became [−nasal] sounds in Central Rotokas. Robinson's account actually relies on [+nasal] sounds only becoming underlyingly [−nasal], since nasals still exist as surface variants of voiced sounds in Central Rotokas. What has changed is their contrastive status.
According to Maddieson (2013a), there are two major regions of the world where small inventories tend to predominate (where "small" means 6-14 consonants). One of these regions is the Pacific Islands, where most of the languages belong to the Oceanic branch of the Austronesian family. The other region is the northern part of South America, with languages in Ecuador, Colombia, Venezuela and parts of Brazil. Some of these languages belong to large genetic groupings, such as Arawakan or Carib, while others remain unclassified or are considered isolates.
Perhaps the best known of these South American languages is Pirahã (Everett 1986). Pirahã has 8 consonants, /p, t, k, ʔ, b, g, s, h/, which is 2 more than Central Rotokas. However, Pirahã has fewer vowels, so the overall phoneme count is the same as Rotokas.
Figure 4.4: The inventory of !Xóõ, based on Traill (1985)
Allophonic variation also differs between the languages, so they have different surface sounds as well. da Silva (2014) reports that the voiced stops in Pirahã have nasal variants, although curiously /g/ surfaces as [n], rather than the expected [ŋ]. The fricative /s/ may surface as [s] or [h], and before /i/ the sound /t/ becomes [tʃ].
At the other extreme end of inventory sizes are the Khoe-San languages, which tend to have extremely large consonant inventories. The !Xóõ language (Khoe-San; Botswana) is generally cited as having the largest consonant inventory, and the number given is typically more than 100 sounds, although authors disagree on exactly how many. P-base lists 117 consonants, of which 70 are clicks, based on Traill (1985). The non-click inventory also features an impressive number of coronal affricates and ejectives. On the other hand, UPSID lists only 94 consonants, of which 47 are clicks, based on Snyman (1969). Either way, the inventory is considerably larger than most others. !Xóõ also exhibits a remarkably rich vowel inventory. There are vowel contrasts based on length, tone, pharyngealization, nasalization, and combinations thereof (though not all combinations are found).
Outside of the Khoe-San group, WALS Chapter 1 (Maddieson 2013a) shows a few regions around the world that tend to have large inventories, where "large" is defined as having 34 or more consonants. The Caucasus is one such region.
It was here that thelargest inventory outside of click languages was found, belonging to Ubykh, a NorthwestCaucasian language that went extinct in the 1990s. According to the inventory in P-base,from Colarusso (1988), it had 81 consonants.Of this total, 76 were obstruents; only 7 were sonorants. Forty-four of the obstruentswere either fricatives or aricates, and the remainder of the inventory consisted of plo-sives. Like many other languages in this family, Ubykh had a very small phonemic vowelinventory, which P-base lists as consisting of only 2 vowels contrasting in height. Sec-89ondary articulations are heavily used in Ubykh, with virtually every obstruent occurring asa plain version in addition to palatalized, labialized, ejective, and pharyngealized versions,and sometimes combinations (e.g. there was a labialized ejective alveolar aricate, and apalatalized labialized uvular ejective).Another hotspot for large inventories, according to Maddieson (2013a), is the PacicNorthwest Coast of Canada and the United States. The larger language families found inthis region are the Salish, Wakashan, and Athabaskan languages. These languages oftenhave 40 or more consonants, and again there tends to be a greater use of secondary articu-lations, such as labialization, and glottalization (of both obstruents and sonorants) is alsoquite common.4.1.2 Population sizeWhy are there dierent sizes of inventories? What prevents languages from all having, say,25 consonants and 5 vowels each? This is due to the fact that inventories are unstable:languages gain and lose sounds over time, and each language follows its own unique path ofchanges, so it is very unlikely that all languages would end up with inventories of the samesize. The more interesting question is what factors aect inventory size in the rst place.Although one might expect phonology or phonetics to be relevant here, in recent years amore widely discussed factor was actually population size.This issue received a great deal of attention when Hay and Bauer (2007) published anunexpected correlation: they found that there was a positive statistical relationship betweeninventory size and the size of the population of speakers of the language. Their corpus foranalysis was the languages included in (Bauer 2007). Some of the languages were removedfrom consideration because it was not possible to get full vowel, consonant, and populationdata. Languages with no living speakers were also removed. Their nal corpus included 216languages. They additionally removed !Xo˜o´ and Acooli from certain analyses because ofthe size of their inventories (more than 4 standard deviations above the mean). They notethat the corpus is not a random set, and is necessarily biased toward languages that haveavailable data. This means a number of languages with very large populations (English,Hindi, Mandarin), but also many that are well-documented but with small populations(Diyari, Hixkaryana, Basque). The corpus is also not geographically balanced, and tendedtoward the Indo-European family and languages of the Pacic.Figure 4.5 from Hay and Bauer (2007, p.13) shows the relationship between populationsize and inventory size in their corpus. They measured total inventory size (bottom rightof the gure) as well as the relationship between population and sub-inventories, such assonorants. In all cases, they found a signicant correlation. 
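At its core, Hay and Bauer's test amounts to correlating consonant inventory size with (log) speaker population. The sketch below illustrates that kind of test; the population and inventory figures are hypothetical stand-ins rather than Hay and Bauer's data, and the Pearson coefficient is computed directly from its definition rather than with a statistics package.

import math

# Hypothetical (population, consonant count) pairs standing in for corpus data.
data = [(300, 12), (4000, 15), (50000, 18), (900000, 22), (20000000, 24), (300000000, 27)]

def pearson_r(xs, ys):
    # Pearson correlation coefficient, computed from its definition.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

log_pops = [math.log10(pop) for pop, _ in data]
consonant_counts = [count for _, count in data]
print(pearson_r(log_pops, consonant_counts))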
Vowel inventories showed similar correlations between inventory size and population size, although Hay and Bauer report that vowel and consonant inventory sizes have no correlation with each other in this set (see also Maddieson (2007)). The authors additionally investigated the relationship between average inventory size and average population size for each language family, and again found a significant positive correlation. Figure 4.6, from Hay and Bauer (2007, p. 16), shows this relationship.
Figure 4.5: Correlations between speaker population size (individual languages) and inventory size, from Hay and Bauer (2007)
Figure 4.6: Correlations between speaker population size (language families) and inventory size, from Hay and Bauer (2007)
This correlation was later used in a highly controversial article by Atkinson (2011). He calculated that distance from Africa correlates with inventory size, such that languages spoken further away tend to have smaller inventories. He combines this with Hay and Bauer's correlation to support a particular model of human migration out of Africa with successive founder populations: these populations are very small, which implies their languages would have small inventories. In his own words:
If phoneme distinctions are more likely to be lost in small founder populations, then a succession of founder events during range expansion should progressively reduce phonemic diversity with increasing distance from the point of origin, paralleling the serial founder effect observed in population genetics. (Atkinson 2011, p. 1)
This paper was heavily criticized, with responses largely focusing on the statistics. Cysouw et al. (2012) find several faults. Their main criticism is Atkinson's choice of data. He used the coarse-grained summary of the UPSID database which is available in the World Atlas of Language Structures. Cysouw et al. tried to replicate Atkinson's methods on the original UPSID data, and they report no significant correlation between population size and inventory size.
They also object to Atkinson's use of the Bayesian information criterion (BIC) for determining geographical origin. Atkinson decided to allow locations which were as many as 4 BIC units away from the optimal origin, which Cysouw et al. view as an arbitrary, unjustified choice. The authors also tried to use the BIC method on the original UPSID data, and this time the model suggested an origin in either Africa or the Caucasus.
Donohue and Nichols (2011) used a different sample of 1,350 languages, rather than replicating Atkinson's study. They found no significant correlation between population size and inventory size in their corpus. They did, however, find a small correlation when they looked at inventory size across geographical areas: inventories get somewhat smaller moving roughly west-to-east. Mean populations in the western regions (Africa, Europe) were larger and so were mean inventory sizes, whereas mean population sizes in the eastern regions (New Guinea, Oceania) were smaller and so were mean inventory sizes. However, when looking within a geographical area, or within language families, there was no significant trend. Relationships between population size and inventory size within several families are shown in Figure 4.7.
Donohue and Nichols (2011) also note that Atkinson's model predicts a monotonic decrease in phonological inventory size as distance from Africa increases. In fact, around the world there are local hotspots for phonological complexity, that is, languages with markedly more complex inventories than those nearby.
For instance, within North America, the languages of the Pacific Northwest Coast are a hotspot. These are unexplainable in Atkinson's model. Their conclusion is:
A positive correlation between population size and size of phoneme inventory is critical to Atkinson's argument, but such a correlation is not expected given current knowledge of socio-linguistics, typology, and historical linguistics, and it cannot be demonstrated cross linguistically. (Donohue and Nichols 2011, p. 169)
Figure 4.7: Relationship between population size (log scale) and inventory size for several language families, from Donohue and Nichols (2011)
On the other hand, Wichmann, Rama and Holman (2011) did find support for the correlation between population and inventory size. They used an extremely large corpus of 3,153 languages taken from the ASJP Database (Wichmann, Muller, Velupillai, Brown, Holman, Brown, Sauppe, Belyaev, Urban, Molochieva, Wett, Bakker, List, Egorov, Mailhammer, Beck and Geyer (2011)). The data consist of 40 concepts, and the corresponding words, in different languages. The 40 selected meanings come from the Swadesh list (Swadesh 1952), and most of the languages in their corpus had a word for 28 or more of these meanings. The database therefore does not provide direct access to phoneme inventories, only small lexicons. To estimate the inventory of a language, the authors took the available word list for a language and collected each symbol that appears in at least one word. In cases where more than one word was available for a given meaning, the authors took whichever word happened to be listed first in the database.
This method of counting leads to inaccuracies, and makes it unclear if this is a count of phonetic or phonological inventories. Wichmann, Rama and Holman nonetheless find that their numbers are comparable to UPSID, for those languages that appear in both databases: "word lists are approximately proportional to segment inventory sizes to a degree where it is meaningful to use SRs [Segments Represented in the word list] as proxies for segment inventory sizes when it comes to investigating correlations with other features, such as word length, populations size, and geographical distances." Figure 4.8 shows the plotted correlations between population size and inventory size found in ASJP. Wichmann, Rama and Holman (p. 20) conclude that "we are able to confirm that...larger populations are associated with larger phoneme inventories, ... and that, finally, phoneme inventories diminish with distance from Africa."
Figure 4.8: Population size and inventory size, from Wichmann, Rama and Holman (2011)
Moran et al. (2012) attempted to replicate Hay and Bauer (2007), this time using a larger set of 969 languages from the PHOIBLE database (Moran et al. (2014)). They too found a positive correlation, but an extremely weak one: the model predicts an increase of between 1.02 and 1.04 phonemes per tenfold increase in population size. Since their corpus includes languages with speaker populations ranging from 1 person to 840 million people, there is a predicted difference of only nine phonemes between the smallest and largest populations in the sample. This is shown in Figure 4.9 from Moran et al. (2012, p. 18).
Moran et al. conclude that the correlation, while present, is not big enough to be important.
We believe that the magnitude of the relationship is not substantial enough to be of interest when viewed in light of the variation within and across genealogical groupings.
Thus we nd no compelling reason to consider population size as apotential causal factor in the development of phonological systems, and thus no95Figure 4.9: Predicted magnitude of the eect of population size on inventory size, fromMoran et al. (2012, p. 18)reason to postulate explanations, mechanisms, or reasons why such a patternexists. Moran et al. (2012, p. 896)Although I agree with Moran et al., the fact remains that languages do dier in inventorysize and no alternative explanations have been oered. The conclusion seems to havebeen that whatever determines inventory size, population size is not it. Is inventory sizesomething that randomly uctuates over time? Or could there be some factors that directlyinuence size?Several years before Hay and Bauer (2007), the relationship between population andinventory size had already been discussed by Trudgill (2004), although his work is more ofan argument for how such a relationship could develop, and does not include any statisticalmodel. In particular, Trudgill's paper looks at how contact between populations could bea factor that aects inventory size.In the case of long-term contact, Trudgill proposes that whether inventories grow orshrink depends on the type of contact. He cites Nichols (1992) who argues that in situationsof lengthy, stable contact between two languages, there are many opportunities for wordscontaining non-native sounds to be borrowed, leading to growth in inventory size. Forexample, some Bantu languages acquired clicks through borrowing from neighbouring Khoe-San (Bostoen and Sands 2012).A similar argument was oered by Haudricourt (1961), who suggested that bilingualismplays a role in inventory size. In particular in situations of egalitarian bilingualism, there96is a chance for inventories to grow larger as speakers borrow from each other's languages.On the other hand, Trudgill pointed to the simplications that tend to occur in theprocess of pidginization/creolization. A reduction in inventory size could be viewed as a kindof simplication, and thus a possible outcome of language contact. In other words, contactbetween languages could lead to growth due to borrowing, or loss due to pidginization.The notion that pidgnized/creolized languages should have smaller inventories is whatKlein (2006) calls the creole simplicity hypothesis. He tests this against a corpus of 23creoles, and nds that it actually does not hold true of phonological inventory size. Thesmallest inventory in this sample was Ndyuka (Huttar and Huttar 1994), with an inventoryof 19 sounds. Angolar (Lorenzino 1998) has the largest inventory, with 37 sounds. In otherwords, Klein's corpus of inventories has about the same range of sounds as most of thelanguages in UPSID. On the other hand, Klein reports that creoles tend to use a narrowerrange of stop contrasts than non-creoles.Trudgill also explored the opposite case, that is, the development of languages in isola-tion. His main focus was the Polynesian languages, which exhibit an interesting distributionof inventory sizes: the size of the inventories shrinks as one goes from west to east across theexpanse where these languages are spoken. Inventories are somewhat larger toward Asiaand get smaller toward Hawaii. Trudgill poses the following question: is there any connec-tion between this geographical penetration deeper and deeper into the formerly uninhabitedPacic, and the loss of consonants? (Trudgill 2004, p. 322).The emphasis here is on uninhabited. 
The languages of the islands of the Pacic devel-oped in relative isolation. They were geographically remote, and had contact mainly withspeakers of closely related Polynesian languages. Trudgill proposes the following possiblelink between isolation and small consonant inventories: isolated communities are startedby a very small number of people, which leads to tighter social networks, which means thatspeakers can assume more shared common ground between each other, which would eventu-ally lead to a situation in which communication with a relatively low level of phonologicalredundancy would have been relatively tolerable. (Trudgill 2004, p. 315).Community structure is the key part of Trudgill's argument that makes it dierentfrom Hay and Bauer (2007), or Atkinson (2011). It is not just about the number of peoplespeaking a language, it is about the network of speakers in a community. The way thecommunity network is organized depends to some degree on the size of the community, andthis is why there could be eects of population size on inventory size. In a later paper,Trudgill (2011) makes it very clear he believes that ve social factors could be expected,in combination, to have various kinds of inuence on phoneme inventory size; it will never,I suggest, be sucient to look at population gures alone.Some of these arguments may be plausible, and Trudgill is probably correct that com-munity structure has a non-trivial eect on language. However, the size of an inventoryis strongly inuenced by sound change, something left unmentioned in these accounts. Inorder for population size and inventory size to be tied to each other, sound changes wouldhave to keep up with the change in population size, e.g. more splits as the population97grows and more mergers/deletions as the population falls.Moreover, it is not obvious how population size could aect any of the actual mechanismsthat can lead to sound change. Consider a specic change: nal devoicing of obstruents.This is due to articulatory factors that make it dicult to maintain voicing in nal position,which leads to the production of obstruents which learners perceive to be voiceless, resultingin a sound change (e.g. Blevins (2006b)). In what way could the population size aect theprobability of this occurring?Drawing on the previous discussion, one might attempt to argue as follows: Final de-voicing is probabilistic and does not occur in every utterance. The size of a populationaects how many people a learner interacts with, which aects how many voiced tokens orvoiceless tokens that learner experiences. With a larger population size, there should be agreater probability of a sound change occurring.The problem with such an argument is that the probability of a learner hearing adevoiced obstruent does not crucially depend on how many people the learner interactswith. It depends on how many words in the lexicon contain a voiced obstruent in word-nalposition and the relative frequency of such words. These are features of the language thathave no connection to population size. Granted, some words are more frequently used incertain communities, so learners in dierent communities may experience dierent frequencydistributions of sounds in their input. However, words are arbitrary sound strings, so neitherthe size of population, nor the structure of the community, could have any inuence of thefrequency of voiced obstruents or their distribution in the lexicon. 
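This point can be illustrated with a toy simulation (a sketch only, not PyILM code; the lexicon, frequencies, and devoicing rate below are invented for illustration). The share of devoiced finals in a learner's input is fixed by the lexical frequencies of voiced-final words and by the devoicing rate, and it stays essentially the same no matter how many speakers contribute tokens.

import random

random.seed(1)

# Toy lexicon: (word, relative token frequency, ends in a voiced obstruent?)
LEXICON = [("bad", 50, True), ("map", 30, False), ("dog", 15, True), ("kit", 5, False)]
DEVOICING_RATE = 0.3   # chance that a final voiced obstruent is produced without voicing

def devoiced_share_heard(n_speakers, tokens_per_speaker=2000):
    # Simulate one learner's input and return the share of tokens with a devoiced final.
    words = [w for w, _, _ in LEXICON]
    weights = [f for _, f, _ in LEXICON]
    voiced_final = {w: v for w, _, v in LEXICON}
    devoiced = total = 0
    for _ in range(n_speakers * tokens_per_speaker):
        w = random.choices(words, weights=weights)[0]
        total += 1
        if voiced_final[w] and random.random() < DEVOICING_RATE:
            devoiced += 1
    return devoiced / total

for n in (1, 10, 100):
    print(n, "speakers:", round(devoiced_share_heard(n), 3))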
It is essentially up tochance which words happen to be most frequent in a given language and a given community,at a particular point in time.Even if population size did directly inuence the proportion of devoiced tokens that alearner experiences, this still would not provide a convincing link between population andinventory size, because the devoicing is not guaranteed to either increase or decrease thesize of an inventory. Suppose an inventory only has /b/. Final devoicing of /b/ to [p] couldlead to the introduction of a new phoneme /p/ into the inventory, increasing the size. Ifthe inventory already had both /b/ and /p/, then the distinction becomes neutralized innal position, but there would be no change to overall inventory size. If for some reason/b/ only occurs in nal position, then /b/ could merge with /p/, and inventory size drops.The same holds true for any other arbitrary sound change. Assuming the sound changeis largely, or only, aected by misperception, I can see no reasonable connection betweenpopulation size, community structure, and the probability that a sound change increasesor decreases the cardinality of the inventory.4.1.3 Hypothesis #1: Phonotactics and inventory sizeThere is another rather dierent correlation that I think can help tie together sound changeand inventory size. Maddieson (2007) divided the languages of UPSID into three categoriesbased on their phonotactics: simple (V or CV syllable shapes only), moderate (CV, VC,98CVC, and also CCVC but only if C2 is a glide) or complex (anything else). Comparingthe groups, Maddieson found a correlation between phonotactic complexity and inventorysize: languages with simpler phonotactics tend to have smaller inventories. Languages withsimple phonotactics had an average of 17.66 consonants, those with moderate phonotacticshad an average of 21.3 consonants, and languages in the complex group had an average of25.8 consonants in their inventories.This is easily seen at the extreme ends of inventory size: many of the smallest invento-ries belong to Polynesian languages, which have very simple phonotactics, while languagesof the Caucasus, which have some of the largest inventories, also have much more com-plex phonotactics. (The Khoe-San languages stand out as an anomaly here. They haveextremely large consonant inventories, but tend to have simple syllable structure.)From the perspective of sound change through transmission error, this correlation makessense. As discussed in the rst chapter, sound changes occur when learners misperceive ormisanalyze some part of the signal as something other than what the speaker intended (e.g.Ohala (1981), Blevins (2004)). For instance, a voiced stop produced in nal position maylack some cues for voicing, leading a listener to misinterpret it as voiceless. Misperceptionssuch as this are typically context-sensitive. Simpler phonotactics means a more highlyrestricted set of contexts, which in turn means a smaller number of sound changes arelikely, or even possible. More complex phonotactics means more sounds come into contactwith each other, and there are more opportunities for more, and more dierent, kinds ofchanges.For example when a language has CV as its maximum syllable, consonants can onlyappear in two kinds of environments: utterance initial, or intervocalic. Compare this to alanguage that permits up to CCVCC, so that consonants can appear initially, nally, andwith either vowels or consonants on either side. 
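The difference in available contexts can be made concrete with a small sketch. The function below is a simplified illustration (segments are treated only as C or V, and only one- and two-syllable words are considered); it collects the left and right neighbours a consonant can have under a given set of syllable shapes.

from itertools import product

def consonant_contexts(syllable_shapes):
    # Collect (left, right) environments for C, with '#' marking a word edge.
    contexts = set()
    words = list(syllable_shapes) + ["".join(p) for p in product(syllable_shapes, repeat=2)]
    for word in words:
        padded = "#" + word + "#"
        for i, seg in enumerate(padded):
            if seg == "C":
                contexts.add((padded[i - 1], padded[i + 1]))
    return contexts

simple = consonant_contexts(["CV"])
complex_syllables = consonant_contexts(["CV", "CVC", "CCVCC"])
print(sorted(simple))            # only ('#', 'V') and ('V', 'V')
print(len(complex_syllables))    # several more environments

The CV language yields just the two environments mentioned above, utterance-initial and intervocalic, while the CCVCC language also yields pre-consonantal, post-consonantal, and word-final positions.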
There will be contextual effects on the articulation and perception of consonants in the CCVCC language that will never apply in the CV language. To take the example of final devoicing again, this is a process that simply cannot happen in a language that has no codas. This means that voiceless consonants cannot be created in a CV language through final devoicing, while this can occur in a CVC language.
The more contexts, therefore, the more diverse the possibilities for sound change, and in the long run languages with more permissive phonotactics should develop larger inventories than those with more restricted phonotactics. This is stated below as the first hypothesis of this chapter, which will be tested by computer simulation in Chapter 5.
Hypothesis #1
Inventory size is tied to phonotactic complexity, since sound change is partly context-sensitive, and phonotactics defines the set of possible contexts in a language. Languages with more permissive phonotactics should tend to eventually develop larger inventories than those with more restrictive phonotactics.
4.2 Inventory contents
4.2.1 Overview
Inventories differ not only in how many segments they have, but of course also in terms of which segments. Sound inventories are extremely diverse, and there is no universal consonant that appears in all languages. Nasals are the most common type of sound. In UPSID (Maddieson and Precoda 1989) /n/ occurs in more inventories than any other consonant, and in P-base (Mielke 2008) the most frequently occurring consonant is /m/. These nasals each appear in over 90% of languages in their respective databases. It is also interesting to note that no vowel is universal either. The most common one is a high front unrounded vowel /i/, appearing in more than 90% of the languages in both UPSID and P-base. The image in Figure 4.10 shows the pulmonic consonant portion of the International Phonetic Alphabet chart, with cell sizes warped by the relative frequency of the consonants in P-base.[1] Note that consonants are not very evenly distributed. There are a few categories which have very large cells, but there are many more which are almost too small to see.
Figure 4.10: IPA chart warped to show consonant frequency in P-base (Mielke 2008)
[1] Credit goes to Jeff Mielke, the creator of P-base. This image can be generated using a visualization tool available on his website at http://pbase.phon.chass.ncsu.edu/
Finding absolute universal properties of consonant inventories has proven difficult. Hyman (2008) proposed only four:
1. Every phonological system has stops
2. Every phonological system contrasts stops with non-stops
3. Every phonological system uses place of articulation contrastively
4. Every phonological system has coronal phonemes
Shortly after Hyman's article appeared, Blevins (2009) published a reply article suggesting that #4 may not actually be a universal. She argues that Northwest Mekeo, an Oceanic language, has no coronal phonemes. Blevins does report that surface coronals appear, however. The velar stops become palatalized coronal affricates before [i], and the velar nasal has a surface variant [n]. Coronals therefore are articulated by speakers of Mekeo, but the coronal place of articulation is not used contrastively in the language.
However,Blevins also notes that there are borrowings that include the lateral [l], and she does notargue that this [l] is allophonic of anything, so perhaps coronal phonemes do exist in thislanguage, even if only marginally.Given the range of inventory sizes, as discussed in section 4.1, and the lack of universalconsonants, it is unsurprising that there exist virtually no cases of languages with identicalphoneme inventories (and it is probably impossible to nd any with the same phoneticinventory). P-base contains a few identical inventories, but in some cases these belongto dialects of the same language: Kirzan Armenian and Standard Eastern Armenian arelisted as having the same inventory, as are several varieties of Irish English. There wereonly four pairs of languages in P-base with identical inventories that were not listed asdialects, and three of them are Australian languages: Arabana and Wangkangurru, Garawaand Ganggulida, Wambaya and Nyulnyul. The only inventories found outside of Australiathat were identical belong to two Indo-Aryan languages, Bagri and Marwari (Shekhawatidialect).More matches can be found by narrowing the search so as to look for only identi-cal consonant inventories, and excluding vowels. Bagri and Marawari are still the onlyIndo-European languages that turn up. Hiligaynon and Balangao, Austronesian languagesspoken in the Philippines, also have identical consonant inventories. There was only a singlepair of Afro-Asiatic languages with the same consonants: Hurza and Muyang, both Chadiclanguages of Cameroon. There was only a single pair of languages in the Americas withmatching consonants: the Uto-Aztecan languages Comanche and Shoshoni. Finally, therewere four sets of Australian languages with identical inventories: (1) Ganggulida, Garawa,(2) Biri, Ngiyambaa, (3) Gunin, Wambaya, Nyulnyul, and (4) Arabana, Martuthunira,Muruwari, Wangkangurru, Yinggarda. It is worth noting that in every case, the languagesare spoken in areas nearby one another, and they share linguistic ancestors. There are nocompletely unrelated languages in P-Base with identical inventories.While it may be rare to nd entirely identical inventories, there is another, similar,relationship that is more common: some inventories are supersets of others. In other words,there are many pairs of languages A and B, where A has all of the phonemes that B has,and then some. The scatter plot in Figure 4.11 shows the number of superset inventoriesfor each inventory in P-base.101Figure 4.11: Consonant inventory size and number of superset inventories in P-baseThere is a very clear trend: small inventories have many supersets, large inventorieshave none. From a numerical perspective, this makes sense. The bigger an inventory gets,the smaller the chances that there will be another, yet larger, inventory that can act as asuperset. There is a drop o when inventories reach 20 consonants. After this, inventoriesrarely have any supersets at all. This suggests that as inventories become larger, there isan increasing amount of diversication that takes place. Put otherwise, large inventoriestend to have all the sounds also found in small inventories, and then some more rare ones.I searched P-base for the frequency of each consonant symbol, and found that there are299 unique consonants in the database, that is, consonants that appear in only a singlelanguage (and some languages have more than one unique consonant). 
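Both kinds of counts are easy to compute once each inventory is represented as a set of symbols. The sketch below uses three invented toy inventories in place of the P-base data.

from collections import Counter

# Toy inventories standing in for database entries (sets of consonant symbols).
INVENTORIES = {
    "Lg1": {"p", "t", "k", "m", "n"},
    "Lg2": {"p", "t", "k", "m", "n", "s", "w", "j"},
    "Lg3": {"p", "t", "k", "b", "d", "g", "m", "n", "s", "ʘ"},
}

def superset_counts(inventories):
    # For each language, count the other inventories that properly contain it.
    return {
        lang: sum(1 for other, other_inv in inventories.items()
                  if other != lang and inv < other_inv)
        for lang, inv in inventories.items()
    }

def unique_consonants(inventories):
    # Consonants that occur in exactly one inventory in the sample.
    freq = Counter(seg for inv in inventories.values() for seg in inv)
    return {seg for seg, count in freq.items() if count == 1}

print(superset_counts(INVENTORIES))    # the small Lg1 has supersets; the larger ones do not
print(unique_consonants(INVENTORIES))  # includes the click found only in Lg3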
Figure 4.12 showsa plot of inventory size against the number of unique consonants, and the expected trendappears: larger inventories tend to have more unique consonants. Virtually all inventoriesabove 40 have at least one unique sound.The unique consonants are ones that would be regarded as complex, such as /ndB/ or/XQw/. On the other hand, P-base has 22 highly frequent consonants that are found in 200or more languages each: /p, b, t, d, k, g, P, f, v, s, z, S, tS, h, m, n, ñ, N, l, r, j, w/, all ofwhich would be considered simple.It is important to point out that the superset relations are at least in part due tothe fact that P-base contains related languages. If languages share a common ancestor,then it becomes more likely that they will have phonemes in common. For example, theinventory of Comanche is a subset of the inventory of Western Shoshoni, and both languagesare listed in the Central Numic group in P-base. Languages may also have phonemes incommon if they are geographically close to each other. However, there are many examplesof languages that are in a subset/superset relation, and which are neither genetically norgeographically close to one another. The inventory of Blackfoot (Algic, western Canada) is asubset of Tiv (Niger-Congo, Cameroon), Ainu (isolate, Japan), and Javanese (Austronesian,102Figure 4.12: Consonant inventory size in P-base and number of unique consonantsIndonesia) among others. Ideally, comparisons of inventories would be done using a corpusof languages, with a balanced number of language families and locations, but this is adicult task to accomplish, given the limited phonological data that is available.A similar superset relationship has been found in the languages of UPSID, in a studyby Lindblom and Maddieson (1988) (see also Maddieson (2011)). They grouped all of theconsonants in UPSID into three sets, which were supposed to represent increasing consonantcomplexity. Set 1 they refer to as containing basic articulations, Set 2 contains elaboratedarticulations and Set 3 contains combinations of elaborated articulations. They begin bydening Set 2 consonants, then Set 3, leaving anything unspecied in Set 1.Set 2 contains voiced obstruents and voiceless sonorants, which Lindblom and Mad-dieson say are departures from the default mode of phonation. Pre-nasalization, nasalrelease, and pre- and post-aspiration are also included in Set 2. Also included are ejec-tives, clicks, and implosives. Five places of articulation also belong in Set 2: labiodental,palato-alveolar, retroex, uvular, and pharyngeal. Finally, palatalization, labialization, andpharyngealization are included in Set 2.A sound can only belong to a single one of these categories to remain in Set 2. If itcombines two or more, then it goes into Set 3. For instance, /d/ would be in Set 2, as avoiced obstruent, but /dj/ would be in Set 3 because it additionally has palatalization. Theconsonant /k'/ belongs in Set 2, but /k'w/ is in Set 3 because it is an ejective and it haslabialization. Set 1 consists of the leftovers, any sounds not grouped into Set 2 or Set 3.Lindblom and Maddieson specically list the following as Set 1 consonants in UPSID: /p,t, k, P, s, h, m, n, N, l, r, w, j/.Lindblom and Maddieson found that all languages use at least some of the consonantsin Set 1. Interestingly, they also found that there is a correlation between inventory size103and the probability of having consonants from one of the three sets. 
Small inventories tend to be made up primarily of Set 1 consonants, and as inventories get bigger more Set 2 and Set 3 consonants are found.
I attempted to replicate Lindblom and Maddieson's study using the inventories of P-base. It was somewhat challenging to decide which sounds from P-base should go into which Set. Lindblom and Maddieson described sounds in terms of IPA-style articulatory features. P-base defines all of its segments based on phonological features, and these do not all have a nice fit to articulatory descriptions. In the end, categorization was achieved through a combined search of both phonological features and the descriptive names of the Unicode strings. Voiceless nasals, for example, are easily found with the feature set [+nasal, −voc, +son, −voice]. On the other hand, it turned out to be easier to find consonants with pharyngealization by looking for the existence of the name MODIFIER LETTER SMALL REVERSED GLOTTAL STOP in a Unicode character.
A simple Python script was created with a list of phonological feature values and Unicode names that should be flagged as belonging to the elaborated set (Set 2), based on the descriptions of Lindblom and Maddieson. For each segment in P-base, a score was assigned for the number of flagged descriptions. If a segment only matched one flag, then it was assigned to Set 2. If it matched more than one flag, it was assigned to Set 3. If it matched none of the flags, it was assigned to Set 1. (A schematic version of such a script is sketched below.)
This actually yielded a very different collection for Set 1 than what was reported by Lindblom and Maddieson. For instance, doubly-articulated voiceless stops are not mentioned at all by Lindblom and Maddieson, so they by default end up in Set 1, since they match no other descriptions. Only a single voiceless affricate was in Set 1 in UPSID, but there are more than a dozen in P-base. Interdental fricatives are also not mentioned at all by Lindblom and Maddieson. I flagged only the voiced ones, which matches the spirit of their proposal that voiced fricatives be considered Set 2 while voiceless fricatives belong in Set 1. Geminate consonants were not mentioned by Lindblom and Maddieson either, but I decided to flag them as well.
Despite some of these differences, the results for P-base are very similar to what Lindblom and Maddieson found for UPSID. Figure 4.13 shows a plot of inventory size against the number of segments in each complexity class. For inventories larger than about 40 segments, the results are unsurprising. The cardinality of Set 1 is less than that of Set 2 or Set 3. As inventory size grows, it is only natural that the number of Set 1 consonants will be less than the number of Set 2 or 3 consonants. The interesting part of this figure is the lower left quadrant, with the smaller inventories. It is entirely plausible for these small inventories to consist mostly of Set 3 consonants, since this set has the largest number of members to draw from, but instead we find that Set 1 consonants are heavily represented.
Figure 4.13: Segment complexity plotted against inventory size for the inventories of P-base
There are two parts to Lindblom and Maddieson's correlation: the frequency of sounds, and their articulatory complexity. I only want to focus on the former. In any case, it is questionable whether a coarse grouping into three sets is the best way to classify sounds by complexity. As I found in trying to replicate the analysis, there are many sounds which Lindblom and Maddieson did not consider, and depending on how these other sounds get sorted the correlation might come out differently.
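A schematic version of such a script is given below. The flag lists here are illustrative stand-ins only (the dissertation's script used a fuller set based on Lindblom and Maddieson's descriptions), and the Unicode-name check uses the standard unicodedata module.

import unicodedata

# Illustrative flags only; the actual script used a fuller list based on
# Lindblom and Maddieson's descriptions of elaborated articulations.
FEATURE_FLAGS = [
    {"+voice", "-son"},     # voiced obstruents
    {"-voice", "+son"},     # voiceless sonorants
]
NAME_FLAGS = [
    "MODIFIER LETTER SMALL W",                      # labialization
    "MODIFIER LETTER SMALL J",                      # palatalization
    "MODIFIER LETTER SMALL REVERSED GLOTTAL STOP",  # pharyngealization
]

def complexity_set(symbol, features):
    # Count matched flags: 0 -> Set 1, 1 -> Set 2, 2 or more -> Set 3.
    hits = sum(1 for flag in FEATURE_FLAGS if flag <= features)
    names = [unicodedata.name(ch, "") for ch in symbol]
    hits += sum(1 for flag in NAME_FLAGS if any(flag in name for name in names))
    if hits == 0:
        return 1
    return 2 if hits == 1 else 3

print(complexity_set("p", {"-voice", "-son"}))    # 1: no flags matched
print(complexity_set("d", {"+voice", "-son"}))    # 2: voiced obstruent
print(complexity_set("dʲ", {"+voice", "-son"}))   # 3: voiced obstruent and palatalized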
Therefore, I do not think we can concludemuch about the relative frequency of simple and complex sounds.However, ignoring complexity, we are still left with the kind of superset relation that Ifound in P-base: there exists a small set of sounds which are common to inventories of allsizes, and inventories become increasingly diverse as they get larger.Lindblom and Maddieson hypothesize that the superset relations are due to the waythat inventory growth happens: inventories initially saturate the smaller space of easy-to-articulate sounds, and then move into more complex territory. This would explain why allinventories seem to make use of simple sounds from Set 1, and why only larger inventoriestend to have Set 2 and Set 3 sounds. The authors use a metaphor of a rubber band anda magnet. Imagine a space of all sounds that could possibly be articulated. Somewhere inthere is a small subspace of neutral sounds (i.e. those in Set 1). As consonant inventoriesgrow, they rst use up this space, then begin to expand beyond it into more complex sounds(Set 2 and Set 3). A metaphorical rubber band pulls the sounds back towards the simplerarticulations, while a metaphorical magnet pushes sounds apart from each other.They note that one way to refute this would be to look for inventories that reversethis pattern, that is, small inventories with mainly Set 3 consonants, or large inventorieswith mainly Set 1 consonants. I conducted a search of P-base for counter-examples, whichturned up only a handful of languages.105Figure 4.14: Consonant inventories from P-base with reversed segment complexityThe smallest inventory with mainly Set 2 and Set 3 consonants belongs to Sinaugoro(Austronesian, Papua New Guinea). This language has 10 consonants from Sets 2/3 andonly 6 from Set 1, and the complex sounds are all voiced obstruents (some of them are alsolabialized). Other small inventories included Central Ojibwe (Algonquian, Canada), Dagur(Altaic, China), and Pileni (Austronesian, Solomon Islands). Ojibwe has 10 consonantsfrom Sets 2/3 and only 5 from Set 1, and the complex sounds are all geminates. Dagur has11 from Sets 2/3 and only 5 from Set 1, with all the complex sounds coming from voicedobstruents, or aspirated voiceless stops. Finally, Pileni has 11 from Sets 2/3 and 5 from Set1, and its complex sounds include aspirated stops, aspirated nasals, and voiced obstruents.The full consonant inventories of these languages are show in Figure 4.14.The other reverse pattern would be inventories that are large and have mainly Set 1sounds. This is dicult to nd, if only because of the limited number of Set 1 consonants.The closest inventory is that of Shilluk (Nilo-Sarahan, Sudan). Shilluk has quite a largeinventory of 31 consonants, 11 from Set 1 and 10 each from Set 2 and 3, so it has more Set1 consonants than either Set 2 or Set 3, but not combined. It has some aspirated stops,voiced stops, and geminate sonorants. No elaborated places of articulation are used andthere are no implosives or ejectives.Another, similar, statistical analysis of UPSID was undertaken by Choudhury et al.(2006). They constructed a bi-partite graph where nodes represented either languages orconsonants. A language node was linked by an edge to a consonant node if that consonantwas found in that language's inventory. Choudhury et al. then looked at the so-calleddegree distribution. 
The degree of a node is simply the number of edges connected to it. They found that the frequency of consonants follows a power law, where a small number of consonants are extremely frequent. They hypothesize a principle of attachment, where consonant inventories grow by first selecting from the set of most common consonants before selecting from the less common ones.
4.2.2 Hypothesis #2 - Common consonants
If taken literally, Lindblom and Maddieson's claim about inventory growth is questionable. They provide no historical evidence that languages actually do grow by expanding from a basic set, and no specific examples of sound change mechanisms were offered to substantiate the rubber-band and magnet metaphor. Choudhury et al.'s principle of attachment fares no better, since it relies on the same unfounded assumption that languages first grow from a specific set of sounds, before diversifying.
I propose that the force drawing languages toward a common set of consonants comes from context-free sound changes, rather than context-sensitive ones. Smaller inventories tend to have simpler phonotactics (by Hypothesis #1), which means fewer overall context-sensitive changes can take place, limiting the potential sounds that could join the inventory, and giving context-free change a greater weight. Larger inventories tend to have more complex phonotactics, which means that they could potentially be affected by a wider variety of context-sensitive changes, in addition to the context-free ones. This leads to a situation where both small and large inventories have sounds in common (namely the ones that are the result of context-free changes), while larger inventories can also develop rare or unique sounds from context-sensitive changes.
Hypothesis #2
Common sounds exist because of context-free biases in transmission that affect all languages, regardless of phonotactics. Rarer segments are rare because they require more specific phonetic environments to appear, and these are more likely to exist in larger inventories, because larger inventories have more, and more different, phonetic contexts (by Hypothesis #1).
4.3 Inventory organization
4.3.1 Overview
Even though consonant inventories differ quite considerably from language to language, they are not random collections of sounds. As even casual observation of sound inventories will show, they tend to line up along different feature dimensions, rather than being spread randomly about. For instance, consider the inventories of Noon (Niger-Congo, Soukka 2000) and Tamazight (Afro-Asiatic, Abdel-Massih 1971), shown in Figure 4.15.
Noon's stop system is nearly a perfect square. There are stops at four places of articulation: labial, alveolar, palatal, and velar (plus the lone glottal stop). At each place, stops can be voiceless, voiced, pre-nasalized, nasal, or implosive. Only the velar implosive is missing from the language. This language is very stop-heavy, having only a few fricatives. The inventory of Ait Ayache Tamazight makes significant re-use of a small number of features. Nearly all consonants can be either short or long. Velar and uvular stops have labialized versions. Pharyngealization is contrastive on coronal consonants, as is length. There are still a few gaps in this system: there is no long /z/, the pharyngeal fricative has no length contrast, and the uvular fricatives have no labialized version.
Figure 4.15: Consonant inventories of Noon and Tamazight
In comparison, consider the randomly generated inventory in Figure 4.16.
In this, sounds arespread more widely around the chart, and there is less re-use of a given feature. Certainlythere is some re-use that occurs: there happen to be quite a few labial sounds, and the velarsare well populated too. There are also cases of poor feature re-use, like the three voicedalveolar stops, which are the only instances of labialization, palatalization, or implosives inthe entire inventory. There are also very few pairs of sounds that dier by a single feature,which is common in natural languages. Most pairs of sounds dier by more than one. Thisinventory also looks unnatural because it is small, yet has numerous complex or rare sounds,and lacks many of the simple common sounds.Figure 4.16: Randomly generated consonant inventory108This observation about the re-use of features was formalized into the concept of featureeconomy by Clements (2003). Clements was not the rst to discuss the idea, however.Ohala (1992) suggested that consonant inventories tend to obey a principle of maximalutilization of the available distinctive features. Martinet (1952) referred to this as thetheory of pattern attraction and, assuming functionalist principles, reasoned that havingfewer articulations (i.e. distinctive features) made sounds more distinct, which made lan-guage easier to perceive and possible to produce. Clements (2003, p. 292) himself citeseven earlier work by de Groot (1931, p. 121), discussing sound change: those phonemesthat appear are those which have only phoneme marks already guring in the system.The concept also relates to the ndings of Lindblom and Maddieson (1988), discussed inthe previous section. They proposed that languages rst make considerable re-use of asmall set of articulations before growing, while a metaphorical rubber band eect pre-vents inventories from splitting too far apart. In more recent literature, two additionalmeasurements of economy were proposed by Hall (2007) and another by Mackie and Mielke(2011). Mackie and Mielke further showed that inventories of natural languages dier fromrandomly generated sets of segments, in terms of feature economy scores.4.3.2 Feature economyClements (2003) denes the term feature economy as the tendency for languages to maxi-mize the ratio of segments over features. He is careful to contrast this notion of economywith others. Feature economy in this sense is not the same thing as parsimony or simplicity,which would suggest that, all other things being equal, a smaller inventory, one without toomany parts, is better than a larger inventory. Feature economy says that a language willmaximize the segments it has, given a particular set of contrasting features. The absolutevalue of the size of the inventory is not what is important.This is an important distinction to make, since the actual size of the inventories of theworld's languages varies enormously, as discussed in a previous section. Feature economydoes not presuppose anything about inventory size, and a range of sizes is compatible withits predictions, whereas a simplicity-driven account of inventory structure is clearly at oddswith the data, which does not suggest any tendency for language to economize on the sheernumber of segments. 
It does, however, turn out that different measurements of economy are biased towards different sizes of inventories; this is discussed more in section 4.3.2.1.
Feature economy is a description of how sounds in a language relate to each other, and it should also be distinguished from economy in the representation of features, e.g. some kind of underspecification, which involves theoretical assumptions about the mental representation of features and lexical items (Steriade 1995, Lahiri and Reetz 2010).
Finally, feature economy should be contrasted with symmetry, although economical inventories do display a certain degree of symmetry. Clements illustrates this point with three examples of inventories, shown in Figure 4.17.
Figure 4.17: Three sound systems differing in symmetry and economy, from Clements (2003, p. 292)
Assume that these example systems can be analyzed with only two features, [voice] and [continuant]. Place features are also required, but they are ignored here because all three systems make the same place contrasts. System A is completely symmetrical, and would be considered perfectly economical. Recall that Clements defines feature economy as maximization of the segment to feature ratio. In System A, this ratio is at its maximum and every combination of [voice] and [continuant] is in use at every place of articulation. System B is also symmetrical, but is not as economical, since there are no [+continuant, +voice] segments. System C lacks symmetry. There is a gap in the [+continuant, −voice] series, and two gaps in the [+continuant, +voice] series. Nonetheless, System C has greater economy than System B, because System C contrasts 13 segments to System B's 12, both of them using the same number of features. Thus System C's segment/feature ratio is higher.
Clements (2003) presents two specific predictions of the feature economy hypothesis. The first prediction is what he calls mutual attraction: "a given speech sound will occur more frequently in systems in which all of its features are distinctively present in other sounds" (p. 296). The second is called avoidance of isolated sounds: "a given speech sound will have lower than expected frequency in systems in which one or more of its features are not distinctively present in other sounds" (p. 306).
To turn to specifics, consider a voiced labial sound, call it V, which would be represented as [+voice], [+continuant], and [labial] in Clements' feature system. According to the prediction of mutual attraction, V should be more common in inventories that also have another labial, another voiced sound, and another continuant. Clements clearly specifies that all three conditions would have to be satisfied for mutual attraction to be supported. It does not necessarily have to be the case, however, that there be three other sounds. Mutual attraction would be supported by inventories with V, P (a voiceless labial stop) and Z (a voiced coronal fricative).
To check this prediction, Clements analyzed the languages of UPSID. Here I summarize his findings for the co-occurrence of V and Z. The method he used is to compare the observed vs. expected co-occurrence frequencies of different segment pairs within inventories. In this case, that involved counting how many languages have a V but not a Z, how many have a Z but not a V, how many have both a Z and a V, and how many have neither segment. Table 4.1 comes from Clements (2003, p. 303) and shows the results for V and Z in UPSID.
Table 4.1: Co-occurrence of V and Z in UPSID (from Clements (2003, p. 303))
The numbers outside the parentheses in Table 4.1 are the actual occurrences of the segments. The numbers in parentheses are the expected frequencies, which are calculated on the assumption that the frequency of V or Z in a given cell of the table is proportional to its frequency in the sample as a whole. The expected frequency for a cell is calculated as the sum of the row multiplied by the sum of the column that the cell is in, divided by the total number of languages.
If the prediction of mutual attraction is correct, then the actual frequency of languages with both V and Z is going to be higher than the expected frequency, i.e. the ratio of observed/expected frequencies is going to be greater than 1. Similarly, if the prediction of avoidance of isolated sounds is correct, then the cells corresponding to languages with only V or only Z are going to have a ratio of observed/expected of less than 1. As Table 4.1 shows, the actual differences between observed and expected frequencies are in the direction of these predictions. Clements (2003, p. 304) reports that the differences are statistically significant, and highly so (χ2 = 119.203; p < 0.0001).
To illustrate the second prediction, avoidance of isolated sounds, consider an inventory with a voiceless labial stop P and a voiceless coronal stop T. Such an inventory can be contrasted with the feature [labial] alone. Adding a voiced labial stop B to this inventory requires adding a new feature, [voice], to contrast P and B. However, this is not as economical as having both B and a voiced coronal stop D. Avoidance of isolated sounds predicts that the observed/expected ratio for languages with both B and D will be greater than 1, while for languages with B but without D it will be less than 1.
In addition to these reported correlations, Clements' paper also proposes a way of calculating a feature economy score for an inventory. However, he calculated the economy of only three languages. Later work by Hall (2007) proposed two other metrics, but again only using the same three languages as Clements. Mackie and Mielke (2011) used these three metrics, plus one more, to calculate the economy of nearly 500 languages in P-base. These metrics are described in the next section.
4.3.2.1 Measuring feature economy
Clements (2003) offers a particular measurement of feature economy, which I will refer to as the Simple Ratio measurement. Using this measurement, the economy value E is calculated as:
E_SR = S / F    (4.1)
where S is the number of segments in the inventory, and F is the minimum number of features required to contrast them all. The term contrast in this case has a particular meaning. Normally, saying that two segments contrast with each other suggests that there is a minimal or near-minimal pair in the language based on these segments. For the purposes of evaluating feature economy, segments are said to be contrastive if they differ by at least one feature.
Clements did not offer any specific methods for finding this minimum set of features, although some algorithms have since been proposed. The Feature Economist algorithm in Mackie and Mielke (2011) works by doing a pairwise comparison of every segment in an inventory, checking for contrast. On the first pass, it checks using fully specified segments; then it discards increasingly larger sets of features until contrast becomes impossible. It is discussed in more detail later in this section.
The Successive Division Algorithm (Dresher 2003) is another way of finding contrastive features.
This algorithm starts by selecting a feature at random, then it tries to divide the inventory into groups of segments that are either [+F] or [−F] (assuming binary features). Within each of these groups, a new feature is selected, and the segments are again divided into [+F] and [−F] groups. This continues until the inventory has been divided up into groups each consisting of a single segment, and the contrastive features are whatever features were necessary to create these divisions.
Obviously, finding F depends on which feature system is in use, so the score is always relative to some feature set. (In fact, Clements suggests that feature economy could even be used as a way of comparing different feature theories.) In Clements (2003), the feature system is a custom set of features that he selects for the paper, without any justification provided. They are [sonorant, labial, dorsal, nasal, voice, spread glottis, constricted glottis, continuant, posterior, apical, lateral]. To demonstrate how to calculate economy, Clements uses the inventories of Hawaiian, French and Nepali (see Figure 4.18).
Figure 4.18: Inventories of Hawaiian, French and Nepali (from Clements (2003, p. 288)). Dashed boxes represent areas of the inventory that Clements considered representative of feature economy effects.
Hawaiian has 8 consonants and is analyzed as requiring 5 features for contrast: [sonorant, labial, nasal, spread glottis, constricted glottis]. Given this, the economy value is E = 8/5 = 1.6. Clements is not explicit about how he comes to this minimal set. Whatever the method used, the minimal features are those for which not all segments in the inventory have the same value. For instance, Hawaiian has no lateral consonants, so every consonant is [−lateral], and therefore [lateral] is a non-contrastive feature.
French has 18 consonants and needs 7 features: [sonorant, labial, dorsal, nasal, voice, continuant, posterior]. Its economy value is E = 18/7 ≈ 2.57. Finally, Nepali has 27 consonants and needs 10 features to be contrasted, that is, every feature except for [constricted glottis]. It scores the highest economy value of E = 27/10 = 2.7.
The maximum number of segments an inventory can have is n^F for a system with F n-ary features. Since phonological feature systems are usually binary, the maximum is usually 2^F. The maximum economy score is therefore going to be reached when S = 2^F. Practically speaking, it is unlikely for any language to reach this maximum, even for small numbers of features. Having ten features means it is possible to contrast up to 1,024 segments, an impossible size for a natural language inventory.
Clements (2003, p. 289) intends the metric to measure economy so that the higher the value of E, the greater the economy. This turns out not to be the case, however, and the Simple Ratio measurement is biased toward larger inventories, meaning that larger inventories tend to get assigned higher scores. For example, a 32 segment inventory that can be contrasted with only 5 features is perfectly economical, since 2^5 = 32. Its economy score is E = 32/5 = 6.4. On the other hand, a 64 segment inventory that can be contrasted with 6 features is also perfect, because 2^6 = 64. The larger inventory has an economy score of E = 64/6 = 10.666...
The larger inventory appears to be more economical, but both inventories are actually as economical as they can be for their size.

More importantly, large imperfect inventories can have higher scores than small perfect inventories, which should not be the case if this number truly measures economy. For example, a 16-segment inventory that takes 4 features to contrast is perfect, since 2^4 = 16. Its economy score is E = 16/4 = 4.0. On the other hand, a 50-segment inventory requiring 7 features would have an economy score of E = 50/7 = 7.142. In fact, for 7 features the maximum inventory size is 2^7 = 128, so a 50-segment inventory is not nearly as economical as it could be.

This is just a consequence of using a simple ratio as a measurement: a linear increase in the number of features results in an exponential increase in the number of possible segments. This ties inventory size to economy score, first in the trivial sense that a perfectly economical inventory must have a size that is a power of 2 (or of whatever base the feature system uses), but more significantly in that larger inventories can achieve higher scores that are impossible for smaller inventories.

Hall (2007) proposed two alternative measures, which he termed Exploitation and Frugality. The Exploitation metric measures how close an inventory comes to having the maximum number of segments, given the feature set. It is calculated as:

E_exploitation = S / 2^F    (4.2)

The other measurement proposed by Hall is called Frugality, and it is in some sense the mirror image of Exploitation. The Frugality metric measures how close an inventory comes to having the minimum feature set, given the size of its inventory. It is calculated as:

E_frugality = log2(S) / F    (4.3)

Both of these metrics have an advantage over the Simple Ratio metric: they are bounded between 0 and 1, where 1 is a perfectly economical inventory. This eliminates the problem with Simple Ratio where differently sized perfect inventories could have different scores, which means that large inventories cannot outscore smaller inventories simply by being large.

In fact, large inventories will tend to be punished slightly under Hall's two metrics. This is because the minimum feature set for a large inventory is going to be larger than the minimum feature set for a small inventory. A language with 50 segments requires at least 6 features, because 2^6 > 50 > 2^5, while an inventory of 15 segments can potentially be contrasted with as few as 4 features, because 2^4 > 15 > 2^3.

Inventories of these sizes would be assigned different scores on these metrics, even though both have the minimum possible feature count given their size. The large 50-segment inventory has a maximum Frugality score of log2(50)/6 = 0.940, and the smaller 15-segment inventory has a maximum Frugality of log2(15)/4 = 0.976. For Exploitation, the 50-segment inventory scores 50/64 = 0.781 and the smaller 15-segment inventory scores 15/16 = 0.9375.

Mackie and Mielke (2011) proposed yet a fourth metric, called Relative Efficiency, that attempts to address this problem. This metric looks at how close an inventory is to the minimum feature count, relative to what the minimum and maximum counts would be for an inventory of that size. It is calculated as:

E_R.E. = 1 − √( (F − F_min) / (F_max − F_min) )    (4.4)

where F is the minimum number of features actually required to contrast the inventory, F_min is the minimum possible number for an inventory of this size, and F_max is the maximum number of features. More specifically, F_min = ⌈log2(S)⌉, meaning that log2(S) is rounded up to the next integer, and F_max = S − 1. When an inventory is maximally economical, F = F_min and the term inside the square root evaluates to 0, which gives an overall score of 1. Relative Efficiency will assign the same score to a 50-segment inventory with 6 features as it will to an inventory of 15 segments with 4 features, because in both cases the numerator evaluates to 0 (unlike Frugality and Exploitation, as shown above). When an inventory is maximally uneconomical, F = F_max and the numerator and denominator are equal, so the term inside the square root evaluates to 1, which gives an economy score of 0.
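For concreteness, the four metrics can be written down directly as code. The following Python sketch is mine rather than code from PyILM or from Mackie and Mielke (2011); the function names are hypothetical, it assumes binary features, and it takes S and F as already determined.

```python
# Sketch of the four economy metrics discussed above (binary features assumed;
# S = number of segments, F = minimum number of contrastive features).
from math import log2, ceil, sqrt

def simple_ratio(S, F):
    return S / F                      # Clements (2003)

def exploitation(S, F):
    return S / 2 ** F                 # Hall (2007)

def frugality(S, F):
    return log2(S) / F                # Hall (2007)

def relative_efficiency(S, F):
    # assumes S > 2 so that F_max > F_min
    F_min = ceil(log2(S))             # fewest features that could contrast S segments
    F_max = S - 1                     # most features that could be needed
    return 1 - sqrt((F - F_min) / (F_max - F_min))   # Mackie and Mielke (2011)

# Worked example: 17 segments contrasted by 5 features
# (the West Greenlandic case discussed below).
S, F = 17, 5
print(simple_ratio(S, F))         # 3.4
print(exploitation(S, F))         # 0.53125
print(frugality(S, F))            # ~0.817
print(relative_efficiency(S, F))  # 1.0
```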
4.3.3 Cross-linguistic tendencies

Mackie and Mielke (2011) measured the feature economy of 479 languages in P-base using all four of the metrics discussed above. The segments in P-base are all fully specified using a modified version of the Sound Pattern of English feature system (Chomsky and Halle (1968), see Mielke (2008)). To determine the minimum number of features required for a given inventory, the Feature Economist algorithm was used. An overview of this algorithm can be found in Mackie and Mielke (2011), and section 5.4.4 gives a more technical description of the implementation of Feature Economist used in this dissertation.

Figure 4.19 gives the distribution of scores calculated over the languages of P-base by Mackie and Mielke (2011) using this algorithm. Exploitation is shown on a logarithmic scale because the small range of scores is more clearly displayed this way. It should be noted that there is a general problem with the statistics reported by Mackie and Mielke, which is that the languages of P-base are not independent. Many of the languages are related to each other, and since related languages have similar phoneme inventories, they will also have similar feature economy values. Additionally, languages spoken in geographically close regions may borrow phonemes from each other, resulting in similar inventories and similar economy scores. Ideally, economy would be calculated over a set balanced by language family and geographical location.

Figure 4.19: Ranges of feature economy scores in the inventories of P-base (Mielke 2008) using the Feature Economist algorithm (Mackie and Mielke 2011). Exploitation is shown on a logarithmic scale.

Of the languages in P-base, West Greenlandic (Fortescue 1984) scored the highest on all metrics except Simple Ratio. The inventory is shown in Table 4.2. This language has 17 segments and requires 5 features. This gives it a very high Frugality score of 0.817, because 5 features is the minimum possible for a 17-segment inventory. It scores even better on Relative Efficiency, reaching the maximum score of 1.

The maximum number of segments for 5 features is 32, so the Exploitation score is only 0.531, but this is still the highest score achieved in the entire sample of languages. This is because the maximum of 32 is a relatively small number, within the range of the majority of inventories in P-base. As the number of features climbs, the maximum inventory size grows exponentially, making it increasingly harder to get a high Exploitation score. The maximum size for a 7-feature inventory is 128 segments, which is already larger than that of practically any language on Earth. It is likely that no language could have an inventory large enough to reach a perfect Exploitation score for 8 or more features. In this way, Exploitation is the reverse of Simple Ratio, because it tends to assign higher scores to smaller inventories.
(This dierence in scores was discussed in some detail at the end ofthe previous section.)116Table 4.2: The inventory of West GreenlandicIndeed, it is obvious that languages will very rarely attain perfect scores on any of theseeconomy metrics, with West Greenlandic being the outlier in this case. There are reasonsother than pure economy that constrain the shape of inventories. To attain a large Frugalityor Exploitation score, every single feature value combination must be used, and in somecases this is not possible, or at least unlikely. For instance, it is common for obstruentsto contrast in voicing, but not sonorants. This means that in a system that requires both[sonorant] and [voice], there is probably not going to be maximum use of features, andthe combination [+son, −voice] has forces acting against it that are unrelated to economy.Some feature combinations may be literally impossible, such as [+high, +low].If feature economy is an organizational principle of language, whatever its basis, thenwe would expect that randomly generated sets of segments will not show the same economyeect. Mackie and Mielke (2011) tested this prediction by creating randomly generatedsets of segments, and calculating their economy scores. For each inventory in P-base, a newinventory of equal size was generated. Segments were added to each inventory by drawingrandomly from a pool of all the segments in P-base. The probability of selecting a particularsegment was equal to its relative frequency in P-base.Of the 479 inventories generated, 121 were non-contrastive, meaning that they weregenerated with segments containing identical feature specications. This is a limitationthat results from the fact that P-base uses modied SPE features which does not fullydistinguish between all of its segments. For instance, /tSh/ and /tCh/ are given exactlythe same specications, namely [+cons, −voc, −son, −cont, −voice, −nasal, +cor, −ant,+strid, −lat, −back, −low, +high, −round, +distr, ncovered, −syl, +tense, +del_rel,ndel_rel_2, −glot_cl, +hi_subgl_pr, nmv_glot_cl, −LONG, −EXTRA]. If a languagehappens to have both of these segments, it is non-contrastive, because there is no wayto contrast these two elements of the inventory. This means it is not possible to calculatefeature economy of these inventories: the feature-reduction algorithm will fail before evenstarting, since it will not be able to nd contrast on even the rst step. Only 358 of the 479inventories generated were contrastive and subjected to analysis. The results are given in117Figure 4.20.Figure 4.20: Feature economy scores of natural languages and randomly generated inven-toriesMackie and Mielke performed an ANOVA with type (natural vs. random) and inventorysize as factors for each of these metrics. Main eects were signicant for all four metrics.The interaction was signicant with all except Exploitation. Inventory size and economy arepositively correlated when using the Simple Ratio or Relative Eciency measures, meaningthat larger inventories have higher economy scores. The correlation is negative for Frugalityand Exploitation, meaning that larger inventories have lower economy scores. The reasonsfor these correlations were discussed in some detail in section 4.3.2.1. Natural languageinventories tended to score higher on all four metrics compared to randomly generatedinventories.Coupé et al. (2011) have, in contrast, argued that feature economy is not a very strongtendency, based on their own analysis of UPSID. 
Their methodology diers considerablyfrom the one used by Mackie and Mielke (2011). Rather than using a small set of phono-logical features, Coupé et al. used a set of 100 phonetic features based on the labels usedin the standard IPA chart.The authors use what seems to be an Exploitation-type measurement, measuring actual118segments compared to possible segments, although they do not cite the Exploitation metricdened by Hall (2007) (they also conate economy with ease of articulation, but this mightbe forgiven considering that they are using only articulatory features).If feature economy (ease of articulation) were to be the only principle actingon the content of PI [phonological inventories], we would expect systems toshow a maximal use of features. In other words, the ratio of the actual numberof segments in a system by the number of possible segments (given the set offeatures of this system) should be close to 1. (Coupé et al. 2011, section 3.2)The authors compared economy scores in inventories when segments are specied for allfeatures, to the economy of inventories if only minimal specications are used. Unsurpris-ingly, if no feature reduction algorithm is used (i.e. every segment gets full specication),then inventories are very uneconomical. Given the 100 features being used, an inventoryneeds 2100segments to reach maximum economy.To nd the minimum feature set, Coupé et al. used a method that is not clearlydescribed, and is questionably ecient: We developed an algorithm calculating all thepossible minimal underspecications of a system. It relies on a massive test of acceptabledescriptions with subsets of features (tens of millions for the largest system) (Coupé et al.2011, p. 2). The Feature Economist algorithm described in Section 5.4.4 generates around100,000 feature subsets over the course of analyzing a single inventory. Part of the reasonthat Coupé et al. generate so many more sets is because they use a massive 100 features,whereas the feature system used in PyILM has only 19 features.It is not clear why the authors chose to look for every possible minimal underspec-ication. For the purposes of feature economy, it is not necessary to know how manyunderspecied feature sets can be used to contrast an inventory. It is only necessary toknow how small the smallest of these sets is. It also does not matter if there are multi-ple smallest possible sets, because we only need the cardinality of the set to do a featureeconomy calculation. The specic members of the set are irrelevant.In absence of a full description of the methods in Coupé et al. (2011), and given the moredetailed results in Mackie and Mielke (2011), it can be concluded that natural languagesinventories are more economical than would be expected by chance alone. That is, featureeconomy is a property of phonological inventories of human languages.4.3.4 Explaining economyVery little work has been done examining the diachronic origins of feature economy. Vir-tually all of the existing literature relates to the synchronic phenomenon, although veryrecently some work has appeared that attempts to address diachrony.1194.3.4.1 A computational modelPater and Staubs (2013) appear to be the rst to attack the issue, which they call anunsolvable problem for phonology. They present an agent-based computational model forthe emergence of feature economy. The model uses only two agents, and a small articiallanguage. The language consists of 6 meanings, and each meaning is paired with a set of3 possible phonological forms. 
Meaning 1 and Meaning 2 can be pronounced as any of [pi], [bi], [phi]; Meaning 3 and Meaning 4 can be pronounced as any of [di], [ti], [thi]; and Meaning 5 and Meaning 6 can be pronounced as any of [gi], [ki], [khi]. Learning agents use constraints that map meanings to phonological forms. For example, a constraint could be M1 → [pi], which should be read as "Meaning 1 is associated with the surface form [pi]". Constraints are ranked and violable, and agents produce language by selecting a meaning, then selecting the surface form demanded by the highest-ranked constraint that references the selected meaning.

The constraint ranking is not given in advance, but learned by each agent. Learning agents are only given a surface form, and must infer the meaning of the word on their own. Agents use a technique called Robust Interpretive Parsing (Tesar and Smolensky (1998)) to do this. The technique can be summarized simply as follows: the listener updates their grammar, by promoting and/or demoting certain constraints, if the pronunciation it would have chosen for the inferred meaning does not match the one in the observed output.

For example, suppose the speaker produces [bi] with the meaning M1. The learner hears [bi], but does not know the meaning. There are only two possible meanings for [bi] in this simulation (M1 or M2), and the learner will guess one of them. Let us say the learner picks M2. She then checks what her production grammar would generate for M2, and if that word is [bi], the learner promotes the constraint M2 → [bi] (because it matches the input, even though she is wrong about the meaning). Then the learner demotes the constraint that has the same output but the alternative meaning; in this example, it would be M1 → [bi] that gets demoted. If the grammar would have generated something different for M2, like [pi], then the learner should demote M2 → [bi].

This has the effect that a simulated lexicon will come to have categorical pronunciations for words, i.e. meanings will be consistently paired with single pronunciations which differ from the pronunciations of other meanings. The alternative outcome would be lexicons where every meaning has multiple possible pronunciations and one is chosen with a certain probability on each utterance.

In fact, categorical pronunciation was the result in 9979/10000 simulations. Given the parameters of the simulation, there are really only three ways of achieving categorical pronunciation, because there will always be two words with a labial (Meanings 1 and 2), two words with a coronal (Meanings 3 and 4), and two words with a velar (Meanings 5 and 6).

The first possibility is that the final lexicon can be contrasted using these place features and only one other feature, e.g. the lexicon is [pi, bi, ti, di, ki, gi], where [voice] is the only contrast used at each place. As far as feature economy is concerned, this is the most economical outcome because of the low number of features required.

Table 4.3: Feature economy effects in Pater and Staubs (2013)

The second possibility is that one feature will be used for contrast at one place of articulation, and a different feature will be used at the other two places. This is less economical than the first possibility, and an example lexicon would be something like [pi, phi, ti, thi, gi, ki].
This set requires the place features, in addition to the features [voice]and [aspirated], which are not used to their full potential (velars are the only set to use[voice]).Finally, the third outcome, which Pater and Staubs consider to be the least economical,is one where each place would have a dierent feature contrast, e.g. [pi, phi, ti, di, gi, khi].It is not clear why three contrasts are considered less economical than two, given thevery limited parameters of the simulation. Be there 2 or 3 contrasts, the inventory stillrequires both the features [voice] and [aspirated]. As far as calculating economy goes, bothkinds of inventories would score the same, because they both have 6 segments and 3 features.In cases of 1 or 2 contrasts, segments at the same place of articulation dier by onlyone feature. What makes the case of 3 contrasts dierent is that there will be one pairof segments that dier by both [voice] and [aspirated], e.g. /khi/ vs. /gi/. The inventoryis highly restricted, so the 3 contrast cases are unfairly penalized. There is no way for[+voice, +aspirated] sounds to occur in this simulation, so there's no way for /khi/ to geta matching /ghi/ and increase economy.Results from Pater and Staubs' simulations are shown in Table 4.3. The number ofcontrasts refers to the number of features required beyond the place features. An exampleof a 1 contrast outcome would be one where the nal learner's lexicon is [pi, bi, ti, di, ki,gi], so that [voice] is the only necessary feature. This is also the most economical outcome.By chance, this outcome is only expected about one tenth of the time, but it occurred oneninth of the time in the simulations. Pater and Staubs interpret this as a preference foreconomy.The authors argue that this shows how feature economy can emerge without the needfor any constraints that specically encourage it. Instead, feature economy is the result of121learners preferring categorical pronunciations for words. This tendency in turn is a resultof using Robust Interpretive Parsing. This model is an interesting attempt to account forfeature economy from a diachronic perspective, but it falls short in several respects.First, the results are not very generalizable. They rely on some very strong assumptions,namely Robust Interpretive Parsing, and a highly specic constraint set. The constraintsare also very unusual for phonology in that they map meanings to sounds directly. Typicalphonological constraints refer only to phonological (and perhaps morphological) material.This leads to a kind of non-Saussurean model of language. Sounds and meanings have anecessary connection, rather than an arbitrary one; agents know that if the word has alabial in it, then it must either convey Meaning 1 or Meaning 2, and the same with coronaland velar sounds, mutatis mutandis.Second, sound change is not realistically modeled. What happens is that the pro-nunciation of words shifts randomly over time, within a small dened space of possiblepronunciations. Meaning 1 may start out as [pi] and turn out as [bi] on one run of thesimulation, [phi] on another but stay [pi] on another. This is not a true reection of howsound change happens. The probability of /p/ voicing to [b] in word-initial position beforea high front vowel is not the same as the probability that it aspirates in this position, noris there an equal chance of it remaining a [p]. The most likely outcome is aspiration, sinceaspiration tends to be longer before high vowels (Klatt 1975). 
There is also no possibilityfor changes to cross the place of articulation boundary due to the way that constraints work(e.g. Meaning 1 could never be paired with /ta/ or /ka/).The role of sound change in feature economy is an important issue to address. How canlanguages tend toward economy, all the while undergoing (phonetically-motivated) soundchanges?4.3.4.2 Whistle experimentsA very dierent sort of experiment that attempted to shed light on feature economy wasundertaken by Verhoef and de Boer (2011). In this experiment, participants were taught12 dierent signals on a slide whistle. The slide whistle was chosen because it allows forthe production of a large number of sounds, requires little training, and it is unlikely thatthere would be interference from the participants' rst language.Participants underwent a very short training session, hearing each signal only 4 times.At the end of the training session, participants were required to play all 12 signals to the bestof their abilities. If they could not recall them correctly, then they were to play whateverthey could remember. Learning was entirely unsupervised, meaning that participants werenever told if they were correctly or incorrectly reproducing a whistle.The experiment begins with a single participant, who learns from some whistles recordedby the experimenters. The whistles that are produced as responses during the testingsession are recorded and used as the target whistles for the second participant. The secondparticipant is unaware that their learning data was produced by a previous participant.122Figure 4.21: Example of whistle recombinations from Verhoef and de Boer (2011, p. 2)The process is repeated with the second participant, and their responses are provided tothe third participant, and so on.Errors were common during the testing stage, but participants were not corrected,and these errors were passed on to the next participants. The nal set of whistles inthe experiment looks quite dierent from the initial set. This alone is unsurprising. Theinteresting outcome is that whistles produced at later points in the experiment seem toconsist of sub-parts that are re-used across multiple whistles. Verhoef and de Boer wereable in some cases to trace back the ancestry of some of these sub-parts, as shown in Figure4.21.The reason that the re-use of sub-parts occurs is because the task of remembering 12arbitrary whistle signals, with only 4 exposures each, is dicult. Participants often forgotwhat a signal was, but were forced to produce a signal regardless, so they simply playedanything at all that they could remember (i.e. they played parts of other whistles theyactually did learn correctly). This has the eect of introducing signals with shared sub-parts into the input to the second generation, even though those similarities did not existin the input to the rst generation.The second participant will also not learn all 12 signals, and will rely on the samestrategy of playing anything they do remember. This introduces more subwhistles into theinput, or possibly spreads the previous subwhistle to new signals. 
By the end of a chain oflearners, it appears as though there is a set of sub-parts that can be recombined to make asignal, though that was not a feature of the initial set of signals in the chain.Verhoef and de Boer speculate that these subwhistles are analogous to features in nat-ural language, and that what happened in the experiment parallels the development offeature economy: the formation of building blocks here does not resemble the simplest dis-persal models, but is more reminiscent of the `Maximal Utilisation of Available DistinctiveFeatures' principle proposed by Ohala or `feature economy'. If a building block is present,123it tends to get mirrored and reused before new ones appear (Verhoef and de Boer 2011, p.2069).The results depend on the participants at some point failing to learn the signals prop-erly, and inventing something new (which contains old, re-used parts). One reason thatparticipants had diculty remembering whistles is that they are completely arbitrary se-quences of sounds. This is quite unlike natural languages, where phonemes are meaningless,but sequences of phonemes (i.e. morphemes) are meaningful. Perhaps if the whistle signalswere paired with meanings, then the task of learning them would be easier.This was addressed in a later experiment in Verhoef et al. (2013). In this study, theauthors trained participants to learn signals on a slide whistle that were paired with imagesof objects. In one condition, the whistles produced by one participant were used to trainthe next participant, exactly as in the previous study. This was the intact condition.In the scrambled condition, the experimenters randomly re-associated whistles from oneparticipant with dierent objects, and the next participant was trained on this pairing.Verhoef et al. (2013) found no signicant dierences between the groups. The nal setsof whistles in both conditions showed the same kind of re-use of sub-parts. One interestingdierence that the authors noticed was how soon this re-use started to occur. For this theycreated a method for segmenting the whistles and calculating entropy. Entropy is a measureof uncertainty, and in this case entropy is used to measure dierences between whistles -high entropy values indicate whistles that are less similar to each other (see Verhoef (2012)for details on how this calculation was done). They conclude: the main `drop' in entropyin the intact condition took place approximately from generation four to eight, while in thescrambled condition, this was sooner, approximately from generation one to ve (Verhoefet al. 2013, p. 3674).This experiment is quite interesting, and it is in line with previous results in the iter-ated learning literature. Early results came from Kirby (2000) who produced agent-basedcomputational models for the emergence of compositional syntax from non-compositionallanguage (see discussion in Chapter 1, section 1.2).Still, the whistle experiments do not directly touch on speech production or perception,so it is dicult to say if the results apply to language or not. The crucial claim is that thesub-parts of a whistle are comparable to features in language. 
However, it seems like Verhoefand de Boer (2011) are intending each whistle signal to be analogous to an entire word.If this is the case, then re-use of sub-parts makes those sub-parts more like consonantsor vowels, not features, so talk of economy or maximal re-use of features is not entirelyrelevant.Even treating the entire whistle signal as one very long sound, there is the problemthat these re-used sub-parts are sequential. In natural language, phonological features co-occur with each other. For example, a voiced fricative like [z] might be identied as being[+voice] based on F0 and [+cont] based on aperture in the oral tract. These articulatoryevents occur more or less at the same time. It is not the case that in the production of [z]there is a period of voicelessness followed by a period of continuancy. On the other hand,124this is what happens in Verhoef et al. (2011): the features of a whistle that were identiedby the authors are always sequential portions of the signal.In Verhoef et al. (2013), non-sequential features were reported. For instance, the samepitch shape would be used in two dierent words, but in one the pitch was smooth and inthe other it was broken up into smaller parts. Unfortunately, the authors make no furthermention of this, and in their analysis of the entropy of the whistle signals they seem to haveonly considered sequential portions.These experiments also avoid discussing the topic of sound change entirely. Changeoccurs in the experiment when speakers cannot remember how to produce something, andtheir errorful production is relearned by the next participant. This is quite unlike howsound change occurs in natural language, where it is driven by articulatory or perceptualfactors, not memory recall errors.4.3.5 Hypothesis #3 - Sound change and feature economyI propose a dierent approach to this origin of feature economy. I do not think that featureeconomy functions as any kind of basic organizing principle that all languages are boundto achieve. Instead, I suggest that feature economy is emergent from the way that soundchange operates. In particular, it derives from the fact that sound changes aect phoneticproperties (features), not specic sounds. For example, a change such as nal devoicingis one that aects the voicing of obstruents in a particular environment (even if not allobstruents are equally likely to undergo the change). It is not a change that literallyturns /b/ into /p/ and /d/ into /t/, etc. Thus, sound change has the eect of creating anew class of sounds that diers by exactly one feature (whichever feature was aected bymisperception) from an existing class of sounds, and this tends to lead to an increase ineconomy.The key word here is tends. Languages could be considerably more economical thanthey actually are. There are many conceivable, highly economical inventories that are notknown to exist. Perfect economy is rarely achieved because of other factors. One such factoris that not all members of a set are equally aected by misperception. For example, Papeet al. (2003) found that word-initial [b] was much less prone to devoicing than word-initial[g], although this also depended on the quality of the following vowel.Another factor is that some combinations of phonetic features are dicult or impossible.Glottal stops literally cannot be voiced, so if a language has a voicing contrast in its stopseries and also [P], then perfect economy is impossible to achieve. 
It is quite common forlanguages to have matching sets of voiced and voiceless obstruents, but it is extremely rareto nd voicing contrasts in sonorants. This is probably due to articulatory and acoustic-perceptual factors. In languages where voiceless sonorants are reported, phonetic analysisshows that they are at least partially voiced, see e.g. Ladefoged (1995) on voiceless approx-imants in Tee, and Dantsuji (1984) on voiceless nasals in Burmese. (However, these soundsare considered to be phonologically [−voice], which is what would matter for the purposes125of calculating feature economy.) This means that voiceless sonorants will be unlikely to sur-vive long-term cultural transmission, excluding them from most inventories. The fact thattheir presence in an inventory would increase economy does not outweigh the diculties inproduction and perception.Hypothesis #3Feature economy eects are emergent from the fact that sound change aects phoneticfeatures, rather than whole sounds. This creates the possibility that a new set of soundswill emerge in an inventory, all of the members of which dier from an older set of soundsby one feature. This in turn creates the appearance of economy in an inventory.In other words, economy is an outcome of sound change, but not something necessarilyfavoured by cultural transmission. Inventories with both high and low economy scoresare just as likely to be successfully retransmitted. It is inevitable that all inventories willeventually display some kind of economy eects because they all undergo sound change,which targets classes of sounds.This idea that feature economy is grounded in sound change would help explain theresults in Mackie and Mielke (2011). Their randomly generated inventories displayed lowereconomy values compared to natural language inventories because the randomly-generatedones have not been shaped by millennia of sound change.4.4 SummaryThis chapter examined natural language inventories from three dierent perspectives: thenumber of sounds, the types of sounds, and the organization of these sounds. In each case,I proposed a hypothesis related to sound change.In the case of inventory size, the most relevant empirical fact is a correlation betweeninventory size and phonotactic complexity reported by Maddieson (2007). Languages withmore restrictive phonotactics (e.g. languages allowing only CV syllables) tend to havesmaller inventories, while languages with more complex phonotactics (e.g. allowing upto CCVCC syllables) tend to have larger inventories. In Hypothesis #1 I proposed thatthis correlation might actually be causation: since misperceptions are, for the most part,context-sensitive, languages with more complex phonotactics have more diverse phoneticenvironments and hence a greater variety of sound changes can take place in such a language.Over a long period of time, languages with CCVCC syllables will tend to develop largerinventories than languages with CV syllables simply because a greater variety of soundchanges can occur in a CCVCC language.This chapter also discussed the typological generalization that small and large inven-tories share a certain number of basic consonants, and as inventories grow they gain more126complex consonants. This was rst reported by Lindblom and Maddieson (1988) who ex-amined the inventories of UPSID. In this chapter, I replicated their ndings using theinventories of P-base (Mielke 2008). 
Hypothesis #2 proposed that this could be explainedas a result of the interaction between context-free and context-sensitive sound changes.The context-free changes could account for the common simple sounds found in nearly allinventories, while the context-sensitive changes could be responsible for the rarer soundsfound only in larger inventories (which, by Hypothesis #1 are large due to more complexphonotactics).Finally, this chapter looked at the issue of feature economy. This refers to the tendencyfor languages to minimize the ratio between the number of segments in the inventory andthe number of feature required to contrast all the segments . Work by Clements (2003, 2009)and Mackie and Mielke (2011) strongly suggests that feature economy is a real property ofphonological inventories. Hypothesis #3 proposes that feature economy is the result (butnot the goal) of sound change. This is due to the fact that sound change targets broadclasses of sounds, and it not specic to individual segments.These hypotheses rely on some assumptions about sound change that were discussedin Chapter 1. In the following chapter, these hypotheses are tested using simulations fromPyILM.127Chapter 5Simulating inventory evolution5.1 IntroductionThis chapter uses simulations from PyILM to test the hypotheses proposed in the pre-vious chapter regarding cross-linguistic tendencies in inventory structure. In the rst setof simulations, I look at how total inventory size grows over time, in particular how thephonotactic patterns of a language aect this growth. The second set of simulations demon-strates how the size principle of Lindblom and Maddieson (1988) can emerge through soundchange. The nal simulations look at the issue of feature economy (Clements 2003, Mackieand Mielke 2011) and how it changes over time. In all three cases, it will be shown thatthe observed typological generalizations emerge from cultural transmission, given certainassumptions about sound change.One important caveat to the results presented here is that the simulation parametersneed to be manually tuned. For instance, the similarity threshold of a learning agentis artically set in order for the simulation to have a normal looking outcome. If thethreshold is too low, then inventories collapse into a single category almost immediately,if it is set too high then the size of the inventory explodes. The frequency and salienceof misperceptions and biases are also somewhat arbitrary. For discussion on how dierentparameters settings aect a simulation, see Chapter 3. For the complete list of parameters,and their default values, see Chapter 2, Section 2.2.2. Ideally, some of these parameterscould be learned by agents, but doing so would be a computational challenge that fallsoutside the scope of this dissertation.5.2 Inventory sizeIn Chapter 4, I provided a hypothesis on inventory size, which I repeat here.128Table 5.1: Conguration for testing phonotactic eects on inventory sizeHypothesis #1 Inventory size is tied to phonotactic complexity, since sound change ispartly context-sensitive, and phonotactics denes the set of possible contexts in a language.Languages with more permissive phonotactics should tend to eventually develop largerinventories than those with more restrictive phonotactics.This hypothesis can be tested through simulation by running multiple simulations withthe same starting conditions, varying only the phonotactic restrictions. I begin with asimple illustration of three simulations. Table 5.1 shows the conguration details. 
Followingalong the lines of Maddieson (2005), I choose three conditions: the simple phonotactics aremaximally CV, the moderate phonotactics are maximally CVC and the most complexpermit up to CCVCC. It is important to note that these syllable shapes are the maximumpossible, not the only ones possible (see Section 2.2.2.8 on the phontactics parameter formore information). Furthermore, words in these simulations can have as many as threesyllables. Multi-syllabic words can have any combination of possible syllables, e.g. in theCV condition a two syllable word could be any one of the following: CV.CV, CV.V, V.CV,V.V.The simulations are run with a set of misperceptions based on sound changes collectedfrom Hock (1991) on the assumption that these sounds changes could plausibly be explainedas misperceptions, in the sense introduced earlier in this dissertation (see Section 3.2 onpage 66). Without proving in each case that they are, I have tried to select sound changesthat are indicated to have occurred in numerous languages, and avoid sound changes thatare specic to just one language. There are eight misperceptions, listed in Table 5.1.Let us consider rst how the CV language could evolve over the simulation. To begin129with, not all of the misperceptions can possibly have an eect, because not all of the neces-sary environments exist. Indeed, there are only two possible environments for consonants insuch a language: they are either word-initial and prevocalic, or they are intervocalic. Fourmisperceptions can occur in these environments: stop aspiration and fortition in initialposition, labialization and lenition intervocalically.In contrast, the CVC and CCVCC languages will evolve dierently, because it is pos-sible for every misperception listed in Table 5.1 to occur (assuming, of course, that thelexicon contains items in which the appropriate segments occur in the relevant environ-ments). Nonetheless, there are still phonotactically-induced dierences in the probabilityof a misperception occurring. For instance, consider the post-nasal fortition change, whichrequires the environment [+nasal, −vocalic]_. In the CVC language this context only oc-curs in words with 2 or more syllables (because CC clusters can only happen at syllableboundaries) which reduces the total number of words in the lexicon where this contextmight occur. Consonant clusters are overall more likely in CCVCC languages because theycan potentially occur even in words that are only one syllable long.It is also worth noting that in CV languages, the number of misperception-triggeringcontexts is very large, relative to the total number of possible contexts. In fact, 100% of thecontexts in which consonants can occur are contexts in which some kind of misperceptioncan happen. In CVC and CCVCC languages, all possible misperception-triggering contextscan occur, but so can a large number of non-triggering contexts.This is of course partly due to the articial nature of the simulations, which includeonly a small number of misperceptions. It would be possible to run simulations with alarger number of misperceptions which cover a larger number of contexts, but the pointremains: the number of misperceptions that can occur in a simulation depends largely onthe phonotactics.This has an eect on how the size of inventories changes over time. To illustrate, supposea language with CV phonotactics has /p/ in the inventory at generation 0. 
Assuming the misperceptions listed in Table 5.1, all instances of this sound have the potential to become /ph/ when they occur word-initially, and to become /f/ when they occur between vowels. The original sound /p/ is not likely to survive an entire simulation. This results in a net gain of one phoneme: the inventory grows from /p/ to /ph, f/.

In comparison, in languages with more complex phonotactics, such as CCVC, there are environments in which /p/ can continue to exist, in addition to the environments in which it will undergo change. For instance, it might occur as the first element in a complex onset, or as a coda consonant. In this case, the final generation might still contain words with a /p/ in them. This results in a net gain of two segments, one more than with a CV language: the inventory grows from /p/ to /p, ph, f/.

In other words, inventories grow as misperceptions create allophones, and allophones become phonemes. This transition to phoneme, in PyILM, requires one of the following two conditions to be true: (a) the allophone comes to dominate over the older phoneme in a context where misperception occurs; (b) the allophone appears in a new context through the process of lexical invention.

5.2.1 Simulation results

For a test of Hypothesis #1, I ran 50 simulations under each of the three phonotactic conditions (for a total of 150 simulations). Starting inventory sizes ranged from 10 to 20 consonants, generated by sampling uniformly at random (with replacement) from the pool of segment symbols in P-base. The results are given in Figure 5.1, which shows the average total size of inventories at each generation over 50 generations of simulation. An ANOVA was performed with final inventory size as the dependent variable and phonotactic condition as the independent variable. The results are significant, with F(2, 147) = 18.287, η² = 0.199, p < 8 × 10⁻⁷.

The expected pattern emerges. The simpler phonotactics resulted in smaller inventories, while the more complex phonotactics resulted in larger inventories. All the languages initially undergo a large increase in inventory size as the various misperceptions take hold. The rises and falls in inventory size occur because of mergers and splits.

Figure 5.1: Average inventory size for 50 simulations over 50 generations, across 3 different phonotactic conditions.

Inventory size initially grows rather quickly, and this is due to the salience values and probabilities assigned to the misperceptions. If misperceptions had been assigned lower probabilities, or had lower salience values, then the growth in inventory size would be different. The interaction between misperceptions and rate of growth was discussed in section 3.4.

The simulation results, along with the typological correlation reported in Maddieson (2005, see also Section 4.1), strongly support the idea that inventory size is partially determined by the phonotactics of a language. The effect occurs because of the partly context-sensitive nature of sound change. Phonotactics defines the set of possible contexts in a language, which in turn determines the possible misperceptions that can take place. More complex phonotactics means that a greater variety of misperceptions are possible, which means there is the potential for a greater number of sounds to enter the inventory.

This is not to suggest that phonotactics is the sole factor that determines inventory size. There are clear counter-examples among the Khoe-San languages, which have the world's largest inventories yet tend to have simple CV syllable structure.
Other factorsare evidently at play. In particular, it is possible for phonotactic patterns to change overtime. This means that language can acquire certain sounds at a stage where they haveCVC syllables, and potentially retain those sounds after a later change to a CV syllablestructure. The simulations in PyILM make the simplifying assumption that phonotacticsare xed over time, and there are no sound changes that can alter them (e.g. no deletionsor epenthesis). See Oudeyer (2005a, 2005b, 2005c) for an example of a simulation of theevolution of phonotactics.Nonetheless, an explanation for inventory size based on phonotactics is more satis-fying than one based on population size (e.g. Atkinson (2011)), because it is grounded inphonologically-relevant eects of production and perception, factors that are far more likelyto inuence inventory size than just the sheer number of people using a language.5.3 Common consonantsSounds are not all equally represented in the inventories of the world. Some sounds areextremely common, such as /p/ or /m/ which are found in nearly every language, whileother sounds are more rare, such as /k'/ or /Ð/. As discussed in Section 4.2, there is arelationship between the size of the inventory, and the types of consonants that an inventorytends to contain. Small inventories generally consist of the most common sounds, while thelarger inventories tend to have all the common sounds, plus some more rare ones. Lindblomand Maddieson (1988) (and see also Maddieson (2011)) proposed that this eect is due tohow inventories grow over time, by rst populating a small space of neutral consonants,then gradually make use of more complex sounds.This superset eect is interesting because it is not an obvious outcome from amisperception-only view of sound change. If the contexts for misperceptions exist in thelexicon, then misperceptions should happen. The overall size of the inventory should notblock or facilitate a sound change, and the apparent complexity of the outcome should beirrelevant. How can this distribution of consonants be explained by blind sound change?The phonotactic eect discussed in the last section plays at least some role. As anexample, consider ejectives. These are relatively rare cross-linguistically and they are alsoon Lindblom and Maddieson's list of complex sounds, more likely to be found in largerinventories. The emergence of ejectives often requires a sequence of two consonants. For132example, post-nasal stops can develop into ejectives, as occurred in Zulu (Herbert 1985)where Proto-Bantu *p,*t,*k > ph,th,khexcept after nasals where *p,*t,*k > p', t', k'. Ohala(1997) argues that ejectives can emerge from a sequence of a plosive and a glottal stop,when the closure for the glottal stop overlaps with closure for the plosive, e.g. the sequence[k] + [P] can result in [k'].These are contexts that cannot occur at all in a language with simple CV phonotactics,but can occur at syllable boundaries in CVC languages, and even within a single syllablein CCVCC languages. Therefore, ejectives are more likely to emerge in CCVCC or CVClanguages, which are also more likely to have larger inventories, by Hypothesis #1.This only addresses half of the issue, which is the question of why inventories diversifyas they grow. It does not explain why small inventories tend to look similar to each other,or what happens as inventories shrink. To address this, it will be necessary to make somemodications to the way that misperception is modeled in PyILM.5.3.1 Misperception vs. 
bias

One problem with the current misperception model is that simulations can reach a point where languages cease changing. After running a simulation sufficiently long, it will be the case that for any given context in the lexicon, the sounds that appear in that context will either be (a) sounds from the 0th generation of the simulation that are unaffected by any misperception, or (b) sounds that are the result of any misperception that can occur in that context.

This can be visualized as a state diagram, as shown in Figure 5.2. This represents possible states of the inventory in an extremely simplified simulation with only three starting consonants /b, d, g/. Assume there exists a single misperception of final devoicing, and assume these sounds appear in word-final position in the 0th generation. Each circle is a possible inventory, and arrows represent directions of change. The top green circle is the initial state, and the bottom red circle is the only possible final state. A change in state occurs when the devoicing misperception has changed a voiced stop into a voiceless one.

Figure 5.2 is only a partial state diagram, since in an actual simulation run the inventory can enter some in-between states where the voiceless and voiced obstruent both exist together, e.g. /p, b, d, g/. The in-between states inevitably end with the voiceless consonant winning out over the voiced one, so I have excluded these states from the diagram for clarity. Additionally, the state diagram only represents how the stop system in final position evolves. Outside of this context, the assumed misperception does not apply, so at any given state, the inventory will also contain whichever voiced stops are not in final position.

There is a single terminal state that, once reached, cannot be exited. There are no misperceptions that can change a /p, t, k/ inventory back into a /b, d, g/ inventory. Such a state is called an absorbing state. If the simulation is run for long enough, the language will eventually reach this state. In this simple example, there is a single misperception, so there is a single absorbing state.

Figure 5.2: State diagram for word-final obstruents in a simulation with final devoicing

In a simulation with a richer set of misperceptions, the state diagram becomes more complicated. More than one absorbing state might exist. Given a large enough set of misperceptions, it may even be technically possible to have a feeding loop. Consider these four changes:

A > B / _ C
C > D / B _
B > A / _ D
D > C / A _

When A changes to B before C, it creates the right conditions for C to become D. This in turn creates the right conditions for B to go back to being A. This causes D to turn back into C, and we return to the original state, ready to loop again. However, this requires an extremely specific set of misperceptions and an extremely specific lexicon, and there must be no other misperceptions that could break the loop. Such a set of changes is not likely to arise in natural language, or at least not commonly enough to play any significant role in modeling sound change. That is, we should not use PyILM to construct loops like this, because the outcome of such simulations would tell us little about the way that natural languages evolve.

The possibility of absorbing states seems very unnatural. All human languages are constantly undergoing sound change. It would be desirable that languages in a simulation do the same. Of course, at some point language change has to stop in a simulation, because simulations are finite.
A more realistic goal is to have a simulation where languages at leasthave the potential to continue changing right until the nal generation.Absorbing states also make it impossible to simulate the right conditions for the phe-nomenon under discussion in this section. When inventories shrink over time, they mustshrink back towards a common set of sounds, otherwise the superset relations found inUPSID and P-base would not exist.Why do absorbing states occur in a simulation? It is because misperceptions are bothcontext-sensitive and asymmetrical. The probability of A > B / _C is not the same asthat of B > A / _ C. One of those probabilities is usually equal to zero, while the other isnon-zero. This pushes languages in a particular direction, without giving any way for thelanguage to return to the state that it used to be in.One way of addressing this, and allowing inventories to return to former states, wouldbe to make misperceptions symmetrical. For every A > B change, ensure there also existsa B > A change, in the same environment. Symmetrical misperceptions would be veryeasy to model in PyILM. A rounding misperception, such as [−round, −voc]  +.5round/ _[+round, +voc], could be coupled with an unrounding hypercorrection [+round, −voc] -.5round / _[+round, +voc]. With these misperceptions at play, the inventory of alanguage can potentially continue to change forever (i.e. until the very last generation ofa simulation), possibly bouncing back and forth between states.This may solve the problem of absorbing states, but it lacks empiral support. Thereare many kinds of sound changes that are clearly not symmetrical. For example, intervo-calic voicing of stops has been observed in numerous languages, but the reverse pattern ofintervocalic devoicing is vanishingly rare.Instead of modifying the symmetry of misperception, a better approach is to balanceout their context-sensitive nature by introducing context-free changes. As useful terms fordiscussion, context-sensitive changes will continue to be referred to as misperceptions, andcontext-free changes will be referred to as biases. The idea behind a bias is essentially thesame as a misperception: it is an articulatory or perceptual eect that interferes with thetransmission of sounds, and creates the potential for a learner to acquire a dierent set ofsounds than the speaker intended to transmit.This idea was proposed in the previous chapter as Hypothesis #2, repeated here:Hypothesis #2Common sounds exist because of context-free biases in transmission that aect all lan-guages, regardless of phonotactics. Rarer segments are rare because they require morespecic phonetic environments to appear, and these are more likely to exist in larger in-ventories, because larger inventories have more, and more dierent, phonetic contexts (byHypothesis #1).Biases, in contrast to misperceptions, are factors that always aect the production or135Table 5.2: Conguration for simulations comparing simple misperceptions and biasesperception of certain classes of speech sounds, regardless of where they occur. For instance,stops are biased towards being voiceless. This is because the conditions for voicing require acertain dierence between subglottal and supraglottal air pressures, which is more dicultto maintain when airow is stopped (Ohala 1983). 
This of course does not make voicingof stops impossible, it is just more likely that any stop, regardless of where it is produced,could be articulated as voiceless.This means that, all else being equal, voiced stops are constantly at risk of being misper-ceived as voiceless, because speakers might, at any point, fail to reach the right air pressuredierential for voicing to occur. The fact that certain conditions seem to encourage thiseven more (e.g. word or utterance nal position, Blevins (2006b)) compounds the likelihoodof voiceless stops being in any given inventory.Formally speaking, biases can be modeled in almost exactly the same was as mispercep-tions within PyILM. They have the same eect of changing the phonetic values of certainsounds, but they can occur in any context, rather than in a specic one. In the notation ofPyILM, a context-free misperception is indicated with a * for the environment.Just as with misperceptions in PyILM, biases are abstractions, and using them in asimulation is not intended to be an argument for the existence of any particular kind ofbias in real language transmission. Any proposal for a bias would need to be argued foron its own merits. Here, I will simply be assuming the general existence of biases in orderto demonstrate a point: simulated inventories that are both small and large will sharesounds (namely, those which emerge through context-free bias), while larger inventorieswill additionally have rare or more complex sounds (namely, those which emerge throughcontext-sensitive misperception).To demonstrate, I will rst describe two simple simulations, each with a single misper-ception, and a single bias, so that their interaction is easier to follow. Table 5.2 shows theconguration for this example.For one simulation, the starting inventory was set to /p, t, k, i, u, a/ and for theother simulation the starting inventory was /b, d, g, i, u, a/. In other words, each of the136simulation conditions started with either a set of stops preferred by bias or the set of stopspreferred by misperceptions.It took several test runs of PyILM to decide on probabilities and salience values for thechanges. The context-free nature of biases means that there are more opportunities perutterance for a bias to inuence speech than for a misperception to do so. If the bias proba-bility is set too high, it can overpower a misperception, and lead to an absorbing state. Theidea, then, is to model biases as frequent but weak eects, while misperceptions are strongbut less frequent. In this particular case, the bias is twice a likely as the misperception,but its eect is small enough on a given utterance that it probably will not change whichcategory of sound is understood by the listener. The eect is felt only over many exposuresto tokens of a category aected by bias. Misperceptions occur half as often as biases, buttheir eect is strong enough that a listener will probably perceive a categorically distinctsound.With the right balance, inventories produced by this kind of simulation will never stopchanging, unlike inventories in simulations with only misperceptions. The specic proba-bilities and salience values for biases and misperceptions will have a strong eect on howfrequent and how abrupt the changes are (see Chapter 3, sections 3.2 and 3.3 for morediscussion on these parameters). Table 5.3 depicts the consonant inventories for a selectnumber of generations from one simulation. 
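To make the "frequent but weak" versus "strong but less frequent" balance concrete, the following toy sketch applies one context-free bias and one context-sensitive misperception of this kind to a stream of tokens. It is not PyILM code: the probabilities, the salience-like shift values, and the function name transmit are hypothetical stand-ins for the parameters discussed above.

```python
# Toy illustration (not PyILM code): a frequent-but-weak context-free devoicing bias
# versus a rare-but-strong intervocalic voicing misperception, acting on a [voice]
# value scaled to the interval [0, 1].
import random

random.seed(1)

BIAS_PROB, BIAS_SHIFT = 0.5, -0.1    # bias: any stop, small downward shift (env = *)
MISP_PROB, MISP_SHIFT = 0.25, +0.6   # misperception: V_V only, large upward shift

def transmit(voice_value, intervocalic):
    """Return the phonetic [voice] value a listener receives for one token."""
    if random.random() < BIAS_PROB:                    # context-free bias
        voice_value += BIAS_SHIFT
    if intervocalic and random.random() < MISP_PROB:   # context-sensitive misperception
        voice_value += MISP_SHIFT
    return min(1.0, max(0.0, voice_value))

# A listener's category rests on many exposures, so average over tokens of a
# voiceless stop (starting value 0.1) in the two kinds of context:
between_vowels = [transmit(0.1, intervocalic=True) for _ in range(200)]
print(sum(between_vowels) / len(between_vowels))   # pulled upward by the misperception
elsewhere = [transmit(0.1, intervocalic=False) for _ in range(200)]
print(sum(elsewhere) / len(elsewhere))             # pulled toward 0 by the bias
```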
Figure 5.3 shows continual changes in inventory size over 100 generations of two simulations, starting with either voiced or voiceless consonants.

The constant change in inventory size comes from the interplay of the bias and the misperception. Assume an inventory that begins with only /p, t, k/. The misperception will create voiced stops between vowels, which increases the size of the inventory. The bias reduces the voicing in all contexts, so some of these intervocalic voiced stops can merge back with voiceless stops, decreasing the inventory size. The reverse occurs in a simulation that begins with /b, d, g/. Some of these will devoice due to the bias, increasing inventory size, while the intervocalic voicing misperception can cause the voiceless sounds to merge back with the original voiced categories. This can be seen specifically for /g/ in Table 5.3. No /g/ exists in Generation 10. By Generation 25, /g/ has emerged as an allophone of /k/. It becomes a full phoneme by Generation 50, then disappears again by Generation 70.

Table 5.3: Example of individual inventories in a simulation with misperception and bias, starting from only voiceless stops

Figure 5.3: Change in inventory size for two simulations, one starting with voiceless stops, one with voiced stops

5.3.2 Simulation results

In order to test Hypothesis #2, I ran 120 simulations, with both biases and misperceptions. Some, but not all, balance each other out. For example, I created a labialization misperception, then an anti-labialization bias. These ideas were roughly based on the descriptions of Set 2 and Set 3 consonants in Lindblom and Maddieson (1988). In addition, I created a small number of misperceptions and biases that do not counteract each other. The full set is listed in Table 5.4. As with the Hypothesis #1 test, simulations were divided evenly between CV, CVC, and CCVCC phonotactics (40 each), each with randomly generated starting inventories, consisting of anywhere from 8-12 consonants and 3-5 vowels, selected at random from P-base.

Table 5.4: Misperceptions and biases for testing Hypothesis #2

Each simulation ran for 50 generations, and the final inventory of each was collected. The expectation is that context-free biases will be responsible for a set of sounds found in most inventories, while the context-sensitive misperceptions are what lead to more rare sounds in larger inventories.

The segments of the final inventories were therefore roughly categorized this way: sounds that are the possible outcome of a bias were counted separately from the others. For instance, there is a bias against retroflex stop consonants. Retroflex is represented as [−ant, −distr, −cont, +con, −son, −voc] in PyILM. The bias affects anything with that feature set, and raises the [ant] value, thus a segment marked [+ant, −distr, −cont, ...] is a possible outcome of a bias, and would be flagged.

The initial inventories of the simulation were generated by sampling uniformly at random from the set of all possible biased and other (non-biased) sounds. The total number of biased segments is much smaller than the total number of other segments, and so the initial inventories tended to have a high proportion of non-biased sounds. If sound change had no effect on the relative proportion of segment types, then the expectation would be for the inventories of the final generation to also have a greater proportion of non-biased sounds.
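The flagging step described above can be made concrete with a small sketch: a final-generation segment counts as "biased" if its specification matches what some bias would produce, that is, the bias's target features with the affected value switched. This is an illustration only; the feature names and the matching criterion actually used by PyILM may differ in detail.

```python
# Hypothetical bias list: (target features, affected feature, value after the bias applies).
# For example, the anti-retroflex bias targets [-ant, -distr, -cont, ...] and raises [ant],
# so its possible outcomes look like [+ant, -distr, -cont, ...].
BIASES = [
    ({'ant': '-', 'distr': '-', 'cont': '-', 'con': '+', 'son': '-', 'voc': '-'}, 'ant', '+'),
]

def is_possible_bias_outcome(segment: dict) -> bool:
    """True if this segment's features match the output pattern of some bias."""
    for target, feature, outcome_value in BIASES:
        expected = dict(target)
        expected[feature] = outcome_value
        if all(segment.get(f) == v for f, v in expected.items()):
            return True
    return False

def count_segment_types(inventory: list) -> tuple:
    """Split a final inventory into (biased, non-biased) counts."""
    biased = sum(is_possible_bias_outcome(seg) for seg in inventory)
    return biased, len(inventory) - biased
```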
As Figure 5.4 shows, however, this is not what happens. Instead, in smaller inventories, the number of biased segments is sometimes equal to or greater than the number of other segments.

Figure 5.4: Biased and non-biased sounds in the final simulated inventories

This parallels the relationship found in natural language inventories. Lindblom and Maddieson (1988) used the metaphor of a magnet and a rubber band to explain this. These simulations suggest that the metaphor can be replaced by more concrete notions, namely that context-free biases are the rubber bands drawing languages toward common sounds, and the context-sensitive misperceptions are the magnets, pushing inventories to expand into different regions of phonetic space.

5.4 Feature economy

This final section of the chapter turns to the topic of feature economy (Clements 2003, Hall 2007, Mackie and Mielke 2011; see also discussion in Section 4.3.2). Economy can be calculated in several ways, but it is essentially a measurement of how many phonemes exist in an inventory, relative to the number of phonological features required to keep all phonemes distinct. A discussion of various economy metrics was presented in Section 4.3.2.1. Natural language inventories have been shown to be more economical than randomly generated sets of segments (Mackie and Mielke 2011). In the previous chapter, I introduced a hypothesis about the diachronic origin of feature economy, which will be tested through simulation in this section. I repeat the hypothesis here:

Hypothesis #3
Feature economy effects are emergent from the fact that sound change affects phonetic features, rather than whole sounds. This creates the possibility that a new set of sounds will emerge in an inventory, all of the members of which differ from an older set of sounds by one feature. This in turn creates the appearance of economy in an inventory.

This hypothesis does not predict economical inventories to be favoured by cultural transmission. Greater economy does not mean greater learnability. Misperceptions and biases are a more powerful force. For instance, it is common to find voicing contrasts among obstruents, but this is rare among sonorants, even though it would be much more economical to use the [voice] feature across the entire consonant inventory. The articulatory and acoustic-perceptual difficulties associated with voiceless sonorants outweigh any increase in economy that might result from adding them to the inventory. In other words, my claim is that economy emerges as a side-effect of sound change, and since all natural languages undergo sound change, all of them display some degree of economy.

This section is organized as follows. First, I will describe the general relationship between sound change and economy, and provide a simple simulation as illustration. Next, I will move to testing the hypothesis more directly. This will be done by comparing results of simulations run with misperceptions that affect classes of sounds, and simulations run with misperceptions targeting individual sounds. If the hypothesis is correct, then economy will be ultimately higher in cases where misperceptions are defined over classes.

5.4.1 How economy can change over time

Economy scores are all calculated using two values: S, the number of segments in the inventory, and F, the minimum number of features needed to contrast the inventory. Economy changes as S and F change. Economy scores are raised if either S increases without increasing F, or else F decreases without decreasing S.
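Taking Simple Ratio (S/F) as the example metric, the arithmetic behind these two cases is easy to verify:

```python
def simple_ratio(s: int, f: int) -> float:
    return s / f

base = simple_ratio(10, 5)          # 2.0
print(simple_ratio(11, 5) > base)   # True: S grows while F stays constant
print(simple_ratio(10, 4) > base)   # True: F shrinks while S stays constant
```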
Increasing both S and F by an equal amount will result in a lower economy value, while decreasing both S and F by an equal amount will result in an increase in economy. This is true regardless of which of the four economy metrics is used. Let us consider in turn how each of these values can change.

S changes whenever a segment is added or lost. Sounds are added to the inventory through phonemic split, when misperception changes an existing sound in a particular context. In most cases, splits result in an increase to S. If a sound happens to have its (lexical) distribution strictly limited to contexts affected by a misperception, then the new sound completely replaces the old one, and S does not change. It is possible for S to decrease due to merger, when all instances of one category become instances of another existing category.

Note that sound change can occur without any change to S at all. For instance, suppose that an inventory consists of /b, t, d/, and all instances of /b/ devoice to /p/. This is not a merger, since /p/ was not already in the inventory, nor is it a split, because there are no instances of /b/ left behind. The resulting inventory /p, t, d/ is the same size as the original inventory, so S is unchanged.

How about changes to F? They come along with changes to the inventory. As far as the simulation is concerned, it would be impossible for F to change without a change in inventory, since F is determined using the Feature Economist algorithm (see Section 4.3.3). The same input inventory will always result in the same number of output features. F increases if a segment is added to the inventory that requires a feature which previously was not necessary. For instance, if an inventory has only stops contrasting in voicing and place, and a fricative is added, then F will increase as [continuant] becomes a necessary feature.

Losing a segment may result in a decrease of F if this segment was in minimal contrast with another segment that now stands alone. For instance, imagine an inventory with a series of voiceless obstruents, only one of which has a voiced counterpart, e.g. /p, t, k, s, z/. The feature [voice] is necessary to contrast /s, z/ only. If /z/ were to disappear, then the need for [voice] as a feature is also lost, and so F and S both decrease (and the same would be true if /s/ were lost instead).

There is an effect that occasionally occurs in PyILM simulations, call it feature carry-over, which can result in a decrease in feature economy. Suppose there is an inventory with six segments /p, b, tʲ, d, f, s/. The feature economy of this inventory is relatively high, as only three features are required to contrast everything: [voice, continuant, coronal]. The palatalization on /tʲ/ obviously involves another feature too, and realistically, a speaker of such a language would need to have this additional articulatory information represented somehow. However, for the purposes of simply working out the smallest number of phonological features necessary to contrast these segments, only three are required.

Suppose there is a sound change that increases the [continuant] value of a segment, e.g. stops become fricatives, and that through this sound change the /tʲ/ lenites to /sʲ/.
Now it is necessary to introduce a feature for palatalization in order to contrast /s/ and /sʲ/. This has an overall effect of lowering economy, because although S increases by 1, F also increases by 1 (this is assuming that some instances of /tʲ/ do not undergo change; if every /tʲ/ becomes /sʲ/ then S would not increase, but F would, and economy would decrease even more). Changes that increase both S and F by the same amount will result in an overall decrease in economy. This is simply due to the fact that for any inventory S > F, and therefore an increase in S is proportionally less than an equal increase in F. Consider a series of sound changes that each add one segment and one feature. Using the Simple Ratio measurement of economy, we would get this series of shrinking values: 3/2 = 1.5, 4/3 = 1.333..., 5/4 = 1.25, 6/5 = 1.2, and so on.

The Simple Ratio economy of /p, b, tʲ, d, f, s/ is E = 6/3 = 2, whereas adding in /sʲ/ means E = 7/4 = 1.75. This effect of feature carry-over can occur in natural language changes; for instance, a rounded back vowel may front, carrying its roundedness with it, potentially creating a contrast with an existing unrounded front vowel.

Change in economy over time is illustrated in Figure 5.5 for a hypothetical simulation. That is, all the values are constructed for the purposes of illustrating how change in economy happens. No actual simulations were run to obtain these numbers. The top of the figure shows change in Simple Ratio, while the other three metrics (Frugality, Exploitation, and Relative Efficiency) are shown at the bottom. This is because the latter three metrics are bounded between 0 and 1, while Simple Ratio has no upper limit, so they cannot easily be shown on the same scale.

Figure 5.5: Change in economy score for a hypothetical language

Each step along the x-axis is comparable to a generation in a simulation. At a glance, one can spot that the metrics do not increase or decrease in a uniform way over time. Consider just the change from the first to the second generation. In the first generation, the inventory is quite small, with only 5 segments and 3 features. At the second generation, both the number of segments and the number of features have grown. This results in Simple Ratio and Frugality both increasing, while Exploitation drops, and nothing happens to Relative Efficiency. These differences make sense when we consider what each metric is actually measuring.

Simple Ratio, as the name implies, is simply the ratio of segments to features. Since 9/4 > 5/3, Simple Ratio goes up in the second generation. In fact, Simple Ratio increases steadily until the 8th generation because each subsequent ratio is higher. In the eighth generation, Simple Ratio falls because although the inventory size grew, it was not enough to make up for the change in features. The inventory at the 9th generation would need to have 38 segments in order to see a gain in Simple Ratio compared to the previous generation.

Frugality is a measurement of how close an inventory comes to having the minimum number of features it could have, given its size. For S segments, this minimum number is log₂(S), rounded up to the next whole number. Frugality barely changes between the first and second generation because in both cases this minimum number is achieved. For 5 segments, at minimum 3 features are needed, and for 9 segments at minimum 4 features are needed. Frugality is not a strict measure of this ratio, however, since these inventories would score 1.0 otherwise. Larger inventories actually receive slightly higher Frugality scores.
This can be observed in the way that Frugality changes between the second and third generation. In both cases, the inventory requires 4 features, and in both cases this is the minimum possible for an inventory of that size. These 4 features could potentially be used to contrast as many as 16 segments, and so the third generation inventory with 15 segments scores higher than the second generation inventory with only 9 segments.

Exploitation is, in a sense, the opposite of Frugality, and it measures how close an inventory comes to having the maximum number of segments for a given number of features. The maximum inventory size for F binary features is 2^F, so for an inventory to remain highly economical on the Exploitation metric, its size must increase by a power of 2 every time a feature is added. In the first generation, with 3 features, it is possible to have as many as 8 segments. The final generation shown in Figure 5.5 has 8 features, so the inventory would by then need to have 256 segments to reach a perfect Exploitation score. This means that, generally speaking, sound changes that result in an increase in the number of features will tend to result in a decrease in Exploitation scores. There are only two examples of Exploitation increasing in Figure 5.5: in the third generation, when the number of segments increases without any change to the features, and in the eighth generation, when the number of segments and the number of features both decrease.

Relative Efficiency looks at the minimum and maximum number of features required for a given inventory size, and assigns a score based on where an inventory falls in that range. The first four generations all score perfectly on Relative Efficiency because they all have the minimum possible features. This should be contrasted with Frugality, which assigned different scores to these inventories based on their size. Relative Efficiency falls in the fifth through seventh generations because the number of features rises to 5, while the inventory size stays in a range that could potentially require as few as 4 features. When the number of features drops in the ninth generation, Relative Efficiency once again goes up.
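The two bounds that drive these metrics are easy to compute directly. The sketch below reproduces the figures used in this walkthrough: the minimum feature counts for inventories of 5, 9, and 15 segments, the maximum inventory sizes for 3 and 8 features, and a Frugality value computed as log₂(S)/F, which is consistent with the description above and with the worked example given later in this section (25 segments and 5 features scoring roughly 0.93). The exact normalization used for Frugality and Relative Efficiency in PyILM follows Section 4.3.2.1, so treat the frugality function here as an illustration rather than a definition.

```python
import math

def min_features(s: int) -> int:
    """Fewest binary features that could keep s segments distinct."""
    return math.ceil(math.log2(s))

def max_segments(f: int) -> int:
    """Largest inventory that f binary features could keep distinct."""
    return 2 ** f

def frugality(s: int, f: int) -> float:
    """Illustrative Frugality: how close f comes to the unrounded minimum log2(s)."""
    return math.log2(s) / f

print(min_features(5), min_features(9), min_features(15))     # 3 4 4
print(max_segments(3), max_segments(8))                       # 8 256
print(round(frugality(9, 4), 2), round(frugality(15, 4), 2))  # 0.79 0.98
```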
Another issue to consider is that not only does economy change over time, but the range of possible scores changes as well. To understand why this is so, it is helpful to plot all possible economy scores for a range of features and segments. This is shown in Figure 5.6 for Simple Ratio and Figure 5.7 for Frugality. The figures show inventory size and the number of features on the x and y axes, while the z-axis shows the feature economy score that a language would have with that combination of segments and features. Not every point in space is filled, because it is not possible for certain combinations to occur. An inventory with S segments needs at minimum log₂(S) features (rounded up).

Figure 5.6: Range of possible Simple Ratio scores

Figure 5.7: Range of possible Frugality scores

One important difference to note between Frugality and Simple Ratio is where the minimum scores lie for each feature value. For Simple Ratio, the minimum is the same. An inventory of S segments cannot possibly need more than S − 1 features, so the minimum Simple Ratio lies just above 1.0 for all values of F.

For Frugality, the lowest possible score actually varies with the number of features. If an inventory requires two features, then Frugality cannot go below 0.79. If an inventory requires three features, then the minimum score is now 0.6. At four features the minimum score drops to 0.58, and so on.

This means that sound changes requiring the addition of a new feature to the inventory have a greater impact on the Frugality score than on Simple Ratio. For example, an inventory with 25 segments and 5 features has a Simple Ratio score of 5.0 and a Frugality score of 0.928. If a sound change occurred increasing the inventory to 26 segments, but at the cost of adding a 6th feature, then both scores will drop. If the inventory can add just four more segments, for a total of 30, then it would regain its old Simple Ratio score of 5. In the case of Frugality, the inventory would need to balloon to 48 segments to equal its old score.

5.4.2 An illustrative example

In this section, change in economy scores is illustrated using a simple simulation. The initial inventory was selected to be a simple set of consonants, mainly obstruents, with just three vowels: /b, d, g, q, f, z, x, m, n, i, u, a/. There were four context-sensitive misperceptions included for this simulation, and they only affect the obstruents in the language:

Devoicing: [+voice, −son, −cont] segments have their [voice] value reduced by .5 in the environment _# (p=.25)

Lenition: [−son, −cont] segments have their [cont] value increased by .5 in the environment +voc_+voc (p=.25)

Fortition: [−son, +cont] segments have their [cont] value reduced by .5 in the environment #_ (p=.25)

Assimilation: [−son, −voc, −voice] segments have their [voice] value increased by .5 in the environment +voice, −voc_ (p=.25)

Every sound change was assigned a .25 probability, meaning that on any utterance of a sound in an appropriate context, there is a .25 probability that a given misperception occurs. Each misperception alters a token's feature values by .5, which is a large enough change in value, given the parameters of the learning algorithm, to practically ensure that the tokens affected by sound change will be categorized as something different than tokens of the same segments that go unaffected by change.

The phonotactics of the language are set to be maximally CVCC, and words can be one or two syllables long. The phonotactics are such that it is possible for all four misperceptions to occur at some point. Devoicing can occur in any C-final word, and Assimilation can occur in any CC-final word. Lenition requires two vowels, which means that it requires a two-syllable word to be triggered. Fortition could happen in any word that starts with a fricative.

At the end of the simulation, the feature economy of the language was calculated at each generation, using the four metrics described in Section 4.3.2.1. Feature economy, as originally defined by Clements (2003), measured the organization of the phonological inventory, not the full surface inventory. Similarly, in simulations with PyILM, only the core or underlying inventory is considered. Allophones, that is, sounds which occur uniquely as variants of others and never on their own, are not counted. See section 3.2 in Chapter 2 for discussion of how allophones are identified in PyILM.

Change in feature economy for this simulation is shown in Figure 5.8. Economy rises gradually over the course of the simulation, reaching a maximum of around 3.6 on the Simple Ratio measurement. Growth is not consistent.
There are a few areas where economy hits a plateau, and there are also times when it goes down.

Figure 5.8: Change in feature economy for a simple simulation

Economy continues to rise, on all metrics, until just past the 20th generation. The initial increase is due to pairs of segments emerging from sound change. Generation 17 represents the peak of the Simple Ratio scores, at 3.6. From here, economy dips and rises, but never falls lower than 3.2.

The reason that economy never settles at a particular score is that some segments are in environments where they are subject to more than one misperception. For example, some voiced stops appear in final position following another voiced stop. This means that devoicing and voicing assimilation can both apply. A voiceless sound can therefore appear through one misperception, only to be wiped out by the other.

Relative Efficiency evolves differently from all of the others. In some cases, it stays at 1.0, a perfect score, while other scores drop (especially around the 20th generation). This is because, for a given number of features, there is a range of segment inventory sizes that will always get a Relative Efficiency score of 1.0. Specifically, for an inventory of size S requiring F features for contrast, Relative Efficiency is 1.0 if it is the case that 2^(F−1) + 1 ≤ S ≤ 2^F. For example, an inventory of 28 segments that requires 5 features for contrast will get a perfect Relative Efficiency score because 2^4 + 1 ≤ 28 ≤ 2^5. In fact, any inventory between 17 and 32 segments will receive the same score.

Practically speaking, this means that an inventory can shrink in size without affecting the Relative Efficiency score, so long as the number of features does not change. This is just what happens in the simulation shortly after generation 20. The inventory had evolved a voicing distinction at every place of articulation for both stops and fricatives. There were 18 total phonemes (16 obstruents and 2 nasals), and the inventory required 5 features for contrast. That is the minimum possible, so Relative Efficiency was 1.0. At a later generation, one of the voicing contrasts collapsed as a voiceless sound merged with a voiced sound. This reduced the inventory size to 17, but without any change to the contrastive features. This caused Simple Ratio to drop, because 17/5 < 18/5, but Relative Efficiency was unaffected because 2^4 + 1 ≤ 17 ≤ 2^5.

5.4.3 Segment-specific misperceptions vs. class-level misperceptions

Hypothesis #3 proposes that feature economy emerges over time because sound changes affect phonetic features, and hence classes of sounds, rather than targeting individual sounds. Importantly, the claim here is not that sound change affects classes of sounds simultaneously. At any given generation, any number of members of a class might be affected, or perhaps none are, but the net result, over many generations, is that a class of sounds will have undergone sound change. In other words, given enough time, sound change can create a new class of segments out of an old one, with the two classes differing by whatever feature was affected in the sound change. This results in inventories with classes of sounds contrasting along particular feature dimensions, i.e. feature economy.

To test this hypothesis, it is necessary to run simulations under two different conditions. In one condition, changes are defined over broad classes of sounds (the class-level condition), i.e. the familiar kinds of misperceptions and biases already used in this dissertation.
In the second condition, changes target individual sounds in an inventory (the segment-specific condition). This is in a sense like simulating two possible worlds, where sound change operates in different ways. The class-level condition represents the actual world, while the segment-specific condition represents a hypothetical alternative world that we can compare against. If Hypothesis #3 is correct, then inventories in the class-level condition should generally have higher economy scores than those in the segment-specific condition.

It is important to distinguish between segment-specific changes fabricated for these simulations, and multiple instances of class-level changes affecting individual segments. Even though it is fairly clear that real sound changes affect broad classes of sounds, it does not mean that every sound in a class is equally likely to be affected. Consider the case of a lenition-type change, which increases the continuancy value of a stop between vowels. Hualde et al. (2011) discuss how lenition of intervocalic stops in Romance languages varies with context, and between types of stops. From these findings, we might decide to implement several segment-specific lenition misperceptions, one for every plosive in an inventory. Each misperception would have a different salience and probability, and perhaps even somewhat different environments. That way, /b/ would be affected by lenition in its own way, slightly different from how /d/ or /g/ might be affected.

However, this would not be the right approach. All of these lenition changes have identical outcomes, even if their triggering conditions are different. It would not matter whether we define lenition for each segment, or whether we define lenition over a class of segments, because after a simulation has been running long enough, all stops between vowels will have become fricatives (assuming there is no counteracting bias). In other words, these are not really segment-specific changes. They are instances of the same class-level change, with minor variations.

Instead, to make something truly segment-specific, not only would each misperception have to be defined separately over each individual segment, but the outcome of each misperception needs to be different as well. For example, all stops in intervocalic position can still be subject to change, but each stop would have a different feature affected: /b/ might have its [nasal] value increased, and change into /m/, while /d/ might devoice to /t/. In this way sound change is actually targeting individual segments, and not (natural) classes. To be clear, such changes are extremely unnatural, and they are only being introduced as a way of testing Hypothesis #3.

In order to generate these kinds of misperceptions, I made a modification to PyILM. The simulations in the segment-specific condition are initialized with the same set of class-level misperceptions used in the other simulations. At the beginning of each generation, PyILM looks through the inventory of the speaking agent, and checks to see if any of the class-level misperceptions could potentially apply. A misperception is considered to potentially apply if, for any segment, the set of features targeted by the misperception is a subset of the segment's full feature specification. If a class-level misperception would apply, then PyILM generates a unique segment-specific one.
The new misperception will have the same environment as the class-level one, and it will have the same probability of occurring, but it targets a set of features equal to the full specification of the segment in question. The outcome of the segment-specific misperception has the same salience value as the class-level one, but it applies to a new, randomly selected feature. After checking the entire inventory, the simulation runs as normal, but using these segment-specific misperceptions instead of the class-level ones.

At the beginning of each generation, the old set of segment-specific misperceptions is deleted, and a new set is created. This is done to ensure that class-level and segment-specific misperceptions have the same chances of applying in a given lexicon. Due to the highly restricted nature of segment-specific changes, it is generally going to be the case that they only apply once. Suppose there is a /b/-specific misperception that changes /b/ to /m/. Once this occurs, that /b/-specific misperception will never do anything else in the simulation, unless by sheer chance another segment-specific change has created a /b/ from something else. In practice, this means that inventories will change only during the first few generations, and then never again. By creating new segment-specific misperceptions at each generation, based on how class-level misperceptions would apply, the outcomes of simulations in either condition should be comparable.

For example, suppose that a simulation has a starting inventory of /p, t, k, z, i, a/, and the following misperceptions.

Devoicing: [+voice, −son] → [−.5voice] / _#, p=.25

Lenition: [−son, −cont] → [+.5cont] / +voc_+voc, p=.25

The Devoicing misperception could apply to /z/ and the Lenition misperception could apply to /p, t, k/ (whether they actually apply depends, of course, on the lexicon having the right environments). In this case, PyILM would create one new misperception for each of /z/, /p/, /t/ and /k/. The segment-specific misperceptions could look something like this:

z-Change: [−voc, −son, +cont, +voice, −nasal, +cor, −ant, −strid, −lat, −back, −low, −high, −round, −distr, −glot_cl, −hi_subgl_pr] → [−.5back] / _#, p=.25

p-Change: [−voc, −son, −cont, −voice, −nasal, −cor, +ant, −strid, −back, −low, −high, −round, +distr, −glot_cl, −hi_subgl_pr] → [+.5nasal] / +voc_+voc, p=.25

t-Change: [−voc, −son, −cont, −voice, −nasal, +cor, −ant, −strid, −lat, −back, −low, −high, −round, −distr, −glot_cl, −hi_subgl_pr] → [+.5round] / +voc_+voc, p=.25

k-Change: [−voc, −son, −cont, −voice, −nasal, −cor, −ant, −strid, −lat, +back, −low, −high, −round, −distr, −glot_cl, −hi_subgl_pr] → [+.5cont] / +voc_+voc, p=.25

These misperceptions have the same environment and probability as the original Devoicing and Lenition, but they target a set of features that are specific to one particular segment. They all have the same salience as Devoicing and Lenition, altering a feature value by ±.5, but which feature they affect is different in each case. As mentioned above, the affected feature is randomly selected.
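Schematically, the generation step just described might look like the following sketch. It is an illustration, not the actual PyILM code: rules are represented as plain dicts for brevity, and the feature list passed in as all_features stands in for whatever feature set a given simulation uses.

```python
import random

def make_segment_specific(class_rules, inventory, all_features):
    """Build one-off rules for every (class-level rule, eligible segment) pair.

    class_rules:  list of dicts with keys 'target', 'feature', 'shift', 'env', 'prob'
    inventory:    list of segments, each a dict mapping feature name -> value
    all_features: feature names from which a random replacement feature is drawn
    """
    specific_rules = []
    for rule in class_rules:
        for segment in inventory:
            eligible = all(segment.get(f) == v for f, v in rule['target'].items())
            if not eligible:
                continue
            specific_rules.append({
                'target': dict(segment),                 # full specification of this segment
                'feature': random.choice(all_features),  # randomly selected affected feature
                'shift': rule['shift'],                  # same salience as the class-level rule
                'env': rule['env'],                      # same environment
                'prob': rule['prob'],                    # same probability
            })
    return specific_rules
```

A fresh set of these rules would be generated at the start of every generation, with the previous set discarded, as described above.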
5.4.4 Calculating feature economy

After running a simulation, the feature economy of the inventory at each generation was calculated using the Feature Economist algorithm. The results are reported in the next section, and this section describes the details of how the algorithm works. In brief, it takes a set of segments with full feature specifications as input, and it returns the smallest number of features necessary to contrast every member of that set. The algorithm has been described in Mackie and Mielke (2011), and a version of it appears in P-base (Mielke 2008). For this dissertation I wrote my own implementation.

Calculating feature economy requires two numbers: S, the number of segments in the inventory I, and F, the smallest number of features from a feature set φ required to contrast all of the segments in I. For the purposes of calculating feature economy, two segments are contrastive with respect to a feature set φ if they differ by at least one feature in φ. The most useful mathematical tool for this is the concept of a combination. A k-combination of a set A is an unordered subset consisting of k elements of A. The problem of finding the smallest number of features from φ that are necessary to contrast the segments in I becomes the problem of finding the largest k-combination of features that are unnecessary for contrast.

The number of k-combinations that can be drawn from a set of size N is written as C(N, k), which is read as "N choose k". This is equal to N!/(k!(N−k)!) if N ≥ k; otherwise it is equal to 0.

The Feature Economist algorithm begins with a pre-processing step where non-contrastive features are removed from φ. These are features for which every segment in the inventory shares a value. For instance, if there are no laterals in the inventory, then every segment will be [−lateral]. This means there are no contrasts based on [lateral], so this feature can be discarded immediately.

The algorithm then goes through a loop of creating larger and larger k-combinations of features. For each k-combination, the algorithm removes that combination from φ, creating a new subset φ′. A pairwise comparison of the segments in I is done to check if each pair is contrastive with respect to φ′. If not, that is, if two segments become identical without this particular k-combination of features, then the combination is added to a special list of crucial features. If contrast is still possible without this k-combination, then the set φ′ is designated the final set (replacing any previous final set), and the algorithm carries on to the next k-combination. When all k-combinations have been tried for some value of k, then k is increased by 1, and the process of removing k-combinations repeats.

If at any time the size of the final set and the value of k differ by 2, e.g. k = |final| + 2, then the algorithm terminates. It terminates at this point because this difference means that the algorithm has attempted to remove every single k-combination for some value of k and did not succeed. Whenever some k-combination can be removed and contrast is maintained, the contents of the final set are updated, and its cardinality becomes equal to |φ| − k (the full set of features, minus the combination currently removed). After trying all k-combinations, the value of k is increased by one. If no k-combinations can be removed for this new value of k, then the contents of the final list will not change, and k will again increase. At this point k = |final| + 2 and the algorithm should halt.

For each value of k, all possible k-combinations are generated. If a combination is found to be a superset of any element of the crucial list, then it is skipped and the inventory is not checked for contrast. For example, if it was previously found that removing [nasal, son] left two segments identical, then there is no point in trying to remove [nasal, son, voice].
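As a compact illustration of the search just described, the sketch below removes ever-larger k-combinations, keeps a crucial list for pruning, and stops once no combination of the current size can be removed (at which point no larger combination can be removed either). It is a simplified stand-in, not the implementation used for the dissertation; in particular, it returns only the size of the smallest contrastive feature set it finds.

```python
from itertools import combinations

def is_contrastive(inventory, features):
    """True if every pair of segments differs on at least one of the given features."""
    projections = [tuple(seg[f] for f in features) for seg in inventory]
    return len(set(projections)) == len(projections)

def min_contrastive_features(inventory, feature_set):
    """Size of the smallest subset of feature_set keeping all segments distinct."""
    # Pre-processing: discard features on which every segment agrees.
    features = [f for f in feature_set if len({seg[f] for seg in inventory}) > 1]
    crucial = []              # combinations whose removal collapsed some contrast
    best = list(features)     # smallest contrastive feature set found so far
    for k in range(1, len(features) + 1):
        removed_any = False
        for combo in combinations(features, k):
            if any(set(c).issubset(combo) for c in crucial):
                continue      # contains a combination that already failed; skip the check
            remaining = [f for f in features if f not in combo]
            if is_contrastive(inventory, remaining):
                best = remaining
                removed_any = True
            else:
                crucial.append(combo)
        if not removed_any:
            break             # nothing of size k is removable, so nothing larger is either
    return len(best)
```

Pruning against the crucial list is what keeps the search manageable, since the number of k-combinations grows combinatorially, as the worked example below shows.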
At first, checking every combination slows down the algorithm, and often it needs to do a pairwise comparison of the inventory for every k-combination up to k=4. Around this point, the crucial list begins to fill in, and after this the algorithm runs much faster, as it can immediately reject k-combinations without the need to run the pairwise comparison across the inventory.

To help understand some of the numbers involved, here is an example of the algorithm at work. The consonant inventory [Pj, |h, bl, n, >kp, d”:, j˜, pSw, kw, H, f, n d r] was randomly generated and given as input. The initial set of features consisted of 19 features. The pre-processing step removed 2 non-contrastive features. Early in the calculation, most of the feature possibilities need to be considered. At k=5, for example, the number of 5-combinations is C(17, 5) = 6,188, and the algorithm tried a pairwise comparison of the inventory for 5,374 of those combinations (87%). At the point where k=10, the number of 10-combinations is C(17, 10) = 19,448, but the algorithm only checked the inventory for contrast using 3,165 of them (16%), because the rest were supersets of combinations that failed earlier. This saves more than 10,000 comparisons at this step. Eventually, the algorithm removed 12 more features, thus a minimum of 5 features are needed to contrast this inventory.

5.4.5 Simulation results

For each of the class-level and segment-specific conditions, I ran 90 simulations, for a total of 180 simulations. Each condition was broken into three phonotactic groups: 30 simulations with CV phonotactics, 30 with CVC phonotactics, and 30 with CCVCC phonotactics. This ensures that a variety of different inventories will emerge, both in terms of inventory size and contents.

Starting inventory sizes varied as well, and each phonotactic group was broken into three size categories: 10 simulations started with small consonant inventories, ranging from 8-15 consonants, 10 started with medium-sized inventories of between 20-40 consonants, and 10 started with large inventories of 60-80 consonants. Every inventory had between 3 and 5 vowels, although no misperceptions affect vowels, so this set remained constant for the entire simulation. Each simulation ran for 30 generations.

The starting inventories were generated by sampling uniformly at random from P-base. Only 90 inventories were created, and these were shared across the two conditions. Additionally, the 90 starting lexicons from the class-level condition were re-used in the segment-specific condition. In other words, for each class-level simulation, there was a segment-specific simulation that had identical starting conditions.

Figure 5.9 shows change in average feature economy scores for the 90 simulations with class-level changes, for all four metrics. Results for simulations with segment-specific misperceptions are given in Figure 5.10.

Figure 5.9: Change in average feature economy for simulations run with class-level changes

Figure 5.10: Change in average feature economy for simulations run with segment-specific changes

Overall, there is an increase in economy on the Simple Ratio metric in both types of simulations. The Exploitation metric, on the other hand, goes down in both. Changes in Simple Ratio and Exploitation can mostly be explained by changes in inventory size. Inventories generally grow in size over the course of a simulation. Larger inventories tend to score higher on Simple Ratio, and they tend to score lower on Exploitation. This relationship between size and economy score was discussed in more detail in Section 4.3.2.1.

To see if the differences between the two simulation types were significant, I fit the economy scores from generation 30 to a linear regression model. The independent variables were inventory size and misperception type (class-level or segment-specific). The dependent variable was economy score.
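The model itself was fit in R, as described in the next paragraph. Purely for illustration, an equivalent analysis could be set up in Python along the following lines, assuming a data frame with one row per simulation at generation 30 and invented column names economy, size, and condition.

```python
import statsmodels.api as sm
import statsmodels.formula.api as smf

def economy_anova(df):
    """Two-way ANOVA: economy score ~ inventory size * misperception type.

    df is assumed to be a pandas DataFrame with columns:
      economy   - score on one metric (e.g. Simple Ratio) at generation 30
      size      - final inventory size
      condition - 'class-level' or 'segment-specific'
    """
    model = smf.ols("economy ~ size * condition", data=df).fit()
    # Sequential (Type I) sums of squares, matching R's anova() on an lm object.
    return sm.stats.anova_lm(model)
```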
It is important to note that my choice to use the economy scores from generation 30 is somewhat arbitrary. It is not clear that the economy scores have completely stabilized at this point, nor is it clear how many generations would be sufficient. Ideally, one would calculate a stationary distribution of inventories, but this is challenging given the extremely large space of possible inventories that could evolve in a simulation.

The model was calculated using the anova function from the R programming language (R Core Team 2016). Results are shown in Table 5.5.

Table 5.5: Results of two-way ANOVA with inventory size and misperception type as predictors and economy score as dependent variable

Misperception type has a significant effect on economy scores when using Simple Ratio (F(1, 163) = 13.672, p < 0.003), Frugality (F(1, 163) = 10.648, p < 0.002), and Relative Efficiency (F(1, 163) = 5.976, p < 0.02). Exploitation is the only metric not to show any significant effect of misperception type (F(1, 163) = 1.157, p = 0.211).

Inventory size is a significant factor for both Simple Ratio (F(1, 163) = 389.04, p < 2.2×10⁻¹⁶) and Exploitation (F(1, 163) = 124.917, p < 2×10⁻¹⁶), but it is not significant for Frugality (F(1, 163) = 82.953, p < 2.92×10⁻¹⁶) nor for Relative Efficiency (F(1, 163) = 0.652, p < 0.43). There is no significant interaction between misperception and inventory size for any of the metrics, except Simple Ratio (F(1, 163) = 13.672, p < 0.05).

These results show that when languages are transmitted, via iterated learning, under conditions where sound change influences classes of sounds, the resulting inventories are more economical than they would be if sound change targeted individual segments. This is consistent with Hypothesis #3, which said that feature economy is an emergent consequence of the way sound change operates, as opposed to it being an inherent property of phonological systems.

In simulations with class-level misperceptions, economy scores generally rise because sound changes are affecting classes of sounds, as predicted by Hypothesis #3. For example, suppose there is a simulation with an inventory containing a set of three voiceless stops /p, t, k/ in the initial generation. Suppose further that a class-level intervocalic voicing misperception is active. If all three voiceless stops occur between vowels somewhere in the lexicon, then eventually a set of three voiced stops will appear and the inventory will be /p, b, t, d, k, g/. Since these new stops are minimally different from the old ones, differing only by [voice], economy will increase because 3 new sounds are added at the cost of only a single feature. If the feature [voice] is already in use for sounds outside of this stop series, then the increase in economy is even greater, because three new sounds are added for free, without the cost of an additional feature.

In a simulation with segment-specific changes, the evolution of the inventory will be different. Assuming the same voiceless stop inventory of /p, t, k/, three new misperceptions will be generated, each of which affects a different feature. It might be that /p/ becomes /f/ between vowels, while /t/ becomes /tʰ/ and /k/ becomes /k'/ in the same environment.
If all three segment-specific changes occur, then the inventory will not achieve greater economy. Three new sounds will enter the inventory at the cost of three new features.

Toward the end of the simulations, Frugality and Relative Efficiency rise slightly in the segment-specific condition. This rise probably has to do with growth in inventory size. When the inventory grows large enough, the feature space is saturated, and even changes targeting randomly selected features will end up creating sounds that share a feature with an existing sound simply by chance.

Additionally, it is possible for multiple segment-specific changes to have the cumulative effect of class-level changes. For example, a class-level misperception that would have affected /p, t, k/ might turn into a segment-specific misperception that changes the [cont] value of /p/ in the first generation, but not the [cont] values of /t/ and /k/. In the second generation, there might be a segment-specific misperception generated which changes the [cont] value of /t/ (but not /p/ or /k/), and in the third generation another might be generated changing the [cont] value of /k/ (but not /p/ or /t/). A class-level misperception has effectively occurred over the course of three generations. This is probably a rare event, especially in three consecutive generations, but it is plausible that this happens by chance to at least some sound class over a long number of generations.

Another factor is that the randomly generated segment-specific misperceptions can target any arbitrary feature, creating combinations of features that would never occur in the real world, and giving these changes perhaps an unfair advantage over the class-level ones, which were more carefully crafted to avoid these unnatural outcomes. For example, it is possible for a segment-specific change to raise the [high] value of a [−high, +low] sound. This could result in a [+high, +low] sound, which is a highly unrealistic sound that could nonetheless increase the economy of an inventory.

Overall, however, these results seem to generally support Hypothesis #3, because (a) economy has been shown to increase due to the way sound change operates, and (b) class-level changes lead to inventories with higher economy scores, compared to segment-specific changes.

Chapter 6

Conclusion

In this dissertation, I introduced three hypotheses about how sound change shapes consonant inventories, and tested these hypotheses through computer simulation.

The first hypothesis was that inventory size is related to phonotactic complexity. Languages with more complex phonotactics will tend to develop larger inventories, while languages with more restrictive phonotactics tend to develop smaller inventories. This is because sound change is (mostly) context-sensitive, and phonotactics define the set of possible contexts in a language. Having more possible contexts in a language means that there is a greater diversity of sound changes that could occur. As sounds introduced through change become phonologized, the inventory grows.

This hypothesis was tested by running a large set of simulations grouped into different phonotactic categories. All simulations were initialized with randomly-generated inventories of the same size. The same set of potential misperceptions was used for each simulation. The outcome was that inventories of languages restricted to maximally CV syllables grew the least. Languages with maximally CVC syllables grew into larger inventories, and the largest inventories were found among languages with CCVCC syllables.
These results support Hypothesis #1.

The second hypothesis concerned the frequency of consonants across languages. Sounds are not evenly distributed, and some are far more common than others. Small inventories tend to be made up of just the most common sounds, while large inventories contain rare or unique sounds (Lindblom and Maddieson 1988, Maddieson 2011, see Section 4.2).

The existence of cross-linguistically common sounds was hypothesized to be due to the existence of context-free sound changes, which, by definition, apply in inventories of any size. Smaller inventories tend to be made up primarily of the most common sounds, because they have limited phonotactic contexts (by Hypothesis #1). Large inventories have more, and more diverse, phonetic contexts, leading to the evolution of a more diverse array of sounds.

This hypothesis was tested in a way similar to the first one, by running a large number of simulations. Simulations were initialized with randomly generated sets of segments, and all simulations used the same set of biases and misperceptions. The outcome was that smaller inventories tended to contain mostly those sounds favoured by bias, and inventories diversify as they grow.

The third hypothesis was about feature economy, which is the tendency for inventories to maximize the ratio between the number of segments and the (minimal) number of features required to contrast them (Clements 2003, Mackie and Mielke 2011). Hypothesis #3 states that feature economy emerges in inventories because sound change is defined over classes of sounds, rather than individual sounds. Over time, this produces inventories with sets of sounds differing by only one feature, which is essentially what feature economy measures.

Testing this hypothesis was done by running two kinds of simulations. In one, the probabilistic biases that underlie sound change were defined to take scope over broad classes of sounds; in another, they were defined such that they could only affect specific segments. Feature economy was calculated at each generation of these simulations. A linear regression model, with inventory size and misperception type as predictors and economy scores as dependent variable, showed that misperception type had a significant effect on economy scores for all metrics except Exploitation.

These results lend support to the theory that typology is shaped by diachronic forces (e.g. Blevins 2004). However, in contrast to most of the existing research, which tends to focus on specific sound changes, this dissertation has taken a higher-level approach by simulating multiple interacting changes over many generations of language transmission.

The results also demonstrate how the concept of selection for learnability (Brighton et al. 2005) can be applied to the study of phonological inventories. The sounds that an inventory has are those which are most likely to be successfully retransmitted over time. This gives us a way of understanding certain properties of inventories in a non-teleological framework.

The simulation software designed for this dissertation, PyILM, was built to be open-ended, and could be modified to potentially study other phenomena. There are several changes that could be made to the code to increase its utility. For instance, PyILM was constructed with the intention of studying the evolution of consonant inventories, but it could be extended to vowel inventories.
Vowel systems were not included in the current study because the way that they change over time seems to be quite different from consonants. In particular, vowel systems show an effect of dispersion (e.g. de Boer 2002), where vowels tend to spread out over the available phonetic space. This is the opposite of the feature economy effect seen in consonant inventories, where a small number of features are re-used. Vowels are also known to undergo chain shifts, which occur much less often in consonant inventories. It would be ideal to update PyILM so that both vowel and consonant evolution can be simulated.

There are also several improvements that could be made to the way that misperceptions are modeled. Currently, only one feature at a time can be affected by misperception, but it would be useful to increase that number. Additionally, it would be convenient to have features linked in some way, such that misperceptions targeting one feature would naturally include another feature (e.g. a misperception that makes a consonant more or less nasal should also make it more or less sonorant).

It would also be useful to diversify the changes that can be modeled as misperceptions. For instance, changes that result in metathesis may be the result of misperception (e.g. Blevins and Garrett (2004)) and could be modeled. Changes could also be non-local, and target segments further away, in order to simulate the evolution of harmony patterns.

Modeling epenthesis and deletion would be very useful, and would have the biggest impact on the results reported in this dissertation because of the potential to disrupt the phonotactics. In particular, the results reported for the tests of Hypothesis #1 and Hypothesis #2 both rely heavily on phonotactics as a main factor, and things may come out differently if phonotactic patterns are not frozen. For example, suppose there is a language with strictly CV syllables, and suppose there is the possibility for vowel reduction or deletion. This means that a CVCV sequence could become a CCV sequence. This puts two consonants adjacent to each other, something that is normally impossible given CV-only phonotactics, and it creates the potential for misperceptions which would otherwise only apply in languages with more complex syllable structures.

Many of the parameters in the simulations are fixed ahead of time, and it would be an improvement if at least some of these values could be more flexibly adjusted over the course of a simulation. In particular, it would be good to have misperception probabilities and salience values be affected by other factors. For instance, functional load plays a role in change, such that sounds which carry a higher functional load are less likely to undergo change (Bouchard-Côté et al. 2013), and avoidance of homophones may be a factor in inhibiting change (Blevins and Wedel 2009). The learning algorithm is another place for improvement. Agents have a few parameters that are set by hand, such as the threshold for deciding if two sounds are distinct or not. Ideally, this is information that agents could learn from data.

Finally, an improvement of a different kind would be to have more of a social environment in the simulations. Currently, PyILM has only one speaker agent and one listening agent per generation. Having a larger population would make it possible for other kinds of sound changes to be modeled.
For instance, in a larger population, pronunciations can beconsidered to be more or less prestigious, and agents can adopt or reject certain pronunci-ations based on social relationships.Despite the limitations of PyILM, it still produced interesting and useful results. This,to some extent, makes the results even more interesting. While it might be expected thatonly a simulation including linguistically signicant eects such as notions of contrast, trueallophony, or social factors, might be required, PyILM shows that even with very simpleassumptions, it is possible to simulate the emerge of phonological patterns in inventories.158BibliographyAbdel-Massih, E. T.: 1971, Tamazight verb structure: A generative approach, Indiana Uni-versity, USA.Atkinson, Q. D.: 2011, Phonemic diversity supports a serial founder eect model of languageexpansion from Africa, Science 332, 346349.Bandhu, C., Dahal, B., Holzhausen, A. and Hale, A.: 1971, Nepali segmental phonology,Tribhuvan University, Kirtipur.Bauer, L.: 2007, The linguistics student's handbook, Edinburgh University Press, Edin-burgh.Bermúdez-Otero, R.: 2007, Diachronic phonology, in P. de Lacy (ed.), The Cambridgehandbook of phonology, Cambridge University Press, Cambridge, pp. 497517.Berwick, R. C., Pietroski, P., Yankama, B. and Chomsky, N.: 2011, Poverty of the stimulusrevisited, Cognitive Science 35(7), 12071242.Blevins, J.: 2004, Evolutionary Phonology. The Emergence of Sound Patterns, CambridgeUniversity Press.Blevins, J.: 2006a, New perspectives on English sound patterns: "natural" and "unnatural"in Evolutionary Phonology, Journal of English Linguistics 34(1), 625.Blevins, J.: 2006b, A theoretical synopsis of Evolutionary Phonology, Theoretical Linguis-tics 32(2), 117166.Blevins, J.: 2007, Interpreting Misperception: Beauty is in the Ear of the Beholder, OxfordLinguistics, Oxford, pp. 144154.Blevins, J.: 2009, Another universal bites the dust: Northwest Mekeo lacks coronalphonemes, Oceanic Linguistics 48(1), 264273.Blevins, J. and Garrett, A.: 2004, The evolution of metathesis, in B. Hayes, R. M. Kirch-ner and D. Steriade (eds), Phonetically based phonology, Cambridge University Press,Cambridge, pp. 117156.159Blevins, J. and Wedel, A.: 2009, Inhibited sound change: An evolutionary approach tolexical competition, Diachronica 26(2), 143183.Bostoen, K. and Sands, B.: 2012, Clicks in south-western Bantu languages: Contact-induced vs. language-internal lexical change, in M. Brenzinger and A.-M. Fehn (eds),Proceedings of the 6th world congress of African linguistics, Vol. 5 of World Congress ofAfrican Linguistics, Köppe Verlag, pp. 129140.Bouchard-Côté, A., Hall, D., Griths, T. L. and Klein, D.: 2013, Automated reconstruc-tion of ancient languages using probabilistic models of sound change, Proceedings of theNational Academy of Sciences 110(11), 42244229.Breen, G. and Pensalni, R.: 1999, Arrernte: A language with no syllable onsets, LinguisticInquiry 30(1), 125.Brighton, H., Kirby, S. and Smith, K.: 2005, Cultural selection for learnability: Threeprinciples underlying the view that language adapts to be learnable, in M. Tallerman(ed.), Language origins: Perspectives on evolution, Oxford University Press, Oxford,pp. 291309.Butcher, A.: 1999, What speakers of Australian aboriginal languages do with their velumsand why: The phonetics of the nasal/oral contrast, in J. Ohala, Y. Hasegawa, M. Ohala,D. Granville and A. Bailey (eds), Proceedings of the International Congress of PhoneticSciences, University of California, San Francisco, pp. 