UBC Theses and Dissertations


Phonological processes interacting with the lexicon: variable and non-regular effects in Japanese phonology. Rosen, Eric Robert. 2001.


PHONOLOGICAL PROCESSES INTERACTING WITH THE LEXICON: VARIABLE AND NON-REGULAR EFFECTS IN JAPANESE PHONOLOGY

by

ERIC ROBERT ROSEN

B.G.S., Simon Fraser University, 1975
M.A., The University of British Columbia, 1996

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in THE FACULTY OF GRADUATE STUDIES, Department of Linguistics

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
April 2001
© Eric Robert Rosen, 2001

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

The University of British Columbia, Vancouver, Canada

ABSTRACT

In current generative linguistic theory, a speaker of a natural human language possesses a language faculty that includes a lexicon: a set of language-specific input forms, and a grammar: a set of constraints or rules that derive the surface or output forms of the language from a structured combination of input forms. In Optimality Theory (Prince & Smolensky (1993)), output forms are determined by an ordered set of constraints. Constraints operate only on output forms and not input forms: thus we should expect input forms not to show the effects of a grammar and to occur in statistically random patterns. Conversely, we expect output forms to occur in patterns that are favoured by the grammar.

This thesis examines apparent exceptions to both these predicted tendencies, with respect to two phonological phenomena in Japanese: (a) rendaku voicing, which causes voicing at a morpheme juncture in a compound word and (b) pitch-accent patterns of monomorphemic Yamato (native) nouns, which fail to show a predicted randomness of statistical patterns.

I argue that a compound word is prone to exceptions to processes like rendaku voicing because its input form must occur partly as a minimal lexical item of its own rather than simply a concatenation of two constituent input forms. This opens up the possibility for a compound word's input form to include independent phonological features that can lexically block a process like rendaku. Here, the lexicon affects the regularity of output forms.

For noun pitch-accent, I argue that apparent non-randomness of input forms is actually due to the lack of constraints on input forms in Optimality Theory. If any type of input is possible, then the surface forms of monomorphemic nouns do not necessarily reflect their input forms. Apparent non-randomness of inputs can arguably be due to a convergence of several input types on one output type, as required by the grammar, and can thus be explained as non-randomness of outputs, not inputs.

In current linguistic theory, both of these claims have important implications for the nature of the lexicon, the grammar, and the way they interact.

TABLE OF CONTENTS

ABSTRACT
Acknowledgements
TABLE OF CONTENTS
List of tables (excluding constraint tableaux)

1. Introduction: Irregular effects in Japanese phonology
  1.1 Other examples of de-regularized phonology and grammaticalization of the lexicon
  1.2 Theoretical framework
    1.2.1 Optimality Theory
    1.2.2 Local Conjunction
    1.2.3 Constraints on the lexicon?
    1.2.4 Output-Output Correspondence (Benua (1997))
2. Lexicalization of compound words and its relation to rendaku voicing
  2.1 Why the lexicon has an effect on rendaku voicing
  2.2 The dual nature of compound words
    2.2.1 Ways in which compound words act like morphologically complex words
      2.2.1.1 Resistance to Lyman's Law
      2.2.1.2 Rendaku voicing as a productive phonological process
    2.2.2 How compounds behave like simplex words
      2.2.2.1 Compounding in Japanese is not completely productive
      2.2.2.2 Idiosyncratic meanings
      2.2.2.3 Unpredictable phonological properties
  2.3 The structure of compound words
    2.3.1 Lexical listing of compounds as references to other listings
  2.4 Why can lexical specification occur only for compounds?
3. Rendaku Voicing and blocking in noun-noun compounds: blocking by lexical prespecification
  3.1 Introduction to rendaku voicing
  3.2 Patterns of rendaku in noun-noun compounds
    3.2.1 Rendaku voicing in compounds that meet a prosodic size criterion
    3.2.2 Rendaku voicing in compounds that are "too short"
  3.3 An account of rendaku voicing
    3.3.1 Previous analyses of rendaku
    3.3.2 An alternative approach to rendaku
    3.3.3 Rendaku blocking
    3.3.4 Blocking of blocking
  3.4 The prosodic size factor
  3.5 Underspecification of the [-voice] feature on the initial obstruent
4. Why there are lexicalized exceptions to rendaku voicing: an account based on the evolution of rendaku from an utterance-to-utterance variable phenomenon to a word-to-word variable phenomenon
  4.1 Literature review of analyses of derived variable effects
    4.1.1 Unranked constraints: Ringen & Heinamaki (1999) and Anttila (1997)
    4.1.2 Boersma & Hayes (1999)
  4.2 The problem of explaining word-to-word variation
    4.2.1 Applying unranked constraints to rendaku voicing
  4.3 Rendaku irregularity as the legacy of utterance-to-utterance variation
  4.4 The effects of the above factors
  4.5 Summary of chapter 4
5. Blocking of rendaku in noun-verb compounds
  5.1 Examples of rendaku blocking in noun-verb compounds
    5.1.1 Adjunct compounds
    5.1.2 Argument compounds
6. An experiment to test the frequency of rendaku voicing in fictitious noun-verb compounds
  6.1 Subjects, experimental design, and data collection
  6.2 Structure of the data
  6.3 Results
  6.4 Discussion of results
7. Analysis of phonological and morphological factors that induce blocking of rendaku in noun-verb compounds
  7.1 Factoring out lexical prespecification
  7.2 Blocking vs. triggering
  7.3 Blocking through local conjunction of a blocking constraint with Dep-[+voice]
    7.3.1 Blocking in non-bimoraic verb stems
    7.3.2 Blocking in compounds with an argument-head relation
  7.4 The variable effect
  7.5 Ganging up of blocking factors
  7.6 Utterance-to-utterance variation vs. word-to-word variation
  7.7 The lexical effect
  7.8 Conclusions
8. Statistically underrepresented pitch accent patterns in monomorphemic Yamato nouns
  8.1 Variable effects in Japanese pitch-accent patterns in trimoraic nouns
    8.1.1 Occurrence of accent patterns of trimoraic words in various Japanese dialects
      8.1.1.1 Types of modern Japanese dialects
      8.1.1.2 Tokyo dialect
      8.1.1.3 Kyoto-type dialects
      8.1.1.4 Historical relationship among modern dialects and historical connections to Proto-Japanese
      8.1.1.5 Accent types for trimoraic words in modern dialects
        8.1.1.5.1 Accent types for trimoraic words in modern Tokyo dialect
        8.1.1.5.2 Accent types for trimoraic words in modern Kyoto dialect
    8.1.2 Data on frequencies of accent types in modern Japanese dialects
      8.1.2.1 Kyoto-type dialects
      8.1.2.2 Tokyo-type dialects
        8.1.2.2.1 unaccented dominant
        8.1.2.2.2 unaccented most common but not as dominant
        8.1.2.2.3 No strongly dominant accent pattern
        8.1.2.2.4 medial accent dominant
        8.1.2.2.5 Summary of data on Tokyo-type dialects
  8.2 Statistical skewing cannot simply be a reflex of over- or underrepresented accent patterns in an earlier stage of Japanese
    8.2.1 Correlation between accent patterns in modern Tokyo dialect and accent patterns in 11th century Japanese
      8.2.1.1 Accent patterns in 11th century Kyoto Japanese
    8.2.2 Accent patterns in modern Akita dialect
  8.3 The Derivation of a biased distribution of accent patterns
    8.3.1 Distribution of accent patterns in 11th century Kyoto Japanese
      8.3.1.1 Derivation of 11th century Kyoto accent patterns for trimoraic nouns from an unbiased set of inputs
      8.3.1.2 Proposed grammar for 11th century Kyoto Japanese
      8.3.1.3 Derivation of outputs for 11th century Kyoto Japanese
    8.3.2 Deriving the accent pattern frequencies for modern Tokyo Japanese
      8.3.2.1 Review of the accent pattern of modern Tokyo Japanese
      8.3.2.2 Proposed grammar for modern Tokyo Japanese
      8.3.2.3 Deriving accent patterns for bimoraic nouns in Tokyo dialect
    8.3.3 Conclusions
  8.4 The question of lexicon optimization and rendaku voicing revisited
9. Conclusions

BIBLIOGRAPHY

Appendix A. Compounds with N1 or N2 greater than 2μ
Appendix B: N1: 2μ, N2: 1μ
Appendix C: Noun-noun compounds 2μ-2μ
Appendix D: Compounds with second members that never experience voicing
Appendix E. Compounds whose second member usually or always voices
Appendix F. Compounds with second members that resist voicing
Appendix G. Yamato nouns that contain a single obstruent

List of tables (excluding constraint tableaux)

Allophones of Japanese obstruents
Summary of possibilities for floating voicing features on compounds
Summary of results of experiment
Overrepresented and Underrepresented Surface Types
Occurrences of input types
Convergence of input types
Frequencies of accent patterns in modern Kyoto dialect
Frequencies of accent patterns in trimoraic words in Tokyo dialect
Frequencies of accent patterns in Hiroshima dialect
Frequencies of accent patterns in Izu dialect
Frequencies of accent patterns in Matsue dialect
Frequencies of accent patterns in Izumo Dialect
Frequencies of accent patterns in Hamada dialect
Frequencies of accent patterns in Akita dialect
Frequencies of accent patterns in Hatto dialect variant 1
Frequencies of accent patterns in Sapporo dialect
Frequencies of accent patterns in Narada dialect
Frequencies of accent patterns in Aomori dialect
Frequencies of accent patterns in 11th century (Kyoto) Japanese
Correlation of accent patterns in cognates between 11th century Kyoto
Rates of correlation between 11th century Kyoto accent patterns and
Frequencies of accent patterns in Akita dialect revisited
Frequencies of accent patterns in 11th century (Kyoto) Japanese revisited
Correlation between input forms and output forms for trimoraic nouns in
Convergence of input types on output accent patterns for trimoraic nouns
Frequencies of accent patterns for bimoraic nouns in Tokyo dialect
Frequencies of accent patterns for bimoraic nouns in Tokyo dialect:
Frequencies of accent patterns for bimoraic nouns in Tokyo dialect:
Convergence of input types on output accent patterns for bimoraic nouns
Bias in input and output forms for Rendaku voicing and pitch accent

Acknowledgements

As a teenager living in mostly monolingual Toronto in the early and mid-nineteen sixties, it was hard to come by influences that might eventually lead one (thirty years later, as it turned out) to begin studying linguistics.
I did, however, have the good fortune to attend, for three-and-a-half years, a high school where one was required to study not only English, but also Latin, French, and a third language, a somewhat rare situation in those days. I recall a few particularly talented teachers there: one was a Mr. R. G. Harrison who taught English in a way that was not only greatly inspiring but which fostered an interest in human language for its own sake; another was a foreign language teacher: a Mr. F. W. Ignatz, who seemed able to speak fluently every European language known to the average Canadian and who likely would have, at our request, taught us Russian if we had not at that time been at the height of the Cold War.

In more recent times, the people in the academic world whom I have the most to thank for this degree are Doug Pulleyblank and Michael Rochemont. One afternoon in the summer of 1992, I wandered into the Department of Linguistics at UBC, and happened upon Michael Rochemont, whom I had never met before, and whom I gingerly asked about the possibility of returning to university to study Linguistics. I still remember vividly that first conversation with Michael, where he took a great deal of time to talk to an unannounced visitor who was unlikely to last in a course more than a few months. He went on in considerable detail and with great enthusiasm about the latest exciting developments in current linguistic research. I ended up taking Michael's undergraduate syntax course that winter as a part-time student, and it was mainly because of his outstanding teaching that I decided to continue taking linguistics courses. Two years later, thanks to Michael's encouragement, I enrolled as a graduate student in linguistics at UBC. I was also hired that autumn as a research assistant on Michael's Focus Project, and as a result, ended up grappling with the phenomenon of rightward dislocation in Japanese, which became my MA thesis topic.
Throughout those years, Michael was for me not only a superb advisor and teacher, but he was also of tremendous help and encouragement during the frequent periods when the task of trying to complete a linguistics degree while in one's mid-forties and trying to support a family seemed like a rather insane undertaking.

When Michael had to go on medical leave at the end of my first year in the Ph.D. programme, it was Doug Pulleyblank who for me took over Michael's role of being a lot more than just an academic advisor. I first met Doug in the second semester of the M.A. programme, where I was the only student in his Phonology 510 class. This class consisted of meeting with Doug once a week in his office and being presented with some particularly insoluble-looking phonological problem or problems which we would grapple with for three hours. It was a rather novel way of teaching phonology, I thought, presenting students with problems that no one had ever solved, and I realized after a few weeks that what Doug was giving me each week was his own research-in-progress, and that he was generously inviting me to share a part of it.

A few years later, after a year off studying and one year of the Ph.D. programme, I attended a seminar on laryngeals given jointly by Doug Pulleyblank and Pat Shaw. In that seminar, doing phonology again after a three-year hiatus, something brought me back to that first course in Doug's office, and I decided, at the end of the laryngeals seminar, to switch my plans for a thesis in syntax to one in phonology, with Doug as supervisor. I recall having to come up with a prospectus that summer of 1999 and having only the vaguest ideas of what I wanted to do. I recall talking with Doug about some vague thoughts about the nature of the sound system of Japanese and having the pleasant surprise of being taken seriously, even though it would have been very easy for anyone to say something like: "Your ideas are all too vague. Come back when you have something to say." In working with Doug I always found that to be a particularly rare trait of his: that he had the ability to pick out the most positive aspects of what someone was saying or thinking, to be able to find a core of meaning in it, and to lead them into making something constructive out of it. When, in September 2000, I had to return to working full-time with my thesis still unfinished, it was Doug's highly positive attitude that somehow got me through finishing the thesis. We would meet at Doug's office every second Tuesday evening throughout that winter, and I recall coming out of every one of those meetings feeling inspired and recharged.

There were also other people to whom, over the years at UBC, I owe considerable gratitude for finishing this degree. Mark Hewitt taught the first phonology courses I took, undergraduate and graduate, and did so with a flair, expertise and interest that had a lot to do with my eventual decision to work in phonology. Henry Davis taught me an introductory graduate course in syntax, and from that time on was always there as someone to offer insightful and generous advice and a great deal of encouragement. Henry was on my syntax generals committee and stepped in to serve on my thesis committee at the 11th hour. Rose-Marie Dechaine was also involved in my graduate work in syntax, supervising my syntax generals, and like Henry, always being around as someone who was more than willing to offer insightful advice and encouragement. The research I did under Rose-Marie and Henry on the syntax of Japanese compound words ended up complementing the research on the phonology of Japanese compounds that eventually led to a large part of the work in this dissertation. By the fluke of timing, I ended up not taking as many courses with Pat Shaw as I would have liked to, but the laryngeals seminar she taught with Doug in the fall of 1999 had a lot to do with my decision to eventually work in Phonology.
John Alderete, who was at UBC in 1998-2000, led a seminar on morphologically-controlled accent in winter 1999. His ideas and leadership were of tremendous help in getting me back into working in phonology and were part of the motivation for me to delve into the matter of the Japanese pitch accent system. Bryan Gick's arrival at UBC at the time I began thesis work was extremely fortuitous. He served as a committee member during the time I worked on the thesis and it was mainly on Bryan's advice and encouragement that I embarked on the new challenge of attempting to do a psycholinguistic experiment on fictitious compound words. He provided a great deal of valuable assistance with the details of experimental design and the undertaking of the experiment. Janet Werker and her graduate students in psychology also provided some useful advice on the experiment during its planning stages. Laura Downing served on my thesis committee during the first year of work on the thesis but unfortunately left UBC during those formative stages of research. I am grateful for her support during those early stages of work.

It's not possible to name all the graduate students who provided support and ideas during my time at UBC. In particular I am grateful to the following Japanese-speaking graduate students who were of generous help in answering numerous questions about the language over the years: Tomio Hirose, Ikuyo Kaneko, Yumiko Nakamura, Michiko Suzuki, and Akihiko Uechi.

Most of my years at UBC were partly supported by working part-time as an elementary school teacher in School District #44, North Vancouver, thanks to their enlightened policy of allowing teachers to job-share. I am particularly grateful for the support of four extremely professional and collegial job-sharing partners over the eight years from 1992-2000: Kathleen Huxley, Shannon Sharp, Alice Jennings, and Tracey Todd.
I am also grateful for financial support from three University Graduate Fellowships from UBC in 1994-1996, 1998-1999, and 1999-2000. Some of the research on my syntax generals was also supported by a research assistantship through Rose-Marie Dechaine.

I was extremely fortunate to have the best possible examining committee for my defense: in addition to my thesis committee of Doug Pulleyblank, Bryan Gick and Henry Davis, Ross King from Asian Studies served as Chair, and Laurel Brinton from English and Joe Stemberger from Linguistics served as University Examiners. I thank them all for their perceptive and constructive comments during the defense. Armin Mester, who served as External Examiner, provided some extremely useful and encouraging remarks. The suggestions I received from all the people above will provide food for thought and for future research for many years to come.

Above all, I would like to thank my wife Keiko and children Sanae and Yochan for their support during the past nine years of studying linguistics. If it hadn't been for the encouragement, I never would have come close to completing this degree.

1. Introduction: Irregular effects in Japanese phonology

This study is concerned with surface patterns that occur in the phonology of Yamato Japanese. "Yamato" Japanese refers to one particular subset of the Japanese lexicon that is also referred to as the "native" vocabulary by Ito & Mester (1995). Another subset of the Japanese lexicon, the Sino-Japanese vocabulary, differs from the Yamato subset in a number of ways, although they have some similarities. Both Yamato words and Sino-Japanese words can be written both in the "hiragana" syllabary and in "kanji" or Chinese characters, whereas words from a third subset, foreign borrowings, are usually written in the "katakana" syllabary.
Yamato words often are paired with Sino-Japanese morphemes of equivalent meaning that have the same Chinese character but a different and unrelated pronunciation. For example, the Yamato word hito "person" has the Sino-Japanese equivalents zin and nin "person", which have the same Chinese character and which occur in morphologically complex words such as san-nin "three people" or ni-hon-zin "Japanese person." A Chinese character that has both a Yamato and a Sino-Japanese pronunciation has a different term to refer to each of the two pronunciations: the Yamato word is called the "kun" reading, and the Sino-Japanese word, the "on" reading.

In determining whether a word is Yamato, I have used the following criteria:

1. the word is listed as a "kun" reading for a character in Nelson (1962).
2. the word is listed as being of Japanese Native origin in Martin (1987), a comprehensive historical study of the Japanese language.

Ito & Mester (1995) show that the native or Yamato lexicon shows different phonological behaviour from other sub-lexica of Japanese with respect to constraints on post-nasal voicing, ungeminated /p/, and a ban on voiced geminate consonants. Other properties of Yamato words that distinguish them phonologically from Sino-Japanese words are as follows. Sino-Japanese morphemes have an overwhelming tendency to be monosyllabic, or disyllabic when the final vowel is an epenthesized /u/ or /i/ (e.g. kyaku "customer"). Only Yamato verbs and adjectives can take inflective affixes.[1]

[1] There are a few simplex Sino-Japanese words like sin "truth", which have fused with the light verb sur-u to derive inflectible verbs like sin-ziru/sin-zuru "to believe", but arguably, these are bimorphemic, with the light verb acting like a suffix to support inflection.

The fact that Yamato morphemes differ phonologically from non-Yamato morphemes in a number of ways supports Ito and Mester's proposal that they constitute an independent lexical sub-domain and that the distinction between Yamato and non-Yamato morphemes is psychologically real and not just a matter of historical origin. Given these facts, I have restricted this study of Japanese phonology to include only morphemes of Yamato origin, so that none of the phonological patterns I am examining are affected by whether a word is of Yamato, Sino-Japanese, or other origin.

Specifically, I shall examine irregular patterns that occur in Yamato Japanese phonology and how these irregular patterns are due to an interaction between the grammar and the lexicon. The word "irregular" here refers to two related types of phenomena. The first is a case of a phonological process that does not apply in 100% of the cases where we would expect it to occur. The second is the opposite type of situation: in a set of morphologically simplex words, what appear to be underived input forms behave like derived forms in that they show biased surface patterns. That is, certain surface patterns are preferred and others are dispreferred. If the surface patterns of these morphologically simplex words reflect their underlying forms, then it would appear as if the lexicon had been subjected to phonological constraints or rules. The first case can be called irregular because we expect phonological processes to derive surface forms in some uniform way, but in this case they do not do so completely. In the second case, we expect that phonological processes should have no effect on input forms, but they appear to do so, to the extent that surface forms of morphologically simplex words reflect their input forms.

Adopting the framework of Optimality Theory (Prince & Smolensky (1993)), I will focus on one example of each of these two types of phenomena in Japanese.
The first is the well-known phenomenon of Rendaku voicing (McCawley (1968), Ito & Mester (1986), Mester & Ito (1989), Ito & Mester (1998)), which is a process that occurs in derived forms: specifically, in compound words. Rendaku voicing, which applies mainly to compounds of Yamato (native Japanese) origin, occurs in only about 75% of noun-noun compounds. If rendaku is due to a phonological process, this process occurs with only 75% predictability. I shall examine the question of why rendaku is irregular and to what extent its irregularities can be systematized.

The second phenomenon involves frequencies of pitch-accent patterns that occur in underived Yamato nouns. We might expect that underived lexical items should show no effects of the grammar: that is, the lexicon should be random, with any given pattern of phonological features occurring as often as any other pattern. A bias towards one particular pattern at the expense of another should be due to the effects of the grammar, and we do not expect the lexicon to have already been affected by the grammar. Yet we shall find that across the many dialects of Japanese, the standard picture for a given dialect is that some pitch-accent patterns of simplex nouns are strongly overrepresented and other patterns are underrepresented.

Another way of seeing these two types of phonological irregularities is as follows. Let us start by looking more closely at rendaku voicing and comparing it to other phonological processes. Rendaku voicing in compound words is a process that lies at the borderline between idiosyncrasy and regularity. It is regular enough that there is reason to consider rendaku outputs as obeying some predictable phonological process. Yet rendaku is not as predictable or regular as various other phonological processes that occur in the language.
For example, when a Japanese verb stem occurs with the perfective suffix /-ta/, both the final consonant of the stem and the inflective affix change in complex but completely predictable ways, as shown below in (1). (See Martin (1975).)

(1)  Change in final consonant and suffix    Verb stem              Stem plus suffix

     /k/ -> /i/, /t/ unchanged               kak- "write"           kai-ta
                                             kik- "hear"            kii-ta
                                             sak- "bloom"           sai-ta
                                             wamek- "shout"         wamei-ta
                                             ok- "put"              oi-ta

     /g/ -> /i/, /t/ -> /d/                  kaseg- "earn"          kasei-da
                                             isog- "hurry"          isoi-da
                                             kag- "smell"           kai-da
                                             nug- "disrobe"         nui-da

     /s/ unchanged, /i/ epenthesized [2]     kas- "lend"            kasi-ta
                                             hos- "dry"             hosi-ta
                                             kes- "extinguish"      kesi-ta
                                             mus- "steam"           musi-ta

     /t/ unchanged [3]                       kat- "win"             kat-ta
                                             mot- "carry"           mot-ta
                                             ut- "hit"              ut-ta

     /n/ unchanged, /t/ -> /d/               sin- "die"             sin-da

     /b/ -> /n/, /t/ -> /d/                  yob- "call"            yon-da
                                             manab- "learn"         manan-da

     /m/ -> /n/, /t/ -> /d/                  yom- "read"            yon-da
                                             ham- "graze"           han-da
                                             kanasim- "be sad"      kanasin-da

     /w/ -> /t/                              kaw- "buy"             kat-ta
                                             nuw- "sew"             nut-ta
                                             ow- "chase"            ot-ta

     /r/ -> /t/                              kar- "clip"            kat-ta
                                             or- "break"            ot-ta
                                             kir- "cut"             kit-ta

[2] The Japanese phoneme /s/ has a palatalized allophone that occurs before the high front vowel /i/.

[3] As an allophonic effect, /t/ is affricated before /u/ and both palatalized and affricated before /i/.

As complex as they are, there are absolutely no exceptions to these phonological patterns of consonant mutation of verb stems and perfective affixes. Rendaku voicing, on the other hand, though quite a simple process formally, has plenty of exceptions. Many examples of the irregularity of rendaku will follow. For the time being, I present in (2) one example of the unpredictability of rendaku:

(2)  (a) kuti "mouth"        kuse "habit"    kuti-guse "way of speaking"
     (b) sake "rice wine"    kuse "habit"    sake-kuse "habit of drinking"

In spite of the morphological and semantic similarity between (2)(a) and (2)(b), rendaku voicing occurs in (a) but not in (b). Moreover, both kuti and sake are unaccented when they occur as simplex nouns, and both compounds in (2) are unaccented. Both sake and kuti are bimoraic and contain two voiceless obstruents. There is thus no obvious phonological, morphological, or semantic difference between the two compounds that could account for the lack of voicing in (2)(b). One further example, asi-kuse (foot-habit) "way of walking", is even more semantically similar to (2)(a) than is (2)(b), since both asi-kuse and kuti-guse have a first member that refers to a body part that is the focus of the particular habit that the compound refers to. Such a semantic near-minimal pair rules out the possibility that morpho-semantic factors could determine whether voicing occurs or not in noun-noun compounds such as these.[4]

[4] I will show later that when we consider noun-verb compounds, semantic factors do play a role in determining the likelihood that rendaku will occur. Specifically, noun-verb compounds with an argument relation are much less likely to voice than those with an adjunct relation. This fact is also observed by Ito & Mester (1998:46).

Given this kind of evidence for the irregularity of rendaku voicing, we need to ask the following question: why does rendaku show this irregularity when other phonological processes, such as consonant mutation in verb stems in the perfective, do not? We expect rendaku to show regularity but idiosyncrasy gets mixed in. In examining the Japanese lexicon, we discover the reverse problem. We expect the lexicon to be random and unpredictable, but we find that certain pitch accent patterns are either strongly preferred or strongly dispreferred. Viewed from the point of view of pitch accent, the Yamato lexicon looks as if it has already been subject to phonological processes, even though we expect that the lexicon, in a generative framework, is composed of inputs, not outputs, and thus should not be subject to the effects of phonological rules or constraints.
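
The contrast between the exceptionless alternations in (1) and the lexically variable voicing in (2) can be made concrete with a minimal sketch. The rule table below is transcribed from (1) and covers only the consonant-final stems listed there. The function names, the voiceless/voiced pairings used for rendaku (k/g, s/z, t/d, h/b), and the use of a listed exception set are illustrative assumptions, not the analysis developed in this thesis; other conditioning factors on rendaku are ignored.

```python
# Rule table transcribed from (1): stem-final consonant -> (what replaces it, suffix form).
PERFECTIVE_RULES = {
    "k": ("i", "ta"),
    "g": ("i", "da"),
    "s": ("si", "ta"),   # /s/ unchanged, /i/ epenthesized
    "t": ("t", "ta"),
    "n": ("n", "da"),
    "b": ("n", "da"),
    "m": ("n", "da"),
    "w": ("t", "ta"),
    "r": ("t", "ta"),
}


def perfective(stem: str) -> str:
    """Attach the perfective suffix to a consonant-final verb stem, following (1)."""
    final = stem[-1]
    if final not in PERFECTIVE_RULES:
        raise ValueError(f"stem-final segment {final!r} is not covered by (1)")
    replacement, suffix = PERFECTIVE_RULES[final]
    return f"{stem[:-1]}{replacement}-{suffix}"


# Assumed voiceless -> voiced pairings for rendaku (not listed in this excerpt).
RENDAKU_VOICING = {"k": "g", "s": "z", "t": "d", "h": "b"}

# Hypothetical exception list: compounds that simply have to be listed as non-voicing,
# here (2)(b) and asi-kuse from the text.
LEXICAL_EXCEPTIONS = {("sake", "kuse"), ("asi", "kuse")}


def compound(first: str, second: str) -> str:
    """Form a noun-noun compound, voicing the second member unless it is a listed exception."""
    initial = second[0]
    if (first, second) in LEXICAL_EXCEPTIONS or initial not in RENDAKU_VOICING:
        return f"{first}-{second}"
    return f"{first}-{RENDAKU_VOICING[initial]}{second[1:]}"


if __name__ == "__main__":
    print(perfective("kak"), perfective("yob"), perfective("kas"))
    print(compound("kuti", "kuse"), compound("sake", "kuse"))
```

The point of the sketch is the asymmetry in what has to be stored: `perfective` needs nothing beyond the shape of the stem, whereas `compound` cannot distinguish kuti-guse from sake-kuse without consulting information listed for individual compounds.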
Without abandoning the premise that the lexicon and the grammar are two separate and different entities, how do we account for these kinds of phenomena? M y answer to that question, which is what I shall explore on the following pages, is that although the lexicon and the grammar are separate, they can affect each other in ways that can make surface patterns of simplex lexical items take on regular or "grammatical" properties to some extent. Conversely, the lexicon can affect the grammar such that certain phonological processes can become de-regularized and idiosyncratic because of the effects of the lexicon. M y task will be to show, in the context of Japanese phonology, how these interactions occur in explainable ways. 1.1 Other examples of de-regularized phonology and grammaticalization of the lexicon Both the deregularization of phonological processes and the grammaticalization of the lexicon are well-known and have been discussed in the recent phonological literature with respect to a number of languages. For example, one type of phenomenon that de-regularizes certain phonological processes is what researchers such as Anttila (1997) refer to as "variation." For a given speaker, process P will occur for a given word in a certain percentage of utterances and it will not occur for that word the rest of the time. This means that input A will have (at least) two possible outputs: B , in which process P has applied, and B', where process P has not applied. Each output will occur some predictable percentage of the time. 5  This phenomenon of variable outputs that occur for a given input is examined by Anttila (1997) and Ringen & Heinamaki (1999), who explain variation in an OT framework through a system of unranked constraints. It is also analysed by Hayes & Boersma (1999) who propose a different type of model, also in the framework of Optimality Theory, but employing a modification of the constraint system. 
(See page 77ff for a discussion of these various models that account for variable outputs.) We might call the type of variation examined by Anttila (1997), Ringen & Heinämäki (1999), and Hayes & Boersma (1999) "utterance-to-utterance variation", since it is a type of variation in output forms that can occur for the same input but will vary from one utterance to another, even for the same speaker.

Footnote 5: See page 77 for a specific example of variation analysed by Anttila.

A different type of variation is one where, for a set S of input forms that are subject to process P, some apparently random subset of S will undergo process P in deriving the output form and the rest of the forms in S will not. The subset of S that undergoes process P is large enough that process P can be considered a real phonological process, even though it is not completely predictable. We might call this type of variable effect "word-to-word variation", to distinguish it from variation that derives more than one output form from a given input for a given word. "Word-to-word" variation means that for a given word in set S, the outcome of process P will always be the same, but there is variation between words in the set with respect to whether or not process P occurs. This kind of effect has been termed "lexical exceptions" and has recently been examined by Inkelas, Orgun and Zoll (1996). They argue that lexical exceptions are best explained through prespecification: that is, in cases where a phonological process does not apply, it is because the input form to which it would apply is prespecified with a phonological feature whose presence blocks application of that process.

As for the converse problem, apparent phonological processes acting on the lexicon, a well-known example is the set of OCP-related co-occurrence restrictions on consonant place features in Arabic roots, examined originally by Greenberg (1950) and later analysed by McCarthy (1991).
This phenomenon is similar to co-occurrence restrictions on certain pitch accent patterns with obstruent voicing in the Yamato lexicon in that here too, certain patterns are statistically avoided, but are not ruled out completely.

In part I, I examine the variable effects that occur in rendaku voicing. In part II, I examine patterns of pitch accent in the Japanese lexicon. I will argue that in both instances, the irregular effects that are observable are due to an interaction between the grammar and the lexicon.

1.2 Theoretical framework

1.2.1 Optimality Theory

This analysis of lexicon-grammar interactions in Japanese phonology will be cast in the framework of Optimality Theory (Prince & Smolensky (1993)), henceforth "OT". OT is a constraint-based system rather than a rule-based system. In parallel OT, surface forms are derived by a one-step process where the surface form chosen by the grammar is the one that best satisfies a hierarchy of ranked constraints. If constraint $C_1$ is ranked above constraint $C_2$, we say that $C_1$ dominates $C_2$, and we write $C_1 \gg C_2$. This hierarchical ordering of constraints is crucial in determining the correct output form from an underlying or input form in that it is more important for an output form to satisfy $C_1$ than to satisfy $C_2$ if $C_1 \gg C_2$. Constraints discriminate among possible output forms, referred to as candidates. A given candidate will either satisfy or violate a given constraint. Constraints are violable in that an optimal output candidate need not satisfy all the constraints of the grammar.

For a given input form, the generative component of the grammar, GEN, generates possible output forms. There are no constraints on these possible output forms other than that they must be phonologically well-formed. The optimal candidate is chosen as follows. The evaluative component of the grammar, EVAL, compares candidates with respect to their satisfaction of the constraint hierarchy CON.
If more than one candidate satisfies a given constraint, a lower-ranked constraint will determine the optimal candidate among the set of candidates that satisfy the higher-ranked constraint. Given a candidate set of possible output forms $\{O_1, O_2, \ldots, O_n\}$ generated by GEN, and a ranked hierarchy of constraints $C_1 \gg C_2 \gg \ldots \gg C_m$, the optimal candidate is determined as follows. For any pair of candidates $O_i$ and $O_j$, $O_i$ is more optimal than $O_j$ iff the highest-ranked constraint $C_s$ that is violated by $O_j$ and satisfied by $O_i$ outranks any constraint $C_r$ that is violated by $O_i$ and satisfied by $O_j$. The candidate of the set that is designated as grammatical is the one that is optimal when compared with all other candidates.

For clarity of exposition, constraint violations by candidates are shown in a constraint tableau, such as (3), which shows graphically how candidates are compared. In a tableau, the relevant constraints in a derivation are listed in hierarchical order, with the highest-ranked constraint on the left. A violation of a constraint by a candidate is indicated by an asterisk in the same row as the candidate and the same column as the constraint it violates. If a violation has the effect of eliminating a candidate from contention for optimality, an exclamation mark occurs after it as an expository device. Also, for ease of reference, the optimal candidate is indicated by a pointing hand symbol: ☞. Cells that are not crucial in determining the optimal candidate are shaded.

(3)
  input I                    C1    C2    C3    C4
    output candidate (a)     *!
  ☞ output candidate (b)           *
    output candidate (c)     *!
    output candidate (d)           *     *!
    output candidate (e)     *!

In (3), candidate (b) is optimal. We can chart constraint violations in a tableau by first examining the highest-ranked constraint and eliminating candidates that violate this constraint. For example, $C_1$ eliminates candidates (a), (c) and (e).
Then, the same is done for the next constraint, and so on, until only one candidate remains. For example, candidates that are not eliminated by $C_1$ will be decided by lower-ranked constraints. If a constraint is violated by all the candidates that survive the higher-ranked constraints, it cannot decide among them, and the next-ranked constraint is used to compare candidates. Both candidates (b) and (d) violate $C_2$, so they both remain in contention until one of them is eliminated by a lower-ranked constraint.

1.2.2 Local Conjunction

As an extension to the system sketched above, the principle of "Local Conjunction" of two constraints that form a natural class has been proposed recently in order to account for a number of cross-linguistic phenomena, including derived environment effects (Smolensky (1993), Lubowicz (1998), Ito & Mester (1998), Alderete (1999)). When two constraints are conjoined, they act together as a single constraint, which is violated iff both conjuncts are violated in the same domain D. The term "local" conjunction refers to the fact that there is some local domain D in which we calculate violation of the two sub-constraints. [6]

The idea of avoiding a double violation of two similar constraints was originally proposed in Smolensky (1993) and further developed by the above researchers. The basic idea of local conjunction is that two constraints can be conjoined in the grammar and their conjunction ranked above each of the two simplex constraints taken in isolation. The conjoined constraint is violated when both of the conjuncts are independently violated in the same domain. In the present study I will use local conjunction as a means of capturing a kind of "either/or" condition that can occur in the way the grammar determines output forms.
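Because a conjoined constraint is ranked and evaluated like any other constraint, it may help to make the evaluation procedure of §1.2.1 concrete. The following is a minimal sketch, not part of the analysis itself: the function name and the violation profile are assumptions, chosen only to be consistent with the prose description of tableau (3), in which $C_1$ eliminates (a), (c) and (e), and (b) and (d) both violate $C_2$.

```python
def eval_optimal(ranking, violations):
    """Filter candidates constraint by constraint, highest-ranked first.

    At each constraint only the candidates with the fewest violations
    survive. If every surviving candidate violates the constraint equally,
    none is eliminated and the next constraint decides.
    """
    remaining = list(violations)
    for constraint in ranking:
        fewest = min(violations[c].get(constraint, 0) for c in remaining)
        remaining = [c for c in remaining
                     if violations[c].get(constraint, 0) == fewest]
        if len(remaining) == 1:
            break
    return remaining


# Hypothetical violation counts (candidate -> constraint -> violations).
TABLEAU = {
    "a": {"C1": 1},
    "b": {"C2": 1},
    "c": {"C1": 1},
    "d": {"C2": 1, "C3": 1},
    "e": {"C1": 1},
}

print(eval_optimal(["C1", "C2", "C3", "C4"], TABLEAU))  # ['b']
```

In this sketch $C_2$ eliminates nobody, since both survivors violate it, and the decision between (b) and (d) falls to $C_3$. A conjoined constraint would simply be one more entry in the ranking, placed above its two conjuncts.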
We find, for example, that in pitch accent patterns of the modern Tokyo dialect, there is often a default accent pattern that the grammar resorts to when it cannot satisfy certain phonological conditions on outputs. An example of a default output pattern in Japanese phonology is a rule for determining the accent pattern of noun-noun compounds in which both nouns have at least two moras and at least one noun has more than two moras (Kubozono (1994, 1995), Alderete (1999)). Under these conditions, the accentuation of the compound will adhere to one of the following two patterns: [7]

(a) faithfulness to the accent pattern of N2: if the second noun of the compound is underlyingly accented, and if this underlying accent of the second noun can be maintained in the compound, this accent surfaces as the accent of the compound. (If this accent occurs on the final mora, it cannot be maintained, since final accent is not possible in noun-noun compounds.)

(b) default accent: if the underlying accent of the second noun cannot surface, the grammar chooses a default accent pattern. This default accent occurs on the first mora of the second noun if the second noun exceeds two moras, and otherwise on the final mora of the first noun.

Notice that this default pattern occurs in an either/or situation. EITHER faithfulness to the accent of the second noun is maintained OR default accent occurs. We shall see that a similar disjunctive situation occurs for the accent pattern of simplex nouns when the relevant pattern is analysed in terms of H and L tones. Certain complex disjunctions of this type can be best expressed through conjoined constraints. For example, if we have a faithfulness constraint F and a markedness or alignment constraint C that requires a default output, conjunction of F and C in domain D will require that either F be satisfied or C be satisfied.
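The either/or pattern in (a) and (b), and its restatement as a conjoined constraint, can be sketched as follows. This is an illustration under stated assumptions rather than the analysis itself: nouns are represented only by their mora counts, accent positions are 1-based mora indices, the function names are hypothetical, and an unaccented second noun is assumed to fall under the default case.

```python
def compound_accent(n1_moras, n2_moras, n2_accent):
    """Return the accented mora of the compound, counted from its left edge.

    n2_accent is the 1-based accented mora of N2, or None if N2 is
    unaccented (assumed here to trigger the default pattern).
    """
    if min(n1_moras, n2_moras) < 2 or max(n1_moras, n2_moras) <= 2:
        raise ValueError("rule covers only compounds in which both nouns have "
                         "at least two moras and one has more than two")
    # (a) Faithfulness: keep N2's accent unless it falls on the final mora.
    if n2_accent is not None and n2_accent < n2_moras:
        return n1_moras + n2_accent
    # (b) Default: first mora of N2 if N2 exceeds two moras,
    #     otherwise the final mora of N1.
    if n2_moras > 2:
        return n1_moras + 1
    return n1_moras


def conjoined_violated(f_violated, c_violated):
    """A locally conjoined constraint F&C is violated only if both
    conjuncts are violated in the same domain."""
    return f_violated and c_violated


print(compound_accent(3, 4, 2))     # 5: accent of N2 is maintained
print(compound_accent(3, 4, 4))     # 4: final accent is replaced by the default
print(compound_accent(4, 2, 2))     # 4: short N2, default on final mora of N1
```

A candidate that respects either faithfulness or the default requirement passes `conjoined_violated`, which is the disjunctive effect described above.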
Footnote 6: An alternative way of conjoining constraints, put forth by Hewitt & Crowhurst (1996), is that a conjoined constraint is violated iff either one of its two conjuncts is violated. Under these proposals, a conjoined constraint C with conjuncts $C_1$ and $C_2$ will be satisfied only if both $C_1$ and $C_2$ are satisfied. This type of conjunction will yield different results than the type of conjunction originally proposed by Smolensky (1995).

Footnote 7: See Kubozono (1994, 1995) for discussion.

(4)
  F&C    F    C
  *      *    *
         *
              *

As shown in (4), the conjoined constraint is violated only if both F and C are violated. Another way of seeing this is that we must respect either F or C. Any candidate that respects either of them will satisfy the conjoined constraint. This idea of expressing a disjunctive situation through conjoined constraints will be developed further in §8, where pitch accent patterns of Tokyo nouns are analysed.

1.2.3 Constraints on the lexicon?

Optimality Theory is a departure from rule-based systems in the following way. In a rule-based system, the grammar generates the correct output by applying rules to an input form. It may also be necessary in a rule-based system to have filters that prevent certain ill-formed outputs from surfacing. As expressed by Archangeli (1997), OT can be seen as one way of dealing with the dichotomy between constructing a grammar out of generative rules and constructing a grammar out of filters. A grammar would be conceptually more elegant if it had only generative rules or only filters but not both. In OT, the generative component of the grammar, GEN, is allowed to freely generate any kind of possible output, and it is CON, the filter part of the grammar, alone, that is active in determining licit outputs. CON only restricts outputs. GEN allows any phonologically well-formed input to be considered as a possible input.
In order to deal with the problem of having a potentially infinite set of possible inputs for a given output, Prince and Smolensky (1993) proposed a principle of Lexicon Optimization, by which an optimal input is chosen for a given output. The choice of an optimal input is determined by consideration of the following factors:

(a) the lexicon contains as little information as possible
(b) lexical inputs should be optimized so that, out of a number of choices of input forms for a given output, the optimal input is the one that is most harmonic with respect to the grammar.

To express the first principle, Prince & Smolensky (1993:196) propose a constraint that requires that an input form have as few features as possible. They dub their proposed constraint "*Spec":

(5) *Spec: Underlying material must be absent.

This constraint seeks to make inputs as simple as possible, by banning underlying material. It interacts with other constraints in the grammar in their proposed process of Lexicon Optimization, which selects an optimal input for a given output. Lexicon Optimization takes an optimal output as its input and selects an optimal input candidate. This optimal input is selected by the same constraint hierarchy that determined this optimal output. This hierarchy includes the constraint *Spec, which will have some ranking relative to other constraints. In the process of Lexicon Optimization, constraints other than *Spec will evaluate candidates in the usual way. The constraint *Spec will compare candidates with respect to the amount of material in each. It is not clear exactly how this material is to be counted: that is, whether features, paths between features, root nodes, prosodic nodes, etc. each count as one unit of material. But the important thing is that this constraint does not set any limits on the amount of material in an input. It does not say that X units of material is OK and that more than X units is bad.
Instead it says that X + 1 units is worse than X units of the same material when comparing two input candidates. The effect of *Spec in determining an optimal input will depend on the relative ranking of the constraint *Spec with respect to other constraints in the grammar.

While there may be advantages in limiting the choice of possible input forms, the strongest possible version of Lexicon Optimization will have disadvantages for explaining what I will call "bias in the lexicon." By this term I mean that the lexicon can sometimes inexplicably behave as if the total inventory of input forms has been subjected to phonological rules or constraints. A well-known example of this phenomenon can be seen in the set of triliteral roots in Arabic, where we see a strong prohibition on co-occurrence of like place features. (See references on page 6.) In an Arabic triliteral root with three consonants, we seldom, for example, get forms like $C_iC_jC_j$, where the subscript index on the C refers to its place feature. In Japanese, we also find phonological patterns in simplex lexical items that are strongly underrepresented or strongly overrepresented. Detailed examples of the frequency of occurrence of various pitch-accent patterns in monomorphemic nouns of various dialects will be given in chapter 8.

If we were to adopt a strong version of Lexicon Optimization, we would only allow one input form for each output form, since Lexicon Optimization chooses an optimal input for each output. Suppose that out of 1,000 monomorphemic nouns of some prosodic length L, we had 800 nouns with surface pitch accent pattern A and 200 nouns with surface pitch accent pattern B. (We actually do find such overrepresentation and underrepresentation of accent patterns among various Japanese dialects, as shall be shown in §8.)
If each noun of pattern A has one unique input form of pattern A' and if each noun of pattern B has one input form of pattern B', we will have 800 input forms with pattern A' and 200 with pattern B'. We now have no way of accounting, through the grammar, for why pattern A is overrepresented and why pattern B is underrepresented. But if we allow each output type to have more than one possible input type, we have a different situation. I will argue in chapter 8 that many cases of apparent bias in the lexicon can be explained if overrepresented surface types have more possible input forms than do underrepresented types. Suppose, for example, that output type A has four different possible input types A', A'', A''', and A'''', where the grammar is such that all of the four inputs A', A'', A''', and A'''' will surface as A. On the other hand, output type B has only one possible input form: B'. It is now possible to have exactly 200 of each of the input types B', A', A'', A''', and A''''. The 200 of type B' will surface as B, and each of the 200 of A', A'', A''', and A'''' will surface as A. This allows us to account for the distribution of accent types without any bias in the lexicon. Each of the five possible input forms occurs the same number of times. It is the grammar that accounts for the skewed distribution of output types, through the fact that more than one input type can surface as the same output.

The question of whether or not to adopt the version of Lexicon Optimization proposed by Prince & Smolensky (1993:196), or any other version, involves a trade-off between two things we want our grammatical system to do. Adopting Lexicon Optimization gives us a simpler lexicon but it leaves us, I will argue, with no grammatical explanation for biases in the lexicon. This matter will be pursued in chapter 8.
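The arithmetic of the hypothetical example above can be checked with a short sketch. The dictionary `GRAMMAR` and the function name are illustrative assumptions; the mapping simply records which surface type each input type is taken to surface as, and the counts are the hypothetical 200 per input type used in the text.

```python
from collections import Counter

# Assumed many-to-one mapping from input types to surface accent types.
GRAMMAR = {
    "A'": "A",
    "A''": "A",
    "A'''": "A",
    "A''''": "A",
    "B'": "B",
}


def surface_distribution(input_counts, grammar):
    """Sum the number of lexical items per surface type."""
    totals = Counter()
    for input_type, count in input_counts.items():
        totals[grammar[input_type]] += count
    return dict(totals)


# An unbiased lexicon: every input type occurs equally often.
uniform_lexicon = {input_type: 200 for input_type in GRAMMAR}

print(surface_distribution(uniform_lexicon, GRAMMAR))  # {'A': 800, 'B': 200}
```

The 800 to 200 skew on the surface arises entirely from the mapping, while the input inventory itself stays uniform.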
1.2.4 Output-Output Correspondence (Benua (1997))

According to Benua's principle of Output-Output Correspondence, not only does the grammar require that outputs and inputs correspond (I-O faithfulness), it also requires that there be correspondence between the output form of a word and the output form of a morphologically related word, known as a "base" form. For calculating both I-O and O-O faithfulness I adopt a version of Optimality Theory in which featural identity is determined by the faithfulness constraints "Max" and "Dep" (see McCarthy & Prince (1998), Lombardi (1995, 1998), Myers (1998), Pulleyblank (1996)):

(a) "Dep" constraints, which require that some feature F' in the output have a corresponding feature F in the input or base form
(b) "Max" constraints, which require that some feature F in the input or base form have a corresponding feature F' in the output.

By the term "feature" I include here not just phonological features but also nodes on a feature tree, paths between features and nodes, as defined in Archangeli and Pulleyblank (1994), [8] and nodes of prosodic structure. I do not posit faithfulness constraints of the "Ident" type, which compare a whole segment-sized output with a whole segment-sized input. (See Pater (1998).)

Footnote 8: The definition of "path" in Archangeli and Pulleyblank (1994) is as follows. "There is a path between α and β iff (a) α and β belong to a linked set Σ of nodes or features or prosodic categories and (b) in the set Σ there is no more than one instance of each node or feature or prosodic category."

An example of O-O faithfulness is as follows. For a compound word composed of two nouns such as asa-gao (morning-face) "morning glory", the output form must correspond not only with the input form but also with an output-output base form, which could be, for example, the output form of the noun kao "face." The distinction between I-O faithfulness and O-O faithfulness is important in cases where certain features in an input form are underspecified. For example, if the initial /k/ in kao "face" is underspecified for voicing in the input, its surfacing as /g/ in the compound will not violate a "Max" type of I-O faithfulness constraint for the [-voice] feature, since that feature does not exist in the input. The /g/ will, however, violate the corresponding O-O faithfulness constraint with respect to the base form kao, since that base form will be an output form itself, with fully specified features. Given the lack of any evidence for underspecified voicing features in Japanese outputs, I will assume that for these features, outputs are fully specified. I will also assume that terminal features are binary: e.g. [±voice].

I will employ Benua's principle of Output-Output Correspondence in chapter 7 when examining the behaviour of noun-verb compounds with respect to rendaku voicing. Specifically, I will propose that a difference in behaviour between two different morpho-syntactic classes of compounds can be attributed to a difference in the base form that occurs for each type.

2. Lexicalization of compound words and its relation to rendaku voicing

I begin an examination of irregular effects in Japanese phonology in chapter 2 with the question of how rendaku voicing is de-regularized by lexicalization. In §2.1 I propose that rendaku voicing, unlike some other grammatical processes, is subject to irregularity because it occurs in compound words. In §2.2 I propose that compound words have a dual nature: not only do they behave like morphologically complex words; they also behave in some ways like simplex lexical items in that they show evidence of being listed to some extent in the lexicon.
In this subsection I give examples that support both sides of this characterization of compound words. In §2.3 I propose a model for how compound words are listed in the lexicon. This model explains both sides of the nature of compound words: the fact that they behave in some ways like simplex lexical items and the fact that in other ways they behave like morphologically complex words. In §2.3.4 I discuss the question of why compound words are listed lexically but other types of morphologically complex words are not.

2.1 Why the lexicon has an effect on rendaku voicing

The main question about rendaku voicing that I shall investigate is why it is so de-regularized by apparent lexical exceptions while many other phonological processes are much more regular. As we saw in (1), consonant mutation in verb stems and affixes behaves according to processes that have no exceptions whatever. In my analysis of rendaku I will argue that rendaku is irregular because it allows grammatical processes to be affected by the lexicon in ways that cannot happen for other processes. Rendaku allows the interaction of the grammar with the lexicon for the following reason. Rendaku applies specifically to compound words, which, as I shall argue, differ from other complex morphological forms in that they are necessarily subject to lexicalization. I will propose that the way in which lexicalization occurs for compounds gives them a dual nature: they behave both like complex words and like single lexical items. Inasmuch as they are morphologically complex, they are subject to phonological processes that affect derived forms. Their lexical nature can also affect their output form, but only under conditions that are determined by the grammar.
This means that most of the exceptions to rendaku are not simply lexically listed exceptions, but rather are cases in which the effects of lexicalization of compounds become emergent only when certain phonological conditions fail to be met. An example of this fact is the following. Noun-noun compounds that are eligible for rendaku voicing and that meet a certain prosodic size requirement are predictable with respect to rendaku voicing. That is, noun-noun compounds of this size will always voice except when the second conjunct is a noun that uniformly never voices in compounds. On the other hand, noun-noun compounds that do not meet this prosodic size requirement only voice about 75% of the time, and most cases of non-voicing occur with second conjuncts that sometimes voice in compounds and sometimes do not. More details of these facts will be examined on pages 27ff.

A very similar phenomenon occurs for pitch accent patterns of compound words. The effects of the prosodic size of a compound word on the predictability of its accent pattern have been observed by Haruo Kubozono (personal communication). When at least one member of a compound exceeds two moras (the same size requirement as for regularity of rendaku voicing patterns), compound accent is completely predictable. (See, for example, Kubozono (1994, 1995) and Alderete (1999).) But when compounds do not meet this prosodic size requirement, their accent pattern is not completely predictable and must to some extent be specified lexically. For example, according to Kubozono, compound words in which both conjuncts are bimoraic will be unaccented in 70% of cases, but whether such a compound is unaccented or not must be specified lexically. Most of the other 30% have antepenultimate pitch accent.
Compounds whose first member is bimoraic and whose second member is monomoraic are unaccented in 50% of cases and initially accented in the other 50% of cases. Once again, whether a given compound in this set is unaccented or initially accented must be specified lexically.

In both the case of compound accent and the case of rendaku voicing, as long as certain constraints are satisfied by the output (in this case constraints on prosodic size), the grammar will completely determine the nature of the output in a predictable fashion. But when certain constraints are not met, the lexicon takes a larger role in determining outputs and outputs are variable. We shall also see that in the case of rendaku voicing, the constraints that open the door to lexicalization when they are not met tend to act together in a group. That is, it is not always just one constraint whose violation causes the emergence of lexicalization but rather several constraints acting together. These constraints do not always have a visible effect on the output when only one of them is violated; sometimes more than one violation must occur together in order for an otherwise predictable output to be blocked. When this occurs, underlying lexical features of morphemes that normally are not allowed to surface can emerge to affect the output. That is, when a phonological process that tends to make members of a class of outputs more uniform is blocked, outputs become more idiosyncratic and underlying lexical features emerge that would not otherwise do so.

In order to show how these kinds of processes occur for compound words, we first need to investigate the nature of compound words, and why they admit lexicalization in a way that other derived forms do not.
2.2 The dual nature of compound words

The fact that the lexicon can have a stronger effect on outputs in a process like rendaku than it can on some other phonological processes is due, I will claim, to the nature of compound words, and to crucial ways in which they differ from other kinds of derived forms. My central claim will be that compound words have a dual nature. In a number of ways, compound words behave both (a) as if they are derived from two or more lexical entries and (b) as if the whole compound is listed in the lexicon.

2.2.1 Ways in which compound words act like morphologically complex words

Let us first consider ways in which compounds pattern like morphologically complex words.

2.2.1.1 Resistance to Lyman's Law

The well-known phenomenon of Lyman's Law in Japanese (6) prohibits the co-occurrence of more than one voiced obstruent within a native prosodic word (Ito & Mester (1986), Mester & Ito (1989), Ito & Mester (1998)).

(6) onna + kotoba → onna-kotoba (*onna-gotoba contains two voiced obstruents: [g] and [b])
    woman  word    "women's speech"

Lyman's Law applies (with almost no exceptions) within a single morpheme. Morphemes are not allowed to contain more than one voiced obstruent on the surface. Thus rendaku will be blocked in words with an initial voiceless obstruent and a voiced obstruent further in the word. But Lyman's Law does not apply within a larger morphological domain: more than one voiced obstruent is allowed in a compound word as long as the two voiced obstruents do not occur in the same morpheme. [9] In (7) are just a few of the large number of compounds that have a voiced obstruent in the first member and rendaku voicing in the second member.

Footnote 9: Haruo Kubozono (personal communication) suggests that Lyman's Law may apply across a morpheme boundary in certain situations. The cases he offers seem to be lexically restricted to certain morphemes when they occur as the second conjunct of a compound. For example, proper names whose second conjunct is the morpheme ta "field" appear to obey Lyman's Law in the proper noun examples given by Kubozono:

  sima-da  "island-field"      naga-ta  "flow-field"
  ima-da   "now-field"         siba-ta  "turf-field"
  yama-da  "mountain-field"
  naka-ta  "middle-field"

On the other hand, morphemes such as sono "garden, park" do not undergo Lyman's Law across a morpheme boundary, as is the case for his own family name Kubo-zono.

(7)
  ebi "lobster"        kani "crab"          ebi-gani "crayfish", zari-gani "crayfish"
  mizu "water"         kiwa "brink"         mizu-giwa "water's edge"
  suge "sedge"         kasa "bamboo hat"    suge-gasa "sedge hat"
  mizu "cold water"    huro "bath"          mizu-buro "cold water bath"
  hiza "knee"          hone "bone"          hiza-bone "knee bone"
  kugi "nail"          hako "box"           kugi-bako "nail box"

The non-application of Lyman's Law in these cases can only be explained if compound words are considered to be composed of separate morphemes.

2.2.1.2 Rendaku voicing as a productive phonological process

Secondly, rendaku voicing shows good evidence that it is an active phonological process that applies to morphologically derived forms. Consider, for example, the hypothesis that all compounds are listed as such in the lexicon and that the rendaku voicing that occurs is simply a reflex of a now obsolete historical process. This hypothesis cannot be supported when we consider (a) historical evidence that rendaku applied to compounds that were coined as recently as the early twentieth century and (b) experiments with native speakers pronouncing fictitious compounds.

As far as the first point is concerned, Martin (1987:113ff) gives evidence that illustrates the fact that rendaku is still active in the formation of recently coined compounds. He suggests that this process is due to analogy with older compounds that had a similar pattern.
For example, ama-gutu ("rain-shoe" = "galoshes") was coined after 1912, but kutu "shoe" already experienced rendaku in older compounds such as wara-gutu ("straw-shoes"). However, the following two compounds are both recently coined and have a second member that according to Martin "is not of particularly early attestation."

(8)
  denki "electricity"    kotatu "quilt"       denki-gotatu "electric-quilt"
  kabusiki "stocks"      kaisya "company"     kabusiki-gaisya "stock company"

The following are other recently-coined compounds given by Martin that undergo rendaku: [10]

  oo "big"             kenka "quarrel"          oo-genka "big quarrel"
  huuhu "couple"       kenka "quarrel"          huuhu-genka "domestic dispute"
  tati "stand"         keiko "practice"         tati-geiko "rehearsal"
  ko "small"           kiyoo "cleverness"       ko-giyoo "cleverness"
  yuki "snow"          kesiki "scenery"         yuki-gesiki "snow scene"
  soo "general"        taisyoo "commander"      soo-daisyoo "commander-in-chief"
  mori "quantity"      takusan "much"           mori-dakusan "lots and lots"
  oo "big"             taiko "drum"             oo-daiko "big drum"
  yasu "cheap"         husin "construction"     yasu-busin "cheap construction"
  manga "comic"        hon "book"               manga-bon "comic book"
  oo "big"             son "loss"               oo-zon "big loss"
  ate "guess"          suiryoo "guess"          ate-zuiryoo "guess"
  murasaki "purple"    suisyoo "quartz"         murasaki-zuisyoo "amethyst"
  uta "song"           saimon "minstrel"        uta-zaimon "ballad"
  asa "hemp"           sie "wisdom"             asa-zie "shallow wisdom"
  tamari "soy"         sooyu "soy"              tamari-zyooyu "soy sauce"
  ao "blue"            syasin "photo"           ao-zyasin "blueprint"

Footnote 10: Martin also cites Martin (1952) as a source of more examples of recent rendaku.

These facts are evidence that compounds in Japanese behave like morphologically complex words. Experiments that I conducted with native speakers pronouncing fictitious compounds are discussed in §5. Since fictitious compounds cannot be listed in a speaker's lexicon, the productive voicing that occurs must be due to a phonological process that acts on derived forms.
This is another piece of evidence that supports the hypothesis that Japanese compound words behave as if they are morphologically derived.

2.2.2 How compounds behave like simplex words

On the other hand, Japanese compounds show other properties that suggest that they are listed in the lexicon. There are three main facts that support this: (a) the fact that compounding is not completely productive, (b) the idiosyncratic meanings of many compounds, and (c) the fact that compound words can exhibit unpredictable phonological properties.

2.2.2.1 Compounding in Japanese is not completely productive

We cannot simply take two morphemes and put them together to form a comprehensible compound word. For example, the following compound words are in common use:

  natu "summer"    musi "insect"    natu-musi "summer insects"
  haru "spring"    ame "rain"       haru-same "spring rain"

But the following are not considered licit compounds by native speakers, even though there is nothing defective about them pragmatically, since rain occurs in summer and insects occur in spring:

  *natu-(s)ame "summer rain" [11]
  *haru-musi "spring insects"

That is, a speaker somehow knows what combinations of nouns can form legitimate compounds. Such knowledge cannot be part of the grammar but could only be represented by those legitimate compounds' being listed in some way in the lexicon. [12]

2.2.2.2 Idiosyncratic meanings

The meaning of a compound word is often idiosyncratic and cannot simply be arrived at compositionally from the independent meanings of its constituent morphemes. This fact tends to be particularly true for compound words with "short" members (1μ or 2μ). Take, for example, the compound word ana-go, literally ana "hole" + ko "child", which means "conger eel." Clearly the meaning of this compound must be lexically listed.
The following are further examples of idiosyncratic meanings of noun-noun compounds:

(9)
yoko "side"      kuruma "cart"      yoko-guruma "interference"
kuti "mouth"     hi "fire"          kuti-bi "beginning of a conversation"
moto "origin"    te "hand"          moto-de "monetary capital"
sita "under"     kokoro "heart"     sita-gokoro "concealed plans"
iro "colour"     sato "hometown"    iro-zato "prostitute district"
kuwa "hoe"       kata "shape"       kuwa-gata "type of insect"

2.2.2.3 Unpredictable phonological properties

As mentioned above, phonological properties of compounds such as pitch accent (for compounds in which both members are less than three moras) and rendaku voicing are not completely predictable, particularly, as observed above, for compounds with short members. Many further examples of the unpredictability of rendaku voicing will be given in chapter 3.

In summary, then, we have evidence that supports both of the following two alternatives: (a) that the morphemes that make up compound words are listed separately in the lexicon and (b) that compound words are listed in the lexicon as one listing for the whole word. [13]

Our representation of compound words must therefore account for (a) the fact that they have idiosyncratic lexical properties that require them to be listed in the lexicon and (b) the fact that they behave like words composed of more than one morpheme, since the morpheme boundary acts as a barrier to Lyman's Law.

Footnote 11: The possibility of allomorph same in place of ame "rain" is indicated by the /s/ in parentheses here. Allomorph same occurs in compound haru-same "spring rain" to avoid vowel hiatus.

Footnote 12: In the experiment done with speakers' pronouncing fictitious compounds, to be described in chapter 5, it was made clear to speakers at the outset of the experiment that these were not real compounds but that they were to try to pronounce them as if such a compound existed.
2.3 The structure of compound words

In this subsection I propose a model for how compound words are lexically structured in a way that allows them to have this dual nature. If the output form of every compound were listed fully in the lexicon, it would make the lexicon very burdensome, since many morphemes would be listed many times. At the opposite extreme, if compound words are in no way listed in the lexicon, there is no way for the grammar to derive the accent of short compounds or the meanings of idiosyncratic compounds. Clearly, if compounds are listed in the lexicon, it must be done in a way that minimizes the amount of material they add to it.

Footnote 13: It should be uncontroversial that former compound words that have subsequently gone through sound changes have become lexicalized. The following words originated as compounds but their identity as compounds is no longer transparent. Consequently, they should be represented in the lexicon as one listing. The etymologies of the following words are suggested by Martin (1987):

Modern form            first conjunct          second conjunct
aruzi "owner"          ari "existence"      +  nusi "master"
aziro "wickerwork"     asi "foot"           +  musiro "straw mat"
higasi "east"          hi "sun"             +  mukasi "to face"
nagori "remains"       nami "wave"          +  nokori "to remain"
nezumi "mouse"         no "field"           +  sumi "to dwell"

It is not plausible that any of these examples are derived, in a speaker's grammar, from the original conjuncts, even though the similarity of parts of the modern word to the original conjuncts is clear when the etymology is suggested.

2.3.1 Lexical listing of compounds as references to other listings

If morphemes are not to be duplicated many times for each listing of a compound word, there must be some way in which one entry in the lexicon can refer to another entry.
That is, for a word like te-gami (hand-paper) "letter", instead of listing all the semantic and phonological information of its constituent morphemes, there must be a listing which simply has references to the two entries for the constituent morphemes. Besides the two references to the entries for the constituent morphemes, the listing will give any unpredictable information about the meaning of the compound and the order in which the two morphemes are combined. Thus, when the lexicon is accessed for that word, it recursively refers to the entries for the constituent morphemes. In addition, there must be, for the case of "short" compounds, information about the pitch-accent pattern of the compound.

My proposal is that the lexical entry of a compound word is a kind of morphological template of the form xy, where x and y are each references to other lexical entries. When the output form is determined by the grammar, the lexical entries for x and y are substituted for the references in the template. If this is the way in which compound words are listed, then the template entry need not duplicate any information about the constituents of the compound that is already present in their individual entries.

The necessity of having a minimal lexical listing for compound words opens up the possibility that features can be added to this listing. Under our proposal that the lexical entry of a compound word contains references to the listings of the individual morphemes of the compound, those references are only pointers to other listings: the internal structure of the entries to which the references point cannot be accessed or changed in the compound listing. But it should be possible to add further features to the references to the constituent morphemes. This is what I propose must happen in the lexical entries for short compounds, whose surface pitch accent pattern is unpredictable from the pattern of the constituent morphemes.
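The template proposal can be pictured as a small data structure. The following is only an illustrative sketch under stated assumptions: the class names `Morpheme` and `CompoundEntry`, the field names, and the toy `LEXICON` dictionary are all hypothetical and are not part of the analysis itself. Simplex entries are made immutable to mirror the claim that a compound listing can point to its constituents and add features of its own, but cannot alter the constituents' entries.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class Morpheme:
    """A simplex lexical entry. It is frozen, so no compound listing can modify it."""
    form: str
    gloss: str


@dataclass
class CompoundEntry:
    """A template of the form xy: two references plus compound-specific information."""
    x: str                                 # key of the first constituent's entry
    y: str                                 # key of the second constituent's entry
    meaning: Optional[str] = None          # idiosyncratic meaning, if any
    added_features: Tuple[str, ...] = ()   # e.g. accent information or a floating feature


# Hypothetical toy lexicon: each morpheme is listed exactly once.
LEXICON: Dict[str, Morpheme] = {
    "te": Morpheme("te", "hand"),
    "kami": Morpheme("kami", "paper"),
}


def resolve(entry: CompoundEntry, lexicon: Dict[str, Morpheme]) -> Tuple[Morpheme, Morpheme]:
    """Substitute the referenced entries for x and y when the word is accessed."""
    return lexicon[entry.x], lexicon[entry.y]


# te-gami "letter" stores only two references and its unpredictable meaning.
tegami = CompoundEntry(x="te", y="kami", meaning="letter")
first, second = resolve(tegami, LEXICON)
```

Because the compound entry holds keys rather than copies, adding a compound does not duplicate the phonological or semantic content of its members, which is the economy argument made above.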
On the other hand, if we adopt the premise that the listing of a compound word has only pointers to the location of the constituent morphemes in the lexicon, there will be restrictions on the degree to which the output form of a compound word can be pre-specified. For example, if the initial obstruent on noun tama "ball" is underspecified for voicing in its lexical entry, then it should not be possible to have a [-voice] feature linked to that obstruent in the input form of any compound word with constituent tama, since that would involve changing the listing of the simplex entry of tama. This kind of restriction will have important implications for the kind of analysis that can be adopted for compound words that block rendaku voicing.

2.4 Why can lexical specification occur only for compounds?

Our hypothesis, then, is that because compound words are lexically listed as references to their constituent morphemes, further features can be added to this lexical listing, for example in the form of floating features, but the structure of the listings of the constituents cannot be changed. But adding further lexical information to the input form of a derived form other than a compound word, such as a noun plus a case-particle clitic, is simply impossible. Consider, for example, the morphologically complex input form ame + ga (rain + NOM). There is no reason for this input to be lexically listed, even as references to the two morphemes, since (a) any noun can occur with the nominative case marker and (b) the addition of the functional category to the noun does not change its meaning: it only affects the syntax of the whole sentence. Thus there is no way that further features could be added to this input. [14]

The model of a lexical entry for a compound word that we have developed here allows the possibility that, in a minimal way, individual phonological features can be included with these listings.
In the next chapter I will show how this possibility can lead to an explanation of how irregular cases of rendaku blocking in noun-noun compounds can occur.

Footnote 14: While the addition of an inflectional affix or clitic to a Yamato morpheme does not change its meaning in an unpredictable way, it is not clear whether the addition of a derivational affix always results in a predictable change in meaning. Japanese has fewer derivational affixes than a language like English. But one type of derivational morphology, the formation of deverbal nouns from Yamato verbs by zero affixation, is quite idiosyncratic: not all verbs have corresponding deverbal nouns. (See, for example, Ohta (1994), Kageyama (1982), McCawley (1968), Nishiyama (1998), Shibatani & Kageyama (1986), Sugioka (1984).) For example, the verb tabe-ru "eat" has no deverbal noun *tabe. Further, the meaning of a deverbal noun is often unpredictable. For example, the deverbal noun tukai derived from tuka-u "use" can mean "messenger." It is therefore possible that some complex words formed by derivational morphology are also listed in some way in the lexicon in the same way as I have proposed for compound words.

3. Rendaku Voicing and blocking in noun-noun compounds: blocking by lexical prespecification

In this chapter I will take the hypothesis developed in chapter 2 about the lexicalization of compound words, and apply it specifically to the problem of rendaku voicing in noun-noun compounds. I will show that this hypothesis can account for not only the irregular nature of rendaku voicing, but also the fact that exceptions to rendaku voicing are clustered in interesting ways around certain morphemes.

In §3.1 I introduce the phenomenon of rendaku voicing and provide some background on the phonemic inventory of Japanese and relevant allophonic alternations among consonants. I also cite some accounts in the recent phonological literature of the cause of rendaku voicing. I discuss these analyses further in §3.4.
In §3.2 I examine a large amount of data on rendaku voicing in Yamato noun-noun compounds and show how blocking of rendaku is not completely random but is influenced by two factors: (a) the prosodic size of the compound and (b) the influence of lexical factors relating to specific morphemes that either "like rendaku" or "hate rendaku". In §3.3 I review some previous analyses of rendaku voicing and propose a new account that can explain how blocking can occur in compounds that are lexicalized with added [-voice] features. In §3.4 I discuss the issue of lexicon optimization (Prince & Smolensky (1993:196)) with respect to our hypothesis that the initial obstruent on nouns that undergo rendaku is underspecified for voicing.

3.1 Introduction to rendaku voicing

Let us now look more closely at how the grammar and the lexicon will interact for rendaku voicing. We begin with a discussion of the nature of rendaku voicing itself. The well-studied phenomenon of rendaku voicing (Ito & Mester (1986), Mester & Ito (1989), Ito & Mester (1998)), which occurs in Japanese compounds consisting of native words, has received a great deal of attention in the phonological literature. When rendaku occurs, an initial voiceless obstruent on the second member of the compound becomes voiced. Some typical examples of rendaku voicing are shown in (10). In each compound word below, the initial obstruent of the second member becomes voiced.

(10) (examples from Higurashi (1983:65), Sugioka (1984:108,107))

asa + huro       -->  asa-buro       [h] --> [b]
morning bath          morning bath

huyu + sora      -->  huyu-zora      [s] --> [z]
winter sky            winter sky

hito + korosi    -->  hito-gorosi    [k] --> [g]
person kill           murder

mizu + taki      -->  mizu-daki      [t] --> [d]
water cook            casserole

Rendaku occurs frequently enough that it is usually considered to be a bona-fide phonological process, yet it fails to occur for a sizeable minority of compound words that nonetheless appear to have all the right conditions for rendaku voicing.
In order to see exactly how rendaku voicing affects the obstruents of Japanese, let us examine the phonemic inventory of the language. Japanese has the voiceless obstruents /k/, /s/, /t/, and /h/. Because /h/ patterns with obstruents in its phonological behaviour in Japanese, I will treat it as an obstruent. [15] The phoneme /p/ only occurs in the Yamato vocabulary in geminates, usually at a morpheme boundary. (See McCawley (1968), Martin (1975), Ito & Mester (1995).) When rendaku voicing occurs, /k/, /s/, and /t/ change to their voiced counterparts /g/, /z/, and /d/. [16] In the phonology of Japanese, the voiced counterpart of /h/ is /b/, which replaces /h/ when /h/ undergoes rendaku voicing.

/s/ is palatalized before /i/. The voiced counterpart of /s/, /z/, always occurs as an affricate, which, like /s/, is also palatalized before /i/, where it becomes an alveolo-palatal affricate. /t/ is affricated before /i/ to an alveolo-palatal affricate and before /u/ to an alveolar affricate. When /t/ undergoes rendaku voicing, it becomes a voiced alveolo-palatal affricate before /i/ and a voiced alveolar affricate before /u/. Thus, /t/ and /s/ converge to the same sound when they undergo voicing and occur before /i/ or /u/. They are both a voiced alveolo-palatal affricate before /i/ and a voiced alveolar affricate before /u/. In a phonemic transcription, we write /z/ as the voiced counterpart of /t/ when it occurs before /i/ or /u/. These facts are summarized in (11).

(11) Allophones of Japanese obstruents

voiceless                                    voiced
h                                            b
s  (alveolo-palatal before /i/)              z  (always affricated)
                                                (alveolo-palatal affricate before /i/)
                                                (dental affricate before /u/)
t  (alveolo-palatal affricate before /i/)    d  (alveolo-palatal affricate before /i/)
   (dental affricate before /u/)                (dental affricate before /u/)
k                                            g

As for what kinds of rules or constraints actually cause rendaku voicing to occur, there is no clear-cut consensus in the literature. Much research on rendaku simply takes it for granted that it is a phonological process and focuses on questions of blocking of rendaku by Lyman's Law rather than on why rendaku actually occurs in the first place.

Footnote 15: /h/ shows alternations with the obstruent /b/: for example, in rendaku voicing, /h/ voices to /b/. /h/ also has allophones that are obstruents: a voiceless bilabial fricative that occurs before /u/ and often before /o/. /h/ is also pronounced by some speakers like a voiceless velar fricative before /a/. These allophonic effects are evident in the way foreign borrowings from English and German are pronounced in Japanese. For example, words with an English /fo/ sequence like "fork" are pronounced with an /ho/ sequence as in hooku "fork", and the name of the composer "Bach" is pronounced baha, where the German velar fricative is interpreted as the allophone of /h/ that occurs before /a/.

Footnote 16: In Japanese, the phoneme /g/ has a nasal velar allophone which surfaces in predictable environments. (See Nishikawa (1987), p. 101ff, for detailed discussion.) According to Nishikawa, the nasal allophone does not occur word-initially, and occurs non-word-initially in the following environments:

(a) on the initial /g/ of the second member of a bimorphemic Sino-Japanese "stratum 1" compound
(b) on the initial /g/ of the second member of a "subcompound"
(c) after a Native prefix such as oo "big"
(d) on a clitic such as nominative marker ga
(e) morpheme-internally

G-nasalization is blocked in dvandva compounds, which he refers to as "cocompounds", and in what Shibatani & Kageyama (1986) refer to as "postsyntactic compounds." There is no evidence that G-nasalization directly interacts with rendaku. In other words, whether or not G-nasalization occurs does not appear to be affected by whether the /g/ that nasalizes is underlyingly voiced or derived from /k/ by rendaku voicing.
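The correspondences in (10) and (11) can be sketched as a simple mapping over the phonemic transcription used here. This is a minimal illustration rather than an analysis: the names `RENDAKU_MAP`, `voice_initial` and `compound` are hypothetical, the function voices unconditionally, and it ignores every source of blocking discussed in this chapter. The form tuki passed to `voice_initial` in the usage comment is just an illustrative string for the /t/ before /u/ case.

```python
# Voiceless obstruents and their voiced counterparts under rendaku.
RENDAKU_MAP = {"k": "g", "s": "z", "t": "d", "h": "b"}


def voice_initial(second: str) -> str:
    """Voice the initial obstruent of a second member, in phonemic transcription."""
    initial = second[:1]
    if initial not in RENDAKU_MAP:
        return second  # nothing to voice (vowel- or sonorant-initial form)
    # Transcription convention: voiced /t/ is written z before /i/ or /u/.
    if initial == "t" and second[1:2] in ("i", "u"):
        return "z" + second[1:]
    return RENDAKU_MAP[initial] + second[1:]


def compound(first: str, second: str) -> str:
    """Join two members, applying rendaku voicing to the second one."""
    return f"{first}-{voice_initial(second)}"


print(compound("asa", "huro"))     # asa-buro
print(compound("hito", "korosi"))  # hito-gorosi
print(voice_initial("tuki"))       # zuki
```

Treating /h/ as mapping to /b/ follows the decision above to treat /h/ as an obstruent; /p/ is left out because it does not occur initially in Yamato morphemes.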
Recently a number of proposals for the cause of rendaku voicing have been offered in the literature. Ito and Mester (1986), (1998) suggest that rendaku is caused by a morpheme represented by a floating [+voice] feature that has no segmental material associated with it. They call this a morpheme of "juncture". The problem with this kind of analysis is that, from a morphosyntactic point of view, it is difficult to justify the existence of a morpheme in all cases, especially, for example, in the kind of argument-head noun-verb compounds that we will consider in the forthcoming discussion. In syntax, the canonical head-argument configuration is that the argument is a sister to the head:

         head
        /    \
 argument    head

In this kind of structure there is nowhere for an independent "rendaku" morpheme to intervene.

Ito & Mester (1998) in an appendix propose an alternative analysis of rendaku in which rendaku repairs a marked sequence of [-voice], [+voice] features. That is, the sequence of a voiceless obstruent followed by a (voiced) vowel at the beginning of a morpheme of the form [CV...] is considered marked. Rendaku repairs the markedness of the first two segments of this sequence by making it into a sequence consisting of a voiced obstruent followed by a voiced vowel. (See Ito & Mester (1998:54ff) for discussion.) I will discuss these proposals further on page 42 and will propose an analysis of rendaku voicing that is intended to capture the fact that, under certain conditions, it can be blocked by lexical prespecification. Before developing that proposal, let us first examine more closely the patterns of rendaku voicing that occur in noun-noun compounds.

3.2 Patterns of rendaku in noun-noun compounds

As discussed briefly on page 15, when we examine noun-noun compounds where both nouns are of Yamato (native) origin, there are clear differences between two different classes of compounds.
Those compounds that meet a prosodic size requirement are very regular with respect to rendaku voicing. Those that do not meet this size requirement block voicing in about 25% of cases, often in unpredictable ways. The prosodic size requirement is as follows. To meet the requirement, a compound must have both members of at least two moras and at least one member exceeding two moras. Let us first examine noun-noun compounds that meet this requirement.

3.2.1 Rendaku voicing in compounds that meet a prosodic size criterion

In Appendix A is a sample of noun-noun compounds that meet the conditions listed below. This and further samples of Yamato noun-noun compounds were gathered as follows. A database of about 1200 Yamato nouns was gathered from Martin's (1987) database of Yamato nouns, excluding all nouns that are transparently polymorphemic. Using this database, a list of noun-noun compounds that meet conditions (a) to (c) below was gathered from several sources, referencing the compound by the first conjunct, and also, where possible, by the second conjunct. The sources used were (i) the 1999 edition of the NHK Accent and Pronunciation Dictionary (NHK Nihongo Hatsuon-Akusento Jisho. NHK Hoosoo-bunka-kenkyuu-jo), which contains over 60,000 entries including a large number of compound words in current use; (ii) the Kodansha Japanese-English Dictionary; and (iii) Version 1.31 of Stephen Chung's freeware "JWP" Japanese Word Processor computer software (whose electronic dictionary is able to reference compound words by their second member).

(a) The compound is eligible for rendaku voicing: i.e. the second member begins with a voiceless obstruent when it occurs as a word by itself and it does not contain any voiced obstruents, whose presence would block rendaku voicing through Lyman's Law.

(b) Both members of the compound are of Yamato origin.
(c) Both members of the compound exceed one mora and at least one of the members of the compound exceeds two moras.

There are 219 compound words in this sample meeting these requirements. Of these, only the following 14 compounds, or 6.4% (given in (12) and (13)), resist voicing. Three of them, tiri-hokori (dust-dust), kami-hotoke (gods and Buddhas) and hituzi-saru (sheep-monkey) (intersection of two neighbouring Chinese zodiac signs in terms of direction or hour of the clock), are clearly dvandva or headless compounds. Dvandva compounds differ morphologically from other compounds in the following way. Non-dvandva compounds involve some kind of morpho-syntactic relation between a head and a dependent morpheme. For example, in a compound like yuki-dama "snowball", noun yuki "snow" modifies noun tama "ball". But in a dvandva compound, neither morpheme acts like the morpho-syntactic head: rather, the compound has the semantic form "X and Y" where the relationship between X and Y is symmetric. For example, the dvandva compound kami-hotoke "gods and Buddhas" does not mean "Buddha who is a god" or "a god who is Buddha" but rather the conjunction of things that are gods with things that are Buddhas. The dvandva compound nabe-kama means "pots and pans" rather than "pans that are pots" or "pots that are pans." Compounds of this type uniformly fail to voice. [17]

Footnote 17: It is beyond the scope of this thesis to develop a principled explanation of why dvandva compounds consistently fail to voice. Dvandva compounds are discussed by Otsu (1980), who shows that dvandva compounds also have other clear phonological differences from head-argument or head-adjunct compounds: for example, they have a completely different accent pattern from non-dvandva compounds. For the present examination of rendaku in compounds I shall abstract away from the question of rendaku blocking in dvandva compounds and shall examine only patterns of rendaku in compounds that have a morphosyntactic head.

(12)
Dvandva compounds in Appendix A:

tiri-hokori     dust-dust       "dust"
kami-hotoke     god-Buddha      "gods and buddhas"
hituzi-saru     sheep-monkey    "a zodiacal sign"

As for the second group of exceptions, which are given in (13) below, all these compounds have second members that never voice in compounds. The morpheme kasu "dregs" never experiences rendaku, as discussed in Martin (1987). To my knowledge, the same is true for nouns kemuri "smoke", kamati "tree", kanmuri "crown", hatimaki "bandana", and katati "shape."

(13) Compounds with second conjuncts that uniformly fail to voice.

mado-kamati      window-framework "window frame(?)"
yama-kanmuri     "mountain-crown"
abura-kasu       "oil-dregs"
kao-katati       "face-shape"
kami-katati      "hair-shape"
yuki-kemuri      "snow-smoke"
yuu-kemuri       "hot.water-smoke"
sio-kemuri       "salt-smoke"
suna-kemuri      "sand-smoke"
mizu-kemuri      "water-smoke"(?)
siro-hatimaki    "white-bandana"

In summary, all cases of blocking for this group of "long" compounds fall into one of the following two categories: (a) they are dvandva compounds, which block rendaku without exception, or (b) the second member of the compound is a morpheme that never voices in any compound it occurs in. In my analysis of noun-noun compounds, I will abstract away from dvandva compounds and treat only headed compounds. This leaves us with the compounds in (13) as the only exceptions to voicing in long compounds that we need to deal with. [18]

Footnote 18: Why certain types of compounds engage in rendaku voicing and others do not is an important morphological question that I leave for further research. Not only do certain types of compounds such as dvandva compounds fail to voice, but as shown below, most inflectible verb-verb compounds also resist voicing. (See Rosen (1999b) and references cited there.)

hiki-sagar-u     (pull-leave)          "to retreat"
hiki-ka-e-ru     (pull-change)         "to exchange"
sasi-hik-u       (fill-pull)           "to deduct"
tati-sar-u       (stand-leave)         "to leave"
sori-suteru      (shave-throw.away)    "to shave away"
huri-suteru      (wave-throw.away)
daki-tuk-u       (hold-stick)          "to cling to"
kaziri-tuk-u     (chew-stick)          "to fasten one's teeth on" (takes obj.)
kami-tuk-u       (bite-stick)          "to bite into"
tori-kakaru      (take-hang)           "to set about, begin"

Verb inflectional suffixes such as imperfective -ta only show voicing alternations that are conditioned by the presence of a stem-final voiced obstruent, as was shown in (1). Nouns do not occur with suffixes, only clitic case-particles; the enclitic to never voices, as discussed on page 52.

Let us examine how prespecification of a voicing feature can account for the fact that certain morphemes such as kasu "dregs" uniformly resist rendaku voicing. My key hypothesis for explaining rendaku voicing and blocking in this account will be that obstruents have binary rather than privative voicing features and that voiceless obstruents in Japanese can be underspecified for the feature [-voice] in the input. If they have this input form, the surface [-voice] feature can be derived by a grounded constraint of the type proposed by Archangeli & Pulleyblank (1994):

(14) "If [-son] then [-voice]."

This constraint reflects the fact that cross-linguistically, [-voice] is the default value of voicing for obstruents.

(15) Obs-Voi: "Every [-son] specification in the output must be on a path to a [-voice] feature."

The model of underspecification that I adopt is similar to the "combinatorial" underspecification proposed by Archangeli & Pulleyblank (1994). Radical Underspecification (Kiparsky (1982), Archangeli (1984), (1988), Pulleyblank (1988)) would require that since obstruents are voiceless by default, they can have only two possible underlying values: [+voice] and unspecified for voicing. Contrastive Underspecification, on the other hand, requires that since obstruents can contrast in voicing in Japanese, the opposition between [+voice] and [-voice] on the surface must be represented in the lexicon for obstruents in Japanese. Sonorants, on the other hand, would be unspecified for voicing in the lexicon since they do not contrast for voicing. Neither of these two models will work for the kind of analysis I am proposing here, where there are three possible input values for voicing of obstruents: [-voice], [+voice] and unspecified for voicing. Archangeli & Pulleyblank make similar proposals: for example, in Ainu, even though there is no contrast for backness among low vowels, they propose two possible inputs for low vowels: [+low, -back] and [+low, +back]. Even though these two combinations will surface as the same vowel, they can have different effects on other vowels that surface, thus explaining cases of vowel dissimilation, where /a/ can trigger the appearance of a high front vowel in some cases, and a high back vowel in others.

By allowing more than one possible input form to occur for one particular combination of features in the output (in our case a voiceless obstruent), we are addressing the issue of "Richness of the Base" proposed by Prince & Smolensky (1993), where inputs are not restricted in any way. The adoption of this principle of "Richness of the Base" will be particularly important in the examination of frequencies of pitch accent patterns that is undertaken in the final chapter.

Returning to our analysis of how the voicing feature of obstruents will surface in Japanese, if the constraint Obs-Voi is dominated by the following faithfulness constraint for linking of a feature to a root node, then voiced obstruents can still surface.
(16) Max-Path-Feature: "For every path of association L between a feature F and a root node R in the input there is a corresponding path of association L' between feature F' and root node R' in the output, where L corresponds to L', F corresponds to F' and R corresponds to R'."

If Max-Path-Feature dominates Obs-Voi, then when a [+voice] feature has a path of association to a root node in the input, it must also have a path of association in the output. The tableau below shows how this constraint hierarchy will allow a voiced obstruent to surface.

(17) (A capital T represents a coronal stop that is unspecified for voicing.)

Input: /mado/ "window" = m a T o, with [+voi] linked to T

/mado/        Max-Path-Feature    Obs-Voi
a. mato       *!
b. mado                           *
c. maTo       *!                  *

Voiceless obstruents, on the other hand, can be underspecified for voicing and will surface as voiceless because of the grounded constraint.

(18)

Input: /maTo/ "target" = m a T o, with no voicing feature linked to T

/maTo/        Max-Path-Feature    Obs-Voi
a. mato
b. mado                           *!
c. maTo                           *!

The possibility of underspecifying the voicing feature allows us to characterize morphemes that undergo rendaku as being underspecified for voicing on the initial obstruent, while those that block rendaku have a pre-linked [-voice] feature. Lexical prespecification is the approach advocated by Inkelas et al. to deal with lexical exceptions. If, for example, nouns tuti "earth" or kasu "dregs", which never undergo rendaku, have an underlying prelinked [-voice] feature, they cannot voice. The following tableau shows how lexical prespecification of the feature [-voice] of the noun kasu "dregs" will prevent it from undergoing rendaku voicing in compounds.

(19) ("K" represents a velar stop unspecified for voicing.)

Input: /abura/ "oil" + /kasu/ "dregs", with [-voi] linked to the initial k of kasu

/abura/ + /kasu/    Max-Path-Feature    Rendaku    Obs-Voi
abura-kasu                              *
abura-gasu          *!                             *
abura-Kasu          *!                  *          *

If a morpheme is not prelinked, it is free to undergo rendaku (barring other constraints, to be discussed below).
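The evaluation logic behind these tableaux can be sketched computationally. What follows is a simplified illustration, not the analysis itself: it looks only at the voicing specification of a single obstruent, considers just three candidate outputs, and treats Rendaku as a placeholder constraint that penalizes any non-voiced initial obstruent of a second conjunct (the source likewise assumes such a constraint without yet deriving it). The names `Voicing`, `RANKING`, `violation_profile` and `optimal_output` are hypothetical.

```python
from typing import Optional, Tuple

# Voicing specification of one obstruent: "+" = [+voice], "-" = [-voice],
# None = unspecified for voicing (written T or K in the tableaux).
Voicing = Optional[str]


def max_path_feature(inp: Voicing, out: Voicing, in_compound: bool) -> int:
    """Violated when a voicing feature linked in the input is not kept in the output."""
    return int(inp is not None and out != inp)


def rendaku(inp: Voicing, out: Voicing, in_compound: bool) -> int:
    """Placeholder: the initial obstruent of a second conjunct should be voiced."""
    return int(in_compound and out != "+")


def obs_voi(inp: Voicing, out: Voicing, in_compound: bool) -> int:
    """Violated when an output obstruent is not on a path to [-voice]."""
    return int(out != "-")


# Constraint ranking, highest first: Max-Path-Feature >> Rendaku >> Obs-Voi.
RANKING = (max_path_feature, rendaku, obs_voi)
CANDIDATES: Tuple[Voicing, ...] = ("-", "+", None)


def violation_profile(inp: Voicing, out: Voicing, in_compound: bool) -> Tuple[int, ...]:
    """One row of a tableau: violations listed in ranking order."""
    return tuple(c(inp, out, in_compound) for c in RANKING)


def optimal_output(inp: Voicing, in_compound: bool) -> Voicing:
    """Strict domination amounts to lexicographic comparison of violation profiles."""
    return min(CANDIDATES, key=lambda out: violation_profile(inp, out, in_compound))


print(optimal_output("+", False))   # simplex /mado/: the linked [+voice] survives
print(optimal_output(None, False))  # simplex /maTo/: surfaces voiceless by default
print(optimal_output("-", True))    # second conjunct kasu: prelinked [-voice] blocks rendaku
print(optimal_output(None, True))   # unspecified second conjunct: rendaku applies
```

Comparing tuples lexicographically is one convenient way to model strict ranking: a single violation of a higher-ranked constraint outweighs any number of violations of lower-ranked ones.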
(20)

Input: /kuti/ "mouth" + /Kane/ "metal", with no voicing feature linked to K

/kuti/ + /Kane/    Max-Path-Feature    Rendaku    Obs-Voi
kuti-kane                              *!
kuti-gane                                         *
kuti-Kane                              *!         *

Footnote 19: For the time being, I will simply assume that some constraint exists that derives rendaku voicing. Further below I will develop a more detailed account of rendaku.

In summary, then, all cases of blocking of rendaku in headed "long" noun-noun compounds can be accounted for straightforwardly by the hypothesis that the initial obstruent of the second noun in a compound that undergoes rendaku is underspecified for voicing. Those nouns that block rendaku in compounds have a specified [-voice] feature on the initial obstruent in the input.

3.2.2 Rendaku voicing in compounds that are "too short"

Let us now examine noun-noun compounds in which neither member exceeds two moras. Consider first a sample of Yamato noun-noun compounds, chosen in the same manner as the sample in Appendix A (as described on page 28), in which the first member is bimoraic and the second member is monomoraic. This sample is given in Appendix B. Of the 147 words in this sample, 76, or 51.7%, fail to voice. Moreover, there are many second conjuncts that sometimes voice and sometimes do not. We see a great difference not only in the frequency but also in the predictability of rendaku voicing when we compare sets of compounds that are 3μ-2μ/2μ-3μ with those that are 2μ-1μ. When compounds meet a certain prosodic size requirement, rendaku is well-behaved and predictable. When compounds are "too short", rendaku becomes unpredictable, even if we lexically mark certain nouns with a specified [-voice] feature.

Let us look next at noun-noun compounds in which both conjuncts are bimoraic. A sample of 481 such compounds is listed in Appendix C. These compounds also do not meet the minimum length requirement because neither member exceeds two moras.
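The two mechanical conditions used in sorting the samples, eligibility for rendaku (including Lyman's Law) and the prosodic size requirement, can be stated as simple predicates. This is a rough sketch under assumptions: forms are in the phonemic romanization used here, mora counts are supplied by hand rather than computed, and the function names are hypothetical. Yamato origin is a lexical fact and is not modelled.

```python
VOICELESS_OBSTRUENTS = set("ksth")   # /h/ is treated as an obstruent, as above
VOICED_OBSTRUENTS = set("gzdb")


def eligible_for_rendaku(second: str) -> bool:
    """The second member starts with a voiceless obstruent and contains no voiced
    obstruent, whose presence would block voicing through Lyman's Law."""
    starts_voiceless = second[:1] in VOICELESS_OBSTRUENTS
    has_voiced_obstruent = any(seg in VOICED_OBSTRUENTS for seg in second)
    return starts_voiceless and not has_voiced_obstruent


def meets_size_requirement(moras_first: int, moras_second: int) -> bool:
    """Both members have at least two moras and at least one exceeds two."""
    shorter = min(moras_first, moras_second)
    longer = max(moras_first, moras_second)
    return shorter >= 2 and longer > 2


print(eligible_for_rendaku("tama"))   # True: t-initial, no voiced obstruent
print(meets_size_requirement(2, 3))   # True: a "long" compound
print(meets_size_requirement(2, 2))   # False: neither member exceeds two moras
```

Under these predicates, a 2μ-1μ or 2μ-2μ compound counts as "too short" however regular its members are elsewhere, which is the class whose voicing is described here as unpredictable.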
We find that compounds with this "short" prosodic length also have unpredictable accent patterns (Haruo Kubozono, personal communication): for example, yuki "snow" is unaccented, tama "ball" has final accent, and the compound yuki-dama "snow-ball" is unaccented. By comparison, the compound yuki-dosi "snow-year" has constituents with exactly the same accent patterns: tosi "year" has final accent, but the whole compound yuki-dosi is accented on its second mora.

In this database of 2μ-2μ Yamato noun-noun compounds the words are listed in order of the English translation of the second conjunct, so that compounds with the same underlying second conjunct all occur together. This way, we can observe, for each second conjunct, how it behaves with respect to rendaku voicing. Excluded from this main list are the following compounds, all of which have good reason to be considered "dvandva" compounds, which, as discussed above, never undergo voicing.

2μ-2μ dvandva compounds:

mino kasa    straw.raincoat bamboo.hat    "an old fashioned bamboo hat"
ura hara     rear belly                   "opposite"
oku soko     interior bottom              "deep bottom"
asi kosi     foot buttocks                "foot and buttocks"
kata kosi    shoulder loins               "shoulders and loins"
tosi tuki    year moon                    "a long time"
nabe kama    pan pot                      "pots and pans"
turu kame    crane turtle                 "cranes and turtles"

Also excluded from the list are compounds that have kuro "black" or siro/a "white" as one of their members. These two morphemes can behave either as inflected adjectives or as nouns. When they occur as the first member of a compound, they often block voicing. Because this sample deals only with noun-noun compounds, they have been excluded.

Some of the compounds in this database of 2μ-2μ compounds have second conjuncts that participate in very few compounds, so the evidence on their status with respect to blocking rendaku is not clear. Others, such as tuti "earth", occur in enough compounds to give us better evidence that they always block rendaku.
The following Yamato nouns are listed by Martin (1987) as never undergoing rendaku voicing:

(21) kita    "north"
     kasu    "dregs"
     kase    "shackles"
     tuti    "earth"
     hama    "beach"
     hasi    "edge"
     hima    "leisure"
     hime    "princess"
     tuya    "gloss"
     sita    "below"

In making lists of noun-noun compounds according to the blocking status of the second conjunct, we must make a few qualifications. First, some of these nouns are found in so few compounds that it is difficult to make any kind of reliable generalization about their voicing patterns. There are also other nouns that regularly voice in noun-noun compounds, but are blocked for voicing in verb-noun or adjective-noun compounds. This seems to be true, for example, of sato "hometown", which is voiceless in huru-sato "old village" but voices in all known noun-noun compounds. [20]

Of the 481 compounds in the sample of 2μ-2μ noun-noun headed compounds, 92 fail to undergo rendaku. This makes a voicing rate of 80.9% (389 of 481). On closer inspection, we find that (a) some second conjuncts never voice, (b) others often block voicing, but not in 100% of cases, and (c) others frequently voice or always voice.

The compounds in this list whose second members never voice are listed again in Appendix D. Let us for the moment call this group "Immune to Rendaku". This group accounts for 47 of the 92 compounds in the database that did not voice.

Those whose second conjuncts usually or always voice are listed in Appendix E. Adopting terms similar to those used by Haruo Kubozono (personal communication), who has made similar observations, let us call this group of nouns "Rendaku Lovers." These second conjuncts voice in at least 75% of the noun-noun compounds they occur in, and for many of these nouns, rendaku occurs in all Yamato noun-noun compounds of which they are the second conjunct.

Those whose second conjuncts often block voicing (i.e., more than 50% of the time) are listed in Appendix F. Let us call this group "Rendaku Haters."
Looking at this sample as a whole, we can see that these 2μ-2μ compounds behave differently as a group with respect to rendaku blocking than longer compounds. We saw that for compound words that met the prosodic length requirement, rendaku is completely predictable from the lexical representations of the component morphemes of the compound. For these compounds the only exceptions to rendaku for headed noun-noun compounds occurred when the second conjunct was a morpheme that always resists rendaku. But when compound words do not meet the prosodic length requirement, we find that rendaku can be blocked in cases other than where the second conjunct is "immune to rendaku" because of lexical prespecification of [-voice]. For example, the compound yuki-dama "snow-ball" voices but the compound mizu-tama (water-ball) "water droplet", with the same second conjunct, tama "ball", does not.

[20] Another factor that introduces unpredictability into patterns of rendaku voicing is the existence of compounds that are placenames and/or proper names. There are some nouns that fail to voice in placenames and proper names but otherwise voice regularly. This is especially true for very old names. For example, hata "flag" voices regularly in compounds that are not proper nouns but does not voice in the name of the small village Ko-hata (also spelled Ko-wata) (lit. "tree-flag") near Kyoto. And sawa "swamp", which voices regularly in noun-noun compounds, does not voice in the name of the famous Japanese film director Kuro-sawa "black swamp". (This may also be due to the fact that kuro "black" is an adjective.) Given these facts about proper nouns, it may be necessary to specify compound proper nouns lexically to a greater degree than is the case for regular compounds. That is, a placename like Kohata may have to be listed lexically not just as references to the two constituent morphemes but with further phonological details in the listing.
To account for the lack of voicing in mizu-tama, we cannot appeal to the idea that tama has a prelinked [-voice] feature that blocks voicing. If it did, then the voicing in yuki-dama would be a mystery. Yet for long compounds, we never find pairs like yuki-dama and mizu-tama. We only get blocking of rendaku in cases where the second conjunct is one that always resists voicing. The problem then is to account for why blocking cases like mizu-tama only occur in short compounds.

Let us call this phenomenon the "Prosodic Size Factor": compounds that meet a prosodic size requirement are completely predictable with respect to rendaku voicing, if we factor out cases of nouns such as tuti "earth" that always resist rendaku voicing. Compounds that do not meet this prosodic size requirement are subject to unpredictability with respect to rendaku voicing. I will return to a discussion of this prosodic size factor on page 70.

Even though bimoraic nouns that occur as the second member of a "short" 2μ+2μ compound show properties of being "rendaku lovers" or "rendaku haters", these properties are only visible when these nouns occur in "short" compounds. If, for example, a noun that is a "rendaku hater" occurs with a first conjunct that exceeds two moras, the compound will voice. For example, the following is a list of noun-noun compounds whose second member is kusa "grass".
(22) ao kusa        "green grass"
     buta kusa      "pig grass"
     haru kusa      "spring grass"
     huyu kusa      "winter grass"
     ira kusa       thorn-grass "nettle"
     kara-kusa      Chinese grass "arabesque"
     miti kusa      "path grass"
     mizu kusa      "water grass"
     nana kusa      seven grass "grasses"
     natu kusa      "summer grass"
     siba kusa      brushwood grass "lawn"
     sira kusa      "white grass"
     sita kusa      under grass "undergrowth"
     tami kusa      "people grass"
     turu kusa      "vine grass"
     ume kusa       "plum grass"
     no-gusa        "wild grasses"
     mo-gusa        duckweed-grass "water plants"
     hituzi gusa    "sheep grass"
     hotaru gusa    "firefly grass"

The noun kusa "grass" acts like a "rendaku hater" in 2μ+2μ compounds, where it never voices. As we would predict, however, the last two compounds, "sheep grass" and "firefly grass", in which N1 is trimoraic and so exceeds two moras, do voice. Among the other 18 compounds, voicing occurs in only two. The noun kusa "grass" is evidently a morpheme that "hates rendaku."

The noun kuse "habit" participates in fewer compounds, but it also shows a strong tendency to resist rendaku, with voicing occurring in only 1 out of 5 compounds.

(23) kuse "habit"
     kami kuse    "hair habit"
     sake kuse    "sake habit"
     asi kuse     foot habit "way of walking"
     nana kuse    seven habit
     kuti guse    mouth habit "way of speaking"

At the opposite pole to kusa and kuse are nouns like kumo "cloud", kata "shape", kami "paper", kane "metal", hue "flute" and hako "box" that usually or always voice in short compounds. These nouns are evidently "rendaku lovers."
(24) kumo "cloud" (voices in 8 out of 10 short compounds)
     mura kumo    clump cloud "cloud masses"
     yami kumo    "dark cloud"
     natu gumo    "summer cloud"
     kaza gumo    "wind cloud"
     hata gumo    "flag cloud"
     asa gumo     "morning cloud"
     yoko gumo    "side cloud"
     yuki gumo    "snow cloud"
     wata gumo    "cotton cloud"
     ama gumo     "rain clouds"

     kata (shortened form of katati "shape") (voices in 12 out of 12 short compounds)
     yama gata    "mountain shape"
     kuwa gata    "hoe shape"
     kasa gata    "umbrella shape"
     nami gata    "wave shape"
     hana gata    flower shape "floral pattern"
     masu gata    measure shape
     kusi gata    skewer shape
     asi gata     foot shape
     maru gata    circle shape
     yumi gata    bow shape
     kumo gata    bear shape
     hizi gata    elbow shape

     kami "paper" (voices in 16 out of 16 short compounds)
     kasa gami    umbrella paper
     iro gami     colour paper
     hari gami    needle paper
     tane gami    seed paper
     tiri gami    dust paper
     yoko gami    side paper
     wara gami    straw paper
     ita gami     board paper
     ao gami      green paper
     uwa gami     upper paper
     kata gami    type paper
     hasi gami    chopsticks paper
     nosi gami    dried.sea.ear
     kabe gami    wall paper
     obi gami     belt paper
     hana gami    tissue paper

     kane "metal" (voices in 9 out of 9 short compounds)
     muda gane    useless metal
     obi gane     belt metal
     ura gane     rear metal
     ara gane     chaff metal
     sita gane    white metal
     hari gane    needle metal
     kuti gane    mouth metal
     tubo gane    pot metal
     hizi gane    elbow metal

     hue "flute" (voices in 12 out of 12 short compounds)
     mugi bue     barley flute
     musi bue     insect flute
     yoko bue     side flute
     kusa bue     grass flute "reed pipe"
     kizi bue     pheasant flute
     siba bue     brushwood flute
     asi bue      foot flute
     nodo bue     throat flute
     sino bue     bamboo flute
     tuno bue     horn flute
     kuti bue     mouth flute
     yubi bue     finger flute

     hako "box" (voices in 16 out of 16 short compounds)
     buta bako    pig box "police cell"
     esa bako     bait box
     geta bako    geta box "shoe rack"
     gomi-bako    garbage box
     hari bako    needle box "sewing box"
     hude-bako    pencil box
     kami bako    paper box
     kane bako    metal box
     kara bako    empty box
     kugi bako    nail box
     kusi bako    skewer box
     kuzu bako    trash box
     su-bako      nest box, hive
     te-bako      hand box "small box for valuables"
     uwa-bako     outer box
     ki bako      wooden box

When we look at the whole database of 2μ-2μ compounds that were collected, we can make the following observations. There are far more nouns that "like rendaku" (i.e. voice in 66% or more of the compounds they occur in) than nouns that resist rendaku (i.e. voice in fewer than 33% of the compounds they occur in). It is also noteworthy that only one noun was found that voiced in between 33% and 66% of the noun-noun compounds it occurs in: kawa "skin", which voices in 2 out of the 6 compounds that were found. In other words, there is an apparent tendency for a noun either to strongly prefer to voice or else to resist voicing, with almost no nouns occupying a middle ground between the two tendencies. The following is a list of "rendaku haters" that productively occur in noun-noun compounds:

(25) "Rendaku haters" (usually do not voice in compounds but do voice in more than 0% of compounds)
     hara "field"    voiced in 4/9 N-N compounds
     kusa "grass"    voiced in 0/16 N-N 2μ-2μ compounds, but does voice in both longer and shorter compounds
     kuse "habit"    voiced in 1/5 N-N compounds
     kawa "skin"     voiced in 2/6 2μ-2μ N-N compounds

By contrast, there were 120 nouns in the sample that voiced more than 75% of the time, and the vast majority of these voiced in virtually all noun-noun compounds in which they occurred. Of this group, the following 24 nouns were productive in forming compounds (i.e. occurred in at least 7 noun-noun compounds). Beside each noun is listed the number of compounds in which it voiced in relation to the number of compounds in which it occurred.
(26) (a) always voice
     huro    bath      10/10
     hara    belly     9/9
     hune    boat      11/11
     hone    bone      9/9
     kutu    boot      8/8
     soko    bottom    10/10
     hako    box       13/13
     kaki    fence     8/8
     hana    flower    9/9
     hue     flute     12/12
     kane    metal     9/9
     kuti    mouth     15/15 (but blocks in as many non-noun-noun compounds)
     kami    paper     17/17
     hito    person    9/9
     kata    shape     12/12
     sora    sky       7/7
     kawa    side      7/7
     tana    shelf     7/7
     kiwa    brink     7/7

     (b) usually voice
     kura    storehouse    6/7
     tama    ball          6/8
     kumo    cloud         8/10
     tori    bird          5/7
     siru    broth         6/7

When we factor out this group of "rendaku lovers" and look at the remaining 45 cases of rendaku blocking in the sample, we find that they are not at all randomly distributed but rather occur mainly when certain nouns form the second conjunct of the compound. For example, 29 of these 45 cases of blocking occurred in compounds whose second member was one of the four nouns in (25) (rendaku haters). Of the remaining 16 cases, 8 occurred in the five nouns in (26) (b) (rendaku lovers that block in a few cases). These results suggest that rendaku blocking, rather than occurring randomly, is strongly influenced by lexical factors.
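The blocking totals cited in the preceding paragraph can be cross-checked against the counts in (25) and (26) (b). The following Python sketch is illustrative only: the dictionary and function names are hypothetical, and the figures are simply the (voiced, total) pairs copied from (25) and (26) (b).

```python
from fractions import Fraction

# (voiced, total) counts copied from (25); the names here are illustrative only.
HATERS = {"hara": (4, 9), "kusa": (0, 16), "kuse": (1, 5), "kawa": (2, 6)}

# (voiced, total) counts copied from (26)(b).
USUAL_VOICERS = {
    "kura": (6, 7),
    "tama": (6, 8),
    "kumo": (8, 10),
    "tori": (5, 7),
    "siru": (6, 7),
}


def blocked_total(counts):
    """Sum, over a group of nouns, the compounds in which rendaku is blocked."""
    return sum(total - voiced for voiced, total in counts.values())


def voicing_rate(voiced, total):
    """Exact proportion of a noun's compounds that undergo rendaku."""
    return Fraction(voiced, total)


if __name__ == "__main__":
    print("blocked among (25):", blocked_total(HATERS))
    print("blocked among (26)(b):", blocked_total(USUAL_VOICERS))
    for noun, (voiced, total) in {**HATERS, **USUAL_VOICERS}.items():
        print(f"{noun}: {float(voicing_rate(voiced, total)):.0%}")
```

With these counts the two sums come to 29 and 8, the figures given above for blocking among the nouns in (25) and in (26) (b).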
There are thus two things that an account of rendaku voicing needs to take account of:
(a) the fact that blocking only occurs in "short" compounds (with the exception of blocking by nouns such as tuti that resist rendaku without exception)
(b) the fact that blocking is subject to lexical patterns: some morphemes regularly participate in resisting rendaku; others show a strong preference for voicing

To recap: Yamato nouns appear to be of three main types with respect to rendaku voicing:
(a) nouns immune to rendaku: we have posited that these nouns are prespecified with a [-voice] feature linked to their initial obstruent
(b) nouns that "hate rendaku": all but one of these nouns voice in fewer than 33% of the compounds they occur in
(c) "rendaku lovers": nouns that voice in more than 66% of the compounds they occur in

We find only one noun that occupies a middle ground with respect to its inclination to voice: all the other nouns surveyed either voiced in fewer than 33% of the compounds in which they occurred, or else they voiced in more than 66% of compounds. Before developing an account of why rendaku blocking in noun-noun compounds patterns in this way, let us first investigate more closely the nature of rendaku itself as a phonological process. In §3.4 I shall propose an account of rendaku blocking in noun-noun compounds.

3.3 An account of rendaku voicing

3.3.1 Previous analyses of rendaku

Let us first review two proposals for explaining rendaku voicing in Ito and Mester (1998). Their first proposal is that rendaku is caused by a "junctural morpheme" with a [+voice] feature (I&M p. 27).
But one might ask: why should the difference in voicing between asi-kuse (foot-habit) "way of walking" and kuti-guse (mouth-habit) "way of speaking" (discussed in (2)), or the difference in voicing between buta-kusa "pig-grass" and hituzi-gusa "sheep-grass", be determined by the lack of a junctural morpheme in the former compound of each pair and the presence of such a morpheme in the latter, in spite of the great morphological similarity between the members of each pair? There seems to be nothing that could distinguish one member of pairs like these from the other in terms of the presence or absence of a morpheme.

Although there is historical evidence that rendaku voicing originally developed as a reflex of nC sequences created by shortening of enclitic no to a mora nasal, it is difficult to derive all modern instances of rendaku from the presence of a junctural morpheme. For example, as documented in Martin (1987), reduplicated nouns often exhibit rendaku voicing, but there is no historical evidence that these reduplicated forms ever had a morpheme like the particle no occurring between the base and the reduplicant that could have evolved into a [+voice] feature. In (27) are examples of reduplicated forms that show rendaku voicing. [21]

(27) sima "island"    sima-zima "islands"
     hito "person"    hito-bito "people"
     hi "fire"        hi-bi "fires"
     hana "flower"    hana-bana- "brilliant"
     ki "tree"        ki-gi "trees"

[21] In contrast to reduplicated nouns, mimetic words showing full reduplication (see Mester & Ito (1988), Ito & Mester (1995)) do not ever appear to undergo rendaku voicing (e.g. kata-kata "hitting sound" (*kata-gata)).

A second reason why I will not pursue a morphological account of rendaku voicing is that it cannot account for the kind of systematicity that we observe in exceptions to voicing, where nouns tend to be either "rendaku haters" that robustly block voicing or "rendaku lovers" that rarely or never block voicing.
Suppose, for example, that rendaku were derived from a junctural morpheme that consisted of a floating [+voice] feature. Cases of blocking could not be accounted for by prespecification of some other feature such as a floating [-voice] feature, since in nouns like kusa "grass", with two voiceless obstruents, nothing in the grammar would force this [-voice] feature to link in the output to the /k/ as opposed to the /s/. High ranking of an alignment constraint for the [-voice] feature such as the following would not work, since such a constraint would block voicing in all cases for nouns like kusa.

no "field"    kusa "grass"    no-gusa (voicing occurs)

(28) /no/ + [+voice] + /kusa/    Align[-voice]-Left-Morpheme-Left    Max-[+voice]
                                 (ranked too high)
     ☞ no-kusa (wrong result)                                        *
        no-gusa                  *!

This would mean that even with high ranking of a Max constraint for the [-voice] feature, rendaku cannot be blocked by lexical prespecification of such a feature:

miti "path"    kusa "grass"    miti-kusa (rendaku blocking)

(29) [-voice] {/miti/ + [+voice] +} /kusa/            Max[-voice]    Max[+voice]
        miti-kusa, with [-voice]                                     *!
     ☞ miti-gusa, with [+voi] [-voi] (wrong result)

Inclusion of [-voice] in the listing fails to block rendaku. This means that under a morphological account of rendaku, the only way we can account for blocking is either (a) to have the junctural morpheme fail to appear in compounds that block, which would be inexplicable, or (b) to have blocking compounds all listed in the lexicon, for example as /miti-kusa/. An immediate problem with this latter option is that under such an account, the systematicity of rendaku blocking that we observe becomes a total mystery. If any compound can be listed in such a way then we would predict that blocking should occur in a totally random fashion, which is not what we find. Accordingly, I shall not pursue a morphological account of rendaku here. Let us next consider a phonological account of rendaku that is proposed by Ito & Mester (1998).
In their appendix, Ito and Mester (1998) suggest an alternative account of rendaku based on the sequential markedness of the sequence [+voice][-voice]. Their constraint *[+-]voi is violated by any sequence of features in the output that consists of a [+voice] feature immediately followed by a [-voice] feature. This constraint would be violated by the pair of [±voice] features that occur across the morpheme boundary in a compound word that did not undergo rendaku. For example, the compound nama-gome "raw rice", formed from the members nama "raw" and kome "rice", voices in the tableau on their page 56 in order to avoid a sequential markedness violation. Their tableau is reproduced below in (30). Their constraint Ident[-voi] requires corresponding segments to agree in their values for [-voice].

(30) (I & M #152)
     /nama kome/ 'raw' 'rice'    *[+-]voi    Ident[-voi]
        nama kome                *!
     ☞ nama gome                             *

In order to explain why voiceless obstruents do not voice when preceded by vowels in underived environments (e.g. in mato "target"), they also employ a sequential faithfulness constraint Ident[+-]voi, which outranks *[+-]voi. Ident[+-]voi will prevent the obstruent /t/ from voicing in a word like mato "target", as shown in (31), based on Ito & Mester (1998) (p. 60, example (161)):

(31) /mato/ "target"             Ident[+-]voi    *[+-]voi    Ident[-voi]
     (/a/ [+voi], /t/ [-voi])
     ☞ mato                                      *
        mado                     *!                          *

The candidate mado violates Ident[+-]voi because it does not preserve the [+voice][-voice] sequence of the /a/ /t/ sequence in the input. Their account requires that both the vowel and the voiceless obstruent be specified for voicing in the input.

The constraint Ident[+-]voi will not prevent rendaku voicing, since the relevant [+voice][-voice] sequence spans a morpheme boundary and is therefore not present as a sequence in the input.

(32) /nama kome/ 'raw' 'rice'    Ident[+-]voi    *[+-]voi    Ident[-voi]
        nama kome                                *!
     ☞ nama gome                                             *

In order to deal with the issue of high vowel devoicing, however, their account would have to be modified so that the constraint Ident[+-]voi does not prevent the devoicing of high vowels that occurs between two voiceless obstruents.

For example, a word like kusa "grass" devoices the /u/ between the voiceless obstruents /k/ and /s/. This output violates the constraint Ident[+-]voi because, in Ito and Mester's account, the sequence /u/ /s/ must be specified with voicing features in the input, just like the input features of /a/ and /t/ in mato. The output kusa with a voiceless /u/ fails to preserve the [+voice][-voice] sequence of the /u/ and /s/. In order to prevent Ident[+-]voi from blocking high vowel devoicing, the constraint responsible for high vowel devoicing must be ranked above Ident[+-]voi.

But even if we rank Hi-Vowel-Devoicing higher than Ident[+-]voi, we still have a problem in that the constraint Ident[+-]voi is violated by the candidate that actually surfaces: kusa, with a voiceless /u/. As shown in (33), this is a problem because the constraint Ident[+-]voi is not violated by a candidate that voices the initial obstruent: gusa. The candidate gusa would be considered more optimal than kusa with voiceless /u/ because we are forced to rank Ident[-voi] lower than the two constraints that involve sequences of voicing features. If we ranked Ident[-voi] higher than Ident[+-]voi, we would get the wrong results for rendaku voicing, as shown in (34).

(33) Input: /kusa/ "grass"
     Ranking: Hi-V-Dev >> Ident[+-]voi >> *[+-]voi >> Ident[-voi]
     Candidates: kusa; kusa with [-voi] on /u/; gusa; gusa with [-voi] on /u/
     Outcome: gusa is wrongly selected over the attested kusa with voiceless /u/

(34) Wrong result for rendaku if we rank Ident[-voi] higher than Ident[+-]voi:

     /nama kome/ 'raw' 'rice'    Ident[-voi]    Ident[+-]voi    *[+-]voi
     ☞ nama kome (wrong result)                                 *
        nama gome                *!

There are two possible ways in which this problem could be remedied.
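Before turning to the two remedies, the ranking dilemma just described can be made concrete with a small evaluation sketch in Python. It is only an illustration: the function name `optimal` is hypothetical, and the violation profiles are my reading of the prose around (32)-(34) (three of the kusa candidates only), not a transcription of Ito & Mester's tableaux.

```python
def optimal(ranking, candidates):
    """Pick the candidate whose violation vector, read from the highest-ranked
    constraint down, is lexicographically smallest (standard OT evaluation)."""
    def profile(name):
        return tuple(candidates[name].get(c, 0) for c in ranking)
    return min(candidates, key=profile)


# Ranking discussed in the text: Hi-V-Dev >> Ident[+-]voi >> *[+-]voi >> Ident[-voi]
RANKING = ["Hi-V-Dev", "Ident[+-]voi", "*[+-]voi", "Ident[-voi]"]
# Alternative with Ident[-voi] promoted, as considered in (34).
RERANKED = ["Ident[-voi]", "Hi-V-Dev", "Ident[+-]voi", "*[+-]voi"]

# Assumed violation profiles for /kusa/, following the prose around (33).
KUSA = {
    "kusa (voiced u)": {"Hi-V-Dev": 1, "*[+-]voi": 1},
    "kusa (voiceless u)": {"Ident[+-]voi": 1},
    "gusa (voiced u)": {"*[+-]voi": 1, "Ident[-voi]": 1},
}

# Assumed violation profiles for /nama kome/, following (32) and (34).
NAMA = {
    "nama-kome": {"*[+-]voi": 1},
    "nama-gome": {"Ident[-voi]": 1},
}

if __name__ == "__main__":
    print(optimal(RANKING, NAMA))   # rendaku is derived
    print(optimal(RANKING, KUSA))   # but gusa beats the attested kusa
    print(optimal(RERANKED, KUSA))  # promoting Ident[-voi] rescues kusa
    print(optimal(RERANKED, NAMA))  # at the cost of losing rendaku
```

Under these assumed profiles, no ordering of Ident[-voi] relative to the two sequential constraints selects both nama-gome and kusa with a voiceless /u/, which is the problem the two remedies address.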
One is to modify the constraint Ident[+-]voi so that it is only violated when a [+voi][-voi] sequence in the input surfaces as [+voi][+voi] but not when it surfaces as [-voi][-voi]. This could be done, for example, through Local Conjunction. The constraint Ident[+-]voi could be conjoined with the constraint Ident[-voi].

(35) Ident[+-]voi & Ident[-voi]: "For every [+voice][-voice] sequence in the input there must be a corresponding [+voice][-voice] sequence in the output AND for every [-voice] feature in the input there must be a corresponding [-voice] feature in the output."

If this conjoined constraint is substituted for the simplex constraint Ident[+-]voi, it will prevent a word like mato "target" from voicing the /t/, since such an output would violate both Ident[+-]voi and Ident[-voi]. It will not, however, be violated in a case of high vowel devoicing, since a vowel that is [+voice] in the input will not violate Ident[-voi] when it surfaces as voiceless. This will enable us to derive the correct output for a word like kusa, which has a devoiced high vowel. [22]

(36) Input: /kusa/ "grass"
     Ranking: Hi-V-Dev >> Ident[+-]voi & Ident[-voi] >> *[+-]voi >> Ident[-voi]
     Candidates: kusa; kusa with [-voi] on /u/; gusa; gusa with [-voi] on /u/
     Optimal output: kusa with [-voi] on /u/

[22] As pointed out by Armin Mester (personal communication), high vowel devoicing has a number of properties that suggest that it is a phonetic process rather than a morphophonemic process like rendaku. For example, he points out that (a) it is exceptionless, and (b) it has properties that may be better analysed as phonetic rather than phonological ones: e.g. it is phonetically gradual; it depends on intrinsic vowel duration; and it to some extent affects non-high vowels as well. (See, for example, Han (1962).) If this characterization of high vowel devoicing is correct, then we should not expect it to interact phonologically with rendaku voicing. In chapter 8 I make a similar proposal for the boundary L tone of Tokyo dialect pitch accent: that it is a phonetic effect that does not interact with phonological tones.

A second possibility would be to have a high-ranking constraint that bans a voiced obstruent at the beginning of an accentual phrase. This would not affect rendaku voicing since rendaku never occurs at the left edge of an accentual phrase. A compound word that undergoes rendaku must be considered to constitute one accentual phrase in order to explain the fact that it has only one pitch accent. Independent evidence for high ranking of a markedness constraint that bans word-initial obstruent voicing is the fact that very few Yamato words have an initial voiced obstruent. (See Appendix G.)

A second complication with their account is that, as it stands, it depends on the fact that the final vowel of the first conjunct of the compound has a [+voice] feature. This would predict incorrectly that when this final vowel experiences high vowel devoicing, which occurs for high vowels between two voiceless obstruents, rendaku will not occur on the following obstruent, since there is no marked [+-]voi sequence to repair.

There are many examples of compounds that undergo rendaku that would have final devoiced vowels in the first conjunct if rendaku were not to occur. The following are just a few.

(37) tutu-guti      "pipe-mouth"
     ti-gusa        "thousand-plants"
     nosi-bukuro    "dried.sea.ear-bag"
     asi-ge         "foot-hair"
     kuti-bi        "mouth-fire"
     matu-ba        "pine-leaf"
     musi-ba        "insect-tooth"
     isi-bai        "stone-ash"
     yuki-dama      "snow-ball"

Ito and Mester's sequential markedness account would incorrectly predict, for example, that yuki-dama should surface with no rendaku voicing:

(38) Input: yuki "snow" + tama "ball"
     Ranking: Hi-V-Devoicing >> Ident[+-]voi & Ident[-voi] >> *[+-]voi >> Ident[-voi] [23]
     Candidates: yuki-tama and yuki-dama, each with alternative voicing specifications for the final /i/ of yuki
     Outcome: yuki-tama, with a devoiced final /i/ in yuki, is wrongly selected
Following a suggestion from Douglas Pulleyblank (personal communication), Ito & Mester's sequential markedness account could be modified by relativizing their "Ident" faithfulness constraint with respect to the difference between sonorants and obstruents. For example, a constraint of their "Ident" type that applies to sonorants could be ranked above an "Ident" constraint for obstruents.

[23] In the candidate yuki-tama, devoicing of the final /i/ in yuki is fully expected, since it devoices in compounds that do not experience rendaku voicing such as the following:
     yuki-kemuri      snow-smoke
     yuki-korogasi    snowball-rolling
     yuki-sigure      snow-shower
     yuki-situ
     yuki-tubute      snow-pebble
     yuki-turi        snow-fishing
     yuki-hada        snow-skin
     yuki-humi        snow-tread
     yuki-huri        snow-fall
The voiceless /i/ on yuki in these words is indicated in their listings in the NHK pronunciation dictionary.

(39) Input: yuki "snow" + tama "ball"
     Constraints: Hi-V-Devoicing, Ident[+-]voi & Ident[-voi], *[+-]voi, Ident[sonor], Ident[obst]
     Candidates: yuki-tama and yuki-dama, each with alternative voicing specifications for the final /i/ of yuki
     Optimal output: yuki-dama

This would derive the correct result for compounds such as yuki-dama, where an output without rendaku would devoice the vowel /i/, as in the first candidate above.

Ito & Mester's sequential markedness account in their 1998 paper is provided as an alternative to their morphemic account of rendaku voicing. While their morphemic account is intended to capture lexical exceptions to rendaku, their sequential markedness account does not appear to be so intended. Prespecification of a non-voicing noun or a compound with some feature such as [-voice] will not block rendaku, since it will have no effect on the rendaku-deriving constraint *[+-]voi, which is only concerned with outputs.
Because Ito & Mester's two accounts of rendaku either require a junctural morpheme or else cannot readily capture lexical exceptions through lexical prespecification, I shall propose, in §3.3.2, an alternative account of rendaku voicing that depends neither on the existence of a "junctural morpheme" nor on the idea of repairing a marked sequence of [+voice] followed by [-voice].

3.3.2 An alternative approach to rendaku

In this account of rendaku voicing, I will abstract away from the morphological problem of why rendaku occurs for certain types of morphologically complex constructions but not others. For example, as we observed on page 28, dvandva compounds uniformly resist rendaku. Verb-verb compounds also often resist rendaku. (See Rosen (1999) for an examination of Japanese verb-verb compounds.) Whether or not suffixes of nouns undergo rendaku is impossible to determine because there are, to my knowledge, no suffixes that occur with Yamato nouns. There are also no case-particle clitics such as ga (NOM), o (ACC), ni (DAT) or no (GEN) that begin with voiceless obstruents. The only postposition that begins with a voiceless obstruent is to "with; and," which never voices, suggesting either that rendaku does not occur on enclitics that occur with nouns or that to is marked as [-voice]. No account of rendaku voicing has yet, to my knowledge, accounted for why voicing occurs productively only in certain types of morphological constructions, such as noun-noun compounds. A principled explanation of why rendaku is restricted to certain morphological constructions is a matter for further research.

Let us begin by noting an apparent paradox in the fact that voiced obstruents in Japanese show marked status, as discussed by Ito and Mester (1998), yet rendaku voicing is a process that appears to favour the appearance of this marked feature combination [-sonorant, +voice].
We might expect voiceless obstruents to be favoured over voiced obstruents as the initial member of the second conjunct of a compound, but clearly they are not. Because voiceless obstruents represent the unmarked value for voicing of obstruents in Japanese, it is possible for them to be underspecified in the input, following the discussion on page 30. The idea that voiceless obstruents can be underspecified in the input, and their [-voice] value derived through the constraint Obs-Voi (see (15)), will be the key to the explanation of rendaku voicing that I will pursue here.

As discussed on page 30, the fact that certain morphemes uniformly resist rendaku can be accounted for if we adopt the following two hypotheses: (a) voiceless obstruents can be underspecified for the feature [-voice] in the input, but morphemes that resist rendaku have a prespecified initial voiceless obstruent; (b) voiceless obstruents are fully specified for voicing in the output. When a morpheme has a prespecified voiceless initial obstruent, it will resist rendaku voicing in compounds because of the highly ranked faithfulness constraint Max-Path-Feature, introduced in (16). As far as specification of voiceless obstruents in the output is concerned, the requirement that an obstruent be specified for voicing in the output could be due to an undominated markedness constraint such as the following:

Obst[±voice]: Every [-sonorant] feature in the output must be on a path with a voicing feature of the type [±voice].

If voiceless obstruents are underspecified in the input but fully specified in the output, then they will violate the following faithfulness constraint (40) when they surface fully specified as voiceless:

(40) Dep[-voice]: "For every [-voice] feature in the output, there is a corresponding [-voice] feature in the input."
This constraint is violated if a voiceless obstruent surfaces in a case where its input is underspecified for the [-voice] feature, since there will be no corresponding [-voice] feature in the input. Rendaku voicing can be seen as an avoidance of a violation of this constraint, since when rendaku voicing occurs, Dep[-voice] is not violated, whereas it would be if the morpheme-initial obstruent on the second conjunct of a compound surfaced as voiceless.

If voiceless obstruents are, at least in certain cases, underspecified in the input, then the constraint Dep[-voice] clearly should not have an effect on outputs everywhere, or else we could never realize voiceless obstruents on the surface without fully specifying them in the input. But having this constraint effective at the beginning of the second conjunct of a compound word will enable us to capture the effect of rendaku voicing. In order to localize the effect of the Dep[-voice] constraint, we introduce another constraint that captures the fact that rendaku occurs only at a boundary between two morphemes, never at the beginning of a word. The fact that rendaku occurs only at this location can be captured by the idea that this position is the locus of a constraint violation: namely the mismatch in alignment between the left edge of a morpheme and the left edge of a compound word. The following constraint makes the locus of rendaku a locus of constraint violation.

(41) Align-Morpheme-Left-Word-Left: "The left edge of every morpheme must be aligned with the left edge of a word."

This constraint will be violated by a compound word at the morpheme boundary within the compound but not by a simplex or compound word at its boundary with other phrasal material. Now, to express the fact that it is at the morpheme boundary that rendaku voicing wants to avoid a violation of Dep[-voice], we can combine the effects of (40) and (41) by employing the principle of local conjunction.
If constraints C1 and C2 are conjoined in domain D, then the conjoined constraint is violated iff both C1 and C2 are violated in D. A domain is some phonological or morphological constituent such as a prosodic category, a morpheme, a morphological word, etc.

To account for rendaku voicing, I propose that the two constraints Dep[-voice] (40) and Align-Morpheme-Left-Word-Left (41) are conjoined in the domain of the root node. This means that the conjoined constraint is violated iff both Dep[-voice] and Align-Morpheme-Left-Word-Left are violated with respect to the same root node. For example, in a compound word, the leftmost root node of the second conjunct is not aligned with the left edge of the whole compound, since all the root nodes of the first conjunct precede it within the same word. This root node is also the site of a violation of Dep[-voice] if it is on a path with a [-voice] feature in the output but no such feature in the input.

The constraint Align-Morpheme-Left-Word-Left effectively picks out the left edge of the second conjunct of a compound as a marked position, because the left edge of this conjunct does not align with the left edge of the whole compound word. The root node at this left edge of the second conjunct is the locus of violation of the constraint Align-Morpheme-Left-Word-Left. Accordingly, the conjoined constraint will be violated if the constraint Dep[-voice] is also violated at this root node. This is precisely what will happen if an initial obstruent on the second conjunct is underspecified in the input and does NOT undergo rendaku.

(42) Dep[-voice] & Align-Morpheme-Left-Word-Left: "The left edge of every morpheme must be aligned with the left edge of a word AND for every [-voice] feature in the output, there is a corresponding [-voice] feature in the input."

This constraint will prevent a default [-voice] feature from surfacing where it normally would.
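The way the conjoined constraint in (42) picks out a single root node can be sketched in Python. This is only an illustrative sketch under stated assumptions, not part of the analysis itself: the `RootNode` class, its field names, and the helper functions are hypothetical, and voicing is encoded as `True` for [+voice], `False` for [-voice], and `None` for no specification. Vowels are given [+voice] in both input and output purely for convenience.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RootNode:
    """One output root node (hypothetical encoding, for illustration only)."""
    symbol: str
    out_voice: Optional[bool]   # True = [+voice], False = [-voice], None = unspecified
    in_voice: Optional[bool]    # corresponding input feature, None if underspecified
    morpheme_initial: bool = False
    word_initial: bool = False


def violates_dep_minus_voice(node: RootNode) -> bool:
    # (40): an output [-voice] with no corresponding [-voice] in the input
    return node.out_voice is False and node.in_voice is not False


def violates_align(node: RootNode) -> bool:
    # (41): a morpheme's left edge that is not also the left edge of the word
    return node.morpheme_initial and not node.word_initial


def conjoined_violations(nodes: List[RootNode]) -> int:
    # (42): violated only where BOTH conjuncts are violated at the same root node
    return sum(violates_dep_minus_voice(n) and violates_align(n) for n in nodes)


def second_conjunct(initial_voice: bool) -> List[RootNode]:
    # hasi "bridge" as second member of isi-hasi / isi-basi; its obstruents
    # are underspecified in the input, so in_voice is None for them
    first = "b" if initial_voice else "h"
    return [
        RootNode(first, initial_voice, None, morpheme_initial=True),
        RootNode("a", True, True),
        RootNode("s", False, None),
        RootNode("i", True, True),
    ]


print(conjoined_violations(second_conjunct(initial_voice=False)))  # isi-hasi
print(conjoined_violations(second_conjunct(initial_voice=True)))   # isi-basi
```

In this encoding the medial s of hasi violates Dep[-voice] on its own but never the conjunction, because it is not at a misaligned morpheme edge.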
The constraint cannot apply in any other location, since it is only at the left edge of the second conjunct of a compound that the alignment constraint will be violated. The idea behind this constraint is that faithfulness is relativized to certain domains. It is worse to violate a faithfulness constraint in a marked position than in an unmarked position.[24]

The following faithfulness constraint will be violated by outputs that undergo rendaku voicing.

(43) Dep[+voice]: "Any [+voice] feature that occurs in the output must occur in the input."

This constraint must be ranked lower than our proposed pro-rendaku constraint Dep[-voice] & Align-Morpheme-Left-Word-Left in order to allow rendaku voicing to occur. Consider, for example, a compound like isi-basi "stone bridge" that undergoes rendaku voicing. Ranking the constraint Dep[-voice] & Align-Morpheme-Left-Word-Left above Obs-Voi will correctly derive rendaku in this compound.

Given lack of any evidence for underspecified voicing features in Japanese outputs, I will assume that for these features, outputs are fully specified. I will also assume that terminal features are binary: e.g. [±voice].

[24] As discussed in footnote 18, I abstract away here from the morphological question of why rendaku voicing occurs rarely in some types of compounds, such as dvandva compounds (never occurring) or inflectible verb-verb compounds (infrequently). We also find no evidence for rendaku voicing occurring in suffixes (e.g. the nominalizing suffix -sa, which occurs with Yamato adjectives, or verbal inflectional suffixes such as -ta (perfective)).

I propose that obstruents that are underspecified for voicing will not surface as such, owing to a highly ranked constraint that requires obstruents to have a [±voice] feature in the output:

[-son]/[±voice]: "Every [-sonorant] specification in the output must be on a path to a [±voice] feature."
(44) Underspecified [-voice] feature on the initial obstruent of hasi "bridge"

Input: iSi + HaSi        [-son]/[±voice]   Dep[-V] & AL-M-L-WD-L   Obs-Voi   Dep[+voi]
"stone" "bridge"
   a. isi-hasi                               *!
☞ b. isi-basi                                                       *         *
   c. isi-hazi                               *!                     *         *
   d. isi-Hasi            *!

In this type of compound, rendaku will occur in order to avoid violation of the constraint Dep[-voice] at the same root node where the alignment constraint is violated. Because the input is underspecified for the feature [-voice], the initial voiceless obstruent on the second conjunct of candidates a and c will violate Dep[-voice]. These candidates will also violate the alignment constraint at the same locus.[25]

[25] A fourth possible candidate, isi-bazi, will be ruled out by Lyman's Law, which applies almost universally in prohibiting more than one voiced obstruent in the same morpheme. (See page 16.) In all the examples that follow, I will omit candidates that violate Lyman's Law and assume that they are ruled out by an undominated constraint.

As I will show in §3.3, deriving rendaku voicing in this way can give us a straightforward account of rendaku blocking. If rendaku is derived through a Dep[-voice] constraint, then the presence of a lexical [-voice] feature in the input will block rendaku, since that underlying feature will satisfy Dep[-voice].

It should be observed at this point that the process of rendaku voicing in Japanese is an unusual one, in that it goes against the grain of the kinds of markedness tendencies that we should normally expect to find across languages. If rendaku is due to a phonological rather than morphological process, as I have argued above, then it is a process that requires a marked featural specification to occur in the output: [-sonorant, +voice]. In general, we do not expect that languages should act in a way to require underived marked featural combinations to surface.
We do not want to add something like an "anti-markedness" constraint to our list of possible constraint types, since such an inclusion would make wrong cross-linguistic predictions about how languages should behave. Yet we do find cases in other languages of marked featural specifications surfacing in marked positions. For example, in Blackfoot, glottal stops can occur only in codas. (See Frantz (1991).) In Nuuchahnulth, glottal stops can occur only in onsets. (See Howe and Pulleyblank, in press.) If we tried to make a cross-linguistic generalization from these two patterns, we would be going against markedness theory, which expects marked featural specifications to avoid marked positions. Clearly some languages have idiosyncrasies that, on a case-by-case basis, appear to go against the general markedness tendencies we expect to find cross-linguistically. Rendaku voicing in Japanese is a possible candidate for this type of idiosyncrasy. As a result, my approach to rendaku will be that it is derived from an "odd" conjunction of constraints that is highly ranked for Japanese but not highly ranked in general across languages.[26]

3.3.3 Rendaku blocking

We are now ready to approach the question of how rendaku blocking occurs in noun-noun compounds. We saw in §3.2 that patterns of rendaku blocking in noun-noun compounds show strong signs of being lexically determined: that is, some compounds appear to have a lexical feature that blocks rendaku whereas others do not. Recall the proposal regarding lexical listing of compounds developed on pages 21ff: that compounds must be listed minimally in the lexicon as a reference to two entries. It is possible to add further information to this listing, even though the entries themselves cannot be accessed or changed in the lexical listing of that compound.
For example, the lexical listing of the compound yuki-dama "snowball" is of the form A, B, where A and B are references to the listings of yuki "snow" and tama "ball" respectively. In this listing it is not possible to change the internal structure of what A and B refer to, because they are no more than references to another listing. But it is possible to add new features generally to the listing, as must be the case for information on pitch accent for short compounds. Thus it is possible to add an autosegmental [-voice] feature to the compound, but that feature cannot be linked to any of the internal structure of the listings that the references refer to.[27]

[26] An alternative approach to rendaku voicing might be to derive it from a simplex Dep[-voice] constraint along with a simplex alignment constraint: Align[+voice]-Morpheme-Left. High ranking of Lyman's Law would prevent the surfacing of more than one voiced obstruent in the morpheme. To make such an account work, it would be necessary to rule out the appearance of a [-son, +voice] featural combination word-initially. Word-initial voiced obstruents in the Yamato lexicon are actually quite rare (see Appendix G), but this is arguably due to the fact that Yamato voiced obstruents are historical reflexes of nasal-consonant sequences, which could only occur word-initially if an initial vowel disappeared. (See Martin (1987).)

Under this hypothesis, it is possible to have a lexical listing for a compound that includes a floating [-voice]. When this floating [-voice] feature occurs as part of the input, rendaku will be blocked, since the constraint Dep[-voice] is now satisfied by the presence of the [-voice] feature in the input. Normally, an underspecified obstruent will surface as voiceless because of the constraint Obs-Voi, introduced in (15). Ranking of the simplex constraint Dep[-voice] below Obs-Voi will allow an output with a [-voice] feature to surface.
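The selection of a winner under a strictly ranked hierarchy, as in tableau (44), can be sketched in Python as a lexicographic comparison of violation profiles. This is a minimal sketch under stated assumptions rather than a definitive implementation of the analysis: the `Obstruent` and `Candidate` types and all function names are hypothetical, only the obstruents of the second conjunct are encoded, and the ranking used is the one shown in (44). A floating [-voice] in a compound listing is modelled simply by giving the initial obstruent an input correspondent (`in_voice=False`).

```python
from typing import Callable, List, NamedTuple, Optional


class Obstruent(NamedTuple):
    """An obstruent of the second conjunct (hypothetical encoding)."""
    out_voice: Optional[bool]   # True = [+voice], False = [-voice], None = unspecified
    in_voice: Optional[bool]    # input feature it corresponds to, None if there is none
    initial: bool               # True for the first root node of the second conjunct


class Candidate(NamedTuple):
    form: str
    obstruents: List[Obstruent]


def spec(c: Candidate) -> int:       # [-son]/[±voice]
    return sum(o.out_voice is None for o in c.obstruents)


def conj(c: Candidate) -> int:       # Dep[-voice] & Align-Morpheme-Left-Word-Left
    return sum(o.initial and o.out_voice is False and o.in_voice is not False
               for o in c.obstruents)


def obs_voi(c: Candidate) -> int:    # Obs-Voi: penalizes voiced obstruents
    return sum(o.out_voice is True for o in c.obstruents)


def dep_plus(c: Candidate) -> int:   # Dep[+voice]
    return sum(o.out_voice is True and o.in_voice is not True for o in c.obstruents)


# Ranking assumed from tableau (44), highest-ranked constraint first
RANKING: List[Callable[[Candidate], int]] = [spec, conj, obs_voi, dep_plus]


def optimal(candidates: List[Candidate]) -> Candidate:
    # Strict domination: tuples of violation counts compare lexicographically
    return min(candidates, key=lambda c: tuple(con(c) for con in RANKING))


# iSi + HaSi: both obstruents of hasi are underspecified in the input
isi_hasi = [
    Candidate("isi-hasi", [Obstruent(False, None, True), Obstruent(False, None, False)]),
    Candidate("isi-basi", [Obstruent(True, None, True), Obstruent(False, None, False)]),
    Candidate("isi-hazi", [Obstruent(False, None, True), Obstruent(True, None, False)]),
    Candidate("isi-Hasi", [Obstruent(None, None, True), Obstruent(False, None, False)]),
]

# mizu + Tama with a floating [-voice] that links to the initial obstruent
mizu_tama = [
    Candidate("mizu-tama", [Obstruent(False, False, True)]),
    Candidate("mizu-dama", [Obstruent(True, None, True)]),
]

print(optimal(isi_hasi).form)
print(optimal(mizu_tama).form)
```

Encoding the profile as a tuple means that a single violation of a higher-ranked constraint outweighs any number of violations of lower-ranked ones, which is the sense of strict domination assumed here.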
For "rendaku-loving" nouns that occasionally block rendaku (e.g. tama "ball" in mizu-tama "water droplet"), I propose that they have a [-voice] feature that is a floating feature, unlinked to any root node in the input. This is to distinguish the behaviour of these nouns from those such as tuti "earth", which block rendaku without exception. Recall our proposal that the lexical listing of compounds is such that features can only be added to the basic A + B structure of compounds, where A and B are pointers to the listings of other lexical items. Linking a [-voice] feature to the input of this compound will violate our proposed restriction that forbids the lexical listing of a compound to access features and structure inside the lexical listing of one of the constituent morphemes. But adding a floating feature to the references to the two constituent morphemes will not violate this restriction. Thus the lexical listing for the rendaku-blocker mizu-tama "water droplet" would be as follows:

A + B   [-voice]

where A is a pointer to the lexical listing for mizu, and B a pointer to the listing for tama.

[27] This listing will also likely include syntactic and semantic information about the compound.

The following tableaux show how ranking of Dep[-V] & AL-M-L-WD-L above Obs-Voi will result in blocking of rendaku in mizu-tama "water droplet" and huna-hasi "bridge boat".

(45)
Input: A + B [-voice]; A = mizu; B = Tama    Dep[-V] & AL-M-L-WD-L   Obs-Voi   Dep[-voice]
☞ mizu-tama
   mizu-dama                                                          *!

(46)
Input: floating [-voi]1, Huna + HaSi         Dep[-V] & AL-M-L-WD-L   Obs-Voi   Dep[-voice]
☞ huna-hasi   (h: [-v]1, s: [-v])                                               *
   huna-hasi   (h: [-v], s: [-v]1)           *!                                 *
   huna-basi   (b: [+v], s: [-v]1)                                    *!
   huna-hazi   (h: [-v]1, z: [+v])                                    *!
   huna-bazi   (b: [+v], z: [+v])                                     **!

The conjoined constraint, Dep[-voice] & Align-Morpheme-Left-Word-Left, would normally require rendaku voicing to occur for it to be satisfied.
But when there is a [-voice] feature in the input, as in the above tableaux, it can link to the initial obstruent of the second conjunct in the output. This will mean that the constraint Dep[-voice] is not violated. Thus, the non-voiced candidate will not be ruled out, and Obs-Voi will determine the winner. Notice that nothing requires that the floating [-voice] feature in the input must link to the initial obstruent of the second conjunct. But the fact that it is possible for it to link to this obstruent in the output means that such a candidate will satisfy Dep[-voice] and thus block rendaku.

Nouns like tuti "earth" differ from tama "ball" in that they invariably block rendaku. We can distinguish their behaviour from that of nouns that block rendaku variably by regarding them as having a [-voice] feature linked to the initial obstruent of tuti "earth" in the input. This will mean that rendaku will be blocked in any compound word they occur in, regardless of the prosodic length of its constituents.

3.3.4 Blocking of blocking

For "rendaku-loving" nouns such as tama, for which voicing is the rule rather than the exception, it is reasonable to suppose that tama in its listing as a simplex word has no floating [-voice] feature. The minority of cases in which compounds with tama block rendaku (e.g. mizu-tama) are more naturally explained if the floating [-voice] feature is present only in the listing for the whole compound. Thus compounds with second member tama are distinguished with respect to whether they voice or not by the presence or absence of a floating [-voice] feature.

For "rendaku-hating" nouns like kusa "grass", which block rendaku in the majority of cases, it is more natural to assume that the noun kusa itself has a floating [-voice] feature in its lexical listing. This feature has to be a floating one rather than one that is linked, as in nouns like kasu "dregs", which are completely immune to rendaku.
Recall that nouns like kusa "grass", which can voice in a minority of compounds, are to be distinguished from nouns like kasu "dregs", which always block rendaku, even in long compounds such as abura-kasu "oil-dregs". I shall explain presently exactly how the difference between a floating [-voice] feature on the rendaku-hater kusa and a linked [-voice] feature on kasu (immune to rendaku) will account for their difference in behaviour.

Consider first the fact that the noun kusa does voice in a small minority of short compounds, such as no-gusa (field-grass) "wild grasses". If kusa has a (floating) [-voice] feature in its simplex listing, we need to account for cases like these where voicing occurs. The fact that voicing occurs only in a small minority of compounds with kusa suggests that further lexical specification is at work here: in this case, the addition of a [+voice] feature to a small number of compounds. If it is possible to include a floating [-voice] feature as part of a lexical listing of a compound, we would expect to also find a few cases where a [+voice] feature is included. If [+voice] has an effect on the output here in preference to [-voice], then there must be greater faithfulness to [+voice] in the output than there is to [-voice]. This is expressed in the following constraint hierarchy:

Max[+voice] » Max[-voice]

Max[+voice]: "For every [+voice] feature in the input in morpheme M there must be a corresponding [+voice] feature in the output in morpheme M."[28]

If the second conjunct has more than one voiceless obstruent in its input, the grammar must determine that the floating [+voice] feature links to the morpheme-initial obstruent in the output. This can be achieved through the following alignment constraint:

Align-[+voice]-Left-Morpheme-Left: "The left edge of a [+voice] feature is aligned with the left edge of a morpheme."
To explain why many words surface with a non-left-aligned voiced obstruent, the above constraint must be ranked below the faithfulness constraint Max-Path-Feature (see (16)). A noun like kage "shade" does not surface as *gake because its [+voice] feature is on a path with the third root node in the input. This feature will remain linked to the same root node in the output because of the higher ranking of the constraint Max-Path-Feature. The following tableau shows how compounds like no-gusa will block blocking of rendaku by the presence of a floating [+voice] feature in addition to a [-voice] feature. The crucial ranking here is that Max[+voice] dominates Obs-Voi.

[28] In the derivations below, sonorants are assumed to have a [+voice] feature on the surface. Ultimately, for this kind of account to work, it may be necessary to consider sonorants to surface without a [+voice] specification unless they are licensed by an adjacent voiced obstruent. This approach to sonorant voicing is proposed by Ito & Mester (1994). In their account, a predictable feature such as [+voice] in sonorants must be licensed in the output. In our account of faithfulness to a [+voice] feature in the input, linking that feature to a sonorant instead of an obstruent will violate a licensing constraint such as the one proposed by Ito & Mester, since there will be no adjacent voiced obstruent to license voicing on the sonorant. If sonorants must be licensed in order to surface with [+voice], then in a compound like kuwa-bara "mulberry field", with the noun hara "field" that often resists voicing, a floating [+voice] feature in the listing of the compound cannot link to the unlicensed sonorant /r/ or to a vowel: it must link to the /h/ to cause voicing.

(47) [Tableau. Input: A + B [+voice]; A = no; B = KuSa [-voice]. Constraints, in ranked order: Max-Path-Feature; Max[+voice]; Dep[-V] & AL-M-L-WD-L; Obs-Voi; Align[+voice]-Left-Morph-Left; Max[-voice]. Candidates: no-kusa, no-gusa and no-kuza, with various linkings of the indexed [+voice] and [-voice] features. Optimal candidate: no-gusa, with [+voice] on the morpheme-initial obstruent.]

As far as devoicing of high vowels is concerned, we find that high vowel devoicing, unlike rendaku, is completely predictable and has no lexical exceptions. That is, even having an input form with an added floating [+voice] feature will not block vowel devoicing. To capture this fact, we must consider the constraint for high vowel devoicing to be highly ranked. Consider, for example, how the hierarchy in (47) above would affect high vowel devoicing in the simplex word kusa "grass", which devoices the /u/, even if kusa were to have a floating [+voice] feature in its input. In the tableau below, kusa "grass" has a hypothetical floating [+voice] feature in its input. This is to show that such a feature still cannot block high vowel devoicing in the way that a floating [-voice] feature could block rendaku voicing in a compound. The constraint for high vowel devoicing is ranked above Max-Path-Feature.

(48) [Tableau. Input: KuSa with a hypothetical floating [+voice]1. Constraints, in ranked order: High Vowel Devoicing; Max-Path-Feature; Max[+voi]; Dep[-V] & AL-M-L-WD-L; Obs-Voi; Align[+voice]-Left-Morph-Left; Max[-voi]. Candidates (kusa, gusa, kuza with various feature linkings) fall into three groups: (i) voiced vowel between two voiceless obstruents; (ii) voiced obstruent with a different index than [+voice]1; (iii) vowel or voiced obstruent with the index of [+voice]1. Optimal candidate: kusa, in group (iii).]

The first set of candidates is eliminated by the top-ranked constraint because these candidates violate high vowel devoicing.
The second set of candidates is eliminated by Max[+voi] because the voiced obstruent in these candidates has a different index. The last group of candidates all satisfy the top four constraints. The optimal candidate is the only one that satisfies Obs-Voi, since it has no voiced obstruent.

Recall that in Ito & Mester's account of rendaku, their Ident[+-]voi faithfulness constraint ruled out an output with a devoiced high vowel, since the vowel lost its [+voice] specification that was the initial member of the [+voice] [-voice] sequence. We had to propose modification of their constraint hierarchy for deriving rendaku in order that high vowel devoicing be allowed to occur. In the present account, there is no sequential faithfulness constraint that is violated by high vowel devoicing.

Recall also that their system of deriving rendaku voicing had to be slightly modified for compounds that would devoice the final vowel of the first member if rendaku voicing were not to occur. The relevant example was yuki-dama "snow-ball", where the output *yuki-tama satisfies their sequential markedness constraint *[+-]voi because of high vowel devoicing of the /i/. The system proposed here will correctly derive rendaku in this type of compound without further modification. Devoicing of a high vowel in the incorrect environment is ruled out by a markedness constraint on voicing of vowels:

Sonorant-Voicing: "Any [+sonorant] specification is on a path to a [+voice] feature."

(49) [Tableau. Input: yuKi + Tama, "snow" "ball". Constraints, in ranked order: High Vowel Devoicing; Max-Path-Feature; Max[+voi]; Dep[-V] & AL-M-L-WD-L; Obs-Voi; Align[+voice]-Left-Morph-Left; Son-Voi; Max[-voi]. Candidates: two variants of yuki-tama and two variants of yuki-dama, differing in their [-voice] linkings. Optimal candidate: yuki-dama.]

We can now compare the behaviour of nouns like kusa, which sometimes allow voicing, with those like kasu, which never allow voicing. Recall that the proposed difference is that kasu "dregs" has a linked [-voice] feature as part of its lexical entry.
The high ranking of the constraint Max-Path-Feature will explain why nouns like kasu never undergo rendaku.

(50)
Input: /abura/ + /kaSu/, "oil" "dregs";      Max-Path-Feature   Dep[-V] & AL-M-L-WD-L   Obs-Voi
[-voi] linked to k; floating [+voi]
☞ abura-kasu
   abura-kazu                                                                            *!
   abura-gasu                                *!

Because the [-voice] feature on kasu is linked in the input, it is forced to surface in the output by the highly ranked constraint Max-Path-Feature, regardless of the presence of any floating [+voice] feature in the listing of the whole compound, as we surmised in the derivation of no-gusa in (47) above.

In summary, the kinds of rendaku blocking patterns we see for noun-noun compounds are accounted for as follows. For "long" compounds that have a member that exceeds two moras, blocking only occurs when the second conjunct is a noun that always fails to voice, that is, one that has a pre-linked [-voice] feature. Selective blocking by a floating [-voice] feature does not occur for these compounds. These compounds act as if they do not admit the addition of extra features the way that short compounds do. This apparent property of "long" compounds will be discussed further on page 70.

For short compounds that do not meet this prosodic size requirement, rendaku is blocked when there is a floating [-voice] feature added to the listing of the compound. This may occur in two ways. In the first way, the second conjunct has this floating feature as part of its simplex lexical listing. When this is the case, any compound whose second conjunct is this morpheme will block rendaku unless a [+voice] feature is added to the lexical listing of the whole compound (tableau (47)). If the addition of a floating feature to a simplex noun is an uncommon state of affairs, then we would correctly predict that morphemes that regularly resist rendaku are relatively few, as we saw was the case on page 40. In the second way, the [-voice] feature is added to the entry of the compound word as a whole.
If this kind of addition is similarly uncommon, we also correctly predict that most nouns that "like" rendaku (i.e. do not have their own [-voice] feature) will only resist it in a small minority of cases, as we also saw was the case on page 40. If adding a floating [+voice] feature to a compound entry happens as infrequently as is the case for a [-voice] feature, we would predict that nouns such as kusa that resist rendaku will have only a small minority of compounds that do voice. This correctly accounts for our observation that there are no nouns that have "middle tendencies" towards voicing: they either tend to strongly voice or strongly resist voicing.

The following chart sums up the possible compound types with respect to whether or not floating voicing features occur in the listings of the second noun and/or the compound. There are four main possibilities for whether floating [+voice] or [-voice] features occur in the input form of N2. For "type 1", the noun has no floating voicing features. In this case, rendaku is blocked only if the compound has added a [-voice] feature but not a [+voice] feature. For types 2 and 4, N2 has a floating [+voice] feature. In such a case, it does not matter what other voicing features are added or not to N2 or to the compound: rendaku cannot be blocked because of the presence of the [+voice] feature. For type 3, N2 has a [-voice] feature but no [+voice] feature. In this case, rendaku will be blocked only if the compound itself has no [+voice] feature added. Thus there are only two instances in which blocking can occur: types 1b and 3a.[29]

[29] Exactly why these proposed floating features that can block rendaku voicing occur relatively rarely is difficult to account for in a principled way. In chapter 6 I argue that the lexicon should show no bias.
That is, if there are two possibilities for the lexical entry of a compound, with a floating [-voice] feature and without one, we should expect each possibility to occur with equal frequency. This does not happen, however, for compounds that are eligible for rendaku voicing. In the next chapter I will argue that exceptions to rendaku voicing in modern Japanese through lexicalization are the historical legacy of utterance-to-utterance variation in rendaku voicing that occurred at an earlier stage of the language as a result of various inhibitory factors such as performance and perception errors and sociolinguistic factors. To the extent that the degree of lexicalized exceptions that occurred reflects the degree to which rendaku voicing was blocked by these factors at an earlier stage, the most plausible explanation for the rarity of floating features is a historical one. This would mean that bias in the lexicon can occur as a result of historical factors (for example, the extreme rarity of initial voiced obstruents in Yamato nouns can be shown to be a result of historical factors; see Martin (1987)). But if exceptions to phonological processes can occur as a result of lexical prespecification of an otherwise underspecified feature, it ought to be theoretically possible in some languages for such a process to be blocked in a large proportion of derived forms. An important piece of future research would be to make a cross-linguistic study of the frequency of exceptions to regular grammatical processes that occur in various languages.

(51) Summary of possibilities for floating voicing features on compounds

Types of noun, according to whether N2 has floating voicing features:

1. No floating voicing features on N2: most common. Blocks rendaku if [-voice] but not [+voice] is added to the compound. "RENDAKU LOVERS" (usually or always voice in compounds)
2. Floating [+voice] on N2: should be less common; voices 100% of the time. "RENDAKU LOVERS"
3. Floating [-voice] on N2: should be less common; blocks in most cases: only voices if floating [+voice] in compound entry. "RENDAKU HATERS"
4. Floating [-voice] and [+voice] in N2: should be uncommon; always voices. "RENDAKU LOVERS"
5. Immune to rendaku: [-voice] linked to the initial obstruent.

Type   Floating [-voice]   Floating [+voice]   Floating [-voice]   Floating [+voice]   Rendaku
       on N2               on N2               on compound         on compound         voicing
1a     no                  no                  no                  no or yes           yes
1b     no                  no                  yes                 no                  no
1c     no                  no                  yes                 yes                 yes
2      no                  yes                 no or yes           no or yes           yes
3a     yes                 no                  no or yes           no                  no
3b     yes                 no                  no or yes           yes                 yes
4      yes                 yes                 no or yes           no or yes           yes
5      doesn't matter      doesn't matter      doesn't matter      doesn't matter      no

"Rendaku lovers" will be more common than "rendaku haters" if the inclusion of a floating [-voice] feature in the listing of rendaku-hating nouns is the exception rather than the rule. Rendaku lovers will usually or always voice, because blocking can only occur if they are of type 1b, with no floating features on N2 and the inclusion of a [-voice] feature in the compound listing. If inclusion of this feature with the compound listing is also the exception rather than the rule, then most compounds of type 1 should voice. In addition, all compounds where N2 is of type 4 will voice. This is what we found in the data we examined on patterns of rendaku blocking in short noun-noun compounds.

"Rendaku haters", which have a floating [-voice] feature in their listing, will usually block voicing. They will only voice if a floating [+voice] feature is included with the listing of the compound, and once again we take the inclusion of this extra feature to be the exception rather than the rule.
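The outcomes summarized in chart (51) can be restated as a small decision function. The following Python sketch is only an illustration of the chart, under the assumption that the five yes/no properties are passed in as booleans; the function name and parameter names are hypothetical and do not come from the analysis itself.

```python
def rendaku_applies(
    n2_linked_minus: bool = False,        # [-voice] pre-linked to N2's initial obstruent
    n2_floating_minus: bool = False,      # floating [-voice] in the listing of N2
    n2_floating_plus: bool = False,       # floating [+voice] in the listing of N2
    compound_floating_minus: bool = False,  # floating [-voice] added to the compound
    compound_floating_plus: bool = False,   # floating [+voice] added to the compound
) -> bool:
    """Return True if the initial obstruent of N2 surfaces as voiced."""
    # Type 5: a linked [-voice] is protected by Max-Path-Feature
    if n2_linked_minus:
        return False
    # Max[+voice] >> Max[-voice]: any floating [+voice] forces voicing
    if n2_floating_plus or compound_floating_plus:
        return True
    # A floating [-voice] with no [+voice] blocks rendaku (types 1b and 3a)
    if n2_floating_minus or compound_floating_minus:
        return False
    # No floating features at all: rendaku applies (type 1a)
    return True


# yuki-dama: tama has no floating features and none are added (type 1a)
print(rendaku_applies())
# mizu-tama: floating [-voice] added to the compound listing (type 1b)
print(rendaku_applies(compound_floating_minus=True))
# no-gusa: floating [-voice] on kusa, floating [+voice] on the compound (type 3b)
print(rendaku_applies(n2_floating_minus=True, compound_floating_plus=True))
```

The order of the checks mirrors the ranking argued for in the text: the linked feature is consulted first, then [+voice], then [-voice].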
The only patterns that are anomalous under this kind of account are the voicing patterns for a very small number of nouns such as hara "field", which voices in 4 out of the 9 compounds it occurs in, and kawa "skin", which voices in 4 out of 8. This is a more frequent rate of voicing than we would expect for a "rendaku hater" with a floating [-voice] feature, and less frequent than we would expect for a "rendaku lover" with no such feature included. The 17 compounds that occur for these two nouns represent 3.5% of the 481 2μ+2μ compounds that were surveyed in this study.

The phonological account of rendaku voicing that I have proposed here has the advantage that it can explain why rendaku blocking in noun-noun compounds occurs in a systematic way. Because our account allows lexical prespecification of a floating [-voice] to occur for particular nouns ("rendaku haters"), it can explain why those nouns are much more likely to block rendaku than others.

It should be stressed that the account of rendaku that I am proposing here is not the only possible account of rendaku voicing. The point is that any viable account of rendaku should fulfil the following conditions:

(a) Proposed morphological differences between different compounds should be based on clear differences in semantic composition or syntactic function. For example, to account morphologically for the difference in voicing between asi-kuse (foot-habit) "way of walking" and kuti-guse (mouth-habit) "way of speaking", one would have to find evidence for a clear difference in semantic composition or morphological structure between the two compounds.

(b) Any account of rendaku must be able to explain the systematicity we find in exceptions to voicing among "rendaku lovers" and "rendaku haters."

3.4 The prosodic size factor

So far, we have dealt only with compounds for which neither member exceeds two moras. For these compounds, unpredictable lexical blocking can occur.
But compounds for which at least one member exceeds two moras never exhibit unpredictable rendaku blocking. As shown by the data in Appendix A, these longer compounds always voice predictably.

As was discussed on page 15, we find a similar situation for pitch accent patterns of compounds. Compound words for which one member exceeds two moras have predictable pitch accent patterns. Compounds for which neither member exceeds two moras have unpredictable pitch accent patterns, which must be specified lexically for each individual compound. The parallel between rendaku patterns and pitch accent patterns for the two prosodic classes of compounds is striking, and suggests a unified account of both phenomena.

The delimitation of morphemes into two prosodic classes (exceeding two moras vs. not exceeding two moras) with different behaviours is arguably related to a strong preference in the language for restricting bundles of material to a maximum of one bimoraic Foot. We find, for example, that foreign words are often truncated to one bimoraic Foot per morpheme when imported into the language. For example, "word processor" becomes waa-puro, "remote control" becomes rimo-kon, and "pocket monster" becomes poke-mon (imported back to the West as a recently popular children's card-collecting fad).

There is other evidence that the bimoraic Foot is the canonical prosodic unit in Japanese phonology. Ito and Mester (1992) examine "Zyuuzya-go", the Japanese secret language used by jazz musicians, and argue convincingly that the bimoraic Foot is the template into which an original word is mapped to derive the corresponding word in the secret language. Personal names are also often truncated to fit a bimoraic template when the hypocoristic suffix -tyan is added. This phenomenon is discussed by Hewitt (1994). That the bimoraic Foot is a fundamental phonological unit in Japanese is also argued for by Poser (1990).
The contrast in behaviour between "long" and "short" compounds with respect to both rendaku voicing and pitch accent also suggests that the difference between the two classes of compounds has some connection with the bimoraic Foot. Compounds that show lexical variation with respect to both rendaku voicing and pitch accent are composed of morphemes that do not exceed the limit of one bimoraic Foot per constituent. Compounds that do not show such variation have at least one morpheme that exceeds this limit.

The processes that can be observed for truncation, hypocoristic forms, and the secret language Zyuuzya-go all seek to match constituents of an output to a template of a bimoraic Foot. These processes not only tend to reduce a constituent that exceeds two moras to fit a bimoraic template; there is also evidence of lengthening a monomoraic syllable to fit the bimoraic template. For example, Hewitt (1994) gives the example of a name like Mika-ko, where a hypocoristic form constructed with the first syllable mi obligatorily lengthens the vowel to mii:

(52) Mikako    Mii-tyan    *Mi-tyan

For these phenomena, then, the grammar seeks to remedy both forms that exceed two moras and those that are less than two moras. But as far as phenomena relating to the two prosodic classes of compounds are concerned, the contrast in phonological behaviour between "long" and "short" compounds does not appear to distinguish between compounds that have a constituent that equals two moras and compounds with a constituent that is less than two moras. Consider, for example, the compounds in (53) below, where the first constituent te "hand" is monomoraic. When accented noun te "hand" combines in a compound with each of the following final-accented bimoraic nouns, the accent patterns of the resulting compounds will vary in spite of the fact that the inputs have identical accent patterns in all cases.
(53)
kuse "habit"    te-kuse (penult accent)
hata "loom"     te-bata (initial accent)
kiwa "brink"    te-giwa (unaccented)
hata "flag"     te-bata (unaccented)

To correctly derive the accent patterns for the compounds in (53), we must add more information to the lexical listing of these compounds than is found in the listings of the constituent morphemes. In long compounds, on the other hand, there is no evidence that they carry any additional information that affects either the surface pitch accent of the compound or voicing of the initial obstruent of N2. This suggests that for some reason long compounds do not permit the addition of featural material to the compound listing.

We also find that compounds of the form 2μ+1μ are unpredictable with respect to rendaku voicing, just like those in which both members are bimoraic. Consider, for example, the following compounds whose second constituent is ki "tree":

(54)
ude-gi     arm tree          "roof truss"
yana-gi    fish-weir tree    "willow"
yoko-gi    side tree         "wooden bar"
ao-ki      green tree        "laurel"
ara-ki     rough wood        "lumber"
kazi-ki    oar tree          "swordfish"
kubi-ki    neck tree         "yoke"
maru-ki    circle tree       "log"
nama-ki    live tree         "green wood"
nami-ki    row-tree          "avenue of trees"
oya-ki     parent tree       "stock"
ume-ki     plum tree         "plum tree"

Rendaku voicing occurs in the first three but not in the rest. The occurrence of voicing is unpredictable.

In summary, compounds that have a member that is too small for the bimoraic template behave the same way as compounds both of whose members fit the bimoraic template. If the bimoraic Foot is involved in the contrast between "long" and "short" compounds, then compound behaviour only appears to distinguish between compounds with a member that exceeds a bimoraic Foot and those with members that do not exceed a bimoraic Foot. There is no apparent contrast in behaviour between compounds that have a member that is less than a bimoraic Foot, and compounds where both members match a bimoraic Foot.
Let us again express the contrast between "short" compounds and "long" compounds in terms of their behaviour. "Short" compounds are unpredictable with respect to both rendaku voicing and pitch accent. "Long" compounds are completely predictable with respect to rendaku voicing if we consider compounds whose second element is "immune to rendaku" to have a lexically marked second element. Our hypothesis is that these nouns have a [-voice] feature that is linked to the initial consonant in the input. "Long" compounds are also predictable with respect to pitch accent pattern: that is, their surface pitch accent pattern can be predicted 100% of the time from information in the input forms of their constituent elements. But "short" compounds are not predictable in this way. It is possible to have two compounds where the corresponding constituents have identical prosodic structure and pitch accent, yet the two compounds surface with different pitch accent patterns. We saw examples of this in (53).

One possible explanation for the differing behaviour of long compounds is that "short" compounds are able to carry further lexical information in their listing that cannot be carried by long compounds. Such further information could determine (a) whether they undergo rendaku voicing and (b) what their surface pitch accent pattern is. But such an account would violate the principle of "Richness of the Base" in Optimality Theory, which we wish to maintain. To say that only short compounds can carry extra lexical information in their listing, such as a floating voicing feature, is to say that there are constraints on the lexicon.

In addition, such an account fails to explain why rendaku blocking does not occur with long compounds where the second conjunct is a "rendaku hater." Recall our hypothesis about certain nouns such as kumo "cloud" that usually block rendaku voicing in short compounds.
If these nouns carry a floating [-voice] feature in their lexical listing, their frequent blocking of rendaku voicing could be explained under the kind of account that was proposed in chapter 3. But these nouns never block voicing when the first conjunct of the compound is long: for example, in the compound hituzi-gumo "sheep-cloud." If long compounds were for some reason to always disallow the inclusion of extra features in the listing of the compound itself, such a prohibition would still have no effect on the ability of an individual noun like kumo to carry such features. And if an individual noun were to carry such features, blocking of rendaku would occur under the account of rendaku that was proposed in chapter 3. If the only special property of long compounds was that they cannot carry extra features in their listing, there would be no reason why at least some long compounds with rendaku haters as the second conjunct should not block rendaku, but this never seems to occur.

A different possibility is that something about the prosodic size of long compounds prevents any lexical prespecification from affecting their surface form with respect to either rendaku or pitch accent, even if such lexical prespecification were to occur. For both pitch accent and rendaku voicing, long noun-noun compounds behave in a more predictable and regular fashion than short compounds, disallowing irregularity. Suppose that long compounds violate some markedness constraint because one member exceeds a bimoraic Foot. If this were the case, then the greater regularity of rendaku voicing in long compounds might be accounted for because of the markedness of long compounds. Because their size makes them violate some markedness constraint, the grammar seeks to avoid a further violation of another constraint which would be violated if blocking were to occur.
This is the kind of idea that could be expressed through local conjunction of the two constraints in question: the grammar avoids a violation of both a markedness constraint that applies to long compounds, and some constraint that is violated when blocking occurs. The problem is that in the account of rendaku we have proposed, when blocking occurs by a prespecified [-voice] feature, there is no constraint that is violated when blocking occurs that is satisfied when rendaku occurs. This is because our account of rendaku is based on a Dep[-voice] constraint. When a [-voice] feature occurs in the input, Dep[-voice] is satisfied by both a blocking candidate and also vacuously by a voicing candidate. The markedness constraint that bans voiced obstruents will then block voicing.

The following tableau shows how conjoining any markedness constraint that applies to long compounds with Dep[-voice] will fail to stop blocking from occurring in the long compound with rendaku hater kusa. In (55), "Markedness" refers to any markedness constraint that could be violated by a long compound.

(55) input: [-voice]1 /hituzi/ + /kusa/
Constraint columns, left to right: Markedness (long compound) & Dep[-voice]; Dep[-voice] & Align-Morpheme-Left-Word-Left; Son-Voi
Candidates:
[-voice]1 hituzi-kusa
[-voice]1 hituzi-kusa
[-voice]1 hituzi-gusa    *!
hituzi-gusa              *!

Yet another possibility is that rather than being more marked prosodically than short compounds, long compounds are less marked, in that they satisfy a minimal word requirement. This idea is similar to a proposal by Alderete (1999) in his analysis of Japanese compound accent. Alderete proposes that for long compounds, which have predictable accent, default accent occurs on a "prosodic head", which qualifies as such by meeting a prosodic requirement that it have a Foot plus other prosodic material. Pursuing this line of approach, we take long compounds to be unmarked and short compounds to be marked, in that they have no member that qualifies as a prosodic head.
But this approach cannot explain why long compounds fail to block voicing. Regardless of whether long compounds are marked or not, an account of rendaku based on a Dep[-voice] constraint will mean that when there is a [-voice] feature present in the input, there is no longer anything to force rendaku to occur. Given these problems in accounting for the exceptionless voicing of long compounds, I shall leave the matter for future research. Clearly, long compounds resist the kind of blocking of rendaku that occurs with short compounds, but accounting for this behaviour in a way that is consistent with the systematicity of blocking in short compounds is a problem that remains to be solved.

3.5 Underspecification of the [-voice] feature on the initial obstruent

Our account of rendaku depends on the hypothesis that most Yamato nouns with an initial voiceless obstruent have the voicing feature underspecified on that obstruent. This raises the question of why, according to our hypothesis, we do not find more nouns in the Yamato lexicon that do have a specified initial voiceless obstruent. I will delay an answer to that question until the conclusion of chapter 8. My argument for why this obstruent must be underspecified for most nouns relates to two topics that I will explore in further chapters. One is the principle of Lexicon Optimization (Prince & Smolensky (1993:196)). The other is the question of how rendaku might have evolved historically and how compound words would have become lexicalized, given the model of lexicalization of compounds we have adopted here. The question of how the phenomenon of rendaku likely developed, along with its property of being subject to lexically determined blocking, is what I shall explore next in chapter 4.

4. Why there are lexicalized exceptions to rendaku voicing: an account based on the evolution of rendaku from an utterance-to-utterance variable phenomenon to a word-to-word variable phenomenon

In the previous chapter we developed an account of how blocking of rendaku voicing in noun-noun compounds can be explained through lexical prespecification of voicing features in the listing of a compound and/or of the individual morphemes of the compound. In this chapter I will consider the question of why these lexicalized cases of blocking developed in the first place. Directly relevant to this question is the fact that rendaku voicing mainly exhibits word-to-word variation in voicing rather than utterance-to-utterance variation. That is, with the exception of a small number of compounds that can be pronounced either with or without voicing, rendaku voicing varies mainly between individual compounds. This type of variation is in contrast to what we might call "utterance-to-utterance variation", where a single input form has more than one possible output form.

In §4.1 I discuss recent analyses in the literature of utterance-to-utterance variation. In §4.2 I show that although these kinds of analyses can account for utterance-to-utterance variation, they cannot directly account for word-to-word variation. In §4.3 and §4.4 I shall argue that the kind of word-to-word variation exhibited by rendaku can be explained as the result of a process that originally began with utterance-to-utterance variation. If rendaku voicing originally showed utterance-to-utterance variation (and there is some historical evidence to support this), word-to-word variation could have developed as a result of certain compounds being more likely to be pronounced as voiced and others being more likely to be pronounced as voiceless. I discuss the kinds of factors that could cause certain compounds to be voiced more often than others at a time when rendaku voicing exhibits utterance-to-utterance variation.
Once these tendencies became established for certain compounds, lexicalization of compounds that preferred not to voice could, over time, cause speakers to posit an input form for them that would result in their always being pronounced without voicing and ceasing to exhibit utterance-to-utterance variation. I then use one particular model of variation (Boersma & Hayes (1999)) to show how lexicalization of rendaku blocking could cause utterance-to-utterance variation to change, over time, to a state of word-to-word variation for rendaku voicing. In §4.5 I return to the matter of the effect of prosodic size on rendaku blocking.

(Footnote 30: The possible historical account that I propose for the development of word-to-word variation in rendaku voicing rests on adoption to at least some extent of the lexical diffusion hypothesis, as discussed in detail by Labov (1994). The concept of lexical diffusion, as expounded in the work of Cheng and Wang (1977), Chen (1975), is in direct opposition to the Neogrammarian principle that sound change is completely regular and applies across the board rather than to individual words. Labov cites Osthoff and Brugmann (1878) in describing the Neogrammarian position that sound changes occur without exception. Labov discusses studies such as Wang and Cheng (1977) on sound change in Chinese that present detailed and copious evidence against the Neogrammarian hypothesis of across-the-board sound change. See Labov (1994) chapter 15 and references cited there for detailed discussion of these issues.)

4.1 Literature review of analyses of derived variable effects

In this subsection I examine some previous analyses of other kinds of variable effects, with a view to adapting those analyses to the kinds of variable effects I am examining here.

4.1.1 Unranked constraints: Ringen & Heinamaki (1999) and Anttila (1997)

Anttila (1997) examines linguistic variation, where more than one output is possible for a given input for a given speaker.
He explains these effects through unranked constraints. For example, if constraints A, B, and C are unranked, all possible rankings can occur in the grammar: A » B » C, A » C » B, B » A » C, B » C » A, C » A » B, C » B » A. Each ranking is considered equally probable. If output X is optimal for exactly two of the above rankings for a given input, X will occur 2/6 times or 33% of the time.

(Footnote 31: Similar to Anttila's proposal that each possible ranking occurs equally often is the proposal made here in chapter 6 that each type of input-output pair in pitch accent patterns occurs equally often.)

Anttila (1997) examines in particular the problem of allomorphy in Finnish genitive plural suffixes. The allomorphs that occur are of the following two types, which Anttila calls "strong" and "weak" allomorphs. The data in (56) are from Anttila (1997:37).

(56)
a. STRONG forms: heavy penult (CVV, CVVC) followed by suffix with /t,d/ onset:
/puu/           "tree"         pui.den
/potilas/       "patient"      po.ti.lai.den

b. WEAK forms: light penult (CV) followed by suffix with /j/ onset or no onset:
/kala/          "fish"         ka.lo.jen
/margariini/    "margarine"    mar.ga.rii.ni.en

The problem is that there is a class of nouns for which either the strong form or the weak form can occur. These stems are CV-final and have at least three syllables. Anttila comments that this class "includes loans and foreign names, which shows the productivity of the phenomenon." Examples of this class are shown in (57).

(57)
/naapuri/      "neighbour"    naa.pu.rei.den ~ naa.pu.ri.en
/Reagani/      "Reagan"       Rea.ga.nei.den ~ Rea.ga.ni.en
/moskeija/     "mosque"       mos.kei.joi.den ~ mos.kei.jo.jen
/ministeri/    "minister"     mi.nis.te.rei.den ~ mi.nis.te.ri.en

The variation that occurs in (57) also, according to Anttila, has the following properties. Most speakers report that for a given word, one of the allomorphs sounds better than the other, even though both are definitely considered grammatical.
In addition, which allomorph is preferred depends on the stem: for some stems it is the weak type; for others it is the strong type.

Anttila explains the variation that occurs in (57) as being due to the fact that neither the strong nor the weak form will violate the highly-ranked constraints NO-CLASH (no two consecutive accented syllables), PEAK-PROMINENCE (avoid stressed lights) or WEIGHT-TO-STRESS (avoid unstressed heavies). In addition, the principle that main stress falls on the first syllable is inviolable in Finnish. Words like /puu/ in (56)a must have strong suffixes. If the weak form occurred as either *pu-jen or *pu-jen we would violate PEAK-PROMINENCE. Words like /kala/ in (56)b must have weak suffixes. If the strong form occurred as in *ká.loi.den or *ká.lói.den we would violate WEIGHT-TO-STRESS in the first case and NO-CLASH in the second. But words like /naapuri/ can have either allomorph. Output naa.pu.rei.den, with secondary stress on the third syllable, will violate none of the three above constraints; neither will output naa.pu.ri.en.

The fact that naa.pu.rei.den and naa.pu.ri.en both occur, but with one being preferred over the other, is explained by Anttila by several lower ranked constraints that are unranked with respect to each other. One of the two outputs can occur more frequently than the other when there are more rankings of those constraints for which it would be optimal than is the case for the other output. The preference for one output over the other is thus based on phonological factors in Anttila's account.

4.1.2 Boersma & Hayes (1999)

Boersma & Hayes (1999) approach this kind of variation problem in a different way. They propose a grammar in which there is a continuous scale of constraint strictness rather than discrete, categorical rankings.
They also propose that "at every evaluation of the candidate set, a small noise component is temporarily added to the ranking value of each constraint, so that the grammar can produce variable outputs if some constraint rankings are close to each other."

Their constraint model works as follows. The ranking value of each constraint is not a single point on a scale but rather a normal probability distribution, where every constraint has a distribution with the same standard deviation. In practical terms, however, the constraint hierarchy is still categorical rather than continuous. This is because every time a constraint is implemented by the grammar, there is a process of random selection by which a temporary specific ranking value for that constraint is determined. That particular ranking value is valid only for that one occasion the constraint is used. Every subsequent implementation of the constraint requires a new process of random selection.

For each process of random selection, a selection point is chosen by a randomizing function that obeys the probability distribution of the constraint. The function will choose a point with temporary ranking value r such that the probability of choosing value r conforms to that normal probability distribution. The farther away r is from the mean value of the probability distribution, the more unlikely it is to be chosen. In this way, if two constraints A and B are located close together on the scale (say, within one standard deviation) but with A's distribution higher than B's, it is still possible for B to dominate A for a measurable minority of selections. This will mean that if A » B produces a different output than B » A, both outputs can occur some of the time, depending on what temporary selection point was chosen for each constraint in a given utterance.
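The frequency predictions of the two models reviewed in this section can be illustrated with a small numerical sketch. It is not taken from Anttila or from Boersma & Hayes; the function names, the ranking values (100 and 98), the standard deviation (2.0) and the condition under which output X wins are all hypothetical choices made for illustration. The first function counts the share of total rankings that select a given output, as in the 2/6 example of §4.1.1. The other two estimate how often constraint B temporarily outranks constraint A when each ranking value is drawn from a normal distribution with a shared standard deviation.

```python
import itertools
import math
import random
from fractions import Fraction


def unranked_share(constraints, is_optimal):
    """Share of all total rankings of `constraints` under which an
    output is optimal, with every ranking treated as equally probable."""
    rankings = list(itertools.permutations(constraints))
    wins = sum(1 for ranking in rankings if is_optimal(ranking))
    return Fraction(wins, len(rankings))


def sampled_reversal_rate(mean_a, mean_b, sd, trials=100_000, seed=0):
    """Estimate by simulation how often B's temporary ranking value
    exceeds A's when both are drawn afresh at every evaluation."""
    rng = random.Random(seed)
    reversals = 0
    for _ in range(trials):
        if rng.gauss(mean_b, sd) > rng.gauss(mean_a, sd):
            reversals += 1
    return reversals / trials


def exact_reversal_rate(mean_a, mean_b, sd):
    """Closed form for the same probability: the difference of two
    independent normal values is normal with sd * sqrt(2)."""
    z = (mean_b - mean_a) / (sd * math.sqrt(2))
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Hypothetical condition: output X wins whenever A is ranked highest,
# which holds for two of the six rankings of A, B and C.
share_x = unranked_share("ABC", lambda ranking: ranking[0] == "A")

# Hypothetical ranking values: A sits one standard deviation above B.
close = exact_reversal_rate(100.0, 98.0, 2.0)
equal = exact_reversal_rate(100.0, 100.0, 2.0)
```

Under these assumed values, B outranks A in roughly a quarter of evaluations, and when the two means coincide the rate is one half, which matches the 50% split that the unranked-constraint model assigns to two unranked constraints. In both models the predicted frequency attaches to an input, so every word of the relevant class is expected to vary at the same rate from one utterance to the next.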
In their paper, Boersma & Hayes' model is shown to explain statistical, variable effects that occur when derived forms exhibit utterance-to-utterance variation of the same type that is analysed by Anttila. In the next subsection I will examine whether either of these two models can directly account for word-to-word variation in rendaku voicing.

4.2 The problem of explaining word-to-word variation

Both unranked constraints (Anttila (1997) and Ringen & Heinamaki (1999)) and a continuous constraint scale like that of Boersma and Hayes (1999) can account for phenomena in which more than one output can occur for the same input; however, they cannot directly account for what I am calling "word-to-word variation". That is, they cannot account for a situation in which for a class of input forms S, some phonological process applies to a random subset of S but not to all members of S. Rendaku voicing in Japanese exhibits these kinds of properties: it applies to the majority of compound words in which the second conjunct underlyingly has an initial voiceless obstruent, but there is an unpredictable minority of compounds that resist rendaku. Let us examine how we might try to make Anttila's model of unranked constraints work for rendaku voicing in Japanese. In §4.2.1 below I shall illustrate what happens when we try to derive the word-to-word variation in rendaku voicing through unranked constraints.

4.2.1 Applying unranked constraints to rendaku voicing

In order to try to derive the statistical effects of blocking of rendaku in certain cases, we would have to employ a constraint that blocks rendaku in the compounds where rendaku does not occur. Following the analysis in §3.2.2, I shall adopt again the following constraint from (42), repeated here as (58), to derive rendaku voicing.

(58) Dep[-voice] & Align-Morpheme-Left-Word-Left: "The left edge of every morpheme must be aligned with the left edge of a word."
AND "For every [-voice] feature in the output, there is a corresponding [-voice] feature in the input."

To derive rendaku blocking, let us again employ the constraint Obs-Voi, repeated from (15).

(59) Obs-Voi: "Every root node in the output that has the feature [-son] must have a path to a [voice] feature."

In order for Anttila's model to derive blocking of rendaku in only a percentage of cases, we must have DEP[-VOICE] & ALIGN-MORPHEME-LEFT-WORD-LEFT unranked with respect to OBS-VOI. When constraints A and B are unranked with respect to each other, A will dominate B 50% of the time, and B will dominate A the other 50%.

Thus if DEP[-VOICE] & ALIGN-MORPHEME-LEFT-WORD-LEFT and OBS-VOI are unranked, then we would expect that half the time we would have the following hierarchy:

(60) DEP[-VOICE] & ALIGN-MORPHEME-LEFT-WORD-LEFT » OBS-VOI

And the other half of the time the opposite hierarchy:

(61) OBS-VOI » DEP[-VOICE] & ALIGN-MORPHEME-LEFT-WORD-LEFT

If (60) holds, rendaku will be blocked. If (61) holds, rendaku will occur. But in Anttila's model, both hierarchies can occur for a given word. Thus, if we adopt the hypothesis that DEP[-VOICE] & ALIGN-MORPHEME-LEFT-WORD-LEFT and OBS-VOI are unranked, we would expect that for a given compound word, rendaku can occur in half the utterances in which it occurs. This is not the effect we are trying to achieve. Rather, we want to explain why for certain words rendaku occurs, and why for others it does not.

The same will be true of Boersma and Hayes' model. Instead of having the two constraints unranked with respect to each other, their model would have the two constraints ranked close together on a continuous scale.
If the two constraints are ranked close enough to each other, a pair of selections for each constraint for a given utterance will yield two selection points such that sometimes DEP[-VOICE] & ALIGN-MORPHEME-LEFT-WORD-LEFT will outrank OBS-VOI and other times the reverse will be true, with each ranking occurring for a non-infinitesimal fraction of utterances.

In Boersma & Hayes' model, once variable outputs occur, the phenomenon will be, at least in the short run, self-perpetuating, in that a learner will hear the variable outputs and will adjust their grammar such that they will produce variable outputs as well. Boersma and Hayes (1999) show in detail through a computer simulation of their model that a learner will develop a grammar that produces variation when they are exposed to hearing variable outputs for a set of words. Although Boersma & Hayes' model cannot directly account for word-to-word variation in rendaku voicing, later in this chapter I will propose that it can help explain why rendaku blocking became lexicalized in such a way as to produce word-to-word variation in rendaku voicing.

4.3 Rendaku irregularity as the legacy of utterance-to-utterance variation

In chapter 3 we saw that rendaku voicing in noun-noun compounds is subject to unpredictable cases of blocking. Many of these cases can only be accounted for by the inclusion of some feature in the lexical listing of a blocking compound that does not exist in the listings of either of the individual members. This type of blocking cannot be accounted for solely by a phonological rule or constraint since, if it were, we would have no explanation for why blocking occurs in some compounds but not others that have similar phonological properties.
We advanced the hypothesis that compound words are minimally lexically listed as references to the individual morphemes in the lexicon and that because of that, further material could be added to that listing of references: e.g. a floating [-voice] feature. If adding further material to the lexical listing of compound words is possible, we must ask what restrictions there are on adding such features. If we can add a floating voice feature, why can't we add several root nodes, with features attached? Could we, for example, expect to see compounds like the following:

x,y,ma    x=haru "spring"    y=kusa "grass"    output: *haru-makusa

There are a number of things that could rule out hypothetical compound words like *haru-makusa. One is that the sequence of sounds /ma/ in this compound violates an Anchoring constraint:

Anchor-Root-Left: "For every Root node R at the left edge of domain D in the input there is a corresponding root node R' at the left edge of domain D' in the output, where R corresponds to R' and D corresponds to D'."

This constraint is violated in certain compounds in order to satisfy other constraints. For example, in the compound haru-same "spring rain", derived from nouns haru "spring" and ame "rain", the epenthesized /s/ is not part of any morpheme. It occurs in the output to avoid vowel hiatus. Not all compounds insert a consonant to avoid vowel hiatus. For example, the compound haru-arasi "spring storm" has the same /u/-/a/ sequence in the output that is avoided by haru-same. Because consonant epenthesis to avoid vowel hiatus in compounds is the exception rather than the rule, it is more reasonable to posit that the /s/ in haru-same occurs as part of the input form of the compound:

x,/s/,y: x=haru; y=ame

The /s/ surfaces in the output in order to avoid vowel hiatus, even though its appearance violates Anchor-Root-Left. The following constraint seeks to avoid vowel hiatus.

*Nuc,Nuc: A sequence of two nuclei may not occur.
(Footnote 32: This constraint can be seen as an instance of Suzuki's GOCP. See page 171.)

(62) input: x,/s/,y: x=haru; y=ame

                  Dep-Feature    *Nuc,Nuc    Anchor-Root-Left
   haru-ame                      *!
☞  haru-same                                 *

In compounds that do not have the /s/ as part of the input, no epenthesis will occur, if Dep-Feature is highly ranked.

(63) input: x,y: x=haru; y=arasi

                  Dep-Feature    *Nuc,Nuc    Anchor-Root-Left
☞  haru-arasi                    *
   haru-sarasi    *!                         *

Compounds that do not need to repair vowel hiatus will not allow an extra consonantal root node to surface. For example, a sequence of /i/ followed by /i/ across a morpheme boundary can be part of the same syllable in the compound ki-iro "yellow colour", unlike the sequence /u/-/a/.

(64) input: x,/s/,y: x=ki "yellow"; y=iro "colour"

                  Dep-Feature    *Nuc,Nuc    Anchor-Root-Left
☞  ki-iro
   ki-siro                                   *!

Thus, any material that is added to the listing of a compound word can only surface if it serves to satisfy some highly-ranked constraint. In the case of our hypothesized [-voice] feature for compounds that block rendaku, the presence of this feature serves to satisfy the highly ranked constraint Obs-Voi, since it will prevent a voiced obstruent from surfacing.

There is also a second way in which the addition of lexical material to the listing of a compound is likely to be restricted. This has to do with the fact that not only are compound words derived synchronically from simplex words; many are also derived diachronically. That is, many compound words originally developed from a syntactic combination of their constituent morphemes. The way this occurred for many compounds in Japanese is documented by Martin (1987). When rendaku voicing first appeared in an earlier stage of Japanese, it did so as a reflex of the functional head no, which occurred as an enclitic on the first member of a phrase that eventually became a compound. For example, a compound word like huyu-zora would have originated as syntactic phrase huyu-no sora.
Martin (1987) gives historical evidence that Japanese compounds that have existed in the language for a long time were originally pronounced in this way as syntactic phrases. When the phrase became a morphological word rather than a syntactic phrase, the particle no disappeared, but was preserved as voicing on the initial obstruent of the second conjunct. There was likely an intermediate stage where the particle no was simply pronounced as a mora nasal, as in huyu-n-sora. This truncation of particles no and ni to a mora nasal in fast speech still occurs frequently in modern Japanese.

(Footnote 33: This story of the historical evolution of rendaku voicing accounts historically for why dvandva compounds did not develop rendaku voicing. A compound such as kami-hotoke "Gods and Buddhas" could not have originated from the phrase kami-no hotoke, which would mean "Buddha of God".)

The development of the nC sequence into a voiced obstruent is exactly parallel to the original historical development of voiced obstruents in Yamato Japanese as documented by Martin (1987).

If compound words developed from syntactic phrases, there must have been a time of transition for a given compound word, when it was lexicalized. A prior generation would have derived the word from a syntactic phrase. For example, a compound like takara-bune "treasure boat" would have been originally pronounced as syntactic phrase takara-no hune, with a genitive case-particle occurring as an enclitic on the first noun. Leaving aside for the moment the problem of rendaku voicing, there must have been a stage where one generation derived the compound as a syntactic phrase, perhaps as takara-n' hune or takara-[+voice] hune, and the next generation perceived the phrase as a lexically listed compound.
When this occurred, the first generation of learners that posited a lexical listing for the compound would have been basing their input form on an output form takara-bune whose speakers derived the word as a phrase from two separate lexical listings takara and hune. There is no way that this prior generation could have produced an output form with added material unless there was some phonological reason for its occurrence (such as we find, for example in compound word haru-same "spring rain" (discussed above), derived from haru "spring" and ame "rain", with /s/ epenthesized to avoid vowel hiatus.) Thus the first generation that derived takara-bune as a compound posited an input form on the basis of an earlier generation's output form that had no room for the adding of extra material, since the derivation was based only on simplex words. Let us now return to the question of the irregularity of rendaku voicing. If we allow addition of extra material to the listing of a compound word, albeit in a very restricted way, why does it occur at all, making rendaku voicing irregular? Why would some compounds end up with this added feature that blocks rendaku while others do not? The most natural explanation for this fact, and one for which there is some historical evidence, is that rendaku is a phenomenon that originally showed much utterance-to-utterance variation and that it evolved in a predictable way such that utterance-to-utterance variation was replaced by word-to-word variation. Utterance-to-utterance variation is naturally explainable, in Boersma & Hayes' model, by the close ranking of two constraints. We only have to show, then, why rendaku would evolve to a state where it exhibits word-to-word variation instead of utterance-to-utterance variation. Let us examine the ways in which rendaku voicing could have changed from a phenomenon that originally showed utterance-to-utterance variation and then later showed word-to-word variation. 
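The claim that close ranking yields utterance-to-utterance variation can be illustrated with a minimal simulation. This is only a sketch of stochastic evaluation in the spirit of Boersma & Hayes' model: the ranking values (100, 99, 80), the noise level of 2.0, and the function name are illustrative assumptions, not figures taken from this thesis.

```python
import random

def share_of_evaluations_won(rank_high, rank_low, noise_sd=2.0,
                             trials=100_000, seed=0):
    """Estimate how often the nominally higher constraint actually
    outranks the lower one when each evaluation adds Gaussian noise."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        # Each utterance perturbs both ranking values independently.
        high = rng.gauss(rank_high, noise_sd)
        low = rng.gauss(rank_low, noise_sd)
        if high > low:
            wins += 1
    return wins / trials

# Closely ranked constraints: the outcome varies from utterance to utterance.
close = share_of_evaluations_won(100.0, 99.0)
# Widely separated constraints: the outcome is effectively categorical.
far = share_of_evaluations_won(100.0, 80.0)
print(f"close ranking: {close:.2f}, distant ranking: {far:.2f}")
```

Under these assumed values the closely ranked pair reverses its order in a substantial minority of evaluations, while the distant pair almost never does, which is the contrast between variable and categorical outputs that the argument relies on.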
When phrases such as huyu-n-sora became compounds, there is evidence that for at least some compounds, there was variation. They could be pronounced either with or without voicing on the initial obstruent of the second conjunct. For example, Martin (1987:100) gives historical evidence of what he calls "doublets" (i.e. with two variable output forms: one with voicing and one without) at an earlier stage of the language:

(65)
ita "board"          two "door"       ita-two/ita-dwo "plank door"
yama "mountain"      ta "field"       yama-ta/yama-da "mountain field"
kafa "river"         two "ferry"      kafa-two/kafa-dwo "river-ferry"
ka "deer"            kwo "child"      ka-kwo/ka-gwo "baby deer"
sifo "tide"          fune "ship"      sifo-fune/sifo-bune "sea-ship"
tati "paddy-path"    fana "blossom"   tati-fana/tati-bana "orange blossom"
ki/ko "tree"         tati "stand"     ko-tati/ko-dati "grove"

There is also evidence that in classical times, voicing was quite variable generally, since the orthography often did not distinguish voiced from voiceless obstruents.

The second ingredient that is relevant to the change of rendaku voicing from an utterance-to-utterance-variable phenomenon to a word-to-word variable phenomenon is a combination of various random factors that will inevitably affect the outputs that a language learner hears. These random effects will affect some words in one direction and other words in another direction so that the total effect of these factors across the whole language is insignificant. I summarize below some of the kinds of factors that could cause some compounds to be voiced more frequently than others.
Factor 1: Performance and perception errors

Performance and perception factors may make speakers in a population, on the average, more likely to accidentally voice compound A (or likely to incorrectly hear compound A as voiced) and less likely to voice compound B (or hear compound B as voiced), because of the differing performance and perception challenges in pronouncing and perceiving different words, but the overall effect will be that these random perturbations will cancel each other out, when we consider all the words of the language (unless independent factors were relevant).

Factor 2: Geographic variability of rendaku in earlier stages of Japanese

There is no evidence that when rendaku voicing first appeared, it appeared simultaneously among all members of the Japanese speaking population. The null hypothesis would be that it appeared first among some geographical and/or social subgroup. This conjecture is strongly supported by the great linguistic diversity among Japanese dialects, which still persists today in spite of the influence of mass communications. Words that were used more commonly by a substratum of the population that was more likely to use rendaku voicing would be voiced more often than words that were used mainly by groups that did not voice as much. The overall effect would be that some words would be voiced more often than others. This would not affect the average rate of voicing for the whole population but could result in differences for the rate of voicing for certain compounds.

Factor 3: Variation among speakers in the rate of rendaku voicing

In §4 I present experimental evidence on rates of rendaku voicing for fictitious compounds with different speakers. In this experiment, native speakers were asked to read fictitious compounds. Because these compounds are not real, the speaker must use rendaku voicing as a productive process to determine whether voicing occurs or not.
We find that among just four speakers, the overall rate of voicing for the same sets of compounds varied from 22% to 64%. This suggests that when presented with a new compound word, different speakers will have different grammars with respect to their ranking of constraints responsible for rendaku voicing. Such a situation would be expressed in the following way in Boersma and Hayes' model. For simplicity, let us use just two conflicting constraints that are active at the time that rendaku is productive: F for faithfulness ("anti-rendaku") and R for whatever constraint derives rendaku. If some speakers effect rendaku voicing for a compound more than others, they will have F and R at different distances apart on the constraint scale. If some speakers were more likely than others to voice a new compound when it first appeared, then some compounds could potentially have a higher rate of voicing than others, depending on which speakers used those compounds the most and depending on sociolinguistic factors.

Factor 4: The overall strength of constraints that derived rendaku voicing was higher on average during one historical period than during another

If rendaku went through a period of utterance-to-utterance variation, the average frequency at which all compounds in the language were voiced was not necessarily constant at 50%: it could have been any measurable level. In addition, given the fact that there was a time when rendaku did not occur at all and a later time when it was generally applied across all dialects, the average rate of voicing must have changed over time. That is, there must exist historical times T1 and T2 such that the overall rate of rendaku voicing of all compound words across the population was lower at time T1 than at time T2.

We also know that not all compound words became current at the same time. For example, in (8) we saw examples of compounds that were coined as late as 1912.
It must be the case, then, that there are some compounds that became current at a time when rendaku voicing occurred in a high percentage of utterances and other compounds that became current at a time when rendaku voicing occurred for a lower percentage of utterances.

4.4 The effects of the above factors

There is historical evidence that an earlier stage of Japanese had variable outputs for compound words, with both voicing and no voicing possible. Because of the effects of the above four factors, some compound words would have been voiced more frequently than others. When a learner hears a particular compound word voiced in an utterance, in Boersma and Hayes' model they will slightly adjust their grammar so that the pro-rendaku constraint is more highly ranked. When they hear a compound word not voiced, they will adjust their grammar so that pro-rendaku constraints are ranked lower than anti-rendaku constraints. If compounds were voiced, say, 75% of the time on the average at a particular historical time, a speaker would rank anti-rendaku constraints at a level relative to pro-rendaku constraints so that this 75% voicing level would be maintained for compound words as a whole. If no other factors could affect the overall rate of voicing, Boersma and Hayes' learning algorithm will predict that this state of utterance-to-utterance variation will continue from one generation to the next, with no change in the overall rate of voicing.

But given our hypothesis of the lexicalization of compounds, there is an additional way in which a new generation of speakers could derive the output form for a given compound word. In Boersma & Hayes' model, not only does a learner adjust their ranking of constraints when they hear an output form. They must also determine the input form of the compound from the output that they hear and from their grammar. Let us consider the following plausible situation in an earlier stage of Japanese.
Consistent with the fact that rendaku voicing currently occurs in at least 75% of noun-noun compounds, rendaku voicing at the stage we are considering occurred more often than rendaku blocking, but rendaku exhibited utterance-to-utterance variation. That is, for most compound words, rendaku voicing sometimes occurred and sometimes it did not. Because of the various sociolinguistic and performance factors outlined above, some compounds were heard voiced more often than others.

Let us consider compounds whose second conjunct is tama "ball." Consistent with the fact that in modern Japanese, six out of eight Yamato noun-noun compounds with this word are voiced, let us suppose that at this earlier stage, because of certain performance and/or sociolinguistic factors, a learner would hear compounds with second conjunct tama "ball" frequently pronounced with voicing. Suppose also that they heard utterances of kani-tama "crab-ball" frequently pronounced without voicing. That is, our hypothesis is that certain compounds ended up with no voicing in modern Japanese because at an earlier stage they were frequently not voiced, even though voicing was also possible.

Given a linguistic environment in which voicing was heard on compounds in general more often than not, it is most plausible that a learner would have adjusted their grammar such that the pro-rendaku constraint Align-Morpheme-Left-Word-Left & Dep[-voice] was more highly ranked than the anti-rendaku constraint Obs-Voi. This means that in Boersma and Hayes' model, the learner would have made a constraint selection most of the time in which Align-Morpheme-Left-Word-Left & Dep[-voice] outranks Obs-Voi, but in a minority of selections they would have the opposite ranking. Let us consider the (more frequent) selection of Align-Morpheme-Left-Word-Left & Dep[-voice] » Obs-Voi together with what we hypothesize to be the more frequent hearing of kani-tama with no rendaku voicing.
In Boersma and Hayes' model, when a learner hears an utterance with a compound like kani-tama pronounced with no voicing, the learner would adjust their constraint ranking slightly so that Align-Morpheme-Left-Word-Left & Dep[-voice] is ranked a little lower and Obs-Voi a little higher. But over the long haul, the learner's ranking of Obs-Voi should stay below the ranking of Align-Morpheme-Left-Word-Left & Dep[-voice], since they will hear more voiced compounds than compounds without rendaku. This means that the learner will never reach a state where their grammar derives the output form of kani-tama as voiceless for most utterances. Most of the time their grammar will make the wrong prediction for this compound. Because of various extra-phonological factors, they usually hear this compound pronounced with no voicing on the initial obstruent. If this were the case for a compound like kani-tama, the most logical move would be for them to adjust their input form for the compound from (66) (a) to (66) (b).

(66)
(a) A, B              A=kani; B=tama
(b) A, B, [-voice]    A=kani; B=tama

If the learner can posit a new input form for compounds like kani-tama that frequently block voicing, interesting changes will result in the phenomenon of variable outputs, as will be shown below.

Recall that variable outputs are caused, in Boersma & Hayes' model, by close ranking of two constraints on a constraint scale. In our hypothesized account of variable outputs for rendaku voicing at an earlier stage of Japanese, we ranked Obs-Voi below but close to the conjoined constraint Dep[-voice] & Align-Morpheme-Left-Word-Left. In the tableau for the compound yuki-dama, shown below, as long as the speaker makes a selection in which Dep[-voice] & Align-Morpheme-Left-Word-Left dominates Obs-Voi, voicing will occur.

(67)
yuKi + Tama      Dep[-voice] & Align-Morpheme-Left-Word-Left   Obs-Voi
   yuki-tama     *!
   yuki-dama                                                    *

If a learner hears, say, 25% of compounds pronounced with no voicing, they will maintain a constraint hierarchy in which Dep[-voice] & Align-Morpheme-Left-Word-Left is ranked a little above Obs-Voi. But consider now what will happen if a learner posits an input with a [-voice] feature added to a particular compound. Suppose that they posit this kind of input for a compound that according to our hypothesis was frequently not voiced: say, kani-tama, which is not voiced in modern Japanese. As shown by tableau (68), their grammar will now predict that it should surface as nonvoiced 100% of the time, regardless of how a speaker selects a relative ranking of Dep[-voice] & Align-Morpheme-Left-Word-Left and Obs-Voi.

(68)
Kani + Tama [-voice]   Dep[-voice] & Align-Morpheme-Left-Word-Left   Obs-Voi
☞  kani-tama
   kani-dama                                                          *

Both candidates now satisfy Dep[-voice] & Align-Morpheme-Left-Word-Left, since there is a [-voice] feature in the input. Thus Obs-Voi will rule out the voiced candidate regardless of the ranking of the two constraints.

In the minority of instances when the learner hears the compound voiced, they will try to adjust their grammar such that some constraint that derives voicing will rise on the constraint scale. But the constraint Dep[-voice] & Align-Morpheme-Left-Word-Left is no longer able to derive voicing, with a [-voice] feature in the input. They will have to gradually promote some other constraint or constraints that derive rendaku for these utterances. There is no obvious alternative constraint that a learner might promote in response to hearing kani-dama as voiced when they have posited an underlying [-voice] feature. A hypothetical constraint such as the following, which takes voiceless obstruents to be marked in Japanese, contradicts the fact that they occur more robustly than voiced obstruents in the language.
(69) *[-sonorant, -voice]: Every root node with a [-sonorant] feature must not have a path to a [-voice] feature.

Even if such a constraint were possible, which is doubtful, and if it were promoted relative to Obs-Voi whenever kani-tama is heard, there would be many more utterances that would cause the learner to demote this constraint relative to Obs-Voi. For example, suppose that the learner has raised the ranking level of *[-voice] relative to Obs-Voi such that 5% of the time they will make a constraint selection in which *[-voice] outranks Obs-Voi. Such would be the predicted level under Boersma & Hayes' model if the learner were to hear a compound like kani-tama voiced 5% of the time. Consider now what happens when the learner hears a simplex word with an initial voiceless obstruent. Regardless of whether the initial voiceless obstruent in sora "sky" is underspecified for voicing, the ranking Obs-Voi » *[-voice] will correctly derive the output.

(70)
(a) "S" represents a coronal fricative underspecified for voicing

input: Sora "sky"             Obs-Voi   *[-voice]
a. sora [-voice]                        *
b. zora [+voice]              *!

(b)
input: sora [-voice] "sky"    Obs-Voi   *[-voice]
a. sora [-voice]                        *
b. zora [+voice]              *!

But 5% of the time the learner will make a constraint selection in which *[-voice] outranks Obs-Voi. This will mean that the learner's predicted output does not match the utterance and the learner will demote *[-voice] relative to Obs-Voi. Since simplex words with initial voiceless obstruents will be heard more commonly than compounds like kani-tama that usually do not voice, the effect will be to demote the constraint *[-voice] relative to Obs-Voi to the extent that the learner would very rarely select *[-voice] above Obs-Voi, thus voicing this compound very rarely. Thus the next generation of speakers would tend to uniformly block rendaku in this kind of compound, with almost no variation, or at least much less than the previous generation.
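The contrast between tableaux (67) and (68) can be checked mechanically with a small strict-domination evaluator. This is a minimal sketch, not the thesis's formalism: the violation function encodes only the two constraints shown in those tableaux, and the names `violations` and `winner` are hypothetical.

```python
DEP_ALIGN = "Dep[-voice] & Align-Morpheme-Left-Word-Left"
OBS_VOI = "Obs-Voi"

def violations(candidate_voiced, input_has_minus_voice):
    """Violation marks for the two constraints of tableaux (67) and (68)."""
    return {
        # A voiceless candidate must insert [-voice] unless the input
        # listing of the compound already supplies that feature.
        DEP_ALIGN: int(not candidate_voiced and not input_has_minus_voice),
        # A voiced obstruent always violates Obs-Voi.
        OBS_VOI: int(candidate_voiced),
    }

def winner(ranking, input_has_minus_voice):
    """Pick the optimal candidate under strict domination."""
    candidates = {"voiceless": False, "voiced": True}

    def profile(name):
        marks = violations(candidates[name], input_has_minus_voice)
        # Tuples compare left to right, mirroring strict domination.
        return tuple(marks[c] for c in ranking)

    return min(candidates, key=profile)

for ranking in ([DEP_ALIGN, OBS_VOI], [OBS_VOI, DEP_ALIGN]):
    plain = winner(ranking, input_has_minus_voice=False)    # as in (67)
    listed = winner(ranking, input_has_minus_voice=True)    # as in (68)
    print(ranking[0], "on top:", plain, "/", listed)
```

Without the listed feature the winner flips with the ranking, so noisy ranking produces variable outputs; with [-voice] in the input the voiceless candidate wins under either order, which is why lexicalization removes the variation for that compound.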
Hearing simplex words with initial voiced obstruents would not affect the learner's ranking of the constraints Obs-Voi and *[-voice], assuming high ranking of the constraint Max-Path-Feature, as proposed in (16) and (17).

There will be a second effect of this lexicalization of a [-voice] feature for compounds that are frequently not voiced. Once compounds like kani-tama gain a lexicalized [-voice] feature, they can be derived as non-voiced without ranking the constraint Obs-Voi almost as high as Dep[-voice] & Align-Morpheme-Left-Word-Left. This will decrease the tendency to promote the constraint Obs-Voi. Suppose, for example, that before a new generation of speakers started positing a lexicalized [-voice] feature for compounds such as kani-tama, about 80% of noun-noun compounds were voiced. But as we have proposed, some compounds were voiced more often than others because of various performance and sociolinguistic factors. At this stage, the grammar would rank Obs-Voi at a level below Dep[-voice] & Align-Morpheme-Left-Word-Left such that a given compound was 80% likely to voice. As long as a new generation hears 80% of compounds as voiced, they will maintain this relative level of the two constraints in their grammar. But suppose that compounds that gained a lexicalized [-voice] feature accounted for 50% of the cases of non-voicing across the language. The generation that lexicalized those compounds now does not need to account for those 50% of the non-voiced words by giving Obs-Voi a high enough ranking, since they will derive them as non-voiced regardless of the relative ranking of Dep[-voice] & Align-Morpheme-Left-Word-Left and Obs-Voi, as we saw in the tableau in (68). Hearing those words will not cause them to promote Obs-Voi or demote Dep[-voice] & Align-Morpheme-Left-Word-Left in their grammar. Thus their grammar only needs to account for the remaining 50% of the non-voiced compounds, or 10% of the total.
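The arithmetic of this example can be laid out as a short sketch. The function and key names are hypothetical, and the second rate reported is an alternative reading added here for comparison (it leaves the lexicalized compounds out of the denominator), not a figure from the text.

```python
from fractions import Fraction

def voicing_after_lexicalization(voiced_share, lexicalized_share_of_nonvoiced):
    """Work through the 80% / 50% example with exact fractions."""
    nonvoiced = 1 - voiced_share
    # Non-voiced compounds now handled by a listed [-voice] feature.
    lexicalized = nonvoiced * lexicalized_share_of_nonvoiced
    # Non-voiced compounds the constraint ranking must still derive,
    # expressed as a share of all compounds.
    residual_nonvoiced = nonvoiced - lexicalized
    return {
        "residual_nonvoiced": residual_nonvoiced,
        # Rate obtained by subtracting the residual from the whole.
        "rate_over_all_compounds": 1 - residual_nonvoiced,
        # Alternative reading: rate among non-lexicalized compounds only.
        "rate_excluding_lexicalized": voiced_share / (1 - lexicalized),
    }

result = voicing_after_lexicalization(Fraction(80, 100), Fraction(50, 100))
for name, value in result.items():
    print(f"{name}: {float(value):.3f}")
```

With the assumed inputs the residual non-voiced share is 10% of all compounds. Counting over all compounds gives a 90% voicing rate, and counting only the compounds without a listed feature gives 8/9, roughly 89%; either way the rate the grammar must derive rises above the original 80%.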
This means that their grammar will tend to rank Dep[-voice] & Align-Morpheme-Left-Word-Left and Obs-Voi such as to derive, on the average, 90% voicing for compounds rather than 80%. If the compounds that frequently voiced and did not receive a lexicalized [-voice] feature were voiced more than the average, they will now be voiced more than 90% of the time. As this tendency progresses through each generation, and as words that regularly resist rendaku are lexicalized with a [-voice] feature, the constraint ranking between anti-rendaku constraints and pro-rendaku constraints will be affected only by the remaining words. There will thus be a tendency for the voicing rate on these remaining words to increase over each generation, until a point is reached where most words no longer exhibit variation with respect to rendaku voicing. In summary, then, two things will make it possible for utterance-to-utterance variation to gradually change to a state of word-to-word variation for a process like rendaku voicing: (a) the possibility of adding features such as [-voice] to the lexical listing of a compound; and (b) variation in voicing rate among individual compounds due to sociolinguistic and performance factors. This gives us an account of how word-to- word variation in rendaku voicing could develop from an original state of utterance-to-utterance variation. As outlined in footnote 30, this view of rendaku voicing as a phenomenon that evolved to exhibit word-to-word variation is consistent with the hypothesis, expounded in Labov (1994), and in references cited there (see, for example, Wang and Cheng (1977)), that sound change can occur through lexical diffusion. The principle of lexical diffusion views sound change as applying to words as opposed to sounds. 
Labov cites evidence from a wide variety of languages in studies by Chen and Wang (1975), Sherman (1973), Ogura (1987) and Krishnamurti (1978) that support the hypothesis that sound change occurs in words rather than across the board for a particular phoneme.

4.5 Summary of chapter 4

In this chapter I have adopted Boersma & Hayes' model of a continuous scale of constraint ranking to account for a state of utterance-to-utterance variation in rendaku voicing that, as suggested by some evidence, may have existed at an earlier stage of Japanese. By positing a number of extra-phonological factors that could affect the voicing or perception of voicing on a given compound, we developed the hypothesis that certain words and certain compounds deviated in one direction or another from an average frequency of voicing at such a stage of the language. I argued that under Boersma and Hayes' model, the possibility that a speaker could add features to their input form of a compound would predict that utterance-to-utterance variation in rendaku voicing could evolve to a state where word-to-word variation of the type we see in modern Japanese could take over, with exceptions to rendaku being mediated by the inclusion of voicing features in the listing of a compound, according to the model of input forms for compounds that I proposed in chapter 2.

In the next chapter I will examine a different case of blocking of rendaku voicing: blocking in noun-verb compounds. The patterns of voicing in these compounds differ from those in noun-noun compounds in that there is evidence that voicing is affected not only by lexical factors, but also by phonological factors.

5. Blocking of rendaku in noun-verb compounds

In this chapter we will examine the voicing behaviour of what we might call "noun-verb" compounds.
These compounds contrast with noun-noun compounds, examined in previous chapters, because, although they also experience rendaku voicing, voicing appears to be blocked by phonological and morphological factors as well as by lexical factors.

Noun-verb compounds are formed from a noun and a verb, but the verb occurs in its deverbal noun form, which is the bare verb root plus an epenthesized vowel if the verb stem is consonant-final. The whole compound behaves like a nominalization in that it cannot be inflected, but it can syntactically behave like a verb in certain constructions such as a light verb construction. A typical example is kusa-tori (grass-take), which means "weeding." It can function as a noun when it occurs in a case-marked position. It can also be used predicatively when inflection is borne by the light verb suru, as in the phrase kusa-tori-suru, which means "to do weeding." Notice that the /t/ does not voice to a /d/. This blocking of rendaku is typical for a noun-verb compound that has similar phonological and morphological properties to kusa-tori, as we shall see in §5.1.2 below.

In noun-verb compounds the patterns of blocking are much more complex than what we saw for noun-noun compounds in §3. Some of the specific properties of rendaku blocking in noun-verb compounds are as follows. Voicing is more likely to be blocked when the verb stem has fewer than two moras: for example, when the citation form of the verb is CVC-u. Voicing occurs more often when the verb's citation form is CVCV-ru or CVCVC-u. But when the verb is even longer, for example CVCVCVC-u in the citation form, voicing occurs less often again. We also find that other factors can affect blocking: (a) the morpho-syntactic relation between the noun and the verb and (b) the underlying accent pattern of the verb.
The effect of the type of morpho-syntactic relation between the noun and the verb on voicing in Japanese noun-verb compounds has been observed in the literature. For example, Ito & Mester (1998) show that in noun-verb compounds, blocking is more likely to occur when there is an argument relation between the noun and the verb than when there is an adjunct relation. The effect of the underlying accent pattern of the verb on rendaku is observed in Sugioka (1984), who comments that the accent pattern of the verb can also influence the occurrence of rendaku. What we will observe in noun-verb compounds is that these three blocking factors (prosodic size, underlying accentuation, and morphosyntactic relation) can all work together to have a "ganging up" effect on rendaku. That is, when all three blocking factors occur simultaneously, blocking of rendaku is strongest.

In this chapter I shall begin, in §5.1, with empirical evidence of how blocking occurs robustly when all three of these blocking factors occur simultaneously.

5.1 Examples of rendaku blocking in noun-verb compounds

Let us begin by examining noun-verb compounds in which the verb (a) is underlyingly accented, (b) has a CVC stem, and (c) has an argument relation with the noun. We shall see that in this type of compound, rendaku usually does not occur at all, though there are a few exceptions that mostly occur with the same verbs. If the verb is underlyingly unaccented, rendaku will occur more often, but is sometimes still blocked. I present below two closely related sets of data for the same set of verbs. The verbs are all accented verbs with a CVC stem. In the first set of data are noun-verb compounds with an adjunct relation between the noun and verb. In the second set are compounds with an argument relation between the noun and verb.
Only compounds in which the noun is of Yamato (native) origin have been included, since the presence of a Sino-Japanese morpheme in a compound can sometimes block rendaku voicing. A pitch accent on the verb is indicated by an acute accent. In §5.1.1 below I show that for compounds with CVC, accented verb stems, when there is an adjunct relation, rendaku robustly occurs.

5.1.1 Adjunct compounds [34]

(71) ha-u "to crawl" (compare with (89))
hara-bai       belly-crawl            lying on one's belly
yoko-bai       side-crawl             crawling sideways
yo-bai         night-crawl            night crawling, sneaking visit

(72) hor-u "to dig, carve" (compare with (92))
ki-bori [35]   tree-dig               wood carving
maru-bori      round-dig              3D sculpture
uki-bori       float-dig              relief, embossed carving

(73) hos-u "to dry" (compare with (93))
kage-bosi      shade-dry              "drying in the shade" (locative)
kara-bosi      dry-dry                "dried fish or vegetables" (manner)

(74) huk-u "to blow" [36]
i-buki         breath-blow

(75) hur-u "to fall" (compare with (95))
dosya-buri     earth.and.sand-fall    "heavy downpour" (manner)
hon-buri       original-fall          "regular rain" (manner)
yoko-buri      side-fall              "driving rain" (locative)

(76) kak-u "to write, paint" (compare with (96)) [37]
te-gaki        hand-paint             "hand-painted" (instrument)
sita-gaki      under-write            "draft" (locative)

(77) kat-u "to win" (compare with (97)) [38]
kana-gati      kana(character)-win    using more kana than characters
maru-gati      circle-win             complete victory
ware-gati      me-win                 everyone for himself

(78) ker-u "to kick" (compare with (98)) [39]
asi-ge [40]    a.kick

(79) kir-u "to cut" (compare with (99)) [41]
ura-giri       rear-cut               "treachery" (locative)

(80) kom-u "to be crowded" (compare with (101))
The following compound is either a V-V (verb-verb) compound, which usually does not undergo rendaku, or else a N-V compound with a Sino-Japanese first conjunct:
si-komi        serve-crowd            training

(81) kú-u "to eat" (vulgar) (compare with (103)) [42]
haya-gui       fast-eat               eating fast
kakusi-gui     hide-eat               eating on the sly (V-V)
ne-gui         sleep-eat              living in idleness (V-V)
tomo-gui       mutual-eat             cannibalism

(82) kum-u "to fold" (compare with (104)) [43]

(83) sak-u "to tear, split" (compare with (105))
yatu-zaki      eight-tear             tearing limb from limb

(84) suk-u "to like" (compare with (107)) (These are arguably N-N compounds, formed from deverbal noun suki which is used instead of an inflected verb.)
sake-zuki      sake-like              drinker
hito-zuki      person-like            attractiveness
mono-zuki      thing-like             curiosity
yoko-zuki [44] side-like              "crazy about it but very bad at it"

(85) sum-u "to become clear; to finish; to live" (compare with (108)) [45]
uwa-zumi       upper-become.clear     the clear top of a liquid

(86) sur-u "to rub, grind, paint" (see (109)) [46]
kari-zuri                             proof printing
sita-zuri      under-print            proof printing

(87) tat-u "to stand, break off, depart, pass, cut" [47]
asa-dati       morning-depart         early-morning departure
saka-dati      upside-down-stand      handstand
su-dati        nest-break-off         leaving-nest
soo-dati       group-stand            standing-in-a-group
tabi-dati      trip-depart            setting-off-on-trip
tuma-dati      toenail-stand          standing-on-tiptoes
naka-dati      middle-stand           mediation
hara-dati      stomach-stand          anger
hitori-dati    alone-break-off        being-independent

(88) tor-u "to take" (compare with (111))
yoko-dori      side-take              "snatch" (locative)
te-dori        hand-take              "net profit" (instrument)
huti-dori      edge-take              "hemming" (locative)

Footnote 34: For the verb hir-u "to get dry" (compare with (91)) I can only find the following V-V compound, in which neither verb acts like the morphosyntactic head. This type of "dvandva" compound generally resists rendaku. (See Sugioka (1984) and Kageyama (1982).)
miti-hi        get.full-get.dry       ebb and flow

Footnote 35: When "wood" occurs as complement of "carve", it occurs as an oblique, not as an accusative object.

Footnote 36: The compound hon-poo-huki "free-spirited" is listed in an internet database but not found in a dictionary. There may be an error for the characters for huki in the listing I found. hon-poo means "wild." It is possible that huki is actually the Sino-Japanese compound hu-ki "freedom from restraint", which fits well the meaning given for hon-poo-huki. In the compound hito-huki, the prefix hito- "one" usually does not trigger rendaku on the morpheme that follows it:
hito-huki      one-blow               a gust

Footnote 37: The following compound also voices but is not included because it is Sino-Japanese:
koonoo-gaki    efficacy-write         "statement of effect of medicine" (oblique)

Footnote 38: The following compounds are formed from first conjuncts that are Sino-Japanese:
hu-soku-gati   insufficient-win       needy circumstances
gyaku-ten-gati reversal-win           coming from behind to win
hantee-gati    judgement-win          winning a decision
The following are V-V compounds:
utu-muki-gati  downcast-win           looking down
wasure-gati    forget-win             forgetful
okotari-gati   neglect-win            neglectful

Footnote 39: The following is another example of lack of rendaku after prefix hito- "one":
hito-keri      a.kick

Footnote 40: In this compound, the verb ker-u acts as if it were a vowel-final verb: ke-ru.

Footnote 41: The following compounds have Sino-Japanese first conjuncts:
ku-giri        section-cut            "punctuation" (not clear if adjunct)
sen-giri       thousand-cut           "chips" (manner)

Footnote 42: The following compound is a dvandva V-V that always resists rendaku:
nomi-kui       drink-eat              food and drink (V-V)

Footnote 43: The only adjunct compounds I can find for this verb are the following:
si-kumi        serve-fold             structure (either V-V or Sino-Japanese)
tori-kumi      take-fold              bout (V-V)

Footnote 44: Also pronounced heta-no-yoko-zuki "clumsy-GEN side like".

Footnote 45: In the following compounds, -zumi, derived from sum-u, acts like a productive suffix. In fact, zumi is listed as a suffix in the dictionary, occurring productively with Sino-Japanese verbal nouns.
syoomee-zumi   proof-finish           already proved
yoyaku-zumi    reservation-finish     reserved
If zumi- is a suffix here, these are not V-V compounds, which usually resist rendaku.

Footnote 46: The following are Sino-Japanese compounds:
ni-syoku-zuri  two-colour-printing
ryoo-men-zuri  printing-on-both-sides

Footnote 47: The following have Sino-Japanese first conjuncts:
ippon-dati     one-break-off          independence
boo-dati       pole-stand             standing-upright
yo-dati        night-depart           setting-out-at-night

If we factor out cases of words formed by rendaku-resisting prefixes like hito- and hatu-, and V-V compounds, we find that rendaku occurs almost without exception in these adjunct-head compounds. Consider next, in §5.1.2, argument compounds with the same verbs:

5.1.2 Argument compounds

Included in this list are verbs with no argument compounds. These verbs are listed (a) in cases where only adjunct compounds exist with this verb, for comparison with the list of adjunct compounds in the previous subsection and (b) for completeness, for the sake of showing all the possible Yamato verbs with initial voiceless obstruents that might form compounds that are capable of undergoing rendaku voicing.

(89) haw-u "to crawl": adjunct compounds only (unergative verb?)

(90) her-u "to pass, elapse": (no compounds with Yamato nouns)

(91) hir-u "to get dry": (no argument compounds)

(92) hor-u "to dig, carve"
imo-hori       potato-dig             "potato field" (cf. imo-o hor-u)
ido-hori       well-dig               "well-digging" (cf. ido-o hor-u)

(93) hos-u "to dry"
mono-hosi      thing-dry              "drying rack"

(94) huk-u "to blow"
hue-huki       flute-blow             "piper"
sio-huki       salt-spout             "spouting of whale"
hai-huki       ash-blow               "bamboo ash receptacle"

(95) hur-u "to fall"
ame-huri       rain-fall              "rainfall"
simo-huri      frost-fall             "frosting"

(96) kak-u "to write; to scratch, paw, shave, rake, row"
e-kaki         picture-write          "painter"
mono-kaki      thing-write            "writer"
mizu-kaki      water-row              "paddle"

(97) kat-u "to win" (no argument-head compounds; kat-u acts syntactically like an unergative, with no accusative object)

(98) ker-u "to kick"
aza-keri       birthmark-kick         "ridicule"
isi-keri       stone-kick             "hopscotch"

(99) kir-u "to cut" [48]
isi-kiri       stone-cut              "stonecutting/-er"
eda-kiri       branch-cut             "pruning"
kawa-kiri      hide-cut               "leather-cutter"
kippu-kiri     ticket-cut             "punching ticket"
kuti-kiri      mouth-cut              "beginning; to the brim"
mizu-kiri      water-cut              "draining water; drainboard"

(100) kow-u "to love" (not active in modern Japanese as a verb: the deverbal noun is more common)

(101) kom-u "to be crowded" [49]

(102) kor-u "to be stiff, get stiff"
kata-kori      shoulder-get.stiff     "stiff shoulders"

(103) kuw-u "to eat" (vulgar) [50]
ari-kui        ant-eat                "anteater"
hito-kui       person-eat             "cannibalism"

(104) kum-u "to fold" [51]
ude-gumi       arm-fold               arm wrestling
waku-gumi      frame-fold             frame

(105) sak-u "to tear, split": only an adjunct compound

(106) sas-u "to fill, add, apply; shine"
mono-sasi      thing-add              "ruler" (for measuring)

(107) súk-u "to like" (used almost exclusively in deverbal noun form)

(108) súm-u "to become clear; to finish; to live" (no argument compounds)

(109) sur-u "to rub, grind, print"
te-suri        hand-rub               handrail

(110) tat-u "to stand, break off, depart, pass, cut" [52]
This verb does not seem to block rendaku in head-argument compounds.
kao-dati       face-stand             features
ko-dati        tree-stand             grove
hito-dati      person-stand           crowd

(111) tor-u "to take" [53]
kusa-tori      grass-take             "weeding"
nezumi-tori    mouse-take             "mousetrap"
kuti-tori      mouth-take             "groom, stable-boy"
hae-tori       fly-catch              "fly-catching"
mono-tori      thing-take             "thief"
io-tori        fish-take              "fisherman" (archaic)
kazi-tori      rudder-take            "helmsman"

(112) tok-u "to untie, solve, melt": no argument compounds

Footnote 48: The following compounds are non-Yamato:
kan-kiri       can-cut                "can-opener"
en-kiri        marriage-cut           "divorce"

Footnote 49: The only N-V compound I can find is the following, with a Sino-Japanese N1:
zee-komi       tax-crowd              "tax-included"

Footnote 50: The following compound has a Sino-Japanese N1:
men-kui        face-eat               "attracted by physical looks only"

Footnote 51: The following compound has a Sino-Japanese N1:
en-gumi        marriage-fold          betrothal

Footnote 52: The following compound has a Sino-Japanese N1:
yuu-dati       rain-pass              rainshower

Footnote 53: The following compounds have Sino-Japanese N1's:
syakkin-tori   debt-take              "bill collection"
kanzyoo-tori   account-take           "bill collector"

In the above compounds, rendaku is blocked consistently, with the exception of compounds with two verbs: tát-u "to stand" and kum-u "to fold". Both of these verbs always fail to block rendaku. These verbs are behaving in a similar way to certain nouns that we observed always voice in noun-noun compounds (e.g. huro "bath"). In the case of the nouns that never failed to voice, there was no reason to posit that these nouns had a lexical feature that forced voicing, since exceptionless voicing is the case for the majority of Yamato nouns.
Of the 101 bimoraic Yamato nouns that were examined for voicing in noun-noun compounds, 20 always failed to voice, suggesting that the noun itself had a lexical [-voice] feature that blocked voicing. Of the remaining 81 nouns, 15 were variable with respect to voicing and the other 66 always voiced. This suggests that nouns that sometimes resisted voicing were the exception, and that the compounds in which they failed to voice had a lexical [-voice] feature that blocked voicing. Nouns that always voiced were more common, and thus there was no need to posit a floating [+voice] feature for these nouns to explain their behaviour. In the case of the noun-verb compounds listed above, voicing is the exception rather than the rule. This suggests two things: (a) that blocking of voicing in this class of compound has a phonological or morphosyntactic cause rather than a lexical one, since blocking occurs productively for this class of compound; (b) that verbs like tat-u and kum-u have an underlying feature that forces rendaku to occur even in the face of phonological conditions that would otherwise cause blocking. Recall that we posited a floating [+voice] feature in noun-noun compounds with "rendaku haters" in instances where the compound did voice. A floating [+voice] feature in the listing of the verbs tat-u and kum-u could, in a similar way, force rendaku voicing through the action of the constraint Max[+voice]. In chapter 3, we posited a similar account for the noun-noun compound no-gusa "field-grass", which voices in tableau (47) in spite of the fact that kusa "grass" usually resists rendaku. Recall that in our derivation of the simplex noun kusa from an input with a floating [+voice] feature, the [+voice] feature surfaces on the final vowel of kusa.
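The proportions behind this argument can be made explicit. The following is a minimal sketch that only restates the three noun counts given above as percentages of the 101 nouns examined; the category labels are informal names chosen here for illustration, not terms from the analysis.

```python
# Counts of bimoraic Yamato nouns by voicing behaviour in noun-noun
# compounds, as reported in the text (101 nouns in total).
# The dictionary labels are informal names chosen for this sketch.
counts = {"never voice": 20, "variable": 15, "always voice": 66}

total = sum(counts.values())
shares = {label: 100 * n / total for label, n in counts.items()}

for label, pct in shares.items():
    print(f"{label}: {counts[label]}/{total} = {pct:.1f}%")
```

On these figures roughly two thirds of the nouns always voice, which is why voicing is treated as the default for noun-noun compounds and resistance to voicing as the marked, lexically specified case.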
Apart from exceptions like compounds with tat-u or kum-u, which could be accounted for in this way by lexical pre-specification, we have good evidence that noun-verb argument compounds in which the second member has a CVC stem and is underlyingly accented have some set of phonological and/or morphological properties that causes them to block rendaku. In our examination of noun-noun compounds, we found that there was essentially one phonological factor that set the stage for blocking: the prosodic size of the compound. In the case of noun-verb compounds, we can observe not one but up to three possible phonological and/or morphological factors that can affect whether rendaku blocking can occur: (a) the prosodic length of the verb stem; (b) the morphosyntactic relation between the noun and verb; and (c) the underlying accentuation of the verb. The fact that such a high proportion of argument compounds with CVC stem verbs resist voicing suggests that it is more than just the presence of some permitted lexical feature that blocks voicing in these cases. If compounds like isi-kiri "stone-cutter" were blocking voicing because of the presence of a [-voice] feature in the lexical entry of the compound, we would need to explain why such a high proportion of compounds of this type have such a prespecified lexical feature, since 37 out of 44 cases block voicing. If it is the presence of some added lexical feature that blocks voicing here, we should expect that the addition of such features is the exception rather than the rule, as was the case for noun-noun compounds. It is more plausible that most of the compounds in (89) through (112) resist voicing because of some phonological factor or factors. There is a clear morphosyntactic difference between the compounds in §5.1.1 and §5.1.2. When there is an adjunct relation, voicing occurs. When there is an argument relation, voicing is usually blocked.
This suggests that these argument compounds have morphological and phonological properties that cause them to block voicing. There is, of course, nothing to prevent the possibility that some compounds of this type also have lexical features included with the listing of the compound that will affect the voicing outcome, as was proposed for noun-noun compounds. The possibility that the voicing outcome of noun-verb compounds can be affected by both lexical and phonological/morphological factors makes the picture more complex for noun-verb compounds than it was for noun-noun compounds. It makes it more difficult to distinguish the contribution of lexical prespecification from the contribution of phonological blocking factors in determining why blocking occurs. In order to deal with this problem when we analyse the effects of phonological blocking factors on noun-verb compounds, I propose that rather than trying to deal with the foregoing data directly, we can better understand the nature of rendaku blocking in noun-verb compounds by examining how blocking occurs in fictitious compounds. Accordingly, in the next chapter I report on a psycholinguistic experiment that was done with native speakers pronouncing fictitious compounds. This experiment is intended to test the hypothesis that phonological rendaku-blocking factors are active in the grammar of modern speakers.

6. An experiment to test the frequency of rendaku voicing in fictitious noun-verb compounds

In this chapter I describe and discuss a psycholinguistic experiment done with native speakers pronouncing fictitious noun-verb compounds. Whereas patterns of rendaku voicing in noun-noun compounds, which were examined in chapters 2-4, are arguably affected mainly by lexical factors (factoring out cases of blocking by Lyman's Law), we find that real noun-verb compounds behave differently, showing apparent phonological and morphological effects on voicing that do not appear in noun-noun compounds.
In addition to evidence that phonological and morphological factors directly cause blocking of rendaku in noun-verb compounds, there is also evidence that lexical prespecification can affect the voicing of individual compounds, as we observed for noun-noun compounds. The purpose of the experiment is to attempt to factor out effects of possible lexical prespecification on voicing in noun-verb compounds, and to examine just the phonological and morphological factors that might affect voicing. Specifically, a fictitious noun-verb compound is one that is composed of a real noun and a real verb stem, but whose occurrence together in a compound is not recognized by native speakers as a bona fide output form that they would pronounce or hear in normal discourse. Under the hypothesis that a fictitious compound does not have its own lexical listing in a speaker's lexicon in the way that we proposed for real compounds, it ought not to be possible for a fictitious compound to carry a floating voicing feature of the type that we proposed for real (noun-noun) compounds that idiosyncratically block rendaku voicing.

6.1 Subjects, experimental design, and data collection

6.1.1 Subjects

The subjects consisted of four adult native speakers of Japanese, all female, and ranging in age from roughly 25 to 45: "M", "Y", "K", and "S". None of the speakers was given any information about what was being tested in the experiment.

6.1.2 Experimental design

Before running the experiment with these four subjects, two trial runs were made with a different native speaker who was not a subject for the final runs of the experiment. This speaker, a female graduate student in her twenties, was somewhat aware of the nature of the experiment. These trial runs were done in order to make sure that the experiment ran in the way that was intended and to make any necessary adjustments to the procedure for presenting the tokens to the subject and to the structure of the data that is discussed in §6.2.
During each run of the experiment, each speaker was presented with a set of 200 fictitious Japanese compound words, which are discussed in §6.2. When presented to the subject, the tokens were randomly mixed.⁵⁴ The words appeared, one at a time, in large screen fonts, on a computer screen. Each compound word was composed of two real Japanese words: a noun followed by a verbal root; occurrence of the two morphemes together in a compound was independently judged to be ungrammatical by native speakers. The compound was also followed by the nominative case particle ga. This had two intended effects: (a) being case-marked, the compound would be unambiguously perceived as a deverbal noun: noun compounds are much more likely to receive rendaku voicing than compounds whose syntactic category is verbal; (b) the pitch accent pattern of the compound could be perceived even if it is accented on the final mora, since final accentuation will surface as a pitch fall from high on the final mora to low on the case particle. Other than control examples (discussed below), each compound is eligible for rendaku voicing because the second conjunct begins with one of the four Japanese voiceless obstruents /k/, /s/, /t/, /h/. Crucially, each compound was written in Sino-Japanese ideographic "kanji" characters so that the pronunciation of an initial obstruent in the second member would not be indicated orthographically.⁵⁵ ⁵⁶ For each compound, the speaker read into a tape recorder how he or she would pronounce the word. After each word was pronounced, the experimenter would hit the page-down key that controls the programme displaying the words so that the next word was presented. The sound of this key being hit is clearly audible on the tape: thus, it is possible to determine exactly how long it took the subject to read and pronounce each token.

⁵⁴ Random mixing was achieved as follows. When each token is typed into the experimenter's list, it is given a two-digit number that is obtained by reading the hundredths-of-a-second part of the readout of the computer's time clock at that moment. The tokens are then arranged in order by the two-digit number that was assigned to each.

⁵⁵ In some cases, the noun is displayed in "hiragana" syllabic characters when its kanji characters are obscure. This will still not provide any clues about the pronunciation of the verbal part of the compound.

⁵⁶ Using Chinese characters or kanji to represent the two morphemes gives us the advantage of being able to represent the compound with orthography that does not indicate voicing. This is easy to do unambiguously for noun-verb compounds because compounds with verbal morphemes are more likely than those formed exclusively from nouns to have the Yamato reading that is required for our purposes. In addition, verbs with a consonant-final stem will surface with a final epenthetic /i/, which will also indicate that they must have a Yamato as opposed to Sino-Japanese pronunciation.
If we were to construct fictitious noun-noun compounds, it would be more difficult to ensure that the subject pronounced them with a Yamato as opposed to Sino-Japanese reading, since Sino-Japanese noun-noun compounds are very abundant. For example, a fictitious compound like usi-kusa/usi-gusa "cow-grass" written in Chinese characters could conceivably be pronounced gyuusoo (Sino-Japanese), where each of the morphemes for "cow" and for "grass" receives its Sino-Japanese pronunciation.
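The random-mixing procedure described in footnote 54 can be sketched as follows. This is a minimal illustration under stated assumptions, not the programme actually used: a pseudo-random draw from 0 to 99 stands in for reading the hundredths-of-a-second field of the clock, the token labels are hypothetical placeholders, and the treatment of tokens that receive the same two-digit number (here they keep their order of entry) is an assumption, since the text does not say how ties were handled.

```python
import random


def mix_tokens(tokens, rng=random):
    """Order tokens by a two-digit key assigned to each as it is entered."""
    # Stand-in for the hundredths-of-a-second reading of the clock at the
    # moment each token is typed: an integer key from 0 to 99.
    keyed = [(rng.randrange(100), token) for token in tokens]
    # Arrange the tokens in order of their keys. The sort is stable, so
    # tokens that share a key keep their order of entry (an assumption).
    keyed.sort(key=lambda pair: pair[0])
    return [token for _, token in keyed]


# Hypothetical placeholder labels for the 200 tokens of one run.
tokens = [f"token-{i:03d}" for i in range(1, 201)]
mixed = mix_tokens(tokens, random.Random(0))
```

With 200 tokens and only 100 possible keys, some tokens necessarily share a key, so the resulting order is mixed but not a uniformly random permutation.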
6.1.3 Data collection

The taped results of each run of the experiment were played back, replaying the tape several times over for each token, in order to correctly determine the following measurements for each token:

-whether or not the initial obstruent of the second member has been voiced,
-the pitch accent pattern of the compound,
-whether there was any hesitation in the pronunciation of the compound.⁵⁷

⁵⁷ Consultation with several native-speaker graduate students suggested that if the speaker has trouble reading the compound and reads the two conjuncts as separate words, there is a tendency not to effect rendaku voicing, since in such a case the compound may be regarded as a syntactic phrase.

6.2 Structure of the data

The tokens were subdivided into the three groups shown below. The purpose for the inclusion of each group is given in the list below.

A. Control tokens

1. real simplex words (10 tokens):
   These tokens consisted of real nouns that consisted of one morpheme. These were included in order to test whether the speaker is pronouncing words in standard Japanese with predicted pitch accent patterns.

2. real compound words that are subject to rendaku voicing (10 tokens):
   These tokens were included to test whether the speaker applied rendaku voicing when expected to real compound words.

3. fictitious compound words that are not subject to rendaku voicing (20 tokens):
   a. second conjunct does not begin with voiceless obstruent
   b. rendaku is blocked by the presence of another voiced obstruent in the second conjunct (Lyman's Law)

   The (a) subgroup of tokens was included so that it would not be transparently obvious to the subject that the experiment involved voicing of an initial obstruent of a compound. The (b) subgroup was to test whether the subject would correctly apply Lyman's Law in blocking voicing.

B. Tokens being tested

These tokens were all fictitious noun-verb compounds constructed as follows.
Real inflectable Yamato verbs were chosen so that there were representatives from each of three groups: (a) monomoraic verb stem (e.g. kir-u "cut"), (b) bimoraic verb stem (e.g. hirak-u "open") and (c) trimoraic verb stem (e.g. hatarak-u "work"). Compounds formed from verbs of each of these three prosodic classes were put into the classes described below. Verbs were chosen that participate robustly in the formation of real noun-verb compounds, with the exception of a few trimoraic verbs. Because trimoraic verbs are less common than verbs with stems of shorter prosodic length, and are less likely to occur in noun-verb compounds, it was impossible to choose a varied set of fictitious compounds with trimoraic verbs that occur in a substantial number of real compounds. Based on the data on real compounds in §5.1, a hypothesis was made about which kinds of compounds were the most or least likely to voice. Argument compounds with CVC, accented verb stems block robustly in real compounds; therefore compounds of this type were included as potential "worst voicers." Because voicing is more likely for compounds with CVCV verb stems, and because an adjunct relation between noun and verb and lack of underlying accentuation in the verb both tend to support voicing in real compounds, a potential "best voicers" group was formed from this type of compound. Other groups of tokens were constructed from various combinations of choices from among the following three variables: (a) prosodic length of verb stem, (b) argument-adjunct distinction, (c) accentuation of verb. It was not possible to construct a group for every possible combination of choices of the above three variables, since this would have made the corpus of tokens too large. Based on the hypothesis that the "worst voicers" class has three blocking factors, the listing for each group below includes the potential blocking factors present for tokens of that group.
The following is a list of classes of tokens that were constructed. There were twenty tokens of each of the following types.

(113) CLASSES OF TOKENS (20 of each)⁵⁸

1. "WORST VOICERS": fictitious N-V argument compounds with monomoraic, accented verb root (preliminary runs of the experiment indicate that this group is most likely to block rendaku)
   3 blocking factors: 1. not 2μ  2. accented  3. argument relation

2. fictitious N-V adjunct compounds with monomoraic, accented verb root
   2 blocking factors: 1. not 2μ  2. accented

3. fictitious N-V argument compounds with monomoraic, unaccented verb root
   2 blocking factors: 1. not 2μ  2. argument

4. fictitious N-V adjunct compounds with monomoraic, unaccented verb root
   1 blocking factor: 1. not 2μ

5. fictitious N-V argument compounds with bimoraic, unaccented verb root
   1 blocking factor: 1. argument

6. "BEST VOICERS": fictitious N-V adjunct compounds with bimoraic, unaccented verb root (preliminary runs of the experiment indicate that this group is least likely to block rendaku)
   no blocking factors

7. fictitious N-V adjunct compounds with trimoraic, unaccented verb root
   1 blocking factor: 1. not 2μ

⁵⁸ Included in the tokens were two compounds constructed with each of the two verbs tat-u "stand" and kum-u "fold", which, as we saw in the data on real noun-verb compounds, fail to block voicing in argument compounds where other verbs of the same class robustly block voicing. One argument compound and one adjunct compound was included for each of these two verbs.

Compounds with accented verb roots were included only where the verb stem is monomoraic. This is the only case where accentuation seems to affect voicing. (See Rosen (1999) for discussion.) Thus we did not test compounds with the blocking factor combinations {accented} or {accented, argument}.

C. First 10 and Last 10 tokens
There was an additional group of tokens (taken from all the categories above) that made up the first ten and last ten tokens of the experiment. These tokens, which are less likely to be statistically meaningful, were not considered when analysing the data.

6.3 Results

One of the four subjects had difficulty pronouncing a sizeable number of tokens. This subject also substituted a different word for one of the morphemes for 15 tokens of the data and repeatedly expressed a lack of confidence in her ability to correctly read Chinese characters while doing the experiment. As a result, the results from this speaker were discarded. The percentage rate of voicing for each subject is shown below for each group of tokens.

(114) Summary of results of experiment

Numbers at the top of each column correspond to the token types described in (113).

Subject            1      2      3      4      5      6      7      Mean of    Mean of    Mean of
                   1μ     1μ     1μ     1μ     2μ     2μ     3μ     all types  arg types  adj types
                   acc    acc    unacc  unacc  unacc  unacc  unacc
                   arg    adj    arg    adj    arg    adj    adj
M                  50     80     65     68     67     ⁵⁹     55     64         61         74
Y                  33     73     27     60     60     65     37     50         40         66
K                  23     90     50     80     65     89     41     63         46         86
Mean of subjects   35     81     47     69     64     77     44     56         48         76

Averages of the subject means for paired groups: 58 (average of groups 1,2); 58 (average of groups 3,4); 70 (average of groups 5,6).

⁵⁹ For some unexplained reason, when the experiment was run with this subject, only one-fourth of the tokens in this group of data were present on the audiotape.

6.4 Discussion of results

All three subjects whose results were considered applied rendaku voicing at an overall rate of 50% or greater. This rate ranged from a low of 50% to a high of 64% among the three subjects considered. The rate of voicing averaged among the three subjects varied according to the category of tokens.
This rate ranged from a low of 35% for tokens with a monomoraic, accented verb and an argument relation (group 1, the predicted "worst voicers") to a high of 81% for tokens with a monomoraic, accented verb and an adjunct relation (group 2). The predicted "best voicers" (group 6) averaged 77%. This variation among groups of tokens strongly suggests that the rate of voicing is influenced by at least some of the factors that varied across groups of tokens: (a) the prosodic length of the verb, (b) the underlying accentuation of the verb, and (c) the morphosyntactic relation between the verb and its complement.⁶⁰

The following are some observations we can make from the data in (114). From an examination of real noun-verb compounds in §5, group 1 was predicted to have the lowest rate of voicing among groups of tokens because tokens in this group have three factors which are seen to block voicing in attested compounds: having monomoraic verbs, underlying accentuation on the verb, and a head-argument relation. This group did have the lowest average rate of voicing in the experiment, and also had the lowest rate for the first and third speaker and the second lowest for the second speaker.

Group 2, which had the highest average rate of voicing, also had the highest rate for all three subjects. From an examination of real noun-verb compounds, this group had not been predicted to have the highest rate of voicing, since it lacks only one of the three observed blocking factors. Tokens in this group had monomoraic, underlyingly accented verbs and lacked only the head-argument relation. When other variables were held constant, compounds with an adjunct relation voiced at a higher rate than compounds with an argument relation. This "adjunct effect" was most prominent for monomoraic, accented verbs (35% for arguments vs. 81% for adjuncts). For monomoraic unaccented verbs and bimoraic unaccented verbs the difference in frequency of voicing between adjunct and argument compounds was not great enough to be conclusive.
(The experiment did not test for the effects of arguments vs. adjuncts for trimoraic verbs.)

⁶⁰ As discussed in footnote 58, two verbs that fail to block voicing in real N-V compounds were included in the tokens. These two verbs both blocked voicing in an argument compound for all three subjects in the experiment and voiced for two out of three subjects in an adjunct compound. In other words, any lexical marking of these verbs seemed to have no effect in the experiment.

The underlying accentuation of the verb influenced the rate of voicing for monomoraic verbs. (This variable was not tested for longer verbs.) For argument compounds, voicing occurred more frequently when the verb was underlyingly unaccented (47%) than when it was accented (35%). However, the reverse effect occurred for adjunct compounds of monomoraic verbs (69% unaccented vs. 81% accented). These results are inconclusive with respect to the effect of the underlying accentuation of the verb, since this effect could be seen to occur in both directions. The rate of voicing for tokens with a trimoraic verb was relatively low, even though these were all adjunct compounds. The "adjunct effect" of increasing the rate of voicing was most pronounced for monomoraic verbs and decreased as the verbs got longer. The rate of voicing was higher for bimoraic unaccented verbs (70%) than for monomoraic unaccented verbs (58%) or trimoraic unaccented verbs (44%). Thus, when other variables were held constant, bimoraic verbs were more likely to voice than monomoraic verbs or trimoraic verbs. When we average the rate of voicing for adjunct tokens with that for argument tokens within each of the accented and unaccented categories of monomoraic verbs, we find that accentuation makes no difference: both combined groups had an average rate of voicing of 58%. For one- and two-mora verbs we can see that a compound with an adjunct relation was more likely to voice than one with an argument relation.
If so, then we would expect unaccented monomoraic verbs, which voice at a higher rate than accented monomoraic verbs in argument compounds, also to voice at a higher rate than accented monomoraic verbs in adjunct compounds. It is a puzzle, then, why monomoraic verbs with an adjunct relation had a lower rate of voicing when they are unaccented than when they are accented. In order to make a more conclusive analysis of how the various variables affect the rate of rendaku voicing, it will be necessary in future research on this phenomenon to (a) test more subjects in order to get more reliable statistics and (b) test other types of tokens: for example trimoraic verbs with an argument relation. However, the results so far do enable us to draw some tentative conclusions about some of the phonological and morphological factors that affect rendaku blocking in noun-verb compounds. In the next chapter I will propose a phonological analysis of two of the blocking factors that could be observed in the above experimental results: the prosodic length of the verb stem and the argument/adjunct effect.

7. Analysis of phonological and morphological factors that induce blocking of rendaku in noun-verb compounds

The results of the experiment discussed in the previous chapter suggest that three factors can affect the rate of rendaku voicing in noun-verb compounds: (a) the prosodic length of the verb stem, (b) the morphosyntactic relation between the noun and verb and (c) the underlying accentuation of the verb stem. Because the precise nature of the effect of the underlying accentuation of the verb on rendaku voicing was inconclusive in the experimental results, I will focus here on the first two factors: prosodic length and argument vs. adjunct relation.
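To keep the three factors in view, the group means reported in (114) can be recomputed and set beside the number of blocking factors listed for each group in (113). The sketch below is only a bookkeeping aid: the rates are transcribed from (114), the blocking-factor counts from (113), and the empty cell for subject M is assumed to be group 6, the cell that footnote 59 appears to describe. The variable and function names are chosen here for illustration.

```python
# Voicing rates (%) per subject for token groups 1-7, transcribed from (114).
# None marks the cell with missing data (assumed: subject M, group 6).
RATES = {
    "M": [50, 80, 65, 68, 67, None, 55],
    "Y": [33, 73, 27, 60, 60, 65, 37],
    "K": [23, 90, 50, 80, 65, 89, 41],
}

# Number of hypothesised blocking factors for groups 1-7, from (113).
BLOCKING_FACTORS = [3, 2, 2, 1, 1, 0, 1]


def group_means(rates):
    """Mean voicing rate per group, skipping any missing cells."""
    means = []
    for g in range(7):
        values = [row[g] for row in rates.values() if row[g] is not None]
        means.append(sum(values) / len(values))
    return means


means = group_means(RATES)
for g, (mean, factors) in enumerate(zip(means, BLOCKING_FACTORS), start=1):
    print(f"group {g}: {factors} blocking factor(s), mean voicing {mean:.0f}%")
```

Laid out this way, group 1 (three factors) has the lowest mean and group 6 (no factors) a high one, while group 2 stands out: it has two hypothesised blocking factors yet the highest mean, which is the puzzle about accentuation noted above.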
7.1 Factoring out lexical prespecification

In the experiment discussed in the previous chapter, speakers were given compound words that were composed of two real words whose occurrence together in a compound was unlikely to be heard frequently, if at all, in normal discourse. Our hypothesis is that the output form of a fictitious compound cannot be affected by any lexical features that are included with the listing of the compound, as we posited was the case for noun-noun compounds in chapter 3. Under the premise that a fictitious compound cannot itself be listed in the lexicon, it must be derived outright from the two constituent morphemes. Thus it will not be possible for the input form of a fictitious compound to contain features that are not part of the listing of either of the constituent morphemes. In our examination of actually occurring noun-noun compounds we saw evidence of pre-specification of voicing or non-voicing for individual compounds. Prespecification should not be possible for fictitious compounds, since there will be no lexical listing for these compounds in the speaker's language faculty: only listings for the individual morphemes. It is still, of course, possible that there is some prespecification of voicing or blocking for individual morphemes. In noun-noun compounds, for example, we saw that in compounds whose second member is tuti "earth" or kase "shackles", voicing never occurs. We characterized these morphemes as being pre-linked to a [-voice] feature. We should expect that this type of blocking of rendaku voicing could occur for fictitious compounds. In chapter 3 we observed that the prosodic length of the constituents of noun-noun compounds would determine whether blocking by lexical prespecification is possible, but there was no evidence that blocking itself was directly caused by any phonological factor for noun-noun compounds. By contrast, noun-verb compounds showed evidence of a direct effect of prosodic length on voicing.
For real noun-verb compounds (§5.1.2), we saw that there was a high rate of blocking of voicing when the verb stem is monomoraic; for fictitious noun-verb compounds we saw a similar effect (see (114)). The rate of blocking was so high in both cases that an explanation of all such cases by lexical prespecification becomes implausible. If our experimental results are correct in showing that rendaku can be blocked by phonological factors, the finding is a significant one. Apart from Lyman's Law, no other phonological factors that block rendaku voicing have been widely considered in the literature. In addition, whereas Lyman's Law blocks rendaku in a categorical fashion, the types of blocking that we shall see in this chapter occur variably: that is, the presence of a blocking factor will make rendaku more likely to be blocked but will not guarantee that blocking will occur, as it would under Lyman's Law.

7.2 Blocking vs. triggering

Let us first address the question of exactly how the factors given above affect rendaku voicing in fictitious noun-verb compounds. Do factors that are present when rendaku voicing is more frequent actually trigger rendaku, or is it that the factors that are present when rendaku voicing is less frequent are blocking rendaku? In our earlier examination of rendaku voicing in noun-noun compounds, we saw that compounds both of whose members did not exceed two moras were subject to unpredictable blocking of rendaku voicing, except in clear-cut cases where the non-voicing noun was one that always resisted voicing. We also saw that the rate of voicing was fairly high for all prosodic classes of noun-noun compounds. Those facts suggested that rendaku is an active phonological process that can be blocked in certain cases. The alternative is a triggering explanation: that rendaku is triggered when certain conditions are met in the compound and that lack of voicing occurs when these triggering conditions are not met.
This option was not considered for noun-noun compounds because the rate of voicing was fairly high for all prosodic classes of compounds. Blocking was the exception rather than the rule for all subclasses of noun-noun compounds that were identified.

As far as noun-verb compounds are concerned, we find that the rate of voicing is substantially lower for certain classes of noun-verb compounds. This raises the possibility of an account in which the factors that influence the likelihood of rendaku voicing are triggering rather than blocking factors. For example, having an adjunct as opposed to an argument relation in the compound increases the rate of voicing. We could either see the adjunct relation as triggering rendaku or the argument relation as blocking rendaku. Compounds with a bimoraic verb stem have a greater frequency of rendaku voicing than compounds with longer or shorter verb stems. We could either view a bimoraic size as a trigger of rendaku or a failure to meet the bimoraic condition as a blocking factor for rendaku.

Let us consider for a moment this "triggering" hypothesis. Recall the following experimental results. The mean voicing frequency for monomoraic and bimoraic adjunct compounds was 76%. For monomoraic and bimoraic argument compounds it was 48%. If we try to say that rendaku is triggered by some particular morphosyntactic relation between conjuncts of a compound word, then it must be triggered by both an argument relation and by an adjunct relation, since rendaku occurs at a sizeable rate for both types of compounds. To explain why it occurs less for argument compounds under a triggering hypothesis, we would have to have two processes triggering rendaku, with one more vigorous than the other. Under a "blocking" account of rendaku, on the other hand, our analysis can be much more straightforward.
In this case, rendaku is triggered by some factor that is common to all compounds, as we proposed in chapter 3, but it can be blocked by any of the blocking properties that we have observed: (a) an underlyingly accented verb stem, (b) a non-bimoraic verb stem, or (c) an argument relation between the noun and verb. Because of its greater naturalness and simplicity, I shall adopt the blocking hypothesis for explaining the variable frequency of rendaku voicing among different classes of fictitious noun-verb compounds. In the next subsection I propose how a blocking analysis could be developed.

7.3 Blocking through local conjunction of a blocking constraint with Dep[+voice]

Of the three conditions that block voicing in noun-verb compounds, I shall analyse the following two, which are observable in the above data:

(a) the "prosodic size" effect: rendaku is blocked when the verb stem is not bimoraic.
(b) the "argument-adjunct" effect: rendaku is blocked when the morphosyntactic relation between the noun and verb is an argument relation as opposed to an adjunct relation.

For fictitious compounds, we cannot explain blocking of rendaku through a posited [-voice] feature in the input unless that feature is associated with one of the morphemes in its simplex lexical entry. In such a case, we should expect rendaku blocking in all cases in which that morpheme appears in a compound. This suggests that blocking in these compounds is morphophonologically rather than lexically caused. In an OT framework, we must look for some phonological constraint or constraints that could cause blocking under the conditions we have observed. (See footnote 61.)

First, recall our hypothesis, developed in chapter 3, that rendaku voicing is derived by satisfaction of the following constraint:

Dep[-voice] & Align-Morpheme-Left-Word-Left

In our account, rendaku occurs because this constraint dominates the markedness constraint *Obs-Voi, which bans voiced obstruents.
In addition to violating the markedness constraint for voicing, *Obs-Voi, rendaku outputs also violate a faithfulness constraint, since the voicing of the initial obstruent does not occur in the input. Recall that the following faithfulness constraint (43) will be violated by outputs that undergo rendaku voicing.

(115), repeated from (43):
Dep[+voice]: "Any [+voice] feature that occurs in the output must occur in the input."

This constraint was ranked lower than our proposed pro-rendaku constraint Dep[-voice] & Align-Morpheme-Left-Word-Left in order to allow rendaku voicing to occur. But if the faithfulness constraint Dep[+voice] is conjoined with some other constraint C, and if this conjoined constraint outranks the pro-rendaku constraint Dep[-voice] & Align-Morpheme-Left-Word-Left, rendaku will be blocked in outputs that violate constraint C. That is, local conjunction of C with the faithfulness constraint Dep[+voice] allows C to express some condition that can block rendaku voicing. In the next subsection I will propose how some phonological property possessed by certain compounds can block rendaku voicing, through local conjunction of a relevant phonological constraint with Dep[+voice].

Footnote 61: Recall that in the data on real noun-verb compounds, we found no evidence for any verb that has a lexical feature that blocks rendaku. We found two verbs, tat-u "stand" and kum-u "fold", that resist blocking in argument compounds where we would expect blocking to occur, but we found no verbs that blocked voicing in adjunct compounds where voicing normally occurs.

7.3.1 Blocking in non-bimoraic verb stems

In this subsection I will propose an analysis of how the prosodic length of a verb stem can be a factor in triggering blocking of rendaku in noun-verb compounds. In the results of the experiment in the preceding chapter, we found that a non-bimoraic length of verb stem lowers the frequency of voicing when other factors are held constant.
In this sense, a bimoraic morpheme is the optimal morpheme. (See the discussion on page 70.) Let us begin by considering what kinds of constraints might be violated by a non-bimoraic verb stem. Given the canonical nature of the bimoraic Foot in Japanese, the necessity that a verb must be mapped to a bimoraic Foot can be expressed by the following two highly-ranked alignment constraints:

(116) Align-[+V]-Right-Foot-Right: "The right edge of a morphosyntactic [+V] category must be aligned with the right edge of a Foot."
(117) Align-[+V]-Left-Foot-Left: "The left edge of a morphosyntactic [+V] category must be aligned with the left edge of a Foot."

These two constraints are evaluated on the moraic tier. To ensure that Feet surface wherever possible as bimoraic, I propose the following highly-ranked constraint:

(118) Ft=Bin: "Every Foot in the output is composed of two moras."

If a mora cannot be incorporated into a bimoraic Foot, then it is directly dominated by the Prosodic Word.

      Pwd
     /   \
    Ft    μ
   /  \
  μ    μ

The constraints (116) and (117) cannot both be satisfied when the verb stem is non-bimoraic. For verbs with a monomoraic, consonant-final stem and a final epenthesized vowel (see footnote 62), the constraint Align-[+V]-Left-Foot-Left will be satisfied if the whole unit comprises a bimoraic Foot. The constraint Align-[+V]-Right-Foot-Right will necessarily be violated. (Bimoraic Feet are shown with round brackets; morphemes with square brackets. A mora outside the round brackets is attached directly to the Prosodic Word.)

(119) [kir]v-i: (ki.ri)

Footnote 62: Poser (1984) argues convincingly that all classes of words derived from verbs that occur with a final /i/ to the right of the stem involve zero affixation, with the /i/ being epenthetic. This includes deverbal nouns. In addition to the deverbal noun form, these forms include the infinitive form (which Poser refers to as a participle) (1)(a), the "conjunctive" form that occurs in compounds (1)(b), and the form that occurs with such affixes as nagara "while", or the "tough construction" affixes niku-i "hard to" or yasu-i "easy to" (1)(c).

(1) (a) hiraki "to open"
    (b) hiraki-kom-u "to open completely"
    (c) hiraki-yasu-i "easy to open"

Poser gives the following arguments for his epenthesis hypothesis:

1. All of these forms derived from verbs end in /i/ if their stem ends in a consonant. If the stem ends in a vowel, no /i/ occurs after the vowel. Poser argues against the idea that the /i/ is an affix that is deleted after vowels on the basis that there is no other pattern of vowel deletion evident in the language. For example, /i/ can freely occur after /i/ and /e/. If the /i/ is an affix for each of the unrelated morphological forms, it would seem coincidental that in each case it occurs in exactly the same circumstances: when the stem ends in a consonant.

2. If the conjunctive form of the verb that occurs in V-V compounds has an epenthetic /i/, then a more natural explanation of what Poser calls "reduced compounds" is available. In reduced compounds, no /i/ appears after the stem of the first verb; instead, the final consonant of the verb stem is licensed by its forming a geminate with the initial consonant of the second verb. An example is but "hit" + kom "be full" --> buk-kom-u. (See Poser (1984:65) for more examples.) The final consonant of V1 assimilates its place feature to that of the initial consonant of V2 so that a geminate can be formed. (The only CC sequences that are permitted in the language are full geminates and nC sequences, where the nasal assumes the place feature of the following consonant.) Reduced compounds are a natural and expected phenomenon in the language if V-V compounds are formed from two bare verb stems, with the /i/ following V1 being epenthetic. Reduced compounds simply employ a different strategy than epenthesis for licensing the final consonant of the stem of the first verb. If the final /i/ of V1 were an affix, we would need some explanation of why this affix deletes in reduced compounds. Poser also argues that an epenthesis hypothesis explains why verbs with vowel-final stems never enter into reduced compounds as V1. If concatenation in V-V compounds involves bare stems, then a vowel-final stem as V1 will have no reason to lose its final vowel to form a geminate sequence with the initial consonant of V2. On the other hand, if reduced compounds were formed by deletion of the final vowel of V1, then we should expect this process to also occur with vowel-final verb stems.

If the final /i/ of the conjunctive form of the verb in compounds is due to epenthesis, it would seem natural that epenthesis should also occur for other verb forms ending in /i/, when lack of epenthesis would create an illicit coda consonant. Based on Poser's (1984) arguments, I shall adopt his hypothesis that deverbal nouns occur with a segmentally null affix.

Vowel-final bimoraic verbs will satisfy Align-Right:

(120) [tome]v: (to.me)

Consonant-final bimoraic verbs with a final epenthesized vowel can satisfy Align-Right.

(121) [hirak]v-i
      satisfies:        (hi.ra)ki
      does not satisfy: hi(ra.ki)

Trimoraic vowel-final verbs cannot satisfy both alignment constraints:

(122) [tazune]v
      (ta.zu)ne    Satisfies Align-Left; violates Align-Right
      ta(zu.ne)    Satisfies Align-Right; violates Align-Left

Nor can consonant-final trimoraic verbs:

(123) [hatarak]v-i
      (ha.ta)(ra.ki)    Satisfies Align-Left; violates Align-Right
      ha(ta.ra)ki       Satisfies Align-Right; violates Align-Left

For trimoraic verbs, which of the two alignment constraints is satisfied will depend on which one is higher ranked.
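The alignment judgments in (119) to (123) can be checked mechanically. The following is a minimal sketch, not part of the original analysis: it assumes that the verb stem occupies the leftmost moras of the form, that alignment is evaluated on the moraic tier as stated above, and that every Foot is bimoraic (Ft=Bin). The function name and the encoding of a parse as a list of Foot-initial mora indices are hypothetical conveniences.

```python
FOOT_SIZE = 2  # Ft=Bin: every Foot consists of exactly two moras


def alignment_violations(stem_moras, foot_starts):
    """Return (align_left_violated, align_right_violated) for a verb stem.

    Moras are numbered from 0, and the stem is assumed to occupy moras
    0 .. stem_moras - 1.  foot_starts lists the index of the first mora
    of each bimoraic Foot in the parse.
    """
    foot_ends = {start + FOOT_SIZE - 1 for start in foot_starts}
    left_violated = 0 not in foot_starts
    right_violated = (stem_moras - 1) not in foot_ends
    return left_violated, right_violated


# Parses from (119)-(123): label -> (moras in the stem, Foot-initial moras)
parses = {
    "(ki.ri)": (1, [0]),
    "(to.me)": (2, [0]),
    "(hi.ra)ki": (2, [0]),
    "hi(ra.ki)": (2, [1]),
    "(ta.zu)ne": (3, [0]),
    "ta(zu.ne)": (3, [1]),
    "(ha.ta)(ra.ki)": (3, [0, 2]),
    "ha(ta.ra)ki": (3, [1]),
}

for label, (stem_moras, foot_starts) in parses.items():
    left, right = alignment_violations(stem_moras, foot_starts)
    print(f"{label:16} Align-Left violated: {left!s:5}  Align-Right violated: {right}")
```

Under these assumptions, only the bimoraic stems (to.me) and (hi.ra)ki come out with no violation of either constraint, which is the pattern the examples above describe.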
Suppose that Align-[+V]-Right-Foot-Right is higher ranked. This will mean that monomoraic verbs will violate the higher-ranked constraint Align-[+V]-Right-Foot-Right (124), and trimoraic verbs will violate the lower-ranked constraint Align-[+V]-Left-Foot-Left (125).

(124)
input: [kir]v        | Align-[+V]-Right-Foot-Right | Align-[+V]-Left-Foot-Left
☞ (ki.ri)            | *                           |
  [kir]v-i           | *                           | *!

(125)
input: [hatarak]v    | Align-[+V]-Right-Foot-Right | Align-[+V]-Left-Foot-Left
  (ha.ta)(ra.ki)     | *!                          |
☞ ha(ta.ra)ki        |                             | *

These two alignment constraints can act as two blocking constraints to derive the blocking of rendaku voicing that occurs frequently for noun-verb compounds with verbs that have fewer than two moras or verbs that have more than two moras. By conjoining each of these alignment constraints with the faithfulness constraint Dep[+voice] in the domain of the morpheme, we derive blocking for noun-verb compounds with non-bimoraic verbs. Here we employ a morphological rather than phonological domain that determines how constraint conjunction is calculated. The conjoined constraint is violated if both conjuncts are violated separately within the same morpheme.

Align-[+V]-Right-Foot-Right & Dep[+voice]: "The right edge of a morpheme with the syntactic feature [+V] is aligned with the right edge of a Foot and every [+voice] feature in the output has a corresponding [+voice] feature in the input."

Align-[+V]-Left-Foot-Left & Dep[+voice]: "The left edge of a morpheme with the syntactic feature [+V] is aligned with the left edge of a Foot and every [+voice] feature in the output has a corresponding [+voice] feature in the input."

Both conjoined constraints take the morpheme as their domain of conjunction.
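How a locally conjoined constraint interacts with strict domination can be illustrated with a small evaluator. This is a sketch under stated assumptions rather than the thesis's own formalization: the helper names are hypothetical, the pro-rendaku constraint Dep[-voice] & Align-Morpheme-Left-Word-Left is treated as a single unit, and each violation profile describes the verb morpheme only, so that the morpheme-sized domain of conjunction is built into the representation. The violation marks are those described for isi-kiri and kao-dati in this section.

```python
# Hypothetical labels for the constraints discussed in the text.
ALIGN_R = "Align-[+V]-Right-Foot-Right"
DEP_VOI = "Dep[+voice]"
RENDAKU = "Dep[-voice] & Align-Morpheme-Left-Word-Left"  # treated as one unit


def basic(name):
    """A simple constraint: return the candidate's violation count for it."""
    return lambda profile: profile.get(name, 0)


def conjoin(a, b):
    """Local conjunction: violated once iff both conjuncts are violated.

    Profiles describe a single verb morpheme, so "within the same
    morpheme" holds by construction.
    """
    return lambda profile: int(profile.get(a, 0) > 0 and profile.get(b, 0) > 0)


def optimal(candidates, ranking):
    """Strict domination: compare violation vectors lexicographically."""
    return min(candidates, key=lambda form: [c(candidates[form]) for c in ranking])


# Conjoined blocking constraint ranked above the pro-rendaku constraint.
ranking = [conjoin(ALIGN_R, DEP_VOI), basic(RENDAKU)]

# Monomoraic stem kir: both candidates violate Align-Right.
isi_kiri = {
    "isi-kiri": {ALIGN_R: 1, RENDAKU: 1},
    "isi-giri": {ALIGN_R: 1, DEP_VOI: 1},
}
# tat- carries a floating [+voice] in the input, so voicing satisfies Dep[+voice].
kao_dati = {
    "kao-tati": {ALIGN_R: 1, RENDAKU: 1},
    "kao-dati": {ALIGN_R: 1},
}

print(optimal(isi_kiri, ranking))  # isi-kiri
print(optimal(kao_dati, ranking))  # kao-dati
```

The design point is that neither conjunct alone is fatal: the Align-Right violation shared by both candidates does not decide anything until it co-occurs with a Dep[+voice] violation in the same morpheme.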
The idea behind these two conjoined constraints is that verbs that do not constitute a bimoraic Foot are defective in that they do not fulfil the requirement that a morpheme constitute one Foot. The grammar seeks to avoid "the worst of the worst" (Smolensky (1995)), which would occur if the same morpheme also violated faithfulness by voicing an obstruent that is not voiced in the input. Ranking of the conjoined constraint Align-[+V]-Right-Foot-Right & Dep[+voice] above the pro-rendaku constraint Dep[-voice] & Align-Morpheme-Left-Word-Left will block rendaku in compounds like isi-kiri "stonecutter."

(126)
input: x,y; x=isi "stone", y=kir "cut" | Align-[+V]-Right-Foot-Right & Dep[+voice] | Dep[-voice] & Align-Morpheme-Left-Word-Left
☞ isi-(ki.ri)                          |                                           | *
  isi-(gi.ri)                          | *!                                        |

Both candidates violate the sub-constraint Align-[+V]-Right-Foot-Right because the monomoraic verb stem is not aligned with the right edge of a bimoraic Foot. The second candidate also violates Dep[+voice] since the [+voice] feature of the /g/ did not occur in the input. If the /k/ on the verb stem kir is underspecified in the input, the first candidate will violate the lower-ranked second constraint because the voiceless /k/ that surfaces does not have its [-voice] feature in the input, and the morpheme kir is not left-aligned with the word.

Because our proposed blocking constraint includes the faithfulness constraint Dep[+voice] as a conjunct, we are now in a position to explain how lexical prespecification can derive compounds like kao-dati "face-stand", which voice under conditions where we would expect blocking. For noun-verb compounds with verbs such as tat-u "stand" that voice, the presence of a lexical floating [+voice] feature associated with the morpheme tat- will allow voicing, since its presence will satisfy the constraint Dep[+voice].
(127)
input: x,y; x=kao "face", y=tat [+voice] "stand" | Align-[+V]-Right-Foot-Right & Dep[+voice] | Dep[-voice] & Align-Morpheme-Left-Word-Left
  kao-(ta.ti)                                    |                                           | *!
☞ kao-(da.ti)                                    |                                           |

7.3.2 Blocking in compounds with an argument-head relation

So far, we have abstracted away from the argument-adjunct factor that also affects voicing in noun-verb compounds. It is clear from the empirical evidence on voicing in real noun-verb compounds that the morphosyntactic relation between the noun and the verb also affects rendaku voicing. We saw that argument compounds show a tendency to resist rendaku voicing. From a syntactic point of view it is somewhat surprising that argument compounds behave in a less regular way than adjunct compounds. If we were to try to derive both types of compounds syntactically, through syntactic adjunction of the non-head to the head (see, for example, Baker (1985)), there is no apparent syntactic reason why this adjunction should be more difficult for an argument than an adjunct in a syntactic model like Chomsky (1995), where arguments are typically merged with a head to form a complement.

(128)      V
         /   \
        N     V
    argument  head

If an argument had a more distant structural relation with a head, making head-adjunction more difficult, we might expect such a structure to be marked in some way and therefore to resist the process of rendaku voicing. The fact that argument compounds are more resistant to rendaku than adjunct compounds is therefore unexpected from a syntactic point of view, in that we should expect the phonological behaviour of argument compounds to be more regular. I will present here a possible explanation for the differing behaviour of argument and adjunct compounds, recognizing that the account that follows will be somewhat problematic.
Let us pursue the hypothesis that the crucial difference between argument N-V compounds and adjunct N-V compounds with respect to rendaku voicing is in their morphological structure. In an argument N-V compound, the morpho-syntactic head is a verb, in the sense that the noun has a morpho-syntactic relation with the verb, being its argument. For an adjunct compound, on the other hand, the noun's morpho-syntactic relation is arguably not with the verb, but with a derived deverbal noun. For example, compare the two real compounds isi-kiri (stone-cut) "stonecutter" and ura-giri (rear-cut) "treachery." In the former compound, the noun is an argument of the verb, and to the extent that morphological structure must mirror syntactic structure (Baker (1985)), we expect the argument noun to be a sister to the verb in morphology (see footnote 63).

(129)     V
         / \
        N   V

An adjunct relation in syntax is expressed structurally by a more distant relation between the head and the adjunct. For example, the adjunct may be a sister to some higher projection of the head.

Footnote 63: This account adopts the premise that in syntax, the internal argument of a verb is its sister in syntactic structure. See, for example, Chomsky (1981), (1993), (1995), Hale & Keyser (1991), (1993). Larson (1988), on the other hand, proposes a model in which adjuncts are generated lower down in the projection of V than arguments.

In the adjunct compound ura-giri, I propose that ura "rear" is, in morphological structure, not sister to the verb but to the deverbal noun derived from the verb kir-u "cut".

(130)     N
         / \
        V   N
       cut (null)

Since noun-verb compounds in Japanese all function syntactically like nouns, being case-marked, there is arguably a nominal projection that heads their structure.
The following, then, is the structure I propose for an adjunct compound like ura-giri (rear-cut) "treachery":

(131)     N2
         /  \
       N1    N2
       ura  /  \
           V    N2
          giri (null)

Having proposed a distinction between argument compounds and adjunct compounds in terms of their morphological structure, I now propose to capture the effects of this distinction in terms of choice of a base form for correspondence between two related outputs, in the type of Output-Output correspondence theory proposed by Benua (1997). In this theory, an output form is required to correspond to another morphologically related output form that is referred to as its base form. My proposed distinction between argument compounds and adjunct compounds is in the choice of base form for each. In both cases, the base form will be the morpho-syntactic head of the compound. For an argument compound, the base form is the canonical output form of the verb, i.e. its citation form. For an adjunct compound, the relation of N1 is with a deverbal noun projection. Thus the base form will be one of the following:

(a) the phonologically null morpheme that derives a deverbal noun from a verb
(b) the deverbal noun form of the verb, if one exists
(c) no base form, if no deverbal noun exists

Let us now examine how each of the two morpho-syntactic types of N-V compounds fares with respect to O-O Faithfulness. Argument compounds will violate Output-Output faithfulness because the base form, which must be a real word and not a bound morpheme, must be the citation form of the verb, including its inflective ending. The base form of the verb "cut" is kir-u, not the stem kir, which, because of the coda condition in Japanese, can never occur on its own as an output form. Consider, for example, the argument compound isi-kiri. The base form of this compound will be the inflected imperfective (or "non-past") verbal form kir-u.
In the compound isi-kiri, the morpheme that corresponds to the O-O base form kir-u does not perfectly satisfy O-O correspondence because the inflective affix is missing. Specifically, there is not a perfect correspondence in features and root nodes between [kir]v-i in isi-kiri and [kir]v-u(infl). The epenthesized /i/ in kiri has a root node that was not present in the input at all. The /u/ affix in kiru is an inflective affix. The two root nodes do not correspond morphologically or featurally.

i s i - k i r i
        | | | X
        k i r - u

On the other hand, an adjunct compound like ura-giri (rear-cut) has as its base form the deverbal noun kiri (see footnote 64). Here, there is featural correspondence between the epenthesized /i/ that occurs in both the deverbal noun and the noun-verb compound.

This distinction between argument and adjunct compounds in terms of O-O faithfulness can capture the difference in degree of rendaku voicing between the two types of compounds. My specific proposal is that argument compounds violate correspondence between the inflected base form and the verbal morpheme in the compound at the right edge. The right edge of an inflected verb is the suffix -u; the right edge of the verb stem in the compound is the verb stem with epenthesized /i/. If rendaku voicing occurs in the compound, the left edges of the verb stem and citation form will not correspond either, since there is a featural difference between the initial obstruents in terms of voicing.

The motivation behind this proposal is that argument-head compounds are defective morphophonologically in that their base form is a verb but this verb stem is not adequately phonologically represented in the compound, where the verb lacks its inflective affix and surfaces with an epenthesized vowel. The base form for an argument-head compound of this type arguably cannot be a noun, for the following reasons.
Simplex deverbal nouns in Japanese can never act as complex event nominals in the sense of Grimshaw (1990) and can thus never take arguments that are participants in a complex event. (See Ohta (1994) and references cited there.) This means that a compound like isi-kiri cannot have the following structure, since in this structure the noun isi "stone" is sister to the deverbal noun:

          N1
         /  \
       N2    N1
       isi  /  \
           V   null
          kiri

In the following structure, argument isi is a sister to a V projection:

          N1
         /  \
        V    N1
       / \   null
     N2   V
     isi  kir

Footnote 64: Depending on the verb, this deverbal noun may or may not be able to surface on its own as a free morpheme.

On the other hand, an adjunct compound like ura-giri can have the nominalization of the verb, kiri, as its base form, which forms a constituent of the compound. This compound, unlike an argument-head compound, exhibits greater faithfulness to the base form, which in this case is the nominalization kiri. The argument-head compound, which already violates faithfulness to its verbal base form, avoids further violation of faithfulness by avoiding rendaku voicing. The idea that the compound avoids O-O correspondence violations at both edges can be expressed through local conjunction of two related anchoring constraints:

(132) Anchor-Root-Left(base-output) & Anchor-Root-Right(base-output)
"If B is an O-O base form for output form O, and B', a constituent of O, corresponds to B, then if Root Node N1 occurs at the left edge of B there must be a corresponding Root Node N1' that occurs at the left edge of B', where N1 corresponds to N1'; AND if B is an O-O base form for output form O, and B', a constituent of O, corresponds to B, then if Root Node N2 occurs at the right edge of B there must be a corresponding Root Node N2' that occurs at the right edge of B', where N2 corresponds to N2'."

This constraint requires that anchoring occur for at least one of the edges of the output correspondent of O-O base form B.
Under this account, rendaku blocking in argument compounds can be seen as an avoidance of simultaneous O-O violations at both edges of the verb morpheme. This constraint will block rendaku voicing in argument compounds. If this constraint dominates the pro-rendaku constraint Dep[-voice] & Align-Morpheme-Left-Word-Left, blocking of rendaku voicing will occur for argument compounds but not adjunct compounds.

(133) argument-head compound
input: x,y; x=kusa "grass", y=tor "take"; BASE=tor-u | Anchor-L & Anchor-R | Dep[-voice] & Align-Morpheme-Left-Word-Left
☞ kusa-tori                                          |                     | *
  kusa-dori                                          | *!                  |

Candidate kusa-dori violates Anchor-Left because the /d/ in the output does not correspond to the /t/ in the base form. It also violates Anchor-R because the epenthesized /i/ is at the right edge in the output but not in the base form. Now consider an adjunct-head compound:

(134) adjunct-head compound
input: x,y; x=yoko "side", y=tor "take"; BASE=tori   | Anchor-L & Anchor-R | Dep[-voice] & Align-Morpheme-Left-Word-Left
  yoko-tori                                          |                     | *!
☞ yoko-dori                                          |                     |

Both candidates respect the conjoined constraint because there is no violation of Anchor-R. This is because the epenthesized /i/ at the right edge of each candidate corresponds with the epenthesized /i/ in the deverbal noun base form tori.

The constraint hierarchy that I have proposed above predicts that blocking of rendaku will occur 100% of the time in argument-head compounds, but, as we saw in the results of the experiment, blocking did not occur in an average of 35% of cases among speakers. In the next section, I discuss the variable nature of blocking in N-V compounds.

There are a couple of problems with the foregoing account. We have expressed an assumed more distant structural relation between an adjunct and a head by having the adjunct be a sister to a deverbal noun in morphological structure, rather than a sister to the verb stem.
First, there is no independent evidence that the relation of the adjunct morpheme in a compound like ura-giri (rear-cut) "treachery" is to a nominalized verb rather than to the verb stem itself. A second problem is the nature of the phonologically null deverbal affix. This affix does not occur productively for simplex verbs, in that only a restricted number of simplex verbs can form deverbal nouns through this affix. On the other hand, if adjunct N-V compounds are productively formed from composition of N with the nominalized V, then this affix is mysteriously occurring on verbs with which it cannot occur when the verb occurs alone. For example, there is no deverbal noun *kaki derived from kak-u "write" even though we find both argument and adjunct N-V compounds formed from this verb, as shown in (96) and (76). The fact that compounds form nominalizations much more easily than simplex verbs is arguably due to the fact that nominalization in Japanese prefers that the nominalized form either be bimorphemic or exceed two moras. (See Ohta (1994), Sugioka (1984), Kageyama (1982) for further discussion.)

7.4 The variable effect

In the results of the experiment we saw that blocking of rendaku is a variable rather than categorical phenomenon. Factors that predispose a compound to blocking will not prevent rendaku in all cases but rather will decrease the statistical frequency of rendaku voicing across a sample of data. Because this experiment used only fictitious compounds, it is not possible for these compounds to be prespecified in the lexicon in the way that we argued was possible for real compounds. This accounts for why it was possible to get utterance-to-utterance variation in the results of the experiment to a degree that does not occur for real compounds.

I shall now show how Boersma & Hayes' model (see §4.1.2) can be used to explain the utterance-to-utterance variable effects we see in rendaku blocking of fictitious compounds.
For simplicity, let us consider just one factor that influences the rate of rendaku voicing, for example, the morphosyntactic relation between the noun and the verb. Recall that for accented, monomoraic verbs, we had a rate of voicing of 35% when the compound had an argument relation and 81% when there was an adjunct relation. Even though voicing occurs much less frequently when there is an argument relation, it still occurs in roughly one-third of cases, a not insignificant number. If the ranking Dep[+voice] & O-O-Max-Root-Node » Rendaku were categorical, then rendaku would occur 0% of the time for argument compounds rather than 35%. But in Boersma & Hayes' model, constraint ranking is not categorical, but occurs on a continuous scale where, if two constraints A and B are ranked close together, it is possible for a speaker to select either of the two possible dominance relations. The statistical patterns we observed for fictitious compounds suggest that in Boersma & Hayes' model, the constraint that blocks rendaku in argument compounds with monomoraic verb stems is ranked above the constraint that forces rendaku, yet not too high above it. (It is actually likely that more than one constraint or factor reduces the voicing rate in these compounds to 35%. We saw that the prosodic length and the underlying accentuation of the verb will also be relevant factors; however, for the sake of simplicity, let us, for the moment, suppose that there is just one such constraint that is blocking rendaku: Dep[+voice] & O-O-Max-Root-Node.)

more dominance <------------------------------------------------> less dominance
               o                                     o
  Dep[+voice] & O-O-Max-Root-Node                 Rendaku

In the cases where there is an adjunct relation, where voicing occurs 81% of the time, there must still be factors that are responsible for the 19% of cases where no voicing occurs. Once again, for the sake of simplicity, let us for the moment ignore other factors that block voicing.
Let us say that some constraint or group of constraints "C" is responsible for this small number of blocking cases (see footnote 65). Constraint C must be ranked below Rendaku, but not too far below it.

more dominance <------------------------------------------------> less dominance
               o                              o             o
  Dep[+voice] & O-O-Max-Root-Node          Rendaku          C

Footnote 65: For example, a simple faithfulness constraint Max-Feature, or the constraint proposed above that requires a verb to be bimoraic.

Let us first consider adjunct compounds. The conjoined constraint Dep[+voice] & O-O-Max-Root-Node will not apply to these compounds, as I argued above, so we need just to consider the two other constraints. Because Rendaku is ranked above C, most pairs of selections will yield the following constraint ranking:

Rendaku » C

In these cases, rendaku will occur. But if C is not ranked too far below Rendaku, in a minority of cases rendaku will not occur. The 19% rate of no voicing for this class of compound can be derived by having C at just the right level below Rendaku.

Consider now the cases where there is an argument relation. Most of the time selection of the three constraints will give us the following hierarchy:

Dep[+voice] & O-O-Max-Root-Node » Rendaku » C

This hierarchy will result in blocking of rendaku, since the rendaku candidate will violate highly ranked Dep[+voice] & O-O-Max-Root-Node. We will also get blocking for selections that have the following ranking, which will occur less often, since Rendaku occurs on the gradient scale at a higher level than C:

Dep[+voice] & O-O-Max-Root-Node » C » Rendaku

But if Rendaku is not ranked too low, we will sometimes get constraint selections where Rendaku is the highest ranked of the three:

Rendaku » Dep[+voice] & O-O-Max-Root-Node » C
Rendaku » C » Dep[+voice] & O-O-Max-Root-Node

In this minority of cases, rendaku will occur.
For a very small number of selections, we will get a ranking where C is on top:

C » RENDAKU » DEP[+VOICE] & O-O-MAX-ROOT.NODE
C » DEP[+VOICE] & O-O-MAX-ROOT.NODE » RENDAKU

These cases will also block rendaku, but will be relatively few, since C occurs lowest on the scale. Depending on the exact level at which these constraints occur on the gradient scale, it will be possible to account for the kinds of percentages of voicing we found in the experimental data. Whether speaker-to-speaker variation occurs has not yet been fully tested.

It is important to note that our analysis of the experiment results predicts that variation will occur for a given compound for a given speaker: that is, unlike the case of real compounds where rendaku does not vary for a given compound, here, the constraint ranking system of Boersma and Hayes (1999) predicts that a given speaker will sometimes voice a particular compound and other times not voice it. In §5.2.6 I will discuss how lexicalization may also affect the pronunciation of fictitious compounds to cause word-to-word variation rather than utterance-to-utterance variation. But first, in §5.2.6, we shall consider how more than one blocking factor may act together to strengthen blocking effects.

7.5 Ganging up of blocking factors

So far we have dealt with the two blocking factors "argument-head" and "alignment to Foot" in isolation. But we actually find that these two factors work together: that is, if a noun-verb compound has both of the above properties, it is more likely to block rendaku than if it has only one of them. For example, in the experiment with fictitious compounds, we found that for compounds in which (a) the verb stem is monomoraic and (b) there is an argument-head relation between the verb and noun, voicing occurs at a rate of only 35% for accented verbs and 47% for unaccented verbs.
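As a rough illustration of how selection on a continuous ranking scale can make two blocking factors reinforce one another, the following Python sketch draws a noisy selection point for each constraint on every simulated utterance and counts how often rendaku is blocked. The ranking values, the noise level, and the labels BLOCK_ARGUMENT and BLOCK_FOOT are hypothetical stand-ins chosen for illustration; they are not fitted to the experimental data, and treating the noise as Gaussian is an assumption about how the model is implemented.

```python
import random

# Hypothetical ranking values on the continuous scale (illustration only,
# not fitted to the experimental percentages reported in the text).
RANKING = {"RENDAKU": 100.0, "BLOCK_ARGUMENT": 99.0, "BLOCK_FOOT": 98.5}
NOISE_SD = 2.0  # assumed spread of the Gaussian noise added at each selection


def blocking_rate(active_blockers, trials=20000, seed=0):
    """Estimate how often rendaku is blocked when the given blockers apply."""
    rng = random.Random(seed)
    blocked = 0
    for _ in range(trials):
        # One selection point per constraint for this simulated utterance.
        point = {name: value + rng.gauss(0.0, NOISE_SD)
                 for name, value in RANKING.items()}
        # Rendaku is blocked if any applicable blocker is selected above RENDAKU.
        if any(point[name] > point["RENDAKU"] for name in active_blockers):
            blocked += 1
    return blocked / trials


for blockers in (["BLOCK_ARGUMENT"], ["BLOCK_FOOT"],
                 ["BLOCK_ARGUMENT", "BLOCK_FOOT"]):
    print(blockers, round(blocking_rate(blockers), 3))
```

Under these assumed values each blocker alone blocks rendaku in a minority of simulated utterances, and the rate is higher when both apply, because a selection in which either blocker lands above RENDAKU is enough to block voicing.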
Under Boersma & Hayes' model of a continuous scale of constraint ranking, the fact that we get certain percentages of tokens blocking rendaku under various conditions is due to the fact that when two constraints $C_1$ and $C_2$ are ranked close together, a speaker may choose either $C_1$ or $C_2$ as the more dominant of the two constraints. If $C_1$ is ranked slightly above $C_2$, a speaker will be more likely to select a constraint hierarchy in which $C_1$ is dominant, but they will still sometimes rank $C_2$ higher. For example, the results of our experiment had speakers voicing noun-verb compounds with monomoraic verb stems 58% of the time on the average. Abstracting away from other blocking effects, we would predict that such a result would occur if the constraint that blocks rendaku in monomoraic verb stems is ranked just slightly below the constraint that derives rendaku. Similarly, abstracting away from the effects of prosodic length of the verb stem, we find that adjunct compounds with bimoraic verbs in our sample were voiced in 77% of cases and argument compounds in 64% of cases. This suggests that the constraint that blocks rendaku in argument compounds is ranked a little farther below the constraint that derives rendaku.

more dominance  <--  $C_1$ (pro-rendaku)  ...  $C_2$ (blocked when not 2μ)  ...  $C_3$ (blocked when argument compound)  -->  less dominance

If the conditions for both $C_2$ and $C_3$ to block rendaku are met, we are even more likely to choose a constraint ranking in which one of the blocking constraints dominates $C_1$ than when only one of $C_2$ or $C_3$ is eligible to block rendaku. For example, suppose that $C_1$ dominates $C_2$ 60% of the time and $C_2$ dominates $C_3$ 55% of the time. In 40% of cases then, $C_2$ would dominate $C_1$ and, if $C_3$ were not a factor, rendaku would be blocked a total of 40% of the time. But in the 60% of cases where $C_1$ is selected at a higher level than $C_2$, there will be some measurable number of cases where $C_3$ is selected at a level above $C_1$.
Since we are supposing that $C_3$ is ranked only slightly below $C_2$, it should be ranked above $C_1$ at only a slightly lower frequency than $C_2$ is ranked above $C_1$: say 30% of the time. Thus, of the 60% of cases where $C_1$ outranks $C_2$, in 30% of that 60%, or another 18% of cases, $C_3$ will outrank $C_1$, accounting for a further 18% of cases of blocking. This gives us a total of 40% + (60% × 30%) = 40% + 18% = 58% of cases where blocking occurs because of the combined effect of the two constraints: clearly a higher rate than the 40% rate of blocking that would occur if only $C_2$ were a blocking constraint.

7.6 Utterance-to-utterance variation vs. word-to-word variation

This model of a continuous scale of constraint ranking predicts that for a given speaker, we will get utterance-to-utterance variation in voicing for fictitious compounds. Our experiment only tested each speaker once, so we have no evidence of the extent to which utterance-to-utterance variation in voicing may occur for fictitious compounds for a given speaker. However, we clearly get variation from speaker to speaker. We also find that the blocking factors that we observed in the experimental data do not act in a categorical fashion. Instead, we get degrees of blocking for different combinations of blocking factors. To account for this, we need to posit a constraint hierarchy in which a given blocking constraint will block rendaku voicing variably rather than categorically. But for real noun-verb compounds, we do not get the same kind of gradient effect in blocking as we saw in the experimental results for fictitious compounds. Consider, for example, our experimental results for accented compounds with a monomoraic verb stem and an adjunct relation. These compounds voiced at a rate of 81% in the experiment. But as we saw in §5.1.1, these compounds voiced at a rate of 100% in our survey of real compounds of this type.
If pro-rendaku constraints and blocking constraints for monomoraic verb stems are ranked close enough together in Boersma & Hayes' model to derive utterance-to-utterance variation for fictitious compounds, we should have expected to also get variation in voicing for real compounds of this type. [Footnote 66: Consultation with native speakers suggests that variation in rendaku voicing among real compounds is extremely rare, unless both forms are listed in the dictionary. A spoken form that differed from the standard form in voicing would be considered to be a different, newly coined word, according to native speakers I consulted.]

Recall that we proposed the following constraint to account for the blocking effect of monomoraic verb stems in fictitious compounds.

Align[+V]-Right-Foot-Right & Dep[+voice]

The first conjunct of this constraint is violated by a monomoraic verb stem; the second conjunct is also violated if rendaku voicing occurs.

Although this constraint will correctly derive the increased frequency of blocking in fictitious compounds with monomoraic verb stems, unless it is ranked very low, it incorrectly predicts that blocking should sometimes occur in real compounds that have monomoraic verb stems and an adjunct relation between noun and verb. But we saw this type of real compound voiced 100% of the time in the data in §5.1.1. These facts leave us in a quandary. To explain why a monomoraic verb stem length is a mild blocking factor in fictitious compounds, we need to posit some constraint that blocks voicing under these conditions. But in real compounds, the condition of having a monomoraic verb stem length appears to have an effect only when it occurs in tandem with an argument relation between the noun and verb. This means that we want to ignore the effects of the same constraint for real monomoraic adjunct compounds.
If we are to consider our experimental results as valid, one way out of this quandary would be to posit some degree of lexicalization for real compounds. Another possibility is the noise factor introduced by the duress that a subject undergoes in doing an experimental task with fictitious compounds. (See discussion on page 140.) Consider, for example, the hypothesis that real compounds with monomoraic verb stems and an adjunct relation have an included [+voice] feature. If so, they will not violate the blocking constraint Align[+V]-Right-Foot-Right & Dep[+voice] if rendaku occurs. But to posit the inclusion of a [+voice] feature for all monomoraic adjuncts is an uninteresting account. There is no explanation for why the lexicon should have all members of this class possessing this feature. We would expect lexicalization to account for exceptions to a regular pattern, not for the regularity of a pattern itself. The discrepancy between voicing patterns in fictitious noun-verb compounds and voicing patterns in real compounds is difficult to account for. Clearly, further research on the voicing behaviour of fictitious compounds is needed. In the next subsection I examine how lexicalization in noun-verb compounds may account for exceptions to regular patterns that we have observed.

7.7 The lexical effect

In addition to utterance-to-utterance variation for fictitious compounds, we also find that lexical effects appear in the results. That is, there are certain verbs that appear to regularly block or force voicing when they appear in fictitious compounds. For example, consider the results for tokens in group 7, the compounds with trimoraic verbs, given on the last two pages of the appendix. Although these compounds generally resisted voicing, when voicing did occur, all three subjects often applied voicing for the same verbs.
Of the 22 instances where voicing was applied for this group, 18 of them were instances in which all three subjects applied voicing to the same compound. Compounds with the verb hatarak-u "to work, be employed", and with the verb kasane-ru "to pile up", never failed to voice for all three subjects, in spite of the fact that longer verbs of this type generally resisted voicing. This suggests that these two verbs are prespecified for voicing in a similar way to what we proposed for nouns such as huro. We also find that verbs such as kakus-u "to hide" never voice in compounds, a fact confirmed by Martin (1987) and demonstrated in the compounds shown below:

(135)
kami-kakusi      god-hide       "mysterious disappearance"
uti-kakusi       inside-hide    "inside pocket"
tutumi-kakusi    wrap-hide      "concealment"
me-kakusi        eye-hide       "eye bandage"
en-kakusi                       "remote viewing"
gi-kakusi                       "imitation leather"

We might try to explain the fact that compounds with verbs like hataraku "to work" always voiced in fictitious compounds because of Output-Output conditions. That is, a fictitious compound whose second conjunct is the verb hataraku might be required to have its output correspond to a base form that is a real compound, which in all cases happens to be voiced. Such an explanation may be possible for verbs like hatarak-u, but for the verb kasane-ru "to pile up", there is no evidence that O-O conditions are applying, since there are no actually attested compounds with this verb that could serve as base forms. Consider first attested compounds whose second member is hataraku "to work". As shown in (136), all known compounds whose second member is derived from the verb hatarak-u undergo rendaku voicing.
(136) Compounds with hataraki:
uwa-bataraki     upper-work     "housemaid"
sita-bataraki    under-work     "assistant"
tomo-bataraki    mutual-work    "dual income"
yo-bataraki      night work
naka-bataraki    inside-work    "maid working for living quarters and kitchen"

But the verb kasane-ru, on the other hand, has no known compounds that undergo rendaku. The only known complex morphological word ending with the verb kasane-ru is the following, where hito "one" acts like a prefix, and regularly disallows rendaku voicing on its co-constituent.

(137) hito-kasane    one-pile.up    "set of boxes"

Thus, the irregular voicing of fictitious compounds with second conjunct kasane cannot be explained through output-output conditions. Instead, my proposal is that the obligatory voicing that occurs in compounds with hataraku and kasaneru can be explained in a similar way to our account for nouns like huro. If these verbs are pre-specified with a floating [+voice] feature, then the constraint Dep[+voice], which we have posited would normally block rendaku, cannot do so for these verbs, which have a [+voice] feature in the input.

(138) Input: [+voice] uwa + hatarak ("upper" + "work")
Constraints (left to right): Dep[+voice] & Align[+V]-Right-Foot-Right; Rendaku; Obs-Voi; MaxPathFeature
Candidates:
uwa-(hata)(raki)    *!
uwa-(bata)(raki)    *

On the other hand, when hataraku occurs alone as a verb, rendaku voicing will not apply. There is no constraint ranked above Obs-Voi that rules out the candidate with an initial voiceless obstruent. Obs-Voi will rule out the voiced candidate.

(139) Input: [+voice] hatarak + u ("work" + INFL)
Constraints (left to right): Dep[+voice] & Align[+V]-Right-Foot-Right; Rendaku; MaxPathFeature; Obs-Voi
Candidates:
hatarak-u
batarak-u    *! (Obs-Voi)

7.8 Conclusions

In this chapter we have examined noun-verb compounds, which show different tendencies of rendaku voicing and blocking than do noun-noun compounds.
We earlier saw that headed Yamato noun-noun compounds pattern into two classes with respect to rendaku voicing: (a) "long" compounds, whose only exceptions to voicing occur when N2 is one of a small group of nouns that never voice; (b) "short" compounds, whose voicing properties are not completely predictable. Noun-verb compounds are more complex to classify. Whereas noun-noun compounds classify with respect to voicing behaviour along just one dimension, that of prosodic size of the constituents, noun-verb compounds pattern along at least three dimensions. Not only does the prosodic length of the verb affect voicing, but also the morphosyntactic relation between the noun and the verb and the underlying accentuation of the verb. [67]

For example, we saw in §5.1.2 that compounds with a CVC verb stem, an underlyingly accented verb, and an argument-head relation between the noun and verb robustly block voicing; however, there are a few exceptions for verbs like tat-u "stand" and kum-u "to fold", which always voice, evidently owing to some lexical feature of these verbs. These kinds of facts make for a very complex picture to analyse. The fact that the vast majority of accented, CVC verbs block voicing in argument-head compounds strongly supports the hypothesis that it is the phonological/morphological properties of the compound rather than lexical properties that are causing blocking of voicing here. The handful of cases where voicing does occur for this type of compound is best explained by lexical prespecification. Thus, we have evidence in favour of both the grammar and the lexicon affecting voicing for this type of compound. For noun-noun compounds, the picture appeared simpler.
We found that phonological properties such as prosodic length affected whether a compound had the potential of carrying lexical specification that blocked voicing. But after factoring out dvandva or headless compounds, we found no evidence that blocking occurred directly as a result of some phonological or morphological property of a compound. Because cases of blocking were the exception rather than the rule, it was more plausible to derive those cases of blocking through lexical prespecification. But for noun-verb compounds, we needed to take both lexical properties and phonological/morphological properties into account in determining why blocking occurred. Cases of blocking were so numerous for some types of noun-verb compounds that phonological or morphological processes were a more plausible explanation for blocking than lexical prespecification.

Because it could be difficult to distinguish between the effects of the grammar and the effects of the lexicon in finding a cause for blocking of rendaku in some noun-verb compounds, we turned our attention in the rest of the chapter to fictitious compounds. This was for two reasons: (a) because studying fictitious compounds would allow us to factor out any possible cases of lexical specification that occurred in the lexical listing of a compound; (b) because our focus in our study of voicing in noun-noun compounds in chapter 3 was on the effects of the lexicon on blocking. Here we wanted to look at the other side of the picture: namely, the effects of the grammar on blocking for the case of noun-verb compounds.

[Footnote 67: Recall that Output-Output correspondence is violated for noun-verb compounds because in the citation form of the verb there is an inflective suffix whereas when the verb appears in a compound it lacks this suffix and will have an epenthetic final vowel if the verb stem is consonant-final. Nouns, on the other hand, always have vowel-final stems and lack any visible suffix in their citation form.]
[Footnote 67, continued: Thus O-O correspondence will not be violated for nouns when they appear in compounds.]

In our experiment with fictitious compounds we found strong evidence that the grammar does have an effect on whether compounds are voiced or not. Specifically, three different phonological and/or morphological factors all showed an effect on voicing: (a) the prosodic length of the verb, (b) the underlying accentuation of the verb, and (c) the morphosyntactic relation between the noun and the verb. We considered an analysis of two of these blocking factors that derived blocking through local conjunction of a rendaku-blocking constraint Dep[+voice] with another constraint that would be violated when some particular blocking property occurred. For example, blocking that occurred when the verb stem was "too long" (e.g. CVCVCVC) or "too short" (e.g. CVC) would be caused by violation of an alignment constraint that required the verb stem to be aligned with a bimoraic Foot. [68]

This analysis predicted that blocking would occur 100% of the time when the conditions for blocking were met. To explain why blocking occurred in only a percentage of compounds, we adopted Boersma & Hayes' continuous scale of constraint ranking, which allows the possibility that two constraints $C_1$ and $C_2$ may, when ranked close together, occur with either of two possible dominance relations.

As far as real noun-verb compounds are concerned, we found, for the CVC compounds with accented verbs that we examined in §5.1.2, that voicing occurred less often than for the fictitious compounds of this type that we tested in our experiment. In the experiment, subjects voiced these compounds at rates of 50%, 33%, and 23%. In the data in §5.1.2, we found that voicing occurred in just 5 out of 35 compounds, or 14% of cases. Furthermore, all these cases of voicing were with the same two verbs that always voiced, suggesting that voicing occurred because of lexical prespecification.
But given the fact that fictitious compounds have never been heard by the speaker before, it is not clear that a subject in the experiment will instantly recognize whether a fictitious compound is to be interpreted as an argument or an adjunct compound. Some instances of voicing in fictitious compounds in the experiment that were intended to be interpreted as argument compounds could actually have been interpreted as adjunct compounds by the subject. The following compounds in the set of tokens with CVC accented verb stems were intended to be argument, accented compounds but were voiced by at least one of the three subjects.

(140)
suna+hori    sand-dig         "sand" could have been interpreted as an adjunct such as a locative
kaki+hosi    persimmon-dry    could have been voiced by analogy with real compound ume-bosi (plum-dry) "pickled plum", which may be a left-headed compound
mizu+huki    water-blow       "water" could have been interpreted as an instrument, as in real compound i-buki "breath-blow"
kao+kaki     face-draw        could have been interpreted as an adjunct compound by analogy with real compound te-gaki (hand-draw)

[Footnote 68: For an analysis of the third factor that affects voicing in noun-verb compounds, namely the underlying accentuation of the verb, see Rosen (1999).]

Only for fictitious compound imo-kui (potato-eat), which was voiced by one subject, is it difficult to suppose a possible adjunct interpretation.

In addition, the following fictitious compounds, which were voiced by at least one subject, have verbs that participate in no known real argument compounds:

(141)
nuno+saki    cloth-tear
hige+sori    beard-shave
tuki+sumi    moon-be.clear
tabi+sumi    trip-finish

Our experiment with fictitious compounds did succeed in supporting the hypothesis that blocking of rendaku voicing is affected by phonological constraints as opposed to lexical prespecification. But we found that fictitious compounds still behaved in ways that were not completely explainable.
For example, compounds with verbs such as tat-u "stand" (kabe+tati (wall-stand)) or kum-u "fold" (nuno+kumi (cloth-fold)), which always voice in real compounds, were voiced by only one of three subjects. These slightly anomalous results could simply be due to the kinds of perception and/or performance errors that subjects will make in doing an experiment of this kind. They could also be an indication that interactions between the grammar and the lexicon are more complex than in the account we are proposing here. Clearly, there is much more research that could be done on the complex matter of the voicing behaviour of noun-verb compounds.

As to why fictitious compounds showed utterance-to-utterance and speaker-to-speaker variation in voicing whereas attested compounds do not, let us consider the following hypothesis. [Footnote 69: This idea was suggested by Douglas Pulleyblank (personal communication).] The nature of the task of pronouncing fictitious compounds is such that a speaker will encounter a greater level of noise in selecting a constraint hierarchy for a given utterance. That is, even if the grammar remains constant, the amplitude of the probability curve that corresponds to each constraint on Boersma & Hayes' continuous scale will be higher for a speaker when pronouncing fictitious compounds. Suppose, for example, that constraints A and B are ranked with probability curves such that with a normal amplitude, the selection A » B will occur 90% of the time and B » A 10% of the time. If the amplitude of the probability curve is higher for a speaker who is performing the fictitious-compound task, then the degree of overlap between the two curves will be greater, and the chance of a selection in which B » A will be higher, even though the grammar itself has not changed. This would induce a greater degree of utterance-to-utterance variation.
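The effect of added noise on the 90%/10% example can be sketched numerically. The Python sketch below makes two assumptions that go beyond the text: the "amplitude" of a probability curve is read as the standard deviation of a Gaussian selection-point distribution, and the two constraints' selection points are drawn independently. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def gap_for_reversal_rate(p_reverse, noise_sd):
    """Distance between the ranking values of A and B that yields
    the selection B >> A with probability p_reverse."""
    # The difference of two independent noisy points has sd noise_sd * sqrt(2).
    return -STD_NORMAL.inv_cdf(p_reverse) * noise_sd * sqrt(2)


def reversal_rate(gap, noise_sd):
    """Probability of selecting B >> A when A is ranked `gap` above B."""
    return STD_NORMAL.cdf(-gap / (noise_sd * sqrt(2)))


# Fix the grammar so that B >> A occurs 10% of the time at the baseline noise.
gap = gap_for_reversal_rate(0.10, noise_sd=1.0)

# Keep the grammar (the gap) constant and raise only the noise level.
for sd in (1.0, 1.5, 2.0):
    print(sd, round(reversal_rate(gap, sd), 3))
```

With the distance between A and B held fixed, the probability of the minority selection B » A rises from 10% as the noise grows, while always remaining below 50%; this is the sense in which more overlap between the curves yields more variation without any change to the grammar.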
If the amplitude also varies from one speaker to another, depending on the degree of stress and difficulty that the task imposes, we will see speaker-to-speaker variation as well.

In chapter 6 we turn our attention to the flip side of the irregularity problem that we have been examining so far. In this and previous chapters we looked at cases in which outputs derived by the grammar, which we expect to show regular patterns, show irregularities. In chapter 6 we look at the converse kind of case: where underived words consisting of single morphemes exhibit patterns that favour certain kinds of surface forms over others.

8. Statistically underrepresented pitch accent patterns in monomorphemic Yamato nouns

Our account of rendaku blocking in noun-noun compounds in chapter 3 was based on the hypothesis that the lexicon can encode patterns of blocking by adding features to the lexical listing of a compound word. In that account it was proposed that the lexicon interacts with the grammar by contributing to creating variable effects in compound words.

In this section I shall examine patterns of pitch accent that occur in nouns of Yamato origin. I shall show that rather than exhibiting the expected random frequencies of pitch-accent patterns, certain patterns occur much more frequently than others, as if the lexicon were showing the effects of some phonological processes. We normally expect the lexicon to look like a random collection of inputs that is not affected by the grammar. In our account of rendaku blocking in noun-noun compounds, lexical exceptions to rendaku voicing were seen as an encoding by the lexicon of failed instances of the phonological process of rendaku voicing. The effect of the lexicon in this case was to deregularize a phonological process.
In pitch-accent patterns of simplex nouns, we shall find evidence of an interaction between the grammar and the lexicon in the opposite direction: in this case, the grammar makes the lexicon appear less random and more patterned. The idea of the lexicon being something random and irregular is itself not a new one. For example, Bloomfield (1933) says that the lexicon "...is really an appendix of the grammar, a list of basic irregularities." But this original concept of the lexicon did not imply that there must be no statistical skewing of input types. In the work of Jakobson (e.g. Jakobson (1941)), marked features or combinations of features were expected to occur less frequently as underlying forms than unmarked ones as a matter of principle. In the framework of Optimality Theory (Prince & Smolensky (1993)), the idea that the lexicon is random and irregular is carried a step further, through its principle of "Richness of the Base", which requires that there be no restriction on inputs. Richness of the Base manifests itself in allowing more than one possible input to be considered for a given output. It also makes the prediction that no input type should naturally occur more frequently in the lexicon than any other input type. This means that markedness cannot be encoded through having marked feature combinations occur less frequently in the lexicon; it is only through the effects of the grammar in determining outputs that we should see restriction of markedness. Because Optimality Theory prohibits constraints on the lexicon, we should expect any possible input type to occur with equal frequency, with no statistical skewing of distribution of input types.
If we are to maintain this principle of no restriction on inputs in the OT framework, then we must explain the fact that input types often appear to be statistically skewed. An example of this is the distribution of accent types in Yamato simplex nouns: we find that in output forms of simplex nouns certain patterns are greatly underrepresented and others are greatly overrepresented, as if the lexicon were not a set of inputs but rather a set of outputs that had undergone a phonological process. For example, trimoraic nouns can have four different possible pitch accent patterns: unaccented, initially accented, medially accented, and finally accented. We expect each pattern to occur in roughly one fourth of a random sample of trimoraic nouns. But when we look at trimoraic nouns of Yamato origin, we shall find statistical skewing of pitch accent patterns. By also examining pitch accent patterns of words in classical Japanese, I will show that this statistical skewing of patterns in modern Japanese dialects cannot be simply inherited from an earlier stage of Japanese but must have developed in response to phonological constraints that would act to avoid certain output forms. For example, in Tokyo dialect medially accented words are quite rare. If medial accent violates certain constraints in the grammar of Tokyo Japanese, then the grammar must have sought to avoid such violations when words of an earlier stage of Japanese were reanalysed in the pitch-accent patterns of modern Tokyo Japanese.

In general, nouns in the Yamato lexicon show a statistical bias towards surface forms that avoid certain combinations or patterns of features.
These facts of Japanese are examples of a more general phenomenon found across languages: that an inventory of simplex lexical items often shows a statistical bias that favours or disfavours particular patterns or combinations of features. A well-known example is the co-occurrence restrictions on consonant place features in Arabic roots, examined originally by Greenberg (1950) and later analysed by McCarthy (1991) (see page 6). Recent analyses of these kinds of phenomena such as Frisch, Broe & Pierrehumbert (1997) have had recourse to models of analysis that lie outside mainstream generative theories such as Optimality Theory (at least in its conventional form) or rule-based systems. Their analysis departs from Optimality Theory in that the ranking level of a constraint directly determines, through a mathematical function, the frequency of words in the lexicon that violate that constraint. Through this function, the percentage of words in the lexicon that are permitted to violate some filter or constraint is directly determined by the strength of the constraint. As they put it: "... the type of constraint families we develop rely crucially on the idea that the attested words constitute a random sampling of the possible forms, with a constraint constituting a bias on the random sampling. This concept of a constraint is not available in Optimality Theory..."

In this chapter I will address the following question: is statistical bias in output forms of a set of simplex lexical items a phenomenon that requires an extra-grammatical explanation, or can it be explained through a conventional set of phonological constraints?

The hypothesis that I will pursue regarding this question is the following.
A speaker's language faculty not only enables them to generate correct output forms of natural language through a grammar applied to a set of input forms (the lexicon), but speakers also possess an awareness of statistical frequencies of patterns in a set of output forms in their language. This awareness enables them to modify their grammar in such a way that they can derive a statistically biased set of outputs (such as what we observe for accent patterns of simplex Yamato nouns) from an unbiased set of inputs.

I will begin in §8.1 by examining the kinds of pitch accent patterns that occur for trimoraic nouns in various Japanese dialects, and show that for most dialects, there are certain patterns that are underrepresented, and others that are overrepresented. In §8.2 I will show that for both 11th century Kyoto Japanese and for modern Tokyo dialect, the skewed distribution of accent patterns of trimoraic nouns can be derived from an unbiased complete set of mathematically possible input forms. I will then show that by modifying the proposed constraint hierarchy, the distribution of accent patterns for bimoraic words in modern Tokyo dialect can also be accounted for. In §8.3 I will show that underrepresented accent patterns in Tokyo dialect cannot simply be inherited from underrepresented patterns in an earlier form of Japanese. On the contrary, when we examine the set of words in a given accent pattern of 11th century Kyoto Japanese, we find that their historical reflexes in modern Tokyo Japanese tended to (a) converge on accent patterns in modern Tokyo dialect that are more harmonic in that grammar, and (b) avoid accent patterns that are less harmonic in that grammar. In §8.4 I will discuss the issue of lexicon optimization. I will advance here a hypothesis that lexicon optimization does not occur for input forms in deriving pitch-accent types.
I will argue that this hypothesis is still consistent with our account of rendaku voicing and blocking in chapter 3, which depends on the idea that some obstruents are underspecified in the input for voicing.

8.1 Variable effects in Japanese pitch-accent patterns in trimoraic nouns

In this section I will show pitch-accent data for trimoraic Yamato nouns in various dialects of Japanese. This data will show that the pitch accent patterns in a given dialect do not occur with equal frequency. For example, suppose that there are q different surface accent types of trimoraic nouns in dialect D, where all nouns of a given surface type share the same arrangement of H and L tones linked to moras.

Set of surface types: $S = \{S_1, S_2, \ldots, S_q\}$

If dialect D contains r trimoraic nouns, and if each surface type occurred with equal frequency, the number of nouns in the dialect of each surface type $S_j$ would be the same number n, where n = r/q.

$N(S_j)$ = number of trimoraic words in D of type $S_j$.
$N(S_1) = N(S_2) = \ldots = N(S_q) = n = r/q$

But this is not what we find. In most dialects, we find that the number of trimoraic Yamato nouns $N(S_j)$ of each surface type $S_j$ will often be considerably higher or lower than n. In terms of frequency of occurrence, the set of surface types S can be divided into two subsets: $\Sigma_1$, a set of overrepresented types, and $\Sigma_2$, a set of underrepresented types.

If $S_k \in \Sigma_1$, then $N(S_k) \geq n$.
If $S_l \in \Sigma_2$, then $N(S_l) < n$.

(142) Overrepresented and underrepresented surface types

                          Surface type    Number of occurrences
Overrepresented types     $S_1$           $\geq n$
                          $S_2$           $\geq n$
                          ...             $\geq n$
                          $S_u$           $\geq n$
Underrepresented types    $S_v$           $< n$
                          $S_{v+1}$       $< n$
                          ...             $< n$
                          $S_q$           $< n$

There are two possible hypotheses to account for this statistical skewing of surface forms. Let us first consider the following hypothesis: that the statistical skewing of surface types in (142) is due to similar skewing of input types as shown in (143).

(143)
                          Input type    Number of occurrences    Grammar: ...P » Q...    Surface type    Number of occurrences
Overrepresented types  I,  N(I,)*n  s,  N(S * n  L  N(L) ^ n  s  N(S„ ;> n  7  ^n  ^n I.. Underrepresente d types  N(I„) * n  s„  N(S„) ± n  N(L) < n  s  N(S ) < n  v  V  N(I ) < n  N(S ) < n  <n  <n  V+1  V+1  \  n  N(IJ < n  s„  N(S„) < n  In (143), each output type Sj is derived from an input type lj by the grammar, a hierarchy of phonological constraints {...P » Q ...}. Each output type Sj occurs with a particular statistical frequency — because its corresponding input type occurs with a similar frequency. In order to explain the skewed frequency of input types, we must appeal to some extra-grammatical cause such as historical factors. But such a historical account will be interesting only i f we can eventually explain why the frequency of input types became skewed in the first place. It will not be satisfactory simply to say that the frequency of input types in dialect D is skewed because it was also skewed in the historical antecedent of D. As I shall show in §8.2, there is statistical skewing of surface types even in the earliest form of Japanese for which we have a large corpus of reliable data: 11th century Kyoto Japanese. Let us now consider a second hypothesis, namely, that input types are not statistically skewed at all, but rather it is only output types that are statistically skewed. The mismatch in statistical frequency between input types and output types is due to the effects ofthe grammar: a hierarchy of phonological constraints that derives output forms from input forms. This grammar will act in such a way that more than one output form maybe derived from the same input form. This will have the result that some output forms will occur more frequently than others, even if every input form occurs with the same frequency. This is the hypothesis that I will pursue here.  Consider again what an unbiased set of input types would look like. 
Suppose that in dialect D there are p mathematically possible input types for trimoraic nouns, where input types vary according to the arrangement of H and L tones on a sequence of moras. If these input types occur with a statistically unbiased distribution for the complete set of trimoraic nouns for a speaker of dialect D, then there should be the same number, n, of nouns of each input type, as shown in (144).

(144) r trimoraic words in D
      p input types
      r/p = n words of each input type

(145) Occurrences of input types

Input type   Number of occurrences
I_1          = n
I_2          = n
I_3          = n
...          = n
I_p          = n

What I argue is that for each of the dialects I will examine, there exists a grammar that can account for the skewing of output forms without requiring any skewing of input forms. I will show in detail what form such a grammar might take for some of the dialects in the survey: specifically, 11th century Kyoto Japanese and modern Tokyo dialect.

In (146) below, I will take p to be the number of possible input types and q to be the number of different occurring output or surface types. Because some input types may converge on just one surface type, p is likely greater than q. Thus, I distinguish between m = r/p and n = r/q, where r is the total number of trimoraic nouns in the dialect. If there is an unskewed distribution of input types, there will in fact have to be convergence of some input types on one output type to account for skewing of output types. For example, suppose that the grammar determines that input forms I_i, I_j, and I_k all surface as output type S_x and that only input type I_h surfaces as output type S_w. Then S_x will occur three times as frequently as S_w: S_x is derived from m + m + m input forms, whereas S_w is derived from only m input forms.

(146) Convergence of input types

Input type   Number of occurrences   Grammar: ...P » Q...   Surface type   Number of occurrences
I_1          N(I_1) = m                                     Overrepresented output types:
I_2          N(I_2) = m                                       S_1          N(S_1) ≥ n
I_3          N(I_3) = m                                       S_2          N(S_2) ≥ n
I_4          N(I_4) = m                                       ...          ≥ n
I_5          N(I_5) = m                                       S_u          N(S_u) ≥ n
...          = m                                            Underrepresented output types:
I_p          N(I_p) = m                                       S_v          N(S_v) < n
                                                              S_{v+1}      N(S_{v+1}) < n
                                                              ...          < n
                                                              S_q          N(S_q) < n

In §8.3 I will show how the skewed distribution of output types in several dialects can be derived from an unskewed set of inputs. But first, let us examine how accent patterns occur in Yamato trimoraic nouns in various dialects of Japanese.

8.1.1 Occurrence of accent patterns of trimoraic words in various Japanese dialects

8.1.1.1 Types of modern Japanese dialects

Below I present pitch-accent data from two of the three main types of Japanese dialects. (See also Uwano (1999) for a detailed description of the pitch accent patterns of a large number of modern Japanese dialects.) The two types I shall examine are Tokyo-type dialects and Kyoto-type dialects. In this study I did not consider data from the third type: southern dialects (e.g. Kagoshima and Shuri), which have just two possible pitch accent patterns for a word of any prosodic length.

Tokyo-type dialects have two types of pitch melodies, which have been traditionally called "atonic" (LH) and "tonic" (LHL). For the latter type, the location of the fall from H to L can occur on any mora. [70]

[70] In a bimoraic syllable (e.g. one consisting of a long vowel, imooto "younger sister"; a diphthong, kai "shell"; a mora nasal in the coda, hon "book"; or a coda consonant that is part of a geminate, kitto "surely") there can be no tonal change within the syllable. Thus, even if we assume that tone is linked to the mora, it is more proper to say that a fall from H to L in Tokyo dialect can occur on any syllable. (See McCawley (1968) for further discussion.)

8.1.1.2 Tokyo dialect

Tokyo dialect, which has become the standard dialect of Japan, has what have been traditionally called "atonic" phrases, which have an initial low tone followed by high tones, and "tonic" phrases, with an initial low that rises to high and then falls to low again at some location within the word or (if the accent is word-final) immediately after the word. Thus, a trimoraic word (with monomoraic syllables: see ftn. 70) has four possible pitch accent patterns: unaccented (atonic LHH), pitch fall after the first mora (HLL), pitch fall after the second mora (LHL), and pitch fall after the third mora, which will require an enclitic such as a case-particle to bear the low pitch after the fall.

For the time being, I will refer to Tokyo pitch accent types as a sequence of H and L tones, with each mora being linked to a H or L tone. Thus, for example, "HLL" refers to the representation in (147).

(147)  μ   μ   μ
       |    \ /
       H     L

As described by Pierrehumbert and Beckmann (1988) (henceforth P&B), pitch accent in modern Tokyo Japanese consists of a sharp fall of pitch from high to low tone. Such a HL tone sequence can only occur once within a certain prosodic domain, which P&B call the "accentual phrase". Beyond the fact that the accentual phrase can have at most one pitch accent, neither P&B nor previous researchers such as McCawley (1968) (who calls this domain a "minor phrase") appear to define the accentual phrase in any other way. In P&B's analysis, the accentual phrase is a higher prosodic category than the Phonological Word. [71] For our purposes here in examining monomorphemic nouns, we can consider a noun or noun plus clitic to consist of one accentual phrase.

[71] That the domain of pitch accent is at least as large as the phonological word seems uncontroversial. To the extent that the phonological word in Japanese matches the morphological word, it is clear that morphological words can receive at most one pitch accent.

Accentual phrases begin with a "boundary" low tone. Unaccented phrases contain a "phrasal" high tone, rather than a "lexical" high tone, but based on phonetic measurements, P&B (1988:48) state that the phrasal high has a lower F0 than the high tone of a pitch accent. It is not clear whether (a) the phrasal H differs phonologically from an accentual high and must be given a different featural description, or (b) the difference is an allophonic one, where H before a L tone is realized at a higher F0 level. P&B argue that vowel pitch is underspecified even at surface levels: that is, apart from the boundary low tone, the phrasal high tone, and their proposed HL sequence of a pitch accent, the pitch of other moras is interpolated for values that will create an appropriate rise or fall between moras whose tones are specified. In addition, a pitch accent can only occur once within a certain prosodic domain.

I will further discuss Pierrehumbert & Beckmann's analysis of modern Tokyo pitch accent on page 181, where I will elaborate on their surface underspecification hypothesis. At that point I will propose a surface phonological representation for each accent type that is consistent with their findings. This representation has some moras that are not specified for a tonal value at the surface. In the meantime, however, I will use a notation to describe accent patterns that has a tonal feature associated with every mora. For example, an initially-accented pattern on a trimoraic word is described as "HLL", with a H tone linked to the first mora, and a L tone to the second and third moras.

Tokyo-type dialects differ from southern dialects such as Shuri and Kagoshima in that the location of accent (pitch fall) in a tonic word is not predictable in Tokyo.

8.1.1.3 Kyoto-type dialects

Pitch melodies in Kyoto-type dialects differ from those in Tokyo-type dialects in that they are classified along not just one, but two dimensions: (a) an atonic/tonic distinction and (b) a so-called "register" distinction.
As for the latter distinction, Kyoto-type dialects have what have been traditionally called "HIGH" register phrases and "LOW" register phrases. Words in a HIGH register begin on a high pitch. If they are tonic, they have a pitch fall to low pitch at some location within the phrase. If they are atonic, they are high pitch throughout. Words in a LOW register begin on a low pitch. If they are tonic, they have a pitch rise to high at some location within the phrase, followed by a return to low pitch, which is maintained until the end of the phrase. If they are atonic, they are low except for the last mora, which rises to high pitch.

8.1.1.4 Historical relationship among modern dialects and historical connections to Proto-Japanese

Martin (1987:246) says that "most scholars assume that the pitch patterns of Proto-Japanese ... are essentially those of 11th century Kyoto" and cites Hirayama (1968:41), who posits a typological evolution of accent types: Kyoto type → Tokyo type → Kagoshima (southern) type. Proto-Japanese is considered to have had both the register distinction and the accent locus distinction. According to the conventional hypothesis, the former was lost in Tokyo and the latter was lost in southern dialects such as Kagoshima dialect, which has only a register distinction. Martin gives accentual data for both 11th century Kyoto speech and modern Kyoto Japanese. We shall consider both dialects here; they are distinct, although they show similarities. Hirayama's hypothesis considers modern Kyoto dialect to represent a more conservative form of Japanese that was prevalent at an intermediate stage between 11th century Japanese and modern dialect. Thus, in tracing the historical evolution of accent types that led up to modern Tokyo Japanese, we should look at modern Kyoto dialect as the representative of an intermediate stage between 11th century Japanese and modern Tokyo dialect. [72]
[72] The fact that register distinctions were apparently lost in Tokyo-type dialects and that locus distinctions were lost in southern dialects such as Kagoshima is actually more consistent with the hypothesis that a Kyoto-type dialect diverged into a Tokyo type and a Kagoshima type than with an evolution from a Kyoto type to a Tokyo type to a Kagoshima type, as Hirayama suggests. Given that we shall not be concerned here with Kagoshima-type dialects, the important point, which is common to both hypotheses, is that Kyoto dialect represents an older form of Japanese from which modern Tokyo dialect apparently evolved.

8.1.1.5 Accent types for trimoraic words in modern dialects

The following data show the accent patterns that occur for trimoraic words in one modern Kyoto-type dialect and several modern Tokyo-type dialects. The dialects are those in Martin's (1987) database of Japanese nouns for which there is a large enough sample to be statistically significant. (The database for each dialect listed below contains at least 88 trimoraic words.) Only nouns that are monomorphemic were extracted from Martin's database.

8.1.1.5.1 Accent types for trimoraic words in modern Tokyo dialect

In the Tokyo-type dialects the four possible accent patterns for trimoraic nouns are classified as follows: [73]

unaccented (LHH)
initially accented (HLL)
medially accented (LHL)
finally accented (LHH-L) (where the final L requires an enclitic to bear the tone)

[73] Martin (1987) refers to accent types with numbers, as follows:
"0"  unaccented (LHH)
"1"  initially accented (HLL)
"2"  medially accented (LHL)
"3"  finally accented (LHH-L)

8.1.1.5.2 Accent types for trimoraic words in modern Kyoto dialect [74]

In Kyoto dialect there are four possible patterns for trimoraic nouns. [75]

HHH (atonic HIGH)
LLH (atonic LOW)
LHL (tonic LOW)
HLL (tonic HIGH: fall after 1st mora)

[74] Martin (1987) uses the following numbers to identify Kyoto accent types. For Kyoto accent, the number refers to the location of the first mora that has low tone. The pattern "1:3" means that the first and third moras both have a low tone that is not immediately preceded by another low tone.
"0"    HHH (atonic HIGH)
"1"    LLH (atonic LOW)
"1:3"  LHL (tonic LOW)
"2"    HLL (tonic HIGH: fall after 1st mora)
"3"    HHL (tonic HIGH: fall after 2nd mora)

[75] There is also a HHL pattern that occurs for a very small number of words, such as onna "woman." This pattern is so rare that I will not consider it here.

8.1.2 Data on frequencies of accent types in modern Japanese dialects

While we might expect to find that for a given dialect, each of the possible accent patterns will occur in roughly the same percentage of words, this does not happen. Some accent patterns are noticeably overrepresented for some dialects and others are noticeably underrepresented for some dialects.

In the data summaries below, I also show the co-occurrence rates of each accent type cross-referenced with the presence or absence of a voiced obstruent in a word. We shall see that in some dialects (for example, Tokyo dialect), the presence of a voiced obstruent in a word even further reduces the statistical frequency of underrepresented pitch-accent patterns. Although I shall not attempt to give a formal explanation here of why the presence of a voiced obstruent has an apparent effect on the robustness of a particular accent pattern, I shall include these data for the following reason. Given the likely possibility that the presence of a voiced obstruent can affect the surfacing of particular accent patterns, we need to consider frequency data just for nouns that have no voiced obstruent present. We want to consider possible grammars that will derive attested accent patterns from possible inputs. If the presence of a voiced obstruent has an effect on such a derivation, we want to be able to examine control cases in which there is no voiced obstruent affecting the derivation.

8.1.2.1 Kyoto-type dialects (1 example)

Unaccented pattern is dominant.

(148) Frequencies of accent patterns in modern Kyoto dialect

MODERN KYOTO          total   HHH         LLH        LHL        HLL        HHL
no voiced obstruent   186     90 (49%)    18 (10%)   23 (12%)   52 (28%)   3 (1%)
voiced obstruent      148     61 (41%)    32 (22%)   19 (13%)   36 (24%)   0 (0%)
total                 334     151 (45%)   50 (15%)   42 (13%)   88 (26%)   3 (1%)

For this accent type, we would expect, with a random distribution of patterns, that each of the four patterns (excluding HHL) should occur 25% of the time. The unaccented pattern HHH occurs with much greater frequency than it would randomly, at 45%. The patterns LLH and LHL occur less frequently than an even distribution would predict.

8.1.2.2 Tokyo-type dialects (11 dialects represented)

The dialects shown below are classified according to the kinds of over- and underrepresentation of accent patterns that are found for each.

8.1.2.2.1 Unaccented dominant

In this type of dialect unaccented words make up at least 49% of the total of trimoraic words and occur at least twice as often as any other accent pattern. For some dialects there is an additional characteristic: one accent pattern is underrepresented compared to all the others. In Tokyo dialect, medial accent is avoided: it occurs for only 4% of the total number of words and for only 12% of the words that have one of the three accented patterns.
(149) Frequencies of accent patterns in trimoraic words in Tokyo dialect

TOKYO: TRIMORAIC WORDS   total   unaccented LHH   initial accent HLL   medial accent LHL   final accent LHH(L)
no voiced obstruent      199     124 (62%)        31 (15.5%)           13 (7%)             31 (15.5%)
voiced obstruent         146     107 (73%)        29 (20%)             1 (1%)              9 (6%)
total                    345     231 (67%)        60 (17%)             14 (4%)             40 (12%)

(150) Frequencies of accent patterns in Hiroshima dialect

HIROSHIMA             total   LHH        HLL        LHL        LHH(L)
no voiced obstruent   73      45 (62%)   8 (11%)    6 (8%)     14 (19%)
voiced obstruent      38      20 (53%)   4 (11%)    7 (18%)    7 (18%)
total                 111     65 (59%)   12 (11%)   13 (12%)   21 (19%)

For this dialect, the unaccented pattern is dominant and the three other patterns are fairly evenly split in frequency.

(151) Frequencies of accent patterns in Izu dialect

IZU                   total   LHH        HLL        LHL        LHH(L)
no voiced obstruent   70      40 (57%)   7 (10%)    6 (9%)     17 (24%)
voiced obstruent      38      20 (53%)   4 (11%)    7 (18%)    7 (18%)
total                 108     60 (56%)   11 (10%)   13 (12%)   24 (22%)

Izu dialect has patterns of frequency similar to those of Hiroshima, but among accented patterns, final accent is favoured.

(152) Frequencies of accent patterns in Matsue dialect

MATSUE                total   LHH        HLL      LHL        LHH(L)
no voiced obstruent   60      38 (63%)   0 (0%)   11 (18%)   11 (18%)
voiced obstruent      34      16 (47%)   2 (6%)   8 (24%)    8 (24%)
total                 94      54 (57%)   2 (2%)   19 (20%)   20 (21%)

(153) Frequencies of accent patterns in Izumo dialect

IZUMO                 total   LHH        HLL      LHL        LHH(L)
no voiced obstruent   61      44 (72%)   0 (0%)   9 (15%)    8 (13%)
voiced obstruent      51      20 (39%)   1 (2%)   11 (22%)   9 (18%)
total                 111     64 (58%)   1 (1%)   20 (18%)   17 (15%)

In both Matsue and Izumo dialects initial accent is rare; the other two accented patterns are evenly split.
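The division of surface types into overrepresented and underrepresented sets, defined in §8.1 relative to n = r/q, can be applied mechanically to the totals row of any of these tables. The following is a minimal sketch in Python using the Hiroshima totals from (150); the function and variable names are assumptions made for illustration and are not part of the analysis.

```python
def classify_by_uniform_expectation(counts):
    """Split surface types into over-/underrepresented sets relative to n = r/q."""
    r = sum(counts.values())   # total number of trimoraic nouns
    q = len(counts)            # number of surface accent types
    n = r / q                  # count per type if all types were equally frequent
    over = {t for t, c in counts.items() if c >= n}    # N(S_k) >= n
    under = {t for t, c in counts.items() if c < n}    # N(S_l) < n
    return n, over, under


# Totals row of the Hiroshima table (150).
hiroshima = {"LHH": 65, "HLL": 12, "LHL": 13, "LHH(L)": 21}

n, over, under = classify_by_uniform_expectation(hiroshima)
print("n =", n)
print("overrepresented:", sorted(over))
print("underrepresented:", sorted(under))
```

With these figures only the unaccented pattern LHH reaches the even-distribution count, which matches the description of Hiroshima as a dialect in which the unaccented pattern is dominant.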
(154) Frequencies of accent patterns in Hamada dialect

HAMADA                total   LHH        HLL        LHL        LHH(L)
no voiced obstruent   59      31 (53%)   5 (8%)     10 (17%)   13 (22%)
voiced obstruent      37      16 (43%)   6 (16%)    6 (16%)    9 (24%)
total                 96      47 (49%)   11 (11%)   16 (17%)   22 (23%)

In Hamada dialect, the unaccented pattern (LHH) is dominant and the other three accent patterns are fairly evenly split, with initial accent (HLL) being somewhat more underrepresented.

8.1.2.2.2 Unaccented most common but not as dominant

For this group the unaccented pattern occurs the most frequently but it is not as dominant as in the previous group. Here its occurrence ranges from 39% of the time for Akita and Hatto 1 to 42% of the time for Sapporo. In addition, Akita underrepresents initial accent, Sapporo underrepresents initial and final accent, and Hatto slightly underrepresents initial and final accent.

(155) Frequencies of accent patterns in Akita dialect

AKITA                 total   LHH        HLL       LHL        LHH(L)
no voiced obstruent   56      26 (46%)   1 (2%)    12 (21%)   17 (30%)
voiced obstruent      32      8 (25%)    6 (19%)   11 (34%)   7 (22%)
total                 88      34 (39%)   7 (8%)    23 (26%)   24 (27%)

(156) Frequencies of accent patterns in Hatto dialect variant 1

HATTO 1               total   LHH        HLL        LHL        LHH(L)
no voiced obstruent   50      23 (46%)   8 (16%)    16 (32%)   3 (6%)
voiced obstruent      39      12 (31%)   7 (18%)    11 (28%)   9 (23%)
total                 89      35 (39%)   15 (17%)   27 (30%)   12 (13%)

(157) Frequencies of accent patterns in Sapporo dialect

SAPPORO               total   LHH        HLL        LHL        LHH(L)
no voiced obstruent   75      34 (45%)   7 (9%)     26 (35%)   8 (11%)
voiced obstruent      47      17 (36%)   10 (21%)   18 (38%)   2 (4%)
total                 122     51 (42%)   17 (14%)   44 (36%)   10 (8%)

8.1.2.2.3 No strongly dominant accent pattern: at least three patterns occur within a 5% frequency range

For Narada, medial accent is somewhat underrepresented; the other patterns occur at roughly similar rates.
(158) Frequencies of accent patterns in Narada dialect

NARADA                total   LHH        HLL        LHL        LHH(L)
no voiced obstruent   63      18 (29%)   19 (30%)   8 (13%)    18 (29%)
voiced obstruent      51      16 (31%)   14 (28%)   9 (18%)    12 (24%)
total                 114     34 (30%)   33 (29%)   17 (15%)   30 (26%)

8.1.2.2.4 Medial accent dominant

(159) Frequencies of accent patterns in Aomori dialect

AOMORI                total   LHH        HLL        LHL        LHH(L)
no voiced obstruent   66      18 (27%)   8 (12%)    26 (39%)   14 (21%)
voiced obstruent      58      13 (22%)   10 (17%)   23 (40%)   12 (21%)
total                 124     31 (25%)   18 (15%)   49 (40%)   26 (21%)

Pattern 2 (medial accent) occurs 2.7 times as often as pattern 1 (initial accent).

8.1.2.2.5 Summary of data on Tokyo-type dialects

Dominance of one pattern: For 6 dialects the unaccented (LHH) pattern is dominant and for 3 more it occurs the most frequently. For 1 dialect medial (LHL) accent is dominant, occurring 1.6 times as often as any other pattern.

Underrepresentation:
Tokyo            medial (LHL) accent      4%
Matsue, Izumo    initial (HLL) accent     2%, 1%
Akita            initial (HLL) accent     8%
Sapporo          final (LHH(L)) accent    8%

Slight underrepresentation:
Narada           medial (LHL) accent      15% (vs. 30%, 29%, 26%)
Aomori           initial (HLL) accent     15% (vs. 25%, 40%, 21%)

In summary, we find that the majority of Tokyo-type dialects show at least one instance of a significant underrepresentation or overrepresentation of an accent type. As discussed on pages 146ff, I will ultimately account for these kinds of skewed distributions of accent types through the effects of phonological constraints, which can derive a skewed distribution of outputs from an unskewed distribution of inputs. But first, in the following section, I will show that statistical skewing cannot simply be a reflex of over- or underrepresented accent patterns in an earlier stage of Japanese.

8.2 Statistical skewing cannot simply be a reflex of over- or underrepresented accent patterns in an earlier stage of Japanese

8.2.1 Correlation between accent patterns in modern Tokyo dialect and accent patterns in 11th century Japanese
In this section I will explore the possibility that the skewing of accent pattern frequencies in modern Tokyo dialect is due to the inheritance of skewed patterns from 11th century Kyoto Japanese. I will show that this hypothesis cannot be maintained.

8.2.1.1 Accent patterns in 11th century Kyoto Japanese

Let us first examine the range of accent types for the earliest form of Japanese for which we have reliable pitch-accent data: 11th century Kyoto Japanese. In this historical period, Kyoto with its surrounding area was the political, cultural, and demographic centre of Japan. What is now the Tokyo area was relatively sparsely populated at this time and the site of modern Tokyo was little more than a tiny fishing village. Accordingly, among the various forms of Japanese that were spoken in the 11th century, Kyoto Japanese would have had by far the greatest linguistic influence on subsequent modern dialects. [76]

Abundant data for 11th century Japanese accent patterns are found in Martin (1987). The following chart shows the total number of trimoraic words of each accent pattern of 11th century Japanese that are found in Martin's database of nouns, with one column for each 11th century pattern.

(160) Frequencies of accent patterns in 11th century (Kyoto) Japanese

Accent pattern    HHH   HHL   HLL   LHH   LHL   LLH   LLL   TOTAL
Number of words   68    26    11    36    27    35    80    283
Percentage        24%   9%    4%    13%   10%   12%   28%   100%

If all seven accent patterns had equal frequency, we should expect a rate of about 14% for each of them; instead, we find that the HHH and LLL patterns are both overrepresented, with frequencies of 24% and 28% respectively. Thus we find over- and underrepresentation of accent patterns for trimoraic words going back to 11th century Kyoto Japanese. The question we want to investigate is the following.
Is the over- and underrepresentation of accent patterns in modern Tokyo Japanese directly explainable through the inheritance of over- and underrepresented patterns in 11th century Japanese? In order to explore the correlation between accent patterns in the two dialects, let us examine the following chart. This chart is based on pitch accent data for various dialects of Japanese in Martin's (1987) database of Yamato nouns. It shows how many words of each 11th century accent pattern developed into each possible accent pattern in modern Tokyo dialect.

[76] See, for example, Ivan Morris' English translation of the Sarashina Nikki diary, which includes a detailed description of the route from what is now the Tokyo area to the capital in Kyoto.

(161) Correlation of accent patterns in cognates between 11th century Kyoto dialect and modern Tokyo dialect

                 11th century pattern [77] [78]
Tokyo pattern    HHH   HHL   HLL   LHH   LHL   LLH   LLL
LHH              63    19    3     32    17    17    37
HLL              3     3     7     4     7     14    14
LHL              0     0     0     0     1     1     6
LHH(L)           2     4     1     0     2     3     23
total            68    26    11    36    27    35    80

We find that for three of the 11th century accent patterns the majority of words have one particular corresponding accent pattern in modern Tokyo dialect, with just a small percentage of each group deviating from this pattern. For example, of 68 words that were HHH in 11th century Japanese, 63 cognates (93%) are unaccented (LHH) in Tokyo dialect. The less common patterns such as HLL and LHL were more variable in what Tokyo pattern was taken on by the words of the class. The LLH pattern seemed to split into two patterns in Tokyo: unaccented (LHH) (17) and initially accented (HLL) (14). The LLL pattern also diverged mainly into two patterns in Tokyo dialect: initially accented (HLL) (14) and finally accented (23).
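The rates quoted from (161), such as the 93% of 11th century HHH words that are unaccented in Tokyo, can be recomputed directly from the chart. The following is a minimal sketch in Python. The counts are those of (161), with the HHL and LHL cells of the LHH row taken as 19 and 17, the values that the column totals require; the function and variable names are assumptions made for illustration.

```python
# Modern Tokyo accent patterns, in the row order of chart (161).
TOKYO = ("LHH", "HLL", "LHL", "LHH(L)")

# For each 11th century pattern: number of words with each Tokyo reflex,
# listed in the order given by TOKYO.
reflexes = {
    "HHH": [63, 3, 0, 2],
    "HHL": [19, 3, 0, 4],
    "HLL": [3, 7, 0, 1],
    "LHH": [32, 4, 0, 0],
    "LHL": [17, 7, 1, 2],
    "LLH": [17, 14, 1, 3],
    "LLL": [37, 14, 6, 23],
}


def reflex_rates(row):
    """Percentage of one 11th century class that took on each Tokyo pattern."""
    total = sum(row)
    return {pattern: round(100 * count / total) for pattern, count in zip(TOKYO, row)}


def tokyo_totals(table):
    """Number of words of each Tokyo pattern, summed over all 11th century classes."""
    return {pattern: sum(row[i] for row in table.values()) for i, pattern in enumerate(TOKYO)}


for old_pattern, row in reflexes.items():
    print(old_pattern, sum(row), reflex_rates(row))
print(tokyo_totals(reflexes))
```

Summing across the 11th century classes in this way also shows how many words each Tokyo pattern receives in total, which is the quantity at issue in the discussion that follows.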
11th century      Tokyo
HHH        -->    LHH
HHL        -->    (uncommon)
LHH        -->    LHH
LHL        -->    LHH
LLH        -->    LHH or HLL
LLL        -->    HLL or LHH(-L)

Yet there is no evidence that any of the under- or overrepresented patterns of Tokyo dialect are due to their being reflexes of a similarly over- or underrepresented pattern in 11th century Japanese.

[77] A further 7 words are listed in Martin (1987) with the pattern "HHx", where the nature of the tone on the third mora is not clear.

[78] A further 3 words are listed by Martin with the pattern "LHx", meaning that the nature of the tone on the third mora is not clear.

Consider, for example, medial (LHL) accent in Tokyo, which is greatly underrepresented at 8%. Of the 8 words of this pattern represented in the chart above, 6 have a LLL pattern in 11th century Japanese. Yet the LLL pattern is anything but underrepresented in 11th century Japanese, being the most common of all 7 patterns. Thus the underrepresentation of medial (LHL) accent in Tokyo cannot be due to this pattern being correlated with a similarly underrepresented pattern in an ancestor language. A more plausible hypothesis is that, for some phonological reason, words of all 11th century accent patterns avoided a medial (LHL) accent pattern when they evolved into Tokyo dialect.

Consider now the overrepresentation of the unaccented (LHH) pattern in Tokyo dialect. We cannot simply say that this is due to the fact that most of the words with a HHH pattern in 11th century Japanese developed into this pattern in Tokyo. If 11th century HHH were the only pattern that became unaccented (LHH) in Tokyo, then we would have 63 words out of a total of 283 trimoraic words that were unaccented (LHH) in Tokyo dialect: only 22% of the total.
If the abundance of unaccented (LHH) trimoraic words in Tokyo dialect is connected with the accent patterns of 11th century Japanese, it is because for six of the seven accent patterns in 11th century Japanese, a substantial number of words (in many cases the majority) developed into unaccented (LHH) words in Tokyo dialect. Once again the skewed pattern is arguably due not to a reflex of some earlier historical distribution but instead to some phonological effect that influenced the process by which an earlier accent pattern changed into a later one. The following chart shows the rate at which each accent pattern in 11th century Japanese became an unaccented (LHH) pattern in modern Tokyo Japanese.

(162) Rates of correlation between 11th century Kyoto accent patterns and modern Tokyo

Accent pattern in        Number of trimoraic words   Number that became        Percentage that became
11th century Japanese    with that accent pattern    unaccented (LHH) in       unaccented (LHH) in
                                                     Tokyo Japanese            Tokyo Japanese
HHH                      68                          63                        93%
HHL                      26                          19                        73%
HLL                      11                          3                         27%
LHH                      36                          32                        89%
LHL                      27                          17                        63%
LLH                      35                          17                        49%
LLL                      80                          37                        46%

We can see from this chart that the overrepresentation of the unaccented (LHH) pattern in modern Tokyo dialect is due not to its inheritance from one particular overrepresented pattern in 11th century Japanese but to the fact that many different accent patterns from 11th century Japanese evolved into an unaccented (LHH) pattern in modern Tokyo dialect at a substantial rate.

8.2.2 Accent patterns in modern Akita dialect

Let us consider now a second modern dialect that has statistically skewed frequencies of accent patterns for trimoraic nouns. In Akita dialect, initial (HLL) accent is underrepresented, as we saw in (155), repeated here as (163).
(163) Frequencies of accent patterns in Akita dialect revisited

AKITA                 total   LHH        HLL       LHL        LHH(L)
no voiced obstruent   56      26 (46%)   1 (2%)    12 (21%)   17 (30%)
voiced obstruent      32      8 (25%)    6 (19%)   11 (34%)   7 (22%)
total                 88      34 (39%)   7 (8%)    23 (26%)   24 (27%)

Let us examine the possibility, as we did for modern Tokyo dialect, that the underrepresentation of initial (HLL) accent in trimoraic nouns in Akita dialect occurs because initial (HLL) accent in Akita is the reflex of some underrepresented accent pattern in 11th century Japanese. The following list shows some of the few initially accented trimoraic words in Akita alongside the accent pattern of the same word in 11th century Japanese.

Initially accented words in Akita (HLL):

word      meaning    11th century pattern   frequency of 11th century pattern
kitune    fox        LHH                    36
abura     oil        LLH                    35
hibari    skylark    LHH                    36
kuzira    whale      LHL                    27
suzume    sparrow    LHH                    36

We find that none of these words is derived from a particularly rare pattern in classical Japanese. Thus, for Akita dialect as well, there exists an underrepresented accent pattern for trimoraic nouns whose low statistical frequency cannot be due to its being a reflex of some underrepresented pattern in 11th century Japanese.

In the next section I will pursue a different hypothesis to account for the statistical skewing of accent patterns in modern dialects: namely, that when we consider the possible input forms for the inventory of Yamato nouns in a dialect such as modern Tokyo, we are not forced to posit any statistical skewing of frequencies of input forms. We can derive the over- and underrepresentation of surface forms of monomorphemic nouns through the effects of phonological constraints.

8.3 The derivation of a biased distribution of accent patterns

We saw in the foregoing data in this chapter that there is a bias in the statistical distribution of accent patterns in various dialects of Japanese going right back to 11th century Kyoto Japanese.
Let us now examine the question of whether these biased sets of output forms necessarily require us to posit similarly biased sets of input forms.

8.3.1 Distribution of accent patterns in 11th century Kyoto Japanese

Consider again the frequency of output forms for trimoraic nouns in 11th century Kyoto. The table in (160) is repeated below in (164).

(164) Frequencies of accent patterns in 11th century (Kyoto) Japanese revisited

Accent pattern    HHH   HHL   HLL   LHH   LHL   LLH   LLL   TOTAL
Number of words   68    26    11    36    27    35    80    283
Percentage        24%   9%    4%    13%   10%   12%   28%   100%

Notice that the accent pattern HLH does not occur at all in the data. There are two possible ways of explaining this. One is that input forms correspond closely to output forms and that the inventory of possible input forms has a mysterious gap: the absence of a HLH pattern. The second possibility is that there are no restrictions on possible input forms and that output forms are subject to phonological constraints that may prevent certain patterns from surfacing. The option I will pursue here is the second one, which is standard in generative phonology. This option affords the possibility of explaining why there are gaps in possible types of inputs. If we allow the possibility that apparent gaps in input forms are in reality gaps in output forms, then we can account for these gaps through the effects of the grammar. That is, our answer will be that there actually are no gaps in possible input forms: it only appears this way because certain input forms are not allowed by the grammar to surface with the same form as their input. For example, we can allow the possibility that a form HLH can exist in the input, but that it will have a different output form because certain constraints will disallow a HLH output form. By allowing no restriction on possible types of inputs, the mystery of gaps and statistical biases in frequencies of input forms can be explained.
In effect, we are adopting a strong version of the principle of "Richness of the Base" in Optimality Theory (Prince & Smolensky (1993)). (See also page 31.) According to this principle, we allow any possible type of phonologically well-formed input to occur, with no restrictions. Allowing more than one possible input for a given output requires us to abandon at least a strong version of the principle of Lexicon Optimization (Prince & Smolensky (1993)).

I will consider, for trimoraic nouns in 11th century Kyoto dialect, a full range of input types that have three possibilities for each mora: (a) not linked to a tone, (b) linked to a H tone, and (c) linked to a L tone. This gives us 3^3 = 27 possible inputs for trimoraic nouns. Lexicon Optimization would only allow one possible input for each possible output, or seven possible inputs, since there are seven possible outputs. If we were to adopt the strongest version of Lexicon Optimization, which insists on one optimal input for each output type, we would lose the ability to explain gaps or statistical biases in the lexicon. In order to consider the twenty-seven possible tonal inputs for trimoraic nouns listed above, we must reject a strong version of Lexicon Optimization.

Partly for the sake of ease of exposition, we shall still to some degree restrict the inputs that we take under consideration for the natural class of outputs that we are examining (i.e. outputs that are trimoraic and have the syntactic category [+N,-V]). For example, we shall not consider inputs with moras linked to multiple tones or inputs with floating tones. In effect, we shall restrict the inputs under consideration to a natural class of possible inputs: representations in which any tonal features that occur are singly linked to a mora to which no other tone is linked.
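The size of the input space described above can be checked with a short sketch. It is only an illustration under stated assumptions: the label "0" for a toneless mora and the string encoding are hypothetical conveniences, not the thesis's notation, and the order of enumeration is not meant to match the numbering of the inputs in the thesis.

```python
from itertools import product

# Per-mora options assumed in the text: no tone, a linked L, or a linked H.
# "0" is a hypothetical label for a toneless mora (not the author's notation).
OPTIONS = ("0", "L", "H")


def trimoraic_inputs():
    """Return every tonal input for a three-mora noun as a string like 'H0L'."""
    return ["".join(combo) for combo in product(OPTIONS, repeat=3)]


inputs = trimoraic_inputs()
print(len(inputs))   # 3 ** 3 inputs
print(inputs[:5])    # first few inputs in enumeration order
```

By contrast, a strict one-input-per-output lexicon would contain only as many input types as there are attested output patterns.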
This will be done not so much for the sake of optimizing the lexicon but rather to make manageable the problem of how a grammar might make an unrestricted set of inputs converge on a small finite number of output forms. By adopting the hypothesis that a full range of possible input forms can occur, we shall have the possibility of accounting not only for apparent gaps in the range of inputs, but also for statistical imbalances in that inventory. What we will investigate is the possibility that there is a hierarchy of phonological constraints from which an unbiased set of input forms for trimoraic nouns can derive a set of output forms that exhibit the kinds of statistical patterns we see in (160). In the next subsection I shall present a hypothesis for the derivation of surface pitch accent patterns for 11th century Kyoto trimoraic nouns that can explain their statistical distribution.

8.3.1.1 Derivation of 11th century Kyoto accent patterns for trimoraic nouns from an unbiased set of inputs

For trimoraic words, let us consider the following set of mathematically possible input forms. We shall consider only tones that are underlyingly linked to moras, since there is no evidence, for nouns, as there is for verbs, of floating tones whose docking location depends on the prosodic structure of the word. I will also not consider input forms in which more than one tone is linked to the same mora. In this respect, I am adopting a "weak" version of Lexicon Optimization that restricts inputs, not with respect to individual lexical items, but with respect to a natural class of output forms.

Given these assumptions, each mora has three possibilities: (a) it is not linked to any tonal feature, (b) it is linked to a L tone, and (c) it is linked to a H tone. This gives us the following 27 possible inputs.

(165) [Autosegmental diagrams of the 27 trimoraic inputs, numbered 1 to 27; the diagrams are not recoverable from the extracted text.]

For the time being, let us leave aside the question of statistical frequency of accent patterns and simply look at what kind of grammar could derive the set of possible output forms in (164) from the set of inputs in (165).

8.3.1.2 Proposed grammar for 11th century Kyoto Japanese

First, the characterization of surface accent patterns for 11th century Kyoto Japanese given in Martin (1987) assumes that every mora is linked to a tone and that no moras have contour tones. If we adopt Martin's assumptions about how these tonal patterns were realized, the grammar we are seeking to capture must have the following undominated constraints:

(166) Align(Tone,Left,Mora,Left): "The left edge of every tonal feature is aligned with the left edge of a mora."

Violations of this constraint are calculated on the tonal tier. This constraint will be violated when there are two tones linked to the same mora:

        μ
       / \
      T1  T2

A contour tone like the one in (167) will violate this constraint, since the left edge of the H tone is not aligned with the left edge of the mora.

(167)   μ
       / \
      L   H

(168) Mora-Tone: "Every mora must be on a path with a tonal feature."

When a mora has no tone linked to it in the input, the grammar must determine not only that it will have a tone linked to it in the output, but also whether that tone should be a H tone or a L tone.
Making either H tone or L tone the default tone will enable us to derive all seven possible output forms in (164); however, choosing L tone as the default tone will conflict less with the need to ban a HLH output, which has two H tones. For that reason, let us adopt L tone as the default tone through the following constraints. (See Pulleyblank (1986).)

(169) *H: "A H tone must not occur in the output."

(170) *L: "A L tone must not occur in the output."

In order that L occurs as the default tone, the corresponding markedness constraint for L tones, *L, must be ranked below *H. In addition to assigning a preference to, say, L tone over H tone as a default tone, the grammar must also determine what the output will be when both H and L tones occur in the input, but there is one mora with no tone linked in the input, as in (171).

(171)  μ  μ  μ
       |     |
       H     L

Even if *H » *L, these markedness constraints will not determine what kind of tone will occur on the middle mora in the output, since either of the two outputs below equally violates both *H and *L. [79]

(172) (a), (b) [Diagrams of the two candidate outputs for (171): H on the first mora and L on the third, with the middle mora linked to the H in one candidate and to the L in the other.]

Footnote 79: If *H were interpreted as a Path constraint that was violated for each path from a H tone to a mora, then (a) would be preferred to (b).

Some ranking of the following four alignment constraints will determine which tone will spread to a vacant mora when there is a possibility of spreading from both the left and the right.

(173) Align-H-Left-Wd-Left: The left edge of every high tone in the output must be aligned with the left edge of the word.

(174) Align-H-R-Wd-R: The right edge of every high tone in the output must be aligned with the right edge of the word.

(175) Align-L-Left-Wd-Left: The left edge of every low tone in the output must be aligned with the left edge of the word.

(176) Align-L-R-Wd-R: The right edge of every low tone in the output must be aligned with the right edge of the word.
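The role of the ranking *H » *L in selecting a default tone, discussed above, can be illustrated with a minimal sketch of Optimality-Theoretic evaluation. It rests on assumptions that are not part of the thesis: surface forms are encoded as strings, a run of adjacent identical tones is counted as one multiply linked tone, and the function names (`tones`, `star_h`, `star_l`, `optimal`) are hypothetical.

```python
from itertools import groupby


def tones(surface):
    """Collapse runs of identical tones, assuming adjacent moras with the
    same tone share one multiply linked tone (so 'HHL' has tones H, L)."""
    return [tone for tone, _ in groupby(surface)]


def star_h(surface):
    """*H: one violation per H tone in the output."""
    return tones(surface).count("H")


def star_l(surface):
    """*L: one violation per L tone in the output."""
    return tones(surface).count("L")


def optimal(candidates, ranking):
    """Return the candidate whose violation profile is lexicographically
    smallest, reading constraints from highest ranked to lowest ranked."""
    return min(candidates, key=lambda cand: tuple(con(cand) for con in ranking))


# Fully toned candidate outputs for an input with no tones at all.
candidates = ["LLL", "HHH"]
print(optimal(candidates, [star_h, star_l]))  # ranking *H >> *L
print(optimal(candidates, [star_l, star_h]))  # ranking *L >> *H
```

Under this tone-counting interpretation, the candidates HHL and HLL tie on both markedness constraints, which is why the text turns to alignment constraints to choose between them.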
At this point there is no particular reason why we should choose one particular relative ranking of these four alignment constraints over another. Any ranking that can determine whether a H or L tone will spread to a given mora will derive an attested accent pattern. For the time being, I will consider any ranking among the four constraints as possible.

We would also expect that the seven possible output forms for trimoraic words in 11th century Kyoto would show some degree of faithfulness to input forms: that is, the patterns of H's and L's in output forms would have some resemblance to the arrangement of H's and L's in input forms. This kind of faithfulness would be determined by faithfulness constraints such as the following two:

(177) Max-Path-Tone: For every path of association P between a tonal feature T and a mora μ in the input there is a corresponding path of association P' between tonal feature T' and mora μ' in the output, where P corresponds to P', T corresponds to T' and μ corresponds to μ'.

(178) Max-Tone: "Every tone in the input must occur in the output."

We also find that the tonal pattern HLH is absent. I will account for this fact by adopting the Generalized OCP of Suzuki (1998). Suzuki's proposals seek to explain what he calls "identity avoidance between two elements in sequence", in this case as applied to a ban on the co-occurrence of multiple high tones within a domain. Suzuki's proposals differ from prior formulations of the OCP in which the elements that are prohibited from co-occurring must be adjacent on the same autosegmental tier. They also differ from proposals by Ito & Mester (1996) and Alderete (1996, 1997) to explain OCP effects in terms of markedness. Suzuki's GOCP does not necessarily require what he calls "strict adjacency" between the occurrences of the element that is prohibited from multiply occurring. The GOCP can be sensitive to any sequence, including one with intervening material.
Applied to high tones within a morpheme, the GOCP bans a sequence of H...H within a morpheme, even if a L tone intervenes between the two highs.

(179) GOCP-H(morpheme): "A sequence of high tones (H...H) must not occur in the output."

This constraint is undominated in 11th century Kyoto dialect, since a sequence of H's never occurs in the output.

The following summarizes the proposed constraint hierarchy for deriving the accent patterns of 11th century Kyoto.

(180) Undominated: Mora-Tone, Align-Tone-Left-Mora-Left, GOCP-H(morpheme)
      » Max-Path-Tone, Max-Tone
      » *H, some ranking of: {Align-H-Rt-Wd-Rt, Align-H-Left-Wd-Left, Align-L-Right-Wd-Right, Align-L-Left-Wd-Left}
      » *L

All of the above constraints are required in order that the output form of a trimoraic word have one of the seven attested tonal patterns, regardless of the input form. I summarize again the motivation for each of the above constraints:

Mora-Tone is required so that every mora has a tone linked to it.

Align-Tone-Left-Mora-Left is required so that no contour tones occur in output forms.

GOCP-H(morpheme) is required so that the pattern HLH does not surface.

Max-Path-Tone and Max-Tone are required so that there can be some connection between input tonal patterns and output tonal patterns. If these kinds of constraints were not highly ranked, we would expect to get only default tonal patterns that are determined solely by alignment and markedness constraints and not by the nature of the input. The fact that we see seven different tonal patterns surfacing in this dialect can only be explained if the output patterns had some correlation with input patterns. If alignment and/or markedness were the only factor(s) determining surface forms, we should expect to see only one possible surface form.

Constraints such as *H and *L that determine a default tone are required so that the grammar can determine the output for an input that has no tonal features.
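The effect of GOCP-H(morpheme) on fully specified trimoraic patterns can be sketched as follows. This is a simplified illustration, assuming (as the attested pattern HHH suggests) that a run of adjacent H moras shares a single multiply linked H tone; the string encoding and function names are hypothetical.

```python
from itertools import groupby, product


def h_tone_count(surface):
    """Count H tones, assuming a run of adjacent H moras is one multiply linked H."""
    return sum(1 for tone, _ in groupby(surface) if tone == "H")


def violates_gocp_h(surface):
    """GOCP-H(morpheme): violated when two H tones co-occur, adjacent or not."""
    return h_tone_count(surface) > 1


# All eight ways of assigning H or L to each of three moras.
patterns = ["".join(combo) for combo in product("HL", repeat=3)]
allowed = [p for p in patterns if not violates_gocp_h(p)]
banned = [p for p in patterns if violates_gocp_h(p)]
print(allowed)
print(banned)
```

Under these assumptions the only pattern excluded is the one in which a L tone separates two distinct H tones, leaving the seven patterns listed in (164).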
Relative ranking among alignment constraints is required so that the grammar can determine which tone will surface on a mora that is unlinked to a tone in the input when both H and L tones occur in the input.

8.3.1.3 Derivation of outputs for 11th century Kyoto Japanese

The hierarchy in (180) will derive the following outputs for the inputs in (165). [The input diagrams and the bodies of the tableaux are not recoverable from the extracted text; each entry below gives the input number, the derived output and the accompanying comment.]

(181)
1. LLL (*H, Mora-Tone derive a L linked to all morae.)
2. HHH (*H is violated by any candidate that respects Max-Path-Tone and *H. Align-H-Right will spread H to all moras.)
3. HLL (See tableau.) [Tableau for input 3; columns: GOCP-H(morpheme), Max-Path-Tone, Max-Tone, *H, Alignment Constraints, *L.]
4. HHL if Align-H-Rt » Align-L-Left; HLL if Align-L-Left » Align-H-Right. [Tableau for input 4; same columns.]
5. HLL (Max-Path-Tone will disallow any change of tone association.)
6. HHH (Alignment constraints will spread H both ways.)
7. LHH (Align-H-R-Wd-R will spread H right.)
8. HHL (Align-H-Left-Wd-Left will spread H left.)
9. LHL (Max-Path-Tone will disallow any change of tone association.)
10. HHH (Align-H-Left-Wd-Left will spread H left.)
11. LLH if Align-L-Rt » Align-H-Left; LHH if Align-H-Left » Align-L-Right.
12. LLH (Align-L-Left spreads L to leftmost mora.) (GOCP-H bans HLH.)
(184) [Tableau for input 12; columns: GOCP-H(morpheme), Max-Path-Tone, Max-Tone, Align-H-R-Wd-R, Align-H-Left-Wd-Left, *H. The candidate HLH incurs a fatal violation.]
13. LLH (Max-Path-Tone will disallow any change of tone association.)
14. LLL (No H to align; Align-L-Rt spreads L.)
15. LLL (No H to align; Align-L-Rt, Align-L-Left spread L to all morae.)
16. LLL (No H to align; Align-L-Left spreads L to all morae.)
17. LLL (No H to align; Align-L-Left spreads L to all morae.)
18. LLL (No H to align; Align-L-Left spreads L to all morae.)
19. LLL (Max-Path-Tone will disallow any change of tone association.)
20. HHH (Align-H-R-Wd-R spreads H.)
21. HHL (Max-Path-Tone will disallow any change of tone association.)
22. HHH (Align-H-Left-Wd-Left spreads H.)
23-25. [The pairing of numbers and comments is not recoverable for these three inputs. The three entries are: LHH (Max-Path-Tone will disallow any change of tone association); HHH (Max-Path-Tone will disallow any change of tone association); HHH (See tableau (185) below.)]
(185) [Tableau; columns: GOCP-H(morpheme), Max-Path-Tone, Max-Tone, *H, Alignment Constraints, *L.]
26. LLH or HLL.

In #26, the constraint GOCP-H prohibits a HLH pattern from surfacing. The following tableau shows how the constraints in (180) will derive the output LLH or HLL in #26:

(186) [Tableau for input 26 (H L H); columns: GOCP-H(morpheme), Max-Path-Tone, Max-Tone, *H, Alignment constraints, *L. The faithful candidate HLH incurs a fatal violation of GOCP-H.]

Optimal output: LLH or HLL depending on the ranking among the four alignment constraints.

27.
LLL (No H to align; Align-L-Left or Align-L-Right spreads L to the second mora.)

The above constraints have been proposed only for the purpose of being able to derive all and only the seven attested tonal patterns from any possible tonal input. The chart below summarizes for how many of the 27 possible inputs each output pattern is the optimal output.

(187) Correlation between input forms and output forms for trimoraic nouns in 11th century Kyoto dialect

Output accent pattern             HHH   HHL         HLL         LHH         LHL   LLH          LLL   TOTAL
Number of input forms             7     2 or 3      2 or 3      2 or 3      1     3 or 4       8     27
Percentage of total input forms   26%   7% or 11%   7% or 11%   7% or 11%   4%    11% or 15%   30%   100%
Percentage rate of occurrence
in database of trimoraic words    24%   9%          4%          13%         10%   12%          28%   100%

Listed in the bottom row of the above chart is the actual frequency at which each pattern occurs in the database of trimoraic words in 11th century Kyoto dialect. Even though the constraint hierarchy we developed was one that was required simply to account for the kinds of output patterns that do and do not occur, we find that this hierarchy also predicts a frequency of occurrence for each pattern that correlates closely with actually observed frequencies. That is, if each of the 27 possible input types occurs with equal frequency, then each output type must occur with a frequency proportional to the number of input types for which it is optimal. An output type that is optimal for eight different input types will occur twice as frequently, for example, as an output type that is optimal for four different input types. This means that a speaker could have a lexicon of trimoraic nouns that is very close to being "unbiased" (that is, each of the possible input types would occur with roughly the same frequency) even though the frequencies of output forms show the skewed distribution observable in the bottom row of (187).
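The comparison between predicted and observed frequencies can be recomputed with a short sketch. The input counts are taken from chart (187) and the observed word counts from table (164); the variable and function names are hypothetical, and percentages are rounded to the nearest whole number as in the chart.

```python
# Number of the 27 inputs converging on each output, from chart (187).
# Where the chart gives a range such as "2 or 3", both bounds are kept.
INPUTS_PER_OUTPUT = {
    "HHH": (7, 7), "HHL": (2, 3), "HLL": (2, 3), "LHH": (2, 3),
    "LHL": (1, 1), "LLH": (3, 4), "LLL": (8, 8),
}

# Observed word counts for 11th century Kyoto trimoraic nouns, from table (164).
OBSERVED = {"HHH": 68, "HHL": 26, "HLL": 11, "LHH": 36,
            "LHL": 27, "LLH": 35, "LLL": 80}

TOTAL_INPUTS = 27
total_words = sum(OBSERVED.values())


def pct(part, whole):
    """Percentage of `part` in `whole`, rounded to the nearest integer."""
    return round(100 * part / whole)


for pattern, (low, high) in INPUTS_PER_OUTPUT.items():
    if low == high:
        predicted = f"{pct(low, TOTAL_INPUTS)}%"
    else:
        predicted = f"{pct(low, TOTAL_INPUTS)}% or {pct(high, TOTAL_INPUTS)}%"
    observed = f"{pct(OBSERVED[pattern], total_words)}%"
    print(f"{pattern}: predicted {predicted}, observed {observed}")
```

The prediction assumes, as the text does, that every input type is equally frequent in the lexicon, so each output's share is simply its number of converging inputs divided by 27.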
The only ranking of constraints that we did not decide on above was the ranking of the four alignment constraints, since the ranking we choose will have no effect on the possible output types. But as we saw above, for each of two inputs, the ranking will determine which of two possible outputs will be optimal. For example, the input in (188) below could surface as either LLH or LHH depending on the ranking of alignment constraints. If a hierarchy were chosen that derived LHH, we would have 3 inputs surfacing as LHH and 3 inputs as LLH rather than 2 and 4 respectively. A ranking that derived LHH for the input in (188) would result in a frequency of occurrence for both LLH and LHH that more closely matched actual observed frequencies when there is an unbiased distribution of input types.

(188) 11.  μ  μ  μ
           |     |
           L     H

LLH if Align-L-Right » Align-H-Left
LHH if Align-H-Left » Align-L-Right

Notice that the former of the two rankings is in accord with the bias in favour of left-to-right association that we find in autosegmental theory (Goldsmith (1976), Pulleyblank (1986)), along with the marked status of H tone. Deriving an output LLH from the above input would occur with left-to-right association of tones along with the default nature of L tone. In Optimality Theory, this bias towards left-to-right association could be expressed through a tendency for the grammar to rank Align-L-Rt above Align-L-Left in the absence of any counterevidence to this ranking. Such a ranking would result in a convergence of input types that would come closer to allowing an unbiased distribution of input forms.

We shall next examine frequencies of accent patterns in modern Tokyo dialect. We shall construct a possible grammar for that language that derives the possible output types, and examine how closely that grammar will predict observed frequencies of output types from an unbiased distribution of input types.
8.3.2 Deriving the accent pattern frequencies for modern Tokyo Japanese

In our examination of 11th century Kyoto Japanese, we saw that a skewed distribution of output types can be largely derived through the effects of the grammar, without having to posit a similarly skewed distribution of input types. Let us now examine the statistically skewed accent patterns that occur in modern dialects and try to determine whether they too can be explained through the action of phonological constraints on an unbiased set of inputs. I have chosen to examine trimoraic nouns in modern Tokyo dialect in this regard because (a) it has the largest number of words in Martin's database among modern dialects and (b) it shows substantial over- and underrepresentation of accent patterns, as was shown in (149).

8.3.2.1 Review of the accent pattern of modern Tokyo Japanese

Following the analysis of Pierrehumbert & Beckman (1988), pitch accent patterns in modern Tokyo dialect are characterized by the following features.

1. All accentual phrases have an initial L tone.
2. Phrases that are unaccented (represented thus far as LHH) have a phrasal H tone that occurs on the second mora: i.e. after the initial L. This phrasal H does not occur on the first mora, unlike the H of an initially accented word. In an initially accented word, the first mora bears a contour tone consisting of a L H sequence.
3. Accented phrases have a lexical H tone on one of the moras that falls to L immediately afterwards.
4. Tones that lie between any of the initial L, phrasal H, lexical H, and L that follows lexical H are underspecified on the surface, with their F0 values being determined by interpolation.

The downdrift that P & B observe in Tokyo utterances is accounted for by the interpolation of F0 values between phrasal H and the boundary L of the following accentual phrase.
The boundary L tone is completely predictable, occurring in every accentual phrase of Tokyo dialect. It differs in this way from the phrasal H tone, which occurs only in unaccented phrases that lack an accentual H tone. Given these facts, the boundary L is arguably a phonetic effect that belongs to the utterance level rather than part of the word-level, phonological output. Accordingly, I will abstract away here from the completely predictable boundary L tone, and treat it as something that is generated at the utterance level rather than by the word-level phonology. Adopting P & B's proposal of surface underspecification of tones lying between any of the boundary L, the phrasal H, the accentual H and the L that follows an accentual H will give us the following surface representations for each of the four possible tonal patterns for trimoraic nouns in Tokyo dialect:

(189) "UNACCENTED"           μ  μ  μ
                                |
                                H

      "INITIALLY ACCENTED"   μ  μ  μ
                             |  |
                             H  L

      "MEDIALLY ACCENTED"    μ  μ  μ
                                |  |
                                H  L

      "FINALLY ACCENTED"     μ  μ  μ (μ)
                                   |  |
                                   H  L

Recall that a complete set of possible inputs (with respect to linked H and L tones) for trimoraic nouns (given above in (165)) contains twenty-seven possible inputs. Let us first consider, as we did for 11th century Kyoto Japanese, what kind of grammar is required so that only one of the four attested tonal patterns can surface, regardless of the tonal pattern of the input.

From simple observation of these four possible accent types for trimoraic words given in (189), we can make the following generalizations.

1. All surface forms have exactly one H tone.
2. If a L tone appears, there is only one, and it always follows a H.
3. There are no contour tones or tones linked to more than one mora. [80]

The occurrence of only one H and at most one L tone in each accentual phrase is a phenomenon most naturally explained by a version of the OCP.
I again employ the constraint GOCP-H, introduced in (179) in the discussion of 11th century Kyoto dialect, and formulated the same way here.

(190) GOCP-H(μ)(morpheme): "A sequence of H tones linked to moras must not occur in the output."  *H(μ)...H(μ)

(191) GOCP-L(μ)(morpheme): "A sequence of L tones linked to moras must not occur in the output."  *L(μ)...L(μ)

Footnote 80: Recall that we have abstracted away from the boundary L tone, which in our analysis here is not considered to be subject to these constraints.

To account for the fact that no mora can be linked to more than one tonal feature I again employ the following alignment constraint, repeated from (166):

(192) Align(Tone,Left,Mora,Left): "The left edge of every tonal feature is aligned with the left edge of a mora."

Conversely, we also find that one tonal feature cannot be linked to more than one mora in the output.

(193) No multiply-linked-tones: "A tonal feature may not be linked to more than one mora." [81]

The fact that a L tone in the output must immediately follow a H tone is most naturally seen as due to an alignment constraint:

(194) Align-L-Left-H-Right: "The left edge of every L tone in the output is aligned with the right edge of a H tone."

Constraints (190) through (194) are evidently undominated in the grammar of modern Tokyo. Accordingly, in the derivations that are to follow, I will only consider candidates that satisfy all of these constraints, and will not list these constraints in the tableaux.

Footnote 81: This condition can be expressed formally as the conjunction of two constraints.

*Tone: "A tonal feature may not occur in the output." We see evidence of this constraint in the sparseness of occurrence of tonal features in the output in Tokyo dialect, as described by P & B. Tonal patterns are such that only the following tones occur: the accentual H, the L that follows the accentual H, and the phrasal H.
A phrase of three moras that is unaccented will have the second mora linked to the phrasal H and the other moras linked to no tone. Tonal features are only required in Tokyo dialect on moras that have one of these three types of tones. Otherwise, a tone must not occur on that mora.

Align-Mora-Right-Tone-Right: "The right edge of a mora must be aligned with the right edge of a tonal feature."

If these two constraints are conjoined in the domain of the mora, there will be a violation of both conjuncts on a non-final mora of a group of moras that are all linked to the same tonal feature.

Phrasal H only occurs in the unaccented pattern. It always occurs on the second mora, for phrases that have at least two moras.

8.3.2.2 Proposed grammar for modern Tokyo Japanese

Faithfulness and pitch fall

I will begin constructing a grammar for modern Tokyo with a general hypothesis about faithfulness between input and output patterns of H tones linked to moras: when a linked H tone occurs in the input, that linked H tone will also surface in the output. Unless this H occurs on the second mora, it must occur as an accentual H, with a L tone following it. Thus an input such as (195)(a) will surface as in (195)(b).

(195) (a), (b) [Diagrams: (a) an input with a single H linked to one mora; (b) the output, with the H linked to the same mora and followed by a L.]

A faithfulness constraint will ensure that the H surfaces, linked to the same mora in the output as in the input, and an alignment constraint will ensure that a L will follow the H. The following two constraints will make (195)(b) the optimal output for input (195)(a):

(196) Max-Path-H: "For every path of association P_i from a mora μ_i to a high tone H_i in the input there exists a path of association P_i' from mora μ_i' to high tone H_i' in the output such that P_i corresponds to P_i', μ_i corresponds to μ_i' and H_i corresponds to H_i'."

(197) Align-H-Right-L-Left (Pitch-Fall): "The right edge of every H tone in the output is aligned with the left edge of a L tone."
If a L occurs in the input linked to mora μ_i, it can only surface linked to μ_i' if it is preceded by a H tone in the output. This is because of the undominated constraint Align-L-Left-H-Right, which requires that any L tone in the output be preceded by a H. And we cannot have two L tones in the output because of the other undominated constraint GOCP-L. Thus, the location of occurrence of a L in the output is to a great extent controlled by the location of H.

For example, we want input (198)(a) to surface as (198)(b), not (198)(c) or (198)(d), even though both (c) and (d) preserve the underlying location of L and (b) does not:

(198) (a)-(d) [Diagrams: (a) the input, with a linked L and a linked H; (b) a candidate that respects Max-Path-H; (c) and (d) candidates that violate Max-Path-H.]

A faithfulness constraint for a path of association between a L tone and a mora must therefore be lower ranked than Max-Path-H or Pitch-Fall.

Max-Path-L: "For every path of association P_i from a mora μ_i to a L tone L_i in the input there exists a path of association P_i' from mora μ_i' to L tone L_i' in the output such that P_i corresponds to P_i', μ_i corresponds to μ_i' and L_i corresponds to L_i'."

Max-Path-H, Pitch-Fall » Max-Path-L

(199) [Tableau for the input in (198)(a); columns: Max-Path-H, Pitch-Fall, Max-Path-L. Two candidates incur fatal violations; the winning candidate violates only Max-Path-L.]

Because it is possible to have an unaccented pitch-accent pattern that has no pitch fall, the constraint Pitch-Fall cannot be ranked too high; otherwise, unaccented forms would never surface. If, for example, Pitch-Fall were ranked above Max-Path-H, a pitch fall would have to occur for every surface form. Accordingly, let us rank Pitch-Fall below Max-Path-H.
Let us now summarize the constraint hierarchy of the grammar we have developed so far:

GOCP-H(μ)(morpheme), GOCP-L(μ)(morpheme), Align(Tone,Left,Mora,Left), No multiply-linked-tones, Align-L-Left-H-Right
» Max-Path-H
» Pitch-Fall, Max-Path-L

As our grammar stands, it still cannot decide among possible outputs if more than one mora is linked to a H tone in the input. That is, it cannot determine which mora will be the sole mora linked to the H in the output, as required by "No multiply-linked-tones". There are a number of possibilities for how the grammar might deal with having more than one mora linked to a H tone in the input. One possibility is that an alignment constraint will determine that either the leftmost or rightmost mora linked to a H in the input will have a H in the output; that is, the grammar will attempt to satisfy the constraint Max-Path-H as well as possible, and some other constraint will determine which of the candidates that least violate Max-Path-H will be optimal. A second possibility is that when Max-Path-H cannot be satisfied by any viable candidate (which must be the case if more than one mora is linked to a H tone in the input) the grammar will revert to choosing some default output pattern.

It is also not clear yet what kind of surface form our grammar will derive for an input with no H tones. The constraint Max-Path-H will be of no use in determining which mora gets a H tone for this type of input since there is no H in the input. Again, a reasonable assumption would be that this kind of input surfaces with some default accent pattern.

It is also not clear yet just how our current grammar will derive an output with an unaccented pattern, which has no L tone. Our grammar must allow some inputs to surface with no L tone. Moreover, this can only occur if a H tone surfaces on the second mora. The grammar must derive an unaccented pattern from certain inputs, but not from all of them, since there are other possible surface patterns.
The question is, what kinds of inputs might derive an unaccented pattern? There is no type of input that has a H tone in a location that only occurs for an unaccented pattern. This is because both medial accent (μHL) and an unaccented pattern (μHμ) have a H on the second mora. [82] Given these facts, a reasonable hypothesis is that an unaccented pattern is a default accent pattern that surfaces from at least one of the following types of inputs: (a) no H tone in the input and (b) more than one path in the input from a H tone or tones to a mora. For both of these types of inputs we have posited that a default accent pattern emerges as the output. It is not clear whether an unaccented pattern might be the only possible type of default accent pattern, but given the fact that it is not unique in terms of the locus of its H tone, it is a likely candidate for a default accent pattern.

Footnote 82: I will use the symbol μ here to represent a mora with no tone linked to it.

Let us summarize the kind of grammar we have envisioned so far:

(200) If a single H linked to a single mora occurs in the input, it surfaces as an accentual H on that mora. If multiple moras are linked to a H tone in the input, either a default accent pattern emerges, or else an alignment constraint will determine which mora retains its H tone. If no H tones occur in the input, a default accent pattern surfaces.

Let us explore the possibility of deriving an unaccented pattern as a type of default accent pattern for trimoraic phrases. Any constraint in the grammar that is to derive a default H tone on the second mora must be ranked below Max-Path-H, so that when there is an underlying H tone in the input, it will not be forced to surface on the second mora. One possible way of deriving the phrasal H of the unaccented pattern on the second mora is through the idea of left-edge invisibility of moras for Foot construction.
That is, the unaccented pattern occurs when a Foot is constructed at the left edge of the phrase, with the first mora being invisible, or extrametrical. If this Foot is bimoraic, a hypothesis for which we have plenty of evidence elsewhere in Japanese (see page 70), this second-mora position of a phrasal H will be on the leftmost mora of the Foot. In other words, the Foot is trochaic in that the H tone occurs on its leftmost mora. The following constraints express the conditions proposed above.

Left-Extrametricality: "The leftmost mora of an accentual phrase must not be a member of a Foot."

Foot=Binary: "A Foot must consist of two moras."

Foot=Trochaic: "The left edge of a Foot must be aligned with the left edge of a H tone."

The idea of extrametricality here is that the leftmost mora cannot be parsed to a Foot. It is still possible, however, for a tone to link to an extrametrical mora. In the tableaux I will show some candidates that have tones so linked. Given the pervasiveness of foot-binarity in Japanese (see page 70), I will rank Foot=Binary above Foot=Trochaic. We shall also see that this ranking will allow a final-accented pattern to occur with bimoraic nouns.

If the grammar prefers to place a H tone on the leftmost mora of a (trochaic) Foot, with the leftmost mora of the phrase being extrametrical, then it will prevent a H tone from occurring on the first mora. Since we do find an accent pattern that has an initial H tone, we only want our grammar to enforce left-edge extrametricality in a situation where the unaccented pattern emerges. That is, we only want to force extrametricality for inputs that have no H tone, since this is where we have proposed that an unaccented pattern will occur in the output.
This kind of input with no H tone is distinguished by the fact that if a H tone occurs in the output for this type of input, the output will violate the following faithfulness constraint:

Dep-H: "For every H tone in the output, there is a corresponding H tone in the input."

In effect, our proposed grammar is dealing with an either/or situation. Either we derive an accentual H from an underlying H tone, or we derive a default accent pattern with a phrasal H on the second mora and no L tone. It is this kind of tendency, to revert to a default pattern if a constraint cannot be satisfied, that can be expressed through local conjunction of constraints.

In developing a possible grammar of Tokyo Japanese, I shall propose that this grammar is mediating between opposing forces, each of which tends to pull the output in a particular direction. One force tends to pull the choice of optimal output towards the unaccented pattern, which has two characteristics: (a) the presence of a (phrasal) H on the second mora and (b) the lack of a L tone in the output. An opposing force pushes the output towards an accented pattern, in which a L occurs after H. These two forces interact with faithfulness constraints that seek to have correspondence between the input and output in terms of H and L tones and the moras they are linked to. In general, we will find that conflicts in the grammar between these opposing forces tend to be expressed as "either/or" situations. The pattern is something like the following: "the output form must conform to force A, but if it cannot, then it must conform to force B."

This kind of "either/or" situation can be expressed through local conjunction of constraints in some domain. When two constraints, $C_1$ and $C_2$, are conjoined in domain D, the resulting conjoined constraint is violated for candidate X iff both $C_1$ and $C_2$ are violated by X in D.

Let us examine how the conjoined constraint $C_1$ & $C_2$ expresses an "either/or" situation.
The conjoined constraint is violated iff both conjuncts are violated. Either $C_1$ or $C_2$ (or both) must be satisfied. Thus, the constraint tells us to obey $C_1$ OR $C_2$. If $C_1$ outranks $C_2$, the grammar is saying "Obey $C_1$. If you can't, then, as an alternative, you must obey $C_2$."

Our grammar wants the output form to correspond to the input form in that if a H tone occurs in the output, there will be a corresponding H in the input. But in cases where that cannot be achieved, the grammar forces an unaccented pattern that has H on the second mora. In other words, the grammar says: "Respect Dep-H, but if you cannot, make the first mora extrametrical, thus forcing the H to occur on the first mora of a trochaic Foot that does not include the first mora of the phrase."

<μ> (μ μ)
     |
     H

The following conjoined constraint expresses that condition:

Dep-H & Left-Extrametricality: "For every H tone in the output, there is a corresponding H tone in the input AND the leftmost mora of a phrase must not be dominated by a Foot."

When there is a H tone in the input, the first conjunct of this constraint will be respected and Left-Extrametricality will not be forced. But when there is no H in the input, any output with a H tone will have an extrametrical mora at the left edge. High ranking of the constraint Foot=Trochaic will force this H to occur on the second mora of the phrase.

A second feature of the unaccented pattern is that there is no L tone. To achieve this, the grammar must ban a L tone from the output, but only in cases where an unaccented pattern occurs. Recall that we are proposing that the phrasal H of an unaccented pattern occurs at the left edge of a trochaic Foot whose left edge is separated from the left edge of the phrase by one extrametrical mora, as shown below:

<μ> (μ μ)
     |
     H

Because left-extrametricality has been enforced, this Foot is misaligned with the left edge of the phrase.
That is, this pattern violates the following alignment constraint:

Align-Phrase-Left-Foot-Left: "The left edge of an accentual phrase must be aligned with the left edge of a Foot." (ALIGN-LEFT)

In order to derive an unaccented pattern, we want the grammar to ban a L tone from the output when Align-Left has been violated. Again, this condition can be expressed through local conjunction in the following way:

Align-Left & *L: "The left edge of an accentual phrase must be aligned with the left edge of a Foot AND a L tone must not occur in the output."

Because a L tone is banned when there is a non-left-aligned Foot, the grammar must force feet to be left-aligned when there is no extrametrical mora. The simplex constraint Align-Phrase-Left-Foot-Left accomplishes this.

The constraint Dep-H & Extrametrical will contribute towards deriving an unaccented pattern when there is no H tone in the input. But it will not force the occurrence of a H tone in the output. It will only require left-extrametricality when a H tone does occur. The grammar cannot ensure that a H tone always occurs in the output simply by high ranking of the constraint Trochaic. Trochaic Feet will not occur when an accentual H occurs on the third mora. That constraint must also be violated when there is an accentual H on the second mora, since the first mora cannot be extrametrical in this case, lest the constraint Align-L & *L incorrectly ban a L tone. The grammar therefore requires a constraint to account for the fact that a H tone always occurs in the output. I propose an undominated constraint that is based on the idea that the accentual phrase has a head mora that is on a path to a H tone.

Phrase-Headedness: "Every accentual phrase must dominate a mora that is on a path with a H tone."

Because our account of pitch accent is based on footedness, the grammar also requires an undominated constraint that requires every accentual phrase to dominate a foot.
This is a standard assumption, under the idea of proper headedness. In addition, I assume that a tonal feature must be on a path to prosodic structure in the output. Therefore, a tonal feature cannot occur on an extrametrical mora. Let us now summarize the constraints we have proposed for Tokyo pitch accent.

Undominated:
Phrase-Headedness: "Every accentual phrase must dominate a mora that is on a path with a H tone."
Foot=Binary

Max-Path-H
Dep-H & Left-Extrametricality
Align-Foot-Left-Phrase-Left
Foot=Trochaic
Align-Left & *L
Pitch-Fall

Footnote 83: We might consider formulating the Phrase-Headedness constraint as a condition that a Foot must dominate a H tone and an accentual phrase must dominate a Foot. If we were to take this approach, however, we would have to account for the fact that in words with final accent, with a H tone on the third mora, such a Foot could not be left-aligned with the phrase. [Diagram: an accentual phrase whose Foot dominates the H-bearing third mora.] This Foot would violate Align-Foot-Left and thus ought to incorrectly ban a L tone from the output through the effect of Align-Left & *L. A possible way around this problem would be to have the constraint Align-Left & *L apply in the domain of the morpheme. Since a L tone that follows an accentual H on the third mora must occur on an enclitic for a trimoraic word, its presence would not violate {Align-Left & *L}(morpheme).

Let us first consider a trimoraic input with no H tone. Assuming that the constraint Phrase-Headedness is undominated, the tableau below will only show candidates that have a H tone. Because Foot=Binary is highly ranked, and because no constraint that dominates it might force non-binary Feet, throughout the tableaux I will consider only candidates with binary Feet unless extrametricality makes binary Feet impossible to achieve. The constraint Max-Path-H will not be relevant for this input, which has no H tone.
To rule out a candidate with a right-extrametrical Foot, I propose a constraint that requires that moras be dominated by prosodic structure.

Parse-Mora: "Every mora in the output must be dominated by an accentual phrase."

(201) [Tableau: input /μ μ μ/ with no H tone. Columns: Dep-H & Left-Extrametrical, Align-Foot-Left-Phrase-Left, Foot=Trochaic, Parse-Mora-to-Foot, Pitch-Fall, Align-Left & *L. Candidate rows and violation marks not recoverable.]

Dep-H & Left-Extrametrical will rule out all candidates except those with a left-extrametrical mora. The remaining candidates all equally violate Align-Left. The constraint Foot=Trochaic rules out candidates with H on the third mora. Align-L & *L will rule out a candidate with a H on the second mora and a following L. This leaves as optimal an unaccented pattern, with H on the second mora and no L.

Let us now consider how the above grammar will derive an output for inputs that have a single H tone linked to a single mora. In order to account for the existence of accented patterns, this kind of input should respect Max-Path-H and derive an output with an accentual H on the same mora on which it occurs in the input. In the tableaux below, I assume high ranking of Max-Path-H and, for ease of exposition, will only show candidates that respect this constraint.

(202) [Tableau: input /μ μ μ/ with H linked to the first mora. Columns: Dep-H & Left-Extrametrical, Align-Foot-Left-Phrase-Left, Foot=Trochaic, Align-Left & *L, Pitch-Fall, Parse-Mora. Candidate rows and violation marks not recoverable.]

When the H occurs on the first mora, Max-Path-H will preserve it on that mora in the output and Pitch-Fall will require that a L tone occur after the H. Dep-H is not violated, since there is a H in the input. The Foot must be aligned left, since a left-extrametrical mora cannot bear the H tone. A H on this mora will satisfy Trochaic. Align-L & *L is respected because the Foot must be left-aligned.

(203) [Tableau: input /μ μ μ/ with H linked to the second mora. Columns: Dep-H & Left-Extrametrical, Align-Foot-Left-Phrase-Left, Foot=Trochaic, Align-Left & *L, Pitch-Fall, Parse-Mora. Candidate rows and violation marks not recoverable.]

When the H occurs on the second mora, an extrametrical mora on the left is possible, but not required, since Dep-H is satisfied. Align-Left will rule out candidates with a left-extrametrical mora. All remaining candidates will equally violate Foot=Trochaic, since the H will occur at the right edge of the Foot. Pitch-Fall will require that a L occur after the H. Again, we get an accented pattern.

(204) [Tableau: input /μ μ μ/ with H linked to the third mora. Columns: Dep-H & Left-Extrametrical, Align-Foot-Left-Phrase-Left, Foot=Trochaic, Align-Left & *L, Pitch-Fall, Parse-Mora. Candidate rows and violation marks not recoverable.]

When the H occurs on the third mora, the story will be similar. The optimal candidate has a left-aligned Foot; Trochaic will inevitably be violated, and a L after the H will respect Pitch-Fall. In summary, whenever a singly-linked H occurs in the input, the output will have an accented pattern with the H on the same mora as in the input.

The second possible situation in which we may want the grammar to derive an unaccented pattern as a default accent pattern is when multiple moras are linked to H tones in the input.
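The derivations described for tableaux (201) to (204) can be replayed mechanically. The following is only a minimal sketch under stated assumptions, not the thesis's own formalization: the candidate set is restricted to two footings (a left-aligned Foot or a left-extrametrical mora), every constraint is treated as binary, tones on an extrametrical mora are excluded, Max-Path-H is ranked at the top as assumed in the text, and a L following a third-mora H (which the text places on an enclitic) is recorded simply as "L present". The names `LEFT`, `EXTRA`, `candidates`, `violations` and `optimal` are hypothetical, and `m` stands in for a mora.

```python
from itertools import product

LEFT, EXTRA = "(mm)m", "<m>(mm)"  # left-aligned Foot vs. left-extrametrical mora


def candidates():
    """Yield (footing, position of H, L present?) for a trimoraic phrase."""
    for footing, h, has_l in product((LEFT, EXTRA), (1, 2, 3), (False, True)):
        if footing == EXTRA and h == 1:
            continue  # assumption: no tone may sit on an extrametrical mora
        yield footing, h, has_l


def violations(cand, input_h):
    """Violation profile, highest-ranked constraint first.

    input_h is the mora (1-3) bearing the input H, or None if there is no H.
    """
    footing, h, has_l = cand
    foot_left = 1 if footing == LEFT else 2      # leftmost mora of the Foot
    max_path_h = input_h is not None and h != input_h
    dep_h = input_h is None                      # output H lacks an input H
    left_extra = footing == LEFT                 # first mora is footed
    align_left = footing == EXTRA                # Foot not at the phrase edge
    trochaic = h != foot_left                    # H not on Foot-initial mora
    return (
        int(max_path_h),
        int(dep_h and left_extra),               # Dep-H & Left-Extrametricality
        int(align_left),
        int(trochaic),
        int(align_left and has_l),               # Align-Left & *L
        int(not has_l),                          # Pitch-Fall
    )


def optimal(input_h):
    # Tuples compare lexicographically, which models strict domination
    # when every constraint assigns at most one mark.
    return min(candidates(), key=lambda c: violations(c, input_h))


for inp in (None, 1, 2, 3):
    print(inp, optimal(inp))
```

Under these assumptions the input with no H selects the left-extrametrical footing with H on the second mora and no L, while each singly-linked H keeps its mora, takes a left-aligned Foot and is followed by a L, which is the outcome the prose describes.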
If we were to derive an unaccented pattern for such multiply-linked inputs, we want the grammar to say: "respect Max-Path-H, but if you can't, revert to a default pattern." We can again express this either-or condition through local conjunction. Here, $C_1$, the first conjunct of the relevant constraint, is Max-Path-H, and $C_2$ is Left-Extrametricality, which derives the default unaccented pattern.

Max-Path-H & Left-Extrametricality: "For every path of association $P_i$ from a mora $\mu_i$ to a high tone $H_i$ in the input there exists a path of association $P_i'$ from mora $\mu_i'$ to high tone $H_i'$ in the output such that $P_i$ corresponds to $P_i'$, $\mu_i$ corresponds to $\mu_i'$ and $H_i$ corresponds to $H_i'$ AND the leftmost mora of the accentual phrase is not parsed by higher prosodic structure."

We also need a second constraint in this case to ensure that the H tone ends up on the second mora. In the case of an input where there was no H tone, the constraint Max-Path-H would have no effect in determining the position of a default H tone. But in this case, the constraint Max-Path-H would affect the placement of a H for inputs such as the following:

HμH (a H tone linked to each of the first and third moras)

For this input, a candidate with a H on the first or third mora would better satisfy Max-Path-H than a candidate with a H on the second mora. This would incorrectly derive an unattested pattern with H on the first or third mora and no L tone, since the constraint Align-L & *L would ban a L tone from the output. As a result, to derive an unaccented pattern here, the grammar also requires that we have a trochaic Foot. The following conjoined constraint accomplishes this:

Max-Path-H & Foot=Trochaic: "For every path of association $P_i$ from a mora $\mu_i$ to a high tone $H_i$ in the input there exists a path of association $P_i'$ from mora $\mu_i'$ to high tone $H_i'$ in the output such that $P_i$ corresponds to $P_i'$, $\mu_i$ corresponds to $\mu_i'$ and $H_i$ corresponds to $H_i'$ AND the left edge of a Foot must be aligned with the left edge of a H tone."
In order to ensure that the H tone surfaces on the second mora, Max-Path-H & Trochaic must be ranked above Max-Path-H. In addition, we shall see when we examine bimoraic nouns that this type of input must surface with an unaccented pattern, lest the unaccented pattern not occur at all for bimoraic nouns. To ensure this, the grammar must allow the leftmost mora to be extrametrical for bimoraic nouns in this case, violating Foot binarity. Accordingly, I propose that Max-Path-H & Extrametrical be ranked above Foot=Binary.

We do not necessarily need an unaccented pattern to emerge from both of the following two types of input: (a) no H tone in the input and (b) multiple moras linked to H tones in the input. But we need at least one type of input to surface as an unaccented pattern. It is possible that there is more than one possible default accent type and that different types of default patterns will surface under different conditions. One possible grammar is one in which both a "no H tone input" and a "multiple H tone input" will surface as unaccented, but this is not the only possible grammar.

For the time being, let us look at what the constraint hierarchy would look like for a grammar that derives an unaccented pattern from both a "no H tone input" and a "multiple H tone input." Since H linked to multiple moras cannot occur in the output, the constraint Max-Path-H must be violated by any viable candidate. This in effect means that with high ranking of Max-Path-H & Extrametrical and Max-Path-H & Trochaic, this kind of input must surface with an extrametrical mora at the left edge and a trochaic Foot:

<μ> (μ μ)
     |
     H

The lower-ranked simplex constraint Max-Path-H will have no effect, since the position of the H tone is determined by Max-Path-H & Trochaic. The constraint Dep-H & Left-Extrametrical will have no effect, since there is a H in the input.
All remaining candidates must violate Align-Left by one mora, since the extrametrical mora prevents left-alignment. The constraint Align-Left & *L will ban a L tone from the output. Thus the winning candidate will have an unaccented pattern.

(205) [Tableau: inputs /μ μ μ/ with H linked to multiple moras. Columns: Max-Path-H & Left-Extrametrical, Max-Path-H & Trochaic, Foot=Binary, Max-Path-H, Dep-H & Left-Extrametrical, Align-L, Foot=Trochaic, Align-Left & *L, Pitch-Fall. Candidate rows and violation marks not recoverable; the Max-Path-H column shows "* or **" for every candidate.]

Let us now examine, as we did for 11th century Kyoto Japanese, which of the 27 possible input patterns will derive which output patterns.

The following 8 input patterns with no underlying H tone will all surface with an unaccented pattern:
μμμ μμL μLμ μLL Lμμ LμL LLμ LLL

The following 4 input patterns with a single H on the first mora will surface with a HLμ pattern:
Hμμ HμL HLμ HLL

The following 4 input patterns with a single H on the second mora will surface with a μHL pattern:
μHμ μHL LHμ LHL

The following 4 input patterns with a single H on the third mora will surface with a μμH(L) pattern:
μμH μLH LμH LLH

The following 7 input patterns with multiple moras linked to H tones will surface with an unaccented pattern:
HHL HHμ HμH HLH μHH LHH HHH

Table (206) summarizes the number of inputs that derive each of the four possible output patterns. Since data on Tokyo pitch accent indicate that the presence of a voiced obstruent can have a strong effect on frequencies of some accent types, I consider here frequencies for nouns with no voiced obstruent present in the word.
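The classification of the 27 trimoraic inputs listed above can be checked by brute-force enumeration. This is a minimal sketch that encodes only the input-to-output mapping stated in the prose (no H or several H tones give the unaccented pattern; a single H keeps its mora); it does not evaluate the constraint hierarchy itself. The function name `classify` and the use of `m` for a toneless mora are illustrative choices.

```python
from collections import Counter
from itertools import product


def classify(pattern):
    """Map a three-symbol input (m = toneless mora, H, L) to an accent pattern."""
    h_positions = [i for i, tone in enumerate(pattern, start=1) if tone == "H"]
    if len(h_positions) == 1:
        # A singly-linked H surfaces as an accentual H on the same mora.
        return {1: "initial", 2: "medial", 3: "final"}[h_positions[0]]
    # No H tone, or several moras linked to H: default unaccented pattern.
    return "unaccented"


inputs = ["".join(p) for p in product("mHL", repeat=3)]
counts = Counter(classify(p) for p in inputs)

for pattern, n in counts.items():
    print(f"{pattern:10s} {n:2d}  {100 * n / len(inputs):.1f}%")
```

The enumeration gives 15 unaccented inputs (8 with no H plus 7 with multiple H) and 4 inputs each for initial, medial and final accent, out of 27.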
(206) Convergence of input types on output accent patterns for trimoraic nouns in Tokyo dialect

Accent pattern | Inputs that derive this pattern | Number of inputs that derive this pattern | Percentage of inputs that derive this pattern | Percentage occurrence in Tokyo dialect (no voiced obstruent present)
Unaccented | No H (8): μμμ, μμL, μLμ, μLL, Lμμ, LμL, LLμ, LLL; Multiple H (7): HHμ, HHL, HμH, HLH, μHH, LHH, HHH | 15 | 55% | 62%
Initial accent | Single H on μ1 (4): Hμμ, HμL, HLμ, HLL | 4 | 15% | 15%
Medial accent | H on middle mora (4): μHμ, μHL, LHμ, LHL | 4 | 15% | 7%
Final accent | Single H on μ3 (4): μμH, μLH, LμH, LLH | 4 | 15% | 15%

We find that apart from the fact that medial accent occurs in attested forms with only half the frequency that this grammar would predict, the frequency of each accent pattern that our grammar would derive from an unbiased set of inputs is fairly close to the actually observed frequencies. In part, the reason for this is that we have chosen a grammar that derives an unaccented pattern as the default pattern that emerges from two types of inputs: (a) inputs with no H tone (8 possible inputs) and (b) inputs with more than one mora linked to a H tone (7 possible inputs), for a total of 15 of the 27 possible inputs. But this is not the only option for a grammar. The grammar could, for example, derive some different default configuration for one of these types of inputs. We do, however, need to derive an unaccented pattern from at least some type of input in some natural way. It could be possible to have a grammar that derives an unaccented pattern from one of the two input types and has the other type surface with an accented pattern that is determined by, say, an alignment constraint. Suppose that it were instead some alignment constraint that decided the optimal output.
For example, consider the following input, with a possible L tone on the first mora:

(L)μ μ μ, with a single H linked to both the second and third moras

Suppose that the following alignment constraint determined the optimal outputs:

Align-H-Left-Phrase-Left: "The left edge of every H tone in the output must be aligned with the left edge of an accentual phrase."

If this alignment constraint were ranked just below Max-Path-H, then the optimal output would have a H linked to the second mora.

(207) [Tableau: input /μ μ μ/ with H linked to the second and third moras, with or without L on the first mora. Columns: Max-Path-H, Align-H-Left, Pitch-Fall, Default-H. Candidate rows and violation marks not recoverable.]

This grammar would give us even more inputs that derive medial accent: in addition to the 4 with the second mora as the only mora linked to a H tone, we would also have the two inputs in (207) deriving a medial accent pattern.

Given the fact that there are a number of different possible grammars that could derive all four of the attested accent patterns for trimoraic words, how does a speaker choose the correct grammar? As was the case for 11th century Kyoto Japanese, I again propose that when a speaker has a choice among possible grammars for deriving output types from a possible set of inputs, they will move towards a grammar that allows the speaker to have an unbiased set of input types. For example, when faced with the several possible grammars we have proposed for deriving surface pitch accent patterns in trimoraic nouns in Tokyo dialect, the speaker will choose a grammar that enables them to have a more even distribution of the 27 possible input types. If a speaker has the capability of doing this, it will mean that they can optimize their lexicon with respect to statistical frequencies of patterns. It will allow them to have a lexicon in which input types are less statistically skewed than they would otherwise be, and it will be just the output types that are statistically skewed.
The mismatch in statistical frequency between input types and output types will be due to the effects of the grammar: a hierarchy of phonological constraints that derives output forms from input forms. This grammar will act in such a way that more than one input form may derive the same output form. This will have the result that some output forms will occur more frequently than others, even if every input form occurs with the same frequency.

We have posited a complete set of mathematically possible input types for trimoraic nouns and made the hypothesis that a speaker can move towards choosing a grammar in which each of these input types can occur with roughly equal frequency. Such a grammar will derive optimal output types from each of the input types in such a way that some output types occur for more than one input type. Each output type $S_i$ will occur for some number $F_i$ of input types. If there are $q$ input types, then the $F_i$ input types that derive output pattern $S_i$ will make up $F_i / q \times 100$ percent of input types. This ratio corresponds closely to the actual observed frequency of $S_i$ for each output type. In other words, it is solely the effects of the grammar that cause the skewed frequencies of output types, not any skewing of frequencies of input types.

The grammar that we have proposed for deriving surface pitch-accent patterns of trimoraic nouns is one that makes an unaccented pattern a default pattern, which surfaces for a large number of input types. Recall that when we examined the correlation between pitch-accent types in 11th century Kyoto Japanese and those of modern Tokyo Japanese (see tables (161), (162)), we found that most 11th century patterns evolved into an unaccented pattern in modern Tokyo at a substantial rate.
This suggested that the reason why so many nouns converged on an unaccented pattern in modern Tokyo dialect was that the grammar evolved to one that preferred an unaccented pattern with a single H tone on the second mora and no L tones. Such a grammar would bias a speaker towards reanalysing an 11th century pattern with that μHμ pattern at a time when accent patterns were changing. This kind of historical evidence for the development of a grammar that favoured an unaccented pattern is further support for the hypothesis that modern Tokyo speech has developed a grammar that derives an unaccented pattern from a number of possible inputs, rather than simply having a preponderance of unaccented forms as input forms for some mysterious reason.

We have examined which possible grammars can derive the attested output accent patterns of trimoraic nouns. Let us next examine how the same type of grammar would determine surface accent patterns for bimoraic nouns.

8.3.2.3 Deriving accent patterns for bimoraic nouns in Tokyo dialect

The following tables summarize the frequency of each of the three possible accent patterns for bimoraic nouns in modern Tokyo dialect. The data on which these statistics are based are taken from the same database of Yamato nouns in Martin (1987).
(208) Frequencies of accent patterns for bimoraic nouns in Tokyo dialect

TOKYO: BIMORAIC WORDS | total | unaccented | initial accent | final (LHH(L)) accent
no voiced obstruent | 462 | 119 (26%) | 183 (40%) | 160 (34%)
voiced obstruent | 145 | 45 (31%) | 78 (53%) | 22 (15%)
TOTAL | 607 | 164 (27%) | 261 (43%) | 182 (30%)

(209) Frequencies of accent patterns for bimoraic nouns in Tokyo dialect: borderline compounds excluded

TOKYO: BIMORAIC WORDS: NO BORDERLINE COMPOUNDS | total | unaccented | initial accent | final (LHH(L)) accent
no voiced obstruent | 372 | 93 (25%) | 147 (40%) | 132 (35%)
voiced obstruent | 101 | 35 (35%) | 51 (51%) | 15 (15%)
TOTAL | 473 | 128 (27%) | 198 (42%) | 147 (31%)

(210) Frequencies of accent patterns for bimoraic nouns in Tokyo dialect: borderline compounds and words with uncertain etymologies excluded

TOKYO: BIMORAIC WORDS: further exclude "etymology uncertain" | total | unaccented | initial accent | final (LHH(L)) accent
no voiced obstruent | 320 | 78 (24%) | 127 (40%) | 115 (36%)
voiced obstruent | 81 | 29 (36%) | 40 (50%) | 12 (15%)
TOTAL | 401 | 107 (27%) | 167 (42%) | 127 (32%)

The three tables above show the frequency of each of the three possible pitch-accent patterns for bimoraic nouns in modern Tokyo dialect. The three tables differ in terms of the criteria for admitting a noun from Martin's (1987) database of Yamato nouns to the corpus of nouns being considered here. In the second of the three tables, we exclude words that existed as compounds at an earlier stage of the language according to the etymology given by Martin, and in the third, we further exclude nouns whose etymology is uncertain. We find that the initially accented pattern is slightly overrepresented at about 42% occurrence, with the unaccented pattern and the final (LHH(L)) accent pattern each occurring at a little less than a statistically normal 33%.

Let us examine how the grammar we have developed for deriving possible surface pitch accent patterns will determine output patterns for bimoraic phrases. The following are the 9 mathematically possible inputs for bimoraic nouns that allow single tones linked to moras.
The following are the 9 mathematically possible inputs for bimoraic nouns that allow single tones linked to moras.  208 (211) Ll  U  L  Ll  Li  L p  Li  H U  Li  H M- ^ \/ L Ll  Ll  \/ H Ll  Li  L  H  H  L  (212) Let us now examine how each type of input will fare under the kind of grammar we have proposed.  209 When there is a singly-linked H tone, the derivation will be very similar to what it was for this kind of input with a trimoraic noun. Here, as we see in (213) and (214), we will have a bimoraic Foot, and because of Max-Path-H constraints, we will have a H tone linked to the same mora as it was in the input. The constraint Pitch-fall will require a L tone after the accentual H tone. (213) input:  H /p p/  MaxPthH& LExtra  MaxPth-H & Troch  Foot =Bi n  MaxP-H  DepH& LeftExt  Alig -L  Foot Troc h  AlignLft& *L  *!  H (n  Pitch Fall  u  )  1®=  HL (n n )  *  *  *  *!  H (n n)  H L (H H) H <n>  *!  * •  lIlfililB'  *!  •  00  H L <p> (p)#p  *!  H L <p> (p#p) H <H> 00  *!  *  *  *  *  * -  210 (214) input:  H  MaxPth-H &LExtra  MaxPth-H & Troch  Foot =Bin  MaxP-H  /p p/ H (u u)  *!  HL (u u)  *!  *  DepH& LeftExt  Alig n-L  Foot Troc h  ISBjIil  AlignLft& *L  Pitch Fall  -  *  H  *!  * H L (u u) H <u> (u)  *!  H L <p> (p)#p  *!  *  ytiillBBBlli  *  * *!  H L <u> (u#u) H <u> (u)  IllSlili  *!  *  *  *  *  When there is no H tone in the input, however, the story will be different from what it was for trimoraic nouns. This is because of high ranking of the constraint that requires Foot binarity, which we have proposed because of evidence elsewhere in the language of a strong tendency towards having bimoraic Feet. If Foot=Binary outranks Dep-H & Extrametrical, then an extrametrical mora will be disallowed for inputs with no H tone. This is because to have an extrametrical mora for a bimoraic noun would make Foot binarity impossible. 
Therefore we must have the following prosodic structure in the output, even though all viable candidates with this structure violate Dep-H & Left-Ext:

(μ μ)

The constraint that requires trochaic Feet will place a H tone on the leftmost mora. The constraint Align-L & *L will be satisfied because the Foot is left-aligned. The constraint Pitch-Fall will require a L after the H. Thus, the grammar will derive an initially accented pattern for this input.

(215) [Tableau: input /μ μ/ with no H tone. Columns: Max-Path-H & Left-Extrametrical, Max-Path-H & Trochaic, Foot=Binary, Max-Path-H, Dep-H & Left-Extrametrical, Align-L, Foot=Trochaic, Align-Left & *L, Pitch-Fall. Candidate rows and violation marks not recoverable.]

For inputs with a multiply-linked H tone, we must derive an unaccented pattern, since no other inputs will derive this pattern, which is attested for bimoraic nouns. This accounts for why we have proposed ranking the two Max-Path-H conjoined constraints above Foot=Binary, which will be violated for bimoraic nouns for which we have posited an extrametrical mora.

(216) [Tableau: input /μ μ/ with a single H linked to both moras. Columns: Max-Path-H & Left-Extrametrical, Max-Path-H & Trochaic, Foot=Binary, Max-Path-H, Dep-H & Left-Extrametrical, Align-L, Foot=Trochaic, Align-Left & *L, Pitch-Fall. Candidate rows and violation marks not recoverable.]

(217) [Tableau: the same input with enclitic ga, /μ μ/#/μ/. Columns: Max-Path-H & Left-Extrametrical, Max-Path-H & Trochaic, Foot=Binary, Max-Path-H, Dep-H & Left-Extrametrical, Align-L, Foot=Trochaic, Align-Left & *L, Pitch-Fall. Candidate rows and violation marks not recoverable.]

Let us now summarize the optimal outputs derived from the 9 possible input types for the modified grammar we have proposed.
(218) [Table of autosegmental diagrams, given here in linear notation following the derivations above:]

INPUT | OPTIMAL OUTPUT
μμ, μL, Lμ, LL (no H tone) | HL (initially accented)
Hμ, HL (single H on the first mora) | HL (initially accented)
μH, LH (single H on the second mora) | μH(L) (finally accented)
HH (H linked to both moras) | μH, no L (unaccented)

The following table summarizes the number of occurrences of each optimal output that occurs in the above tableaux for each of the nine possible inputs:

(219) Convergence of input types on output accent patterns for bimoraic nouns in Tokyo dialect

Output pattern | Number of occurrences | Percentage of total | Percentage frequency of occurrence in database of bimoraic nouns (no voiced obstruent present)
Unaccented | 1 | 11% | 25%
Initially accented | 6 | 66% | 40%
Finally accented | 2 | 22% | 35%
TOTAL | 9 | 100% | 100%

8.3.3 Conclusions

Although the frequency of occurrence for each pattern when derived from an unbiased set of inputs does not exactly match the actual frequency of occurrence among bimoraic nouns, our grammar does correctly predict that the unaccented pattern should be the least common pattern among attested bimoraic nouns and that the initially-accented pattern should be the most common pattern. We saw that among trimoraic nouns the unaccented pattern is by far the most common, whereas for bimoraic nouns the unaccented pattern is the least common. These differences in observed frequencies of accent patterns between bimoraic and trimoraic nouns are accounted for by the fact that trimoraic nouns can have an extrametrical mora without violating Foot binarity, whereas bimoraic nouns cannot.

Under the assumption of Optimality Theory that all input types are possible, we have developed a grammar that will derive attested accent patterns, and only attested patterns, for bimoraic and trimoraic nouns in Tokyo dialect.
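The bimoraic prediction can be checked in the same enumerative spirit. The sketch below encodes only the outcomes stated in the prose for bimoraic nouns (no H gives initial accent, a singly-linked H keeps its mora, H on both moras gives the unaccented pattern) and compares the rank order of the predicted counts with the rounded observed percentages quoted for nouns with no voiced obstruent. The names `classify_bimoraic` and `rank_order` are hypothetical, `m` stands for a toneless mora, and comparing rank order rather than exact values is an illustrative choice, not a measure proposed in the text.

```python
from collections import Counter
from itertools import product


def classify_bimoraic(pattern):
    """Map a two-symbol input (m = toneless mora, H, L) to an accent pattern."""
    n_h = pattern.count("H")
    if n_h == 0:
        return "initial"      # Foot=Binary blocks extrametricality
    if n_h == 2:
        return "unaccented"   # multiply-linked H reverts to the default
    return "initial" if pattern[0] == "H" else "final"


inputs = ["".join(p) for p in product("mHL", repeat=2)]
predicted = Counter(classify_bimoraic(p) for p in inputs)

# Observed percentages as quoted in the text (no voiced obstruent present).
observed = {"unaccented": 25, "initial": 40, "final": 35}


def rank_order(freqs):
    """Return the pattern names sorted from least to most frequent."""
    return sorted(freqs, key=freqs.get)


print(dict(predicted))
print(rank_order(predicted), rank_order(observed))
```

With these assumptions the predicted counts are 1, 6 and 2 out of 9, and both the predicted and the observed frequencies order the patterns as unaccented, final, initial from least to most common.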
We have seen that when we construct a grammar that is intended to derive attested surface patterns and rule out unattested ones, we also find that such a grammar will, to a considerable extent, be able to account for apparent skewing of frequencies of surface types. In constructing a possible grammar for deriving pitch accent patterns of nouns in modern Tokyo dialect, we made some choices that seemed the only logical option: for example, to have inputs with single H tones linked to a single mora surface with an accentual H on the same mora. In other cases, we had possible options to choose from: for example, in determining how a grammar might derive an output form from an input form with multiply-linked H tones. It might be possible, for example, to have a grammar that derived an initially-accented pattern rather than an unaccented pattern for such an input type of trimoraic nouns, but such a result would incorrectly predict that initial accent occurs more frequently than it does for attested surface forms of trimoraic nouns. The question is, can a speaker be sensitive to frequencies of surface forms in choosing among possible grammars, and would they be more likely to choose a grammar that not only correctly derives attested surface patterns, but also accounts for frequencies of surface types?

Consider a very simplified, hypothetical example of the kind of phenomenon we have examined. Suppose that a language had nouns that were only monomoraic and that could have one of two possible surface forms with respect to tone: either a H tone or a L tone. Suppose also that two-thirds of monomoraic nouns have a L tone and one-third have a H tone. If there are three possible input forms, (a) H tone, (b) L tone, and (c) no tone, a natural hypothesis is that nouns that have a H tone underlyingly surface with a H tone and nouns that have a L tone underlyingly surface with a L tone. Nouns that have no underlying tone could surface with either H or L depending on the grammar.
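The consequences of each choice of grammar for the distribution of input types in this hypothetical language can be worked out mechanically. The following is a minimal sketch in Python; the names SURFACE_FREQ, input_distribution, grammar_a and grammar_b are illustrative only. It adds one assumption to the example: a surface form is equally likely to come from any of the input types that derive it.

```python
from fractions import Fraction

# Surface frequencies in the hypothetical language:
# one-third of monomoraic nouns have a H tone, two-thirds have a L tone.
SURFACE_FREQ = {"H": Fraction(1, 3), "L": Fraction(2, 3)}


def input_distribution(grammar):
    """Infer input-type frequencies from surface frequencies.

    `grammar` maps each input type to the tone it surfaces with.
    Assumption: every input type that derives a given surface tone is
    equally likely to underlie a noun with that tone.
    """
    distribution = {}
    for surface, freq in SURFACE_FREQ.items():
        sources = [inp for inp, out in grammar.items() if out == surface]
        for inp in sources:
            distribution[inp] = freq / len(sources)
    return distribution


# Grammar A: an input with no tone surfaces with a H tone.
grammar_a = {"H": "H", "L": "L", "none": "H"}
# Grammar B: an input with no tone surfaces with a L tone.
grammar_b = {"H": "H", "L": "L", "none": "L"}

for name, grammar in (("A", grammar_a), ("B", grammar_b)):
    dist = input_distribution(grammar)
    summary = ", ".join(f"{inp}: {share}" for inp, share in sorted(dist.items()))
    print(f"grammar {name}: {summary}")
```

Under these assumptions, grammar A requires inputs with a H tone and inputs with no tone to make up one-sixth of the lexicon each, against two-thirds for inputs with a L tone, whereas grammar B is compatible with each of the three input types making up one-third of the lexicon.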
How do speakers determine how an input with no tone will surface? If they posit a grammar in which an input with no tone surfaces with a H tone, and if surface forms with a H tone have an equal probability of having an input form with a H tone as with no tone, then one-sixth of input forms will have no tone, one-sixth will have a H tone, and two-thirds will have a L tone. This is a rather biased distribution of input types. On the other hand, if the speaker were to posit a grammar in which an input with no tone surfaces with a L tone, then we ought to get a roughly equal number of each type of input form: one-third with no tone, deriving a L tone, one-third with a L tone, deriving a L tone, and one-third with a H tone, deriving a H tone. Clearly the second choice of grammar will allow a far less biased distribution of input types than the first choice. We cannot answer here the question of whether a speaker would be aware of statistical frequencies in a way that would enable them to choose the second grammar, but the question is an important one, and warrants further investigation, because on it hinges the question of the nature of the lexicon. In a model of grammar like Optimality Theory, we would assume that inputs should have a random distribution of types. When we see an apparent bias of surface types for underived lexical items, we need to investigate the question of whether this apparent bias reflects a true bias in the lexicon or whether speakers can choose among possible grammars in such a way that this degree of bias is minimized.

8.4 The question of lexicon optimization and rendaku voicing revisited

We have examined the matter of skewing of frequencies of accent types that occur for pitch accent patterns of Yamato nouns in several dialects of Japanese. We have shown that a grammar that is necessary for deriving possible accent patterns can also, to a considerable extent, account for statistical frequencies of these patterns.
In order to do this, we have made the necessary assumption that it is possible to have more than one possible input type for one type of output. This means that we have not adopted the principle of Lexicon Optimization (Prince & Smolensky (1993:196)) with respect to the occurrence of H and L tones on moras in input forms of nouns. If Lexicon Optimization were to occur, then we would have at most one input type for each output type, culling out input forms that are less harmonic and/or input forms that have more input material than others. As a result, we would need to posit skewing of frequencies of input types in order to account for skewing of frequencies of output types.

On the other hand, our account of rendaku voicing and rendaku blocking in chapter 3 rests on the hypothesis that most Yamato nouns with initial voiceless obstruents are underspecified for the voicing feature on the initial obstruent. Blocking of rendaku in noun-noun compounds occurs when there is a [-voice] feature present in the input, either linked to the feature tree of the initial obstruent or occurring as a free feature added to the lexical listing for the compound. It should be noted that positing an initial obstruent underspecified for voicing on nouns that usually undergo rendaku does not necessarily require us to posit Lexicon Optimization. Let us look again at how the lexical listing will occur for a compound that undergoes rendaku voicing. The compound word hana-gami "tissue paper, Kleenex" is derived from the nouns hana "nose" and kami "paper." Let us suppose that no Lexicon Optimization occurs for the noun kami and that either of the following two inputs is possible to derive its surface form.

(220)   kami              Kami
        |
        [-voice]          [no voicing feature present on /K/]

Either input form will correctly derive the output form of kami, as we see below.

(221) input: kami

                Max-Path[-voice]   Obs-Voi
      ☞ kami
        gami    *!                 *

(222) input: Kami

                Max-Path[-voice]   Obs-Voi
      ☞ kami
        gami                       *!
When kami occurs as the second member of a compound word like hana-gami, the lexical listing of the compound will consist of pointers to each of the lexical listings for hana and kami, as we proposed on page 21. It is now no longer the case that either of the two possible input forms of output kami given in (220) will derive a correct output for the compound. To correctly derive the voiced compound hana-gami, the input form of the simplex noun kami must be /Kami/, with an initial obstruent that is unspecified for voicing. If the input form were /kami/ with a specified voiceless obstruent, we would incorrectly get blocking of rendaku, according to our account developed on pages 56ff. This choice has nothing to do with whether the word kami undergoes Lexicon Optimization. If it does not, then the voiced /g/ in the output form of the compound hana-gami is derived from an underspecified velar obstruent in the input because the lexical listing of the compound hana-gami is {-[hana] + -[Kami]} and not {-[hana] + -[kami]}.

This of course raises the question of why, under our hypothesis, the majority of compounds have entries that reference nouns with an underspecified initial voiceless obstruent rather than a specified one. This fact can be seen as due to the proposed origin of compound words in Japanese as phrases with the genitive particle no, as discussed on page 84. If hana-gami is an old compound word, it could have originated as hana-no kami > hana-n kami > hana-n gami, with prenasalization triggering voicing on the velar consonant. At some stage in this transition, when speakers adjusted their grammar to derive rendaku from a listing of a compound word, they would choose {-[hana] + -[Kami]} as an input form. This is not a choice based on Lexicon Optimization: it is the only choice of input that would match the correct output form under the grammar we proposed for deriving rendaku voicing.
If we adopt this hypothesis of underspecification of the initial obstruent in the listing of compound words that undergo rendaku, we need to ask: in what sense does this imply that the lexical inventory of nouns that are eligible for rendaku voicing listings is biased? Under our hypothesis of how rendaku voicing is derived, we find far more noun-noun compounds with an initial obstruent on N2 that reference a listing of N2 with an underspecified obstruent than a specified one. If this constitutes a lexical bias, it is a different kind of bias than would be the case if simplex Japanese nouns did not form compound words with rendaku voicing and if the majority of listings of simplex nouns only had inputs with an underspecified obstruent. To correctly derive cases of rendaku voicing, the input form of a compound that voices must reference the listing of N2 that has that obstruent underspecified. This is the only possible input, unlike cases where there is more than one possible input for the same output form. There is what looks like a lexical bias in the sense that we get far more compound listings that reference an N2 with an underspecified obstruent than not. What is paradoxical about this fact is that the greater the bias in this direction, the more regular is the phonological process of rendaku voicing. An "unbiased" situation with half the inputs having a specified obstruent on the referenced N2 would cause rendaku to occur 50% of the time, but this is irregular as far as a productive grammatical process is concerned.[84]

Because input forms can be calculated both from simplex morphemes (e.g. nouns that occur in isolation) and derived forms such as compounds, compounds provide evidence for input forms of nouns that their simplex forms do not provide. As a result, we see skewing of input forms for nouns in order to correctly derive the productivity of rendaku voicing. Why do we get this overrepresentation of input forms with an underspecified obstruent?
In the account we have developed, lexical irregularity (a bias of input types with respect to voicing features) is tied to grammatical irregularity (the fact that the process of rendaku occurs with less than complete regularity). If rendaku had developed as a completely regular process, then the grammar would not have to rely on lexical prespecification to derive exceptions to voicing. In such a case, it would be possible to have a grammar that derived voicing regardless of whether the value for voicing on the initial obstruent of a noun was [-voice] or unspecified for voicing. It would then be possible to have an equal number of inputs with an unspecified voicing feature as compared to inputs with a [-voice] feature. One way of seeing this apparent bias is that it reflects the regularity of the voicing process that occurred historically after NC clusters.

[84] For nouns (with initial voiceless obstruents) that participate in compound words, we have proposed here that a lexical bias occurs inasmuch as most of these nouns have an underspecified initial voiceless obstruent. But for nouns that do not participate in compounds, there is no reason for this bias to occur, since it will have no effect on maintaining the regularity of rendaku voicing. Ideally, we could test this prediction by examining the frequency of initial voiced obstruents among nouns with initial obstruents that participate in compounds versus nouns with initial obstruents that do not. If this set of nouns is unbiased with respect to the nature of the voicing feature on an initial obstruent, we should expect one-third of each type: (a) specified voiceless initial obstruent; (b) unspecified initial obstruent; (c) specified voiced initial obstruent. Unfortunately, this picture is clouded by the fact that Yamato nouns with initial voiced obstruents are extremely rare, as shown in Martin (1987)'s database of Yamato nouns.
The way rendaku voicing evolved, its realization as a grammatical process required both the right grammar to derive voicing and the right choice of input form for a compound that voices. The way we have accounted for rendaku voicing, it is due both to a phonological process and to a kind of lexical bias towards input forms that reference an N2 with an underspecified obstruent. In this sense, the 75% rate of voicing is what we would expect: we would expect a 100% rate if rendaku were solely due to phonological processes; we would expect a 50% rate if it were entirely due to lexical patterns. The frequency we get is in between these two values.

The account of rendaku we have developed leads to the conclusion that this bias is due to the way a historical change interacted with irregularities that occurred as a result of sociolinguistic factors: i.e. the way in which rendaku developed from nasal-consonant clusters after the genitive particle no was truncated to a mora nasal in certain frequently pronounced phrases. This historical explanation for bias in the lexicon with respect to underspecification of the voicing feature will not allow us to posit a grammatical explanation for lexical bias, the way we did for pitch accent patterns. The two kinds of lexical bias are fundamentally different in how we consider them to have evolved. Lexical bias in the input forms of compound words developed, in our account, as a result of the way the language adapted to the change of phrases like hana-no kami to compound words like hana-gami. Rendaku voicing is partly due to a bias in input forms towards underspecification of voicing. Pitch accent patterns, on the other hand, reflect, in our account, the effects of the grammar on an unbiased lexicon. The grammar prefers certain output patterns in that they are derivable from a greater number of possible inputs than other patterns. The following table summarizes the contrast between the two kinds of bias.
(223) Bias in input and output forms for rendaku voicing and pitch accent

Phenomenon: Rendaku voicing
  Bias in input forms: most have underspecified initial voiceless obstruent
  Bias in output forms: most show voicing
  Historical effects: Historical factors would cause a bias among input forms to choose N2 underspecified for voicing.

Phenomenon: Pitch accent in Yamato nouns
  Bias in input forms: none
  Bias in output forms: some patterns overrepresented; some patterns underrepresented
  Historical effects: Historical factors would cause the grammar to prefer certain kinds of output forms: e.g. with a default accent pattern or accentual H aligned in some way.

Rendaku voicing, by our account, is blocked by lexical prespecification of a [-voice] feature that is underspecified in non-blocking cases. But this prespecified [-voice] feature only occurs in a minority of cases. This is a kind of lexical bias in that in an unbiased set of inputs we would expect listings of compounds with references to an N2 with a specified input to occur as often as those with an underspecified one. This kind of bias can be seen as a result of historical effects.

9. Conclusions

We have examined evidence of statistical skewing of pitch accent patterns in underived nouns in various Japanese dialects. We have seen that in most dialects, some patterns are overrepresented and others are underrepresented. If these surface patterns were a direct reflection of underlying forms, we would be witnessing a skewing of input types that departs from the randomness and statistically normal distribution of input types that we should expect to see in the lexicon. In Optimality Theory, where there are no constraints on input forms, any bias towards certain types of input forms is something that is not predicted by the theory and needs to be explained if the theory is to be maintained.
I have argued that the apparent skewed distribution of input types that we observe in Yamato pitch accent patterns of nouns can, in many cases, be accounted for by the effects of the grammar: that surface types that are overrepresented occur as a convergence of several input types on one output type. This convergence is directly due to a grammar that permits only a limited range of surface types in comparison to the number of mathematically possible input types. For modern Tokyo dialect, for example, a trimoraic noun has only four possible arrangements of H and L tones on the three moras, whereas there are twenty-seven possible arrangements of H and L tones if we were to allow any mora to be linked to either a single tone or to no tone. If input types can all occur with equal frequency, then output types that are overrepresented are naturally explained through this convergence brought on by the grammar. I have argued that the kind of grammar that would effect this kind of convergence is required anyway to account for the kinds of surface types that are possible or not possible.

This account still does not rule out the possibility of historical factors causing skewing of input types. We have shown, however, that skewing of surface types in Tokyo dialect cannot be solely due to an inheritance of a skewed pattern from an earlier stage of Japanese. We saw evidence from a comparison of pitch accent patterns in modern Tokyo with cognates in 11th century Kyoto Japanese that several accent types in the earlier dialect tended to converge on default types in modern Tokyo dialect, such as the unaccented pattern for trimoraic nouns. This suggests that the grammar must have evolved in such a way as to cause this convergence. That is, a grammar that preferred an unaccented pattern for trimoraic nouns would make a new generation of speakers likely to interpret several accent patterns from the earlier stage as an unaccented pattern in their dialect.
This evidence does not mean that historical factors had no effect on the skewing of pitch accent patterns in modern Tokyo dialect, but it does show that the grammar has had a stronger effect on skewing in this case than historical factors did. Exactly why the grammar might have developed in this way remains a mystery. Recall that in proposing a grammar for deriving modern Tokyo pitch accent, we had to choose among several different alternatives for what a default accent pattern might be and from what kinds of input a default pattern might emerge. Although we did not make an analysis of every dialect, we could see from statistics on accent patterns in dialects other than Tokyo dialect that even among Tokyo-type dialects that have the same possible pitch-accent types, different frequencies occurred for the four different accent patterns for trimoraic nouns. For example, Aomori dialect overrepresents medial accent, which in Tokyo dialect is the most underrepresented pattern. A possible analysis of Aomori dialect is that medial accent (LHL) is a default pattern, derived from the same input types (e.g. with no H tone) as we proposed will derive an unaccented pattern in Tokyo dialect.

It will require further research to determine whether the kind of account I have proposed for Yamato pitch accent patterns can be applied to other examples of lexical skewing in other languages.
For example, in languages that have both /s/ and /ʃ/ in their phonemic inventory, where the more marked /ʃ/ occurs less frequently, it may be possible to explain such a distribution through convergence of more input forms on /s/ than on /ʃ/.[85] Or in languages that allow both simple and complex onsets, convergence of inputs might account for a skewed distribution that favoured simple onsets.[86]

As far as rendaku voicing is concerned, we saw the opposite phenomenon to what we saw for pitch-accent patterns in nouns. There we expect to see regularity in this phonological process that affects derived forms. Blocking occurs apparently because of lexical prespecification of a [-voice] feature. In order to account for this kind of blocking in noun-noun compounds, we had to propose that most input forms of noun-noun compounds make reference to a lexical listing of an N2 that has an underspecified voicing feature on its initial obstruent. Whereas for simplex nouns an input form with either a [-voice] feature on the initial obstruent or an underspecified obstruent will derive the same output, for compounds we had no choice: only reference to an underspecified obstruent will derive the right result in a compound that undergoes rendaku voicing; only reference to one specified as [-voice] will get the right result for a compound that blocks voicing. This kind of skewing of input types occurred because of the historical process that caused rendaku voicing to evolve from NC clusters. For nouns that experience rendaku voicing, we have no choice among possible inputs: only one input fits the output. This is a case in which it is only in derived forms that a speaker can determine the correct input form of a simplex noun.

If only 50% of nouns participated in compounds, then we might expect to have less skewing of input forms: these 50% could have either specified or unspecified input forms, since in simplex words there is no way to determine which input is correct. If another 20% occurred with some kind of inflectional morphology that caused a phonological process that depended on specification or non-specification of voicing features, then this group would push the class of nouns towards greater skewing of input types. What is interesting is that the degree to which simplex lexical items occur in derived forms can influence the statistical skewing of input types. If rendaku voicing were to disappear as an active process, a simplex noun ought to be able to have either of two possible input forms: specified or non-specified. In such a case we might look for evidence elsewhere in the language of what input form is correct for a particular morpheme.

What is interesting about regular processes like rendaku is that they force us to posit some particular input for each morpheme involved, which is not the case when we just look at simplex nouns in isolation. If rendaku were 100% regular, we would not need such determinacy: either input would be possible. Thus when a process becomes irregular, greater determinacy in the lexicon is required.

[85] Suppose that we were to contrast /s/ and /ʃ/ by their values for two features under the coronal node: [±anterior] and [±distributed], where /s/ is [+anterior, -distributed] and /ʃ/ is [-anterior, +distributed]. If /s/ represents the default value for both [±anterior] and [±distributed], then among nine possible input values, including unspecified values, /s/ will surface for four inputs whereas /ʃ/ will surface for only one.

              [0ant]    [+ant]    [-ant]
    [0dist]   s         s
    [+dist]                       ʃ
    [-dist]   s         s

[86] Suppose, for the sake of simplicity, that a language had just two consonants, /r/ and /t/, and that the complex onset (tr) was possible but that *(rt) was ruled out by the Sonority Sequencing Principle (SSP). If all inputs are possible, then the sequence /rt/ in the input is a valid input. If deletion of one of the consonants were to occur for such an input to satisfy the SSP, then of the following four possible inputs, only one would surface with a complex onset, thus predicting that simple onsets should occur 75% of the time if each of the four inputs occurs with equal frequency:

    input:    output:
    t         t         (simple)
    r         r         (simple)
    tr        tr        (complex)
    rt        r/t       (simple)

Such an account requires that inputs need not be determinate and that a learner may freely posit all possible inputs for an output they hear.

BIBLIOGRAPHY

Alderete, J. (1999) Morphologically Governed Accent in Optimality Theory. Ph.D. Diss. U. Mass.
Anttila, Arto (1997) (ROA-63-0000) Deriving variation from grammar: A study of Finnish genitives.
Archangeli, Diana. (1988). Aspects of underspecification theory. Phonology 5.2:183-207.
Archangeli, Diana. (1985). Underspecification in underlying representation. In In memory of Roman Jakobson: papers from the 1984 Mid-America Linguistics Conference, ed. G. Youmans. 3-15. Columbia, MO: Linguistic Area Program.
Archangeli, Diana. (1988). Underspecification in Yawelmani Phonology and Morphology. New York: Garland.
Archangeli, Diana, (1997) Optimality Theory: An Introduction to Linguistics in the 1990's in Archangeli, D. & D. Terence Langendoen, eds., "Optimality Theory, an Overview". Blackwell.
Archangeli, Diana and Douglas Pulleyblank. (1989). Yoruba vowel harmony. Linguistic Inquiry 20.2:173-217.
Archangeli, Diana & Douglas Pulleyblank (1994). Grounded Phonology. MIT Press. Cambridge.
Avery, Peter. (1996). The representation of voicing contrasts. Doctoral dissertation, University of Toronto, Toronto, ON.
Baker, Mark The Mirror Principle and Morphosyntactic Explanation LI 16:373-416.
Benua, L (1997) Transderivational Identity: Phonological Relations between Words. Ph.D. Diss. U. Mass.
Boersma, Paul & Bruce Hayes (1999) Empirical tests of the Gradual Learning Algorithm. Rutgers Optimality Archive 348-1099
Cheng, Chin-chuan, and William S.-Y. Wang (1972). Tone change in Chaozhou Chinese: A study of lexical diffusion. In "Papers in Linguistics in Honour of Henry and Renee Kahane", 99-113 [Reprinted 1977]
Clements, G.N. (1988). Towards a substantive theory of feature specification. Proceedings of NELS 18:79-93.
Crowhurst, Megan and Mark Hewitt (1997). Boolean Operations and Constraint Interactions in Optimality Theory. Ms. University of North Carolina at Chapel Hill, and Brandeis University [Rutgers Optimality Archive #229]
Frantz, Donald G. (1991). Blackfoot Grammar. Toronto: University of Toronto Press.
Frisch, Stefan, Michael Broe, & Janet Pierrehumbert (1997) Similarity and phonotactics in Arabic. Rutgers Optimality Archive 223-1097
Goldsmith, John (1976) Autosegmental Phonology. Cambridge, Mass. MIT Ph.D. Dissertation. Published by Garland Press, New York, 1979.
Greenberg, Joseph, (1950) The patterning of Root Morphemes in Semitic. Word 6, 162-81
Grimshaw, Jane (1990) Argument Structure. LI Monograph 18. MIT Press
Han, M. (1962) Unvoicing of Vowels in Japanese. Onsei no Kenkyuu 10, 81-100
Haraguchi, Shosuke (1988). "Pitch Accent and Intonation in Japanese." in Van der Hulst & Smith, eds. Autosegmental Studies on Pitch Accent. Foris. Dordrecht.
Haraguchi, S. (1977) The Tone Pattern of Japanese: An Autosegmental Theory of Tonology. Tokyo: Kaitakusha
Hewitt, Mark (1994) Templates and Truncations in Optimality Theory. Ms. University of British Columbia
Hewitt, Mark & Megan Crowhurst (1996) "Conjunctive Constraints and Templates" in Beckman, J et al eds.
Proceedings of NELS 26 101-116 Amherst MA: GLSA
Higurashi (1983) Accent of Extended Word Structures in Tokyo Standard Japanese. EDUCA Tokyo
Hirayama, Teruo The Character of the Totsukawa Dialect (Nara Prefecture) as a language island. Gengo Kenkyuu 76: 29-73
Hirayama, Teruo 1968. Nihon no Hoogen. Kodansha.
Howe, Darin & Douglas Pulleyblank. In press. Patterns and timing of glottalisation. Phonology 18.
Inkelas, Sharon, C. Orhan Orgun, and Cheryl Zoll (1996) Exceptions and static phonological patterns: cophonologies vs. prespecification. Rutgers Optimality Archive 124-0496
Ito, Junko & Armin Mester (1986) The Phonology of Voicing in Japanese. LI 17:49-73
Ito, Junko, Armin Mester, & Jaye Padgett ([1993] 1994) "NC Licencing and underspecification in Optimality Theory." Linguistic Inquiry 26:571-614.
Ito, Junko and Armin Mester. (1995). "Japanese phonology: Constraint domains and structure preservation." In The handbook of phonological theory, Blackwell handbooks in Linguistics, ed. John Goldsmith. 817-838. Oxford: Blackwell.
Ito, Junko & Armin Mester (1995) The Core-Periphery Structure of the Lexicon and Constraints on reranking M O P 180
Ito, J. & A. Mester (1998) Markedness & Word Structure: OCP effects in Japanese. Rutgers Optimality Archive 255-0498
Ito, Kitagawa, and Mester (1992) Prosodic Type Preservation in Japanese: Evidence from zuujago. Syntax Research Center. Santa Cruz
Jakobson, Roman. (1941). Child language, aphasia and phonological universals. Mouton: The Hague.
Kageyama, T. (1982) Word Formation in Japanese. Lingua 57 (215-258)
Keating, Patricia. (1988). Underspecification in phonetics. Phonology 5:275-292.
Kobayashi, Yasuhide The Tone Melody of the Hirosaki Dialect. Gengo Kenkyuu 78: 40-67
Kubozono, Haruo (1995) Constraint interaction in Japanese phonology: evidence from compound accent. Phonology at Santa Cruz Vol. 4, pp.
21-38
Kubozono, Haruo (1996) Lexical Markedness and Variation: A nonderivational account of Japanese compound accent. WCCFL 15 pp. 273-288
Kumagai, S. (1977) An examination of vowel devoicing in the Japanese data of the Oxford Acoustic Database. WEB-SLS (The European Student Journal of Language and Speech), http://web-sls.essex.ac.uk/web-sls/
Labov, William (1994) Princip