"Non UBC"@en . "DSpace"@en . "World Sanskrit Conference (17th : 2018 : Vancouver, B.C.)"@en . "University of British Columbia. Department of Asian Studies"@en . "Sathaye, Adheesh A."@en . "Huet, Gérard"@en . "Kulkarni, Amba"@en . "2020-06-10T17:56:49Z"@en . "2019"@en . "Edited volume featuring the proceedings of the Computational Sanskrit & Digital Humanities section of the 17th World Sanskrit Conference. Contents: 1. Preface / Gérard Huet and Amba Kulkarni 2. A Functional Core for the Computational Aṣṭādhyāyī / Samir Sohoni and Malhar A. Kulkarni 3. PAIAS: Pāṇini Aṣṭādhyāyī Interpreter As a Service / Sarada Susarla, Tilak M. Rao and Sai Susarla 4. Yogyatā as an absence of non-congruity / Sanjeev Panchal and Amba Kulkarni 5. An ‘Ekalavya’ Approach to Learning Context Free Grammar Rules for Sanskrit Using Adaptor Grammar / Amrith Krishna, Bodhisattwa Prasad Majumder, Anil Kumar Boga and Pawan Goyal 6. A user-friendly tool for metrical analysis of Sanskrit verse / Shreevatsa Rajagopalan 7. Improving the learnability of classifiers for Sanskrit OCR corrections / Devaraja Adiga, Rohit Saluja, Vaibhav Agrawal, Ganesh Ramakrishnan, Parag Chaudhuri, K. Ramasubramanian and Malhar Kulkarni 8. A Tool for Transliteration of Bilingual Texts involving Sanskrit / Nikhil Caturvedi and Rahul Garg 9. Modeling the phonology of consonant duplication and allied changes in the recitation of Tamil Taittirīyaka-s / Balasubramanian Ramakrishnan 10. Word complementation in Classical Sanskrit / Brendan Gillon 11. TEITagger: Raising the standard for digital texts to facilitate interchange with linguistic software / Peter M. Scharf 12. Preliminary Design of a Sanskrit Corpus Manager / Gérard Huet and Idir Lankri 13. 
Enriching the digital edition of the Kāśikāvṛtti by adding variants from the Nyāsa and Padamañjarī / Tanuja P. Ajotikar, Anuja P. Ajotikar and Peter M. Scharf 14. From the Web to the desktop: IIIF-Pack, a document format for manuscripts using Linked Data standards / Timothy Bellefleur 15. New Vistas to study Bhartṛhari: Cognitive NLP / Jayashree Aanand Gajjam, Diptesh Kanojia and Malhar Kulkarni"@en . "https://circle.library.ubc.ca/rest/handle/2429/74653?expand=metadata"@en . "Selected Papers Presented at the 17th World Sanskrit Conference, July 9-13, 2018
Edited by Gérard Huet and Amba Kulkarni
COMPUTATIONAL SANSKRIT & DIGITAL HUMANITIES
University of British Columbia, Vancouver, Canada
THE 17TH WORLD SANSKRIT CONFERENCE
VANCOUVER, CANADA • JULY 9-13, 2018

Computational Sanskrit & Digital Humanities: Selected Papers Presented at the 17th World Sanskrit Conference, July 9-13, 2018, Vancouver, Canada.
DOI: 10.14288/1.0391834.
URI: http://hdl.handle.net/2429/74653.
Edited by Gérard Huet and Amba Kulkarni
General Editor: Adheesh Sathaye

Electronic edition published (2020) by the Department of Asian Studies, University of British Columbia, for the International Association for Sanskrit Studies. Hardback edition published in 2018 by D. K. Publishers Distributors Pvt. Ltd., New Delhi (ISBN: 978-93-87212-10-7).
© Individual authors, 2020. Content is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International license (CC BY-NC-ND 4.0). http://creativecommons.org/licenses/by-nc-nd/4.0/
All papers in this collection have received double-blind peer review.
INTERNATIONAL ASSOCIATION OF SANSKRIT STUDIES

Computational Sanskrit & Digital Humanities
Selected papers presented at the 17th World Sanskrit Conference
University of British Columbia, Vancouver
9–13 July 2018
Edited by Gérard Huet & Amba Kulkarni

Preface

This volume contains edited versions of papers accepted for presentation at the 17th World Sanskrit Conference in July 2018 in Vancouver, Canada. A special track of the conference was reserved for the topic “Computational Sanskrit & Digital Humanities”. The intent was not only to cover recent advances in each of the now mature fields of Sanskrit Computational Linguistics and Sanskrit Digital Libraries, but also to encourage cooperative efforts between scholars of the two communities, and to prepare for the emergence of a grammatically informed digital Sanskrit corpus.
Because of the rather technical nature of the track, contributions were judged not on abstracts alone but on full papers, which were reviewed by a Program Committee.

We would like to thank the Program Committee of our track for their work:
• Dr Tanuja Ajotikar, Belagavi, Karnataka
• Pr Stefan Baums, University of Munich
• Pr Yigal Bronner, Hebrew University of Jerusalem
• Pr Pawan Goyal, IIT Kharagpur
• Dr Oliver Hellwig, Düsseldorf University
• Dr Gérard Huet, Inria Paris (co-chair)
• Pr Girish Nath Jha, JNU, Delhi
• Pr Amba Kulkarni, University of Hyderabad (co-chair)
• Dr Pawan Kumar, Chinmaya Vishwavidyapeeth, Veliyanad
• Pr Andrew Ollett, Harvard University
• Dr Dhaval Patel, I.A.S. Officer, Gujarat
• Pr Srinivasa Varakhedi, KSU, Bengaluru

Fourteen contributions were accepted, revised according to the referees’ recommendations, and carefully edited to form this collection.

The first two papers concern the problem of the proper mechanical simulation of Pāṇini’s Aṣṭādhyāyī. In “A Functional Core for the Computational Aṣṭādhyāyī”, Samir Janardan Sohoni and Malhar A. Kulkarni present an original architecture for such a simulator, based on concepts from functional programming. In their model, each Pāṇinian sūtra is translated into a Haskell module, an elegant and effective formalization.
They explain an algorithm for sūtra-conflict assessment and resolution, discuss visibility and termination, and exhibit a formal derivation of the word bhavati as a showcase.

A different computational framework for the same problem is offered by Sarada Susarla, Tilak M. Rao and Sai Susarla in their paper “PAIAS: Pāṇini Aṣṭādhyāyī Interpreter As a Service”. They explain how they developed a Web service that can be used as a Sanskrit grammatical assistant and that directly implements the Pāṇinian mechanisms. Here sūtras are records in a database in JSON format, managed by a Python library. The authors pay particular attention to the meta-rules of the grammar, and especially to defining sūtras. They refrain from expanding such definitions inside the operative sūtras, and instead emulate them during grammatical processing.

These two papers conceptualize two computational models of Pāṇinian grammar that are strikingly different in their software architecture. However, examples of sūtra representations in the two systems carry very similar information, which suggests that the two tools might interoperate in the future.

In the general area of mechanical analysis of Sanskrit text, we have several contributions at various levels. At the level of semantic role analysis, which is at the heart of dependency parsing, Sanjeev Panchal and Amba Kulkarni present possible solutions to the complementary problem of ambiguity.
In their paper “Yogyatā as an absence of non-congruity” they explain various definitions used by Sanskrit grammarians to express compatibility. They also show how to use these definitions to reduce ambiguity in dependency analysis.

The next paper, “An ‘Ekalavya’ Approach to Learning Context Free Grammar Rules for Sanskrit Using Adaptor Grammar”, by Amrith Krishna, Bodhisattwa Prasad Majumder, Anil Kumar Boga, and Pawan Goyal, presents an innovative use of adaptor grammars to learn patterns in Sanskrit text that can be defined as context-free languages. The authors apply their techniques to word reordering tasks in Sanskrit. This is a preliminary step towards recovering prose word order from poetry, a crucial problem in Sanskrit.

Concerning meter recognition, we have a contribution by Shreevatsa Rajagopalan, “A user-friendly tool for metrical analysis of Sanskrit verse”. This new metrical analysis tool is available either as a Web service or as a software library, and its main features are its robustness and its guidance in error correction.

Two more contributions use statistical techniques (Big Data) to improve various Sanskrit-related tasks. For instance, in the field of optical character recognition, the contribution by Devaraja Adiga, Rohit Saluja, Vaibhav Agrawal, Ganesh Ramakrishnan, Parag Chaudhuri, K.
Ramasubramanian and Malhar Kulkarni, “Improving the learnability of classifiers for Sanskrit OCR corrections”, is a case in point. In the same vein of statistical techniques, Nikhil Chaturvedi and Rahul Garg present “A Tool for Transliteration of Bilingual Texts Involving Sanskrit”, which smoothly handles text that mixes various encodings.

While much of the work in Sanskrit computational linguistics is on Classical Sanskrit, researchers are also applying computational techniques to Vedic Sanskrit. One such effort is a detailed formalization of Vedic recitation phonology by Balasubramanian Ramakrishnan: “Modeling the Phonology of Consonant Duplication and Allied Changes in the Recitation of Tamil Taittirīyaka-s”.

From a more theoretical perspective on Sanskrit syntax, Brendan Gillon presents a formalization of Sanskrit complements in the categorial grammar framework. His paper “Word complementation in Sanskrit treated by a modest generalization of categorial grammar” explains modified versions of the cancellation rules that aim to accommodate free word order. This raises the theoretical problem of distinguishing complements from modifiers in Sanskrit.

Turning to the Digital Humanities theme, we have a number of contributions. In “TEITagger: Raising the standard for digital texts to facilitate interchange with linguistic software”, Peter Scharf discusses how a fine-grained XML representation of corpora within the Text Encoding Initiative standard enables communication between digital Sanskrit libraries and grammatical tools such as parsers and meter analysis tools.

A complementary proposal is discussed in the paper “Preliminary Design of a Sanskrit Corpus Manager” by Gérard Huet and Idir Lankri.
They propose a scheme for a fine-grained representation of Sanskrit corpora. The scheme accommodates inter-textuality phenomena such as shared sections of text, and also variant readings. They propose to use grammatical analysis tools, together with modern cooperative work software, to help annotators enrich digital libraries with grammatical information. They demonstrate a prototype of such a tool within the framework of the Sanskrit Heritage platform.

Moving towards philological concerns such as critical editions, the paper “Enriching the digital edition of the Kāśikāvṛtti by adding variants from the Nyāsa and Padamañjarī”, by Tanuja P. Ajotikar, Anuja P. Ajotikar, and Peter M. Scharf, discusses the problem of managing complex information from recensions and variants. It argues for a disciplined method of using TEI structure to represent this information in machine-manipulable ways. It then demonstrates this method on variants of the Kāśikāvṛtti, the major commentary on the Aṣṭādhyāyī.

In the same area of software-aided philology, the contribution “From the web to the desktop: IIIF-Pack, a document format for manuscripts using Linked Data standards”, by Timothy Bellefleur, proposes a common format for managing complex information about corpus recensions in various formats, including images of manuscripts. The aim is to help different teams exchange such data through this common format. His proposal uses state-of-the-art hypertext standards. It has already been put to use in an interactive software platform to manage recensions for the critical edition of the Vetālapañcaviṃśati by Pr.
Adheesh Sathaye.

The volume concludes with the contribution “New Vistas to study Bhartṛhari: Cognitive NLP” by Jayashree Aanand Gajjam, Diptesh Kanojia, and Malhar Kulkarni. It presents highly original research on cognitive linguistics in Sanskrit by comparing the results of eye-tracking experiments with Bhartṛhari’s theories of linguistic cognition.

We thank the numerous experts who helped us in the review process. We also thank all our authors, who responded positively to the reviewers’ comments and improved their manuscripts accordingly. Finally, we thank the entire 17th WSC organizing committee, led by Pr. Adheesh Sathaye, which provided us with the necessary logistical support for organizing this section.

Gérard Huet & Amba Kulkarni

Contributors

Devaraja Adiga
Department of Humanities and Social Sciences, Indian Institute of Technology Bombay, Powai, Mumbai, India.
pdadiga@iitb.ac.in

Vaibhav Agrawal
Indian Institute of Technology, Kharagpur, India.
vaibhav@iitkgp.ac.in

Anuja Ajotikar
Shan State Buddhist University, Myanmar.
anujaajotikar@gmail.com

Tanuja Ajotikar
KAHER’s Shri B. M. Kankanwadi Ayurveda Mahavidyalaya, Belagavi, India.
gtanu30@gmail.com

Timothy Bellefleur
Department of Asian Studies, University of British Columbia, Vancouver.
tbelle@alumni.ubc.ca

Anil Kumar Boga
Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, India.
bogaanil.009@gmail.com

Nikhil Chaturvedi
Department of Computer Science and Engineering, Indian Institute of Technology, New Delhi.
cs5130291@cse.iitd.ac.in

Parag Chaudhuri
Indian Institute of Technology Bombay, Powai, Mumbai, India.
paragc@cse.iitb.ac.in

Jayashree Aanand Gajjam
Department of Humanities and Social Sciences, Indian Institute of Technology Bombay, Powai, Mumbai, India.
jayashree_aanand@iitb.ac.in

Rahul Garg
Department of Computer Science and Engineering, Indian Institute of Technology, New Delhi.
rahulgarg@cse.iitd.ac.in

Brendan S. Gillon
McGill University, Montreal, Quebec H3A 1T7, Canada.
brendan.gillon@mcgill.ca

Pawan Goyal
Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, India.
pawang@cse.iitkgp.ernet.in

Gérard Huet
Inria Paris Center, France.
Gerard.Huet@inria.fr

Diptesh Kanojia
IITB-Monash Research Academy, Powai, Mumbai, India.
diptesh@iitb.ac.in

Amrith Krishna
Department of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, India.
amrith.krishna@cse.iitkgp.ernet.in

Amba Kulkarni
Department of Sanskrit Studies, University of Hyderabad, Hyderabad, India.
apksh@uohyd.ernet.in

Malhar Kulkarni
Department of Humanities and Social Sciences, Indian Institute of Technology Bombay, Powai, Mumbai, India.
malhar@hss.iitb.ac.in

Idir Lankri
Université Paris Diderot, Paris.
lankri.idir@gmail.com

Bodhisattwa Prasad Majumder
Walmart Labs, India.
bodhisattwapm2017@email.iimcal.ac.in

Sanjeev Panchal
Department of Sanskrit Studies, University of Hyderabad, Hyderabad, India.
snjvpnchl@gmail.com

Shreevatsa Rajagopalan
Independent Scholar, 1035 Aster Ave 1107, Sunnyvale, CA 94086, USA.
shreevatsa.public@gmail.com

Balasubramanian Ramakrishnan
Independent Scholar, 145 Littleton Rd, Harvard, MA 01451, USA.
balasr@acm.org

Ganesh Ramakrishnan
Indian Institute of Technology Bombay, Powai, Mumbai, India.
ganesh@cse.iitb.ac.in

K. Ramasubramanian
Department of Humanities and Social Sciences, Indian Institute of Technology Bombay, Powai, Mumbai, India.
ram@hss.iitb.ac.in

Tilak M Rao
School of Vedic Sciences, MIT-ADT University, Pune, India.
rao.tilak@gmail.com

Rohit Saluja
IITB-Monash Research Academy, Powai, Mumbai, India.
rohitsaluja@cse.iitb.ac.in

Peter Scharf
The Sanskrit Library, Providence, Rhode Island, U.S.A., and Language Technologies Research Center, Indian Institute of Information Technology, Hyderabad, India.
scharf@sanskritlibrary.org

Samir Janardan Sohoni
Department of Humanities and Social Sciences, Indian Institute of Technology Bombay, Powai, Mumbai, India.
sohoni@hotmail.com

Sarada Susarla
Karnataka Sanskrit University, Bangalore, India.
sarada.susarla@gmail.com

Sai Susarla
School of Vedic Sciences, MIT-ADT University, Pune, India.
sai.susarla@gmail.com

Contents

Preface (p. i)
Contributors (p. v)
A Functional Core for the Computational Aṣṭādhyāyī (p. 1)
  Samir Sohoni and Malhar A. Kulkarni
PAIAS: Pāṇini Aṣṭādhyāyī Interpreter As a Service (p. 31)
  Sarada Susarla, Tilak M. Rao and Sai Susarla
Yogyatā as an absence of non-congruity (p. 59)
  Sanjeev Panchal and Amba Kulkarni
An ‘Ekalavya’ Approach to Learning Context Free Grammar Rules for Sanskrit Using Adaptor Grammar (p. 83)
  Amrith Krishna, Bodhisattwa Prasad Majumder, Anil Kumar Boga, and Pawan Goyal
A user-friendly tool for metrical analysis of Sanskrit verse (p. 113)
  Shreevatsa Rajagopalan
Improving the learnability of classifiers for Sanskrit OCR corrections (p. 143)
  Devaraja Adiga, Rohit Saluja, Vaibhav Agrawal, Ganesh Ramakrishnan, Parag Chaudhuri, K. Ramasubramanian and Malhar Kulkarni
A Tool for Transliteration of Bilingual Texts Involving Sanskrit (p. 163)
  Nikhil Caturvedi and Rahul Garg
Modeling the Phonology of Consonant Duplication and Allied Changes in the Recitation of Tamil Taittirīyaka-s (p. 181)
  Balasubramanian Ramakrishnan
Word complementation in Classical Sanskrit (p. 217)
  Brendan Gillon
TEITagger: Raising the standard for digital texts to facilitate interchange with linguistic software (p. 229)
  Peter M. Scharf
Preliminary Design of a Sanskrit Corpus Manager (p. 259)
  Gérard Huet and Idir Lankri
Enriching the digital edition of the Kāśikāvṛtti by adding variants from the Nyāsa and Padamañjarī (p. 277)
  Tanuja P. Ajotikar, Anuja P. Ajotikar, and Peter M. Scharf
From the Web to the desktop: IIIF-Pack, a document format for manuscripts using Linked Data standards (p. 295)
  Timothy Bellefleur
New Vistas to study Bhartṛhari: Cognitive NLP (p. 311)
  Jayashree Aanand Gajjam, Diptesh Kanojia and Malhar Kulkarni

A Functional Core for the Computational Aṣṭādhyāyī
Samir Janardan Sohoni and Malhar A. Kulkarni

Abstract: There have been several efforts to produce computational models of concepts from Pāṇini’s Aṣṭādhyāyī. These implementations targeted certain subsections of the Aṣṭādhyāyī, such as the visibility of rules, the resolution of rule conflicts, and the production of sandhi. Extending such efforts to the whole grammar would give us a much-coveted computational Aṣṭādhyāyī. A computational Aṣṭādhyāyī must produce an acceptable derivation of words that shows the order in which sūtras are applied. We have developed a mini computational Aṣṭādhyāyī which purports to derive accented verb forms of the root bhū in the laṭ lakāra. An engine repeatedly chooses, prioritizes and applies sūtras to an input, given in the form of a vivakṣā, until an utterance is derived. Among other things, this paper describes the structure of sūtras, the visibility of sūtras in the sapādasaptādhyāyī and tripādī sections, the phasing of sūtras, and the conflict resolution mechanisms. We found that the saṃjñā and vidhi sūtras are relatively simple to implement because their conditions are overtly stated. The adhikāra and paribhāṣā sūtras are too general to be implemented on their own, but can be bootstrapped into the vidhi sūtras.
The para-nitya-antaraṅga-apavāda method of resolving sūtra conflicts was extended to suit the computational Aṣṭādhyāyī. Phasing can be used as a device to defer certain sūtras to a later stage in the derivation.

This paper is the first part of a series. We intend to write more as we implement more of the Aṣṭādhyāyī.

Keywords: computational Ashtadhyayi, derivation, conflict resolution, sutra, visibility, phase

1 Introduction

Accent is a key feature of the Sanskrit language. The ubiquitous accent of Vedic has fallen out of use in Classical Sanskrit, but Pāṇini’s grammatical mechanisms are still capable of producing accented speech. We aim to derive accented forms using a computer implementation of Pāṇinian methods. Our system can produce the output shown in Listing 1.

Our implementation uses a representation of vivakṣā (see Listing 11) to drive the derivation. We model each Aṣṭādhyāyī sūtra as requiring a set of preconditions in order to produce an effect. The sūtras look for their preconditions in an input environment. The effects produced by sūtras become part of an ever-evolving environment, which may trigger other sūtras. To resolve
To resolverule conflicts, we have made a provision for a harness which is based on theparibh\u00C4\u0081\u00E1\u00B9\u00A3\u00C4\u0081 \u00E0\u00A4\u00AA\u00E0\u00A4\u00B5\u00E0\u00A5\u0082 \u00EE\u008C\u0083\u00E0\u00A4\u00AA\u00E0\u00A4\u00B0\u00E0\u00A4\u00BF\u00E0\u00A4\u00A8\u00EE\u0084\u009A\u00E0\u00A4\u00BE\u00EE\u0085\u00B0\u00E0\u00A4\u00B0\u00EE\u0081\u00A2\u00E0\u00A4\u00BE\u00E0\u00A4\u00AA\u00E0\u00A4\u00B5\u00E0\u00A4\u00BE\u00E0\u00A4\u00A6\u00E0\u00A4\u00BE\u00E0\u00A4\u00A8\u00E0\u00A4\u00BE\u00E0\u00A4\u00AE \u00E0\u00A4\u0089\u00E0\u00A5\u008D\u00EE\u0084\u0087\u00E0\u00A4\u00B0\u00E0\u00A5\u008B\u00EE\u0084\u0087\u00E0\u00A4\u00B0\u00E0\u00A4\u0082 \u00E0\u00A4\u00AC\u00E0\u00A4\u00B2\u00E0\u00A5\u0080\u00E0\u00A4\u00AF\u00E0\u00A4\u0083.We have used Haskell, a lazy functional programming language, to buildthe prototype. Our implementation uses a phonetic encoding of charactersfor accentuation and P\u00C4\u0081\u00E1\u00B9\u0087inian operations (See Sohoni and M. A. Kulkarni(2016)). The phonetic encoding allows for faster phonetic modifications andtesting operation, something that seems to happen very frequently acrossmost of the s\u00C5\u00ABtras.The following is an outline of the rest of this paper. Previous work re-lated to the computational A\u00E1\u00B9\u00A3\u00E1\u00B9\u00AD\u00C4\u0081dhy\u00C4\u0081y\u00C4\u00AB is reviewed in Section 2. Section3 discusses how phonetic components, produced and consumed by s\u00C5\u00ABtras,are represented. It also discusses tracing the antecedants of components.Implementation of s\u00C5\u00ABtras is discussed in Section 4. Intermediate steps of aderivation, known as frames, are discussed in 5.1. Section 5.2 also discussesthe environment which is used to check triggering conditions of s\u00C5\u00ABtras. De-ferring application of s\u00C5\u00ABtras by arranging them into phases is discussed in6. The process of derivation is explained in Section 7. 
Prioritization and conflict resolution of sūtras are discussed in Section 8. Section 9 discusses how the visible frames of a derivation are made available to a sūtra. Some conclusions and future work are discussed in Section 10.

Wiwakshaa --->
[(\"gana\",Just \"1\"),(\"purusha\",Just \"1\"),(\"wachana\",Just \"1\"),(\"lakaara\",Just \"wartamaana\"),(\"prayoga\",Just \"kartari\")]
Initial ---> भू॒
[]
***>(6.1.162) ---> भू
[]
***>(3.2.123) ---> भूलँट्
[(1.4.13) wins (1.4.13) vs (1.3.9) by SCARE]
***>(1.4.13) ---> भूलँट्
[]
***>(1.3.9) ---> भूल्
[]
***>(3.4.78) ---> भूति॒प्
[(1.4.13) wins (1.4.13) vs (1.4.104) by SCARE,(1.4.13) wins (1.4.13) vs (1.3.9) by SCARE,(1.4.13) wins (1.4.13) vs (3.1.68) by SCARE,(1.4.13) wins (1.4.13) vs (7.3.84) by SCARE]
***>(1.4.13) ---> भूति॒प्
[(1.4.104) wins (1.4.104) vs (1.3.9) by SCARE,(1.4.104) wins (1.4.104) vs (3.1.68) by SCARE,(1.4.104) wins (1.4.104) vs (7.3.84) by SCARE]
***>(1.4.104) ---> भूति॒प्
[(3.1.68) wins (1.3.9) vs (3.1.68) by paratwa,(7.3.84) wins (3.1.68) vs (7.3.84) by paratwa]
***>(7.3.84) ---> भोति॒प्
[(3.1.68) wins (1.3.9) vs (3.1.68) by paratwa]
***>(3.1.68) ---> भोश॒प्ति॒प्
[(1.4.13) wins (1.4.13) vs (1.3.9) by SCARE]
***>(1.4.13) ---> भोश॒प्ति॒प्
[]
***>(1.3.9) ---> भोअ॒ति॒
[]
***>(1.4.14) ---> भोअ॒ति॒
[]
***>(1.4.109) ---> भोअ॒ति॒
[(8.4.66) wins (6.1.78) vs (8.4.66) by paratwa]
***>(8.4.66) ---> भोअ॑ति॒
[]
***>(6.1.78) ---> भव॒ति॒
[]
***>(8.4.66) ---> भव॑ति॒

Listing 1: A derivation of the pada भव॑ति॒

2 Review of Literature

Formal foundations of a computational Aṣṭādhyāyī can be seen in Mishra (2008, 2009, 2010). The general approach in Mishra’s work is to take a linguistic form such as bhavati and apply heuristics to carve out some grammatical decompositions.
The decompositions drive analytical processes that may yield further decompositions along sandhi boundaries, producing seed-forms.[1] This part is an analysis done in a top-down manner. The second phase is a bottom-up synthesis, in which a synthesizer processes each seed-form to produce finalized linguistic expressions that must match the original input. To support analysis, Mishra’s implementation relies on a database containing partial orders of morphological entities, mutually exclusive morphemes, and other such artifacts.[2] In the synthesis phase, Mishra (2010) also implements a conflict resolver using the siddha principle.[3]

Goyal, A. P. Kulkarni, and Behera (2008) have also created a computational Aṣṭādhyāyī. It focuses on ordering sūtras in the regions governed by पूर्वत्रासिद्धम् (A. 8.2.1), षत्वतुकोरसिद्धः (A. 6.1.86) and असिद्धवदत्राभात् (A. 6.4.22). Input, in the form of a prakṛti along with attributes, is passed through a set of modules that contain thematically grouped rules. The implementation embodies the notion of data spaces: rules can see various data spaces from which they take input, and the results they produce are put back into the appropriate data spaces.
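The ordering constraint of A. 8.2.1 mentioned above can be made concrete with a small visibility check. This is a hedged sketch. It is not the implementation of Goyal, A. P. Kulkarni, and Behera (2008), nor that of the present paper:

- `inTripadi`, `effectVisibleTo` and `visibleHistory` are hypothetical names.
- Sūtras are identified only by their numbers.
- Allowing a tripādī rule to see its own earlier output is an assumption, flagged in the code.

The sketch encodes the usual reading of pūrvatrāsiddham. The output of a tripādī rule is treated as not having taken effect (asiddha) by every rule of the sapādasaptādhyāyī and by every earlier rule of the tripādī.

```haskell
-- Sūtra numbers as (adhyāya, pāda, sūtra); the built-in tuple ordering
-- coincides with the textual order of the Aṣṭādhyāyī.
type SutraNum = (Int, Int, Int)

-- The tripādī comprises pādas 8.2, 8.3 and 8.4.
inTripadi :: SutraNum -> Bool
inTripadi (adhyaya, pada, _) = adhyaya == 8 && pada >= 2

-- A. 8.2.1 (pūrvatrāsiddham): the output of a tripādī rule is asiddha for
-- every sapādasaptādhyāyī rule and for every earlier tripādī rule.
-- Assumption: a rule is allowed to see its own earlier output (>=).
effectVisibleTo :: SutraNum -> SutraNum -> Bool
effectVisibleTo producer consumer
  | inTripadi producer = inTripadi consumer && consumer >= producer
  | otherwise          = True

-- Keep only those results of a derivation history, given as
-- (producing sūtra, result) pairs, that the consuming sūtra may see.
visibleHistory :: SutraNum -> [(SutraNum, a)] -> [a]
visibleHistory consumer history =
  [ result | (producer, result) <- history, producer `effectVisibleTo` consumer ]
```

Take, for instance, a history in which 7.3.84 and then 8.4.66 have applied. Seen from 6.1.78, only the result of 7.3.84 is exposed, since 8.4.66 belongs to the tripādī.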
Conflict resolution in their system is based on paribhāṣā-driven concepts, such as the principle of apavāda, as well as on ad-hoc techniques.[4]

Goyal, A. P. Kulkarni, and Behera (2008), §4, list the features of a computational Aṣṭādhyāyī. In addition, a computational Aṣṭādhyāyī should seamlessly glue grammatical concepts together, just like the traditional Aṣṭādhyāyī. It should not add any side effects, nor should it lack any part of the traditional Aṣṭādhyāyī. Above all, a computational Aṣṭādhyāyī must produce an acceptable derivation of words.

The system described in Mishra (2010) does not use traditional building blocks such as the Māheśvara Sūtras or the Dhātupāṭha, but it can be made to do so.[5] We believe that canonical building blocks such as the Māheśvara Sūtras and the Aṣṭādhyāyī sūtrapāṭha should strongly influence the computational Aṣṭādhyāyī.

[1] See Mishra (2009), Section 4.2, for a description of the general process.
[2] See Mishra (2009), Section 6.1, for details of the database.
[3] Mishra (2010):255.
[4] Goyal, A. P. Kulkarni, and Behera (2008), cf. ‘Module for Conflict Resolution’ in §4.4.

Peter M.
Scharf (2016) discusses the need to translate Pāṇinian rules faithfully into the realm of computation and presents an elaborate XMLization of Pāṇini’s rules. Bringing Pāṇini’s rules into computation uncovers problems that need to be solved; T. Ajotikar, A. Ajotikar, and Peter M. Scharf (2016) discuss some of them.

XML is useful for describing structured data, so the XMLization of the Aṣṭādhyāyī is a step in the right direction. However, processing Pāṇinian derivations in XML will be fraught with performance issues: XML is good for the specification of data, but it cannot be used as a programming language. A good deal of the design of a computational Aṣṭādhyāyī will have to focus on questions such as “How is the notion X to be implemented (not merely specified)?”, where X may be the run-time evaluation of apavādas, the dynamic creation of any pratyāhāra from the Māheśvara Sūtras, or the handling of an atideśa sūtra so that the proper target rule is triggered. The power of a real, feature-rich programming language will be indispensable in such work.

Patel and Katuri (2016) have demonstrated the use of programming languages to derive subanta forms. They have worked out, by hand, an ordering of rules (NLP ordering) for producing subantas according to Bhaṭṭojī Dīkṣita’s Vaiyākaraṇa Siddhāntakaumudī.
It is conceivable that, as more rules are added to the system to derive other types of words, the NLP ordering may undergo considerable change and may ultimately approach the order that results from the Pāṇinian paribhāṣās and those compiled by Nagojibhaṭṭa (see Kielhorn (1985)).

In the present paper we describe the construction of rules, the progress of a derivation, and the resolution of conflicts by modeling competitions between rules within the ambit of the paribhāṣā पूर्वपरनित्यान्तरङ्गापवादानामुत्तरोत्तरं बलीयः and other such concepts.

⁵ Mishra (2010): 256, §4: “There is, however, a possibility to make the system aware of these divisions.”

3 Phonetic Components

The phonetic payload, which consists of phonemes, is known as a Component. Listing 2 shows the implementation.⁶ A Component is made up of phonetically encoded Warnas. Name-value pairs known as Tags give meta-information about a Component; usually, the tags contain saṃjñās.

    type Attribute v = (String, Maybe v)
    type Tag = Attribute String

    data Component = Component { cmpWarnas :: [Warna]
                               , cmpAttrs  :: [Tag]
                               , cmpOrigin :: [Component]
                               }

    type State = [Component]

Listing 2: State
Over the course of a derivation, Components can undergo changes. At times, sūtras need to test previous incarnations of a sthānī (substituend), so a list of the previous forms of each Component is also retained. The current yield of the derivation at any step is held in the State, which is a list of Components.

Listing 3 shows how भू + तिप् can be represented as an intermediate phonetic State. The Devanāgarī representations of भू and तिप् are converted into an internal representation of Warnas using the encode function. Suitable tags are applied to the components bhu and tip, which are then strung together in a list to create a State.

    egState = let bhu = Component (encode "भू")        -- phonemes
                                  [("dhaatu", Nothing)] -- tags
                                  []                    -- no previous history
                  tip = Component (encode "तिप्")       -- phonemes
                                  [ ("wibhakti", Nothing)       -- tags
                                  , ("parasmaipada", Nothing)
                                  , ("ekawachana", Nothing)
                                  , ("saarwadhaatuka", Nothing)
                                  , ("pratyaya", Nothing) ]
                                  []                    -- no previous history
              in [bhu, tip]

Listing 3: An example of State

3.1 Tracing Components to Their Origins

The sūtra वर्तमाने लट् (A. 3.2.123) inserts a लँट् pratyaya after a dhātu. This pratyaya undergoes changes and ultimately becomes ल् due to the application of the it-sūtras A. 1.3.2–9. लशक्वतद्धिते (A. 1.3.8) marks the ल् as an इत्, causing its removal; yet A. 1.3.8 should not mark the ल् of a लँट् pratyaya as an इत्, because the ल् of the ten lakāras is not an इत्. These lakāras should therefore figure in A. 1.3.8 as an exception list, so that A. 1.3.8 does not apply to them. However, other sūtras such as A. 1.3.2, A. 1.3.3 and A. 1.3.9 may still apply, leaving behind only ल्. If the ten lakāras were kept as an exception list in A. 1.3.8, this bare ल् would not match any of them and would be liable to be dropped.

⁶ Excerpts of implementation details are shown in Haskell. References to variable names in computer code are shown in bold teletype font. In code listings the letter ‘w’ is used for व्; elsewhere the Roman transliteration, which prefers ‘v’ to ‘w’, is used.
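The difficulty, and the history-keeping remedy that Listing 4 implements, can be previewed with a minimal, self-contained Haskell sketch. It is not the authors’ implementation: Warna is assumed to be a String in Harvard-Kyoto transliteration, Tag is reduced to a pair of Strings, and changeWarnas, isLakara, naiveException and tracedException are hypothetical helpers. traceOrigin is restated with concatMap; it collects the same set of earlier forms as the foldr version in Listing 4.

```haskell
import Data.List (nub)

-- Assumption: a phoneme (Warna) is a Harvard-Kyoto string such as "l" or "T".
type Warna = String
type Tag = (String, Maybe String)

data Component = Component
  { cmpWarnas :: [Warna]
  , cmpAttrs  :: [Tag]
  , cmpOrigin :: [Component]   -- earlier forms, most recent first
  } deriving (Eq, Show)

-- Hypothetical helper: change the phonemes of a component, pushing the
-- current form (with all its attributes) onto the head of its history.
changeWarnas :: [Warna] -> Component -> Component
changeWarnas ws c = c { cmpWarnas = ws, cmpOrigin = c : cmpOrigin c }

-- Every earlier form, following each stored form back through its own history.
traceOrigin :: Component -> [Component]
traceOrigin c = nub (concatMap (\o -> o : traceOrigin o) (cmpOrigin c))

-- The ten lakaras in Harvard-Kyoto, split into one-letter phonemes.
tenLakaras :: [[Warna]]
tenLakaras = map (map (: []))
  ["laT", "liT", "luT", "lRT", "leT", "loT", "laG", "liG", "luG", "lRG"]

isLakara :: Component -> Bool
isLakara c = cmpWarnas c `elem` tenLakaras

-- Exception test for A. 1.3.8 that looks at the current form only.
naiveException :: Component -> Bool
naiveException = isLakara

-- Exception test that also inspects the component's earlier forms.
tracedException :: Component -> Bool
tracedException c = isLakara c || any isLakara (traceOrigin c)

-- laT as inserted by A. 3.2.123, then stripped of its it-letters.
lat0, lat1, lat2 :: Component
lat0 = Component (map (: []) "laT") [("pratyaya", Nothing)] []
lat1 = changeWarnas ["l", "a"] lat0   -- final T dropped
lat2 = changeWarnas ["l"] lat1        -- the nasalised a dropped
```

Applied to lat2, naiveException returns False while tracedException returns True, because traceOrigin recovers the earlier forms l-a and l-a-T; a bare l with no history is still not exempted.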
Somehow, the ल् that remains of a lakāra needs to be traced back to the original lakāra.

As shown in Listing 2, the datatype Component recursively contains a list of Components, whose purpose is to keep previous forms of the Component around. As a Component changes, its previous form, along with all its attributes, is stored at the head of this list, which makes it easy to recover and examine any previous form of the component. Listing 4 shows the traceOrigin function. In the case of A. 1.3.8, if calling traceOrigin on a ल् yields one of the ten lakāras, A. 1.3.8 does not mark that ल् as an इत्. This way of tracing Components back to their previous forms can also help in determining a sthānī and its attributes under the influence of an atideśa sūtra such as स्थानिवदादेशोऽनल्विधौ (A. 1.1.56).

    import Data.List (nub)

    -- Collect every earlier form of a Component, following each stored
    -- origin back through its own history; nub removes repeats.
    traceOrigin :: Component -> [Component]
    traceOrigin (Component _ _ []) = []
    traceOrigin (Component _ _ os) =
        nub $ (concat . foldr getOrig [os]) os
      where getOrig c acc = traceOrigin c : acc

Listing 4: Tracing the origin of a Component

4 Sūtras

According to one opinion in the Pāṇinian tradition, there are six different types of sūtras.
The following verse enumerates them:⁷

संज्ञा च परिभाषा च विधिर्नियम एव च ।
अतिदेशोऽधिकारश्च षड्विधं सूत्रलक्षणम् ॥

The saṃjñā sūtras apply specific saṃjñās to grammatical entities on the basis of certain indicatory marks found in the input. They help the vidhi sūtras bring about changes.

The real executive power of the Aṣṭādhyāyī lies in the vidhi, niṣedha and niyama sūtras. The vidhi sūtras bring about changes in the state of the derivation. The niṣedha and niyama sūtras are devices that prevent over-generation by the vidhi sūtras; they are strongly associated with specific vidhi sūtras and share some of their conditioning information.

The paribhāṣā sūtras are subservient to the vidhi sūtras.
They can be thought of as algorithmic helper functions that are called from many places in a computer program. In the spirit of the kāryakālapakṣa, the paribhāṣā sūtras unite with the vidhi sūtras to create a complete sūtra which produces a certain effect. The paribhāṣā sūtras therefore need not be implemented explicitly, because their logic can be embedded in the vidhi sūtras.

The adhikāra sūtras create a context in which the vidhi sūtras operate. From an implementation perspective, the context of an adhikāra can be built into the body of the vidhi, niṣedha or niyama sūtras, and therefore adhikāra sūtras need not be implemented explicitly either.

⁷ Vedantakeshari (2001): 9–11. According to other opinions there are seven, or even as many as eight, types of sūtras if the niṣedha and upavidhi types are counted.

The atideśa sūtras create situational analogies, and by forming analogies they cause other vidhi sūtras to trigger. In this implementation we implement saṃjñā, vidhi and niyama sūtras; we have not implemented atideśa sūtras.

In traditional learning, every paribhāṣā sūtra is expected to be known in the place where it is taught. The effective meaning of a vidhi sūtra is arrived at by resorting to the method of either yathoddeśapakṣa or kāryakālapakṣa.
In one opinion, in the yathoddeśapakṣa the boundary between the sapādasaptādhyāyī and the tripādī presents an ideological barrier which the paribhāṣā sūtras cannot cross, for reasons of invisibility. The kāryakālapakṣa has no such problem.⁸

We are inclined towards an implementation based on the kāryakālapakṣa, as it frees us from having to implement each and every paribhāṣā sūtra explicitly, while still letting us list the numbers of the paribhāṣā sūtras that go into creating ekavākyatā (the full, expanded meaning). This choice allows swift development with less clutter. Therefore, the paribhāṣā and adhikāra sūtras are not explicitly implemented.

Sūtras are defined as shown in Listing 5. No niyama sūtras were encountered in the derivation of the word भव॑ति॒,⁹ so they are not implemented in this effort, but they could be implemented by adding a Niyama value constructor. The Widhi value constructor is used to represent all types of sūtras other than saṃjñā sūtras, and the Samjnyaa value constructor is used to make saṃjñā sūtras. The two value constructors take the same parameters; only the name of the constructor differentiates them.
This distinction is useful for pattern matching on sūtra values in the SCARE model (Section 8.4), which treats saṃjñā sūtras specially.

    data Sutra = Widhi    { number :: SutraNumber, testing :: Testing }
               | Samjnyaa { number :: SutraNumber, testing :: Testing }

Listing 5: Definition of sūtra

⁸ See Kielhorn (1985), paribhāṣās 2 & 3: कार्यकालपक्षे तु त्रिपाद्यामप्युपस्थितिरिति विशेषः
⁹ The Ṛgvedic convention is used to show accent marks.

4.1 Testing the Conditions for Application of sūtras

In a computational Aṣṭādhyāyī, a sūtra must be able to sense certain conditions that exist in the input, and it must also be able to produce an effect. These are the two basic requirements that any implementable sūtra must satisfy. The Testing field in Listing 5 refers to a datatype that has the testing functions slfTest and condTest. A sūtra is able to produce its effect provided that slfTest returns True and condTest returns a function that can produce effects; more on this is explained in Section 7.1. Listing 6 shows the datatype Testing. The function slfTest is used to prevent a sūtra from applying ad infinitum.
Some sūtras produce an effect without any conditions. For example, परः संनिकर्षः संहिता (A. 1.4.109) defines the saṃjñā samhitā (the mode of continuous speech), which is not preconditioned by anything that can be sensed in the input. This sūtra would be applied and reapplied endlessly were it not for the function slfTest: the slfTest function of A. 1.4.109 allows its application only if it has not been applied earlier.

Unlike A. 1.4.109, some sūtras produce effects that are conditioned upon things found in the input. For example, उदात्तादनुदात्तस्य स्वरितः (A. 8.4.66) looks for an udātta syllable followed by an anudātta one in samhitā and converts the anudātta into a svarita. As long as there is no udātta followed by an anudātta in the input, A. 8.4.66 does not apply. A. 8.4.66 therefore runs no risk of being applied ad infinitum: it is conditioned on things that can be sensed in the input, and its own effect removes the udātta-anudātta sequence that triggered it. Consequently, its slfTest function always returns True. Its condTest function tests for an udātta followed by an anudātta in samhitā and, when that condition holds, returns an effect function.
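The contrast between an unconditioned sūtra guarded by slfTest and an input-conditioned sūtra whose slfTest is always True can be sketched as follows. This is a simplified model, not the paper’s code: the environment is reduced to a hypothetical Env holding syllable accents, a samhitā flag and the numbers of the sūtras already applied (standing in for the Trace and housekeeping), and condTest returns only a Maybe Effect rather than the ([Attribute String], Maybe Effect) pair of Listing 6.

```haskell
-- Syllable accents relevant to A. 8.4.66.
data Accent = Udatta | Anudatta | Svarita deriving (Eq, Show)

-- Hypothetical, simplified environment (not the paper's Environment type).
data Env = Env
  { accents :: [Accent]   -- accent of each syllable, in order
  , samhita :: Bool       -- has the samhita sanjna been applied?
  , applied :: [String]   -- numbers of sutras applied so far
  } deriving (Eq, Show)

type Effect = Env -> Env

-- Simplified Testing record: condTest yields an effect or Nothing.
data Testing = TestFuncs
  { slfTest  :: Env -> Bool
  , condTest :: Env -> Maybe Effect
  }

-- A. 1.4.109: nothing in the input conditions it, so slfTest alone
-- stops it from applying again once it has been applied.
s1_4_109 :: Testing
s1_4_109 = TestFuncs
  { slfTest  = \env -> "1.4.109" `notElem` applied env
  , condTest = \_ -> Just (\env -> env { samhita = True
                                       , applied = "1.4.109" : applied env })
  }

-- Position of the first anudatta that immediately follows an udatta.
udattaAnudatta :: [Accent] -> Maybe Int
udattaAnudatta xs = lookup (Udatta, Anudatta) (zip (zip xs (drop 1 xs)) [1 ..])

-- A. 8.4.66: conditioned on the input, so slfTest is always True; its
-- own effect removes the udatta-anudatta sequence that triggered it.
s8_4_66 :: Testing
s8_4_66 = TestFuncs
  { slfTest  = const True
  , condTest = \env -> case udattaAnudatta (accents env) of
      Just i | samhita env -> Just (toSvarita i)
      _                    -> Nothing
  }
  where
    toSvarita i env =
      env { accents = [ if k == i then Svarita else a
                      | (k, a) <- zip [0 ..] (accents env) ] }

-- A sutra applies only if slfTest passes and condTest yields an effect.
tryApply :: Testing -> Env -> Maybe Env
tryApply t env
  | slfTest t env = fmap ($ env) (condTest t env)
  | otherwise     = Nothing

start :: Env
start = Env { accents = [Udatta, Anudatta, Anudatta], samhita = False, applied = [] }
```

Starting from start, A. 8.4.66 has no prasaṅga until A. 1.4.109 has introduced samhitā; it then turns the second syllable into a svarita once and afterwards finds nothing more to do, while A. 1.4.109 is blocked by its own slfTest.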
Listing 6 shows the functions which each sūtra is expected to implement.

If a sūtra is inapplicable, condTest returns Nothing, which means that no effect can be produced. If a sūtra is applicable, condTest returns an effect function. Calling the effect function with the correct Environment parameter produces the effect as part of a new Environment: an Effect is simply a function that takes an Environment and produces a newer one. Each Sutra is expected to implement the functions slfTest and condTest.

    type Effect = TheEnv -> TheEnv

    data Testing = TestFuncs
      { slfTest  :: Environment Trace -> Bool
      , condTest :: Environment Trace -> ([Attribute String], Maybe Effect)
      }

Listing 6: Definition of testing functions

4.2 Organization of sūtras

The sūtras are implemented as Haskell modules. Every sūtra module exports a details function, which gives access to the definition of the sūtra and to the slfTest and condTest functions required in other parts of the code. In addition, every sūtra has to implement its own effects function. Listing 7 shows a rough sketch of sūtra A. 1.3.9.

    module S1_3_9 where
    -- imports omitted for brevity

    details = Widhi (SutraNumber 1 3 9)
                    (TestFuncs selfTest condTest)

    selfTest :: TheEnv -> Bool
    selfTest _ = True

    condTest :: TheEnv -> ([Attribute String], Maybe Effect)
    condTest env = -- details omitted for brevity

    effects :: Effect
    effects env = -- details omitted for brevity

Listing 7: Sūtra module

5 The Ecosystem for Execution of Sūtras

The next sūtra applicable in a derivation takes its input from the result produced by the previous one.
Due to the siddha/asiddha relation between certain sūtras, it can generally be said that the input for the next sūtra may come from any of the previously generated results. This section discusses the ecosystem in which sūtras get their inputs.

5.1 Frames

As each sūtra applies in a derivation, some information about it is captured in a record known as a Frame. Some of this information, such as all the conflicts, is captured for reporting. The two important constituents of a Frame are the Sutra that was applied and the State it produced. As a derivation progresses, the output State of one sūtra becomes the input State of another.

Listing 8 shows Frame as an abstract datatype that expects two types, conflict and sutra, to create a concrete datatype. Frame (Competition Sutra) Sutra is a concrete realization of it which is, for convenience, called TheFrame. The Trace is merely a list of TheFrames; it is meant to show a step-by-step account of the derivation.

    data Frame conflict sutra = Frame
      { frConflicts :: [conflict]
      , frSutra     :: Maybe sutra
      , frOutput    :: State
      }

    type TheFrame = Frame (Competition Sutra) Sutra
    type Trace    = [TheFrame]

Listing 8: The Trace

5.2 Environment

To produce an effect, some sūtras look at indicators in what is fed to them as input. A sūtra such as सार्वधातुकार्धधातुकयोः (A. 7.3.84) is expected to convert an इक् letter at the end of the aṅga into a guṇa letter, provided that a sārvadhātuka or ārdhadhātuka pratyaya follows. If this sūtra is fed an input such as the one shown in Listing 9, all the necessary conditions can be found in it: there is an इक् at the end of the aṅga, followed by the sārvadhātuka pratyaya तिप्.

    -- input to A. 7.3.84
    [ Component (encode "भू") [("dhaatu", Nothing)] []
    , Component (encode "तिप्") [("saarwadhaatuka", Nothing), ("pratyaya", Nothing)] []
    ]

Listing 9: An input to sūtra A. 7.3.84

Since its required conditions are fulfilled, A. 7.3.84 produces an effect such as the one shown in Listing 10.

    -- output from A. 7.3.84
    [ Component (encode "भो") [("dhaatu", Nothing)] []
    , Component (encode "तिप्") [("saarwadhaatuka", Nothing), ("pratyaya", Nothing)] []
    ]

Listing 10: An output from sūtra A. 7.3.84

Thus, the input becomes an environment that the sūtras examine for triggering conditions. A sūtra may need to look past its own input into the inputs of previously triggered sūtras. Generalizing this, an environment consists of the outputs produced by all the sūtras in the derivation thus far, and the Trace data structure (see Section 5.1) becomes a very important constituent of the Environment.

There are also sūtras which produce an effect conditioned by what the speaker intends to say. वर्तमाने लट् (A. 3.2.123), for example, will be fed an input which may contain entities such as a dhātu along with the saṃjñās associated with it. However, the specific lakāra into which the dhātu must be cast can be known only from the intention of the speaker: unless it can be sensed that the speaker wishes to express a vartamāna form, A. 3.2.123 cannot be applied. Thus, some sūtras are conditioned on what is in the vivakṣā. As shown in Listing 11, Wiwakshaa is modelled as a list of name-value Tags, and the attr function is a helper which creates a Tag. The listing shows a vivakṣā for creating a third-person, singular, present-tense, active-voice form of some dhātu in samhitā mode.

    type Wiwakshaa = [Tag]

    egWiwakshaa = [ attr gana     "1"
                  , attr purusha  "1"
                  , attr wachana  "1"
                  , attr lakaara  "wartamaana"
                  , attr prayoga  "kartari"
                  , attr samhitaa "yes"
                  ]

Listing 11: Tags to describe vivakṣā

In the case of certain sūtras the triggering conditions remain intact forever, so such sūtras tend to get applied repeatedly. To allow them to apply only once, housekeeping attributes have to be maintained; the slfTest function of a sūtra can check these attributes and prevent reapplication. Consider वर्तमाने लट् (A. 3.2.123) once again. The vivakṣā continues to contain vartamāna, so A. 3.2.123 could be reapplied endlessly. While producing the effect of inserting laṭ, A. 3.2.123 could create a housekeeping tag, say “(3.2.123)”. If A. 3.2.123 were allowed to apply only in the absence of the housekeeping attribute “(3.2.123)”, reapplication could be controlled by a suitably coded slfTest function.

Such housekeeping is also part of the environment and, just like the Wiwakshaa, it is represented as a list of Tags. The complete representation of the Environment is shown in Listing 12. It is a parameterized type that expects a type t from which to create an environment; Environment Trace is a concrete type that is given the alias TheEnv.

    data Environment t = Env
      { envWiwakshaa :: Wiwakshaa
      , envHsekpg    :: Housekeeping
      , envTrace     :: t
      }

    type TheEnv = Environment Trace

Listing 12: The Environment

6 Phasing

Samuel Johnson said that language is the dress of thought. Indeed, Pāṇini’s generative grammar derives a correct utterance from an initial thought pattern. The seeds of the finished linguistic forms are sown very early in the process of derivation: morphemes are gradually introduced depending on certain conditions and are ultimately transformed into final speech forms. It seems that linguistic forms pass through definite stages. A crude approximation of the derivation process is: laying down the seed form from the semantic information in the vivakṣā, producing aṅgas, producing padas, and finally making some adjustments for the samhitā mode of utterance. At each step along the way, several sūtras may be able to apply; grammarians call this situation a prasaṅga.¹⁰ However, only one sūtra can be applied in a prasaṅga, and when the most suitable sūtra is applied it is said to have become pravṛtta.
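The housekeeping device of Section 5.2, together with the rule that only one sūtra becomes pravṛtta at each step, can be illustrated with a toy derivation loop. This is a hypothetical sketch rather than the authors’ code: the trace is reduced to a list of states (newest first), a state is a list of morpheme labels, condTest and the effect are merged into one function returning Maybe Env, and taking the first applicable rule simply stands in for real prioritization.

```haskell
type Tag = (String, Maybe String)

-- Simplified environment modelled loosely on Listing 12.
data Env = Env
  { envWiwakshaa :: [Tag]        -- what the speaker intends to express
  , envHsekpg    :: [Tag]        -- housekeeping tags
  , envTrace     :: [[String]]   -- states produced so far, newest first
  } deriving (Eq, Show)

data Rule = Rule
  { ruleName :: String
  , slfTest  :: Env -> Bool
  , effect   :: Env -> Maybe Env  -- condition check and effect merged
  }

currentState :: Env -> [String]
currentState env = case envTrace env of
  (s : _) -> s
  []      -> []

-- Hypothetical encoding of A. 3.2.123: if the vivaksha asks for the
-- present (wartamaana), append laT and leave the tag "(3.2.123)".
s3_2_123 :: Rule
s3_2_123 = Rule
  { ruleName = "3.2.123"
  , slfTest  = \env -> "(3.2.123)" `notElem` map fst (envHsekpg env)
  , effect   = \env ->
      if lookup "lakaara" (envWiwakshaa env) == Just (Just "wartamaana")
        then Just (env { envHsekpg = ("(3.2.123)", Nothing) : envHsekpg env
                       , envTrace  = (currentState env ++ ["laT"]) : envTrace env })
        else Nothing
  }

-- The same rule without its housekeeping guard.
s3_2_123Unguarded :: Rule
s3_2_123Unguarded = s3_2_123 { slfTest = const True }

-- One step: the rules with a prasanga are collected and the first of
-- them becomes pravrtta; Nothing when no rule has a prasanga.
step :: [Rule] -> Env -> Maybe (String, Env)
step rules env =
  case [ (ruleName r, e') | r <- rules, slfTest r env, Just e' <- [effect r env] ] of
    (x : _) -> Just x
    []      -> Nothing

-- Apply steps until no rule applies, returning the applied rule names.
derive :: [Rule] -> Env -> ([String], Env)
derive rules env = case step rules env of
  Nothing           -> ([], env)
  Just (name, env') -> let (names, final) = derive rules env'
                       in (name : names, final)

egEnv :: Env
egEnv = Env { envWiwakshaa = [("lakaara", Just "wartamaana")]
            , envHsekpg    = []
            , envTrace     = [["bhU"]] }
```

With the guard, derive stops after a single application of 3.2.123. The unguarded variant still has a prasaṅga after it has already inserted laT, which is exactly the endless reapplication that the housekeeping tag prevents.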
To keep the resolution of a prasaṅga relatively simple, sūtras that evidently belong to the later stages should not be applied early in the derivation, even if they have scope to apply.

Phasing is a method of minimizing the number of sūtras that participate in a prasaṅga. Saṃjñā sūtras which form the basis of certain adhikāra sūtras are deferred until later in the derivation process. For instance, परः संनिकर्षः संहिता (A. 1.4.109) creates the basis for the samhitāyām adhikāra, which begins from तयोर्य्वावचि संहितायाम् (A. 8.2.108). Similarly, यस्मात्प्रत्ययविधिस्तदादि प्रत्ययेऽङ्गम् (A. 1.4.13) applies the saṃjñā aṅga to something when there is a pratyaya after it.
The saṃjñā aṅga creates the basis for the adhikāra sūtra अङ्गस्य (A. 6.4.1). If the sūtras which apply such saṃjñās are suppressed at the beginning of the derivation and released subsequently, the vidhi sūtras operating within the corresponding adhikāras will not participate in an untimely prasaṅga. For example, phasing परः संनिकर्षः संहिता (A. 1.4.109) defers sūtras such as उदात्तादनुदात्तस्य स्वरितः (A. 8.4.66) until a later time.

¹⁰ See Abhyankar and Shukla (1961), pages 271 and 273.

A Phase has a name and contains a list of the sūtras that make up that phase. Listing 13 defines a Phase and creates a list called phases containing two phases, “pada” and “samhitaa”. As phases is defined, the pada formation phase (due to A. 1.4.14) and the samhitā formation phase (due to A. 1.4.109) will be deferred until a later time.

    data Phase a = Phase { phsName :: String  -- name of phase
                         , phsNums :: [a]     -- sutras in the phase
                         }

    phases = [ Phase "pada"     [SutraNumber 1 4 14]
             , Phase "samhitaa" [SutraNumber 1 4 109]
             ]

Listing 13: Definition of Phase

7 The Process of Derivation

The details of the Sutras are collected in a list called the ashtadhyayi. For brevity, a small representation is shown in Listing 14.

    ashtadhyayi :: [Sutra]
    ashtadhyayi = [ S1_3_9.details
                  , S1_3_78.details
                  , S1_4_99.details
                  , S1_4_100.details
                  ]

Listing 14: ashtadhyayi, a list of sūtras

Before the process of derivation begins, the sūtras that are part of some phase are removed from the ashtadhyayi. The generation of derived forms continues as long as applicable sūtras are found in the ashtadhyayi. When no sūtras are applicable any longer, the sūtras of a phase are added to the ashtadhyayi. Adding a phase back to the ashtadhyayi opens the possibility of new sūtras becoming applicable, and the derivation continues once again until no more sūtras are applicable. The process of adding back the sūtras of further phases continues until there are no phases left to add. Listing 15 shows this way of using phases to generate the derived form.

    import Data.List (foldl')

    -- remove phases from the ashtadhyayi
    ashtWithoutPhases = filterSutras (predByPhases allPhases) ashtadhyayi

    -- generate the word form using phases
    generateUsingPhases :: TheEnv -> [Sutra] -> [Phase Sutra] -> TheEnv
    generateUsingPhases env sutras phases =
        foldl' (gen sutras) newEnv phases
      where
        gen sutras env phase = generate env (phsNums phase ++ sutras)
        newEnv = generate env sutras

    -- the returned environment will contain the derivation
    finalEnv = generateUsingPhases env ashtWithoutPhases allPhases

Listing 15: Generation using phases

The function generateUsingPhases uses the generate function to actually advance the derivation. Given an Environment and a set of Sutras, generating a linguistic form consists of picking out all the applicable Sutras, that is, those that have a prasaṅga.
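The phase-wise procedure of Listings 13 and 15 can be tried out on a small, runnable toy. The sketch is hypothetical: rules are reduced to a sūtra number plus a function that adds one label to a list of labels, the first applicable rule stands in for prioritization, and the labels (including the toy rule for 3.2.123) are illustrative only. Following the prose description, each released phase is added to a pool that still contains the earlier phases.

```haskell
import Data.List (foldl')

-- Hypothetical toy rule: a sutra number and a partial update of the state.
data Rule = Rule { number :: String, apply :: [String] -> Maybe [String] }

data Phase = Phase { phsName :: String, phsNums :: [String] }

-- A rule that adds `new` once, provided every label in `needs` is present.
mkRule :: String -> [String] -> String -> Rule
mkRule n needs new = Rule n $ \st ->
  if all (`elem` st) needs && new `notElem` st then Just (st ++ [new]) else Nothing

-- Keep applying the first applicable rule until none applies.
generate :: [Rule] -> [String] -> [String]
generate pool st =
  case [ st' | r <- pool, Just st' <- [apply r st] ] of
    (st' : _) -> generate pool st'
    []        -> st

-- Rules named in any phase are withheld from the initial pool.
withoutPhases :: [Phase] -> [Rule] -> [Rule]
withoutPhases phs = filter (\r -> number r `notElem` concatMap phsNums phs)

-- Derive with the base pool, then release the phases one by one.
generateUsingPhases :: [Rule] -> [Phase] -> [String] -> [String]
generateUsingPhases allRules phs st0 =
  snd (foldl' release (base, generate base st0) phs)
  where
    base = withoutPhases phs allRules
    release (pool, st) ph =
      let pool' = pool ++ [ r | r <- allRules, number r `elem` phsNums ph ]
      in (pool', generate pool' st)

-- Toy rules, listed in Astadhyayi order.
rules :: [Rule]
rules =
  [ mkRule "1.4.13"  ["pratyaya"] "anga"
  , mkRule "1.4.14"  ["anga"]     "pada"
  , mkRule "1.4.109" []           "samhitaa"
  , mkRule "3.2.123" ["dhaatu"]   "pratyaya"
  , mkRule "8.4.66"  ["samhitaa"] "svarita"
  ]

phases :: [Phase]
phases = [Phase "pada" ["1.4.14"], Phase "samhitaa" ["1.4.109"]]
```

Without phasing, the unconditioned 1.4.109 fires first and samhitā precedes the formation of the aṅga; with the two phases withheld, it is released only after the pada stage.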
The Sutras are then prioritized so that only one Sutra can become pravṛtta, and the chosen Sutra is invoked to produce a new Environment. This process continues until no more Sutras apply, in which case the derivation steps are shown using traceShow. The function generate embodies this logic, as shown in Listing 16.

    import Data.Maybe (isJust)
    import Debug.Trace (traceShow)

    generate :: TheEnv -> [Sutra] -> TheEnv
    generate env sutras
      | null (envWiwakshaa env) || null (envTrace env) || null sutras = env
      | otherwise =
          if isJust chosen
            then generate newEnv sutras
            else traceShow sutras env
      where
        list   = choose env sutras
        chosen = prioritize env (testBattery env) list
        newEnv = invoke env chosen

Listing 16: Generation of a derived form

7.1 Choosing the Applicable sūtras

Given the list ashtadhyayi as defined in Section 7 and a starter Environment, every Sutra in the ashtadhyayi is tested for applicability. Listing 17 shows the applicability test. A Sutra is chosen if it passes a two-stage condition check. In the first stage, slfTest checks whether any Housekeeping attributes in the Environment prevent the Sutra from applying. If the first stage is passed, the second stage invokes the condTest function of the Sutra, which checks the Trace in the Environment for the conditions specific to that sūtra. If the conditions exist, condTest returns a collection of Tags and a function, say eff, that produces the effects. See Section 4.1 for more about slfTest and condTest.

The function choose, shown in Listing 17, uses the test described above. All Sutras in the ashtadhyayi for which test returns an effects function eff are collected and returned as a list. All sūtras in this list are applicable and have a prasaṅga.
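The following is a minimal sketch of the two-stage check and of choose. It relies on simplifying assumptions that are not the paper's:

- The trace is a newest-first list of forms (lists of Ints).
- Housekeeping is reduced to a list of blocked sūtra numbers.
- Tags are omitted.
- The helper names notBlocked, whenLastIs and mkSutra are hypothetical.

```haskell
import Data.Maybe (mapMaybe)

-- Hypothetical, simplified stand-ins for the paper's types.
type Form = [Int]

data Env = Env
  { blocked :: [Int]   -- housekeeping: sutra numbers barred from applying
  , frames  :: [Form]  -- trace, newest first
  }

data Sutra = Sutra
  { sutraNo  :: Int
  , slfTest  :: Env -> Bool                  -- stage 1: housekeeping
  , condTest :: Env -> Maybe (Form -> Form)  -- stage 2: effect if conditions hold
  }

-- Two-stage applicability test: housekeeping first, then the
-- sutra's own conditions on the trace.
test :: Env -> Sutra -> Maybe (Form -> Form)
test env s
  | null (frames env)   = Nothing
  | not (slfTest s env) = Nothing
  | otherwise           = condTest s env

-- Every sutra with a prasanga, paired with its effect function.
choose :: Env -> [Sutra] -> [(Sutra, Form -> Form)]
choose env = mapMaybe pick
  where
    pick s = fmap ((,) s) (test env s)

-- Housekeeping check: the sutra is not listed as blocked.
notBlocked :: Int -> Env -> Bool
notBlocked n env = n `notElem` blocked env

-- Condition: the latest form ends in k; if so, offer the effect.
whenLastIs :: Int -> (Form -> Form) -> Env -> Maybe (Form -> Form)
whenLastIs k eff env = case frames env of
  (latest : _) | not (null latest) && last latest == k -> Just eff
  _ -> Nothing

-- Build a toy sutra numbered n that fires when the form ends in k.
mkSutra :: Int -> Int -> (Form -> Form) -> Sutra
mkSutra n k eff = Sutra n (notBlocked n) (whenLastIs k eff)
```

As an example, take the Environment Env [2] [[0,1]] and three toy sūtras:

- mkSutra 1 1 (++ [2]) passes both stages.
- mkSutra 2 1 (++ [3]) fails the housekeeping stage.
- mkSutra 3 5 (++ [4]) fails its own condition.

Only sūtra 1 is returned by choose.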
This list of sūtras has to be prioritized so that only one sūtra is invoked.

    import Data.Maybe (fromJust, isJust)

    choose :: TheEnv -> [Sutra] -> [(Sutra, [Tag], Effect)]
    choose env ss = [fromJust r | r <- res, isJust r]
      where
        res = map appDetails ss
        appDetails :: Sutra -> Maybe (Sutra, [Tag], Effect)
        appDetails sut = case test env sut of
          (_, Nothing)      -> Nothing
          (conds, Just eff) -> Just (sut, conds, eff)

    test :: TheEnv -> Sutra -> ([Tag], Maybe Effect)
    test e s
      | null (envTrace e) = ([], Nothing)
      | otherwise = if slfTest testIfc e
                      then condTest testIfc e
                      else ([], Nothing)
      where testIfc = testing s

Listing 17: Choosing the applicable sūtras

8 Prioritizing sūtras

As the derivation progresses, many sūtras can acquire a prasaṅga, because their triggering conditions are satisfied in the Environment. The grammar must decide which sūtra becomes pravṛtta, and in doing so it must ensure that the derivation does not loop forever.

8.1 Avoiding Cycles

Of all the sūtras that have a prasaṅga, the one chosen to become pravṛtta, say Sc, may produce an Environment, say Ei, that already exists in the Trace. If Ei is reproduced, a cycle is introduced and the derivation does not terminate. While prioritizing, sūtras that would produce an already produced Environment must therefore be filtered out.

8.2 Competitions for Conflict Resolution

The chosen sūtras can be thought of as competing with one another. If there are n sūtras, there will be (n-1) competitions. The first and the second sūtra compete against each other, the winner of the two competes against the third, and so on until only one sūtra is left.
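The cycle filter and the knockout scheme just described can be sketched together. This is a minimal illustration and not the authors' prioritize function. A candidate is assumed to be a pair of a sūtra number and the form it would produce. The names dropCycles, knockout and laterWins are hypothetical. Keeping the current champion when a match ends in a draw is also an assumption, since the text does not say how draws are handled.

```haskell
-- Hypothetical sketch: a candidate is (sutra number, form it would produce).
data Result a = Draw | Winner a
  deriving (Eq, Show)

type Competition a = a -> a -> Result a

-- Section 8.1: drop candidates whose output already occurs in the trace.
dropCycles :: Eq f => [f] -> [(n, f)] -> [(n, f)]
dropCycles traceForms = filter (notSeen . snd)
  where
    notSeen f = f `notElem` traceForms

-- Section 8.2: n candidates need n-1 matches; the current champion
-- meets each challenger in turn. Keeping the champion on a Draw is an
-- assumption of this sketch.
knockout :: Competition a -> [a] -> Maybe a
knockout _ [] = Nothing
knockout match (first : rest) = Just (foldl play first rest)
  where
    play champion challenger = case match champion challenger of
      Winner w -> w
      Draw     -> champion

-- Toy competition in the spirit of paratva: the later number wins.
laterWins :: Ord n => Competition (n, f)
laterWins a b = case compare (fst a) (fst b) of
  GT -> Winner a
  LT -> Winner b
  EQ -> Draw
```

For example, take the candidates (3,[1,2]), (1,[1,3]) and (2,[1,4]) with a trace that already contains [1,2]. dropCycles discards sūtra 3, and knockout laterWins then returns Just (2,[1,4]) after a single match.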
The sūtra that triumphs becomes pravṛtta, for it is the strongest among all those that had a prasaṅga. Figure 1 shows this view of conflict resolution.

Figure 1: Competitions among sūtras (S1 to Sn are the competing sūtras).

We model a competition as a match between two entities, which can end in a draw or produce a winner. As shown in Listing 18, Conflict is defined as an abstract type that contains a Resolution. The Resolution gives the Result and provides for noting a Reason for that specific outcome of the match. The actual competition is represented as a function that takes two Sutras and produces a Resolution, as shown in Listing 19.

    {- The following are abstract types. Concrete realizations such as
       'Conflict Sutra' and 'Resolution Sutra' are used in code. -}
    data Conflict a   = Conflict a a (Resolution a)
    data Resolution a = Resolution (Result a) Reason
    data Result a     = Draw | Winner a
    type Reason       = String

Listing 18: The Conflict between sūtras

    type Competition a = Sutra -> Sutra -> Resolution Sutra

Listing 19: Match between sūtras

8.3 Competitions and Biases

A sūtra Si is eliminated as soon as it loses to another sūtra Sj, and it never participates in any other competition in that prasaṅga. One might object to this way of conducting competitions, arguing that Si could later have debarred another sūtra Sk and should therefore be kept in the fray. Indeed, the objector could claim that all sūtras must compete with one another before any sūtra can become pravṛtta. The objector's method of holding competitions would be useful if the competition between Si and Sk produced a random winner every time. In fact, the competitions in this grammar are biased.
They do not give both sūtras an equal chance of winning, and this is intentional in the design of Pāṇini’s grammar. In the presence of such biases, there is no point in conducting fair competitions, so the proposed method of holding competitions should be acceptable.

The biases are introduced by the maxim पूर्वपरनित्यान्तरङ्गापवादानामुत्तरोत्तरं बलीयः.¹¹ One sūtra is stronger than another by way of four tests, namely paratva, nityatva, antaraṅgatva and apavādatva.

- Paratva: of any two sūtras, the one placed later in the Aṣṭādhyāyī wins.
- Nityatva: of two competing sūtras, the winner is the one that has a prasaṅga even if the other is applied first.
- Antaraṅgatva: of two conflicting sūtras, the winner is the one that relies on relatively fewer nimittas (conditions), or on nimittas that are internal to something.
- Apavādatva: of two conflicting sūtras, a special sūtra wins over a general one, to prevent niravakāśatva (total inapplicability) of the special sūtra.

Each later test in this sequence is a stronger determiner of the winning sūtra than the ones before it. The test of apavādatva has the highest priority and paratva the lowest. If apavādatva produces a winner, the other three tests need not be applied. If antaraṅgatva produces a winner, the remaining two tests need not be administered. If nityatva produces a winner, we need not check paratva.

¹¹ See Kielhorn (1985): 19, paribhāṣā 38.

    testBattery :: TheEnv -> [Competition Sutra]
    testBattery e
      | null (envTrace e) = []
      | otherwise = [scareTest e, apawaada e, antaranga e, nitya e, para e]

    -- various tests; details not shown for brevity
    scareTest :: TheEnv -> Competition Sutra
    para      :: TheEnv -> Competition Sutra
    nitya     :: TheEnv -> Competition Sutra
    antaranga :: TheEnv -> Competition Sutra
    apawaada  :: TheEnv -> Competition Sutra

Listing 20: A battery of tests to choose a winning sūtra
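The following is a minimal sketch of how such a battery can be administered: the first test that names a winner decides the match. It assumes sūtra numbers are (adhyāya, pāda, sūtra) triples compared lexicographically. The names runBattery, scare, apavada, undecided, para and battery are hypothetical. The saṃjñā-first test mirrors the position of scareTest in Listing 20. The antaraṅga and nitya tests depend on the Environment and are stubbed out as draws here.

```haskell
-- Hypothetical sketch (not the paper's implementation): sutra numbers
-- are (adhyaya, pada, sutra) triples, compared lexicographically.
type SutraNo = (Int, Int, Int)

data Kind = Samjnyaa | Other
  deriving (Eq, Show)

data Sutra = Sutra { num :: SutraNo, kind :: Kind }
  deriving (Eq, Show)

data Result = Draw | Winner Sutra
  deriving (Eq, Show)

type Competition = Sutra -> Sutra -> Result

-- Administer the tests strongest first; the first test that names a
-- winner decides the match, and (lazily) the later tests are not run.
runBattery :: [Competition] -> Competition
runBattery tests a b =
  case [w | t <- tests, Winner w <- [t a b]] of
    (w : _) -> Winner w
    []      -> Draw

-- Saṃjñā-first test: a samjnya sutra beats any other sutra.
scare :: Competition
scare a b
  | kind a == Samjnyaa = Winner a
  | kind b == Samjnyaa = Winner b
  | otherwise          = Draw

-- apavada: pairs are (general, special); the special sutra wins.
apavada :: [(SutraNo, SutraNo)] -> Competition
apavada pairs a b
  | (num b, num a) `elem` pairs = Winner a
  | (num a, num b) `elem` pairs = Winner b
  | otherwise                   = Draw

-- Placeholder for antaranga and nitya, which need the Environment.
undecided :: Competition
undecided _ _ = Draw

-- para: the sutra placed later in the Astadhyayi wins.
para :: Competition
para a b = case compare (num a) (num b) of
  GT -> Winner a
  LT -> Winner b
  EQ -> Draw

-- Same order as testBattery in Listing 20.
battery :: [(SutraNo, SutraNo)] -> [Competition]
battery pairs = [scare, apavada pairs, undecided, undecided, para]
```

As an example, take a hypothetical pair in which the general sūtra (2,1,1) is placed after its special sūtra (1,1,1). The apavāda test makes (1,1,1) win, even though paratva alone would have chosen (2,1,1). Without that pair, the match falls through to paratva.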
In the worst case, all four tests have to be applied one after another to a pair of conflicting sūtras. To find the winner between two competing sūtras, a prioritization function administers a battery of tests, beginning with apavādatva (see Listing 20).

    scareTest :: TheEnv -> Sutra -> Sutra -> Resolution Sutra
    scareTest _ s1@(Samjnyaa _ _) _ = Resolution (Winner s1) \"SCARE\"
    scareTest _ _ s2@(Samjnyaa _ _) = Resolution (Winner s2) \"SCARE\"
    scareTest _ _ _                 = Resolution Draw \"SCARE\"

Listing 21: SCARE test

8.4 Sūtra-Conflict Assessment and Resolution Extension (SCARE)

The maxim पूर्वपरनित्यान्तरङ्गापवादानामुत्तरोत्तरं बलीयः introduces the four methods of conflict resolution explained in Section 8.3. In the tradition, these methods presuppose that the saṃjñās have already been applied, by resorting to either kāryakālapakṣa or yathoddeśapakṣa. Computation differs from the tradition in that the assumptions made in the traditional approach have to be executed explicitly. In a computational Aṣṭādhyāyī, the methods introduced by the maxim will work only if all saṃjñās have already applied.
Somehow, the saṃjñās need to apply before any of the methods in the maxim are applied. SCARE therefore ranks saṃjñā sūtras above every other type of sūtra: when any other type of sūtra competes with a saṃjñā sūtra, the saṃjñā sūtra wins. When saṃjñā sūtras compete with each other, all of them eventually get a chance to apply. Since SCARE has to distinguish saṃjñā sūtras from non-saṃjñā sūtras, the Sutra datatype has an explicit value constructor called Samjnyaa. Listing 21 shows the implementation of scareTest.

8.5 Determination of apavāda sūtras

An apavāda relationship holds between a pair of sūtras when the general provision of one sūtra is overridden by the special provision of another. An arbitrary pair of sūtras need not stand in an apavāda relation; there has to be a reason for the relationship to exist. The niṣedha and niyama sūtras prescribe something that can run counter to what other sūtras say, and they are therefore called apavādas of those sūtras.

A niṣedha sūtra, such as न विभक्तौ तुस्माः (A. 1.3.4), is considered an apavāda of a general one such as हलन्त्यम् (A. 1.3.3). A niṣedha sūtra directly advises against taking an action suggested by another sūtra.¹² Another type of rule, the niyama sūtra, regulates some operation laid down by another rule.¹³ A niyama sūtra such as धातोः तन्निमित्तस्य एव (A. 6.1.80) is considered an apavāda of a more general rule such as वान्तो यि प्रत्यये (A. 6.1.79).

Wherever an apavāda relationship holds between two sūtras, it is static and unidirectional. There can also be apavādas of apavādas. Since the corpus of Pāṇini’s rules is well known, the apavāda relationships can be worked out manually to build a mapping of the apavādas. Thus, A. 1.3.4 is considered an apavāda of A. 1.3.3, and A. 6.1.80 an apavāda of A. 6.1.79.

Listing 22 shows how apavādas are set up. The datatype Apawaada records a main sūtra together with all of its apavādas. allApawaadas is a list of Apawaadas, which is converted into a map, apawaadaMap, for faster access. Function isApawaada checks whether sūtra s1 is an apavāda of sūtra s2.
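Below is a small variant of Listing 22, offered as a sketch rather than the authors' code. It records both pairs named in the text. It builds the map with Data.Map.fromListWith (++), so that two Apawaada entries for the same main sūtra are merged. With M.fromList, the later entry would silently replace the earlier one. It also adds a hypothetical apawaadaTest that returns a Draw when neither sūtra is an apavāda of the other. Its Result type is simplified and carries no Reason.

```haskell
import qualified Data.Map as M

data SutraNumber = SutraNumber Int Int Int
  deriving (Eq, Ord, Show)

-- A main sutra together with all of its apavadas.
data Apawaada = Apawaada
  { apwOf        :: SutraNumber
  , apwApawaadas :: [SutraNumber]
  }

-- The two relationships discussed in the text.
allApawaadas :: [Apawaada]
allApawaadas =
  [ Apawaada (SutraNumber 1 3 3) [SutraNumber 1 3 4]
  , Apawaada (SutraNumber 6 1 79) [SutraNumber 6 1 80]
  ]

-- fromListWith (++) merges entries that repeat the same main sutra.
apawaadaMap :: M.Map SutraNumber [SutraNumber]
apawaadaMap =
  M.fromListWith (++) [(apwOf a, apwApawaadas a) | a <- allApawaadas]

-- Is sutra s1 an apavada of s2?
isApawaada :: SutraNumber -> SutraNumber -> Bool
isApawaada s1 s2 = s1 `elem` M.findWithDefault [] s2 apawaadaMap

data Result = Draw | Winner SutraNumber
  deriving (Eq, Show)

-- The relation is one-directional, so both orders are checked.
apawaadaTest :: SutraNumber -> SutraNumber -> Result
apawaadaTest a b
  | isApawaada a b = Winner a
  | isApawaada b a = Winner b
  | otherwise      = Draw
```

With this map, apawaadaTest (SutraNumber 1 3 3) (SutraNumber 1 3 4) names A. 1.3.4 as the winner. A pair with no recorded relationship, such as A. 1.3.3 and A. 6.1.79, ends in a Draw.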
The apawaada function in Listing 20 uses isApawaada to return the apavāda as the winner. If no apavāda relationship exists between the two sūtras, it returns a draw.

9 Visibility

The technical term siddha means that the output of one sūtra A is visible to, and can possibly trigger, another sūtra B; the effects of A are visible to B. The sūtras of the tripādī are not siddha in the sapādasaptādhyāyī. This means that even if sūtras from the tripādī have applied in a derivation, their results are not visible to the sūtras of the sapādasaptādhyāyī. In other words, the State produced by certain sūtras cannot trigger sūtras in a specific region of the Aṣṭādhyāyī.

Listing 23 shows visibility as dictated by पूर्वत्रासिद्धम् (A. 8.2.1). The entire derivation so far is captured as a Trace containing frames Fn through F1.
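The visibility rule of A. 8.2.1 can be captured by a single predicate. The sketch below is a simplification, not the authors' code, and rests on three assumptions:

- Sūtra numbers are (adhyāya, pāda, sūtra) triples compared lexicographically.
- The trace is a newest-first list in which every frame records the sūtra that produced it.
- The names isVisibleTo and latestVisible are hypothetical.

Under this reading, the output of sūtra t is visible to sūtra s when t precedes the tripādī, or when t does not come after s.

```haskell
-- Hypothetical simplification: sutra numbers are (adhyaya, pada, sutra)
-- triples, and Haskell compares tuples lexicographically.
type SutraNo = (Int, Int, Int)

-- First sutra of the tripadi (A. 8.2.1).
tripadiStart :: SutraNo
tripadiStart = (8, 2, 1)

-- Is the output of sutra t visible (siddha) to sutra s?
-- Output from before the tripadi is visible everywhere; tripadi output
-- is invisible to the sapadasaptadhyayi and to earlier tripadi sutras.
isVisibleTo :: SutraNo -> SutraNo -> Bool
isVisibleTo s t = t < tripadiStart || t <= s

-- Latest frame (newest first) whose producing sutra is visible to s.
latestVisible :: SutraNo -> [(SutraNo, form)] -> Maybe (SutraNo, form)
latestVisible s frames =
  case filter (isVisibleTo s . fst) frames of
    (f : _) -> Just f
    []      -> Nothing
```

For instance, consider a trace whose frames, newest first, were produced by A. 8.4.66, A. 8.2.30 and A. 6.1.77. A sūtra from the sapādasaptādhyāyī such as A. 6.1.101 sees only the frame of A. 6.1.77. A. 8.3.59 skips A. 8.4.66 and sees the frame of A. 8.2.30.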
The problem is this: given a sūtra, say Si, we need the latest Frame Fj that contains a sūtra Sk whose output State is visible to Si.

- If Si is from the sapādasaptādhyāyī, frames at the head of the Trace are skipped as long as they contain sūtras from the tripādī, because such frames are asiddha for Si.
- If Si is from the tripādī, frames are skipped as long as the sūtra in the frame has a higher number than Si, because within the tripādī later sūtras are asiddha to earlier ones.

This logic is implemented in the function visibleFrame.

    visibleFrame :: Sutra -> Trace -> Maybe TheFrame
    visibleFrame _ [] = Nothing
    visibleFrame s (f@(Frame _ (Just s1) _ _) : fs) =
      if curr < s8_2_1
        then if top < s8_2_1
               then Just f
               else visibleFrame s fs
        else if top > curr
               then visibleFrame s fs
               else Just f
      where
        s8_2_1 = SutraNumber 8 2 1
        curr   = number s
        top    = number s1

Listing 23: Visibility

    import qualified Data.Map as M

    data Apawaada = Apawaada
      { apwOf        :: SutraNumber
      , apwApawaadas :: [SutraNumber]
      }

    allApawaadas = [ Apawaada (SutraNumber 1 3 3) [SutraNumber 1 3 4] ]

    apawaadaMap = M.fromList [ (apwOf a, apwApawaadas a) | a <- allApawaadas ]

    -- | Is sutra s1 an apawaada of s2?
    isApawaada :: SutraNumber -> SutraNumber -> Bool
    isApawaada s1 s2 = case M.lookup s2 apawaadaMap of
      Just sns -> s1 `elem` sns
      _        -> False

Listing 22: Setting up apavādas

¹² पूर्वसूत्रकार्यनिषेधकसूत्रं निषेधसूत्रम् ।
¹³ सिद्धे सति आरभ्यमाणो विधिः नियमाय कल्पते ।

10 Conclusion and Future Work

A computational Aṣṭādhyāyī can potentially become a good pedagogical resource for teaching the grammatical aspects of Sanskrit. As a building block, it can be used to build other systems such as morphological analyzers and dependency parsers.

From an implementation perspective, resorting to kāryakālapakṣa allows paribhāṣā and adhikāra sūtras to be merged into the logic of vidhi or niyama sūtras. When the derivation is displayed after it is completed, the vidhi and niyama sūtras concerned can always list the numbers of the paribhāṣā and adhikāra sūtras with which they have been merged.
This allows faster development of the computational Aṣṭādhyāyī, without having to implement seemingly trivial sūtras in the paradigm used for implementing sūtras, as noted in Section 4.2.

It could be suboptimal to represent all grammatical entities as Components having Warnas and Tags, since this increases the number of Tags applied to Components. For example, since pratyayas are expressed as Components, a ‘pratyaya’ Tag has to be applied. Functional languages have extremely powerful type systems, and to leverage the type system, Components could be implemented as typed grammatical entities. For instance, Component could have additional value constructors for Upasarga, Dhaatu and Pratyaya, to name a few.

Abstracting the input as vivakṣā does away with the need to apply heuristics to determine what needs to be derived. However, our choice of representing Wiwakshaa as a simple list of Tags is an oversimplification. The vivakṣā could be a complex psycholinguistic artifact containing elements such as the kārakas, hints for using specific dhātus and the argument structure of the dhātu, and it may call for a sophisticated data structure. A thorough study of the semantic aspects of the Aṣṭādhyāyī is necessary to know what vivakṣā may look like in its entirety.

In the बाधबीजप्रकरणम्, Kielhorn (1985) discusses many variations under each of the four methods introduced in the para-nitya paribhāṣā.
Those variations should be plugged into the framework discussed in Section 8. Yet there may be derivations in which the maxim पूर्वपरनित्यान्तरङ्गापवादानामुत्तरोत्तरं बलीयः is not honoured, and a better way is required to resolve sūtra conflicts in their totality. Effects such as vipratiṣedha and pūrvavipratiṣedha also need to be included in SCARE.

References

Abhyankar, K. V. and J. M. Shukla. 1961. A Dictionary of Sanskrit Grammar. Oriental Institute, Baroda.
Ajotikar, Tanuja, Anuja Ajotikar, and Peter M. Scharf. 2016. “Some issues in formalizing the Aṣṭādhyāyī”. In: Sanskrit and Computational Linguistics, Select papers presented in the ‘Sanskrit and the IT World’ Section at the 16th World Sanskrit Conference (28 June – 2 July 2015), Bangkok, Thailand. Ed. by Amba Kulkarni. DK Publishers Distributors Pvt. Ltd (New Delhi), pp. 103–124. isbn: 978-81-932319-0-6.
Goyal, Pawan, Amba P. Kulkarni, and Laxmidhar Behera. 2008. “Computer Simulation of Astadhyayi: Some Insights”. In: Sanskrit Computational Linguistics, First and Second International Symposia, Rocquencourt, France, October 29–31, 2007, Providence, RI, USA, May 15–17, 2008, Revised Selected and Invited Papers. Ed. by Gérard P. Huet, Amba P. Kulkarni, and Peter M. Scharf. Springer, pp. 139–161. doi: 10.1007/978-3-642-00155-0_5. url: http://dx.doi.org/10.1007/978-3-642-00155-0_5.
Hyman, Malcolm D. 2009. “From pāṇinian sandhi to finite state calculus”. In: Sanskrit Computational Linguistics. Springer, pp. 253–265.
Kielhorn, F. 1985. Paribhāṣenduśekhara of Nāgojībhaṭṭa. Parimala Publications, Delhi.
Mishra, Anand. 2008. “Simulating the Paninian System of Sanskrit Grammar”. In: Sanskrit Computational Linguistics, First and Second International Symposia, Rocquencourt, France, October 29–31, 2007, Providence, RI, USA, May 15–17, 2008, Revised Selected and Invited Papers. Ed. by Gérard P. Huet, Amba P. Kulkarni, and Peter M. Scharf. Springer, pp. 127–138.
Mishra, Anand. 2009. “Modelling the Grammatical Circle of the Paninian System of Sanskrit Grammar”. In: Sanskrit Computational Linguistics, Third International Symposium, Hyderabad, India, January 15–17, 2009. Proceedings. Ed. by Amba P. Kulkarni and Gérard P. Huet. Springer, pp. 40–55.
Mishra, Anand. 2010. “Modelling Astadhyayi: An Approach Based on the Methodology of Ancillary Disciplines (Vedanga)”. In: Sanskrit Computational Linguistics, 4th International Symposium, New Delhi, India, December 10–12, 2010. Proceedings. Ed. by Girish Nath Jha. Springer, pp. 239–258.
Patel, Dhaval and Shivakumari Katuri. 2016. “Prakriyāpradarśinī - An open source subanta generator”. In: Sanskrit and Computational Linguistics, Select papers presented in the ‘Sanskrit and the IT World’ Section at the 16th World Sanskrit Conference (28 June – 2 July 2015), Bangkok, Thailand. Ed. by Amba Kulkarni. DK Publishers Distributors Pvt. Ltd (New Delhi), pp. 195–221. isbn: 978-81-932319-0-6.
Scharf, P. 2009. “Rule selection in the Aṣṭādhyāyī or Is Pāṇini’s grammar mechanistic”. In: Proceedings of the 14th World Sanskrit Conference, Kyoto University, Kyoto.
Scharf, Peter M. 2009. “Modeling pāṇinian grammar”. In: Sanskrit Computational Linguistics. Springer, pp. 95–126.
Scharf, Peter M. 2016. “An XML formalization of the Aṣṭādhyāyī”. In: Sanskrit and Computational Linguistics, Select papers presented in the ‘Sanskrit and the IT World’ Section at the 16th World Sanskrit Conference (28 June – 2 July 2015), Bangkok, Thailand. Ed. by Amba Kulkarni. DK Publishers Distributors Pvt. Ltd (New Delhi), pp. 77–102. isbn: 978-81-932319-0-6.
Sohoni, Samir Janardan and Malhar A. Kulkarni. 2016. “Character Encoding for Computational Aṣṭādhyāyī”. In: Sanskrit and Computational Linguistics, Select papers presented in the ‘Sanskrit and the IT World’ Section at the 16th World Sanskrit Conference (28 June – 2 July 2015), Bangkok, Thailand. Ed. by Amba Kulkarni. DK Publishers Distributors Pvt. Ltd (New Delhi), pp. 125–155. isbn: 978-81-932319-0-6.
Vedantakeshari, Swami Prahlad Giri. 2001. Pāṇiniya Aṣṭādhyāyī Sūtrapāṭha. Krishnadas Academy, Varanasi. 2nd edition.

PAIAS: Pāṇini Aṣṭādhyāyī Interpreter As a Service
Sarada Susarla, Tilak M. Rao and Sai Susarla

Abstract: It is widely believed that Pāṇini’s Aṣṭādhyāyī is the most accurate grammar and word-generation scheme there is for any natural language. Several researchers have attempted to validate this hypothesis by analyzing the Aṣṭādhyāyī’s sūtra system from a computational or algorithmic angle, and many have attempted to emulate its word-generation scheme. However, prior work has succeeded only in taking small subsets of the Aṣṭādhyāyī pertaining to specific constructs and manually coding their logic for linguistic analysis. Another school of thought holds that the Aṣṭādhyāyī itself (along with its associated corrective texts) constitutes a complete, unified, self-describing solution for word generation (kṛt, taddhita), compounding (samāsa) and euphonic combination (sandhi).
In this paper, we describe our ongoing effort to directly compile and interpret the Aṣṭādhyāyī’s sūtra corpus (with its associated data sets) in order to automate its prakṛti-pratyaya-based word transformation methodology, leaving out kārakas. We have created three components:

- a custom machine-interpretable language in JSON for the Aṣṭādhyāyī;
- a Python-based compiler that automatically converts Aṣṭādhyāyī sūtras into that language;
- an interpreter that reproduces the Aṣṭādhyāyī’s prakriyā for term definitions, meta-rules and vidhis.

Such an interpreter is of great value for analyzing the generative capability of Pāṇinian grammar, for assessing its completeness or anomalies, and for assessing the contributions of various commentaries to the original methodology. We avoid manually supplying any data that can be derived directly from the Aṣṭādhyāyī. Unlike existing work aimed at fast interpretation of rules, we focus initially on fidelity to the Aṣṭādhyāyī. We started with a well-annotated online Aṣṭādhyāyī resource. We are able to automatically enumerate the character sequences denoted by the saṃjñās defined in the Aṣṭādhyāyī, and to determine which paribhāṣā sūtras apply to which vidhi sūtras.
We are in the process of developing a generic rūpa-siddhi engine that starts from a prakṛti-pratyaya sequence. Our service, named PAIAS¹, provides programmatic access to the Aṣṭādhyāyī, its data sets and their interpretation via an open RESTful API for third-party tool development.

1 Introduction

There is growing interest and activity in applying computing technology to unearth the knowledge content of India’s heritage literature, especially in the Saṃskṛt language. This has led to several research efforts to produce analysis tools for Saṃskṛt language content at various levels: text, syntax, semantics and meaning (Goyal, Huet, et al. 2012; Oliver Hellwig 2009; Huet 2002; Kulkarni 2016; Kumar 2012). The word-generating flexibility and modular nature of Saṃskṛt grammar make it at once simpler and more difficult to produce a comprehensive dictionary for the language. It is simpler because numerous variants of words can be generated automatically. It is more difficult because the unbounded nature of Saṃskṛt vocabulary makes a comprehensive static dictionary impractical. Yet a dictionary is essential for the linguistic analysis of Saṃskṛt documents. Pāṇini’s Aṣṭādhyāyī comes to the rescue by offering a procedural basis for word generation and compounding, from which a dynamic, semi-automated dictionary can be produced.
Aṣṭādhyāyī is considered a monumental work in terms of its ability to codify the conventions governing the usage of a natural language into precise, self-contained generative rules. Ever since the advent of computing, researchers have been trying to automate the rule engine of Aṣṭādhyāyī to realize its potential. However, due to the sheer size of the rule corpus and its complexity, to date only specific subsets of its rule base have been digested manually to produce word-generation tools pertaining to specific grammar constructs Goyal, Huet, et al. (2012), Krishna and Goyal (2015), Patel and Katuri (2016), and Scharf and Hyman (2009). This approach, however, limits the tools' coverage of numerous word forms and hence their usefulness for syntax analysis of the vast Saṃskṛt corpus. Interpreting Pāṇini's Aṣṭādhyāyī as separate subsets is complex and unnatural due to intricate interdependencies among rules and their triggering conditions. Pāṇini's Aṣṭādhyāyī has a more modular, unified mechanism (prakriyā)² for word generation via rules for joining prakṛti (stems) with numerous pratyayas based on the word sense required. Most aspects of the joining mechanism are common across euphonic combination (sandhi), compounding (samāsa) and new word derivation (e.g., kṛt and taddhita forms).

¹ Pronounced like 'payas', meaning milk.
² In this paper, we use the IAST convention for Sanskrit words.
However, commentaries such as Siddhānta Kaumudī (SK) have arranged the rules for the purpose of human understanding of specific grammatical constructs. For the purpose of computational tools, we believe the direct interpretation of Pāṇini's Aṣṭādhyāyī offers a more natural and automatable approach than SK-based approaches.

With this view, we have taken up the problem of relying solely on Aṣṭādhyāyī and its associated sūtras for deriving all Saṃskṛt grammatical operations of word transformation (or rūpa-siddhi). Our approach is to compile the Aṣṭādhyāyī sūtra text into executable rules automatically (incorporating interpretations made by commentaries), and to minimize the manual coding work to be done per sūtra. We have built a web-based service called PAIAS with a RESTful API for programmatic access to the Aṣṭādhyāyī engine (i.e., the sūtra corpus and its associated data sets) and to enable its execution for word transformation and other purposes. We adopted a service-oriented architecture to cleanly isolate functionality from end-user presentation, so that numerous tools and presentation interfaces can evolve for Saṃskṛt grammar, employing appropriate programming languages.

In this paper, we describe our ongoing work and its approach, and the specific results obtained so far. In section 2, we set the context by contrasting our approach with relevant earlier work. In section 3, we give an overview of the project's goals and guiding design principles.
In section 4, we describe how we prepared the source Aṣṭādhyāyī for use in PAIAS. In section 5, we explain our methodology for Aṣṭādhyāyī interpretation, including the high-level workflow, sūtra compilation scheme and rule interpreter. In section 6, we outline how PAIAS enumerates several entity sets referred to throughout Aṣṭādhyāyī. In section 7.3, we describe our methodology to interpret paribhāṣā sūtras. In section 8, we provide details of our implementation and its status. In section 9, we illustrate the operation of the interpreter by showing how our engine automatically expands pratyāhāras. Finally, we conclude and outline future work in Section 10.

2 Related Work

Aṣṭādhyāyī and its interpretation for Saṃskṛt grammatical analysis and word synthesis have been studied extensively Goyal, Huet, et al. (2012), Goyal, Kulkarni, and Behera (2008), Krishna and Goyal (2015), Patel and Katuri (2016), Satuluri and Kulkarni (2013), Scharf and Hyman (2009), and Subbanna and Varakhedi (2010). For the purpose of this paper, we assume the reader is familiar with Pāṇini's Aṣṭādhyāyī and its various concepts relevant to computational modeling. For a good overview of those concepts, the reader is referred to earlier publications Goyal, Kulkarni, and Behera (2008) and Petersen and Oliver Hellwig (2016).
In their Aṣṭādhyāyī 2.0 project, Petersen and Oliver Hellwig (2016) have developed a richly annotated electronic representation of Aṣṭādhyāyī that makes it amenable to research and machine-processing. We have achieved a similar objective via manual splitting of sandhis and word-separation within compounds, and by developing a custom vibhakti analyzer for detecting word recurrence across vibhakti and vacana variations.

Petersen and Soubusta (2013) have developed a digital edition of the Aṣṭādhyāyī. They have created a relational database schema and web interface to support custom views and sophisticated queries. We opted for a hierarchical key-value structure (JSON) to represent Aṣṭādhyāyī, as it enables a more natural navigational exploration of the text than a relational model does. We feel that the Aṣṭādhyāyī is small enough to fit in the DRAM of modern computers, making the efficiency benefits of the relational model less relevant. We used a document database (MongoDB) due to the schema flexibility and extensibility it offers, along with powerful navigational queries. Scharf (2016) developed perhaps the most comprehensive formalization to date of Pāṇini's grammar system, including the Aṣṭādhyāyī, in XML format with the express purpose of assisting the development of automated interpreters. In his formalization, the rules are manually encoded in a pre-interpreted form by spelling out the conditions, contexts, and actions of each rule. In contrast, our attempt is to derive those from the rule's text itself.
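To illustrate the kind of navigational access a hierarchical JSON layout affords, the following is a minimal sketch in plain Python over an abridged copy of the entry for sūtra 1.2.10 shown in Table 2 (Section 4). Keys follow the spreadsheet's APSSS convention (one digit for the adhyāya, one for the pāda, three for the sūtra). The helper names `dotted`, `anuvrtti_sources` and `sutras_inheriting` are hypothetical and are not part of the PAIAS API; the record keeps only the fields the sketch uses.

```python
import json

# Abridged copy of the Table 2 entry for sūtra 1.2.10, keyed by its APSSS id.
ASHTADHYAYI = json.loads("""
{
  "12010": {
    "sutra_text": "halantAchcha |",
    "Anuvrtti": [
      {"sutra": 12005, "padas": ["kit"]},
      {"sutra": 12008, "padas": ["san"]},
      {"sutra": 12009, "padas": ["ikaH", "jhal"]}
    ]
  }
}
""")


def dotted(apsss) -> str:
    """Convert an APSSS id such as 12010 into the dotted form '1.2.10'."""
    s = str(apsss)
    return f"{s[0]}.{s[1]}.{int(s[2:])}"


def anuvrtti_sources(sutra_key: str) -> dict:
    """Map each pada inherited by a sūtra to the sūtra it comes from."""
    return {pada: dotted(src["sutra"])
            for src in ASHTADHYAYI[sutra_key].get("Anuvrtti", [])
            for pada in src["padas"]}


def sutras_inheriting(pada: str) -> list:
    """List (dotted ids of) sūtras that inherit `pada` through anuvṛtti."""
    return [dotted(key) for key, entry in ASHTADHYAYI.items()
            if any(pada in src["padas"] for src in entry.get("Anuvrtti", []))]


print(anuvrtti_sources("12010"))
print(sutras_inheriting("ikaH"))
```

If each sūtra entry were stored as its own MongoDB document, a comparable filter could use MongoDB's dot notation over arrays of embedded documents, e.g. `{"Anuvrtti.padas": "ikaH"}`; that collection layout is an assumption here, not a description of PAIAS.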
Scharf's encoded rule information enables validation of our rule interpretation. We could also leverage his other databases that form part of Pāṇini's grammar ecosystem.

The first step to interpret the Aṣṭādhyāyī is to understand the terms and metarules that Pāṇini defines in the text itself. T. Ajotikar, A. Ajotikar, and Scharf (2015) explain some of Pāṇini's techniques that an interpreter needs to incorporate, and illustrate how Scharf (2016) captures them. Unlike earlier efforts at processing Aṣṭādhyāyī that have manually enumerated the terms and their definitions, including pratyāhāras Mishra (2008), our approach is to extract them from the text itself automatically.

Several earlier efforts attempted to highlight and emulate various techniques used in Aṣṭādhyāyī for specific grammatical purposes. They typically select a particular subset of Aṣṭādhyāyī's engine and code its semantics manually to reproduce a specific prakriyā. For brevity, we only discuss the most recent work that comes close to ours. Krishna and Goyal (2015) have built an object-oriented class hierarchy to mimic the inheritance structure of Aṣṭādhyāyī rules. They have demonstrated this approach for generating derivative nouns.
Our goal, and hence our approach, differ in two ways: we aim to interpret Aṣṭādhyāyī sūtras faithfully as opposed to achieving specific noun and verb forms, and to mechanize the process of converting sūtras into executable code to the extent possible. However, the learnings and insights from earlier work on interpreting Aṣṭādhyāyī Mishra (2008) apply to our work as well, and hence can be incorporated into our engine.

Patel and Katuri (2016) have built a subanta generator that imitates the method given by Siddhānta Kaumudī. Their unique contribution is a way to order the sūtras for more efficient interpretation. However, they also encode the semantic meaning of individual sūtras manually, and do not suggest a method to mechanize sūtra interpretation from its text directly. Satuluri and Kulkarni (2013) have attempted to generate samāsa compounds by emulating the relevant subset of Aṣṭādhyāyī. Subbanna and Varakhedi (2010) have emulated the exception model used in Aṣṭādhyāyī. For this, they have emulated a small subset of its sūtras relevant to that aspect.

3 Design Goals and Scope

The objective of our project is to develop a working interpreter for Pāṇini's Aṣṭādhyāyī that emulates its methodology faithfully by mechanizing the interpretation of sūtra text as much as possible.
To guide our design, we set the following principles:

Fidelity: We focus on reproducing the prakriyā of the Aṣṭādhyāyī sūtra corpus (taking the semantic adjustments from relevant vyākhyānas as appropriate). For now, we do not focus on optimizing the interpretation engine for speedy execution.

Reuse: We would like to provide a powerful query interface to the Aṣṭādhyāyī and its data sets to enable sophisticated analytics and learning aids.

Extensibility: We would also like to promote the development of an extensible and interoperable framework for Aṣṭādhyāyī by providing a programmatic interface to its interpreter engine. This framework should support plugging in functionality developed by third parties in multiple programming languages and methodologies.

The specific contributions of this paper include:
• A programmatic interface to Aṣṭādhyāyī with a powerful query language for sophisticated search,
• A mechanism to automatically extract definitions of saṃjñās (both statically and dynamically defined) by interpreting their sūtra text,
• A machine-processable language and its interpreter to transform the bulk of Aṣṭādhyāyī sūtra text into executable code, and a mechanism of interpretation that tracks word transformation state persistently, and
• An extensible framework that supports interoperability among techniques for Aṣṭādhyāyī sūtra interpretation developed by multiple researchers to
accelerate tool development.

4 Preparing the Aṣṭādhyāyī for machine-processing

We have started with a well-annotated and curated online resource for Aṣṭādhyāyī, Sarada Susarla and Sai Susarla (2012), which is available as a spreadsheet and is therefore amenable to augmentation and scripted manipulation. Table 1 outlines its schema. The spreadsheet assigns each sūtra a canonical ID string in the format APSSS (e.g., 11076 denotes the 76th sūtra in the 1st pāda of the 1st adhyāya). To enable machine-processing, each sūtra is provided with its words split at sandhi boundaries, with the individual words of a samāsa separated by hyphens, and with each word tagged with simple morphological attributes such as type (subanta, tiṅanta or avyaya), vibhakti and vacana to enable auto-extraction. The adhikāra sūtras are also explicitly tagged with their influence, given as a sūtra range. For each sūtra, the padas that are inherited from earlier sūtras through anuvṛtti are listed along with their source sūtra id and the vibhakti modification in the anuvṛtta form, if any.

We auto-convert this spreadsheet into a JSON dictionary (JSON (2000)) and use it as the basis for the PAIAS service. Table 2 shows an example JSON description of sūtra 1.2.10 with all the abovementioned features illustrated.

    {
      "Adhyaaya": "Adhyaaya # adhyAyaH",
      "Paada": "Paada # pAdaH",
      "sutra_num": "sutra_num sU. saM.",
      "sutra_krama": "sutra_krama sU. kra. saM",
      "Akaaraadi_krama": "Akaaraadi_krama akArAdi kra. saM",
      "Kaumudi_krama": "Kaumudi_krama kaumudI kra. saM",
      "sutra_id": "sutra_id pUrNa sU. saM.",
      "sutra_type": "sutra_type sutralakShaNam",
      "Term": "Term saMj~nA",
      "Metarule": "Metarule paribhAShA",
      "Special_case": "Special_case atideshaH",
      "Influence": "Influence adhikAraH",
      "Commentary": "Commentary vyAkhyAnam",
      "sutra_text": "sutra_text sutram",
      "PadacCheda": "PadacCheda padchChedaH",
      "SamasacCheda": "SamasacCheda samAsachChedaH",
      "Anuvrtti": "Anuvrtti pada sutra # anuvRRitti-padam sutra-sa~NkhyA",
      "PadacCheda_notes": "PadacCheda_notes"
    }

Table 1 Aṣṭādhyāyī Database Schema

    "12010": {
      "Adhyaaya": 1,
      "Paada": 2,
      "sutra_num": 10,
      "sutra_krama": 12010,
      "Akaaraadi_krama": 3913,
      "Kaumudi_krama": 2613,
      "sutra_id": "1.2.10",
      "sutra_type": ["atideshaH"],
      "Commentary": "...",
      "sutra_text": "halantAchcha |",
      "PadacCheda": [
        {"pada": "halantAt", "pada_split": "hal-antAt", "type": "subanta", "vachana": 1, "vibhakti": 5},
        {"pada": "cha", "type": "avyaya", "vachana": 0, "vibhakti": 0}
      ],
      "Anuvrtti": [
        {"sutra": 12005, "padas": ["kit"]},
        {"sutra": 12008, "padas": ["san"]},
        {"sutra": 12009, "padas": ["ikaH", "jhal"]}
      ]
    }

Table 2 Aṣṭādhyāyī Database Schema

5 Aṣṭādhyāyī Interpreter: High-level Workflow

The input to our Aṣṭādhyāyī engine is a sequence of tagged lexemes that we call pada descriptions or pada_descs, and its output is one or more alternative sequences of tagged lexemes denoting possible word transformations. A pada_desc is a dictionary of tag-value pairs in JSON format. The tag values can be user-supplied (in case of human-assisted analysis), system-inferred or user-endorsed. Table 2 shows a sūtra description where the padacCheda section represents the sūtra as a sequence of pada_descs. An example tag is a pada 'type' such as subanta, tiṅanta, nipāta, avyaya, pratyaya, saṃjñā etc. Each application of an Aṣṭādhyāyī sūtra, referred to in this paper as 'prakriyā', modifies the input pada_desc sequence by adding, editing or removing pada_descs to denote word-splitting, morphing or merging operations based on the semantics of the sūtra.

For instance, when applying the saṃjñā sūtra for the saṃjñā 'it', we tag a given input word with a tag called 'it_varnas' whose value is the offset of the 'it' varṇas found in the word. Such tags can also be used to store intermediate states of grammar transformations for reference by subsequent operations.
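The pada_desc mechanism can be sketched as follows, assuming a pada_desc stores its sounds as a list of varṇas under a hypothetical `varnas` tag and that a hypothetical `upadesha` flag marks items as listed in the grammar. The function `apply_halantyam` is an illustrative stand-in for a compiled saṃjñā rule (1.3.3 halantyam: the final consonant of an upadeśa is 'it'), not the PAIAS implementation. It keeps the 'it_varnas' value as a list of offsets and appends the sūtra id to a hypothetical `history` tag, showing how state can persist for later rules.

```python
import json

# Simplified vowel inventory (IAST); any other varṇa is treated as a consonant (hal).
VOWELS = {"a", "ā", "i", "ī", "u", "ū", "ṛ", "ṝ", "ḷ", "e", "ai", "o", "au"}


def apply_halantyam(pada_desc: dict) -> dict:
    """Tag the final consonant of an upadeśa as an 'it' varṇa (sketch of 1.3.3).

    The offset goes into 'it_varnas'; the rule id is appended to 'history' so
    that subsequent rules can see which transformations were already applied.
    """
    varnas = pada_desc.get("varnas", [])
    if not pada_desc.get("upadesha") or not varnas or varnas[-1] in VOWELS:
        return pada_desc  # rule does not apply; leave the pada_desc untouched
    last = len(varnas) - 1
    offsets = pada_desc.setdefault("it_varnas", [])
    if last not in offsets:  # re-applying the rule must not duplicate state
        offsets.append(last)
        pada_desc.setdefault("history", []).append("1.3.3")
    return pada_desc


# The verbal suffix tip as listed in the grammar: t-i-p, whose final p is 'it'.
tip = {"pada": "tip", "type": "pratyaya", "upadesha": True, "varnas": ["t", "i", "p"]}
apply_halantyam(tip)
print(json.dumps(tip, ensure_ascii=False))
```

Keeping the tag on the pada_desc itself, rather than in interpreter-local variables, is what lets a later rule (for example, one that deletes 'it' varṇas) consult the result without recomputing it.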
This persistent tracking of the transformation state of words offers the power required for interpreting Aṣṭādhyāyī sūtras faithfully. The need for such a facility to carry internal state over from one sūtra to another has been identified by Patel and Katuri (2016) for their subanta generator tool.

In order to transform tagged lexemes, the first step is to identify the occurrence of predetermined patterns in input lexemes, which are denoted by explicit terms (saṃjñās) in Aṣṭādhyāyī sūtras. Instead of hand-coding those pattern definitions into the interpreter, our approach is to automatically extract them from the sūtras themselves and interpret them at prakriyā time. To accomplish this, we have devised a machine-processable representation scheme for various sūtras, which we elucidate in Section 6. Likewise, Aṣṭādhyāyī provides a set of 23 paribhāṣā sūtras or metarules (augmented with approximately 100 more metarules in the Paribhāṣenduśekhara treatise). The purpose of these metarules is to modify the operation of the vidhi sūtras. In Section 7.3, we describe how we manually encode metarules as (condition, action) pairs such that we can mechanically determine which paribhāṣās apply to a given Aṣṭādhyāyī sūtra.
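A (condition, action) encoding of this kind might look like the sketch below. It is a simplified assumption, not the encoding of Section 7.3: conditions only test a pada's vibhakti tag, and only three well-known paribhāṣās are modelled (1.1.49 ṣaṣṭhī sthāneyogā, 1.1.66 tasminn iti nirdiṣṭe pūrvasya, 1.1.67 tasmād ity uttarasya). Because the check looks only at a sūtra's padacCheda, it can run once during preprocessing rather than at prakriyā time. The `roles` tag and its labels are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

PadaDesc = Dict[str, Any]


@dataclass
class Paribhasha:
    sutra_id: str
    condition: Callable[[PadaDesc], bool]  # tested against one pada of a sūtra
    action: Callable[[PadaDesc], None]     # annotates that pada when the condition holds


def add_role(role: str) -> Callable[[PadaDesc], None]:
    """Build an action that records an interpretive role on a pada_desc."""
    def action(pada: PadaDesc) -> None:
        roles = pada.setdefault("roles", [])
        if role not in roles:
            roles.append(role)
    return action


PARIBHASHAS: List[Paribhasha] = [
    # 1.1.49: a genitive (6th vibhakti) names the element to be substituted.
    Paribhasha("1.1.49", lambda p: p.get("vibhakti") == 6, add_role("sthAnin")),
    # 1.1.66: a locative (7th) names the following context; the operation targets what precedes.
    Paribhasha("1.1.66", lambda p: p.get("vibhakti") == 7, add_role("right-context")),
    # 1.1.67: an ablative (5th) names the preceding context; the operation targets what follows.
    Paribhasha("1.1.67", lambda p: p.get("vibhakti") == 5, add_role("left-context")),
]


def applicable_paribhashas(padacCheda: List[PadaDesc]) -> List[str]:
    """Return ids of paribhāṣās whose condition holds for some pada, applying their actions."""
    applied: List[str] = []
    for pb in PARIBHASHAS:
        for pada in padacCheda:
            if pb.condition(pada):
                pb.action(pada)
                if pb.sutra_id not in applied:
                    applied.append(pb.sutra_id)
    return applied


# padacCheda of sūtra 1.2.10 (halantAchcha), as in Table 2.
padas = [
    {"pada": "halantAt", "type": "subanta", "vachana": 1, "vibhakti": 5},
    {"pada": "cha", "type": "avyaya", "vachana": 0, "vibhakti": 0},
]
print(applicable_paribhashas(padas))  # ['1.1.67']
```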
Since paribhāṣās operate on vidhi sūtra texts, their applicability can be determined a priori instead of at prakriyā time.

At a high level, our approach to Aṣṭādhyāyī interpretation involves the following manual steps:
• Splitting of sandhis and samāsas in the sūtra text to facilitate detection of word recurrences.
• Enumerating the anuvṛtta padas of each sūtra (from earlier sūtras).
• Coding each of the 23 paribhāṣā sūtras into condition-action pairs.
• Preparing a vibhakti suffix table that covers the subantas of Aṣṭādhyāyī for use in morphological analysis of sūtra words.
• Coding custom functions to interpret the meaning of some technical words used in Aṣṭādhyāyī but not defined therein (e.g., adarśanam, ādiḥ, antyam, etc.).
• Adding a special-case interpretation of the sūtra 'halantyam' as 'hali antyam' to break the cyclic dependency for pratyāhāra generation (as explained in Section 9).

In the next section, we outline the preprocessing steps needed for Aṣṭādhyāyī interpretation.

5.1 Preparing the Aṣṭādhyāyī Interpreter

To prepare the Aṣṭādhyāyī engine for rule interpretation, we automatically preprocess the
Aṣṭādhyāyī database as follows.
1. We first perform morphological analysis of each word of every sūtra to extract its prātipadikam. This is required to identify recurrences of a word in the Aṣṭādhyāyī regardless of vibhakti and vacana variations. We describe this step in Section 5.2.
2. For each sūtra, we generate a canonical sūtra text that we refer to as its 'mahāvākya', as follows. We expand the sūtra's text to include all anuvṛtta-padas inherited from earlier sūtras. We represent a mahāvākya as a list of pada descriptions, each with its morphological analysis output.
3. We auto-extract the definitions of all terms (saṃjñās) used in the Aṣṭādhyāyī. These come in different forms and need to be handled differently. We describe this step in Section 6.
4. We compile saṃjñā and vidhi sūtras into rules to be interpreted at prakriyā time.
5. We determine the vidhi sūtras to which each of the paribhāṣā sūtras applies by checking their preconditions. Then we modify those vidhi sūtras accordingly.
6. Finally, we create an optimized condition hierarchy for rule-checking by factoring the preconditions of all the Aṣṭādhyāyī sūtras into a decision tree.
This step is still a work in progress and is out of the scope of this paper.

5.2 Morphological Analysis of Aṣṭādhyāyī Words

To detect recurrences of a sūtra word at different locations in Aṣṭādhyāyī (e.g., through anuvṛtti or embedded references) despite their vibhakti and vacana variations, we need the prātipadikam of each word. Since most Aṣṭādhyāyī words are subantas specific to the treatise and not found in typical Saṃskṛt dictionaries, we developed a simple suffix-based vibhakti analyzer for this purpose. Because our Aṣṭādhyāyī spreadsheet already has words tagged with their vibhakti and vacana, our vibhakti analyzer takes these tags as hints and finds possible matches in predefined vibhakti tables based on various common word-endings. Once a match is found, it emits an analysis that includes possible alternative prātipadikas along with their liṅga and word-ending. We store the subanta analysis for each Aṣṭādhyāyī word in the padacCheda section of the sūtra's JSON entry for ready reference.

With this technique, we are able to determine the prātipadikam accurately for all the technical terms used in Aṣṭādhyāyī and use it for detecting word recurrences.
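A suffix-table lookup of this kind can be sketched as below. The table is a tiny hypothetical excerpt (a-stem and consonant-stem endings for three vibhakti/vacana slots, in the same ASCII transliteration as Table 2), not the table PAIAS uses. The spreadsheet's vibhakti and vacana tags select which rows to try, and each matching ending yields a candidate prātipadika together with its possible liṅgas.

```python
from typing import Dict, List, Tuple

# (vibhakti, vacana) -> [(ending, stem_final, stem_class, possible_lingas)]
# A small illustrative excerpt; a full table would cover every stem class.
VIBHAKTI_TABLE: Dict[Tuple[int, int], List[Tuple[str, str, str, List[str]]]] = {
    (5, 1): [("At", "a", "a-stem", ["pum", "napuMsaka"]),
             ("aH", "", "consonant-stem", ["pum", "strI", "napuMsaka"])],
    (6, 1): [("asya", "a", "a-stem", ["pum", "napuMsaka"]),
             ("aH", "", "consonant-stem", ["pum", "strI", "napuMsaka"])],
    (7, 1): [("e", "a", "a-stem", ["pum", "napuMsaka"]),
             ("i", "", "consonant-stem", ["pum", "strI", "napuMsaka"])],
}


def analyze(pada: str, vibhakti: int, vacana: int) -> List[dict]:
    """Return candidate analyses of a sūtra word, using its vibhakti/vacana tags as hints."""
    candidates = []
    for ending, stem_final, stem_class, lingas in VIBHAKTI_TABLE.get((vibhakti, vacana), []):
        # Require a non-empty stem in front of the ending.
        if pada.endswith(ending) and len(pada) > len(ending):
            candidates.append({
                "pratipadika": pada[: -len(ending)] + stem_final,
                "word_ending": stem_class,
                "linga": lingas,
            })
    return candidates


print(analyze("halantAt", 5, 1))  # prātipadika 'halanta'
print(analyze("ikaH", 5, 1))      # prātipadika 'ik'
```

Each candidate carries several possible liṅgas but a single prātipadika, which is all that recurrence detection needs.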
Although the tool generates multiple options for liṅga, that ambiguity does not hurt our purpose of detecting word recurrence, since the prātipadikam is unique.

Then we extract term (saṃjñā) definitions from the saṃjñā sūtras as described in Section 6.

6 Extracting Saṃjñā Definitions from Aṣṭādhyāyī

Aṣṭādhyāyī's word transformation method consists of detecting predefined patterns denoted by saṃjñās or terms and performing the associated transformations. These terms denote either a set of explicitly enumerated member elements or conditional expressions to be dynamically checked at prakriyā time. Hence, during the preprocessing stage, we create a term definition database where each term is defined as a list of member elements or as a compiled rule. The Aṣṭādhyāyī itself defines four types of terms (in increasing order of extraction complexity):
1. Terms defined in an adhikāra-cum-saṃjñā sūtra denoting a set of elements enumerated explicitly in subsequent vidhi sūtras (e.g., pratyaya, taddhita, nipāta). The term itself becomes an anuvṛtta pada in all vidhi sūtras in its adhikāra. Moreover, those sūtras refer to both the term and its member elements in prathamā vibhakti.
Hence, to extract the definition, we pick prathamā vibhakti terms, excluding (a) terms defined in saṃjñā sūtras of Aṣṭādhyāyī and (b) a manually prohibited list of words meant to convey colloquial meaning. With this method, we were able to extract all the pratyayas, taddhitas, nipātas and samāsas from Aṣṭādhyāyī automatically, and to verify them against those identified in Dikshita (2010).
2. Terms with an explicit name defined in saṃjñā sūtras, denoting a set of elements enumerated explicitly (e.g., vṛddhi). In this case, the elements are listed in prathamā vibhakti and hence can be extracted directly from the sūtra.
3. Terms with an explicit name defined in saṃjñā sūtras, denoting a condition to be computed at prayoga time (e.g., 'it').
4. Terms whose name (saṃjñā) and members (saṃjñi) are both dynamically computed quantities (e.g., pratyāhāras such as 'ac' and 'hal').

The last two variants require interpreting the sūtra text in different ways, as described in Section 7. During Aṣṭādhyāyī prakriyā, when an input pada_desc needs to be checked for a match with a saṃjñā, we have two options. If the saṃjñā is represented as a list of member elements, then all padas in the pada_desc that appear as members of the list will be annotated with the saṃjñā name.
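A term definition database that mixes explicit member lists with computed pratyāhāras might be sketched as follows. Everything here is an illustrative assumption: varṇas are IAST strings, the Śiva sūtras are hard-coded with their final 'it' markers already identified (sidestepping the halantyam cycle listed among the manual steps in Section 5), savarṇa expansion (e.g., long vowels within 'ac') is omitted, and a repeated marker such as ṇ is resolved naively to its nearest occurrence, which is right for aṇ but not for iṇ. The hypothetical `annotate` function records every matching offset under the term's name.

```python
from typing import Callable, Dict, List, Union

# Śiva sūtras as (sounds, final 'it' marker); consonants omit the pronunciation 'a'.
SHIVA_SUTRAS = [
    (["a", "i", "u"], "ṇ"), (["ṛ", "ḷ"], "k"), (["e", "o"], "ṅ"),
    (["ai", "au"], "c"), (["h", "y", "v", "r"], "ṭ"), (["l"], "ṇ"),
    (["ñ", "m", "ṅ", "ṇ", "n"], "m"), (["jh", "bh"], "ñ"),
    (["gh", "ḍh", "dh"], "ṣ"), (["j", "b", "g", "ḍ", "d"], "ś"),
    (["kh", "ph", "ch", "ṭh", "th", "c", "ṭ", "t"], "v"), (["k", "p"], "y"),
    (["ś", "ṣ", "s"], "r"), (["h"], "l"),
]


def expand_pratyahara(first: str, marker: str) -> List[str]:
    """Sounds from the first occurrence of `first` up to the nearest sūtra whose
    'it' marker is `marker` (markers themselves excluded, repeats dropped)."""
    for s_idx, (sounds, _) in enumerate(SHIVA_SUTRAS):
        if first in sounds:
            pos = sounds.index(first)
            break
    else:
        raise ValueError(f"{first!r} is not a Śiva-sūtra sound")
    members: List[str] = []
    for sounds, it in SHIVA_SUTRAS[s_idx:]:
        for sound in sounds[pos:]:
            if sound not in members:  # 'h' occurs in two sūtras
                members.append(sound)
        pos = 0  # sūtras after the first are read from their beginning
        if it == marker:  # markers are compared separately from sounds
            return members
    raise ValueError(f"no Śiva sūtra ending in {marker!r} follows {first!r}")


# Term definitions: explicit member lists, or callables computed on demand.
TERMS: Dict[str, Union[List[str], Callable[[], List[str]]]] = {
    "vṛddhi": ["ā", "ai", "au"],  # 1.1.1 vṛddhir ādaic
    "guṇa": ["a", "e", "o"],      # 1.1.2 adeṅ guṇaḥ
    "ac": lambda: expand_pratyahara("a", "c"),
    "hal": lambda: expand_pratyahara("h", "l"),
}


def annotate(pada_desc: dict, term: str) -> dict:
    """Tag pada_desc with the offsets of its varṇas that are members of `term`."""
    definition = TERMS[term]
    members = definition() if callable(definition) else definition
    hits = [i for i, v in enumerate(pada_desc["varnas"]) if v in members]
    if hits:
        pada_desc[term] = hits
    return pada_desc


rama = {"pada": "rāma", "varnas": ["r", "ā", "m", "a"]}
print(annotate(rama, "guṇa"))  # adds 'guṇa': [3]
print(annotate(rama, "hal"))   # adds 'hal': [0, 2]
```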
For instance, when checking the word ‘rāma’ against the ‘guṇa’ saṃjñā, the pada_desc of ‘rāma’ is augmented with a property named ‘guṇa’ whose value is the index of the final ‘a’ in ‘rāma’, i.e., 3 (counting the varṇas r, ā, m, a from 0). Otherwise, if the saṃjñā is defined by a compiled rule, that rule is interpreted against the pada_desc as described in Section 7.

7 Compiling Rules from sūtras

In this section, we describe a mechanism we have devised for transforming Aṣṭādhyāyī sūtras into machine-interpretable rules. This is a core contribution of our work, as it enables direct interpretation of sūtras. We have implemented this mechanism for saṃjñā and paribhāṣā sūtras first because (i) they form a crucial prerequisite for the rest of the engine, and (ii) they have not been studied by earlier work as systematically as the interpretation of vidhi sūtras. Moreover, Aṣṭādhyāyī’s paribhāṣā sūtras explicitly state the mechanism for interpreting vidhi sūtras. Our vidhi sūtra interpretation is a work in progress and will not be discussed further.

Our sūtra interpretation scheme is based on some grammatical conventions we have observed in the sūtra text.
First, the bulk of Aṣṭādhyāyī sūtras employ subanta padas and avyayas, and use tiṅanta padas sparingly. Second, saptamī vibhakti is used to indicate the context or condition in which a sūtra applies. Third, each sūtra word denotes either a saṃjñā (or its negation), a predefined function (e.g., ādiḥ, antyam), a set of terms or characters (e.g., cuṭū), or a joining avyaya (e.g., saha, ca, vā). Finally, whenever multiple words of the same vibhakti occur, one of them is a viśeṣya and the others are its viśeṣaṇas.

Hence we compile each Aṣṭādhyāyī sūtra into a hierarchical expression, called a rule, that is built from specially defined operators. A rule is either atomic, i.e., a pada_desc describing a sūtra word, or composite, coded as a JSON list. An atomic rule is interpreted via a special operator called INTERPRET, described below. In a composite rule, the first element of the list is a predefined operator and the rest are its arguments, which can in turn be pada_descs or sub-rules.

7.1 Special Operators

This section describes the special operators that we have defined to form rules.

INTERPRET: This operator interprets a single sūtra word on the input tagged lexemes. It checks whether the pattern the word denotes (e.g., ‘it’) applies to any of the input lexemes (e.g., ‘hal’). If the sūtra word is a saṃjñā (e.g., ‘it’), the interpreter interprets its rule recursively and tags the lexemes with the result (e.g., the locations of ‘it’ varṇas).
If the sūtra word is one of a predefined set of words with special meaning, the interpreter invokes its detector function; for instance, the ādiḥ of the lexeme ‘hal’ is ‘h’. Otherwise, the sūtra word denotes a set of terms or characters, and the interpreter returns whether the lexeme is a member of that set. For instance, if the sūtra word ‘cuṭū’ is interpreted against the input lexeme ‘c’, it returns True, because cuṭū denotes the consonants of the ca-varga and ṭa-varga, i.e., {ca, cha, ja, jha, ña, ṭa, ṭha, ḍa, ḍha, ṇa}. If the sūtra word is a negation such as ataddhita or apratyayaḥ, INTERPRET applies the negation before returning the result.

The following conjunct operators are used to compose larger rules:

PIPE: This operator takes a sequence of rules and invokes them in order, feeding the output of each rule invocation as input to the next. The pipe exits as soon as a stage returns an empty result; otherwise it returns the output of the last rule. It is used to process all sūtra padas of the same vibhakti. For instance, when the pipe [‘PIPE’, ‘ādiḥ’, ‘cuṭū’] is interpreted against the lexeme ‘hal’, the output of ‘ādiḥ’, namely ‘h’, is checked for membership in ‘cuṭū’, which returns None.

IF: This operator takes a list of rules.
If all of them evaluate to something other than None, it returns the input tagged lexeme set unchanged; otherwise it returns None. It is used to encapsulate saptamī vibhakti padas that indicate the enabling context for a sūtra to apply (e.g., upadeśe). It is also used to encapsulate a ṣaṣṭhī vibhakti pada in a saṃjñā sūtra that indicates the saṃjñi (the definition of a saṃjñā).

PAIR: This operator represents a pair of elements mentioned in a sūtra along with the avyaya ‘saha’. The prathamā vibhakti pada sequence describes the first element and the tṛtīyā vibhakti pada sequence describes the last element. An example is shown in Figure 3 for the sūtra ‘ādirantyena sahetā’. If the pair denotes a sequence, it describes the first and last elements of that sequence.

GEN_SAMJNA: This operator handles a saṃjñā defined as a computed expression, such as ‘ak’, ‘hal’ or ‘sup’. It matches the input tagged lexeme against the rule for the saṃjñā.
Upon a match, it invokes the rule for the ‘saṃjñi’, passing the saṃjñā as a parameter. For instance, the sūtra ‘ādirantyena sahetā’ is compiled into the following rule:

• [GEN_SAMJNA, {‘saṃjñi’: None, ‘saṃjñā’: [PAIR, ‘ādiḥ’, [PIPE, ‘antyam’, ‘it’]]}]

Since there is no explicit saṃjñi in this sūtra, we apply a special pratyāhāra expander function to generate the character sequence from the input pair. Figure 3 shows the hierarchical representation of the sūtra text that leads to the above rule.

PROHIBIT: This operator prohibits applying a sūtra under a matched sub-condition, which is not the same as negating a match condition. It is used to process the sūtra word ‘na’ in a sūtra.
For instance, when processing the ‘it’ saṃjñā sūtra ‘na vibhaktau tusmāḥ’, as shown in Figure 1, this operator removes any ‘it’ varṇa tagging done while processing its sub-conditions, denoted by the words ‘hal’, ‘antyam’ and ‘tusmāḥ’.

Figure 1: Rule Hierarchy for sūtra ‘na vibhaktau tusmāḥ’.

7.2 Compiling Rules from Saṃjñā sūtras

Saṃjñā sūtras come in two flavors:

1. those that explicitly list a term and its definition in prathamā vibhakti along with some other conditions, e.g., (upadeśe pratyayasya ādiḥ it) cuṭū; and
2. those that describe the saṃjñā name as a computed expression and its denoted items in ṣaṣṭhī vibhakti, e.g., ādiḥ antyena itā saha (svasya rūpasya).

In the above representation of the sūtra texts, words inherited by anuvṛtti are shown in parentheses.

Figure 2: Rule Hierarchy for sūtra ‘cuṭū’.

Figure 3: Rule Hierarchy for sūtra ‘ādirantyena sahetā’.

Figure 2 shows a tree representation for a sūtra of the first flavor.
A saṃjñā sūtra has three components: the term, denoted by the edge labeled ‘kA’; its definition, denoted by ‘kasya’; and the context in which the definition applies, denoted by ‘kutra’. Saptamī vibhakti padas in the sūtra denote the context. The saṃjñā term, if explicitly present in the sūtra, will be in prathamā vibhakti, with its defining words also in prathamā. In that case, an executable version of the sūtra is a representation of the tree as a hierarchical list. All words in the same vibhakti in the sūtra stand in a viśeṣaṇa-viśeṣya relation.

Figure 3 shows the tree representation for a sūtra of the second flavor. Here, the ṣaṣṭhī vibhakti word should be interpreted as a filter or qualifier for the prathamā vibhakti words, not as the saṃjñi (the definition). This sūtra also has tṛtīyā vibhakti padas joined by ‘saha’, which can be interpreted as a sequence generation operator. This operator takes the prathamā vibhakti padas to indicate the start of the sequence and the tṛtīyā vibhakti padas to indicate its end.

Figure 4 shows another sūtra of the second flavor, where the saṃjñā and saṃjñi definitions are both parameterized. It has a pada that is a negation of pratyaya. Matching this sūtra requires a predefined function that checks whether a given word is a pratyaya.
The saṃjñi in this case is the set of savarṇas of x.

Figure 4: Rule Hierarchy for sūtra ‘aṇudit savarṇasya cāpratyayaḥ’.

7.3 Interpreting Paribhāṣā sūtras

A paribhāṣā sūtra describes how to interpret sūtras whose text matches a given condition. It can be represented as a set of actions guarded by conditions, and it is applied to transform vidhi sūtras prior to compiling them into rules. The condition identifies the sūtras to which the paribhāṣā applies, expressed in terms of properties of the words in the sūtra text. The actions indicate how a matching sūtra should be transformed prior to interpretation.

For instance, consider the sūtra ‘ādyantau ṭakitau’. It states that if a vidhi sūtra contains a ‘ṭit’ or ‘kit’ pada (i.e., one that has the varṇa ‘ṭ’ or ‘k’ as its ‘it’), then the sūtra should be expanded by adding the extra words ‘ṣaṣṭhyantasya ādiḥ’ or ‘ṣaṣṭhyantasya antaḥ’, respectively, to the sūtra text.
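To illustrate how such a condition-action pair transforms a vidhi sūtra, here is a minimal sketch. The `Pada` class, its flags, `has_it` and `apply_paribhasha` are hypothetical names that do not reproduce PAIAS’s rule-matching operators, and the flags are set by hand rather than derived from the saṃjñā database. The sample input is the vidhi sūtra ‘ārdhadhātukasyeḍ valādeḥ’, whose word ‘iṭ’ has ‘ṭ’ as its ‘it’.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Pada:
    """Hypothetical stand-in for one word of a sūtra's padacCheda."""
    text: str
    it_varnas: set = field(default_factory=set)  # 'it' markers carried by the word
    is_samjna: bool = False
    is_pratyaya: bool = False


# A condition tests one pada; the action is the list of words appended to the sūtra.
ConditionAction = tuple[Callable[[Pada], bool], list[str]]


def has_it(marker: str) -> Callable[[Pada], bool]:
    """Condition: a word that is neither saṃjñā nor pratyaya but has the given 'it'."""
    return lambda p: (not p.is_samjna and not p.is_pratyaya
                      and marker in p.it_varnas)


# ādyantau ṭakitau: ṭit words add 'ṣaṣṭhyantasya ādiḥ', kit words add 'ṣaṣṭhyantasya antaḥ'.
ADYANTAU_TAKITAU: list[ConditionAction] = [
    (has_it("ṭ"), ["ṣaṣṭhyantasya", "ādiḥ"]),
    (has_it("k"), ["ṣaṣṭhyantasya", "antaḥ"]),
]


def apply_paribhasha(padas: list[Pada], pairs: list[ConditionAction]) -> list[str]:
    """Return the sūtra words, extended by every action whose condition matches a pada."""
    words = [p.text for p in padas]
    for condition, extra_words in pairs:
        if any(condition(p) for p in padas):
            words.extend(extra_words)
    return words


sutra = [Pada("ārdhadhātukasya", is_samjna=True),
         Pada("iṭ", it_varnas={"ṭ"}),
         Pada("valādeḥ")]
print(apply_paribhasha(sutra, ADYANTAU_TAKITAU))
# ['ārdhadhātukasya', 'iṭ', 'valādeḥ', 'ṣaṣṭhyantasya', 'ādiḥ']
```

Keeping each pair as plain data (a predicate plus the words to append) is in the same spirit as the condition-action dictionary of Algorithm 1: new paribhāṣās can be added without touching the transformation loop.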
We express this logic in our interpreter by coding the paribhāṣā as shown in Algorithm 1. Here, the condition is expressed as a rule-matching query with the new operators SAMJNA, PRATYAYA, IT_ENDING, AND and NOT. It applies if a vidhi sūtra has an individual pada (in its padacCheda) that is not a saṃjñā or pratyaya word but has ‘ṭ’ as its ‘it’ varṇa. In that case, the sūtra’s text must be augmented with the two additional words ‘ṣaṣṭhyantasya ādiḥ’. The matching vidhi sūtra is then compiled into a rule and interpreted at word transformation (prakriyā) time. Similarly, if the sūtra word has ‘k’ as its ‘it’ varṇa, the additional words are ‘ṣaṣṭhyantasya antaḥ’.

As another example, the paribhāṣā sūtra ‘midaco’ntyāt paraḥ’ has the following effect: if a vidhi sūtra has ‘m’ as an ‘it’ (other than in a pratyaya or saṃjñā word), then the words ‘ṣaṣṭhyantasya antyāt acaḥ paraḥ’ should be added to the sūtra text.

We manually define the condition-action pairs for each of the 23 paribhāṣā sūtras, as shown in Algorithm 1.
At initiation time, the Aṣṭādhyāyī engine checks these conditions on each of the vidhi sūtras of Aṣṭādhyāyī and transforms them accordingly prior to the rule compilation step.

In our current implementation, we have hand-coded the condition-action pairs for about half of the paribhāṣās of Aṣṭādhyāyī, and are able to successfully identify the sūtras to which they apply. This is possible because we can identify the various listable saṃjñās, which are needed to formulate the conditions. However, processing of vidhi sūtras is future work.

Algorithm 1: Codifying paribhāṣā sūtra ‘ādyantau ṭakitau’.

```python
paribhasa_defs = {
    # ... (other paribhāṣās elided)
    str(11046): [  # ādyantau ṭakitau; 'T' stands for ṭ in the transliteration used here
        {"cond": {"PadacCheda": [AND, [[NOT, SAMJNA], [NOT, PRATYAYA],
                                      [IT_ENDING, {"varna": "T"}]]],
                  "sutra_type": ["vidhiH"]},
         "action": [sutra_add_pada, {"pada": "ShaShThyantasya", "vibhakti": 6, "type": "subanta"},
                    sutra_add_pada, {"pada": "AdiH", "vibhakti": 1, "type": "subanta"}]},
        {"cond": {"PadacCheda": [AND, [[NOT, SAMJNA], [NOT, PRATYAYA],
                                      [IT_ENDING, {"varna": "k"}]]],
                  "sutra_type": ["vidhiH"]},
         "action": [sutra_add_pada, {"pada": "ShaShThyantasya", "vibhakti": 6, "type": "subanta"},
                    sutra_add_pada, {"pada": "antaH", "vibhakti": 1, "type": "subanta"}]},
    ],
    # ...
}
```

8 Implementation

We have implemented PAIAS as a Python library and a Flask web microservice that provides RESTful API access to its functionality. The API-based interface provides a flexible, reliable and reusable foundation for open, collaborative development of higher-level tools and user interfaces in multiple programming languages to accelerate research on Aṣṭādhyāyī, while ensuring interoperability of those tools. The code is available on GitHub at https://github.com/vedavaapi/ashtadhyayi and will soon be available as a pip-installable module.

The module comes bundled with the Aṣṭādhyāyī spreadsheet along with the dhātupāṭha and other associated data sets. Upon first invocation after a clean install, the Aṣṭādhyāyī module computes mahāvākyas for all sūtras, compiles sūtras into machine-executable rules, builds saṃjñā definitions, extracts listable terms such as pratyayas, and transforms vidhi sūtras by applying the matching paribhāṣā sūtras. It then stores all this derived state persistently in JSON format in a MongoDB database. This enables fast access to the Aṣṭādhyāyī engine on subsequent invocations.
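The compute-once, reuse-later pattern described above can be sketched as follows. This is an illustration under stated assumptions: PAIAS stores its derived state in MongoDB, whereas this sketch swaps in a local JSON file so that it runs with the standard library alone, and `load_or_build` and `build_state` are hypothetical names, the latter a placeholder for the real preprocessing steps.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_build(cache_path: Path, build: Callable[[], dict]) -> dict:
    """Return the derived state stored at cache_path, computing it on first use."""
    if cache_path.exists():
        # Later invocations: reuse the persisted state instead of recomputing it.
        return json.loads(cache_path.read_text(encoding="utf-8"))
    # First invocation after a clean install: compute everything once and persist it.
    state = build()
    cache_path.write_text(json.dumps(state, ensure_ascii=False), encoding="utf-8")
    return state


def build_state() -> dict:
    # Hypothetical placeholder for the expensive steps (mahāvākyas, compiled rules,
    # saṃjñā definitions, paribhāṣā-transformed vidhi sūtras).
    return {"terms_db": {"guṇa": ["a", "e", "o"], "vṛddhi": ["ā", "ai", "au"]}}
```

Calling `load_or_build(Path("derived_state.json"), build_state)` twice runs `build_state` only on the first call; the second call just reads the file. Sets are stored as lists because JSON has no set type.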
Our current implementation does not yet handle the transformation and interpretation of vidhi sūtras. Algorithm 2 shows an example Python script using the Aṣṭādhyāyī library.

We have also devised a query interface to Aṣṭādhyāyī for sophisticated search. Algorithm 3 shows a Python script that finds the unique words occurring in the Aṣṭādhyāyī, grouped by vibhakti. The query condition can be specified as a JSON dictionary supporting a hierarchical specification of the desired attributes, as shown in this example.

Algorithm 2: Example usage of Aṣṭādhyāyī Service.

```python
from ashtadhyayi.utils import *
from ashtadhyayi import *
# Assumption: sanscript is the indic_transliteration module (the original relies on the wildcard imports).
from indic_transliteration import sanscript


def a():
    return ashtadhyayi()


# Provide mahaavaakya of given sutra as individual words
def mahavakya(sutra_id):
    s = a().sutra(sutra_id)
    return s['mahavakya_padacCheda']


# Show all vidhi sutras where given paribhasha sutra applies
def paribhasha(sutra_id):
    p = get_paribhasha(sutra_id)
    if not p:
        print("Error: Paribhasha description not found for", sutra_id)
        return []
    matches = []
    for s_id in p.matching_sutras():
        s = a().sutra(s_id)
        out = {k: s[k] for k in ('sutra_krama', 'sutra_text', 'sutra_type')}
        matches.append(out)
    return matches


# Return praatipadikam of given pada taking vibhakti and vachana hints.
def praatipadika(pada, vibhakti=1, vachana=1):
    pada = sanscript.transliterate(pada, sanscript.SLP1, sanscript.DEVANAGARI)
    return Subanta.analyze({'pada': pada, 'vibhakti': vibhakti, 'vachana': vachana})
```

Algorithm 3: Example script to extract unique words of various vibhaktis in Aṣṭādhyāyī.

```python
from ashtadhyayi.cmdline import *

a = ashtadhyayi()
myfilter = {'PadacCheda': {'vibhakti': 1}}
result = {}
for v in range(8):  # vibhakti 0 marks avyayas (indeclinables)
    myfilter['PadacCheda']['vibhakti'] = v
    v_padas = []
    for s_id in a.sutras(myfilter):
        s = a.sutra(s_id)
        for p in s['PadacCheda']:
            # Keep only the padas of this sūtra that carry the requested vibhakti.
            if p.get('vibhakti') != v:
                continue
            v_padas.append(p['pada'])
    result[v] = sorted(set(v_padas))
print_dict(result)
```

9 Evaluation: Putting it all together

In this section, we illustrate the automated operation of the PAIAS engine by showing the expansion of the pratyāhāra ‘ac’ into its denoted varṇas. Pratyāhāra expansion is an essential step to enable the rest of Aṣṭādhyāyī interpretation.

| sūtra# (APSSS) | Mahāvākya representation | Generated rule |
|---|---|---|
| 13002 | it = upadeśe(7) ac(1) anunāsikaḥ(1) | [PIPE, [IF, “upadeśa”], [PIPE, “ac”, “anunāsikaḥ”]] |
| 13003.1 | it = upadeśe(7) hali(7) antyam(1) | [PIPE, [IF, “upadeśa”, “hal”], [PIPE, “antyam”]] |
| 13003.2 | it = upadeśe(7) hal(1) antyam(1) | [PIPE, [IF, “upadeśa”], [PIPE, “hal”, “antyam”]] |
| 13004 | it = upadeśe(7) na(0) vibhaktau(7) tusmāḥ(1) | [PIPE, [IF, “upadeśa”, “vibhakti”], [PROHIBIT, “tu-s-ma”]] |
| 13005 | it = upadeśe(7) ādiḥ(1) ñiṭuḍavaḥ(1) | [PIPE, [IF, “upadeśa”], [PIPE, “ādiḥ”, “ñi-ṭu-ḍu”]] |
| 13006 | it = upadeśe(7) ādiḥ(1) ṣaḥ(1) pratyayasya(6) | [PIPE, [IF, “upadeśa”], [PIPE, “pratyaya”, [PIPE, “ādiḥ”, “ṣaḥ”]]] |
| 13007 | it = upadeśe(7) ādiḥ(1) pratyayasya(6) cuṭū(1) | [PIPE, [IF, “upadeśa”], [PIPE, “pratyaya”, [PIPE, “ādiḥ”, “cu-ṭu”]]] |
| 13008 | it = upadeśe(7) ādiḥ(1) pratyayasya(6) laśaku(1) ataddhite(7) | [PIPE, [IF, “upadeśa”, [NOT, “taddhita”]], [PIPE, “pratyaya”, [PIPE, “ādiḥ”, “la-śa-ku”]]] |
| 11071 | pratyāhāra = svasya(6) rūpasya(6) ādiḥ(1) antyena(3) saha(0) itā(3) | [GEN_SAMJNA, {‘saṃjñi’: None, ‘saṃjñā’: [PAIR, ‘ādiḥ’, [PIPE, ‘antyam’, ‘it’]]}] |
| 11060 | lopa = iti(0) adarśanam(1) | [PIPE, “adarśanam”] |

Table 3: Mahāvākya representations produced by PAIAS engine from specific saṃjñā sūtras relevant to pratyāhāra expansion.

Our engine accomplishes ‘ac’ expansion as follows. First, it compiles all saṃjñā sūtras into mahāvākyas and then into machine-interpretable rules. Table 3 shows the mahāvākya representations and corresponding rules generated by our engine for the specific sūtras relevant to pratyāhāra expansion, namely those for ‘it’ and for dynamically computed saṃjñā names. The engine maintains a terms_db database that caches the set of lexemes denoted by each saṃjñā. At start time, the engine resets this database and populates it with the lexemes of all saṃjñās that are listed explicitly in Aṣṭādhyāyī. During pada_desc tagging, whenever the occurrence of a saṃjñā needs to be detected, the engine checks the terms_db cache first before attempting to interpret the saṃjñā’s rules.

9.1 Expansion of pratyāhāra ‘hal’

Pratyāhāras are computed saṃjñā names. To expand them, the engine must be able to interpret the saṃjñā sūtras for ‘it’, especially the sūtra ‘halantyam’.
However, to break its cyclic dependency on the expansion of the pratyāhāra ‘hal’, this sūtra must be interpreted twice: first as a samāsa with the vigraha ‘hali antyam’, where ‘hali’ is the saptamī vibhakti form of the māheśvara sūtra ‘hal’, and second as ‘hal antyam’. As a result of the first interpretation (sūtra 13003.1), the engine adds the varṇa ‘l’ to the terms_db cache as the definition of the ‘it’ saṃjñā. During the second interpretation of ‘it’ as ‘upadeśe hal antyam’, the engine recursively checks whether ‘hal’ is a saṃjñā. This in turn matches sūtra 11071, because the ‘l’ in ‘hal’ has been tagged as an ‘it_varṇa’. Hence ‘hal’ is detected as a computed saṃjñā, which denotes the set of varṇas from the ādi of ‘hal’, i.e., ‘h’, up to but not including the final ‘l’ in the upadeśa, i.e., the māheśvara sūtra character sequence.
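For comparison, the expansion that PAIAS derives by rule interpretation can be checked against a direct table lookup. The sketch below is not the engine’s method: it hard-codes the fourteen māheśvara sūtras in IAST (one list per sūtra, whose final item is the ‘it’ varṇa) and collects every non-‘it’ varṇa from the first occurrence of the ādi up to the sūtra whose ‘it’ matches. The name `expand` and the data layout are our own assumptions, and `functools.lru_cache` only loosely plays the speed-up role of terms_db.

```python
from functools import lru_cache

# The fourteen māheśvara sūtras in IAST; the last item of each is its 'it' varṇa.
MAHESVARA_SUTRAS = [
    ["a", "i", "u", "ṇ"], ["ṛ", "ḷ", "k"], ["e", "o", "ṅ"], ["ai", "au", "c"],
    ["h", "y", "v", "r", "ṭ"], ["l", "ṇ"], ["ñ", "m", "ṅ", "ṇ", "n", "m"],
    ["jh", "bh", "ñ"], ["gh", "ḍh", "dh", "ṣ"], ["j", "b", "g", "ḍ", "d", "ś"],
    ["kh", "ph", "ch", "ṭh", "th", "c", "ṭ", "t", "v"], ["k", "p", "y"],
    ["ś", "ṣ", "s", "r"], ["h", "l"],
]


@lru_cache(maxsize=None)  # memoizes each expansion once computed
def expand(adi: str, it: str) -> tuple[str, ...]:
    """Expand the pratyāhāra formed by ādi + it, e.g. expand('a', 'c') for 'ac'."""
    # Locate the first occurrence of the ādi among the non-'it' varṇas.
    for start, sutra in enumerate(MAHESVARA_SUTRAS):
        body = sutra[:-1]
        if adi in body:
            pos = body.index(adi)
            break
    else:
        raise ValueError(f"{adi!r} is not a varṇa of the māheśvara sūtras")
    varnas = []
    for index in range(start, len(MAHESVARA_SUTRAS)):
        *body, marker = MAHESVARA_SUTRAS[index]
        varnas.extend(body[pos:] if index == start else body)
        if marker == it:  # stop at the sūtra closed by the requested 'it'
            return tuple(varnas)
    raise ValueError(f"no 'it' varṇa {it!r} after {adi!r}")
```

For example, `expand("e", "c")` returns `('e', 'o', 'ai', 'au')`, and `expand("h", "l")` yields 34 items because ‘h’ occurs in both the fifth and the last sūtra. The sketch always stops at the first matching ‘it’ varṇa, so it does not model readings in which aṇ is traditionally taken up to the second ṇ.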
Hence the saṃjñā ‘hal’ and its denoted character sequence, i.e., all consonants of the Saṃskṛt alphabet, get added to the terms_db cache. When the recursion unwinds back to continue the second interpretation of sūtra 13003, the final hal varṇa in each māheśvara sūtra gets an ‘it_varṇa’ tag.

9.2 Expansion of pratyāhāra ‘ac’

Next, when interpreting the input lexeme ‘ac’, the engine tries to tag the lexeme’s constituent parts by matching them against the definitions of known saṃjñās. This time, sūtra 13003.2 applies, causing ‘c’ to be tagged as an ‘it_varṇa’, because ‘c’ is a member of the ‘hal’ set in the terms_db cache. Since there is no explicit saṃjñā called ‘ac’, the engine checks whether ‘ac’ is a dynamically computed saṃjñā name by applying sūtra 11071 ‘ādiḥ antyena saha itā’.

This time, when computing the varṇa set as part of sūtra 11071, the final ‘hal’ varṇa in each māheśvara sūtra needs to be suppressed. To accomplish this, we had to manually code the interpretation of a single vidhi sūtra, ‘tasya lopaḥ’.
To implement ‘tasya lopaḥ’, we manually rewrote it as ‘itaḥ lopaḥ’, because our engine does not yet have the logic to interpret vidhi sūtras automatically. The engine reduces the definition of ‘lopaḥ’ to ‘adarśanam’ from sūtra 11060. We manually wrote a function to interpret the Aṣṭādhyāyī word ‘adarśanam’ so that it suppresses the emission of its referent lexeme, here the one with the ‘it_varṇa’ tag. Thus the engine is able to generate the varṇa sequence for the ‘ac’ pratyāhāra as ‘a i u ṛ ḷ e o ai au’. The terms_db cache serves two purposes: (i) to break the infinite recursions in the expansion of saṃjñā definitions that are possible in the Aṣṭādhyāyī, and (ii) to speed up subsequent processing of a saṃjñā once it has been expanded.

10 Future Directions

We recognize that building an automated interpretation engine for the Aṣṭādhyāyī is a complex, long-term task, due to the thousands of sūtras involved and the need to validate and adjust the methodology manually. However, our aim is to rely on the precision of the Aṣṭādhyāyī’s exposition to mechanise large parts of the work. Our second objective is to provide a robust foundational platform so that multiple researchers can work collaboratively and leverage each other’s innovations to accelerate the task.
To this end, we would like to work closely with other researchers to incorporate existing approaches to Aṣṭādhyāyī interpretation and its validation. This is especially true for vidhi sūtras, which constitute the bulk of the Aṣṭādhyāyī. We hope that our programmatic interface to the Aṣṭādhyāyī and its semantic functionality will enable interoperable applications and deeper exploration of the grammatical structure of Saṃskṛt literature by the larger computer science community. Possible directions include data-driven analysis of the relative usage of Saṃskṛt grammar constructs in Saṃskṛt literature, analysis of its vocabulary and its evolution over time, and kāraka analysis via a mix of data-driven and first-principles approaches. A robust grammar engine provides a sound basis for such projects. Another area of future research would be to explore the engine’s applicability to modeling other natural languages.

11 Conclusion

In this paper, we have presented a programmatic interface to the celebrated Saṃskṛt grammar treatise Aṣṭādhyāyī, with the goal of evolving a direct interpreter of its sūtras for Saṃskṛt word generation and transformation in all its variations. Our initial experience indicates that the consistent structure and conventions of the Aṣṭādhyāyī’s sūtras make them amenable to mechanized sūtra interpretation with fidelity. However, much more work needs to be done to fully validate this hypothesis.
Having a flexible, reusable and extensible interface to the Aṣṭādhyāyī provides a sound basis for collaborative research and application development.

References

Ajotikar, Tanuja, Anuja Ajotikar, and Peter Scharf. 2015. “Some Issues in the Computational Implementation of the Ashtadhyayi”. In: Sanskrit and Computational Linguistics, select papers from ‘Sanskrit and IT World’ section of 16th World Sanskrit Conference. Ed. by Amba Kulkarni. Bangkok, Thailand, pp. 103–124.
Dikshita, Pushpa. 2010. “Ashtadhyayi Sutra Pathah”. In: Samskrita Bharati. Chap. 4.
Goyal, Pawan, Gérard Huet, Amba Kulkarni, Peter Scharf, and Ralph Bunker. 2012. “A Distributed Platform for Sanskrit Processing”. In: 24th International Conference on Computational Linguistics (COLING), Mumbai.
Goyal, Pawan, Amba Kulkarni, and Laxmidhar Behera. 2008. “Computer Simulation of Ashtadhyayi: Some Insights”. In: 2nd International Symposium on Sanskrit Computational Linguistics. Providence, USA.
Hellwig, Oliver. 2009. “Extracting dependency trees from Sanskrit texts”. Sanskrit Computational Linguistics 3, LNAI 5406, pp. 106–115.
Huet, Gérard. 2002. “The Zen Computational Linguistics Toolkit: Lexicon Structures and Morphology Computations using a Modular Functional Programming Language”. In: Tutorial, Language Engineering Conference LEC’2002. Hyderabad.
JSON. 2000. Introducing JSON. http://www.json.org/.
Krishna, Amrith and Pawan Goyal. 2015. “Towards automating the generation of derivative nouns in Sanskrit by simulating Panini”. In: Sanskrit and Computational Linguistics, select papers from ‘Sanskrit and IT World’ section of 16th World Sanskrit Conference. Ed. by Amba Kulkarni. Bangkok, Thailand.
Kulkarni, Amba. 2016. Samsaadhanii: A Sanskrit Computational Toolkit. http://sanskrit.uohyd.ac.in/.
Kumar, Anil. 2012. “Automatic Sanskrit Compound Processing”. PhD thesis. University of Hyderabad.
Mishra, Anand. 2008. “Simulating the Paninian System of Sanskrit Grammar”. In: 1st and 2nd International Symposium on Sanskrit Computational Linguistics. Providence, USA.
Patel, Dhaval and Shivakumari Katuri. 2016. “Prakriyāpradarśinī - an open source subanta generator”. In: Sanskrit and Computational Linguistics - 16th World Sanskrit Conference, Bangkok, Thailand, 2015.
Petersen, Wiebke and Oliver Hellwig. 2016. “Annotating and Analyzing the Ashtadhyayi”. In: Input a Word, Analyse the World: Selected Approaches to Corpus Linguistics, Newcastle upon Tyne: Cambridge Scholars Publishing.
Petersen, Wiebke and Simone Soubusta. 2013. “Structure and implementation of a digital edition of the Ashtadhyayi”. In: Recent Researches in Sanskrit Computational Linguistics - Fifth International Symposium, IIT Mumbai, India, January 2013, Proceedings.
Satuluri, Pavankumar and Amba Kulkarni. 2013. “Generation of Sanskrit Compounds”. In: International Conference on Natural Language Processing.
Scharf, Peter. 2016. “An XML formalization of the Ashtadhyayi”. In: Sanskrit and Computational Linguistics - 16th World Sanskrit Conference, Bangkok, Thailand, 2015.
Scharf, Peter and Malcolm Hyman. 2009. Linguistic Issues in Encoding Sanskrit. Motilal Banarsidass, Delhi.
Subbanna, Sridhar and Srinivasa Varakhedi. 2010. “Asiddhatva Principle in Computational Model of Ashtadhyayi”. In: 4th International Sanskrit and Computational Linguistics Symposium. New Delhi.
Susarla, Sarada and Sai Susarla. 2012. Panini Ashtadhyayi Sutras with Commentaries: Sortable Index. https://sanskritdocuments.org/learning_tools/ashtadhyayi/.

Yogyatā as an absence of non-congruity
Sanjeev Panchal and Amba Kulkarni

Abstract: Yogyatā, or mutual congruity between the meanings of related words, is an important factor in the process of verbal cognition. In this paper, we present the computational modeling of yogyatā for automatic parsing of Sanskrit sentences. Among the several definitions of yogyatā, we modeled it as an absence of non-congruity. We discuss the reasons behind our modeling.

Due to the lack of any syntactic criterion for viśeṣaṇa (adjectives) in Sanskrit, parsing Sanskrit texts with adjectives resulted in a high number of false positives. Hints from the vyākaraṇa texts helped us formulate a criterion for viśeṣaṇa with syntactic and ontological constraints, which provided us a clue for deciding the absence of non-congruity between two words with respect to the adjectival relation. A simple two-way classification of nouns into dravya and guṇa, with further sub-classification of guṇas into guṇavacanas, was found to be necessary for handling adjectives. The same criterion was also necessary to handle the ambiguities between kāraka and non-kāraka relations. These criteria, together with modeling yogyatā as an absence of non-congruity, resulted in an 81% improvement in precision.

1 Introduction

Three factors viz.
ākāṅkṣā (expectancy), yogyatā (congruity) and sannidhi (proximity) play a crucial role in the process of śābdabodha (verbal cognition). These factors have been found to be useful in the development of a Sanskrit parser as well. The concept of subcategorization in modern linguistics comes close to the concept of ākāṅkṣā. Subcategorization structures provide syntactic frames to capture the different syntactic behaviors of verbs. Sanskrit being an inflectional language, the information about various relations is encoded in suffixes rather than in positions. These suffixes express the expectancy, termed ākāṅkṣā in the Sanskrit literature. Kulkarni, Pokar, and Shukl (2010) describe how ākāṅkṣā was found to be useful in proposing possible relations between words. Sannidhi has been found to be equivalent to the weak non-projectivity principle (Kulkarni, P. Shukla, et al. 2013c). In this paper, we discuss the role of the third factor, viz. yogyatā, in building a Sanskrit parser.

The concept of selection restriction is similar to the concept of yogyatā. The expectancy, or ākāṅkṣā, proposes a possible relation between the words in a sentence. Such a relation would hold between two words only if they are meaning-wise compatible. It is the selection restriction, or yogyatā, that then comes into force to prune out incongruent relations, keeping only the congruent ones. Katz and Fodor (1963) proposed a model of selection restrictions as necessary and sufficient conditions for the semantic acceptability of the arguments to a predicate.
Identifying a selection restriction that is both necessary and sufficient is a very difficult task. Hence there were attempts to propose alternatives. One such alternative was proposed by Wilks (1975), who viewed these restrictions as preferences rather than necessary and sufficient conditions. After the development of WordNet, Resnik (1993) modeled the problem of inducing selectional preferences using the semantic class hierarchy of WordNet. Since then there has been an upsurge in computational models for the automated treatment of selectional preferences, using a variety of statistical models and machine learning techniques. In recent times, one of the ambitious projects to represent world knowledge was taken up under the banner of Cyc. This knowledge base contains over five hundred thousand terms, including about seventeen thousand types of relations, and about seven million assertions relating these terms.¹ In spite of the availability of such a huge knowledge base, we rarely find Cyc being used in NLP applications.

¹ http://www.cyc.com/kb, accessed on 30th August, 2017.

The first attempt to use the concept of yogyatā in the field of machine translation was by the Akshar Bharati group (Bhanumati 1989) in the Telugu-Hindi Machine Translation system. Selectional restrictions were used in defining the Kāraka Charts that provided a subcategorization frame as well as semantic constraints over the arguments of the verbs. On similar lines, Noun Lakṣaṇa Charts and Verb Lakṣaṇa Charts were also used for the disambiguation of noun and verb meanings. These charts expressed selectional restrictions using both ontological concepts and semantic properties.
An example Kāraka chart for the Hindi verb jānā (to go) is given in Table 1.

case relation          necessity   case marker   semantic constraint
apādānam (source)      desirable   se            not (upādhi:vehicle)
karaṇam (instrument)   desirable   se            (upādhi:vehicle)
karma (object)         mandatory   0/ko          -
kartā (agent)          mandatory   0             -

Table 1: Kāraka Chart for the verb jānā (to go)

Here upādhi is an imposed property. The first row in Table 1 states a constraint that a noun with the case marker se has the kāraka role of apādānam (source) provided it is not a vehicle. The ontological classification was inspired by the ontology originating from the vaiśeṣika school of philosophy.

The parsers for Indian languages were further improved. Bharati, Chaitanya, and Sangal (1995) mention the importance of two semantic factors, viz. animacy and humanity, in parsing, which remove the ambiguity between the kartā and the karma (roughly subject and object). This hypothesis was further strengthened with experimental verification by Bharati, Husain, et al. (2008).

In the next section, we first state the importance of yogyatā in parsing, as a filter to prune out meaningless parses. Since yogyatā deals with the compatibility between meanings, and a word expresses meanings at different levels, we also discuss the mutual hierarchy among these various meanings. In the third section, we look at various definitions of yogyatā offered in the tradition, and decide on the one that is suitable for implementation. In the same section, we evolve strategies to disambiguate relations based on yogyatā. Finally, the criteria evolved for disambiguation are evaluated.
The evaluation results are discussed in section four, followed by the conclusion.

2 Yogyatā as a filter

A necessary condition for understanding a sentence is that a word having an expectancy for another word should become nirākāṅkṣa (having no further expectancy) once a relation is established between them. Further, such related words should also be mutually compatible from the point of view of the proposed relation. If they are not, the expectancy of such words will not be put to rest, and there will be no verbal cognition. Therefore the role of yogyatā in verbal cognition is very important. The purpose of using yogyatā in parsing is not to make a computer ‘understand’ the text, but to rule out incompatible solutions from among the solutions that fulfil the ākāṅkṣās. For example, consider the sentence

Skt: yānam vanam gacchati.
Gloss: vehicle{neut., sg., nom./acc.} forest{neut., sg., nom./acc.} go{present, 3rd per., sg.}

There are 6 possible analyses based on ākāṅkṣā. They are:
1. yānam is the kartā and vanam is the karma of the verb gam,
2. yānam is the karma and vanam is the kartā of the verb gam,
3. yānam is the kartā of the verb gam and vanam is the viśeṣaṇa of yānam,
4. yānam is the karma of the verb gam and vanam is the viśeṣaṇa of yānam,
5. yānam is the viśeṣaṇa of vanam, which is the kartā of the verb gam,
6. yānam is the viśeṣaṇa of vanam, which is the karma of the verb gam.

If the machine knows that the kartā of an action of going should be movable, and that the referent of yāna is movable but that of vana is not, then it can mechanically rule out the second analysis. On account of the agreement between them, the words yānam and vanam have the potential to be viśeṣaṇas of each other. But the semantic incompatibility between the meanings of these words rules out the last four possibilities, leaving only the first, correct analysis.

As another example, look at the sentence

Skt: Rāmeṇa bāṇena Vālī hanyate.
Gloss: Rama{ins.} arrow{ins.} Vali{nom.} is_killed.

Rāma and bāṇa, both being in the instrumental case, can potentially be a kartā as well as a karaṇam of the verb han (to kill). If the machine knows that bāṇa can be used as an instrument in the act of killing, while Rāma, being the name of a person, cannot be a potential instrument in the act of killing, it can then filter out the incompatible solution: Rāma as a karaṇam and bāṇa as a kartā.

Look at another sentence, payasā siñcati (He wets with water). Here payas (water) is in the instrumental case and is a liquid, and hence is compatible with the action of siñc (to wet). But in the sentence vahninā siñcati (He wets with fire), vahni (fire) is not fit to be an instrument of the action of wetting, and as such it fails to satisfy yogyatā.
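The pruning of the yānam vanam gacchati analyses can be sketched as generate-and-filter. This is a toy illustration, not the authors’ parser: the LEXICON features (movable, class) and both congruity rules are hypothetical simplifications chosen only to reproduce that example. Ākāṅkṣā alone licenses every assignment of distinct roles to the two nouns; the congruity check then discards the incongruent ones.

```python
from itertools import permutations

# Hypothetical toy features; a real system would draw these from an ontology.
LEXICON = {
    'yānam': {'movable': True, 'class': 'dravya'},
    'vanam': {'movable': False, 'class': 'dravya'},
}
ROLES = ('kartā', 'karma', 'viśeṣaṇa')


def candidate_parses(words):
    # Ākāṅkṣā alone: each noun takes a distinct role (6 options for 2 nouns).
    return [dict(zip(words, roles)) for roles in permutations(ROLES, len(words))]


def is_congruent(parse):
    for word, role in parse.items():
        features = LEXICON[word]
        # Toy selectional restriction: the kartā of gam (to go) must be movable.
        if role == 'kartā' and not features['movable']:
            return False
        # Toy restriction: a dravya word is not accepted as a viśeṣaṇa here.
        if role == 'viśeṣaṇa' and features['class'] == 'dravya':
            return False
    return True


words = ['yānam', 'vanam']
candidates = candidate_parses(words)
surviving = [p for p in candidates if is_congruent(p)]
print(len(candidates))  # 6
print(surviving)        # [{'yānam': 'kartā', 'vanam': 'karma'}]
```

The rule that a dravya word never serves as a viśeṣaṇa is deliberately crude; it would, for instance, also reject co-referential pairs such as ācāryaḥ droṇaḥ.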
But now imagine a situation where a person is in a bad mood, and his friend, without knowing it, starts accusing him further for some fault of his, instead of uttering some soothing words of consolation. A third person watching this utters kim vahninā siñcasi (Why are you pouring fire?), a perfect verbalization of the situation. The words here are like fire to the person who is already in a bad mood. This meaning of vahni is its extended meaning. Thus, even if a relation between primary meanings does not make sense, if the relation between extended meanings makes sense, we need to produce the parse. Therefore, in addition to the primary meanings, the machine sometimes also needs access to the secondary/extended meanings of the words.

2.1 Word and its Meanings

Every word has a significative power that denotes its meaning. In Indian theories of meaning, this significative power is classified into three types, viz. abhidhā (the primary meaning), lakṣaṇā (the secondary or metaphoric meaning) and vyañjanā (the suggestive meaning). In order to use the concept of yogyatā in designing a parser, we should know the role of each of these meanings in the process of interpretation.

The secondary meaning comes into play when the primary meaning is incompatible with the meanings of other words in a sentence. The absence of yogyatā is the basic cause for this signification. Indian rhetoricians accept three conditions as necessary for a word to denote this extended or metaphoric sense. These three conditions are:²
1. inapplicability / unsuitability of the primary meaning,
2. some relation between the primary meaning and the extended meaning, and
3. a definite motive justifying the extension.

² mukhyārthabādhe tadyoge rūḍhito’tha prayojanāt | anyo’rtho lakṣyate yat sā lakṣaṇāropitā kriyā || (KP II 9)

In addition to these two meanings, there is one more meaning, called vyañjanā or the suggestive meaning. This corresponds to the inner meaning of a text, or the speaker’s intention. To understand this meaning, consider the sentence gato’stam arkaḥ, which literally means ‘the sun has set’. Every listener gets this meaning. In addition, it may also convey different signals to different listeners. For a child playing in the ground, it may mean ‘now it is getting dark and it is time to stop playing and go home’; for a Brahmin, it may mean ‘it is time to do the sandhyā-vandana’; and for a young man, it may mean ‘it is time to meet his lover’. This extra meaning co-exists with the primary meaning; it does not block it. Therefore vyaṅgārtha (the suggestive meaning) exists in parallel with the primary/secondary meaning.

Since the suggestive meaning is in addition to the primary/secondary meaning, is optional, and differs from listener to listener, processing it involves subjectivity. Hence it is not possible to objectively process this meaning for any utterance.³ This also puts an upper limit on the meaning one can get from a linguistic utterance without the interference of subjective judgments.
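The ordering of meanings described in this subsection, where lakṣaṇā is consulted only when abhidhā fails and vyañjanā is left out of automatic processing, can be sketched as a fallback lookup. The MEANINGS table, its senses (including ‘hurtful words’ as an extended sense of vahni) and the congruity test are hypothetical and only mirror the payasā / vahninā siñcati examples; they do not come from any existing lexicon.

```python
from typing import Callable, Optional, Tuple

# Hypothetical toy lexicon: one primary sense (abhidhā) and zero or more
# extended senses (lakṣaṇā) per stem. Vyañjanā is deliberately not modeled.
MEANINGS = {
    'payas': {'primary': 'water', 'extended': []},
    'vahni': {'primary': 'fire', 'extended': ['hurtful words']},
}


def interpret(stem: str, congruent: Callable[[str], bool]) -> Optional[Tuple[str, str]]:
    entry = MEANINGS[stem]
    # Abhidhā is tried first.
    if congruent(entry['primary']):
        return ('abhidhā', entry['primary'])
    # Lakṣaṇā only comes into play when the primary meaning is incongruent.
    for sense in entry['extended']:
        if congruent(sense):
            return ('lakṣaṇā', sense)
    return None  # no congruent reading: the relation is not proposed


def fits_instrument_of_sic(sense: str) -> bool:
    # Toy congruity test for the instrument of siñc (to wet, to pour),
    # accepting the figurative pouring of hurtful words as well.
    return sense in {'water', 'hurtful words'}


print(interpret('payas', fits_instrument_of_sic))  # ('abhidhā', 'water')
print(interpret('vahni', fits_instrument_of_sic))  # ('lakṣaṇā', 'hurtful words')
```

Because the toy ignores the third condition for lakṣaṇā (a definite motive), it offers the metaphoric reading whenever the literal one fails; a parser that favours recall may accept this over-generation.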
In summary, we observe that these three meanings are not on the same plane. Lakṣaṇā comes into play only when abhidhā fails to provide a suitable meaning for a congruent interpretation. And the suggestive meaning can co-exist with abhidhā as well as with lakṣaṇā and, as such, is outside the scope of automatic processing.

³ One of the reviewers commented that, taking into account the advances in Big Data and Machine Learning techniques, it may even be possible for machines to process such meanings in the future. However, we are of the opinion that a machine would need a semantically annotated corpus for learning, which does not yet exist.

3 Modeling Yogyatā

Yogyatā is the compatibility between the meanings of related words. This meaning, as we saw above, can be either a primary or a metaphoric one. The absence of any hindrance in the understanding of a sentence implies that there is yogyatā, or congruity, among the meanings. There have been different views among scholars about what yogyatā is. According to one definition, yogyatā is artha-abādhaḥ⁴ (that which is not a hindrance to meaning). It is further elaborated as bādhaka-pramā-virahaḥ or bādhaka-niścaya-abhāvaḥ (absence of the decisive knowledge of incompatibility). There are other attempts to define it as an existing qualifying property. One such definition is sambandha-arhatvam (eligibility for mutual association), and another is paraspara-anvaya-prayojaka-dharmavattvam (a property of promoting mutual association).

⁴ All the meanings we will be discussing below are found in NK p. 675.
The first set of definitions presents yogyatā as an absence of incompatibility, whereas the second set presents it as the presence of compatibility between the meanings.

Let us see the implications of modeling yogyatā through these two lenses.
1. We establish a relation only if the two morphemes are mutually congruous. In this case, we need to take care of congruity not only between the primary meanings but also between the metaphoric/secondary meanings.
2. We establish a relation if there is no incongruity between the two meanings.

The first possibility ensures that precision is high and there is less chance of a Type-1 error, i.e. of allowing wrong solutions. The second possibility, on the other hand, ensures that recall is high and there is less chance of a Type-2 error, viz. of missing a correct solution. But there is a chance that we allow some meaningless solutions as well. If we decide to go for the first possibility, we need to handle both the primary and the secondary meanings, and we need to state precisely under what conditions the meanings are congruous. This means modeling congruity for each verb and for each relation. This is a gigantic task, and there is a possibility of missing correct solutions if we do not take into account all the possible extensions of meanings. Therefore, we decided to go for the second choice, allowing the machine to make some mistakes of choosing incongruous solutions, since we did not want to throw away correct solutions even by mistake. This decision is in line with our philosophy of sharing the load between man and machine. Our aim is to provide access to the original text by reducing the language learning load. So we cannot afford to miss a possible solution.
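The two options can be contrasted with a toy filter. In this sketch, which is illustrative and not the authors’ implementation, congruity knowledge consists of a set of explicitly known congruous triples and a set of explicitly known incongruous ones; the Rāma / bāṇa facts follow the earlier example of han (to kill), while daṇḍa (stick) is a hypothetical third candidate about which nothing is recorded.

```python
# Hypothetical knowledge: (verb, relation, noun) triples known to be
# congruous or incongruous. Anything else is simply unknown.
KNOWN_CONGRUOUS = {('han', 'karaṇam', 'bāṇa')}
KNOWN_INCONGRUOUS = {('han', 'karaṇam', 'Rāma')}

candidates = [
    ('han', 'karaṇam', 'bāṇa'),
    ('han', 'karaṇam', 'Rāma'),
    ('han', 'karaṇam', 'daṇḍa'),  # no knowledge recorded for this triple
]


def presence_of_congruity(triple):
    # Option 1: keep a relation only if it is known to be congruous.
    return triple in KNOWN_CONGRUOUS


def absence_of_non_congruity(triple):
    # Option 2: keep a relation unless it is known to be incongruous.
    return triple not in KNOWN_INCONGRUOUS


kept_1 = [t[2] for t in candidates if presence_of_congruity(t)]
kept_2 = [t[2] for t in candidates if absence_of_non_congruity(t)]
print(kept_1)  # ['bāṇa']
print(kept_2)  # ['bāṇa', 'daṇḍa']
```

Option 2 keeps daṇḍa, for which nothing is known, trading some precision for recall; this is the trade-off chosen here.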
Thus, at the risk of providing more solutions than are actually possible, we decided to pass on to the reader some of the load of pruning out irrelevant solutions manually.

In the first step, we decided to use yogyatā only in those cases where a case marker is ambiguous between more than one relation. We noticed the following three cases of ambiguity with reference to the relations.
1. viśeṣya-viśeṣaṇa-bhāva (adjectival relation): Here both the viśeṣya and the viśeṣaṇa agree in gender, number and case, and hence, on the basis of the word form alone, we cannot tell which one is the viśeṣya and which one is the viśeṣaṇa.
2. a kāraka and a non-kāraka relation, as in
   a. karaṇam (instrument) and hetu (cause), with an instrumental case marker,
   b. sampradānam (beneficiary), prayojanam (purpose) and tādarthya (being intended for), with a dative case marker,
   c. apādānam (source) and hetu (cause), with an ablative case marker.
3. ṣaṣṭhī sambandha (a genitive relation) and a viśeṣaṇa (an adjective): When two words are in the genitive case, it is not clear whether there is an adjectival relation between them or a genitive relation.

We now discuss each of these three cases below.

3.1 Viśeṣya-viśeṣaṇa-bhāva (Adjectival relation)

We come across the term samānādhikaraṇa (co-reference) in Pāṇini to denote an adjective (Joshi and Roodbergen 1998, p. 6).
One of the contexts in which the term samānādhikaraṇa is used is that of the agreement between an adjective and a noun.⁵ For example, dhāvantaṁ mṛgaṁ (a running deer), or sundaraḥ aśvaḥ (a beautiful horse). Pāṇini has not defined the term samānādhikaraṇa, either. The term samānādhikaraṇa (co-reference) literally means ‘having the same locus’. Patañjali in the Samartha-āhnika discusses the term sāmānādhikaraṇya (co-referential) (literally, the property of being in the same locus). In the example sundaraḥ aśvaḥ (a beautiful horse), both the qualities saundarya (beauty) and aśvatva (horse-ness) reside in an aśva (horse), which is the common locus. Similarly, in the case of ācāryaḥ droṇaḥ, or agne gṛhapate (O Agni! householder), both the words ācārya and droṇa refer to the same individual, as do agni and gṛhapati. This is true of various other relation-denoting terms such as guru, śiṣya, pitā, putra, etc., and of upādhis (imposed / acquired properties) such as rājā, mantrī, vaidya, etc. From all this discussion, we may say that sāmānādhikaraṇya (the property of having the same locus) is the semantic characterisation of a viśeṣaṇa.

⁵ sāmānādhikaraṇyam ekavibhaktitvam ca. dvayoś caitad bhavati. kayoḥ. viśeṣaṇa-viśeṣyayoḥ vā sañjñā-sañjñinor vā (MBh 1.1.1)

In Sanskrit, there is no syntactic / morphological category of viśeṣaṇa (adjective). The gender, number and case of a viśeṣaṇa follow those of the viśeṣya (the head). From the point of view of analysis, this provides a syntactic clue for a possible viśeṣya-viśeṣaṇa-bhāva between two words, as in śuklaḥ paṭaḥ (a white cloth). This agreement is just a necessary condition, not a sufficient one, because a viśeṣaṇa, in addition to agreeing with the viśeṣya, should also be semantically fit to be a qualifier of the viśeṣya. For example, there can be two words, say yānam (a vehicle) and vanam (a forest), that match perfectly in gender, number and case, but we cannot imagine a viśeṣya-viśeṣaṇa-bhāva between yāna and vana. Is it only semantics that rules out such a relation, or are there any clues, especially syntactic ones, that help us rule out a viśeṣya-viśeṣaṇa-bhāva between such words?

In search of clues:
Pāṇini has not defined the terms viśeṣya and viśeṣaṇa.
Patañjali uses two terms, dravya (substance) and guṇa (quality), while commenting on the agreement between a viśeṣya and a viśeṣaṇa.

yad asau dravyaṁ śrito bhavati guṇaḥ tasya yat liṅgam vacanam ca tad guṇasya api bhavati. (MBh under A4.1.3 Vt VI.)
A quality assumes the gender and number of the substance in which it resides.

But then what is this guṇa? We come across a description of guṇa by Kaiyyaṭa.

sattve niviśate apaiti pṛthag jātiṣu dṛśyate
ādheyaḥ-ca-akriyājaḥ-ca saḥ asattva-prakṛti-guṇaḥ (MBh A4.1.44)
Guṇa is something which is found in things / substances (sattve niviśate), which can cease to be there (apaiti), which is found in different kinds of substances (pṛthag jātiṣu), which is sometimes an effect of an action and sometimes not (ādheyaḥ-ca-akriyājaḥ-ca), and whose nature is not that of a substance (asattva-prakṛti).

Thus guṇa is not a substance, since it resides in other things. It is not a universal, since it is found in different kinds of substances. It is not an action, since a guṇa is sometimes an effect of an action, as in the case of the color of a jar, and sometimes not, as in the case of the magnitude of a substance.
This characterisation of guṇa is very close to the vaiśeṣika’s concept of guṇa (Raja 1963).

Then, is this vaiśeṣika guṇa a viśeṣaṇa?
Patañjali, commenting on the word guṇa under A2.2.11, provides an example contrasting two types of guṇas. While both śukla and gandha are qualities (guṇa) according to the vaiśeṣika ontology, the usage śuklaḥ paṭaḥ (a white cloth) is possible, while gandham candanam (fragrance sandalwood) is not. Thus only some of the vaiśeṣika guṇas have the potential to be a viśeṣaṇa, not all.

If viśeṣaṇa is not a vaiśeṣika guṇa, what is it?
The characterisation of guṇa by Bhartṛhari in the Guṇa-samuddeśa includes bhedakam (differentiator) as one of the characteristics of guṇa. In addition, according to him, a guṇa is also capable of expressing the degree of a quality in a substance through a suffix. He defines guṇa as

saṁsargi bhedakaṁ yad yad savyāpāraṁ pratīyate
guṇatvaṁ paratantratvāt tasya śāstra udāhṛtam (VP III.5.1)
Whatever rests on something else (saṁsargi), differentiates it (bhedaka), and is understood in that function (savyāpāra) is, being dependent, called quality in the śāstra. (Iyer 1971)

According to Bhartṛhari, apart from being a differentiator, a guṇa has another important characteristic, viz. such a distinguishing quality can also express the degree of excellence through some suffix (such as the comparative suffix tarap or the superlative suffix tamap). Bhartṛhari’s concept of guṇa is thus different from that of the vaiśeṣika. This definitely rules out the case of gandha, since we cannot have gandhatara, whereas we can have śuklatara to distinguish the whiteness of two white cloths.

Another clue from Pāṇini
We have another hint from Pāṇini through Patañjali. While under A4.1.3 Patañjali uses the terms dravya and guṇa in connection with agreement, under A1.2.52 he uses the term guṇavacana while describing a viśeṣaṇa:

guṇavacanānāṁ śabdānām-āśrayataḥ liṅgavacanāni bhavanti-iti (A1.2.52)
The words which are guṇavacanas take the gender and number of the substance in which they reside.

The term guṇavacana is used for those words which designate a quality and then a substance in which this quality resides (Cardona 2009). In the example śuklaḥ paṭaḥ, since śukla, in addition to being a quality (white color), can also designate a substance, such as a paṭa (cloth), which is white in color, it is a guṇavacana word.
But gandha (fragrance) designates only a quality and cannot be used to designate a substance that has a fragrance; hence it is not a guṇavacana.

Is guṇavacana necessary and sufficient to describe a viśeṣaṇa?
Let us look at the examples above. It definitely rules out yānaṁ and vanaṁ as qualifiers of each other, since neither of them is a quality. But then what about dhāvan (the one who is running) in dhāvan bālakaḥ (a running boy)? Is dhāvan a guṇavacana?

Guṇavacana is a technical term, used by Pāṇini to define an operation of elision of the matup suffix in certain quality-denoting words such as śukla. So technically, a word such as dhāvan, though it designates a substance, is not a guṇavacana. This is clear from Patañjali’s commentary on A1.4.16,⁶ where he states that compounds (samāsa), primary derivatives (kṛdantas), secondary derivatives (taddhitāntas), indeclinables (avyaya), pronouns (sarvanāma), words referring to universals (jāti) and numerals (saṁkhyā) cannot get the designation guṇavacana, since the latter saṁjñās (technical terms) supersede the previous ones.⁷

⁶ The vārttika guṇavacanam ca is followed by several other vārttikas, of which the following two are relevant: samāsa-kṛt-taddhita-avyaya-sarvanāma-asarvaliṅgā jātiḥ ||41|| saṁkhyā ca ||42||
⁷ guṇavacanasaṁjñāyāḥ ca etābhiḥ bādhanaṁ yathā syāt iti

The very fact that Kātyāyana had to mention that words belonging to all these categories are not guṇavacanas indicates that all of them have the potential to get the guṇavacana designation, but Pāṇini did not intend to assign this saṁjñā to them. Whatever the reason, this list of categories in fact provides us with a morphological clue for a word to be a viśeṣaṇa.

Here are some examples of viśeṣaṇas belonging to these different grammatical categories.
1. Samāsa (a compound)
Bahuvrīhi (exocentric) compounds refer to an object different from the components of the compound, and thus typically act as adjectives. For example, pītāmbaraḥ is made up of two components, pīta (yellow) and ambara (cloth), but it refers to ‘the one wearing a yellow cloth’ (and is conventionally restricted to Viṣṇu). An example of a tatpuruṣa (endocentric) compound as a viśeṣaṇa is parama-udāraḥ (extremely noble).
2. Kṛdanta (an adjectival participle)
Nouns derived from verbs act as qualifiers of a noun. For example, in the expression dhāvantam mṛgam (a running deer), dhāvantam, a verbal noun, is a viśeṣaṇa. Only certain kṛdanta suffixes, such as śatṛ, śānac and kta, produce nouns that can be viśeṣaṇas, not all of them.
3. Taddhita (a secondary derivative)
Taddhitas with certain suffixes derive new nouns such as bhāratīya (Indian), dhanavān (wealthy), guṇin (possessing good qualities), etc. that denote a substance, as against certain other taddhita words such as manuṣyatā (humanity), vārddhakya (senility), etc., which designate qualities.
4. Sarvanāma (a pronoun)
Pronouns also act as qualifiers. For example, in the expression idam pustakam (this book), idam is a viśeṣaṇa.
5. Jāti (a universal)
In the expression āmraḥ vṛkṣaḥ (a mango tree), both the words āmraḥ and vṛkṣaḥ are common nouns. But one is a special and the other a general one.
So the denotation of āmra is a subset of the denotation of vṛkṣa. Only in such cases, where there is a parājāti-aparājāti (hypernymy-hyponymy) relation, does the word denoting the aparājāti (the hyponym) qualify to be a viśeṣaṇa of the other.
6. Saṁkhyā (a numeral)
In the expression ekaḥ puruṣaḥ (a man), the word ekaḥ designates a number, which is a viśeṣaṇa of puruṣa.

There are still two more classes of words that are not covered in the above list but which can be viśeṣaṇas: words denoting an acquired or imposed property, and relation-denoting terms. For example, ācāryaḥ in ācāryaḥ droṇaḥ is an imposed property, and putraḥ in daśarathasya putraḥ rāmaḥ is a relation-denoting term.

In summary, samastapadas, certain kṛdantas, certain taddhitāntas, saṁkhyās, sarvanāmas, ontological categories such as parā-aparā jātis, the semantico-syntactic property guṇavacana, and finally semantic properties such as relation-denoting terms and upādhis all serve as characterisations of a viśeṣaṇa. This characterisation is only a necessary condition, not a sufficient one, since it does not involve any mutual compatibility between the words.
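To make this necessary condition concrete, here is a minimal Python sketch, not the authors’ implementation, that combines the agreement clue with the word-class characterisation summarised above. The class labels, the ASCII-simplified stems, the `Word` record and the `may_qualify` function are all hypothetical; the kṛt suffixes listed are only those named in the text, and the hyponym test assumes that a lexicon of hypernyms is available.

```python
from dataclasses import dataclass

# Hypothetical labels for the word classes that the text lists as possible viśeṣaṇas.
VISESANA_CLASSES = {
    "samasa",      # compound, e.g. a bahuvrīhi
    "krdanta",     # primary derivative; restricted further below
    "taddhita",    # substance-denoting secondary derivative (bhāratīya, dhanavān)
    "sarvanama",   # pronoun
    "sankhya",     # numeral
    "jati",        # class noun; restricted further below
    "gunavacana",  # quality word that can also designate a substance (śukla)
    "relation",    # relation-denoting term (putra, guru, ...)
    "upadhi",      # imposed or acquired property (rājā, ācārya, ...)
}
# Kṛt suffixes named in the text as yielding adjectival participles (illustrative subset).
ADJECTIVAL_KRT = {"satr", "sanac", "kta"}


@dataclass(frozen=True)
class Word:
    stem: str                 # ASCII-simplified stem, e.g. "sukla"
    gender: str
    number: str
    case: str
    wclass: str               # a label above, or e.g. "guna" for a non-guṇavacana quality
    krt_suffix: str = ""      # used only when wclass == "krdanta"
    hypernyms: frozenset = frozenset()  # more general jātis, used when wclass == "jati"


def agrees(a: Word, b: Word) -> bool:
    """The syntactic clue: gender, number and case must all match."""
    return (a.gender, a.number, a.case) == (b.gender, b.number, b.case)


def may_qualify(modifier: Word, head: Word) -> bool:
    """Necessary, not sufficient, condition for `modifier` to be a viśeṣaṇa of `head`."""
    if modifier is head or not agrees(modifier, head):
        return False
    if modifier.wclass not in VISESANA_CLASSES:
        return False  # e.g. gandha: a quality that is not a guṇavacana
    if modifier.wclass == "krdanta":
        return modifier.krt_suffix in ADJECTIVAL_KRT
    if modifier.wclass == "jati":
        return head.stem in modifier.hypernyms  # only a hyponym qualifies its hypernym
    return True


if __name__ == "__main__":
    sukla = Word("sukla", "m", "sg", "nom", "gunavacana")
    pata = Word("pata", "m", "sg", "nom", "jati")
    print(may_qualify(sukla, pata))  # True
```

As the text stresses, a True result only means that the pair is not ruled out; it says nothing about the semantic fit between the two words.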
However, this characterisation brings more precision to the necessary conditions for two words to be in viśeṣya-viśeṣaṇa-bhāva.

3.1.1 Deciding a Viśeṣya
Once we have identified the words that are mutually compatible with regard to an adjectival relation, the next step is to decide the viśeṣya (head) among them. The commentary on A2.1.57 is useful here. This sūtra deals with the compound formation of two words that are in viśeṣya-viśeṣaṇa-bhāva. In Sanskrit compound formation, the word which is subordinate gets the designation upasarjana. This gives us a clue about which word classes are subordinate to which. A noun may refer to a substance through an expression of its class character (jāti), such as utpalam (a flower), through an action associated with it (kriyāvacana), as in dhāvan (running), or through a guṇavācaka, such as nīlam (blue). If there are two words designating common nouns, one denoting a special class and the other a general one, then the word denoting the special class is subordinate.⁸ For example, in āmraḥ vṛkṣaḥ, āmra is a special kind of tree, and hence is a viśeṣaṇa, and vṛkṣa is its viśeṣya. If one word designates a common noun and the other either a guṇavacana or a kriyāvacana, then the word denoting the common noun becomes the viśeṣya.⁹ Thus in nīlam utpalam, utpalam is the viśeṣya.
In pācakaḥ brāhmaṇaḥ (a Brahmin cook), brāhmaṇaḥ is the viśeṣya. When one of the words designates a guṇavacana and the other a kriyāvacana, or both words designate either guṇavacanas or kriyāvacanas, then either of them can be the viśeṣya, as in khañjaḥ kubjaḥ (a hunchback who is limping) or kubjaḥ khañjaḥ (a limping person with a hunchback), and similarly khañjaḥ pācakaḥ (a limping cook) or pācakaḥ khañjaḥ (a limping person who is a cook), etc.

On the basis of the above discussion, we have the following preferential order for the viśeṣya, where > means ‘is preferred as the viśeṣya over’:
jātivācaka > {guṇavacana, kṛdanta}.

We saw earlier that a viśeṣaṇa can be any one of the following: a pronoun, a numeral, a kṛdanta, a taddhitānta, a samasta-pada, a guṇavācaka, a jāti, a relation-denoting term, and an upādhi.
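The pairwise rules drawn from the commentary on A2.1.57 can be sketched as a small decision function. This is an illustrative sketch under stated assumptions, not the authors’ code: the `Kind` labels, the `possible_heads` function and the `hypernyms` lexicon are hypothetical, and each word is assumed to be already classified as a jāti, guṇavacana or kriyāvacana word.

```python
from enum import Enum


class Kind(Enum):
    JATI = "jati"          # class noun (jātivācaka), e.g. utpala, vṛkṣa
    GUNA = "gunavacana"    # quality word, e.g. nīla, khañja
    KRIYA = "kriyavacana"  # action word, e.g. pācaka, dhāvat


def possible_heads(w1, w2, hypernyms):
    """Stems that may serve as viśeṣya of a co-referential pair (after A2.1.57).

    w1, w2: (stem, Kind) tuples. hypernyms: hypothetical lexicon mapping a jāti
    stem to the set of more general jāti stems.
    """
    (s1, k1), (s2, k2) = w1, w2
    if k1 is Kind.JATI and k2 is Kind.JATI:
        # special vs general jāti: the general one is the viśeṣya
        if s2 in hypernyms.get(s1, set()):
            return {s2}
        if s1 in hypernyms.get(s2, set()):
            return {s1}
        return set()  # unrelated jātis: no viśeṣya-viśeṣaṇa-bhāva
    if k1 is Kind.JATI:
        return {s1}   # a jāti word outranks a guṇa or kriyā word
    if k2 is Kind.JATI:
        return {s2}
    return {s1, s2}   # guṇa/kriyā pairs: either may be the viśeṣya


if __name__ == "__main__":
    lex = {"amra": {"vrksa"}}
    print(possible_heads(("amra", Kind.JATI), ("vrksa", Kind.JATI), lex))  # {'vrksa'}
```

Returning a set rather than a single stem keeps the genuinely ambiguous cases, such as khañjaḥ kubjaḥ, open for later stages of the parser.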
So, adding all these categories to the above preferential order, we get
jātivācaka > upādhi > taddhitānta > guṇavacana > numeral > kṛdanta > pronoun.¹⁰

⁸ sāmānyajāti-viśeṣajātiśabdayoḥ samabhivyāhāre tu viśeṣajātireva viśeṣaṇam. under A2.1.57, in BM
⁹ jātiśabdo guṇakriyāśabdasamabhivyāhāre viśeṣyasamarpaka eva na tu viśeṣaṇasamarpakaḥ, svabhāvāt, under A2.1.57, in BM
¹⁰ This preferential order is purely based on some observations of the corpus, and needs further theoretical support, if there is any.

3.1.2 Flat or Hierarchical Structure?
After we identify all the words that have a samānādhikaraṇa relation between them and mark the viśeṣya (the head) among them, the next task is to determine whether a viśeṣaṇa is related to this viśeṣya directly or through other viśeṣaṇas.

If there are n viśeṣaṇas and all of them are related to the viśeṣya directly, the result is a flat structure. But if a viśeṣaṇa may be related to the viśeṣya through other viśeṣaṇas, then there is an exponentially large number of ways in which n viśeṣaṇas can relate to the viśeṣya. For example, suppose there are three words a, b and c, of which c is the viśeṣya. Computationally, there are three ways in which the other two words may relate to c:
1. Both a and b are viśeṣaṇas of c. (This results in a flat structure.)
2. a is a viśeṣaṇa of b, and b that of c.
3. b is a viśeṣaṇa of a, and a that of c.

In positional languages like English, only the first two cases are possible. For example, consider the phrase ‘light red car’, which may mean either a car which is red in color and light in weight, or a car which is light-red in color. In the second case, light-red is a compound.

Sanskrit being a free word order language, one can imagine, computationally, a possibility for the third type as well. However, since the relation between the adjectival terms is one of sāmānādhikaraṇya (co-reference), semantically only a flat structure is possible with adjectives. The other two cases, the hierarchical structures, result in compound formation in Sanskrit.

This is also supported by Jaimini’s Mīmāṁsā sūtra
guṇānām ca parārthatvāt asambandhaḥ samatvāt syāt. (MS 3.1.22)
In as much as all subsidiaries are subservient to something else and are equal in that respect, there can be no connection among themselves. (Jha 1933)

Thus, a viśeṣaṇa is not connected to another viśeṣaṇa. The associated structure is a flat one, with all the viśeṣaṇas connected to the viśeṣya.

3.2 Distinguishing a kāraka from a non-kāraka
In Sanskrit, some case markers denote both a kāraka relation and a non-kāraka relation, as we saw earlier. In a sentence, if a verb denotes an action, then nouns denote the participants in that action. These participants, which are classified into six types, viz. kartā, karma, karaṇam, sampradānam, apādānam and adhikaraṇam, are collectively called kārakas. Other nouns in the sentence, which do not participate directly in the action, express non-kāraka relations such as hetu (cause), prayojanam (purpose), etc. We get a clue for distinguishing nouns related by a kāraka relation from those related by a non-kāraka one in the Aruṇādhikāra of the Śābara bhāṣya.
There it is mentioned that

na ca amūrta-arthaḥ kriyātāḥ sādhanaṁ bhavatīti (SB; p 654)
No unsubstantial object can ever be the means of accomplishing an act.

Thus anything other than a dravya cannot be a kāraka. As we saw earlier, the guṇavacanas can also designate a dravya. Hence all the dravyas and the guṇavacanas qualify to be kārakas, and the rest, i.e. nouns which denote either a guṇa that is not a guṇavacana or a kriyā (verbal nouns), may only have a non-kāraka relation with a verb.

Let us see some examples.
Skt: rāmaḥ daśarathasya ājñayā rathena vanam gacchati.
Gloss: Rama{nom.} Dasharatha{gen.} order{ins.} chariot{ins.} forest{acc.} goes.
Eng: On Dasharatha’s order, Rama goes to the forest by chariot.
Skt: rāmaḥ adhyayanena atra vasati.
Gloss: Rama{nom.} study{ins.} here lives.
Eng: Rama lives here in order to study.

In the first sentence, ājñā (order) is the cause of Rāma’s going to the forest, and ratha (chariot) is the instrument (or vehicle) of his going; in the second sentence, adhyayana is the cause of Rāma’s stay.
Since both hetu and karaṇam call for the 3rd (instrumental) case suffix, ākāṅkṣā would establish a karaṇam relation between ājñayā and gacchati,¹¹ between rathena and gacchati, and also between adhyayana and gacchati. Now, with the above definition of a kāraka, adhyayana, being a verbal noun (a kṛdanta) in the sense of bhāva, represents an abstract concept and therefore does not designate a dravya (a substance). Hence it cannot be a karaṇam. Similarly ājñā, which is a guṇa (according to Vaiśeṣika ontology, being a śabda), cannot be a karaṇa. Thus the use of congruity helps in pruning out impossible relations.

¹¹ To be precise, the relation is between the meaning denoted by the nominal stem ājñā and the one denoted by the verbal root gam.

On the same grounds, the establishment of apādānam and sampradānam relations between a non-dravya¹² denoting noun and a verb can also be prevented.

3.3 Congruous substantive for a Ṣaṣṭhī (genitive)
Pāṇini has not given any semantic criterion for the use of the genitive relation. His rule is ṣaṣṭhī śeṣe (A2.3.50), which means that in all other cases not covered so far, the genitive case suffix is to be used. The relation marked by the ṣaṣṭhī (genitive) case marker falls under the utthāpya (aroused) ākāṅkṣā. This is a case of uni-directional expectancy. Thus there is no syntactic clue as to which noun the word in the genitive case attaches to; all other nouns in the sentence are potential candidates for a genitive relation. The clue is, however, semantic. Patañjali, in the Mahābhāṣya on A2.3.50, provides some semantic clues. He says there are hundreds of meanings of ṣaṣṭhī.
Some of them are sva-svāmi-bhāva, as in rājñaḥ puruṣaḥ (a king’s man), and avayava-avayavī-bhāva, as in vṛkṣasya śākhā (a branch of a tree). So in order to establish a genitive relation, we need semantic inputs. However, there are certain constraints:
1. A genitive connecting with a verbal noun that expresses bhāva (such as one formed with lyuṭ) expresses a kāraka¹³ relation and not the genitive one, as in rāmasya gamanam.
2. A genitive always connects with a viśeṣya and never with a viśeṣaṇa, since there is a samānādhikaraṇa relation between the viśeṣya and the viśeṣaṇa. For example, in the expression rāmasya vīreṇa putreṇa, the genitive relation of rāmasya is with putreṇa and not with vīreṇa.

Lexical resources such as the Sanskrit WordNet¹⁴ and Amarakośa¹⁵, which are marked with semantic information such as the part-whole relation, janya-janaka-bhāva, the ājīvikā relation, etc., help in identifying genitive relations with confidence. When both words refer to dravyas (substantives), there is also a possibility of a genitive relation.
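The two constraints on the genitive, together with the requirement that congruity be positively present, can be sketched as a filter over the candidate heads. This is a minimal sketch, not the authors’ system: the `Noun` record, `genitive_candidates` and the tiny `CONGRUOUS` set (a stand-in for resources such as the Sanskrit WordNet or Amarakośa) are hypothetical, and the stems are ASCII-simplified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Noun:
    stem: str
    is_dravya: bool
    is_visesana: bool = False    # already analysed as a viśeṣaṇa of some viśeṣya
    bhava_krdanta: bool = False  # verbal noun in the sense of bhāva (e.g. with lyuṭ)


# Toy stand-in for lexical resources recording congruous (genitive stem, head stem)
# pairs: part-whole, owner-owned, kinship, ...
CONGRUOUS = {("vrksa", "sakha"), ("rajan", "purusa"), ("rama", "putra")}


def genitive_candidates(gen, others, lexicon=CONGRUOUS):
    """Possible (head stem, relation) attachments for a noun in the genitive."""
    found = []
    for head in others:
        if head.is_visesana:
            continue  # constraint 2: attach to the viśeṣya, never to its viśeṣaṇa
        if head.bhava_krdanta:
            found.append((head.stem, "karaka"))  # constraint 1: kartṛ/karma reading (A2.3.65)
        elif (gen.stem, head.stem) in lexicon or (gen.is_dravya and head.is_dravya):
            found.append((head.stem, "sasthi"))  # congruity must be present
    return found


if __name__ == "__main__":
    rama = Noun("rama", is_dravya=True)
    vira = Noun("vira", is_dravya=True, is_visesana=True)
    putra = Noun("putra", is_dravya=True)
    print(genitive_candidates(rama, [vira, putra]))  # [('putra', 'sasthi')]
```

In such a design, lexicon coverage decides recall: a congruous pair between two non-dravya nouns that is missing from the resource is simply not proposed.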
So note that, while for other relations we look for the absence of non-congruity to rule out relations, in the case of genitives we instead look for the presence of congruity to prune out impossible relations. We took this decision because we found it difficult to describe non-congruity in the case of genitive relations.

¹² To be precise, a non-dravya and non-guṇavacana.
¹³ kartṛkarmaṇoḥ kṛti (A2.3.65)
¹⁴ http://www.cfilt.iitb.ac.in/wordnet/webswn/english_version.php
¹⁵ http://scl.samsaadhanii.in/amarakosha/index.html

Ambiguity between a genitive and an adjectival relation
Further, we come across an ambiguity in the genitive relation in the presence of adjectives. Look at the following two examples.
Skt: vīrasya rāmasya bāṇam
Gloss: brave{gen.} Rama{gen.} arrow
Eng: An arrow of brave Rama
and
Skt: rāmasya putrasya pustakam
Gloss: Rama{gen.} son{gen.} book
Eng: A book of Rama’s son
In the first example, since vīra is a guṇavacana, the earlier characterisation of an adjective marks vīra as an adjective, while in the second one there is a kinship relation.

4 Evaluation
As stated earlier, ākāṅkṣā indicates the possibility of relations between two words. The mutual compatibility between the meanings further helps in pruning out the incompatible relations. We classified the content nouns into two classes, dravya and guṇa, with guṇas further marked if they are guṇavacanas. We tested mutual compatibility only when the suffix is ambiguous.
To be precise, yogyatā is used only to disambiguate between a kāraka and a non-kāraka relation, to establish the viśeṣya-viśeṣaṇa-bhāva, and to establish a genitive relation. This ensured that we do not miss metaphoric meanings. In the case of kāraka relations, if the noun denotes neither a dravya nor a guṇavacana, the kāraka relation proposed on the basis of expectancy is pruned out. Similarly, in the case of adjectival relations, relations with a non-guṇavācaka guṇa are pruned out.

The performance of the system was measured with and without yogyatā to evaluate its impact. The evaluation corpus consists of around 2,200 sentences. It includes sentences with various grammatical constructions, a few passages from school textbooks, the Bhagavadgītā, and a sample from Māgha’s Śiśupālavadham. The ślokas in the Bhagavadgītā as well as in the Śiśupālavadham were converted to a canonical form.¹⁶ Sentences with conjunction were not considered for the evaluation, since nouns in conjunction conflict with adjectives, and the criteria for handling conjunction are under development.
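The pruning step can be pictured as a filter applied only to relations that ākāṅkṣā has already proposed for an ambiguous suffix. In this hypothetical sketch, `Noun`, `can_be_karaka` and `prune` are illustrative names, not the authors’ code; the kāraka test follows the criterion of Section 3.2 (only a dravya or a guṇavacana can be a kāraka), and the adjectival test drops a guṇa that is not a guṇavācaka. The example reuses the instrumental nouns of rāmaḥ daśarathasya ājñayā rathena vanam gacchati.

```python
from dataclasses import dataclass

KARAKA = {"karta", "karma", "karana", "sampradana", "apadana", "adhikarana"}


@dataclass(frozen=True)
class Noun:
    stem: str
    onto: str                 # "dravya", "guna" or "kriya" (hypothetical labels)
    gunavacana: bool = False  # meaningful only when onto == "guna"


def can_be_karaka(n: Noun) -> bool:
    """Only a dravya, or a guṇavacana (which also designates a dravya), is a kāraka."""
    return n.onto == "dravya" or (n.onto == "guna" and n.gunavacana)


def prune(candidates):
    """Drop incongruous (noun, relation) pairs proposed by ākāṅkṣā; keep the rest."""
    kept = []
    for noun, rel in candidates:
        if rel in KARAKA and not can_be_karaka(noun):
            continue  # e.g. ājñā or adhyayana as karaṇa
        if rel == "visesana" and noun.onto == "guna" and not noun.gunavacana:
            continue  # e.g. gandha as an adjective
        kept.append((noun, rel))
    return kept


if __name__ == "__main__":
    ajna, ratha = Noun("ajna", "guna"), Noun("ratha", "dravya")
    proposed = [(ajna, "karana"), (ajna, "hetu"), (ratha, "karana"), (ratha, "hetu")]
    print([(n.stem, r) for n, r in prune(proposed)])
    # [('ajna', 'hetu'), ('ratha', 'karana'), ('ratha', 'hetu')]
```

The filter only removes relations: rathena keeps both its karaṇa and hetu readings here, and choosing between them is left to the rest of the parser.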
The statistics showing the size of the various texts, the average sentence length and the average word length are given in Table 2.

Type            Sents    Words   Characters   Avg sent. len (words)   Avg word len (chars)
Text books        260    1,295        9,591                    4.98                   7.40
Syntax            937    3,339       25,410                    3.56                   7.61
Māgha’s SPV        66      623        5,851                    9.40                   9.39
Bhagavadgītā      940    5,698       42,251                    6.06                   7.41
Total           2,203   10,955       83,103                    3.77                   7.58
Table 2: Corpus characteristics

All these sentences were run through a parser, first without the conditions of yogyatā and a second time with them. In both cases, the parser produced all possible parses, and we ensured that the correct parse was present among the produced solutions. Table 3 shows the number of solutions with and without the yogyatā filter. The number of parses produced was reduced drastically. This improved the precision by 63% on the textbook stories, by 67% on the grammatical constructs, and by 81% on the texts from the Bhagavadgītā and Māgha’s kāvya.
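The paper does not spell out how the improvement in precision was computed. One reading that reproduces every percentage in Table 3 is the relative reduction in the average number of parses per sentence, 1 - (with / without). The short sketch below, whose function name is hypothetical, computes it from the Table 3 averages; the reported whole-number percentages match these values truncated rather than rounded (for example, 67.5 for the Syntax set is reported as 67%).

```python
# Average number of parses per sentence from Table 3: (without yogyatā, with yogyatā).
AVG_SOLS = {
    "Text books": (39.76, 14.56),
    "Syntax": (19.5, 6.33),
    "Literary": (11199, 2107),
    "BhG": (2557, 478),
    "Total": (1439.54, 268.85),
}


def reduction_pct(without: float, with_filter: float) -> float:
    """Relative reduction, in percent, of the average number of parses."""
    return 100.0 * (without - with_filter) / without


if __name__ == "__main__":
    for corpus, (without, with_filter) in AVG_SOLS.items():
        print(f"{corpus:<10} {reduction_pct(without, with_filter):5.1f}%")
```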
The better results for the Bhagavadgītā and Māgha’s kāvya are due to the fact that these texts use adjectives and non-kāraka relations more often than the textbook sentences and the artificial grammatical constructs do.

¹⁶ All the ślokas were presented in their anvita form, following the traditional Daṇḍānvaya method, where the verb typically is at the end and viśeṣaṇas precede the viśeṣyas.

Corpus type   Sents   Avg sols without yogyatā   Avg sols with yogyatā   Improvement in precision
Text books      260                      39.76                   14.56                        63%
Syntax          937                      19.50                    6.33                        67%
Literary         66                     11,199                   2,107                        81%
BhG             940                      2,557                     478                        81%
Total         2,203                   1,439.54                  268.85                        81%
Table 3: Improvement

5 Conclusion
Yogyatā, or mutual congruity between the meanings of the related words, is an important factor in the process of verbal cognition. In this paper, we presented the computational modeling of yogyatā for automatic parsing of Sanskrit sentences. Among the several definitions of yogyatā, we modeled it as an absence of non-congruity.

Due to the lack of any syntactic criterion for viśeṣaṇas (adjectives) in Sanskrit, parsing Sanskrit texts with adjectives resulted in a high number of false positives. Hints from the vyākaraṇa texts helped us formulate a criterion for viśeṣaṇa based on syntactic and ontological constraints, which in turn helped us decide the absence of non-congruity between two words with respect to the adjectival relation.
A simple two-way classification of nouns into dravya (substance) and guṇa (quality), with a further classification of guṇas into guṇavacanas, was found to be necessary for handling adjectives. The same criterion was also found useful to handle the ambiguities between kāraka and non-kāraka relations. These criteria, together with modeling yogyatā as an absence of non-congruity, resulted in an 81% improvement in precision.

Finally, since there cannot be an adjective of an adjective, once a viśeṣya has been identified there is only one way in which all the viśeṣaṇas can connect with it. This theoretical input provided much relief from a practical point of view. In its absence, the number of possible solutions would have been exponential.

6 Abbreviations

A: Pāṇini’s Aṣṭādhyāyī, see Pande, 2004
Aa.b.c: adhyāya (chapter), pāda (quarter), sūtra number in the Aṣṭādhyāyī
BM: Bālamanoramā, see Pande, 2012
MBh: Patañjali’s Mahābhāṣya, see Mīmāṅsaka
KP: Kāvyaprakāśa, see Jhalakikar
MS: Mīmāṁsā sūtra, through SB
NK: Nyāyakośa, see Jhalakikar
PM: Padamañjarī, see Mishra
SB: Śābara Bhāṣya, see Mīmāṁsaka, 1990
VP: Vākyapadīyam, see Sharma, 1974

References

Bhanumati, B. 1989.
An Approach to Machine Translation among Indian Languages. Tech. rep. Dept. of CSE, IIT Kanpur.
Bharati, Akshar, Vineet Chaitanya, and Rajeev Sangal. 1995. Natural Language Processing: A Paninian Perspective. Prentice-Hall, New Delhi.
Bharati, Akshar, Samar Husain, Bharat Ambati, Sambhav Jain, Dipti M Sharma, and Rajeev Sangal. 2008. “Two semantic features make all the difference in Parsing accuracy”. In: Proceedings of the 6th International Conference on Natural Language Processing (ICON-08). C-DAC, Pune.
Cardona, George. 2007. Pāṇini and Pāṇinīyas on Śeṣa Relations. Kunjunni Raja Academy of Indological Research, Kochi.
Cardona, George. 2009. “On the structure of Pāṇini’s system”. In: Sanskrit Computational Linguistics 1 & 2. Ed. by Gérard Huet, Amba Kulkarni, and Peter Scharf. Springer-Verlag LNAI 5402.
Devasthali, G V. 1959. Mīmāṁsā: The vākya śāstra of Ancient India. Booksellers’ Publishing Co., Bombay.
Huet, Gérard, Amba Kulkarni, and Peter Scharf, eds. 2009. Sanskrit Computational Linguistics 1 & 2. Springer-Verlag LNAI 5402.
Iyer, K A Subramania. 1969. Bhartṛhari: A study of Vākyapadīya in the light of Ancient commentaries. Deccan College, Poona.
Iyer, K A Subramania. 1971. The Vākyapadīya of Bhartṛhari, chapter III pt i, English Translation. Deccan College, Poona.
Jha, Ganganatha. 1933. Śābara Bhāṣya. Oriental Institute, Baroda.
Jhalakikar, V R. 1920; 7th edition. Kāvyaprakāśa of Mammaṭa with the Bālabodhinī.
Bhandarkar Oriental Research Institute, Pune.
Jhalakikar, V R. 1928. Nyāyakośa. Bombay Sanskrit and Prakrit Series, 49, Poona.
Jijñāsu, Brahmadatta. 1979. (In Hindi). Aṣṭādhyāyī (Bhāṣya) Prathamāvṛtti. Ramlal Kapoor Trust Bahalgadh, Sonepat, Haryana, India.
Joshi, S D. 1968. Patañjali’s Vyākaraṇa Mahābhāṣya Samarthāhnika (P 2.1.1) Edited with Translation and Explanatory Notes. Center of Advanced Study in Sanskrit, University of Poona, Poona.
Joshi, S D and J.A.F. Roodbergen. 1975. Patañjali’s Vyākaraṇa Mahābhāṣya Kārakāhnikam (P 1.4.23–1.4.55). Pune: Center of Advanced Study in Sanskrit.
Joshi, S D and J.A.F. Roodbergen. 1998. The Aṣṭādhyāyī of Pāṇini with Translation and Explanatory Notes, Volume 7. Sahitya Akademi, New Delhi.
Katz, J J and J A Fodor. 1963. “The structure of a Semantic Theory”. Language 39, pp. 170–210.
Kiparsky, Paul. 2009. “On the Architecture of Panini’s Grammar”. In: Sanskrit Computational Linguistics 1 & 2. Ed. by Gérard Huet, Amba Kulkarni, and Peter Scharf. Springer-Verlag LNAI 5402, pp. 33–94.
Kulkarni, Amba. 2013b. “A Deterministic Dependency Parser with Dynamic Programming for Sanskrit”. In: Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013).
Prague, Czech Republic: Charles University in Prague, Matfyzpress, pp. 157–166. url: http://www.aclweb.org/anthology/W13-3718.
Kulkarni, Amba and Gérard Huet, eds. 2009. Sanskrit Computational Linguistics 3. Springer-Verlag LNAI 5406.
Kulkarni, Amba, Sheetal Pokar, and Devanand Shukl. 2010. “Designing a Constraint Based Parser for Sanskrit”. In: Fourth International Sanskrit Computational Linguistics Symposium. Ed. by G N Jha. Springer-Verlag, LNAI 6465, pp. 70–90.
Kulkarni, Amba and K. V. Ramakrishnamacharyulu. 2013a. “Parsing Sanskrit texts: Some relation specific issues”. In: Proceedings of the 5th International Sanskrit Computational Linguistics Symposium. Ed. by Malhar Kulkarni. D. K. Printworld (P) Ltd.
Kulkarni, Amba, Preeti Shukla, Pavankumar Satuluri, and Devanand Shukl. 2013c. “How ‘Free’ is the free word order in Sanskrit”. In: Sanskrit Syntax. Ed. by Peter Scharf. Sanskrit Library, pp. 269–304.
Mishra, Sri Narayana. 1985. Kāśikāvṛttiḥ along with commentaries Nyāsa of Jinendrabuddhi and Padamañjarī of Haradattamiśra. Ratna Publications, Varanasi.
Mīmāṃsakaḥ, Yudhiṣṭhira. 1990. Mīmāṁsā Śābara Bhāṣya. Ramlal Kapoor Trust, Sonipat, Hariyana.
Mīmāṃsakaḥ, Yudhiṣṭhira. 1993. Mahābhāṣyam, Patañjalimuniviracitam. Ramlal Kapoor Trust, Sonipat, Hariyana.
Pande, Gopaldatta. 2000, Reprint Edition.
Vaiyākaraṇa Siddhāntakaumudī of Bhaṭṭojidikṣita (Text only). Chowkhamba Vidyabhavan, Varanasi.
Pande, Gopaldatta. 2012, Reprint Edition. Vaiyākaraṇa Siddhāntakaumudī of Bhaṭṭojidikṣita containing Bālamanoramā of Śrī Vāsudevadīkṣita. Chowkhamba Surabharati Prakashan, Varanasi.
Pande, Gopaldatta. 2004. Aṣṭādhyāyī of Pāṇini elaborated by M.M. Panditraj Dr. Gopal Shastri. Chowkhamba Surabharati Prakashan, Varanasi.
Pataskar, Bhagyalata A. 2006. “Semantic Analysis of the technical terms in the ‘Aṣṭādhyāyī’ meaning ‘Adjective’”. Annals of Bhandarkar Oriental Research Institute 87, pp. 59–70.
Raja, K Kunjunni. 1963. Indian Theories of Meaning. Adayar Library and Research Center, Madras.
Ramakrishnamacaryulu, K V. 2009. “Annotating Sanskrit Texts Based on Śābdabodha Systems”. In: Proceedings Third International Sanskrit Computational Linguistics Symposium. Ed. by Amba Kulkarni and Gérard Huet. Hyderabad, India: Springer-Verlag LNAI 5406, pp. 26–39.
Ramanujatatacharya, N S. 2005. Śābdabodha Mīmāṁsā. Institut Français de Pondichéry.
Resnik, Philip. 1993. “Semantic classes and syntactic ambiguity”. In: ARPA Workshop on Human Language Technology. Princeton.
Sharma, Pandit Shivadatta. 2007.
Vyākaraṇamahābhāṣyam. Chaukhamba Sanskrit Pratishthan, Varanasi.
Sharma, Raghunath. 1974. Vākyapadīyam Part III with commentary Prakāśa by Helaraja and Ambakartri. Varanaseya Sanskrit Visvavidyalaya, Varanasi.
Shastri, Swami Dwarikadas and Pt. Kalika Prasad Shukla. 1965. Kāśikāvṛttiḥ with the Nyāsa and Padamañjarī. Varanasi: Chaukhamba Sanskrit Pratishthan.
Wilks, Yorick. 1975. “A preferential, pattern-seeking, semantics for Natural Language Inference”. Artificial Intelligence 6, pp. 53–74.

An ‘Ekalavya’ Approach to Learning Context Free Grammar Rules for Sanskrit Using Adaptor Grammar

Amrith Krishna, Bodhisattwa Prasad Majumder, Anil Kumar Boga, and Pawan Goyal

Abstract: This work presents the use of Adaptor Grammar, a non-parametric Bayesian approach for learning (Probabilistic) Context-Free Grammar productions from data. In Adaptor Grammar, we provide the set of non-terminals, followed by a skeletal grammar that establishes the relations between the non-terminals in the grammar. The system automatically learns the productions and their associated probabilities from the usages of words or sentences, i.e., the dataset. This facilitates the encoding of prior linguistic knowledge through the skeletal grammar, while the tiresome task of finding the productions is delegated to the system. The system learns the grammar structure entirely by observing the data. We call this approach the ‘Ekalavya’ approach.
In this work, we discuss the effect of using Adaptor grammars for Sanskrit on word-level supervised tasks such as compound type identification, and on identifying the source and derived words from corpora for derivational nouns. In both of these works, we show that sub-word patterns learned using Adaptor grammar are effective features for the corresponding supervised tasks. We also present our novel approach of using Adaptor Grammars for handling Structured Prediction tasks in Sanskrit, along with preliminary results for the word reordering task in Sanskrit. Finally, we outline our plan for the use of Adaptor grammars for Dependency Parsing and Poetry to Prose Conversion tasks.

1 Introduction

Recent trends in the Natural Language Processing (NLP) community suggest an increased application of black-box statistical approaches such as deep learning. Such systems are preferred because they have improved the performance of several NLP tasks, such as machine translation, sentiment analysis, and word sense disambiguation (Manning 2016). MIT Technology Review reported Noam Chomsky’s opinion about the extensive use of ‘purely statistical methods’ in AI. According to the report, Chomsky “derided researchers in machine learning who use purely statistical methods to produce behavior that mimics something in the world, but who don’t try to understand the meaning of that behavior.” (Cass 2011). Chomsky is quoted as saying, “It’s true there’s been a lot of work on trying to apply statistical models to various linguistic problems. I think there have been some successes, but a lot of failures. There is a notion of success ... which I think is novel in the history of science. It interprets success as approximating un-analyzed data.” (Pinker et al. 2011).
Norvig (2011), in his reply to Chomsky, comes to the defense of the statistical approaches used in the community. Norvig emphasises the engineering aspects of the problems that the community deals with, and the performance gains achieved by using such approaches. He rightly points out that, while the generative aspects of a language can be deterministic, the analysis of a language construct can lead to ambiguity. Because probabilistic models are tolerant to noise in the data, such approaches are often necessary for engineering success. Speakers of a language often deviate from the laid-out linguistic rules in usage. This can be seen as noise in the dataset, and the system we intend to build should be tolerant to it. Statistical approaches provide a convenient means of achieving this tolerance. However, the use of statistical approaches does not imply discarding the linguistic knowledge that we possess. Manning (2016) refers to the work of Paul Smolensky: “Work by Paul Smolensky on how basically categorical systems can emerge and be represented in a neural substrate (Smolensky and Legendre 2006). Indeed, Paul Smolensky arguably went too far down the rabbit hole, devoting a large part of his career to developing a new categorical model of phonology, Optimality Theory (Prince and Smolensky 1993).” This is an example where linguistics and statistical computational models had a successful synergy, fruitful for both domains.

Probabilistic Context-Free Grammars (PCFGs) provide a convenient platform for expressing linguistic structures with a probabilistic prioritization of the structures they accept. It has been shown that PCFGs can be learned automatically using statistical approaches (Horning 1969). In this work, we look into Adaptor grammar (Johnson, T. L.
Griffiths, and Goldwater 2007), a non-parametric Bayesian approach for learning a grammar from observations, such as sentences or word usages in the language. Given a skeletal grammar along with a fixed set of non-terminals, Adaptor grammar learns the right-hand sides of the productions and the probabilities associated with them. It does so just by observing the dataset provided to it, hence the name ‘Ekalavya’ approach.

The use of Adaptor grammars provides the following advantages for a learning task.

1. Adaptor grammars in effect output valid PCFGs, which are in turn context-free grammars, and are thus valid for linguistic representations.
2. They help to encode, via the skeletal grammars, linguistic information that is already described in various formalisms. Thus domain knowledge can be used effectively. The only restriction here might be that the expressive power of the grammar is limited to that of a Context-Free Grammar.
3. By leveraging the power of statistics, we can obtain the likelihood of the various possible parses when there is structural ambiguity in the analysis of a sentence.
4. The proposed structures might not be as competitive in performance as black-box statistical approaches such as deep learning. However, the interpretability of Adaptor grammar-based systems is a big plus. Grammar experts can look into the individual production rules learned by the system, which frees them from coming up with the rules in the first place. Additionally, any domain expert with a knowledge of context-free grammars can read the production rules and validate whether or not the system has learned patterns that are relevant to the task.

In Section 2, we discuss the preliminaries regarding Context-Free Grammars, Probabilistic CFGs, and Adaptor Grammar.
In Section 3, we discuss the use of Adaptor grammars in various NLP tasks for different languages. We then describe the work performed in Sanskrit with Adaptor grammars in Section 4. Finally, we discuss future directions for Sanskrit tasks, specifically for multiple structured prediction tasks.

2 Preliminaries - CFG and Probabilistic CFG

Context-Free Grammar was proposed by Noam Chomsky, who initially termed it phrase structure grammar. Formally, a Context-Free Grammar $G$ is a 4-tuple $(V, \Sigma, R, S)$, where $V$ is a set of non-terminals, $\Sigma$ is a finite set of terminals, and $R$ is the set of productions from $V$ to $(V \cup \Sigma)^*$, with $^*$ denoting the Kleene star operation. $S \in V$ is the start symbol, which forms the root of the parse tree for every string accepted by the grammar. Using the notation $L_X$ for the language generated by the non-terminal $X$, the language generated by the grammar $G$ is $L_S$.

Figure 1: An example of a Context Free Grammar

The productions in Context-Free Grammars are often handcrafted by expert linguists, and many real-life NLP tasks commonly require large CFGs. A given string can also commonly have multiple possible parses under a given grammar. This is because a Context-Free Grammar contains all the possible choices that can be produced from a given non-terminal (O’Donnell 2015). The grammar neither provides a deterministic parse nor prioritizes the parses, which leads to structural ambiguity. Probabilistic Context-Free Grammars (PCFGs) were introduced to weigh the probable trees when ambiguity arises, and thus provide a means for prioritizing the desired rules. A PCFG is a 5-tuple
A PCFG is a 5-tuple\u00E2\u0080\u0098Ekalavya\u00E2\u0080\u0099 Approach 87(kP\u00CE\u00A3P gP hP \u0012), where \u0012, denotes a vector of real numbers in the range of[0P 1] indexed by productions of g, subject to noting gl for the set of pro-ductions of m in g, for all m in k we require\u00E2\u0088\u0091.\u00E2\u0088\u0088fX\u0012. = 1Figure 2Example of a Probabilistic Context Free Grammar corresponding to CFGshown in Figure 1The probabilities associated with all the productions of a given non-terminal should add up to 1. The probability of a given tree is nothingbut the product of the probabilities associated with the rules which areused to construct the tree. A given vector \u0012l denotes the parameters of amultinomial distribution that have the non-terminal m on their left-handside (LHS) (O\u00E2\u0080\u0099Donnell 2015).Note that PCFGs make two strong conditional independence assump-tions (O\u00E2\u0080\u0099Donnell 2015):1. The decision about expanding a non-terminal depends only on thenon-terminal and the given distribution for that non-terminal. Noother assumptions can be made.2. Following from the first assumption, a generated expression is inde-pendent of other expressions.There are numerous techniques suggested for the estimation of weightsfor the productions in PCFG. The Inside-Outside algorithm is a maximumlikelihood estimation approach based on the unsupervised Expectation max-imization parameter estimation method. Summarily, the algorithm starts byinitializing the parameters with a random set of values and then iteratively88 Amrith Krishna et almodifies the parameter values such that the likelihood of the training corpusis increased. The process continues until the parameter values converge, i.e.,no more improvement of the likelihood over the corpus is possible.Another way of estimating parameters is through the Bayesian Inferenceapproach (Johnson, T. Griffiths, and Goldwater 2007). 
Given a corpus of strings $s = s_1, s_2, \ldots, s_n$, we assume that a CFG $G$ generates all the strings in the corpus. We take the dataset $s$ and infer the parameters $\theta$ using Bayes’ theorem:

$$P(\theta \mid s) \propto P_G(s \mid \theta)\, P(\theta), \quad \text{where} \quad P_G(s \mid \theta) = \prod_{i=1}^{n} P_G(s_i \mid \theta).$$

The joint posterior distribution for the set of possible trees $t$ and the parameters $\theta$ can then be obtained by

$$P(t, \theta \mid s) \propto P(s \mid t)\, P(t \mid \theta)\, P(\theta) = \left(\prod_{i=1}^{n} P(s_i \mid t_i)\, P(t_i \mid \theta)\right) P(\theta).$$

To calculate the posterior distribution, we assume that the parameters in $\theta$ are drawn from a known distribution, termed the prior. We assume that each non-terminal in the grammar has a given distribution, which need not be the same for all non-terminals. For a non-terminal, the multinomial distribution is indexed by its productions. Since we use a Dirichlet prior here, each production probability $\theta_{X \to \beta}$ has a corresponding Dirichlet parameter $\alpha_{X \to \beta}$. The parameters are then learnt through Markov Chain Monte Carlo sampling approaches (Johnson, T. Griffiths, and Goldwater 2007), through variational inference, or through a hybrid of the two (Zhai, Boyd-Graber, and Cohen 2014).

However, this approach also does not deal with the real bottleneck, which is to come up with relevant rules that can solve a task for a given corpus. For large datasets, the CFGs could have a large set of rules, and it is often cumbersome for experts alone to come up with them. Non-parametric Bayesian approaches have been proposed as modifications of PCFGs to address this. Roughly, non-parametric Bayesian approaches can be seen as learning a single model that can adapt its complexity to the data (Gershman and David M Blei 2012).
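One standard way to see what it means for a model to adapt its complexity to the data is the Chinese restaurant process (CRP). The CRP is the sequential view of a Dirichlet process, which is one of the adaptor choices mentioned later in this section. The Python sketch below is our own illustration, not part of the original work, and its concentration value and seed are arbitrary. Each new observation joins an existing cluster with probability proportional to that cluster's size, or opens a new cluster with probability proportional to alpha. As a result, the number of clusters, and hence of parameters, keeps growing as more data arrive.

```python
import random


def crp_cluster_counts(n_obs: int, alpha: float, seed: int = 0) -> list[int]:
    """Simulate a Chinese restaurant process with concentration `alpha`.

    Returns the number of occupied clusters ("tables") after each observation.
    Observation n+1 joins existing cluster k with probability n_k / (n + alpha)
    and opens a new cluster with probability alpha / (n + alpha).
    """
    rng = random.Random(seed)
    sizes: list[int] = []      # number of observations in each cluster
    history: list[int] = []
    for n in range(n_obs):     # n observations have been seated so far
        r = rng.uniform(0, n + alpha)
        cumulative = 0.0
        for k, size in enumerate(sizes):
            cumulative += size
            if r < cumulative:
                sizes[k] += 1
                break
        else:                  # r fell in the alpha-sized slice: open a new cluster
            sizes.append(1)
        history.append(len(sizes))
    return history


counts = crp_cluster_counts(10_000, alpha=2.0)
print(counts[9], counts[99], counts[999], counts[9_999])  # grows roughly like alpha * log(n)
```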
The term non-parametric does not imply that there are no parameters associated with the learning algorithm. Rather, it implies that the number of parameters is not fixed and increases with the amount of data or observations.

The most general version of learning PCFGs goes by the name of Infinite HMM or Infinite PCFG (Johnson 2010). In an infinite PCFG, such as the model described in Liang et al. (2007), we are provided with a set of atomic categories and combinations of these categories as rules. Depending on the data, the learning algorithm learns the productions and the number of non-terminals, along with their associated probabilities (Johnson 2010). Another variation that is popular among non-parametric grammar induction models is the Adaptor grammar (Johnson, T. L. Griffiths, and Goldwater 2007). Here, the number of non-terminals remains fixed and is set manually, but the production rules and their corresponding probabilities are obtained by inference. The productions are obtained for a subset of non-terminals that are ‘adapted’, and a skeletal grammar is used to obtain the linguistic structures.

An Adaptor Grammar is a 7-tuple $G = (V, \Sigma, R, S, \theta, A, C)$. Here $A \subseteq V$ denotes the non-terminals that are adapted, i.e., the productions for the non-terminals in $A$ are learnt automatically from data. $C$ is the adaptor set, where $C_X$ is a function that maps a distribution over trees $T_X$ to a distribution over distributions over $T_X$ (Johnson 2010).

Figure 3: Example of an Adaptor Grammar. The non-terminals marked with an ‘@’ are adapted.
The productions will be learnt from data, where each production is a variable-length permutation of a subset of the elements in the alphabet set.

The independence assumptions that hold for PCFGs are no longer valid for Adaptor Grammars (Zhai, Boyd-Graber, and Cohen 2014). Here the non-terminal $X$ is defined in terms of another distribution $H_X$. The adaptor $C_X$ for each non-terminal $X$ can be based on the Dirichlet Process or on a generalisation of it, the Pitman-Yor Process. In the equations below, $TD_X(G_{Y_1}, G_{Y_2}, \ldots, G_{Y_m})$ is a distribution over all the trees rooted in the non-terminal $X$:

$$H_X = \sum_{X \to Y_1 \ldots Y_m \in R_X} \theta_{X \to Y_1 \ldots Y_m}\, TD_X(G_{Y_1}, G_{Y_2}, \ldots, G_{Y_m})$$

$$G_X \sim C_X(H_X)$$

3 Adaptor Grammar in Computational Linguistics

Adaptor Grammar has been widely used in morphological and syntactic tasks for various languages. It was initially demonstrated on the word segmentation task in English (Johnson, T. L. Griffiths, and Goldwater 2007). A sentence with no explicit word boundaries was given as the observation, and the task was to predict the actual words in the sentence. The task is similar to variable-length motif identification.

Adaptor Grammars were introduced by Johnson, T. L. Griffiths, and Goldwater (2007) as a non-parametric Bayesian framework for inferring the syntactic grammar of a language over parse trees. A PCFG (Probabilistic Context-Free Grammar) and an adaptor function jointly define an Adaptor grammar. The PCFG learns the grammar rules behind the data generation process. The adaptor function maps the probabilities of the generated parse trees to substantially larger values than they would have under the conditionally independent PCFG model. Adaptor grammars have been used very effectively in numerous NLP-related tasks.
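The effect of the adaptor described above can be illustrated with the predictive probability of a Pitman-Yor adaptor. The adaptor raises the probability of previously generated structures well above their probability under the base distribution H_X. The Python sketch below uses discount a and concentration b; setting a = 0 gives the Dirichlet process case. It is a simplified illustration under our own assumptions, not the inference procedure of the works cited here:

- Cached entries ("tables") are keyed by the yield string of a subtree rather than by the full subtree.
- The base probabilities are hypothetical numbers.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PitmanYorAdaptor:
    """Predictive distribution of a Pitman-Yor adaptor C_X over a base distribution H_X.

    `tables` lists (yield, customer_count) pairs; one yield may occupy several tables.
    """
    discount: float                    # a, with 0 <= a < 1
    concentration: float               # b, with b > -a
    base: Callable[[str], float]       # H_X, here any function from a yield to a probability
    tables: list[tuple[str, int]] = field(default_factory=list)

    def prob(self, x: str) -> float:
        """(sum over x's tables of (n_k - a) + (b + a * K) * H_X(x)) / (n + b)."""
        a, b = self.discount, self.concentration
        n = sum(count for _, count in self.tables)
        k = len(self.tables)
        reuse = sum(count - a for y, count in self.tables if y == x)
        fresh = (b + a * k) * self.base(x)
        return (reuse + fresh) / (n + b)


# Hypothetical base probabilities for two yields of an adapted non-terminal @Word.
BASE = {"#sa$": 0.01, "śa": 0.02}
adaptor = PitmanYorAdaptor(discount=0.5, concentration=1.0,
                           base=lambda x: BASE.get(x, 0.0),
                           tables=[("#sa$", 3), ("śa", 1)])
print(adaptor.prob("#sa$"))  # far above its base probability of 0.01
```

Here the cached yield "#sa$" has a base probability of 0.01 but a predictive probability of about 0.5. This is the "substantially larger value" that the adaptor assigns to reused structures.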
Johnson (2010) drew connections between topic models and PCFGs, and then proposed a model combining insights from adaptor grammars and topic models. While LDA defines topics by projecting documents into a lower-dimensional space, Adaptor grammar defines a distribution over trees. The author also proposes a hybrid model that uses PCFG-encoded topic models to identify topical collocations. Adaptor grammars are also used in named entity structure learning: Zhai, Kozareva, et al. (2016) used adaptor grammars to identify entities in shopping-related queries in an unsupervised manner.

The word segmentation task is essentially identifying the individual words in a continuous sequence of characters. It is also seen as a challenging task in computational cognitive science. Johnson (2008a) used Adaptor Grammar for word segmentation on the Bantu language ‘Sesotho’. The author specifically showed that a grammar with additional syllable structure yields a better F-score on the word segmentation task than the usual collocation grammar. A similar study was carried out by Kumar, Padró, and Oliver González (2015). The authors present a mechanism to learn complex agglutinative morphology, with specific examples from three of the four major Dravidian languages: Tamil, Malayalam, and Kannada. Furthermore, the authors specifically stressed the task of dealing with sandhi, using finite-state transducers after generating morphological segments with Adaptor grammars. Adaptor grammar succeeds in leveraging knowledge about the agglutinative nature of the Dravidian languages, but refrains from modeling the specific morphotactic regularities of each particular language. Johnson also demonstrates the effect of syllabification on the word segmentation task using PCFGs (Johnson 2008b).
Johnson further motivates the usability of these unsupervised approaches for word segmentation and grammar induction by extracting the collocational dependencies between words (Johnson and Demuth 2010).

Due to their generalizability, Adaptor grammars have been used extensively in NLP. Hardisty, Boyd-Graber, and Resnik (2010) achieve state-of-the-art accuracy in perspective classification using the adaptive Naïve Bayes model, an adaptor grammar-based non-parametric Bayesian model. Adaptor grammar has also proven effective in grammar induction (Cohen, David M Blei, and Smith 2010), an unsupervised syntax learning task. The authors achieved considerable results and found that the variational inference algorithm (David M. Blei, Kucukelbir, and McAuliffe 2017) can be extended to the logistic normal prior instead of the Dirichlet prior. Neubig et al. (2011) proposed an unsupervised model for phrase alignment and extraction, and claimed that their method can be thought of as an adaptor grammar over two languages. Zhai, Kozareva, et al. (2016) attempted to identify relevant suggestive keywords for a typed query so as to improve the search results of an e-commerce site. The authors had previously presented a new variational inference approach through a hybrid of Markov chain Monte Carlo and variational inference (Zhai, Boyd-Graber, and Cohen 2014). This hybrid scheme is reported to improve scalability without compromising performance on typical grammar induction tasks.

Botha and Blunsom (2013) presented a new probabilistic model that extends Adaptor grammar to learn word segmentation and morpheme lexicons in an unsupervised manner. Using this mildly context-sensitive grammar formalism, stem derivation in Semitic languages such as Arabic achieves better performance.
Eskander, Rambow, and Yang (2016) recently investigated Adaptor Grammars for unsupervised morphological segmentation to establish a claim of language-independence. Setting aside baselines that use morphological knowledge from external sources or other cascaded architectures, adaptor grammar outperformed the other systems in a majority of the cases.

Adaptor grammar has also been used for native language identification (Wong, Dras, and Johnson 2012). The authors used adaptor grammar to identify n-gram collocations of arbitrary length over a mix of Part-of-Speech tags and words, and fed them as features to the classifier. By modeling the task with syntactic language models, the authors showed that the extracted collocations efficiently represent the native language.

Beyond grammar induction, Huang, Zhang, and Tan (2011) use Adaptor grammar for machine transliteration. The PCFG framework helps to learn syllable equivalents in both languages and hence aids automatic phonetic translation. Furthermore, Feldman et al. (2013) recently explored a Bayesian model to understand how feedback from segmented words can alter the phonetic category learning of infants, given access to knowledge of the joint occurrence of word pairs.

As an extension of the standard Adaptor Grammar, O’Donnell (2015) presented Fragment Grammars, which were built as a generalization of Adaptor Grammars. They generalize Adaptor Grammars by allowing productivity and abstraction to occur at any point within individual stored structures. The model adopts ‘stochastic memoization’ from the Adaptor grammar framework as an efficient mechanism for storing substructures.
It further memoizes partial internal computations via a lazy-evaluation version of the original storage mechanism of Adaptor Grammar.

4 Adaptor Grammar for Sanskrit

Adaptor Grammars have also been used for Sanskrit, mainly as a means of obtaining variable-length character n-grams to be used as features for classification tasks. Below, we describe two different applications: compound type identification, and identifying the Taddhita suffix for derivational nouns.

4.1 Variable Length Character n-grams for compound type identification¹

Krishna, Satuluri, Sharma, et al. (2016) used adaptor grammars to identify patterns present in different types of compound words. The underlying task was, given a compound word in Sanskrit, to identify the type of the compound. This is a multi-class classification problem: the classifier had to assign a given compound to one of four broad classes, namely Avyayībhāva, Dvandva, Bahuvrīhi, and Tatpuruṣa.

The system was developed as an ensemble-based supervised classifier. We used the Random Forests classifier with an easy ensemble approach to handle the class imbalance in the data. A majority of the labels were Tatpuruṣa, and Avyayībhāva was the least frequent class.

The classifier incorporated rich features from multiple sources. The rules from the Aṣṭādhyāyī that pertain to compounds and are conditional in nature, i.e., that contain selectional constraints, were encoded as features. We did this by applying those selectional restrictions to the input compounds. Variable-length character n-grams for each class of compounds were obtained from adaptor grammar, and each filtered production from the class-specific grammar was used as a feature.
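The easy ensemble approach mentioned above can be sketched as repeated random undersampling. The technique was originally proposed for two-class problems. Extending it to several classes by undersampling every class to the size of the smallest one is our assumption, not a description of the authors' implementation. The Python function below only builds the balanced index subsets. A separate base classifier, such as a random forest, would then be trained on each subset, and the predictions combined, for example by majority vote. The label counts are hypothetical.

```python
import random
from collections import defaultdict


def balanced_subsets(labels, n_subsets, seed=0):
    """Build `n_subsets` class-balanced index subsets by random undersampling.

    Every subset contains all examples of the smallest class and an equally sized
    random sample (without replacement) from each larger class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    size = min(len(indices) for indices in by_class.values())
    subsets = []
    for _ in range(n_subsets):
        chosen = []
        for indices in by_class.values():
            chosen.extend(rng.sample(indices, size))
        subsets.append(sorted(chosen))
    return subsets


# Toy label distribution (hypothetical) mirroring the imbalance described above.
labels = ["Tatpuruṣa"] * 10 + ["Bahuvrīhi"] * 5 + ["Dvandva"] * 4 + ["Avyayībhāva"] * 2
for subset in balanced_subsets(labels, n_subsets=3):
    print([labels[i] for i in subset].count("Tatpuruṣa"), len(subset))  # 2 8
```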
We also incorporated noun pairs that follow the knowledge structure of the Amarakośa, as described in Nair and A. Kulkarni (2010), using a selected subset of the relations defined there.

We capture class-specific linguistic regularities in our dataset using variable-length character n-grams, and character n-gram collocations shared between compounds, both learned with adaptor grammars.

We learn three separate grammars, G1, G2, and G3, with the same skeletal structure (Figure 4a) but trained on data samples belonging to Tatpuruṣa, Bahuvrīhi, and Dvandva, respectively. We did not learn a grammar for Avyayībhāva, as there were too few samples to learn its patterns. The components are given in sandhi-split form; a ‘$’ marker indicates the word boundary between them, and a ‘#’ symbol marks the beginning of the first component and the end of the final component. We also learn a grammar G4 on the entire dataset together with an additional 4,000 random word pairs from the Digital Corpus of Sanskrit, where neither word appeared as a compound component in the corpus; co-occurrence, or its absence, was taken as a proxy for compatibility between the components. The skeletal grammar in Figure 4a has two adapted non-terminals, both marked by ‘@’; the adapted non-terminal ‘Word’ appears in the production of the adapted non-terminal ‘Collocation’.

¹ The work has been done as part of the compound type identification work published in Krishna, Satuluri, Sharma, et al. (2016). Please refer to that work for a detailed explanation of the concepts described here.
The ‘+’ symbol denotes one or more occurrences of ‘Word’, as in regular expressions. This notation is not standard in context-free grammar productions, where the same effect is achieved with recursive rules and additional non-terminals; we use it only to keep the skeletal grammar simple. In subsequent representations, we use recursion instead of the ‘+’ notation.

Figure 4: (a) Skeletal grammar for the adaptor grammar. (b) Derivation tree for an instance of the production ‘#sa$ śa’ for the non-terminal @Collocation.

Every production in the learned grammars has a probability of being invoked, and the probabilities of all productions of a non-terminal sum to one. To obtain discriminative productions from G1, G2, and G3, we compute the conditional entropy of their productions with respect to G4 and keep only those above a threshold. We also consider all the productions unique to each of G1 to G3. We further filter the productions by their frequency in the data and by the length of the substring they produce, with both thresholds set to three.

Consider one such production for a variable-length character n-gram collocation. For the adapted non-terminal @Collocation, one of the productions derives ‘#sa$ śa’, which consists of two @Word derivations, as shown in Figure 4b. We use this production as a regular expression that captures properties the concatenated components must satisfy. This particular production mandates that the first component be exactly sa, as it is sandwiched between the symbols # and $.
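The use of a learned @Collocation production as a feature can be sketched as follows, under stated assumptions: a production is represented by the list of substrings yielded by its @Word children (for example ['#sa$', 'śa']), the frequency and length thresholds of three follow the filtering described above, and the substrings are joined by '.*' so that they must occur in order but need not be adjacent. Function names and the example frequencies are hypothetical.

```python
import re
import unicodedata
from typing import List, Sequence, Tuple


def encode_compound(components: Sequence[str]) -> str:
    """'#' before the first and after the last component, '$' between components."""
    return unicodedata.normalize("NFC", "#" + "$".join(components) + "#")


def keep_production(parts: Sequence[str], freq: int,
                    min_freq: int = 3, min_len: int = 3) -> bool:
    """Frequency and yield-length filter (both thresholds set to three)."""
    return freq >= min_freq and len("".join(parts)) >= min_len


def production_to_regex(parts: Sequence[str]) -> "re.Pattern[str]":
    """Substrings must appear in order; any characters may occur between them."""
    escaped = (re.escape(unicodedata.normalize("NFC", p)) for p in parts)
    return re.compile(".*".join(escaped))


def regex_features(components: Sequence[str],
                   patterns: List["re.Pattern[str]"]) -> List[int]:
    """Binary feature vector: 1 if the encoded compound matches the pattern."""
    text = encode_compound(components)
    return [1 if p.search(text) else 0 for p in patterns]


if __name__ == "__main__":
    # Hypothetical productions with made-up frequencies.
    learned: List[Tuple[List[str], int]] = [
        (["#sa$", "śa"], 10), (["ka"], 7), (["#a$", "ta"], 1)]
    patterns = [production_to_regex(p) for p, f in learned if keep_production(p, f)]
    print(regex_features(["sa", "kaśa"], patterns))  # [1]
```

Here only the first production survives the filter: ['ka'] yields fewer than three characters and ['#a$', 'ta'] is too infrequent.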
Since śa occurs after the substring containing $, the boundary between the components, śa must belong to the second component. Because the two substrings are independent @Word productions in the grammar, we relax the constraint that they occur immediately one after the other: in the regular expression, śa must occur after sa, with any number of characters in between. In our data, 22 compounds satisfied this pattern, all of them Bahuvrīhi. That compounds whose first component is ‘sa’ are mostly Bahuvrīhi is obvious to Sanskrit linguists; here, however, the system was given no such prior information or candidate patterns and learned the pattern from the data. Incidentally, our dataset also contained a few compounds from other classes whose first component was ‘sa’.

4.1.1 Experiments

Dataset - We obtained a labeled dataset of compounds and their decomposed component pairs from the Department of Sanskrit Studies, University of Hyderabad (UoHyd)². The dataset contains more than 32,000 unique compounds, drawn from digitized ancient texts including the Śrīmad Bhagavad Gītā and the Caraka Saṃhitā, among others, and includes the sandhi-split components of each compound. Since more than 75% of the dataset consists of Tatpuruṣa compounds, we down-sample them to 4,000, to match the second-largest class, Bahuvrīhi. Avyayībhāva compounds are severely under-represented in the dataset, at about 5% of the size of the Bahuvrīhi class.
From the dataset, we selected 9,952 data points, split into 7,957 for training and the remaining 1,995 as a held-out set.

Result - To measure the impact of each type of feature, we train the classifier incrementally, adding one feature type at a time, and report results on the held-out data. With only the Aṣṭādhyāyī rules and some additional hand-crafted rules, the overall accuracy of the system is about 59.34%. Adding features from the Amarakośa increases the overall accuracy to 63.81%. Finally, adding the adaptor grammar-based features increases the accuracy to 74.98%. The effect of the adaptor grammar features was most visible for Dvandva and Bahuvrīhi: their precision increased by 0.15 and 0.06 (absolute), respectively, compared to the results without these features. Table 1 presents the per-class results of the system with the entire feature set.

    Class  P     R     F
    A      0.92  0.43  0.58
    B      0.85  0.74  0.79
    D      0.69  0.39  0.49
    T      0.68  0.88  0.77

Table 1: Class-wise performance of the Random Forests classifier (A = Avyayībhāva, B = Bahuvrīhi, D = Dvandva, T = Tatpuruṣa; P = precision, R = recall, F = F-score).

The addition of adaptor grammar features resulted in an overall increase in performance from 63.81% to 74.91%. The adaptor grammar patterns were learned only from the training set, without using the held-out data, so that the evaluation is not affected by over-fitting to the test data.

² http://sanskrit.uohyd.ac.in/scl/
We also filtered out productions that yield substrings shorter than three characters or that occur fewer than three times.

4.2 Distinctive Patterns in Derivational Nouns in Taddhita³

Derivational nouns are a means of vocabulary expansion: a new word is created by modifying an existing word with an affix. Taddhita is a category of such derivational affixes, used to derive a prātipadika (nominal stem) from another prātipadika. The challenge here is to identify Taddhita prātipadikas in Sanskrit corpora and also to identify their source words.

³ The work has been done as part of the derivational noun word pair identification work published in Krishna, Satuluri, Ponnada, et al. (2017). Please refer to that work for a detailed explanation of the concepts described here.

Pattern-based approaches often produce false positives. The edit distance, a common metric for comparing two strings, between a source word and the word derived from it varies from 1 to 6 depending on the pattern. For example, ‘rāvaṇi’ is derived from ‘rāvaṇa’ with an edit distance of just 1, whereas ‘Āśvalāyana’, derived from ‘aśvala’, has an edit distance of 6. Also, ‘kālaśa’ is derived from ‘kalaśa’, but ‘kāraṇa’ is not derived from ‘karaṇa’. Similarly, ‘stutya’ is derived from ‘stu’, but with a kṛt affix.
In contrast, dakṣiṇā (south) is used to derive dākṣiṇātya (southern) with a Taddhita affix. Even if we used vṛddhi as an indicator, since it is the only difference between these two examples, there are cases such as kāraka, derived from kṛ with a kṛt affix, and aśvaka, derived from aśva with a Taddhita affix. These instances show the level of ambiguity that can arise in deciding pairs of source and derived words for Taddhita. Resolving the right set of word pairs requires knowledge of the Aṣṭādhyāyī (i.e., of the affixes), of the semantic relation between the words in a pair, or a combination of these.

The approach proposed in Krishna, Satuluri, Ponnada, et al. (2017) first identifies a high-recall, low-precision set of word pairs from multiple Sanskrit corpora, based on the patterns exhibited by the 137 Taddhita affixes. Once the patterns are obtained, we look for various similarities between the word pairs to group them together. We use rules from the Aṣṭādhyāyī, especially from the Taddhita section. Since we could not incorporate rules of a semantic or pragmatic nature, we compensate for them by identifying patterns in the word pairs, specifically in the source words, using Adaptor Grammars.

Currently, we do not identify the exact affix responsible for a derivation: since affixes are distinguished not only by the visible pattern but also by their ‘it’ markers, identifying the exact affix is challenging. Instead, we group all affixes that result in similar patterns into a single group.
All word pairs that follow the same pattern belong to one group. To further increase the group size, entries that differ only by vṛddhi or guṇa are also placed in the same group; such distinctions are ignored when forming groups. Effectively, we look only at the pattern at the end of the derived word. We call the resulting collection of pattern-based groups our ‘candidate set’.

For every distinct pattern in the candidate set, we first identify the word pairs and then build a graph over them: each word pair is a node, and edges are formed between nodes that share similarities of different kinds. The first set of similarities is based directly on rules from the Aṣṭādhyāyī, while the second uses character n-grams learned with Adaptor Grammars. Once the similarities are computed, we apply Modified Adsorption (Talukdar and Crammer 2009) to the graph. Modified Adsorption is a semi-supervised label propagation approach: labels are provided for a subset of nodes and then propagated to the remaining nodes according to the similarity they share with other nodes.

Figure 5 shows a sample graph for word pairs that differ by the pattern ‘ya’. Every pair obtained by pattern matching is a node. Since Modified Adsorption is semi-supervised, a limited number of labeled nodes is needed; these seed nodes are marked in grey. The label is binary: a word pair either is a true Taddhita pair or is not. Edges are then formed between the word pairs.
Modified Adsorption allows the graph to be designed explicitly, whereas many of its predecessors relied on nearest-neighbor graph construction (Zhu and Ghahramani 2002). The edges can also be weighted by the closeness between nodes. Once the graph structure is defined, we run Modified Adsorption, which propagates the labels of the seed nodes through the edges to the unlabeled nodes. Highly similar nodes should receive similar labels; the objective function penalizes label assignments that violate this. We use three different sources of similarity between nodes: (1) the Aṣṭādhyāyī rules that both nodes match; (2) the sum of the probabilities of the adaptor grammar productions matched by both nodes; and (3) the word-vector similarity between the source words of the two nodes. For a detailed account of the system and of each feature set, please refer to Krishna, Satuluri, Ponnada, et al. (2017). Here, we describe again the second set of features, obtained using Adaptor Grammars, and the results of the model.

Character n-gram similarity by Adaptor Grammar - Pāṇini had to maintain brevity, as his grammatical treatise was meant to be memorized and recited orally (Kiparsky 1994). In the Aṣṭādhyāyī, Pāṇini uses character substrings of varying lengths as conditions for checking whether an affix can be applied. Since brevity is not a concern for us, we examine whether further regularities of this kind, in the form of variable-length character n-grams, can be observed in the data.
We also assume that this compensates for some of the information Pāṇini originally encoded in pragmatic rules. To identify such regularities in the words, we use an Adaptor Grammar.

Figure 5: Graph structure for the group of words whose derived words end in ‘ya’. Nodes in grey are seed nodes, marked with their class label; nodes in white are unlabeled.

In Listing 1, ‘Word’ and ‘Stem’ are adapted non-terminals, and the non-terminal ‘Suffix’ expands to the set of end-patterns.

    Word → Stem Suffix
    Word → Stem
    Stem → Chars
    Suffix → a | ya | ... | āyana

Listing 1: Skeletal CFG for the Adaptor grammar.

The set $A_2$ contains all the variable-length character n-grams learned as productions by the grammar, together with the probability associated with each production. We form an edge between two nodes in $G_{i2}$ if some entry of $A_2$ is present in both nodes. We sum the probabilities associated with all such character n-grams common to the pair of nodes $v_j, v_k \in V_i$ to obtain the edge score $\tau_{j,k}$; if the edge score is greater than zero, its sigmoid is assigned as the edge weight. The expression for $\tau_{j,k}$ given below uses the Iverson bracket (Knuth 1992) to express this conditional sum, ensuring that only the probabilities of the character n-grams present in both nodes are summed.
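The edge construction for the adaptor grammar similarity can be sketched in a few lines. The sketch assumes each node (word pair) is represented by the set of A2 n-grams that its source word matches, and that A2 maps each n-gram to its production probability; summing the probabilities of the n-grams shared by both nodes then plays the role of the Iverson-bracket sum, an edge exists only when the score is positive, and its weight is the logistic sigmoid of the score. Function names and the toy n-grams are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, Mapping, Set


def edge_score(ngrams_j: Set[str], ngrams_k: Set[str],
               a2: Mapping[str, float]) -> float:
    """tau_{j,k}: sum of A2 probabilities of n-grams present in both nodes."""
    return sum(p for g, p in a2.items() if g in ngrams_j and g in ngrams_k)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def build_edges(nodes: Mapping[str, Set[str]],
                a2: Mapping[str, float]) -> Dict[FrozenSet[str], float]:
    """Undirected weighted edges; node pairs with tau = 0 get no edge."""
    edges: Dict[FrozenSet[str], float] = {}
    for j, k in combinations(sorted(nodes), 2):
        tau = edge_score(nodes[j], nodes[k], a2)
        if tau > 0:
            edges[frozenset((j, k))] = sigmoid(tau)
    return edges
```

Since σ(τ) > 0.5 for every τ > 0, any edge that exists receives a weight between 0.5 and 1, with more shared probability mass giving a weight closer to 1.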
We define the edge score $\tau_{j,k}$, the weight set $W_{i2}$, and the edge set $E_{i2}$ as follows, where $v_{k2,l}$ denotes the value of the $l$-th adaptor grammar n-gram feature (from $A_2$) for node $v_k$, $[\cdot]$ is the Iverson bracket, and $\sigma$ is the sigmoid function:

$$\tau_{j,k} = \sum_{l=1}^{|A_2|} v_{k2,l}\,\big[v_{k2,l} = v_{j2,l}\big]$$

$$E^{i2}_{v_k,v_j} = \begin{cases} 1 & \text{if } \tau_{j,k} > 0 \\ 0 & \text{if } \tau_{j,k} = 0 \end{cases} \qquad W^{i2}_{v_k,v_j} = \begin{cases} \sigma(\tau_{j,k}) & \text{if } \tau_{j,k} > 0 \\ 0 & \text{if } \tau_{j,k} = 0 \end{cases}$$

As mentioned, we use the per-node label distribution obtained from phase 1 as the seed labels in this setting.

4.2.1 Experiments

Recall that we use three different sets of similarities to weight the edges. In Modified Adsorption (MAD), however, each edge weight must be a scalar, so only one similarity function can be used at a time. We therefore apply the similarity weights to the graph sequentially. An alternative would have been to take a weighted average of the different similarity scores, but our pipeline approach can be seen as a means of bootstrapping the seed set. MAD requires seed labels for some of the nodes; these need not be binary assignments but can be label distributions (Talukdar and Crammer 2009). After the run with each similarity set, we obtain a label distribution for every node in the graph, which is used to generate the seed nodes for the subsequent run of MAD. The seed labels can also be modified during the run of the algorithm.

Dataset - We use multiple lexicons and corpora to obtain our vocabulary C: IndoWordNet (M. Kulkarni et al. 2010), the Digital Corpus of Sanskrit⁴, a digitized version of the Monier-Williams Sanskrit-English dictionary⁵, a digitized version of the Apte Sanskrit-Sanskrit dictionary (Goyal, G. P. Huet, et al. 2012), and the lexicon employed in the Sanskrit Heritage Engine (Goyal and G. Huet 2016). Together, these resources yield close to 170,000 unique word lemmas.

Results - In Krishna, Satuluri, Ponnada, et al.
(2017), we report results for 11 of the more than 80 patterns we initially obtained; for the others, there was not enough evidence in the form of data points to attempt the procedure. Here, we show results for only 5 of the patterns, selected by the amount of evidence available in the corpora. Since the similarity sets are applied sequentially, there is an output after each phase: the result after incorporating the Aṣṭādhyāyī rules is MAD_B1, the result after incorporating the adaptor grammar n-grams is MAD_B2, and the final result after adding word-vector similarity is MAD. A subscript i on a system name (e.g., MAD_i) denotes the corresponding pattern. We additionally use a baseline, Label Propagation (LP), based on the algorithm of Zhu and Ghahramani (2002). The two systems that incorporate adaptor grammar features, MAD and MAD_B2, are the best and second-best performing systems, respectively.

Table 2 shows the results for 5 different patterns, selected by the number of candidate word pairs available for each pattern. The proposed system, MAD_i, performs best for all 5 patterns, and MAD_B2,i is second-best in accuracy in every case (tied with LP for ‘aka’). The system uses three kinds of similarity measures in a sequential pipeline, in which the adaptor grammar features form the second set. To understand the impact of the adaptor grammar-based features, the results can be compared with those of MAD_B1,i, which shows the result for each pattern before these features are used.

We also include the label propagation baseline, to measure the effect of using Modified Adsorption for the task.
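For comparison, the label propagation baseline can be sketched as a generic iterative procedure in the spirit of Zhu and Ghahramani (2002); this is not the configuration used in the experiments. The sketch assumes the parameter K is the number of nearest neighbours each node is linked to (one common reading), lets every unlabeled node repeatedly take the similarity-weighted average of its neighbours’ label distributions, and clamps the seed nodes to their given labels. Unlike Modified Adsorption, it has no injection or abandonment terms. Function names and the toy similarity matrix are hypothetical.

```python
from typing import Dict, List, Sequence


def knn_graph(sim: Sequence[Sequence[float]], k: int) -> List[List[float]]:
    """Symmetric weight matrix keeping each node's k most similar neighbours."""
    n = len(sim)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: sim[i][j], reverse=True)
        for j in others[:k]:
            w[i][j] = w[j][i] = sim[i][j]  # assumes sim is symmetric
    return w


def label_propagation(w: Sequence[Sequence[float]],
                      seeds: Dict[int, List[float]],
                      n_labels: int, iters: int = 100) -> List[List[float]]:
    """Iteratively average neighbours' label distributions, clamping seeds."""
    n = len(w)
    uniform = [1.0 / n_labels] * n_labels
    y = [list(seeds.get(i, uniform)) for i in range(n)]
    for _ in range(iters):
        new_y = []
        for i in range(n):
            total = sum(w[i])
            if i in seeds or total == 0:
                new_y.append(y[i])  # seeds stay clamped; isolated nodes keep theirs
                continue
            new_y.append([sum(w[i][j] * y[j][c] for j in range(n)) / total
                          for c in range(n_labels)])
        y = new_y
    return y
```

With a small K the graph splits into tight neighbourhoods, so each unlabeled node inherits almost entirely from its closest seeds; larger K lets more distant nodes pull the distribution.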
In Label Propagation, we experimented with different values of the parameter K, K ∈ {10, 20, 30, 40, 50, 60}, and found that K = 40 gives the best results for 3 of the 5 end-patterns (‘a’, ‘in’, ‘i’), for which the entire vertex set has the vṛddhi attribute set to the same value. For the other two (‘ya’, ‘aka’), K = 50 gave the best results; here, the vertex set contains nodes with either value of the vṛddhi attribute. The values of K were set empirically.

    Pattern  System  P     R     A
    a        MAD     0.72  0.77  73.86
             MAD_B2  0.68  0.68  68.18
             MAD_B1  0.49  0.52  48.86
             LP      0.55  0.59  55.68
    aka      MAD     0.77  0.67  73.33
             MAD_B2  0.71  0.67  70
             MAD_B1  0.43  0.4   43.33
             LP      0.75  0.6   70
    in       MAD     0.74  0.82  76.47
             MAD_B2  0.67  0.70  67.65
             MAD_B1  0.51  0.56  51.47
             LP      0.63  0.65  63.23
    ya       MAD     0.7   0.72  70.31
             MAD_B2  0.61  0.62  60.94
             MAD_B1  0.53  0.59  53.12
             LP      0.56  0.63  56.25
    i        MAD     0.55  0.52  54.76
             MAD_B2  0.44  0.38  45.24
             MAD_B1  0.3   0.29  30.95
             LP      0.37  0.33  38.09

Table 2: Comparative performance of the four competing models (P = precision, R = recall, A = accuracy in %).

⁴ http://kjc-sv013.kjc.uni-heidelberg.de/dcs/
⁵ http://www.sanskrit-lexicon.uni-koeln.de/monier/

To explain this finding, we need to elaborate on the notion of a pattern used in the design of the system. A pattern is effectively the pair of substrings that remain in the source word and the derived word after removing the portions common to both; it is the visible change that occurs in the derivation of a word. To reduce the number of distinct patterns, we did not treat changes due to vṛddhi and guṇa as distinct patterns, but abstracted them out. Multiple affixes may therefore lead to the same pattern.
For the end-pattern (Krishna, Satuluri, Ponnada, et al. 2017) ‘a’, for instance, the change may result from the application of any of several affixes, such as aṇ and añ, all of which lead to vṛddhi. For the pattern ‘ya’, by contrast, the affixes may or may not lead to vṛddhi. Table 2 reports the best result for each system.

5 Inference of Syntactic Structure in Sanskrit

In this section, we report ongoing work investigating the effectiveness of Adaptor Grammars for inferring syntactic structure in Sanskrit. We examine how well an Adaptor Grammar captures the ‘natural order’, i.e., the word order followed in prose. For this task, we use a dataset of Sanskrit sentences in prose order, consisting of 2,000 sentences from the Pañcākhyānaka and more than 600 sentences from the Mahābhārata. For this experiment, we consider only the morphological classes of the words in the sentences, using the morphological tags of the Sanskrit Library⁶. We keep 500 sentences for testing and use the remaining 2,000 for learning the patterns; constructs consisting of only one or two words are ignored.

We learn the productions of the grammar and then evaluate it on the 500 test sentences by calculating the likelihood of generating each sentence. To compare the likelihood of the correct sentence against alternatives, we also generate all possible permutations of the morphological tags in each test sentence.
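Candidate generation can be sketched as below, reading the chunked scheme used for longer sentences as: split the tag sequence into consecutive chunks of at most five tags, permute the tags within each chunk, and concatenate one permutation per chunk. Duplicate orderings, which arise when a tag such as ‘i’ occurs more than once, are removed so that each distinct ordering is ranked once; this de-duplication and the function name are assumptions of the sketch.

```python
import itertools
from typing import Iterator, Sequence, Tuple


def candidate_orderings(tags: Sequence[str],
                        chunk: int = 5) -> Iterator[Tuple[str, ...]]:
    """Yield distinct reorderings, permuting only within chunks of `chunk` tags."""
    chunks = [tuple(tags[i:i + chunk]) for i in range(0, len(tags), chunk)]
    seen = set()
    # One permutation per chunk, combined in every possible way.
    for parts in itertools.product(*(itertools.permutations(c) for c in chunks)):
        candidate = tuple(tag for part in parts for tag in part)
        if candidate not in seen:
            seen.add(candidate)
            yield candidate
```

For a 12-tag sentence this yields at most 5!·5!·2! = 28,800 candidates instead of 12! = 479,001,600, at the cost of never moving a tag across a chunk boundary.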
For sentences longer than 5 tags, we split them into sub-sequences of 5, permute each sub-sequence, and concatenate the results. This serves as a way of sampling the possible orderings, since explicitly enumerating all permutations is computationally costly. We then compute the likelihood of every candidate in the generated set, including the ground-truth sentence, and rank them. We report results using two measures:

1. Edit Distance (ED) - The edit distance between the top-ranked (highest-likelihood) candidate for a given sentence and the ground truth. Edit distance is the minimum number of operations required to convert one string into another, given a fixed set of operations with predefined costs. We use the standard Levenshtein distance (Levenshtein 1966), whose three operations, insertion, deletion, and substitution, each have a cost of 1; here the strings are sequences of morphological tags. A lower edit distance indicates a better result.

2. Mean Reciprocal Rank (MRR) - The average of the reciprocal ranks over all queries. Each test sentence is treated as a query, and its candidate permutations are the retrieved results; from the ranked list, we take the reciprocal of the rank of the gold-standard sentence. A higher MRR indicates a better result:

$$\mathrm{MRR} = \frac{1}{|Q|}\sum_{i=1}^{|Q|}\frac{1}{\mathrm{rank}_i}$$

where $Q$ is the set of test sentences and $\mathrm{rank}_i$ is the rank of the gold-standard ordering of the $i$-th sentence.

⁶ http://sanskritlibrary.org/helpmorphids.html

We first use the skeletal grammars proposed by Johnson, T. L. Griffiths, and Goldwater (2007) to capture syntactic regularities, namely the ‘unigram’ and ‘collocation’ grammars described in that work.
Figures 6 and 7 show the first two grammars that we used for the task.

Figure 6: Unigram grammar as used in Johnson, T. L. Griffiths, and Goldwater (2007).

With these grammars, we experimented with various hyper-parameter settings. Since both grammars are right-recursive, the lengths of the learned productions varied greatly. While this is beneficial for learning words of varying lengths, associations between morphological tags do not extend over such long spans. Moreover, the number of productions to be learned is a user-defined hyper-parameter. We find that, because of the widely varying lengths of the strings and the small number of observations, the main morphological patterns learned as productions were not repeated often enough in the observations to be statistically significant.

Figure 7: Collocation grammar as used in Johnson, T. L. Griffiths, and Goldwater (2007).

Figure 8: Modified grammar, obtained by eliminating the recursion in the adapted non-terminal ‘@Word’.

We modified both grammars to restrict the length of the productions to a maximum of 4 and limited the number of productions to be learned. The modification, applied to the adapted non-terminal ‘@Word’ in both grammars, is shown in Figure 8; it restricts the productions that ‘@Word’ can learn. The results for all four grammars are shown in Table 3. The restricted grammars yield a considerable improvement in both Mean Reciprocal Rank and edit distance. On manual inspection of the patterns learned by all the grammars, we observed that the original skeletal grammars were essentially over-fitting the training instances because of their longer productions.
The modified collocation grammar reduces the edit distance by more than half (from 4.66 to 2.20) and nearly doubles the Mean Reciprocal Rank (from 0.3016 to 0.5671).

    Grammar               MRR     ED
    Unigram               0.2923  4.87
    Collocation           0.3016  4.66
    Modified Unigram      0.4025  3.21
    Modified Collocation  0.5671  2.20

Table 3: Results for the word reordering task (higher MRR and lower ED are better).

For example, consider the sentence ‘tatra budhaḥ vrata caryā samāptau āgacchat (ā agacchat)’ from the Mahābhārata, whose morphological tag sequence is ‘i m1s iic f3s f7s i ipf[1]_a3s’⁷. We filter out the ‘iic’ tags, since ‘iic’ marks a compound component, which can be treated as part of the noun tag immediately following it. For now, we do not filter out the ‘i’ tags, which mark indeclinables. The effective tag sequence is therefore ‘i m1s f3s f7s i ipf[1]_a3s’. The most likely output of the ‘Collocation’ grammar was ‘i f7s i m1s f3s ipf[1]_a3s’, with an edit distance of 4. The ‘Modified Collocation’ grammar predicts ‘i m1s f3s i f7s ipf[1]_a3s’, with an edit distance of 2: only the tags ‘i’ and ‘f7s’ have swapped positions, and they remain adjacent, the fourth and fifth words of the original sentence becoming the fifth and fourth words of the prediction.

The results shown here are preliminary.
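Both evaluation measures can be sketched in a few lines, treating each morphological tag as one symbol. The edit distance is a textbook two-row Levenshtein dynamic program with unit costs, and it reproduces the distances of 4 and 2 in this example. For MRR, the score argument stands in for the grammar’s likelihood of a tag sequence and is a hypothetical placeholder; ties are broken by the order in which candidates are supplied, which the text does not specify. This is a sketch, not the authors’ evaluation code.

```python
from typing import Callable, Hashable, List, Sequence


def levenshtein(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Minimum number of unit-cost insertions, deletions and substitutions."""
    prev: List[int] = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute (or match)
        prev = cur
    return prev[-1]


def reciprocal_rank(gold: Hashable, candidates: Sequence[Hashable],
                    score: Callable[[Hashable], float]) -> float:
    """1 / rank of the gold ordering when candidates are sorted by score."""
    # sorted() is stable even with reverse=True, so ties keep input order.
    ranked = sorted(candidates, key=score, reverse=True)
    return 1.0 / (ranked.index(gold) + 1)


def mean_reciprocal_rank(reciprocal_ranks: Sequence[float]) -> float:
    """MRR = (1/|Q|) * sum over queries of 1/rank_i."""
    return sum(reciprocal_ranks) / len(reciprocal_ranks)


gold = "i m1s f3s f7s i ipf[1]_a3s".split()
collocation = "i f7s i m1s f3s ipf[1]_a3s".split()
modified = "i m1s f3s i f7s ipf[1]_a3s".split()
print(levenshtein(gold, collocation), levenshtein(gold, modified))  # 4 2
```

Note that plain Levenshtein charges an adjacent swap two substitutions; a transposition-aware variant would score the modified prediction as 1, which is why the choice of distance matters when comparing reported numbers.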
What excites us most is the possibility this framework offers for incorporating syntactic knowledge that is explicitly defined in our grammar formalisms. We plan to extend this work to two immediate tasks. First, we plan to extend the word-reordering task to poetry-to-prose conversion. Currently, the task is to convert a bag of words into its corresponding prose, or ‘natural’, order; we will investigate the regularities involved in poetry beyond those of meter and incorporate them to guide the grammar in picking up such patterns. We can also attempt to learn the conditional probabilities of syntactic patterns in both poetry and prose. Second, we will perform dependency analysis of sentences at the morphological level. Goyal and A. Kulkarni (2014) present a scheme for converting Sanskrit constructs from constituency structure to dependency structure, and Headden III, Johnson, and McClosky (2009) provide insights into the use of PCFGs and lexical evidence for unsupervised dependency parsing. For now, we will work only on projective dependency parsing, relying on the Dependency Model with Valence to define our PCFG formalism.

⁷ We follow the notation of the Sanskrit Library: http://sanskritlibrary.org/helpmorphids.html

6 Conclusion

The primary goal of this work was to examine the applicability of Adaptor Grammars, a non-parametric Bayesian approach, to learning syntactic structures from observations. We introduced the basic concepts of Adaptor Grammars and various NLP tasks in which they have been applied, and we described in detail how Adaptor Grammars are used in word-level vocabulary expansion tasks in Sanskrit.
The adaptor grammars were used as effective sub-word n-gram features for both compound type identification and derivational noun pair identification. We further showed the feasibility of using adaptor grammars for syntactic-level analysis of sentences in Sanskrit. We plan to investigate the feasibility of using Adaptor Grammars for dependency parsing and poetry-to-prose conversion at the sentence level.

Acknowledgements

The authors acknowledge the use of the morphologically tagged database of the Pañcākhyānaka and Mahābhārata produced under the direction of Professor Peter M. Scharf while laureate of a Blaise Pascal Research Chair at the Université Paris Diderot 2012–2013 and maintained by The Sanskrit Library.

References

Blei, David M., Alp Kucukelbir, and Jon D. McAuliffe. 2017. “Variational Inference: A Review for Statisticians”. Journal of the American Statistical Association 112.518, pp. 859–877. doi: 10.1080/01621459.2017.1285773.
Botha, Jan A and Phil Blunsom. 2013. “Adaptor Grammars for Learning Non-Concatenative Morphology”. In: Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics.
Cass, Stephen. 2011. Unthinking Machines, Artificial intelligence needs a reboot, say experts. url: https://www.technologyreview.com/s/423917/unthinking-machines/.
Cohen, Shay B, David M Blei, and Noah A Smith. 2010. “Variational inference for adaptor grammars”. In: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 564–572.
Eskander, Ramy, Owen Rambow, and Tianchun Yang. 2016. “Extending the Use of Adaptor Grammars for Unsupervised Morphological Segmentation of Unseen Languages.” In: COLING, pp. 900–910.
Feldman, Naomi H, Thomas L Griffiths, Sharon Goldwater, and James L Morgan. 2013. “A role for the developing lexicon in phonetic category acquisition.” Psychological review 120.4, p. 751.
Gershman, Samuel J and David M Blei. 2012. “A tutorial on Bayesian nonparametric models”. Journal of Mathematical Psychology 56.1, pp. 1–12.
Goyal, Pawan and Gérard Huet. 2016. “Design and analysis of a lean interface for Sanskrit corpus annotation”. Journal of Language Modelling 4.2, pp. 145–182.
Goyal, Pawan, Gérard P Huet, Amba P Kulkarni, Peter M Scharf, and Ralph Bunker. 2012. “A Distributed Platform for Sanskrit Processing.” In: COLING, pp. 1011–1028.
Goyal, Pawan and Amba Kulkarni. 2014. “Converting Phrase Structures to Dependency Structures in Sanskrit”. In: Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, pp. 1834–1843.
Hardisty, Eric A, Jordan Boyd-Graber, and Philip Resnik. 2010. “Modeling perspective using adaptor grammars”. In: Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, pp. 284–292.
Headden III, William P, Mark Johnson, and David McClosky. 2009. “Improving unsupervised dependency parsing with richer contexts and smoothing”. In: Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 101–109.
Horning, James Jay. 1969. A study of grammatical inference. Tech. rep. STANFORD UNIV CALIF DEPT OF COMPUTER SCIENCE.
Huang, Yun, Min Zhang, and Chew Lim Tan. 2011. “Nonparametric bayesian machine transliteration with synchronous adaptor grammars”. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers-Volume 2. Association for Computational Linguistics, pp. 534–539.
Johnson, Mark. 2008a. “Unsupervised word segmentation for Sesotho using adaptor grammars”. In: Proceedings of the Tenth Meeting of ACL Special Interest Group on Computational Morphology and Phonology. Association for Computational Linguistics, pp. 20–27.
Johnson, Mark. 2008b. “Using Adaptor Grammars to Identify Synergies in the Unsupervised Acquisition of Linguistic Structure.” In: ACL, pp. 398–406.
Johnson, Mark. 2010. “PCFGs, topic models, adaptor grammars and learning topical collocations and the structure of proper names”. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, pp. 1148–1157.
Johnson, Mark and Katherine Demuth. 2010. “Unsupervised phonemic Chinese word segmentation using Adaptor Grammars”. In: Proceedings of the 23rd international conference on computational linguistics. Association for Computational Linguistics, pp. 528–536.
Johnson, Mark, Thomas L Griffiths, and Sharon Goldwater. 2007. “Adaptor grammars: A framework for specifying compositional nonparametric Bayesian models”. In: Advances in neural information processing systems, pp. 641–648.
Johnson, Mark, Thomas Griffiths, and Sharon Goldwater. 2007. “Bayesian inference for pcfgs via markov chain monte carlo”. In: Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference, pp. 139–146.
Kiparsky, Paul. 1994. “Paninian linguistics”. The Encyclopedia of Language and Linguistics 6, pp. 2918–2923.
Knuth, Donald E. 1992. “Two notes on notation”. The American Mathematical Monthly 99.5, pp. 403–422.
Krishna, Amrith, Pavankumar Satuluri, Harshavardhan Ponnada, Muneeb Ahmed, Gulab Arora, Kaustubh Hiware, and Pawan Goyal. 2017. “A Graph Based Semi-Supervised Approach for Analysis of Derivational Nouns in Sanskrit”. In: Proceedings of TextGraphs-11: the Workshop on Graph-based Methods for Natural Language Processing. Vancouver, Canada: Association for Computational Linguistics, pp. 66–75. url: http://www.aclweb.org/anthology/W17-2409.
Krishna, Amrith, Pavankumar Satuluri, Shubham Sharma, Apurv Kumar, and Pawan Goyal. 2016. “Compound Type Identification in Sanskrit: What Roles do the Corpus and Grammar Play?” In: Proceedings of the 6th Workshop on South and Southeast Asian Natural Language Processing (WSSANLP2016). Osaka, Japan: The COLING 2016 Organizing Committee, pp. 1–10.
Kulkarni, Malhar, Chaitali Dangarikar, Irawati Kulkarni, Abhishek Nanda, and Pushpak Bhattacharyya. 2010. “Introducing Sanskrit Wordnet”. In: Proceedings on the 5th Global Wordnet Conference (GWC 2010), Narosa, Mumbai, pp. 287–294.
Kumar, Arun, Lluís Padró, and Antoni Oliver González. 2015. “Joint Bayesian Morphology learning of Dravidian Languages”. In: RICTA2015: Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects: Hissan, Bulgaria: September 10, 2015: proceedings book.
Levenshtein, Vladimir I. 1966. “Binary codes capable of correcting deletions, insertions, and reversals”. In: Soviet physics doklady. Vol. 10, pp. 707–710.
Liang, Percy, Slav Petrov, Michael I Jordan, and Dan Klein. 2007. “The Infinite PCFG Using Hierarchical Dirichlet Processes.” In: EMNLP-CoNLL, pp. 688–697.
Manning, Christopher D. 2016. “Computational linguistics and deep learning”. Computational Linguistics.
Nair, Sivaja S and Amba Kulkarni. 2010. “The Knowledge Structure in Amarakosa.” In: Sanskrit Computational Linguistics. Springer, pp. 173–189.
Neubig, Graham, Taro Watanabe, Eiichiro Sumita, Shinsuke Mori, and Tatsuya Kawahara. 2011. “An unsupervised model for joint phrase alignment and extraction”. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies-Volume 1. Association for Computational Linguistics, pp. 632–641.
Norvig, Peter. 2011. On Chomsky and the Two Cultures of Statistical Learning. url: http://norvig.com/chomsky.html.
O’Donnell, Timothy J. 2015. Productivity and reuse in language: A theory of linguistic computation and storage. MIT Press.
Pinker, Steven, Emilio Bizzi, Sydney Brenner, Noam Chomsky, Marvin Minsky, and Barbara H. Partee. 2011. Keynote Panel: The Golden Age: A Look at the Original Roots of Artificial Intelligence, Cognitive Science, and Neuroscience. url: http://languagelog.ldc.upenn.edu/myl/PinkerChomskyMIT.html.
Prince, Alan and Paul Smolensky. 1993. Optimality Theory: Constraint interaction in generative grammar. John Wiley & Sons, the version published in 2008.
Smolensky, Paul and Géraldine Legendre. 2006. The harmonic mind: From neural computation to optimality-theoretic grammar (Cognitive architecture), Vol. 1. MIT press.
Talukdar, Partha and Koby Crammer. 2009. “New regularized algorithms for transductive learning”. Machine Learning and Knowledge Discovery in Databases, pp. 442–457.
Wong, Sze-Meng Jojo, Mark Dras, and Mark Johnson. 2012. “Exploring Adaptor Grammars for Native Language Identification”. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pp. 699–709.
Zhai, Ke, Jordan Boyd-Graber, and Shay B Cohen. 2014. “Online adaptor grammars with hybrid inference”. Transactions of the Association for Computational Linguistics 2, pp. 465–476.
Zhai, Ke, Zornitsa Kozareva, Yuening Hu, Qi Li, and Weiwei Guo. 2016. “Query to Knowledge: Unsupervised Entity Extraction from Shopping Queries using Adaptor Grammars”. In: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval. ACM, pp. 255–264.
Zhu, Xiaojin and Zoubin Ghahramani. 2002. Learning from Labeled and Unlabeled Data with Label Propagation. Tech. rep.

A user-friendly tool for metrical analysis of Sanskrit verse
Shreevatsa Rajagopalan

Abstract: This paper describes the design and implementation of a tool that assists readers of metrical verse in Sanskrit (and other languages/literatures with similar prosody). It is open-source and available online as a web application, as a command-line tool, and as a software library. It handles both varṇavṛtta and mātrāvṛtta metres. It has many features for usability without placing strict demands on its users. These include allowing input in a wide variety of transliteration schemes, being fairly robust against typographical or metrical errors in the input, and “aligning” the given verse in light of the recognized metre.

This paper describes the various components of the system and its user interface, and details of interest such as the heuristics used in the identifier and the dynamic-programming algorithm used for displaying results. Although originally and primarily designed to help readers, the tool can also be used for additional applications such as detecting metrical errors in digital texts (its very first version identified 23 errors in a Sanskrit text from an online corpus), and generating statistics about metres found in a larger text or corpus. These applications are illustrated here, along with plans for future improvements.

1 Introduction

1.1 Demo

As a software tool is being discussed, it seems best to start with a demonstration of a potential user interaction with the tool.
Suppose I wish to learn about the metre of the following subhāṣita (which occurs in the Pratijñāyaugandharāyaṇa attributed to Bhāsa):

kāṣṭhād agnir jāyate mathya-mānād
bhūmis toyaṃ khanya-mānā dadāti |
sotsāhānāṃ nāstyasādhyaṃ narāṇāṃ
mārgārabdhāḥ sarva-yatnāḥ phalanti ||

Then I can visit the tool’s website, http://sanskrit-metres.appspot.com, enter the above verse (exactly as above), and correctly learn that it is in the metre Śālinī. More interestingly, suppose I do not have the verse correctly: perhaps I am quoting it from memory (possibly having misheard it, and unaware of the line breaks), or I have found the text on a (not very reliable) blog, or some errors have crept into the digital text, or possibly I just make some mistakes while typing. In such a case, possibly even with an unreasonable number of mistakes present, I can still use the tool in the same way. Thus, I can enter the following error-ridden input (which, for illustration, is encoded this time in the ITRANS convention):

kaaShThaad agni jaayate
mathyamaanaad bhuumistoya khanyamaanaa /
daati sotsaahaanaaM naastyasaadhyaM
naraaNaaM maargaabdhaaH savayatnaaH phalantiihi //

Here, some syllables have the wrong prosodic weight (laghu instead of guru and vice versa), some syllables are missing, some have been introduced extraneously, not a single line of the input is correct, and even the total number of syllables is wrong.
Despite this, the tool identifies the metre as Śālinī. The output from the tool, indicating the identified metre and highlighting the extent to which the given verse corresponds to that metre, is shown in Figure 1. The rest of this paper explains how this is done, among other things.

Figure 1: A screenshot of the output from the tool for highly erroneous input. Despite the errors, the metre is correctly identified as Śālinī. The guru syllables are marked in bold, and the deviations from the expected metrical pattern (syllables with the wrong weight, or missing or superfluous syllables) are underlined (and highlighted in red).

1.2 Background

A large part of Sanskrit literature, in kāvya, śāstra and other genres, is in verse (padya) rather than prose (gadya). A verse in Sanskrit (not counting some modern Sanskrit poets’ experimentation with “free verse” and the like) is invariably in metre.

Computer tools to recognize the metre of a Sanskrit verse are not new. A script in the Perl programming language, called sscan, written by John Smith, is distributed among other utilities at the http://bombay.indology.info website; although the exact date is unknown, the timestamp in the ZIP file suggests a date of 1998 or earlier for this file (Smith 1998?). This script, only 61 lines long (38 source lines not including comments and description), was the spark of inspiration that initiated the writing of the tool described in the current paper, in 2013. Other software includes the programs by Murthy (2003?), by A. Mishra (2007), and by Melnad, Goyal, and P. M. Scharf (2015). A general introduction to metre and Sanskrit prosody is omitted in this paper for reasons of space, as the last of these papers (Melnad, Goyal, and P. M.
Scharf 2015) covers the topic excellently.

Like these other tools, the tool discussed in this paper recognizes the metre of a given Sanskrit verse. It is available in several forms: as a web application hosted online at http://sanskrit-metres.appspot.com, as a command-line tool, and as a Python library; all are available in source-code form at https://github.com/shreevatsa/sanskrit. It is being described here for two reasons:

1. It has some new features that I think will be interesting (see Section 1.4), some of which distinguish it from other tools. The development of this tool has thrown up a few insights (see Section 4) which may be useful to others who would like to develop better tools in the future.
2. A question was raised about this tool (P. Scharf 2016), namely:
   “An open-source web archive of metrically related software and data can be found at https://github.com/shreevatsa/sanskrit with an interface at http://sanskrit-metres.appspot.com/. The author and contributors to this archive and data were unknown at the time and not included in our literature review. No description of the extent, comprehensiveness, and effectiveness of the software has been found.”
   I took this as encouragement that such a description may be desirable or of interest to others.

1.3 The intended user

The tool can be useful for someone trying to read or compose Sanskrit verses, and for someone checking a text for metrical errors. In other words, the tool can be used by different kinds of users: a curious learner, an editor working with a text (checking verses for metrical correctness), a scholar investigating the metrical signature of a text, or an aspiring poet. To make these concrete, consider the following “user stories” as motivating examples.

• Devadatta is learning Sanskrit.
He knows that Sanskrit verse is written in metre and that this is supposed to make it easier to chant or recite. But he knows very little about the various metres, so when he looks at a verse, especially one in a longer metre like Śārdūla-vikrīḍitam or Sragdharā, he cannot quickly recognize the metre. All he sees is a string of syllables, and he has no idea where to pause (yati), how to recite, or even where to break at pādas if they are not indicated clearly in the text he is reading. With this tool, these problems are solved, and he can focus on understanding and appreciating the poetry, now that he can read it aloud rhythmically and melodically and savor its sounds.

• Chitralekha is a scholar. She works with digital texts that, though useful to have, are sometimes of questionable provenance and do not always meet her standards of critical editions. Errors might have crept into the texts, and she suspects that some instances of scribal emendation or typographical errors (such as omitted letters, extraneous letters, or transposed letters) are likely to cause metrical errors as well. With this tool, she can catch a great many of them (see Section 3). Sometimes, she is interested in questions about prosody itself, such as: what are all the metres used in this text? Which ones are the most common? How frequently does the poet X use a particular “poetic license” of scansion? What are the rules typically governing Anuṣṭubh (Śloka)? This tool can help her with such questions too.

• Kamban would like to write poetry, like his famous namesake. He has a good command of vocabulary and grammar and has some poetic imagination, but when he writes a verse, especially in an “exotic” (to him) metre, he is sometimes unsure whether he has got all the syllables right.
With this tool, he enters his tentative attempt and sees whether anything is off. He knows that the metres will soon become second nature to him and he will not need the tool anymore, but still he wishes he could have more help, such as choosing his desired metre and knowing what he needs to add to his partially composed verse.

With the names of these users as mnemonics, we can say that the tool can be used to discover, check, and compose metrical verse and facts about them.

1.4 User-friendly features

As mentioned earlier, the tool has several features for easing the user’s job:

1. It accepts a wide variety of input scripts (transliteration schemes). Unlike most tools, it does not require the input to be in any particular scheme or system of transliteration. Instead, it accepts IAST, Harvard-Kyoto, and ITRANS transliteration, Unicode Devanāgarī, and Unicode Kannada scripts, without the user having to indicate which input scheme is used. The tool is agnostic to the input method used, as it converts all input to an internal representation based on SLP1 (P. M. Scharf and Hyman 2011). It is straightforward to extend to other scripts or transliteration methods, such as SLP1 or other Indic scripts.
2. It is highly robust against typographical or metrical errors in the input verse. This is perhaps the most interesting feature of the tool, and it is useful because text in the “wild” is not always error-free.
3. It can detect the metre even from partial verses, even if the user is not aware that the verse being looked up is incomplete.
4. It gives an informative “display” of a verse in relation to the identified metre, by aligning the verse to the metre using a dynamic programming algorithm to find the best alignment.
5. It supports learning more about a metre, by pointing to other examples of the metre and to audio recordings of the metre being recited in several styles (where available).
6. It provides a quick link for giving feedback (by creating an issue on GitHub), specific to the input verse being processed on the page.

Figure 2: A “data flow diagram” of the system’s operation. The rectangles denote different forms taken by the data; the ovals denote code that transforms (or uses, or generates) the data.

2 How it works

This section describes how the system works. At a high level, there are the following steps/components:

1. Metrical data, about many known metres. This has been entered into the system.
2. Building the index (pre-processing): from the metrical data, various indices are generated.
3. Detection and transliteration: the input supplied by the user is examined, its input scheme is detected, and it is transliterated into SLP1.
4. Scansion: the SLP1 text (denoting a set of phonemes) is translated into a metrical signature (a pattern of laghus and gurus).
5. Matching: the metrical signature is compared against the index, to identify the metre (or metres).
6. Display: the user’s input is displayed to the user, appropriately reformatted to fit the identified metre(s), with highlighting of any deviations from the metrical ideal.

These steps are depicted in Figure 2, and described in more detail in the following subsections.

2.1 Metrical data

This is the raw data about all the metres known to the system. It is stored in the JSON format, so that it can be used by other programs too.
In what follows, a metrical pattern is defined as a string over the alphabet {L, G}, i.e., a sequence of symbols each of which is either L (denoting laghu, a light syllable) or G (denoting guru, a heavy syllable). As described elsewhere (Melnad, Goyal, and P. M. Scharf 2015), there are two main types of metres, varṇavṛtta and mātrāvṛtta (note that Murthy (2003) points out that the Śloka metre constitutes a third type by itself), with the former having three subtypes:

1. samavṛtta metres, in which all four pādas of a verse have the same metrical pattern,
2. ardhasamavṛtta metres, in which odd pādas have one pattern and even pādas another (so that the two halves of the verse have the same metrical pattern),
3. viṣamavṛtta metres, in which potentially all four pādas have different metrical patterns.

Correspondingly, each metre’s characteristics are indicated in this system with the minimal amount of data necessary:

1. samavṛtta metres are represented by a list of length one (or, for convenience, simply a string), containing the pattern of each of their pādas,
2. ardhasamavṛtta metres are represented by a list of length two, containing the pattern of the odd pādas followed by the pattern of the even pādas,
3. viṣamavṛtta metres are represented by a list of length four, containing the pattern for each of the four pādas.

Additionally, yati can be indicated within the pattern, and spaces can be added, which are ignored. The yati is ignored for identification, but used later for displaying information about the metre.
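As a rough illustration of this representation, the following Python sketch reads a small JSON excerpt and expands each varṇavṛtta entry into four pāda patterns, dropping spaces and yati marks since they are ignored for identification. The excerpt’s layout (a list of [name, pattern or patterns] pairs), the `METRES_JSON` string and the `expand_padas` helper are assumptions made for illustration, not the tool’s actual data file or API. Cycling a list of one, two or four patterns yields the AAAA, ABAB and ABCD shapes of samavṛtta, ardhasamavṛtta and viṣamavṛtta metres respectively.

```python
import json
import re

# Hypothetical excerpt: a list of [name, pattern or list of patterns] pairs.
# (The tool's real file layout may differ; JSON itself allows no comments.)
METRES_JSON = """
[
  ["Śālinī", "GGGG GLGGLGG"],
  ["Viyoginī", ["LLGLLGLGLG", "LLGGLLGLGLG"]],
  ["Udgatā", ["LLGLGLLLGL", "LLLLLGLGLG", "GLLLLLLGLLG", "LLGLGLLLGLGLG"]]
]
"""

# Number of stored patterns -> subtype of varṇavṛtta.
KIND = {1: "samavṛtta", 2: "ardhasamavṛtta", 4: "viṣamavṛtta"}


def expand_padas(spec):
    """Return (subtype, [pāda1, pāda2, pāda3, pāda4]) for a varṇavṛtta spec."""
    patterns = [spec] if isinstance(spec, str) else list(spec)
    # Drop spaces, yati marks and anything else that is not L or G.
    patterns = [re.sub(r"[^LG]", "", p) for p in patterns]
    if len(patterns) not in KIND:
        raise ValueError(f"expected 1, 2 or 4 patterns, got {len(patterns)}")
    # Cycling the list gives AAAA, ABAB or ABCD respectively.
    padas = (patterns * 4)[:4]
    return KIND[len(patterns)], padas


metres = {name: expand_padas(spec) for name, spec in json.loads(METRES_JSON)}
for name, (kind, padas) in metres.items():
    print(name, kind, padas)
```

This normalization applies only to varṇavṛtta entries; mātrāvṛtta entries use a different notation and would need separate handling.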
Here are some lines, as examples:

    [
        # ...
        ['Śālinī', 'GGGG—GLGGLGG'],
        ['Praharṣiṇī', 'GGGLLLLGLGLGG'],
        ['Bhujaṅgaprayātam', 'LGG LGG LGG LGG'],
        # ...
        ['Viyoginī', ['LLGLLGLGLG', 'LLGGLLGLGLG']],
        # ...
        ['Udgatā', ['LLGLGLLLGL', 'LLLLLGLGLG', 'GLLLLLLGLLG', 'LLGLGLLLGLGLG']],
        # ...
    ]

For mātrāvṛtta metres (those based on the number of morae, or mātrās), the constraints are more subtle: as not every syllable’s weight is fixed, so many patterns fit each metre that it may not be efficient to generate and store each pattern separately. Instead, the system represents them using a certain conventional notation, which expands to regular expressions. This notation is inspired by the elegant notation described in another paper (Melnad, Goyal, and P. M. Scharf 2015), and uses a particularly useful description of the Āryā and related metres available in a paper by Ollett (2012).

    [
        # ...
        ["Āryā", ["22 4 22", "4 22 121 22 .", "22 4 22", "4 22 1 22 ."]],
        ["Gīti", ["22 4 22", "4 22 121 22 ."]],
        ["Upagīti", ["22 4 22", "4 22 L 22 ."]],
        ["Udgīti", ["22 4 22", "4 22 L 22 .", "22 4 22", "4 22 121 22 ."]],
        ["Āryāgīti", ["22 4 22", "4 22 121 22 (4|2L)"]],
        # ...
    ]

Here, 2 is interpreted as the regular expression (G|LL) and 4 as the regular expression (GG|LLG|GLL|LLLL|LGL), that is, all possible sequences of laghus and gurus that are exactly 4 mātrās long.
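A small Python sketch of this expansion follows; it is an illustration, not the tool’s code. The expansions of 2 and 4 are the ones just given. Reading 1 as a single laghu (so that 121 admits only LGL and LLLL) and passing ".", parentheses, "|", L and G through unchanged as regular-expression syntax are our interpretation of the examples above, not something the text states. The hypothetical helper `all_patterns_of_length` enumerates every laghu/guru sequence of a given duration so that an expansion can be checked exhaustively.

```python
import re

# "2" and "4" expand as stated in the text; reading "1" as one laghu is an
# assumption based on the examples ("121", "4 22 1 22 ."). Other characters
# (".", "(", ")", "|", "L", "G") are passed through as regex syntax/literals.
EXPANSIONS = {
    "1": "L",
    "2": "(G|LL)",
    "4": "(GG|LLG|GLL|LLLL|LGL)",
}


def notation_to_regex(notation):
    """Translate e.g. '4 22 121 22 .' into a compiled regex over {L, G}."""
    body = "".join(EXPANSIONS.get(ch, ch) for ch in notation if not ch.isspace())
    return re.compile(body)


def matches(notation, pattern):
    """True if the whole laghu/guru pattern fits the notation."""
    return notation_to_regex(notation).fullmatch(pattern) is not None


def all_patterns_of_length(n):
    """Every L/G sequence whose total duration is n mātrās (L = 1, G = 2)."""
    if n == 0:
        return [""]
    out = ["L" + p for p in all_patterns_of_length(n - 1)]
    if n >= 2:
        out += ["G" + p for p in all_patterns_of_length(n - 2)]
    return out


four = all_patterns_of_length(4)
print(four)                                          # ['LLLL', 'LLG', 'LGL', 'GLL', 'GG']
print([p for p in four if matches("22", p)])         # ['LLLL', 'LLG', 'GLL', 'GG']
print([p for p in four if matches("121", p)])        # ['LLLL', 'LGL']
print(matches("4 22 121 22 (4|2L)", "GGGGLGLGGGG"))  # True
```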
Note that with this notation, the frequently mentioned rule of “any combination of 4 mātrās except LGL (ja-gaṇa)” is simply denoted as 22, expanding to the regular expression (G|LL)(G|LL), which covers precisely the 4 sequences of laghus and gurus of total duration 4 other than LGL.

Type of metre       Number
samavṛtta           1242
ardhasamavṛtta      132
viṣamavṛtta         19
mātrāvṛtta          5
Total               1398

Table 1: The number of metres “known” to the current system. Not too much should be read into the raw numbers, as a larger number isn’t necessarily better; see Section 4.1.3 for why.

The data in the system was started with a hand-curated list of popular metres (Ganesh 2013). It was greatly extended with the contributions of Dhaval Patel, which drew from the Vṛttaratnākara and the work of Mishra (A. Mishra 2007). A few metres from these contributions are yet to be incorporated, for reasons described in Section 4.1.3. Overall, as a result of all this, the system currently knows a large number of metres, shown in Table 1.

2.2 Metrical index

The data described in the previous section is not used directly by the rest of the program. Instead, it is first processed into data structures (which we can consider a sort of “index”) that allow for efficient lookup, even when the number of metres is huge. These enable the robustness to errors that is one of the most important features of the system. The indices are called pāda1, pāda2, pāda3, pāda4, ardha1, ardha2, and full.
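One way such indices could be built is sketched below in Python; this is not the tool’s code. It assumes each metre has already been expanded to four pāda patterns, stores a list of metre names per pattern (so that metres sharing a pāda, or one metre with several names, can coexist), and, when the hypothetical `allow_laghu_ending` flag is set, also indexes the laghu-ending variant of every guru-final pāda. Only pattern-keyed dictionaries are sketched; regular-expression keys for mātrāvṛtta metres are left out.

```python
from itertools import product

# Index names as given in the text.
INDEX_NAMES = ["pāda1", "pāda2", "pāda3", "pāda4", "ardha1", "ardha2", "full"]


def pada_variants(pada, allow_laghu_ending):
    """The pāda itself, plus (optionally) its laghu-ending variant."""
    variants = [pada]
    if allow_laghu_ending and pada.endswith("G"):
        variants.append(pada[:-1] + "L")
    return variants


def build_indices(metres, allow_laghu_ending=True):
    """metres: {name: [four pāda patterns]} -> {index name: {pattern: [names]}}."""
    indices = {name: {} for name in INDEX_NAMES}

    def add(index_name, pattern, metre):
        names = indices[index_name].setdefault(pattern, [])
        if metre not in names:  # a list, since several metres may share a pattern
            names.append(metre)

    for metre, padas in metres.items():
        options = [pada_variants(p, allow_laghu_ending) for p in padas]
        for i, variants in enumerate(options):
            for v in variants:
                add(INDEX_NAMES[i], v, metre)
        for a, b in product(options[0], options[1]):
            add("ardha1", a + b, metre)
        for c, d in product(options[2], options[3]):
            add("ardha2", c + d, metre)
        for combo in product(*options):
            add("full", "".join(combo), metre)
    return indices


salini = "GGGGGLGGLGG"
indices = build_indices({"Śālinī": [salini] * 4})
print(len(indices["ardha2"]), len(indices["full"]))  # 4 16
```

With laghu-ending variants enabled, the four ardha2 keys produced for Śālinī are the same four patterns that appear in the hypothetical ardha2 example of this section; the full index grows to 2⁴ = 16 entries per such metre, which is one reason to add variants selectively.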
Each of these indices consists of an associative array (a Python dict) that maps a pattern (a string over the alphabet {L, G}) to a list¹ of metres that contain that pattern (at the position indicated by the name of the index), and similarly an array that maps a regular expression to the list of metres that contain it. For instance, ardha2 maps the second half of each known metre to that metre’s name. It is at this point that we also introduce laghu-ending variants for many metres (see Section 4.1.2). Section 2.5 describes how these indices are used.

¹ Why a list? Because different metres can share the same pāda, for instance, and there can even be multiple names for the same metre. See Section 4.1.3 later.

Although this index is generated automatically and not written down in code, the following hypothetical code illustrates some sample entries in the ardha2 index:

    ardha2_patterns = {
        # ...
        'GGGGGLGGLGGGGGGGLGGLGG': ['Śālinī'],
        # laghu variants for illustration.
        # In reality we don't add for Śālinī…
        'GGGGGLGGLGLGGGGGLGGLGG': ['Śālinī'],
        'GGGGGLGGLGGGGGGGLGGLGL': ['Śālinī'],
        'GGGGGLGGLGLGGGGGLGGLGL': ['Śālinī'],
        # ...
    }

    ardha2_regexes = {
        # ...
        "22 4 22" + "4 22 L 22 .": ['Āryā', 'Upagīti'],
        # ...
    }

2.3 Transliteration

The first step that happens after users enter their input is automatic transliteration.
Detecting the input scheme is based on a few heuristics. Among the input schemes initially supported (Devanāgarī, Kannada, IAST, ITRANS, and Harvard-Kyoto), the detection is done as follows:

• If the input contains any Kannada consonants and vowels, treat it as Kannada.
• If the input contains (m)any Devanāgarī consonants and vowels, treat it as Devanāgarī. Note that this should not be applied to other characters from the Devanāgarī Unicode block, such as the daṇḍa symbol, which are often used with other scripts too, as encouraged in the Unicode standard.
• If the input contains any of the characters āīūṛṝḷḹṃḥṅñṭḍśṣ, treat it as IAST.
• If the input matches the regular expression
      aa|ii|uu|[RrLl]\^[Ii]|RR[Ii]|LL[Ii]|~N|Ch|~n|N\^|Sh|sh
  treat it as ITRANS. Here, the Sh and sh might seem dangerous, but the consonant cluster ःह is unlikely in Sanskrit.
• Else, treat the input as Harvard-Kyoto.

An option to explicitly indicate the input scheme (bypassing the automatic inference) could be added but has not seemed necessary so far.

The input is transliterated into (a subset of) the SLP1 encoding (P. M. Scharf and Hyman 2011), which is used internally, as it has many properties suitable for computer representation of Sanskrit text.
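The detection rules above can be sketched in Python as follows. This is an illustration rather than the tool’s implementation: the Unicode ranges chosen for Kannada and Devanāgarī letters, the NFC normalization step, and the use of “any” rather than “many” Devanāgarī letters are our assumptions, while the ordering of the checks and the ITRANS regular expression follow the list above.

```python
import re
import unicodedata

# The letter ranges are assumptions: independent vowels through consonants,
# so that other characters of the block (e.g. the daṇḍa, U+0964) do not count.
KANNADA_LETTERS = re.compile("[\u0C85-\u0CB9]")
DEVANAGARI_LETTERS = re.compile("[\u0905-\u0939]")
IAST_LETTERS = re.compile("[" + unicodedata.normalize("NFC", "āīūṛṝḷḹṃḥṅñṭḍśṣ") + "]")
ITRANS_HINTS = re.compile(r"aa|ii|uu|[RrLl]\^[Ii]|RR[Ii]|LL[Ii]|~N|Ch|~n|N\^|Sh|sh")


def detect_scheme(text):
    """Guess the input scheme by applying the heuristics in order."""
    text = unicodedata.normalize("NFC", text)  # compose e.g. a + macron into ā
    if KANNADA_LETTERS.search(text):
        return "Kannada"
    if DEVANAGARI_LETTERS.search(text):  # "(m)any" in the text; "any" here
        return "Devanagari"
    if IAST_LETTERS.search(text):
        return "IAST"
    if ITRANS_HINTS.search(text):
        return "ITRANS"
    return "Harvard-Kyoto"


for sample in ["kaaShThaad agni jaayate", "kāṣṭhād agnir jāyate", "kASThAd agnir jAyate"]:
    print(sample, "->", detect_scheme(sample))
```

Because Harvard-Kyoto is the fallback, any input that trips none of the earlier checks is treated as Harvard-Kyoto, which is why the order of the checks matters.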
While the input is being transliterated according to the detected scheme, known punctuation marks (and line breaks) are retained, while all “unknown” characters that have not been programmed into the transliterator (such as control characters and accent marks in Devanāgarī) are ignored.

The exact details of how the transliteration is done are omitted here, as transliteration may be regarded as a reasonably well-solved problem by now. One point worth mentioning is that there are no strict input conventions. In other work (Melnad, Goyal, and P. M. Scharf 2015), a convention is adopted like:

    “If the input text lacks line-end markers, it is assumed to be a single pāda and to belong to the samavṛtta type of metre.”

Such a scheme may be interesting to explore. For now, as much as possible, the system assumes an untrained user and therefore tries to infer all such things, or to try all possibilities.

2.4 Scan

The transliteration into SLP1 can be thought of as having generated a sequence of Sanskrit phonemes (this close relationship between phonemes and the textual representation is the primary strength of the SLP1 encoding). From these phonemes, scansion into a pattern of laghus and gurus can proceed directly, without bothering with syllabification (however, syllabification is still done, for the sake of the “alignment” described later in Section 2.6). The rule for scansion is mechanical: initial consonants are dropped, and each vowel is considered as a set along with all the non-vowels that follow it before the next vowel (or end of text) is found.
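A minimal sketch of this grouping on SLP1 text, which also applies the weight test stated in the next sentence (long vowel, or more than one following consonant, gives a guru). The function name `scan` and the handling of non-letters (spaces, punctuation and line breaks are simply skipped, so consonants are counted across word and line boundaries) are assumptions for illustration; as in the text, anusvāra (M) and visarga (H) count as consonants.

```python
SLP1_VOWELS = set('aAiIuUfFxXeEoO')
SLP1_LONG_VOWELS = set('AIUFXeEoO')  # e, ai (E), o, au (O) are always long


def scan(slp1: str) -> str:
    """Return the laghu/guru pattern ('L'/'G') of SLP1 text."""
    groups = []  # one [vowel, number_of_following_consonants] per syllable
    for ch in slp1:
        if not ch.isalpha():
            continue  # spaces, punctuation and line breaks carry no weight
        if ch in SLP1_VOWELS:
            groups.append([ch, 0])
        elif groups:  # consonants (incl. M, H) before the first vowel are dropped
            groups[-1][1] += 1
    return ''.join('G' if vowel in SLP1_LONG_VOWELS or consonants > 1 else 'L'
                   for vowel, consonants in groups)
```

For the first pāda of the Meghadūta (Section 4.1.2) in SLP1, `scan('kaScit kAntAvirahaguruRA svADikArAt pramattaH')` gives GGGGLLLLLGGLGGLGL, which agrees with Mandākrāntā except in the last syllable: there the visarga has no following consonant in the input. When the next pāda (SApenA…) is included, the same syllable scans as guru.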
If the vowel is long or if there are multiple consonants (treating anusvāra and visarga as consonants here, for scansion only) in this set, then we have a guru, else we have a laghu. The validity of this method of scansion, with reference to the traditional Sanskrit grammatical and metrical texts, is not discussed in this paper, as something similar has been treated elsewhere (Melnad, Goyal, and P. M. Scharf 2015). However, note that this is the “purist” version of Sanskrit scansion. There is an issue of śithila-dvitva or poetic licence, which is treated in more detail in Section 4.3.

2.5 Identification

The core of the tool’s robust metre identification is an algorithm for trying many possibilities for identifying the metre of the input text. Identifying the metre given a metrical pattern (the result of scansion) is done in two steps: (1) first the input is broken into several “parts” in various ways, and then (2) each of these parts is matched against the appropriate indices.

2.5.1 Parts

Given the metrical pattern corresponding to the input text, which may be either a full verse, a half-verse, or a single quarter-verse (pāda), we try to break it into parts in multiple ways. One way of breaking the input, which should not be ignored, is already given by the user, in the form of line breaks in the input. If there are 4 lines, for example, it is a strong possibility that these are the 4 pādas of the verse. If there are 2 lines, each line may contain two pādas. But what if there are 3 lines, or 5? Another way of breaking the input is by counting syllables. If the number of syllables is a multiple of 4 (say 4n), it is possible that every n syllables constitute a pāda of a samavṛtta metre.
But what if the number of syllables is not a multiple of 4?

The solution adopted here is to consider all ways of breaking a pattern into k parts even when its length (say l) may not be a multiple of k. Although this would apply to any positive k, we only care about k = 4 and k = 2, so let’s focus on the k = 4 case for illustration. In that case, suppose that the length l leaves a remainder r when divided by 4, that is,

$$l \equiv r \pmod{4}, \qquad 0 \le r < 4,$$

or in other words l can be written as $l = 4n + r$ for some integer n, where $0 \le r < 4$. Then, as $\lfloor l/4 \rfloor = n$ (here $\lfloor \cdot \rfloor$ denotes the “floor function”, or integer division with rounding down), we can consider all the ways of breaking the string of length l into 4 parts of lengths $(n+a, n+b, n+c, n+d)$ where $a + b + c + d = r$ (in words: we consider all ways of distributing the remainder r among the 4 parts). For example, when r = 2, we say that a string of length 4n + 2 can be broken into 4 parts in 10 ways:

(n, n, n, n+2)
(n, n, n+1, n+1)
(n, n, n+2, n)
(n, n+1, n, n+1)
(n, n+1, n+1, n)
(n, n+2, n, n)
(n+1, n, n, n+1)
(n+1, n, n+1, n)
(n+1, n+1, n, n)
(n+2, n, n, n)

Similarly, there are 4 ways when r = 1, 20 ways when r = 3, and of course there is exactly one way, (n, n, n, n), when r = 0.

In this way, we can break the given string into 4 parts (in 1, 4, 10, or 20 ways) or into 2 parts (in 1 or 2 ways), either by lines or by syllables. For instance, if we are given an input of 5 lines, then there are 4 ways we can break it into 4 parts, by lines. What we do with these parts is explained next.

2.5.2 Lookup/match

Once we have the input broken into the appropriate number of parts (based on whether we’re treating it as a full verse, a half verse, or a pāda), we look up each part in the appropriate index.
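Both steps, splitting into parts and looking a part up in one index, can be sketched in a few lines. The helper names (`distributions`, `split_into`, `lookup`) and the toy index entries are hypothetical; in particular the Śloka regexes encode only the basic rule (5th syllable laghu, 6th guru, 7th guru in odd pādas and laghu in even ones) and ignore vipulā variants, so they are not the tool’s data.

```python
import re
from itertools import product


def distributions(r, k):
    """All ordered k-tuples of non-negative integers that sum to r."""
    return [d for d in product(range(r + 1), repeat=k) if sum(d) == r]


def split_into(pattern, k):
    """All ways to cut `pattern` into k consecutive parts of length n + extra."""
    n, r = divmod(len(pattern), k)          # len(pattern) == k * n + r
    splits = []
    for extras in distributions(r, k):
        parts, start = [], 0
        for extra in extras:
            parts.append(pattern[start:start + n + extra])
            start += n + extra
        splits.append(parts)
    return splits


def lookup(part, patterns, regexes):
    """Metres in one index (patterns dict + regexes dict) that match `part`."""
    found = list(patterns.get(part, []))     # direct dictionary lookup
    for regex, metres in regexes.items():    # regexes are few, so just loop
        if re.fullmatch(regex, part):
            found.extend(metres)
    return found


# Toy pāda index with hypothetical entries (not the tool's data).
PADA_PATTERNS = {'GGLGGLLGLGG': ['Indravajrā'],
                 'LGLGGLLGLGG': ['Upendravajrā']}
# Simplified Śloka rule only: 5th laghu, 6th guru, 7th guru (odd) / laghu (even).
PADA_REGEXES = {r'[LG]{4}LGG[LG]': ['Śloka (odd pāda)'],
                r'[LG]{4}LGL[LG]': ['Śloka (even pāda)']}
```

Because `itertools.product` enumerates tuples in lexicographic order, for a 46-syllable input (4 × 11 + 2) `split_into(pattern, 4)` yields the 10 splits in the same order as the list above; each part would then be looked up in the corresponding index.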
For a particular index, matching against patterns is a direct lookup (we do not have to loop through all patterns in the index). To match against regexes, we do indeed loop through all regexes, which are fewer in number compared to the number of patterns. If needed, we can trade off time and memory here; for instance, we could have indexed a large number of instantiated patterns instead of regexes even for mātrā metres. Note that in this way, to match an ardhasamavṛtta or a viṣamavṛtta that has been input perfectly, we search directly for the full pattern (of the entire verse) in the index. We do not have to run a loop for breaking a line into pādas in all possible ways, as in (Melnad, Goyal, and P. M. Scharf 2015). Details of which indices are looked up are in Table 2.

| Kind of index | Input treated as full verse | Input treated as half verse | Input treated as single pāda |
| pāda1         | first part of 4             | first part of 2             | the full input               |
| pāda2         | second part of 4            | second part of 2            | the full input               |
| pāda3         | third part of 4             | first part of 2             | the full input               |
| pāda4         | fourth part of 4            | second part of 2            | the full input               |
| ardha1        | first part of 2             | the full input              | -                            |
| ardha2        | second part of 2            | the full input              | -                            |
| full          | the full input              | -                           | -                            |

Table 2: What to match or look up, depending on how the input is being treated. Everywhere in the table, phrases like “first part of 4” mean both by lines and by syllables. For instance, when treating the input as a full verse, the first 1/4 part by lines and the first 1/4 part by syllables are both matched against the pāda1 index.

2.6 Align/Display

The metre identifier, from the previous section, results in a list of metres that are potential matches to the input text. Not all of them may match the input verse perfectly; some may have been detected based on partial matches.
Whatever the reason for this imperfect match (an over-eager matching on the part of the metre identifier, or errors in the input text), it would be useful for the user to see how closely their input matches a given metre. And even when the match is perfect, aligning the verse to the metre can help highlight the pāda breaks, the location of yati, and so on. This is done by the tool, using a simple dynamic-programming algorithm very similar to the standard algorithm for the longest common subsequence problem: in effect, we simply align both strings (the metrical pattern of the input verse, and that of the known metre) along their longest common subsequence.

What this means is that given two strings s and t, we use a dynamic programming algorithm to find the minimal set of “gap” characters to insert in each string, such that the resulting strings match wherever both have a non-gap character (and never have a gap character in both). For example:

('abcab', 'bca') => ('abcab', '-bca-')
('hello', 'hello') => ('hello', 'hello')
('hello', 'hell') => ('hello', 'hell-')
('hello', 'ohell') => ('-hello', 'ohell-')
('abcdabcd', 'abcd') => ('abcdabcd', 'abcd----')
('abcab', 'acb') => ('abcab', 'a-c-b')
('abcab', 'acbd') => ('abcab-', 'a-c-bd')

We use this algorithm on the verse pattern and the metre’s pattern to decide how to align them. Then, using this alignment, we display the user’s input verse in its display version (transliterated into IAST, and with some recognized punctuation retained).
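The alignment can be sketched as a textbook LCS table followed by a traceback. The function name `align` and the tie-breaking rule are assumptions: the sketch takes a match whenever the current characters agree, and otherwise prefers to put the gap in t when that keeps the LCS length. With this rule it reproduces each of the examples listed above, but in general several minimal alignments exist and the tool may break ties differently.

```python
def align(s: str, t: str, gap: str = '-') -> tuple[str, str]:
    """Align s and t along a longest common subsequence, padding with gaps."""
    n, m = len(s), len(t)
    # lcs[i][j] = length of the LCS of the suffixes s[i:] and t[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if s[i] == t[j]:
                lcs[i][j] = 1 + lcs[i + 1][j + 1]
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])
    out_s, out_t = [], []
    i = j = 0
    while i < n or j < m:
        if i < n and j < m and s[i] == t[j]:
            # Equal characters can always be matched without losing optimality.
            out_s.append(s[i]); out_t.append(t[j]); i += 1; j += 1
        elif i < n and (j == m or lcs[i + 1][j] == lcs[i][j]):
            out_s.append(s[i]); out_t.append(gap); i += 1   # gap in t
        else:
            out_s.append(gap); out_t.append(t[j]); j += 1   # gap in s
    return ''.join(out_s), ''.join(out_t)
```

Applied to metrical patterns, an input whose run of five laghus in Mandākrāntā has lost one syllable comes out as `align('GGGGLLLLGGLGGLGG', 'GGGGLLLLLGGLGGLGG') == ('GGGGLLLL-GGLGGLGG', 'GGGGLLLLLGGLGGLGG')`, so the display can point at the missing syllable.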
Here, laghu and guru syllables are styled differently in the web application (styling customizable with CSS). The display also highlights each location of yati or caesura (if known and stored for the metre), so that the user can see if their verse violates any of the subtler rules, such as words straddling yati boundaries. This algorithm could also be used for ranking the results, based on the degree of match between the input and each result (metre identified).

3 Text analysis and results

As part of testing the tool (and as part of pursuing the interest in literature and prosody that led to the tool in the first place), a large number of texts, such as those from GRETIL,[2] were examined. Although primarily designed to help readers, the tool can also be used to analyze a metrical text, to catch errors or generate statistics about the metres used. In the very first version of the tool, the first metre added was Mandākrāntā, and the tool was run on a text of the Meghadūta from GRETIL, the online corpus of Sanskrit texts. This text was chosen because the Meghadūta is well known to be entirely in the Mandākrāntā metre, so the “gold standard” to compare against was straightforward. Surprisingly, this tool successfully identified 23 errors in the 122 verses![3] These were communicated to the GRETIL maintainer.

[2] Göttingen Register of Electronic Texts in Indian Languages: and related Indological materials from Central and Southeast Asia, http://gretil.sub.uni-goettingen.de

Similarly, testing of the tool on other texts highlighted many errors. Errors identified in the GRETIL text of Bhartṛhari’s Śatakatraya were carefully compared against the critical edition by D. D.
Kosambi.[4] In this text, as in Nīlakaṇṭha’s Kali-viḍambana,[5] in Bhallaṭa’s Bhallaṭa-śataka,[6] and in almost all cases, the testing highlighted errors in the text, rather than any in the metre recognizer. This constitutes evidence that the recognizer has a high accuracy approaching 100%, though the lack of a reliable (and nontrivial) “gold standard” hinders attaching a numeric value to the accuracy. In the terminology of “precision and recall”, the recognizer has a recall of 100% in the examples tested (for example, every verse that is properly in Śārdūla-vikrīḍitam is recognized as that metre), while the precision was lower and harder to measure because of errors in the input (sufficiently many errors can make the verse partially match a different metre).

After sufficiently fixing the tool and the text so that the Meghadūta was recognized as being 100% in the Mandākrāntā metre, other texts were examined. These statistics[7] confirmed that, for example, the most common metres in the Amaruśataka are Śārdūlavikrīḍitam (57%), Hariṇī (13%) and Śikhariṇī (10%), while those in Kālidāsa’s Raghuvaṃśa are Śloka, Upajāti and Rathoddhatā. And so on. Once errors in the texts are fixed, this sort of analysis can give insights into the way different poets use metre. It can also help students see which metres are the most common and so worth learning first, at least for a given corpus.
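As a sketch of the bookkeeping behind such figures, the following computes per-metre shares from one identified metre per verse and, given a trusted labelling, per-metre precision and recall when the identifier may report several candidate metres for a verse. The function names and the data layout are hypothetical; this is not how the statistics page cited above was produced.

```python
from collections import Counter


def metre_shares(labels):
    """Percentage of verses in each metre, most frequent first."""
    counts = Counter(labels)
    return [(metre, 100 * n / len(labels)) for metre, n in counts.most_common()]


def precision_recall(gold, candidates, metre):
    """Precision and recall of the identifier for one metre.

    gold[i] is the trusted metre of verse i; candidates[i] is the set of
    metres the identifier reported for verse i (it may report several).
    """
    relevant = {i for i, g in enumerate(gold) if g == metre}
    retrieved = {i for i, c in enumerate(candidates) if metre in c}
    hits = len(relevant & retrieved)
    precision = hits / len(retrieved) if retrieved else 1.0
    recall = hits / len(relevant) if relevant else 1.0
    return precision, recall
```

In these terms, a recall of 100% for a metre means every verse truly in it has that metre among its candidates. Extra candidates reported for verses in other metres lower only the precision.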
Other sources of online texts, like TITUS, SARIT[8] or The Sanskrit Library,[9] could also be used for testing the system.

[3] See a list of 23 errors and 3 instances of broken sandhi detected in one of the GRETIL texts of the Meghadūta, at https://github.com/shreevatsa/sanskrit/blob/f2ef7364/meghdk_u_errors.txt (October 2013).
[4] See https://github.com/shreevatsa/sanskrit/blob/7c42546/texts/gretil_stats/diff-bharst_u.htm-old.patch for a list of errors found, in diff format, with comments referring to the location of the verse in Kosambi.
[5] https://github.com/shreevatsa/sanskrit/blob/08ccb91/texts/gretil_stats/diff-nkalivpu.htm.patch
[6] https://github.com/shreevatsa/sanskrit/blob/67251bc/texts/gretil_stats/diff-bhall_pu.htm.patch
[7] http://sanskritmetres.appspot.com/statistics

4 Interesting issues and computational experience

Some insights and lessons learned as a result of this project are worth highlighting, as are some of the design decisions that were made either intentionally or unconsciously.

4.1 Metrical data

4.1.1 The gaṇa-s

For representing the characteristics of a given metre, a popular scheme used by all Sanskrit authors of works on prosody is the use of the 8 gaṇas. Each possible laghu-guru combination of three syllables (trika), namely each of the $2^3 = 8$ possibilities LLL, LLG, LGL, LGG, GLL, GLG, GGL, GGG, is given a distinct name (na, sa, ja, ya, bha, ra, ta, ma respectively), so that a long pattern of laghus and gurus can be concisely stated in groups of three. This is an excellent mnemonic and space-saving device, akin to writing in octal instead of binary.
For instance, the binary number $110110010101_2$ can be written more concisely as the octal number $6625_8$, and the translation between them is immediately apparent ($110110010101_2$ corresponds to $6625_8$ and vice versa, by simply treating each group of three binary digits (bits) as an octal digit, or conversely expanding each octal digit to a three-bit representation). Similarly, the pattern GGLGGLLGLGLG of Indravaṃśa can be more concisely expressed by the description “ta ta ja ra”. Moreover, another mnemonic device of unknown origin uses a string “yamātārājabhānasalagaṃ” that traverses all the 8 gaṇas (and the names la and ga used for any “leftover” laghus and gurus respectively), assigning them syllable weights (via vowel lengths) such that the three syllables starting at any of the 8 consonants are themselves in the gaṇa named by that consonant.[10]

[8] http://sarit.indology.info
[9] http://sanskritlibrary.org
[10] In the modern terminology of combinatorics, this is a de Bruijn sequence.

Thus we can see that the gaṇa names are a useful mnemonic and space-saving device, and yet at the same time, from an information-theoretic point of view, they contain absolutely no information that is not present in the expanded string (the pattern of Ls and Gs). Moreover, for a typical reader who is not trying to memorize the definitions of metres (either in the GGLGGLLGLGLG form or the “ta ta ja ra” form), the gaṇas add no value and serve only to further mystify and obscure the topic.
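The octal analogy can be made concrete. The sketch below, with hypothetical function names, converts an L/G pattern into gaṇa names (using la and ga for one or two leftover syllables) and back again without loss, and checks the sliding-window property of yamātārājabhānasalagaṃ. The syllable weights of the mnemonic are written out by hand rather than computed.

```python
GANA = {'LLL': 'na', 'LLG': 'sa', 'LGL': 'ja', 'LGG': 'ya',
        'GLL': 'bha', 'GLG': 'ra', 'GGL': 'ta', 'GGG': 'ma'}
SINGLE = {'L': 'la', 'G': 'ga'}  # names for leftover single syllables


def to_ganas(pattern):
    """Describe an L/G pattern in groups of three, like binary -> octal."""
    cut = len(pattern) - len(pattern) % 3
    names = [GANA[pattern[i:i + 3]] for i in range(0, cut, 3)]
    return names + [SINGLE[s] for s in pattern[cut:]]


def from_ganas(names):
    """Expand gaṇa names back into the L/G pattern (nothing is lost)."""
    expand = {name: trika for trika, name in GANA.items()}
    expand.update({name: s for s, name in SINGLE.items()})
    return ''.join(expand[n] for n in names)


# Weights of ya-mā-tā-rā-ja-bhā-na-sa-la-gaṃ, and the gaṇa named by the
# consonant that begins each of the first 8 syllables.
MNEMONIC_WEIGHTS = 'LGGGLGLLLG'
MNEMONIC_NAMES = ['ya', 'ma', 'ta', 'ra', 'ja', 'bha', 'na', 'sa']


def mnemonic_is_consistent():
    """True if the 3 syllables starting at syllable i form the gaṇa named there."""
    return all(GANA[MNEMONIC_WEIGHTS[i:i + 3]] == MNEMONIC_NAMES[i]
               for i in range(8))
```

For example, `to_ganas('GGLGGLLGLGLG')` returns `['ta', 'ta', 'ja', 'ra']` (Indravaṃśa), and Mandākrāntā, GGGGLLLLLGGLGGLGG, comes out as ma bha na ta ta ga ga.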
Moreover, they can be misleading as to the nature of yati breaks in the metre, as the metre being described is rarely grouped into threes, except for certain specific metres (popularly used in stotras) such as भुजङ्गप्रयातम्, तोटकम्, and स्रग्विणी. One can as easily (and more enjoyably) learn the pattern of a metre by committing a representative example (a good verse in that metre) to memory, rather than the definition using gaṇas, as the author and others know from personal experience. For these reasons, the gaṇa information is de-emphasized in the tool described in this paper.

4.1.2 pādānta-laghu

Sanskrit poetic convention is that the very last syllable in a verse can be laghu even if the metre requires it to be guru.
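One way to generate such laghu-ending variants while building the indices (compare the hypothetical Śālinī entries of ardha2 shown earlier) is sketched below. The function name and the `allow_odd_pada_laghu` flag are assumptions: the flag stands in for a per-metre boolean of the kind discussed later in this section, and the sketch only ever relaxes a final guru, at the end of each half and, optionally, of odd pādas.

```python
from itertools import product


def pada_anta_variants(padas, allow_odd_pada_laghu=False):
    """All joined patterns obtained by letting eligible final gurus be laghu."""
    choices = []
    for index, pada in enumerate(padas):
        closes_half = index % 2 == 1          # 2nd and 4th pādas end an ardha
        if pada.endswith('G') and (closes_half or allow_odd_pada_laghu):
            choices.append((pada, pada[:-1] + 'L'))
        else:
            choices.append((pada,))
    return {''.join(combo) for combo in product(*choices)}
```

For a half verse of Śālinī (two pādas of GGGGGLGGLGG), the default produces the plain pattern plus the one ending in L; with the flag set it also produces the two variants with a laghu at the end of the first pāda, i.e. exactly the four entries listed in the hypothetical ardha2 index.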
Consider, for instance, the very first verse of Kālidāsa’s Meghadūta, in the Mandākrāntā metre:

kaścit kāntā-viraha-guruṇā svādhikārāt pramattaḥ
śāpenāstaṃgamita-mahimā varṣa-bhogyeṇa bhartuḥ
yakṣaś cakre janaka-tanayā-snāna-puṇyodakeṣu
snigdhacchāyā-taruṣu vasatiṃ rāmagiryāśrameṣu

Even though the Mandākrāntā requires in each pāda a final syllable that is guru, the final syllable of the verse above is allowed to be ṣu, which, if it occurred in another position (and not followed by a consonant cluster), would be treated as a laghu syllable. A similar convention, though not always stated as clearly in texts on prosody, more or less applies at the end of each half (ardha or pair of pādas) of the verse (for an example, see the kāṣṭhād agnir… verse in Śālinī from Section 1.1).

The question of such a laghu at the end of odd pādas (viṣama-pādānta-laghu) is a thorny one, with no clear answers. Even the word of someone like Kedārabhaṭṭa cannot be taken as final on this matter, as it needs to hold up to actual usage and what is pleasing to the trained ear. Certainly we see such laghus being used liberally in metres like Śloka, Upajāti and Vasantatilakā.
At the same time, there are metres like Śālinī where this would be unusual. The summary from those well-versed in the metrical tradition[11] is that such laghus are best avoided (and are therefore unusual in the works of the master poets) in yati-prabala metres, those where the yati is prominent. This is why Śālinī, with 11 syllables to a pāda, requires a stricter observance of guru at the end of odd pādas than a metre like Vasantatilakā with 14. As a general rule of thumb, though, such viṣama-pādānta-laghus can be regarded as incorrect in metres longer than Vasantatilakā. It is not clear how a computer could automatically make such subjective decisions, so something like the idea (Melnad, Goyal, and P. M. Scharf 2015) of storing a boolean parameter about which metres allow this option seems desirable. Still, the question of how that boolean parameter is to be chosen remains open.

4.1.3 Is more data always better?

It seems natural that having data about more metres would lead to better decisions and better results, but in practice, some care is needed. A common problem is that when there are too many metres in our database, the likelihood of false positives increases. To see this more clearly, imagine a hypothetical case in which every possible combination of laghu and guru syllables was given its own name as a metre: in that case, a verse intended to be in the metre Śārdūlavikrīḍitam, say, with even a single error, would perfectly match some other named metre, and we would be misled as to the truth. A specific case where this happens easily is when a user inputs a single pāda but the system tries to treat it as a full verse. In this case, the quarters of the input, as they are much shorter, are more likely to match some metre accidentally.
The solution of returning multiple results (a list of results rather than a single result) alleviates this problem (cf. the idea of list decoding from computer science).

A related problem is the over-precise naming of metres. We know that Indravajrā and Upendravajrā differ only in the weight of the first syllable, and that the Upajāti metre consists of free alternation between them for the four pādas in a verse, as for this particular metre, the weight of the first syllable does not matter too much. However, there exist theorists of prosody who have given names like Māyā, Premā, Mālā, Ṛddhiḥ and so on to each of the $2^4 = 16$ possibilities (all the ways of combining Indravajrā and Upendravajrā) (A. Mishra 2007). This is not very useful to a reader, as in such cases the metre in question is, in essence, really more common than such precise naming would make it seem. Velankar (Velankar 1949) even considers the name Upajāti as arising from the “displeasure” of the “methodically inclined prosodist”.

[11] Śatāvadhānī R. Ganesh, personal communication.

Another issue is that data compiled from multiple works on prosody (or sometimes even from the same source) can have inconsistencies. It can happen that the same metre is given different names in different sources (Velankar 1949, p. 59). This is very common with noun endings that mark gender, such as -ā versus -aṃ, but we also see cases where completely different names are used. It can also happen that the same name is used for entirely different metres (see also the confusion about Upajāti mentioned below in Section 4.4).
For these reasons, instead of storing each metre as a (name, pattern) pair as mentioned earlier, or as the (better) (name, pattern, bool) triple (Melnad, Goyal, and P. M. Scharf 2015), it seems best to store a (pattern, bool, name, source for name) tuple. I started naively, thinking the name of a metre is objective truth, and as a result of this project I realized that names are assigned with some degree of arbitrariness.

Finally, a couple more points: (1) There exist metres that end with laghu syllables, and the code should be capable of handling them. (2) It is better to keep metrical data as data files, rather than code. This was a mistake made in the initial design of the system. Although it did not deter helpful contributors like Dhaval Patel from contributing code-like definitions for each metre, it is still a hindrance that is best avoided. Keeping data in data files is language-agnostic and would allow it to be used by other tools.

Overall, despite these issues, the situation is not too bad, because it is mostly a small set of metres that is used by most poets. Although the repertoire of Sanskrit metres is vast (Deo 2007), and even the set of commonly used metres is larger in Sanskrit than in other languages, nevertheless, as with rāgas in music, although names can be and have been given to a great many combinations, not every mathematical possibility is an aesthetic possibility.[12]

[12] This remark comes from Śatāvadhānī Ganesh, who has pointed this out multiple times.

4.2 Transliteration

It appears that accepting input in various input schemes is one of the features of the tool that users enjoy. Although the differences between various input schemes are mostly superficial and easily learned, it appears that many people have their preferred scheme that they would like to employ wherever possible. These are fortunately easy for computers to handle.

As pointed out elsewhere in detail (P. M.
Scharf and Hyman 2011), the set of graphemes or phonemes one might encounter in putatively Sanskrit input is larger than that supported by common systems of transliteration like Harvard-Kyoto or IAST. Characters like chandrabindu and ळ will occur in the input, especially with modern poetry or verse from other languages. The system must be capable of doing something reasonable in such cases.

A perhaps unusual choice is that the system does not currently accept input in SLP1, even though SLP1 is used internally. The simple reason is that no one has asked for it, and it does not seem that many people type in SLP1. SLP1 is a great internal format and can be a good choice for interoperability between separate tools, but it seems that the average user would rather not type kfzRaH for कृष्णः. Nevertheless, this is a minor point, as this input method can easily be added if anyone wants it.

In an earlier paper (Melnad, Goyal, and P. M. Scharf 2015), two of the deficiencies stated about the tool by Mishra (A. Mishra 2007) are that:

1. By supporting only Harvard-Kyoto input, that tool requires special treatment of words with consecutive a-i or a-u vowels (such as the word “प्रउग”). In this tool, as Devanāgarī input is accepted, such words can be input (besides, of course, by simply inserting a space).
2. That tool does not support accented input, which (Melnad, Goyal, and P. M. Scharf 2015) do because they accept input in SLP1. In this tool, accented input is accepted if input as Devanāgarī. However, as neither this tool nor the one by (Melnad, Goyal, and P. M. Scharf 2015) supports Vedic metre, this point seems moot: Sanskrit poetry in the classical (non-Vedic) metres is not often accompanied by accent markers!
   In this tool, accent marks in Devanāgarī are accepted but ignored.

4.3 Scansion

As a coding shortcut when the program was first being written, I decided to treat anusvāra and visarga as consonants too for scansion, instead of handling them specially. To my surprise, I have not had to revise this and eliminate the shortcut, because, in every instance, the result of scansion is the same. I am not aware of any text on prosody treating anusvāra and visarga as consonants, but treating them so gives the correct scansion for Sanskrit prosody. This is a curious insight that technological constraints (or laziness) have given us!

As mentioned in earlier work (Melnad, Goyal, and P. M. Scharf 2015), in later centuries of the Sanskrit tradition, there evolved an option of considering certain guru syllables as laghu, as a sort of poetic license, in certain cases. Specifically, certain consonant clusters, especially those containing r like pr and hr, were allowed to be treated as if they were single consonants, at the start of a word. This rule is stated by Kedārabhaṭṭa too and seems to be freely used in the Telugu tradition even today. A further trend is to allow this option everywhere, based on how “effortlessly” or “quickly” certain consonant clusters can be pronounced, compared with others. A nuanced understanding of this matter comes from a practising poet and scholar of Sanskrit literature, Śatāvadhānī R. Ganesh:[13] this practice arose from the influence of Prākṛta and Deśya (regional) languages (for instance, it is well codified as a rule in Kannada and Telugu, under the name of Śithila-dvitva).
It was also influenced by music; Ganesh cites the treatise चतुर्दण्डीप्रकाशिका. He concludes that as a conscientious poet, he will follow poets like Kālidāsa, Bhāravi, Māgha, Śrīharṣa and Viśākhadatta in not using this exception when composing Sanskrit, but will use it sparingly when composing in languages like Kannada, where prior poets have used it freely.

With this understanding,[14] the question arises whether the system needs to encode this exception, especially for dealing with later or modern poetry. This could be done, but as a result of the system’s robustness to errors, in practice, this turns out to be less necessary. Any single verse is unlikely to exploit this poetic license in every single pāda, so the occasional usage of this exception does not prevent the metre from being detected. The only caveat is that this already counts as an error, so verses that exploit this exception would have slightly lower robustness to further errors.

[13] Personal communication, but see also corroboration at https://groups.google.com/d/msg/bvparishat/ya1cGLuhc14/EkIqH9NbgawJ
[14] See another summary here: https://github.com/shreevatsa/sanskrit/issues/1#issuecomment-68502605

4.4 Identification

It is not enough for a verse to have the correct scansion (the correct pattern of laghu and guru syllables) for it to be a perfect specimen of a given metre. There are additional constraints, such as yati: because a pause is indicated at each yati-sthāna (caesura), a word must not cross such a boundary, although separate lexical components of a compound word (samāsa) may.
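The tool leaves this check to the reader’s eye, as described next. Purely to make the constraint concrete, a hypothetical automated check could look like the sketch below. It assumes the pāda has been syllabified by hand, with words separated by spaces and compound members by hyphens (the convention used for the Meghadūta verse in Section 4.1.2), and uses the usual Mandākrāntā yati after the 4th and 10th syllables. Finding compound-member boundaries automatically is exactly where a segmenter would be needed.

```python
import re


def yati_violations(line, yati_after):
    """Return the yati positions that do not fall at a word or compound boundary.

    `line` is hand-syllabified: syllables are joined by '.', words are separated
    by spaces and compound members by '-'. `yati_after` lists the syllable
    counts after which the metre places a yati.
    """
    boundaries = set()
    count = 0
    for token in re.split(r'([ \-])', line):  # keep the separators as tokens
        if token in (' ', '-'):
            boundaries.add(count)             # a break falls after `count` syllables
        elif token:
            count += len(token.split('.'))
    return [pos for pos in yati_after if pos not in boundaries]
```

For the first pāda of the Meghadūta, written as `kaś.cit kān.tā-vi.ra.ha-gu.ru.ṇā svā.dhi.kā.rāt pra.mat.taḥ`, both yati positions fall on a boundary (after kāntā- and after guruṇā), so the sketch reports no violations.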
Previously (Melnad, Goyal, and P. M. Scharf 2015), an approach has been suggested of using a text segmentation tool such as the Sanskrit Heritage Reader (Huet 2005; Huet and Goyal 2013) for detecting when such a constraint is violated. This would indeed be ideal, but the tool being described in this paper alleviates the problem by displaying the user’s input verse aligned to the metre, with each yati-sthāna indicated. Thus, any instance of a word crossing a yati boundary will be apparent in the display.

Note that we can provide information on all kinds of Upajāti, even if they are not explicitly added to our database, a problem mentioned previously (Melnad, Goyal, and P. M. Scharf 2015). Upajāti just means “mixture”; the common upajāti of Indravajrā and Upendravajrā, as a metre, has nothing to do with the upajāti of Vaṃśastha and Indravaṃśa (Velankar 1949). In fact, the latter is sometimes known by the more specific name of Karambajāti,[15] among other names. Whenever an Upajāti of two different metres is used and input correctly, each of the two metres will be recognized and shown to the user, because different pādas will match different patterns in our index. So without us doing any special work of adding all the kinds of Upajāti to the data, the user can see in any given instance that their verse contains elements of both metres, and in exactly what way. Of course, adding the “mixed” metre explicitly to the data would be more informative to the user, if the mixture is a common one.

4.5 Display

Once a metre is identified, for some users, telling the user the name of the metre may be enough.
However, if we envision this tool being used by anyone reading any Sanskrit verse (such as Devadatta from Section 1.3), then for many users, being told the name of the metre (or even the metre’s pattern) conveys mainly that the verse is in some metre, and does not substantially improve the reader’s enjoyment of the verse. Seeing the verse aligned to the metre, with line breaks introduced in the appropriate places and yati locations highlighted, helps a great deal. What would help most, however, is a further introduction to the metre, along with familiar examples that happen to be in the same metre, and audio recordings of typical ways of reciting it.
15 Śatāvadhānī R. Ganesh, personal communication.
The tool does this for popular metres (see Figure 1), drawing on another resource (Ganesh 2013). In these audio recordings made in 2013, Śatāvadhānī R. Ganesh describes several popular metres, with well-chosen examples (most recited from memory and some composed extempore for the sake of the recordings). For some metres, interesting background such as their usage in the tradition (a brief “biography” of the metre) is also added. Although the recordings were not created for the sake of this tool, it was the same interest in Sanskrit prosody that led both to the creation of this tool and to my request for these recordings. Showing the user’s verse accompanied by recitations of other verses in the same metre helps the user read aloud and savor the verse they input.
Incidentally, an introduction to metres via popular examples and accompanying audio recordings is also the approach taken by the book Chandovallarī (S. Mishra 1999).
The examples chosen there are mostly from the stotra literature, which is most likely to be familiar to an Indian audience. In this way, it can complement the recordings mentioned in the previous paragraph, whose examples were often chosen for their literary quality or illustrative features.
4.6 Getting feedback from users
The main lesson I learned from building this system was the value of making things accessible to as many users as possible, by removing as many barriers as possible: write systems that are “liberal” in what they accept, but nevertheless conservative enough to avoid making errors (Postel’s law).
There exist users who may not have much computer science or programming knowledge, but who are nevertheless scholars with expertise in a specific subject. For example, India’s tech penetration is low; even many Sanskrit scholars aren’t trained or inclined to enter verse in standard transliteration formats. The very fact that such users are visiting your tool and using it means that they constitute a self-selecting sample, and it would be a shame not to use their expertise: their contributions and suggestions can help improve the system. In the case of this tool, linking to the GitHub discussion pages, and providing a quick link for reporting issues encountered during any particular interaction, have generated many improvements in both usability and correctness. A minor example of a usability improvement is automatically focusing the text area when a user visits the web page; this is trivial to set up, but it had not occurred to me as something desirable to do.
In this case, a user asked for it.
Though user feedback guided many design decisions, gathering and acting on more of it would lead to further improvements.
5 Conclusions and future work
This paper has described a tool for metre recognition that takes various measures to be as useful to users as possible. In this section, we list the current limitations of the tool and improvements that can be (and are planned to be) made.
In terms of transliteration, although many transliteration schemes are supported, even the requirement to use a specific transliteration scheme is too onerous. Instead, the tool must let the user type, and display in real time its understanding of the user’s input, while offering convenient input methods (such as a character picker16) that do not require prior knowledge of how to produce specific characters. Similarly, on the output side, a user’s preferred script for reading Sanskrit (which may not be the same as their input script) should be used and remembered for future sessions, so that, for instance, a user can use the tool with all Sanskrit text displayed in the Kannada script. There may even exist users who prefer to read everything in SLP1!
Very few mātrā metres are currently supported (only members of the Āryā family have been added). There are many simple mātrā metres used in stotras, such as a metre consisting of alternating groups of 3 and 4 mātrās. More examples for each metre, such as from Chandovallarī (S. Mishra 1999), would help.
The program is a monolithic application. It should be made more modular and packaged into libraries for distribution so that other software can easily incorporate the same user-friendly features.
Similarly, in addition to the human interface, providing an API would make this code usable from another website or application. Another limitation is that the program requires a dedicated server to run; if rewritten to run entirely in the browser, it could be packaged as a browser extension so that any Sanskrit verse on any web page could be quickly queried and reformatted in a metrically clear form. The automatic inference of the transliteration scheme and other aspects of the user’s intention, though a user-friendly feature, may occasionally be wrong, so the program would be improved by allowing these to be indicated manually when desired.
16 For instance, https://r12a.github.io/pickers/devanagari
Finally, the most promising avenue for future work is running this tool on large texts rather than on one verse at a time, which can uncover many insights about prosody. For instance, the most common metre, Anuṣṭubh (Śloka), the work-horse of Sanskrit literature and beloved of the epic poets of the Rāmāyaṇa and the Mahābhārata, is still difficult to define clearly. The naive definition, that the odd pādas match the regular expression “....LGG.” and the even pādas match “....LGL.”, is insufficient: in practice there are both more and fewer constraints. It is not the case that all 2^4 = 16 choices for the first four syllables are acceptable, nor is it the case that every acceptable śloka satisfies even these constraints.
G. S. S. Murthy (Murthy 2003) surveys and summarizes the literature on this metre and concludes with some perceptive remarks:
It is indeed surprising that anuṣṭup has remained ill-defined for so long.
[…] If anuṣṭup is being used for thousands of years in saṃskṛt literature without a precise definition having been spelled out till date, it must be simply because the internal rhythm of anuṣṭup becomes ingrained in the mind of a student of saṃskṛt at an early age due to constant and continuous encounter with anuṣṭup and when one wants to compose a verse in anuṣṭup, one is guided by that rhythm intuitively.
By running a tool like this on a large corpus consisting of the Mahābhārata, the Rāmāyaṇa, and other large works, it is now almost within reach to arrive at a descriptive definition of the śloka based on the verses found in the literature, so that we can make explicit the rules that have been implicitly adhered to by the natural poets.
Acknowledgements
I am indebted to the poet and scholar Śatāvadhānī R. Ganesh for encouraging my interest in Sanskrit (and other Indian) prosody. It is his intimate love of metres (reminding me of the story of the mathematician Ramanujan, for whom every positive integer was a personal friend) that led me to the realization that an understanding of metre greatly enriches the joy of poetry. Dhaval Patel contributed metrical data from the Vṛttaratnākara and raised points about its nuances (some still unresolved). Sridatta A pointed out some more. I thank Vishvas Vasuki for being a heavy user, for pointing out many bugs and making suggestions, and for initiating the sanskrit-programmers mailing list where this project began. Finally, I thank my wife Chitra Muthukrishnan for supporting me during this work, both technically and otherwise, and for reviewing drafts of this article.
References
Deo, Ashwini S. 2007. “The metrical organization of Classical Sanskrit verse”. Journal of Linguistics 43.1, pp. 63–114.
Ganesh, Shatavadhani R. 2013. Sanskrit Metres (a playlist with a series of audio recordings containing recitation and information about popular metres). url: https://www.youtube.com/playlist?list=PLABJEFgj0PWVXr2ERGu2xtoSXrNdBs5xS.
Huet, Gérard. 2005. “A functional toolkit for morphological and phonological processing: application to a Sanskrit tagger”. Journal of Functional Programming 15.4, pp. 573–614.
Huet, Gérard and Pawan Goyal. 2013. “Design of a lean interface for Sanskrit corpus annotation”. Proceedings of ICON 2013, the 10th International Conference on NLP, pp. 177–86.
Melnad, Keshav, Pawan Goyal, and Peter M. Scharf. 2015. “Identification of meter in Sanskrit verse”. In: Selected papers presented at the seminar on Sanskrit syntax and discourse structures, 13–15 June 2013, Université Paris Diderot, with a bibliography of recent research by Hans Henrich Hock. Providence: The Sanskrit Library, pp. 325–346.
Mishra, Anand. 2007. Sanskrit metre recognizer. url: http://sanskrit.sai.uni-heidelberg.de/Chanda/.
Mishra, Sampadananda. 1999. Chandovallari: Handbook of Sanskrit prosody. Sri Aurobindo Society.
Murthy, G. S. S. 2003. “Characterizing Classical Anuṣṭup: A Study in Sanskrit Prosody”. Annals of the Bhandarkar Oriental Research Institute 84, pp. 101–115. ISSN: 0378-1143.
Murthy, G. S. S. 2003? Maatraa5d.java. url: https://github.com/sanskrit-coders/sanskritnlpjava/tree/master/src/main/java/gssmurthy.
Ollett, Andrew. 2012. “Moraic Feet in Prakrit Metrics: A Constraint-Based Approach”. Transactions of the Philological Society 110, pp. 241–282.
Scharf, Peter. 2016. “Sanskrit Library conventions of digital representation and annotation of texts, lexica, and manuscripts”. In: ICON 2016 Workshop on bridging the gap between Sanskrit computational linguistics tools and management of Sanskrit digital libraries, 17–20 December 2016, IIT-BHU.
Scharf, Peter M. and Malcolm D. Hyman. 2011. Linguistic Issues in Encoding Sanskrit. The Sanskrit Library, Providence and Motilal Banarsidass, Delhi. url: http://sanskritlibrary.org/Sanskrit/pub/lies_sl.pdf.
Smith, John. 1998? sscan (part of sktutils.zip). url: http://bombay.indology.info/software/programs/index.html.
Velankar, H. D. 1949. Jayadāman: A collection of ancient texts on Sanskrit Prosody and A Classified List of Sanskrit Metres with an Alphabetical Index. Haritoṣamālā, pp. 14–15.

Improving the learnability of classifiers for Sanskrit OCR corrections
Devaraja Adiga, Rohit Saluja, Vaibhav Agrawal, Ganesh Ramakrishnan, Parag Chaudhuri, K. Ramasubramanian and Malhar Kulkarni
Abstract: OCR output for Sanskrit documents contains many errors. Correcting those errors using conventional spell-checking approaches breaks down because of the limited vocabulary: Sanskrit is highly inflected, and its words are dynamically formed by Sandhi rules, Samāsa rules, Taddhita affixes, etc. Correcting OCR documents therefore requires huge effort. In this paper, we present different machine learning approaches and various ways to improve features for ameliorating error correction in Sanskrit OCR documents. We simulated the Subanta Prakaraṇam of the Vaiyākaraṇasiddhāntakaumudī to synthesize an off-the-shelf dictionary.
Most of the methods we propose can also work for general Sanskrit word correction.
1 Introduction
Optical character recognition (OCR) is the process of identifying characters in document images to create editable electronic texts. SanskritOCR by Indsenz, Google OCR, and Tesseract are the major OCR systems available for Sanskrit. Word-level error rates for six books, printed at various places in India in different fonts and scanned at 300 DPI, are listed in Table 1. Even with OCR accuracy above 90%, correcting the errors manually is cumbersome unless it is complemented by a mechanism for correcting them. OCR-correction mechanisms based on user feedback can improve as the user corrects contiguous text in a uniform font. We discuss different approaches for correcting Sanskrit OCR depending on the available system resources.
Book Name | Publisher Details | Year of Publication | No. of Pages OCRed | WER (IndSenz) | WER (Google)
Raghuvaṁśam Sanjīvinīsametam | Nirṇaya Sāgara Press, Mumbai | 1929 | 200 | 19% | 35%
Nṛsiṃhapūrvottaratāpanīyopaniṣat | Ānandāśrama, Pune | 1929 | 160 | 34% | 41%
Siddhāntaśekhara-1 | Calcutta University Press | 1932 | 390 | 38%* | 66%
Gaṇaka-Tarangiṇī | Jyotish Prakash Press, Varanasi | 1933 | 150 | 34% | 46%
Siddhāntaśekhara-2 | Calcutta University Press | 1947 | 241 | 55%* | 53%
Siddhāntaśiromaṇi | Sampurnanand University, Varanasi | 1981 | 596 | 18% | 29%
Table 1: Word Error Rates (WER) for Indsenz’s SanskritOCR and Google OCR (*after training on 5 pages)
Conventional spell-checking approaches use the Damerau-Levenshtein edit distance to a known dictionary and auto-correct the errors using a language model (Whitelaw et al. 2009).
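The dictionary-lookup step of such a conventional spell checker can be sketched as follows. This is an illustration of the general technique, not the authors’ OpenOCRCorrect code, and it omits the language model. It computes the optimal-string-alignment variant of Damerau-Levenshtein distance (an adjacent transposition counts as one edit) over Unicode code points; a Devanagari akṣara often spans several code points, which this sketch ignores. The vocabulary, the simulated OCR error, and the names `osa_distance` and `suggest` are hypothetical.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal-string-alignment (restricted Damerau-Levenshtein) distance.

    Insertions, deletions, substitutions, and transpositions of two
    adjacent characters each cost one edit.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def suggest(word: str, vocabulary: list[str], max_distance: int = 2) -> list[str]:
    """Return vocabulary entries within max_distance edits, nearest first."""
    scored = sorted((osa_distance(word, entry), entry) for entry in vocabulary)
    return [entry for dist, entry in scored if dist <= max_distance]


# Hypothetical IAST vocabulary; "dveaḥ" simulates an OCR transposition of "devaḥ".
vocabulary = ["devaḥ", "devau", "rāmaḥ"]
print(suggest("dveaḥ", vocabulary))  # ['devaḥ', 'devau']
```

Plain Levenshtein distance would charge two edits for the transposed pair; the OSA variant charges one, which matches a common class of OCR and typing errors. The weakness noted in the text remains: a word absent from the vocabulary can never be suggested.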
For post-OCR correction of highly inflected languages, this naive approach results in poor accuracy (Sankaran and Jawahar 2013), because it depends primarily on lookups in a fixed vocabulary. For Sanskrit, such a vocabulary is always incomplete, owing to the language’s inflectional nature, its pervasive Sandhi, and its highly productive derivational morphology, such as Samāsa, Taddhita, and Kṛdanta.
In recent work, encoder-decoder Recurrent Neural Networks (RNNs) with character-based attention have shown state-of-the-art results in neural language correction (Xie et al. 2016). Saluja et al. (2017a) proposed a logistic-regression-based machine learning framework for correcting Indic OCR using dual-engine OCR. For correcting OCR across four Indic languages (Sanskrit, Hindi, Kannada, and Malayalam) in a single-engine setting, Saluja et al. (2017b) reached the state of the art using a special type of RNN called Long Short-Term Memory (LSTM) networks.
Table 2: Examples of Sanskrit OCR words corrected by our framework (columns: OCR Word, Corrected Word, Ground Truth).
OCRed data of over 5k Sanskrit
document images and 12k document images in different languages were corrected using our framework, OpenOCRCorrect.
The basic dictionary-lookup approach requires fewer system resources, whereas neural language-correction models demand higher system specifications. So we propose and evaluate different models for correcting Sanskrit OCR,1 ranging from simple dictionary lookup to neural attention models. To build the vocabulary for Sanskrit, we developed a Subanta generator. We also use various other auxiliary sources, which we discuss in the next section. Then in Section 4 we discuss in detail the results of the various error-detection approaches. Suggestion generation is explained in the following section. Table 2 shows OCR errors corrected by our framework, and Figure 1 is a screenshot of our framework. We use multicolor coding to depict compound words, out-of-vocabulary words, auto-corrected words, and correct words.
1 The source code of the Sanskrit OCR corrector, OpenOCRCorrect, is available at https://goo.gl/WqoVi2
Figure 1: A screenshot of our framework.
Our contributions in this paper are: (i) suggesting different models for error detection based on the amount of training data (if the training data is around 10k and no GPU is available, use the plug-in classifier; for around 100k with a GPU, use an LSTM; and for around 1000k with a GPU, use attention models); (ii) increasing the learnability of ML classifiers by adding auxiliary sources; and (iii) comparing different ML-based and deep-learning-based methods for the task of error detection. We have improved the results of the plug-in classifier by introducing more auxiliary sources and synthesized words. Further, we use an attention model to compare the results with LSTM-based error detection.
2 Auxiliary Sources
Figure 2 depicts the functionality of the human-interactive framework for OCR corrections. We use various auxiliary sources that are helpful in verifying correct words and curating word-level errors.
Our system leverages OCR data from different systems, dynamically updated OCR confusions, and domain-specific vocabulary. We also use a synthesized off-the-shelf dictionary. These features are used for supervised learning, training a plug-in classifier to achieve a better F-score. Erroneous words are corrected using suggestions through human interaction, to keep the confidence level high. Later on, words having similar errors are auto-corrected.
Figure 2: Block diagram of our framework.
In the following sections, we discuss the various auxiliary sources used by the framework.
2.1 OCR documents from different systems
Since different OCR systems use different models, they are likely to make different kinds of errors and are likely to be correct on the words on which they agree. This observation is leveraged especially by ensemble-based ML approaches (Polikar 2006). Therefore, OCR documents from different systems can be a powerful auxiliary source.
2.2 Off-the-shelf dictionary
Since the vocabulary for Sanskrit is incomplete due to its rich inflection, we developed a Subanta generator for synthesizing noun variants. A databank of noun variants with around 6.5 lakh (650,000) unique Subanta words is available through Huet (2017).
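To illustrate why synthesizing noun variants multiplies the size of a dictionary, here is a toy, table-driven sketch for a single declension class. It is not the authors’ Subanta generator, which applies Aṣṭādhyāyī rules: it covers only masculine a-stems written in IAST and deliberately ignores ṇatva and every other rule interaction, so it is valid only for stems such as deva or aśva. The names `A_STEM_MASCULINE_ENDINGS` and `a_stem_masculine_forms` are hypothetical.

```python
# Endings of masculine a-stems (the pattern of deva), in IAST.
# Rows: nominative, accusative, instrumental, dative, ablative, genitive,
# locative, vocative; columns: singular, dual, plural.
A_STEM_MASCULINE_ENDINGS = [
    ("aḥ", "au", "āḥ"),
    ("am", "au", "ān"),
    ("ena", "ābhyām", "aiḥ"),
    ("āya", "ābhyām", "ebhyaḥ"),
    ("āt", "ābhyām", "ebhyaḥ"),
    ("asya", "ayoḥ", "ānām"),
    ("e", "ayoḥ", "eṣu"),
    ("a", "au", "āḥ"),
]


def a_stem_masculine_forms(stem: str) -> set[str]:
    """Synthesize the distinct inflected forms of a masculine a-stem.

    Toy sketch: no ṇatva is applied, so a stem like rāma would wrongly
    yield "rāmena" instead of "rāmeṇa".
    """
    if not stem.endswith("a"):
        raise ValueError(f"not an a-stem: {stem!r}")
    base = stem[:-1]
    return {base + ending for row in A_STEM_MASCULINE_ENDINGS for ending in row}


lexicon = set()
for stem in ["deva", "aśva"]:
    lexicon |= a_stem_masculine_forms(stem)

print(len(a_stem_masculine_forms("deva")))  # 17 distinct strings from 24 slots
print(len(lexicon))                          # 34
```

Each stem fills 24 case-number slots but yields 17 distinct strings, because several endings coincide (for example, the dual forms in -au and -ābhyām). Applied to a large stem list with correct rule handling, this kind of expansion is what grows a lexicon into the millions.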
Among the different declension generators, Patel and Shivakumari Katuri (2015) is an open-source Subanta generator for Sanskrit. We developed a new Subanta generator for the following reasons:
• For ease of integration into the OCR framework.
• For overcoming the errors produced by the existing Subanta generator. Examples from Patel and Shivakumari Katuri (2015):
– प्रथमा एकवचनम् (nominative singular) forms of words ending with ऋ.
– द्वितीया द्विवचनम् (accusative dual) forms of many of the Sarvanāmas.
– Declensions of words ending with the वसु affix are wrong in the case of भसंज्ञा.
• To have the provision for future enhancements.
The Aṣṭādhyāyī rules corresponding to the Subanta Prakaraṇam, and the required Sandhi rules, are coded in accordance with the explanations of the rules given in (B. Dīkṣita, V. Dīkṣita, and Sarasvatī 2006).
For resolving conflicts, we chose the order of applicability of rules according to the Paribhāṣā परनित्यान्तरङ्गापवादानामुत्तरोत्तरं बलीयः. Context dependencies of many rules are resolved by collecting the context information. For example, for the rule एकाचो बशो भष् झषन्तस्य स्ध्वोः (अ.8-2-37), the roots collected are गाध्, गुध्, गृध्, दघ्, दध्, दृभ्, द्राघ्, बध्, बीभ्, बुध्.
The roots गाह्, गुह्, गृह्, ग्रह्, ग्लह्, दह्, दिह्, दुह्, दृह्, द्राह्, द्रुह्, बाह्, बृह् are also considered, after applying ‘दादेर्धातोर्घः’ (अ.8-2-32) or ‘हो ढः’ (अ.8-2-31). An example word where this rule is applied is कामधुक् (प्रातिपदिकम् - कामदुह्). For the rules
नाभ्यस्ताच्छतुः (अ.7-1-78), आच्छीनद्योर्नुम् (अ.7-1-80) and शप्श्यनोर्नित्यम् (अ.7-1-81), we grouped the participles of roots belonging to different conjugations accordingly. In a similar way, we tried to resolve, completely or partially, the context dependencies of many rules.
We processed the XML file of the Monier-Williams Sanskrit Dictionary available in the Cologne Digital Sanskrit Dictionaries collection, Institute for Indology and Tamilistics, University of Cologne (http://www.sanskrit-lexicon.uni-koeln.de/download.html), and extracted more than 1.8 lakh words with gender information from it. Vibhakti variants of these words were generated using the Subanta generator, yielding around 3.2 million unique words. We also used the verb forms listed in the Verb-forms-Generator of ILTP-DC (Indian Language Technology Proliferation and Deployment Center), around 3 lakh unique words. Together, these 3.5 million words are used as an off-the-shelf dictionary for the OCR corrector.
2.3 Domain-specific vocabulary
In Sanskrit literature, the frequency of commonly used words changes from one Śāstra to another.
The domain-specific vocabulary is therefore the most powerful auxiliary resource, supplying the words not found in the off-the-shelf dictionary. It is created by extracting unique strings from the various books available in the Göttingen Register of Electronic Texts in Indian Languages (GRETIL 15.11.2001 - 16.02.2018). This auxiliary source is also updated dynamically as the user corrects the document, which helps in correcting the rest of the document.
2.4 Sandhi Rules
Due to Sandhi and Samāsa, word forms in Sanskrit documents change dynamically. We use basic Sandhi rules to find the subwords of a compound word and match them against the words in the vocabulary to detect whether it is correct. A greedy approach is used for this splitting, with a minimum set of words of maximum length and minimum edit distance as the criteria. For example, the OCR word जागरिताव्यायाभेवाव्याऽयमुक्तं will be split into जागरित, अव्यायाभ् (this word is matched with अव्यायाम्), एव, अव्याऽयम् and
उक्तं. This helps in detecting out-of-vocabulary words and in generating suggestions for them.

2.5 Document and OCR specific n-gram confusions

Different OCR systems use different preprocessing techniques and classifier models, so the error confusions for a word vary from one OCR engine to another (Abdulkader and Casey 2009). OCR-specific confusions can therefore help decide whether part of an erroneous word should be changed. They also help break ties between candidate changes. For example, when correcting the erroneous word निवन्धः, dictionary lookup may suggest निबन्धः and निरन्धः as the nearest words. A higher n-gram confusion count for व→ब then biases the selection towards निबन्धः.

3 Methodologies Followed

3.1 Learning by Optimizing Performance Measures through Plug-in Approach

We rephrase our basic problem of error detection as that of continuously evolving a classifier that labels the OCR of a word as correct or incorrect. The classifier should be trained to optimize a performance measure that is not necessarily the conventional likelihood function or the sum of squared errors. We need to maximize recall (coverage) in detecting erroneous words while remaining precise. One performance measure that fits this need is the F-score, which, unfortunately, does not decompose over the training examples and can be hard to optimize. We adapt a plug-in approach (Narasimhan, Vaish, and Agarwal 2014) to train our binary classifier over such non-decomposable
objectives while also being efficient for incremental re-training.

Consider a simple binary classification problem in which every data point x ∈ X must be assigned a binary label y ∈ {−1, +1}. Plug-in classifiers achieve this by first learning to predict Class Probability Estimate (CPE) scores. A function g : X → [0, 1] is learned such that g(x) ≈ P(y = 1 | x). Various tools, such as logistic regression, may be used to learn this CPE model g. The final classifier has the form sign(g(x) − θ), where θ is a threshold tuned to maximize the performance measure under consideration, e.g. F-measure or G-mean.

In Saluja et al. (2017a), various features based on dictionary n-grams and language rules were used for Sanskrit, Hindi, and Marathi. Our main contribution in this paper is to improve the features of such a classifier and to verify their effect on three different domains of Sanskrit. We use a train:val:test ratio of 48:12:40 for all experiments with the plug-in classifier. We wanted to explore using the classifier to correct the last 40% of a book once the first 60% has been corrected.

3.2 LSTM with fixed delay

The basic RNN (Recurrent Neural Network) can be represented by Equations (1) and (2):

h_t = g(W_hh h_{t−1} + W_hx x_t + b_h)   (1)
y_t = W_yh h_t   (2)

Here g can be the sigmoid σ(x) = 1/(1 + exp(−x)), tanh (tanh(x) = 2σ(2x) − 1), or the Rectified Linear Unit (ReLU) f(x) = max(0, x) (Talathi and Vartak 2014). The matrices W_hx and W_yh connect the input to the hidden layer and the hidden layer to the output, respectively. These matrices are shared across all time steps of the sequence.
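To make Equations (1) and (2) concrete, the following pure-Python sketch computes one RNN time step and unrolls it over a sequence with shared matrices. It also checks the identity tanh(x) = 2σ(2x) − 1 for the logistic sigmoid. This is illustrative only. The function names, toy weights and list-based linear algebra are our assumptions, not the implementation of Saluja et al. (2017b).

```python
import math


def sigmoid(x):
    # logistic sigmoid, written to avoid overflow for large |x|
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def tanh_via_sigmoid(x):
    # tanh(x) = 2 * sigmoid(2x) - 1
    return 2.0 * sigmoid(2.0 * x) - 1.0


def relu(x):
    return max(0.0, x)


def matvec(W, v):
    # plain-list matrix-vector product
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rnn_step(W_hh, W_hx, W_yh, b_h, h_prev, x_t, g=math.tanh):
    # Equation (1): h_t = g(W_hh h_{t-1} + W_hx x_t + b_h)
    pre = [a + b + c for a, b, c in zip(matvec(W_hh, h_prev), matvec(W_hx, x_t), b_h)]
    h_t = [g(v) for v in pre]
    # Equation (2): y_t = W_yh h_t
    y_t = matvec(W_yh, h_t)
    return h_t, y_t


def run_rnn(W_hh, W_hx, W_yh, b_h, xs, g=math.tanh):
    # Unroll over the sequence; the same matrices are reused at every step.
    h = [0.0] * len(b_h)
    ys = []
    for x_t in xs:
        h, y_t = rnn_step(W_hh, W_hx, W_yh, b_h, h, x_t, g)
        ys.append(y_t)
    return h, ys


if __name__ == '__main__':
    # one hidden unit, one input, one output (toy weights)
    h, ys = run_rnn([[0.5]], [[1.0]], [[2.0]], [0.0], [[1.0], [0.0], [1.0]])
    print(h, ys)
    print(abs(tanh_via_sigmoid(0.7) - math.tanh(0.7)) < 1e-12)
```

Every call of rnn_step receives the same three matrices, so the sketch reproduces the weight sharing across time steps described above.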
The matrix W_hh carries the feedback from past inputs. It is responsible for remembering or forgetting the past sequence based on context. Using Equation (1), Equation (2) at each time t can be unfolded back in time to t = 1, the first character of the word sequence. The network can then be trained using back-propagation through time (BPTT) (Schuster and Paliwal 1997).

We ensured equal byte length per letter by using an ASCII transliteration scheme, so we used the negative log-likelihood of the Log SoftMax (multi-class) function as the loss. The Log SoftMax function is given in Equation (3), where y_ti is the value at the i-th index of the output vector y_t:

f(y_ti) = log( exp(y_ti) / ∑_j exp(y_tj) )   (3)

The equations are similar for the LSTM, in which each unit is a memory unit instead of a neuron. Such a memory unit remembers, forgets, and transfers the cell state to the output (or next state) based on the input history. The cell state at time t is given by Equation (4). The forget gate f_t and the input gate i_t fire according to Equations (5) and (6), respectively (⊙ denotes element-wise multiplication):

c_t = f_t ⊙ c_{t−1} + i_t ⊙ g_1(W_hc h_{t−1} + W_xc x_t + b_c)   (4)
f_t = g_2(W_xf x_t + W_hf h_{t−1} + W_cf c_{t−1} + b_f)   (5)
i_t = g_2(W_xi x_t + W_hi h_{t−1} + W_ci c_{t−1} + b_i)   (6)

Information is selectively transferred from the cell to the hidden state h_t according to Equation (7). The selection is done by the output gate o_t as per Equation (8):

h_t = o_t ⊙ g_3(c_t)   (7)
o_t = g_2(W_xo x_t + W_ho h_{t−1} + W_co c_{t−1} + b_o)   (8)

Here g_1 and g_3 are generally tanh, and g_2 is generally the sigmoid.

Saluja et al. (2017b) use a 512 × 2 LSTM with a fixed delay between the input sequence and the output sequence. It was trained and tested on characters from 86k OCR word correction pairs with a train:val:test split of 64:16:20. When trained on large data, the model is able to learn OCR-specific error patterns as well as a language model.
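Equations (3) to (8) can likewise be sketched in a few lines. Under our own assumptions, the block below computes three things:

- the Log SoftMax, using the usual log-sum-exp trick;
- the resulting negative log-likelihood for a target character;
- one step of a single-unit (scalar) LSTM cell with g_1 = g_3 = tanh and g_2 = sigmoid.

The parameter names are hypothetical. As in Equation (8) above, the output-gate peephole uses c_{t−1}; some LSTM formulations use c_t there instead.

```python
import math


def sigmoid(x):
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def log_softmax(y):
    # Equation (3), computed with the log-sum-exp trick for numerical stability
    m = max(y)
    log_z = m + math.log(sum(math.exp(v - m) for v in y))
    return [v - log_z for v in y]


def nll_loss(y, target):
    # negative log-likelihood of the target character index
    return -log_softmax(y)[target]


def lstm_step(p, x_t, h_prev, c_prev):
    # Single-unit LSTM cell with peephole terms, following Equations (4)-(8).
    f_t = sigmoid(p['xf'] * x_t + p['hf'] * h_prev + p['cf'] * c_prev + p['bf'])  # (5)
    i_t = sigmoid(p['xi'] * x_t + p['hi'] * h_prev + p['ci'] * c_prev + p['bi'])  # (6)
    c_t = f_t * c_prev + i_t * math.tanh(p['hc'] * h_prev + p['xc'] * x_t + p['bc'])  # (4)
    o_t = sigmoid(p['xo'] * x_t + p['ho'] * h_prev + p['co'] * c_prev + p['bo'])  # (8)
    h_t = o_t * math.tanh(c_t)  # (7)
    return h_t, c_t


PARAM_NAMES = ['xf', 'hf', 'cf', 'bf', 'xi', 'hi', 'ci', 'bi',
               'hc', 'xc', 'bc', 'xo', 'ho', 'co', 'bo']

if __name__ == '__main__':
    zeros = {name: 0.0 for name in PARAM_NAMES}
    # with all-zero weights every gate is 0.5, so c_t = 0.5 * c_prev
    print(lstm_step(zeros, 1.0, 0.0, 1.0))
    print(nll_loss([2.0, 1.0, 0.1], 0))
```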
The model abstains from changing correct words. Thus, for error detection, a word changed by such a model is marked as incorrect, whereas a word left unchanged is marked as correct. Such a model over-fits to the OCR system and the domain. It works well for datasets of about a hundred thousand OCR word correction pairs.

3.3 Attention Model

Here, we use a model with more layers than the LSTM-based model discussed in the previous section. Attention models have a separate encoder and decoder. This contrasts with the LSTM-based model, in which the same layers both encode and decode a sequence. In an attention model, RNN layers act as an encoder that takes the characters of OCR words. When trained with a large amount of data, a similar RNN-based decoder decodes the encoder’s output into correct words. The attention layers are applied to the encoder’s output to help the decoder. Based on the input, they learn to attend to different contexts around the character being corrected. We train and test such a model with the 86k OCR word correction pairs used by Saluja et al. (2017b). For this purpose we use the open-source library OpenNMT (http://opennmt.net/) with its default model, which includes a 500 × 2 LSTM encoder and a 500 × 2 LSTM decoder. In our experience with French and English in the ICDAR Post-OCR Competition 2017, such a model can learn datasets of the order of millions of OCR word correction pairs. Here again, for error detection, we mark words changed by the model as incorrect and words left unchanged as correct. As we will see later, even when trained on only 86k pairs, such a model performs close to the LSTM-based model on the error detection task.

4 Error Detection Methods and Results

4.1 Unsupervised approach

Approach | TP | FP | TN | FN | Prec. | Recall | F-Score
Gen. Dict. Lookup | 89.18 | 40.12 | 59.87 | 10.82 | 29.75 | 89.18 | 44.61
Sandhi Rules | 54.34 | 13.23 | 86.77 | 45.66 | 43.89 | 54.34 | 48.56
Sec. OCR Lookup | 90.68 | 23.59 | 76.40 | 9.31 | 42.79 | 90.68 | 58.14
Table 3: Error detection results with unsupervised methods. Using Sandhi rules during dictionary lookup increases true negatives (TN, correct words marked as correct). It also increases false negatives (FN, errors marked as correct). The secondary OCR lookup balances the two.

We applied various methods for detecting errors in the OCR text. To start with, we used the book “Āryabhaṭīyabhāṣya of Nīlakaṇṭha III Golapāda (Anantaśayana Saṃskṛta Granthāvaliḥ, 1957)”, for which both the OCR text (OCRed with ind.senz) and the ground truth were available. Among the unsupervised methods, the commonly used dictionary lookup approach gave poor F-scores because many correct words were marked as errors. As Table 3 shows, its True Negative percentage was low. Marking words formed by applying Sandhi rules to dictionary lexicons as correct increased the detection of correct words (True Negatives) compared with the previous approach. It did not increase the detection of errors (True Positives). For this book, a lookup in the output of another OCR engine (Google Docs) for the same document images improved the F-score to a reasonable value.

4.2 Single Engine Environment

For supervised learning with the plug-in classifier explained in Section 3.1, we split the data with a train:val:test ratio of 48:12:40 and train the plug-in classifier with various features. We improve on the row 1 results of Table 3 by including, as features, the frequencies of n-grams (up to 8) in the general dictionary. We also include a binary feature based on lookup in the general dictionary. The results are shown in the first row of Table 4.

Approach | TP | FP | TN | FN | Prec. | Recall | F-Score
Classifier with n-gram frequency + word lookup in General Dict. as features | 73.38 | 22.86 | 77.13 | 26.61 | 38.88 | 73.38 | 50.83
Classifier with n-gram frequency + word lookup in Synthesised Dict. (superset of gen. dict.) as features | 74.06 | 21.02 | 78.98 | 25.94 | 41.14 | 74.06 | 52.89
Classifier with n-gram frequency + word lookup in Synthesised Dict. as well as Domain Dict. as features | 66.37 | 13.08 | 86.92 | 33.63 | 50.38 | 66.37 | 57.28
Classifier with features in row 3 + no. of Sandhi components in OCR word as features | 68.50 | 13.53 | 86.47 | 31.50 | 50.10 | 68.50 | 57.87
Table 4: ML classifier’s error detection results in the single-engine environment. Here we achieve an F-score close to that of the unsupervised Secondary OCR Lookup.

We further include more words in the dictionary by synthesizing nouns and collecting verbs, as explained in Section 2.2. This yields the results shown in row 2 of Table 4.

We then added the frequencies of the OCR word’s n-grams in the domain dictionary (generated as explained in Section 2.3) as features, along with the synthesized dictionary. This improved the results, as shown in row 3 of Table 4.

To improve further on row 3 of Table 4, we used three splitting-based features:

(i) We split the OCR words using commonly used Sandhi rules and used the number of lexicon components obtained from the general dictionary as a feature.
(ii) We also used, as a feature, the number of lexicon components obtained by splitting the OCR word into lexicons of the domain dictionary (for Jyotiṣa). Here, the number of characters in unknown substrings of the OCR word is added to the feature.
(iii) The product of the features in (i) and (ii) is also used as a feature.

We normalized all these features using the mean and standard deviation of the training data. The results are shown in row 4 of Table 4.
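The single-engine features and the plug-in step of Section 3.1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ feature set. The per-order mean n-gram frequency, the 0/1 labels (1 = erroneous word, instead of ±1) and all function names are ours. The CPE scores g(x) are assumed to come from some probabilistic classifier such as logistic regression. Only two steps are shown: standardization with training statistics, and a validation-set search for the threshold that maximizes the F-score.

```python
from collections import Counter


def char_ngrams(word, n):
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def ngram_counts(vocabulary, n_max=8):
    # frequency of every character n-gram (n = 1..n_max) in a word list
    counts = Counter()
    for w in vocabulary:
        for n in range(1, n_max + 1):
            counts.update(char_ngrams(w, n))
    return counts


def word_features(word, counts, lexicon, n_max=8):
    # Assumed layout: for each n, the mean dictionary frequency of the
    # word's n-grams, followed by a binary dictionary-lookup feature.
    feats = []
    for n in range(1, n_max + 1):
        grams = char_ngrams(word, n)
        feats.append(sum(counts[g] for g in grams) / len(grams) if grams else 0.0)
    feats.append(1.0 if word in lexicon else 0.0)
    return feats


def fit_standardizer(rows):
    # mean and standard deviation of each feature on the training split
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def standardize(row, means, stds):
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def f_score(labels, preds):
    # label 1 = erroneous OCR word (the class we want to detect)
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def tune_threshold(scores, labels):
    # Plug-in step: scores are CPE outputs g(x) on the validation split;
    # pick the threshold that maximizes F-score for sign(g(x) - threshold).
    best_t, best_f = 0.5, -1.0
    for t in sorted(set(scores)):
        f = f_score(labels, [1 if s >= t else 0 for s in scores])
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f
```

In use, the standardizer would be fitted on the 48% training split only and applied unchanged to the validation and test splits. tune_threshold would run on the 12% validation split.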
Note that in the single-engine environment we come close to the dual-engine Secondary OCR Lookup approach given in row 3 of Table 3.

4.3 Multi Engine Environment

We further include dual-engine OCR agreement as a feature, in addition to the features used in the previous sections, and obtain the results in Table 5. Here we used ind.senz as the primary OCR engine and Google Docs as the secondary OCR engine. We improve the results further by adding the dual OCR agreement between ind.senz and Tesseract as a feature. This gives the results shown in row 4 of Table 5.

Approach | TP | FP | TN | FN | Prec. | Recall | F-Score
Classifier with features in Table 4 row 2 along with dual engine agreement* | 85.13 | 17.84 | 82.16 | 14.87 | 48.62 | 85.13 | 61.89
Classifier with features in Table 4 row 3 along with dual engine agreement | 78.04 | 13.67 | 86.33 | 21.96 | 53.11 | 78.04 | 63.20
Classifier with features in Table 4 row 4 along with dual engine agreement | 83.49 | 15.26 | 84.74 | 16.51 | 52.25 | 83.49 | 64.28
Classifier with features in Table 4 row 4 along with triple engine agreements | 83.43 | 14.95 | 85.04 | 16.56 | 52.74 | 83.43 | 64.63
Table 5: ML classifier’s error detection results in multi-engine environments (*state of the art, Saluja et al. 2017a). Here TP increases significantly compared with the single-engine environment.

To demonstrate the plug-in classifier’s consistency across domains, Table 6 presents its results when trained and tested on books from different domains. Row 1 of this table shows the baseline for the Vedānta book ‘Nṛsiṃhapūrvottaratāpanīyopaniṣat’ (Ānandāśrama Saṃskṛta Granthāvaliḥ, 1929). Row 2 shows the results achieved using all the features, obtained using:

- the triple-engine environment;
- the off-the-shelf dictionary;
- the domain vocabulary;
- n-gram frequencies from the general, synthesized and domain vocabularies.

The TP rate (errors detected as errors) of the baseline is high for this book compared with the baselines for the other domains. In contrast, the TN rate (correct words detected as correct) of the dictionary-lookup baselines is similar for all domains, as shown in row 1 of Table 3 and rows 1 and 3 of Table 6. The high TP could be due to less ambiguity in the incorrect words than in other domains, since TP, unlike TN, does not depend on the presence of correct OCR words in the dictionary. Hence the baseline reaches an F-score as high as 62.87 in this case. We also evaluated the system on the Sāhitya domain. For this, we used the book ‘Raghuvaṃśam Sanjīvinīsametam’ (Nirṇaya Sāgara Press, 1929, Sargas 1–9). Row 3 of Table 6 shows the baseline, whereas row 4 shows the results obtained using our framework.

Approach | TP | FP | TN | FN | Prec. | Recall | F-Score
Vedānta gen. dict. lookup baseline | 85.52 | 34.35 | 65.65 | 14.48 | 49.71 | 85.52 | 62.87
Vedānta Plug-in Classifier | 79.95 | 9.80 | 90.20 | 20.05 | 77.95 | 79.95 | 78.94
Sāhitya gen. dict. lookup baseline | 64.24 | 35.36 | 64.64 | 35.76 | 32.86 | 64.24 | 43.49
Sāhitya Plug-in Classifier | 87.88 | 13.37 | 86.62 | 12.12 | 66.52 | 87.88 | 75.72
Table 6: ML classifier’s error detection results for other domains. The above results show the generality of the model across different domains of Sanskrit literature.

4.4 Deep Neural Network-based approaches

Approach | TP | FP | TN | FN | Prec. | Recall | F-Score
LSTM with fixed delay* | 92.64 | 5.45 | 94.54 | 7.36 | 94.84 | 92.64 | 93.72
Char. level Attention model | 81.53 | 7.74 | 92.26 | 18.47 | 91.92 | 81.53 | 86.41
Table 7: Neural networks’ error detection results (*state of the art, Saluja et al. 2017b).

Table 7 presents the results of the approaches described in Sections 3.2 and 3.3 for the 86k pairs used by Saluja et al. (2017b), with a 64:16:20 train:val:test split. The first row shows the Sanskrit results from Saluja et al. (2017b). The second row presents the results of the character-level attention model. For the attention model, the input is the characters of the OCR word and of its preceding OCR word (as context). The output is the characters of the correct word. We also tried other input contexts; using the characters of one context word gave the best F-score. The F-scores show that these approaches outperform all the other ML techniques, but they require a large amount of training data for generic adaptation. These models learn error patterns and language from the dataset. If the test data differs from the training data (in OCR confusions/system and/or domain), the approaches described in the previous sections can be used instead. Since the plug-in classifier uses general auxiliary sources, we recommend it for practical purposes.

5 Suggestion Generation

The results of various ways of exploiting auxiliary sources to generate appropriate suggestions are given in (Saluja et al.
2017a) for “Āryabhaṭīyabhāṣya of Nīlakaṇṭha III Golapāda (1957)”. Table 8 shows the improvement in results due to on-the-fly adaptation of the domain dictionary and OCR confusions for “Āryabhaṭīyabhāṣya of Nīlakaṇṭha III Kālakriyāpāda (Anantaśayana Saṃskṛta Granthāvaliḥ, 1931)”.

We synthetically generated word images for the words in Sanskrit dictionaries and OCRed them using ind.senz (ind.senz 2014). From the output we extracted around 0.5 million erroneous-correct word pairs. We used the longest common subsequence algorithm (Hirschberg 1977) to generate around 0.78 million OCR character confusions.

Row 1 of Table 8 shows the total percentage of correct suggestions obtained using various auxiliary sources with (i) words common to the dual OCR systems as the domain vocabulary throughout the document, and (ii) the synthesized confusions. As shown in row 2, we further improved the quality of suggestions by adding the corrected domain words on the fly after the user corrects each page. Adapting the confusions on the fly, page by page, further improved the results, as shown in row 3. Using real confusions from the primary OCR text and the ground truth of other books improved the results further, as shown in row 4 of Table 8.

Sources Included | Percentage of Correct Suggestions
Domain words with dual OCR agreement + Synthesized Confusions | 36.26
Prev. + adapting Domain Words/Page | 36.38
Prev. + adapting Confusions/Page | 37.14
Prev. − Synthesized + Real Confusions | 39.40
Table 8: Improvement in suggestions with adaptive sources for “Āryabhaṭīyabhāṣya of Nīlakaṇṭha III Kālakriyāpāda (Anantaśayana Saṃskṛta Granthāvaliḥ, 1931)”.

6 Conclusions

In this paper, we demonstrated different ML approaches for Sanskrit OCR correction. Our framework leverages a synthesized dictionary, n-gram error confusions and domain vocabularies. The error confusions and domain-specific vocabularies grow on the fly with user corrections. We presented a multi-engine environment that is useful for detecting potential errors. Using various auxiliary sources along with the plug-in classifier, we achieve F-scores better than those of Saluja et al. (2017a). The LSTM with fixed delay outperforms the other approaches. Deep neural network-based approaches, however, require higher-level resources such as GPUs and a large amount of training data. Our system can generate correct suggestions for errors with edit distances as high as 15. Our GUI reduces the overall cognitive load of the user by providing adequate color coding, generating suggestions, and auto-correcting similar erroneous words, as shown in Saluja et al. (2017a). As a future enhancement, the greedy Sandhi splitting could be improved with better algorithms.

References

Abdulkader, Ahmad and Matthew R. Casey. 2009. “Low Cost Correction of OCR Errors Using Learning in a Multi-Engine Environment”.
In: Proceedings of the 10th International Conference on Document Analysis and Recognition.
Dīkṣita, Bhaṭṭojī, Vāsudeva Dīkṣita, and Jñānendra Sarasvatī. 2006. Vaiyākaraṇasiddhāntakaumudī with the commentaries Bālamanoramā and Tattvabodhinī. Motilal Banarsidass.
GRETIL. 15.11.2001 - 16.02.2018. Göttingen Register of Electronic Texts in Indian Languages. url: http://gretil.sub.uni-goettingen.de/gretil.htm.
Hirschberg, Daniel S. 1977. “Algorithms for the longest common subsequence problem”. Journal of the ACM 24.4, pp. 664–675.
Huet, Gérard. 2017. The Sanskrit Heritage Resources. url: https://gitlab.inria.fr/huet/Heritage_Resources/.
ind.senz. 2014. “SanskritOCR”. http://www.indsenz.com/. Last accessed on 01/15/2018.
Schuster, Mike and Kuldip K. Paliwal. 1997. “Bidirectional recurrent neural networks”. In: IEEE Transactions on Signal Processing.
Narasimhan, Harikrishna, Rohit Vaish, and Shivani Agarwal. 2014. “On the Statistical Consistency of Plug-in Classifiers for Non-decomposable Performance Measures”. In: Proceedings of NIPS.
Patel, Dhaval and Shivakumari Katuri. 2015. “Prakriyāpradarśinī - an open source subanta generator”. In: Sanskrit and Computational Linguistics. D. K. Publishers, New Delhi.
Patel, Dhaval and Sivakumari Katuri. 2015. Subanta Generator. Last accessed on 09/30/2017. url: http://www.sanskritworld.in/sanskrittool/SanskritVerb/subanta.html.
Polikar, R. 2006. “Ensemble based systems in decision making”.
In: IEEE Circuits and Systems Magazine.
Saluja, Rohit, Devaraj Adiga, Parag Chaudhuri, Ganesh Ramakrishnan, and Mark Carman. 2017a. “A Framework for Document Specific Error Detection and Corrections in Indic OCR”. 1st International Workshop on Open Services and Tools for Document Analysis (ICDAR-OST).
Saluja, Rohit, Devaraj Adiga, Parag Chaudhuri, Ganesh Ramakrishnan, and Mark Carman. 2017b. “Error Detection and Corrections in Indic OCR using LSTMs”. International Conference on Document Analysis and Recognition (ICDAR).
Sankaran, Naveen and C. V. Jawahar. 2013. “Error Detection in Highly Inflectional Languages”. In: Proceedings of the 12th International Conference on Document Analysis and Recognition. IEEE, pp. 1135–1139.
Talathi, Sachin S. and Aniket Vartak. 2014. “Improving performance of recurrent neural network with relu nonlinearity”. In: International Conference on Learning Representations, workshop track.
Whitelaw, Casey, Ben Hutchinson, Grace Y. Chung, and Gerard Ellis. 2009. “Using the web for language independent spellchecking and autocorrection”. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing: Volume 2. Association for Computational Linguistics, pp. 890–899.
Xie, Ziang, Anand Avati, Naveen Arivazhagan, Dan Jurafsky, and Andrew Y. Ng. 2016. “Neural language correction with character-based attention”. arXiv preprint arXiv:1603.09727.

A Tool for Transliteration of Bilingual Texts Involving Sanskrit
Nikhil Chaturvedi and Rahul Garg

Abstract: Sanskrit texts are increasingly being written in bilingual and trilingual formats, with Sanskrit paragraphs or shlokas followed by their corresponding English commentary.
Sanskrit can also be written in many ways, including multiple romanized encodings such as SLP-1 and Velthuis. The need to handle code-switching in such texts is exacerbated by the requirement to render web pages with multilingual Sanskrit content. Such rendering requires automatically detecting whether a given text fragment is in Sanskrit, identifying its form/encoding, and then selectively transliterating it into a user-specified script. The Brahmi-derived writing systems of Indian languages are mostly similar in structure but have different letter shapes. These scripts are based on similar phonetic values, which allows for easy transliteration. This correspondence motivates deriving a uniform encoding schema based on the underlying phonetic value rather than on the symbolic representation. The open-source tool developed by us performs this end-to-end detection and transliteration. Using simple machine learning techniques, it achieves an accuracy of 99.1% in distinguishing SLP-1 from English on a Wikipedia corpus.

1 Introduction

Sanskrit is one of the most ancient languages in India and forms the basis of numerous Indian languages. It is the only known language which has a built-in scheme for pronunciation, word formation and grammar (Maheshwari 2011). It was one of the most widely used languages of its time and hence encompasses a rich tradition of poetry and drama as well as scientific, technical, philosophical and religious texts. Unfortunately, Sanskrit is now spoken by only a small number of people. The aforementioned literature, though available, remains inaccessible to most of the world.
However, in recent years, Sanskrit has shown a resurgence through various media, with people reviving the language over the internet (Nair and Devi 2011) and through bilingual and trilingual texts.

There exist numerous web-based tools that provide age-old Sanskrit content to users and help them gain insight into the language. The Cologne Sanskrit Dictionary Project (Kapp and Malten 1997) aims to digitize the major bilingual Sanskrit dictionaries. The Sanskrit Reader Companion (Goyal and Huet 2013) by INRIA has tools for declension, conjugation, Sandhi splitting and merging, along with word stemming. Samsadhani (A. Kulkarni 2017) by the University of Hyderabad supports transliteration, morphological analysis and Sandhi. Jawaharlal Nehru University (Jha 2017) has developed a number of Sanskrit language processing tools with the final aim of constructing a Sanskrit-Hindi translator. In this paper, we attempt to construct a transliteration tool that renders the web pages of the above tools in multiple scripts and encodings at the backend. Through this, we aim to expand the reach of Sanskrit to a wider community and to standardize an open-source tool for transliteration.

The amount of bilingual and trilingual content involving Sanskrit has been rising steadily. For example, the Gita Supersite (Prabhakar 2005) maintained by IIT Kanpur serves as a huge bilingual database of the Bhagavad Gita, the Ramacharitmanas and the Upanishads. Traditional texts such as Srisa Chandra Vasu's English translation of the Ashtadhyayi (Vasu 1897) exist in a similar format. These works broadly follow a commentary structure, with Sanskrit hymns, verses and words followed by their translation into popular modern languages like English or Hindi. Code-switching (Auer 2013) is the practice of moving back and forth between two languages, or between two dialects/registers of the same language.
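Handling code-switching at the word level amounts to deciding, for each token, which language (and encoding) it is written in. The detector proposed later in this introduction is a Naive Bayes model over the substrings of a word. A minimal Python sketch of that idea follows, under several assumptions of ours:

- the training words are toy examples;
- substring length is capped at max_len, whereas the paper uses all possible substrings;
- add-alpha smoothing handles unseen substrings;
- the class name is hypothetical.

```python
import math
from collections import Counter


def substrings(word, max_len=4):
    # All substrings up to max_len of the word padded with boundary markers.
    w = '^' + word + '$'
    return [w[i:j] for i in range(len(w))
            for j in range(i + 1, min(len(w), i + max_len) + 1)]


class SubstringNB:
    # Multinomial Naive Bayes over substring features, with add-alpha smoothing.

    def __init__(self, max_len=4, alpha=1.0):
        self.max_len = max_len
        self.alpha = alpha

    def fit(self, words_by_label):
        n_words = sum(len(ws) for ws in words_by_label.values())
        self.counts, self.totals, self.log_priors = {}, {}, {}
        vocab = set()
        for label, words in words_by_label.items():
            c = Counter()
            for w in words:
                c.update(substrings(w, self.max_len))
            self.counts[label] = c
            self.totals[label] = sum(c.values())
            self.log_priors[label] = math.log(len(words) / n_words)
            vocab.update(c)
        self.vocab_size = len(vocab)
        return self

    def log_score(self, word, label):
        c = self.counts[label]
        # +1 in the denominator reserves probability mass for unseen substrings
        denom = self.totals[label] + self.alpha * (self.vocab_size + 1)
        score = self.log_priors[label]
        for f in substrings(word, self.max_len):
            score += math.log((c[f] + self.alpha) / denom)
        return score

    def predict(self, word):
        return max(self.counts, key=lambda label: self.log_score(word, label))


if __name__ == '__main__':
    nb = SubstringNB().fit({
        'english': ['the', 'then', 'there', 'these'],
        'sanskrit_slp1': ['rAmaH', 'devaH', 'aSvaH', 'gajaH'],
    })
    print(nb.predict('them'), nb.predict('naraH'))
```

Scores are summed in log space to avoid underflow when many substring probabilities are multiplied.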
Due to their commentarial nature, multilingual Sanskrit works contain massive amounts of code-switching. For example, consider this excerpt of the Valmiki Ramayana from the Gita Supersite: “तपस्वी ascetic, वाल्मीकिः Valmiki, तपःस्वाध्यायनिरतम् highly delighted in the practice of religious austerities and study of vedas, वाग्विदां वरम् eloquent among the knowledgeable, मुनिपुङ्गवम् preeminent among sages, नारदम् Narada, परिपप्रच्छ enquired.” This motivates the need for a word-level transliteration tool that handles code-switching by automatically detecting the relevant sections and transliterating them.

Romanisation is another phenomenon that has contributed to the resurgence of Sanskrit on the Internet. In linguistics, romanisation is the conversion of writing from a different writing system to the Roman (Latin) script.
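To illustrate how the choice of romanisation changes the output, the sketch below renders the same Devanagari words in SLP1, IAST and Harvard-Kyoto. The letter tables are a deliberately tiny subset written for this example. The handling of the inherent vowel a is simplified, with no accent handling. This is not the tool described in this paper.

```python
# Columns: (SLP1, IAST, Harvard-Kyoto). A deliberately small, illustrative subset.
CONSONANTS = {
    'क': ('k', 'k', 'k'), 'ग': ('g', 'g', 'g'), 'त': ('t', 't', 't'),
    'द': ('d', 'd', 'd'), 'न': ('n', 'n', 'n'), 'म': ('m', 'm', 'm'),
    'य': ('y', 'y', 'y'), 'र': ('r', 'r', 'r'), 'व': ('v', 'v', 'v'),
    'श': ('S', 'ś', 'z'), 'ष': ('z', 'ṣ', 'S'), 'ण': ('R', 'ṇ', 'N'),
    'स': ('s', 's', 's'), 'ह': ('h', 'h', 'h'),
}
VOWEL_SIGNS = {
    'ा': ('A', 'ā', 'A'), 'ि': ('i', 'i', 'i'), 'ी': ('I', 'ī', 'I'),
    'ु': ('u', 'u', 'u'), 'ू': ('U', 'ū', 'U'), 'ृ': ('f', 'ṛ', 'R'),
    'े': ('e', 'e', 'e'), 'ो': ('o', 'o', 'o'),
}
INDEPENDENT_VOWELS = {
    'अ': ('a', 'a', 'a'), 'आ': ('A', 'ā', 'A'), 'इ': ('i', 'i', 'i'),
    'ई': ('I', 'ī', 'I'), 'उ': ('u', 'u', 'u'), 'ऋ': ('f', 'ṛ', 'R'),
}
MARKS = {'ं': ('M', 'ṃ', 'M'), 'ः': ('H', 'ḥ', 'H')}
VIRAMA = '्'
SCHEME_INDEX = {'slp1': 0, 'iast': 1, 'hk': 2}


def romanize(text, scheme):
    idx = SCHEME_INDEX[scheme]
    out = []
    for pos, ch in enumerate(text):
        nxt = text[pos + 1] if pos + 1 < len(text) else ''
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch][idx])
            # a bare consonant carries the inherent vowel a
            if nxt not in VOWEL_SIGNS and nxt != VIRAMA:
                out.append('a')
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch][idx])
        elif ch in INDEPENDENT_VOWELS:
            out.append(INDEPENDENT_VOWELS[ch][idx])
        elif ch in MARKS:
            out.append(MARKS[ch][idx])
        elif ch != VIRAMA:
            out.append(ch)  # anything outside the tiny table passes through
    return ''.join(out)


if __name__ == '__main__':
    for scheme in ['slp1', 'iast', 'hk']:
        print(scheme, romanize('कृष्ण', scheme), romanize('संस्कृतम्', scheme))
```

For example, संस्कृतम् comes out as saMskftam in SLP1, saṃskṛtam in IAST and saMskRtam in Harvard-Kyoto.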
Multiple methods of this transliteration exist, although none has emerged as the clear standard. These methods include SLP1, Velthuis, Harvard-Kyoto, ISO 15919, WX, IAST and the National Library at Kolkata romanisation. Such romanisation makes it easy for large parts of the world population to pronounce and appreciate Sanskrit verses. Therefore, any standardized transliteration tool for Sanskrit needs to support all the above romanisation encodings.

Our transliteration is based on a property shared by Sanskrit and other major Indian languages like Hindi, Marathi, Tamil and Gujarati. These languages are written using different letter shapes (scripts) but are structurally rather similar. The same sounds recur across these languages, allowing for easy transliteration. The phonetic sound [ki] (IPA) is rendered as कि in Devanagari, as ਕਿ in Gurmukhi, and as கி in Tamil, each with different code points in Unicode and ISCII¹. This enabled us to formulate a mediating encoding schema that encodes the sound of a syllable rather than its written form. Such a schema allows seamless transliteration between any two given scripts.

Romanised Sanskrit, however, exacerbates the problem of code-switching. A general-purpose transliteration tool must now differentiate between two words written in the same script, which turns out to be a non-trivial problem. We again use the intuition of phonetics to overcome this problem. Certain sounds (or sequences of sounds) occur more frequently in some languages than in others. This allows us to formulate the classifier as a simple Naive Bayes model that operates on all possible substrings of a given word. We achieve a classification accuracy of 99.1% between English and Sanskrit written using SLP1.

The rest of the paper is organized as follows.
In Section 2 we briefly describe the encoding and romanisation schemes presently used for writing Sanskrit texts. Section 3 describes the prevalent transliteration tools. Sections 4 and 5 describe our transliterator and script detector, respectively. In Section 6, we present our results and discuss possible future work.

2 Sanskrit Alphabet and Encodings

The Sanskrit alphabet comprises 5 short (ह्रस्व)² vowels, 8 long (दीर्घ) vowels and 9 prolated (प्लुत) vowels. Each of these vowels can be pronounced in three different ways: acute accent (उदात्त), grave accent (अनुदात्त) and circumflex (स्वरित). Vowels with the acute accent are written unmarked (अ). Vowels with the grave accent have a horizontal line drawn under them (अ॒), and circumflex vowels have a vertical line drawn above them (अ॑). There are 33 consonants, including 4 semi-vowels, 3 sibilants and 1 aspirate (ह).

¹ Indian Script Code for Information Interchange (ISCII) is an 8-bit coding scheme for representing the main Indic scripts. Unicode is based on ISCII, and with Unicode now being the standard, ISCII has taken a back seat.
² In this paper, we give popular Sanskrit terms in Unicode Devanagari within round brackets for better understanding.

There are several methods of transliteration from Devanagari to the Roman script (a process known as romanization). They share similarities, although no single system of transliteration has emerged as the standard. SLP1 (P. Scharf 2011) and WX (Bharati et al.
1995) map each Devanagari letter to exactly one ASCII symbol. Velthuis (Velthuis 1981) uses the ISO 646 repertoire to represent mnemonically the diacritics used in standard scholarly transliteration. IAST (Trevelyan, Jones, and Williams 1894) incorporates diacritics to represent letters. Harvard-Kyoto (Nakatani 1996) largely resembles SLP1 in its use of capital letters in the mapping. ITRANS (Chopde 1991) exists as a pre-processing package and hence is widely used for electronic documents. ISO 15919 (ISO/TC-46 2001), like IAST, uses diacritics. A comparison of some of the above schemes was first presented by Huet (2009), and a more detailed comparison is given in Appendix A of this paper. The reader may refer to P. M. Scharf and Hyman (2012) for a thorough analysis and discussion.

Unicode has designated code blocks for almost all major Indian scripts, including Assamese, Bengali (Bangla), Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil and Telugu. Across scripts, Unicode respects alphabet correspondence: letters with similar phonetic values occupy the same relative position within their respective blocks. As a result, transliteration can be done with a mere offset. For example, the Devanagari symbol (अ) is coded as U+0905, whereas its Gurmukhi counterpart (ਅ) is coded as U+0A05. Similarly, the symbol (क) has the code U+0915 in Devanagari, while its Gurmukhi counterpart (ਕ) has the code U+0A15. Therefore, Sanskrit texts written in Indian scripts using Unicode can be transliterated by simply changing the offset value, for letters that exist in both scripts.

However, the Unicode encoding does not represent the language in its true essence. Hindi, Sanskrit and most other Indian languages are centred around phonetic values.
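The offset mapping just described can be sketched in a few lines of Python. This is an illustrative sketch, not the tool's own code. It handles only Devanagari to Gurmukhi, and the function name deva_to_guru is hypothetical. Not every Devanagari letter has a Gurmukhi counterpart; there is no Gurmukhi letter for ऋ, for example. The sketch therefore keeps any character whose shifted code point is unassigned. This check cannot catch a shifted code point that is assigned but means something different.

```python
import unicodedata

DEVANAGARI_START, DEVANAGARI_END = 0x0900, 0x097F
GURMUKHI_START = 0x0A00
OFFSET = GURMUKHI_START - DEVANAGARI_START  # 0x100, e.g. U+0905 -> U+0A05


def deva_to_guru(text: str) -> str:
    '''Shift Devanagari code points into the Gurmukhi block by a fixed offset.'''
    out = []
    for ch in text:
        cp = ord(ch)
        if DEVANAGARI_START <= cp <= DEVANAGARI_END:
            target = chr(cp + OFFSET)
            # Use the shifted character only if Unicode assigns it a name;
            # otherwise keep the Devanagari original unchanged.
            if unicodedata.name(target, ''):
                out.append(target)
                continue
        out.append(ch)
    return ''.join(out)


print(deva_to_guru('कि'))  # ਕਿ (U+0A15 U+0A3F)
```

A fuller implementation would need per-script exception tables for letters without a counterpart. Those gaps are part of the motivation for a phonetic intermediate encoding.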
Ideally, then, an encoded token should represent the entire sound rather than being split into different symbols for consonants and vowels. Unicode encodes written characters rather than the underlying phonetic structure. As a result, significant parsing is required to determine anything about a letter from its encoding, such as which class of consonants it belongs to or whether it is voiced or unvoiced. For example, the symbol (स्री) stands for the consonants (स्) and (र्) followed by the vowel (ई). Its Unicode representation consists of four code points (स + ◌् + र + ◌ी). It is phonetically three units (स्, र् and ई) but is represented through four Unicode characters, with the vowel written as a dependent sign. Our tool addresses this issue with a new representation that encapsulates a consonant and its vowel (or lack of one) in a single encoding.

Sohoni and M. Kulkarni (2015) describe a comprehensive phonetic encoding (PE), based on the rules of the Ashtadhyayi, for computational processing of Sanskrit. To implement this encoding in a general-purpose programming language, they proposed a 20-bit encoding scheme. Although this encoding is phonetically rich, it can be compacted into fewer bits without compromising the essential phonetic information present in the language. Our proposed internal encoding, described in the following sections, aims to achieve this goal.

Table 1: Comparison of existing transliteration tools (UniDev = Unicode Devanagari, ITR = ITRANS, Vel. = Velthuis, Sohoni PE = the phonetic encoding of Sohoni and M. Kulkarni 2015)

Tool | UniDev | SLP1 | ITR | Vel. | ISO | IAST | Harv.-Kyoto | WX | Sohoni PE | Bilingual support | Open source
ITRANS | Yes | No | Yes | No | No | No | No | No | No | No | No
Sanscript | Yes | Yes | Yes | No | No | Yes | Yes | No | No | No | Yes
Aksharamukha | Yes | No | Yes | Yes | Yes | Yes | Yes | No | No | No | No
Samsaadhanii | Yes | Yes | Yes | Yes | No | Yes | Yes | Yes | No | No | Yes
Google Input | Yes | No | No | No | No | No | No | No | No | No | No
Proposed Tool | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | No | Yes | Yes

3 Existing Transliteration Tools

A number of tools exist today for transliterating Sanskrit into other scripts and encodings, and we present a brief survey of them. Aksharamukha (Rajan 2001), Sanscript (Prasad 2015) and ITRANS (Chopde 1991) are some of the tools currently used for Sanskrit transliteration. Google Input is another tool, used for transliteration between Devanagari and English. Aksharamukha and ITRANS support the romanised forms of Sanskrit, but none of the aforementioned tools can handle bilingual scenarios. Most of them (except Sanscript) are also not open source and hence cannot be reused by developers of Sanskrit software. These tools are summarised in Table 1.

The International Phonetic Alphabet (IPA) is an internationally accepted scheme for encoding phonetic sounds. However, because it is entirely sound-based, it has a number of representational and backward-transliteration issues. The imported sounds (नुक्ता) do not share any correspondence with their roots. The sounds of (ऋ) and (रि) have the same representation in IPA, making it impossible to differentiate them when transliterating back. Anusvara (अनुस्वार) has multiple representations based on context (m, n, chandra), but none of them is unique to it.
Visarga (विसर्ग) has the same representation as (ह).

Figure 1: Model for web-based applications.

The WX and SLP1 encoding schemes are also phonetic in nature. However, the Sanskrit alphabet has a rich structure. It categorises the phonemes according to the place of pronunciation (gutturals, palatals, retroflexes, dentals, labials), the amount of air exhaled (aspirated or unaspirated) and whether the consonants are voiced or unvoiced. These attributes of the phonemes are very useful for phonological or morphological processing of the language, so it is desirable to have an encoding that represents them in a natural manner.

Owing to these shortcomings of existing tools and phonetic schemes, we created our own unified encoding schema, which naturally encodes the sounds of the Sanskrit Varnamala (as described in the next section).

4 Design of the Transliterator

4.1 Internal Representation

We created an internal encoding that represents simple syllables (a single consonant followed by a single vowel) using 16 bits. The first 5 bits of this encoding represent the script, so up to 32 scripts can be supported. The next 6 bits represent the consonant (व्यंजन), while the last 5 bits represent the vowel (स्वर/मात्रा). Each 16-bit code represents a specific simple syllable sound, which can then be mapped back to a specified destination script. In contrast, the Unicode representation of a simple Sanskrit syllable (a consonant plus a dependent vowel sign) requires 32 bits under UTF-16 and 48 bits under UTF-8.

With 33 consonants and 14 vowels, all their combinations could be encoded in just 9 bits, versus the 11 bits (6 + 5) that we currently use.
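Here is a minimal sketch of this layout. It assumes the bit grouping given in Appendix B: script in b15–b11, consonant in b10–b5, vowel in b4–b0. It also assumes that script code 00000 denotes Devanagari, as in the appendix examples. The helper names pack, unpack, show and stop_features are hypothetical. stop_features decodes only stop consonants, following the bit meanings given in the next paragraphs.

The last lines check the size claims above. Counting the null consonant and the null vowel, (33 + 1) × (14 + 1) = 510 combinations, which still fit in 9 bits. A syllable such as कि takes 32 bits in UTF-16 and 48 bits in UTF-8.

```python
import math
from typing import Dict, Tuple

# Hypothetical helpers for the 16-bit layout of Section 4.1 / Appendix B:
# bits 15-11 script, bits 10-5 consonant, bits 4-0 vowel.
SCRIPT_SHIFT, CONSONANT_SHIFT = 11, 5
CONSONANT_MASK, VOWEL_MASK = 0b111111, 0b11111


def pack(script: int, consonant: int, vowel: int) -> int:
    '''Combine the three fields into one 16-bit code.'''
    if not (0 <= script < 32 and 0 <= consonant < 64 and 0 <= vowel < 32):
        raise ValueError('field out of range')
    return (script << SCRIPT_SHIFT) | (consonant << CONSONANT_SHIFT) | vowel


def unpack(code: int) -> Tuple[int, int, int]:
    '''Split a 16-bit code back into (script, consonant, vowel).'''
    return (code >> SCRIPT_SHIFT,
            (code >> CONSONANT_SHIFT) & CONSONANT_MASK,
            code & VOWEL_MASK)


def show(code: int) -> str:
    '''Render in the 5-3-3-3-2 grouping used in Appendix B.'''
    b = format(code, '016b')
    return '-'.join([b[:5], b[5:8], b[8:11], b[11:14], b[14:]])


PLACES = {0b000: 'guttural', 0b001: 'palatal', 0b010: 'retroflex',
          0b011: 'dental', 0b100: 'labial'}


def stop_features(consonant: int) -> Dict[str, object]:
    '''Place, voicing and aspiration of a stop, read straight from its bits.'''
    place, manner = consonant >> 3, consonant & 0b111
    if manner & 0b100:
        raise ValueError('not a stop consonant')
    return {'place': PLACES.get(place, 'special'),
            'voiced': bool(manner & 0b010),
            'aspirated': bool(manner & 0b001)}


# Worked examples from Appendix B (script 00000 = Devanagari).
A = pack(0, 0b111111, 0b00000)       # अ: null consonant, vowel a
K_DEAD = pack(0, 0b000000, 0b11111)  # क्: consonant k, null vowel part
SSA = pack(0, 0b010101, 0b00000)     # ष: retroflex sibilant followed by a

# Size claims from the text.
min_bits = math.ceil(math.log2((33 + 1) * (14 + 1)))  # 510 combinations
ki = chr(0x0915) + chr(0x093F)                         # the syllable कि
utf16_bits = len(ki.encode('utf-16-le')) * 8
utf8_bits = len(ki.encode('utf-8')) * 8

print(show(A), show(K_DEAD), show(SSA))
print(min_bits, utf16_bits, utf8_bits)
print(stop_features(0b000011))  # घ
```

Running it prints three lines:
- the three Appendix B bit strings (00000-111-111-000-00, 00000-000-000-111-11 and 00000-010-101-000-00);
- 9 32 48;
- the decoded features of घ (guttural, voiced, aspirated).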
We nevertheless preferred to use some extra bits, to keep our representation clean and to let the bits themselves represent certain nuances of the Sanskrit language. Our encoding respects the phonetic structure of the language as described by Panini, in a manner very similar to the phonetic encoding (PE) of Sohoni and M. Kulkarni (2015). The bit patterns of this encoding alone reveal important phonetic characteristics of the letters.

Of the 5 vowel bits, the second-last bit indicates whether the vowel is a simple vowel (अ, इ, उ, ऋ, ऌ) or a diphthong/compound vowel (ए, ऐ, ओ, औ). The last bit represents the length of the vowel. Long (दीर्घ) vowels (आ, ई, ऊ, ॠ, ॡ, ए, ऐ, ओ, औ) have their last bit as 1, while short (ह्रस्व) vowels (अ, इ, उ, ऋ, ऌ) have their last bit as 0.

In the case of consonants, the first 3 bits represent the place of pronunciation of the letter.
Thus, the sequence 000 refers to the throat as the source, and these letters are called gutturals (क्, ख्, ग्, घ्, ङ्, ह्). 001 refers to the palate, and the letters are called palatals (च्, छ्, ज्, झ्, ञ्, य्, श्). 010 refers to the murdha (roof of the mouth), and the letters are called retroflexes (ट्, ठ्, ड्, ढ्, ण्, र्, ष्). 011 covers letters that originate at the teeth, called dentals (त्, थ्, द्, ध्, न्, ल्, स्). Lastly, 100 refers to the lips, and the letters are called labials (प्, फ्, ब्, भ्, म्, व्). The sequences 101, 110 and 111 are reserved for special symbols and accents.

As for the last 3 bits of consonants, the first of these is 0 for stop consonants (स्पर्श), i.e. non-nasal, non-semivowel and non-sibilant consonants.
The second of these bits represents voicing, i.e. whether or not the vocal cords vibrate during pronunciation. It is 1 for voiced (घोष) consonants like (ग्, घ्) and 0 for unvoiced (अघोष) consonants like (क्, ख्). The last of these bits represents aspiration (a puff of air at the end of the pronunciation). It is 1 for aspirated (महाप्राण) consonants (ख्, घ्) and 0 for unaspirated (अल्पप्राण) consonants (क्, ग्). A table describing the proposed encoding is given in Appendix B.

Figure 2: Model for bilingual texts.

4.2 Transliterator Pipeline

The transliterator takes a bilingual (or trilingual) document as its input. It produces an output document in the same format, in which the Sanskrit text is transcribed into the specified script. It consists of 5 stages, namely fragmentation, script detection, tokenisation, universalisation and specification, explained below.

Fragmentation refers to splitting the given text into smaller fragments (words, sentences, paragraphs, etc.). The assumption is that the script and encoding remain the same within each fragment, even if they change across the text as a whole.
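The first two stages can be sketched roughly as follows. The sketch assumes whitespace-delimited words as fragments, which is the granularity the tool currently uses. It tags native-script words by Unicode block, the easy case noted at the start of Section 5. The block table and the function names are hypothetical, and the Latin range is deliberately coarse. Latin-tagged words remain ambiguous between English and romanised Sanskrit, so they would be handed on to the learned classifier.

```python
from typing import List, Tuple

# Unicode block ranges used to tag native-script fragments (first match wins).
BLOCKS = [('Devanagari', 0x0900, 0x097F),
          ('Gurmukhi', 0x0A00, 0x0A7F),
          ('Tamil', 0x0B80, 0x0BFF),
          ('Latin', 0x0041, 0x024F)]  # coarse: Basic Latin to Latin Extended-B


def fragment(text: str) -> List[str]:
    '''Word-level fragmentation: whitespace-separated words.'''
    return text.split()


def block_of(word: str) -> str:
    '''Name of the first listed Unicode block containing a character of word.'''
    for name, lo, hi in BLOCKS:
        if any(lo <= ord(ch) <= hi for ch in word):
            return name
    return 'Other'


def tag(text: str) -> List[Tuple[str, str]]:
    # Latin-tagged words are still ambiguous (English or romanised Sanskrit)
    # and would go on to the learned classifier of Section 5.
    return [(word, block_of(word)) for word in fragment(text)]


print(tag('धर्म means duty'))
```

Here tag('धर्म means duty') returns [('धर्म', 'Devanagari'), ('means', 'Latin'), ('duty', 'Latin')].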
To keep the approach as general as possible, fragmentation is currently done at the word level.

Script detection refers to identifying the language, script and encoding of each fragment, using the Naive Bayes model described in Section 5.

Tokenisation refers to splitting a fragment further into tokens, each of which represents a single sound. This is similar to the notion of a syllable in English. For instance, the sound [ki] is a single token under this model.

Universalisation refers to the conversion of each token into the universal 16-bit encoding designed by us. This is done through pre-populated hash maps for the tokens of the different scripts.

Specification refers to the conversion of the universal encoding into the specified script, again using pre-populated hash maps.

4.3 Use Cases

Table 2: Confusion matrix of English vs Sanskrit (SLP1) without proper noun correction (bottom-right cell: overall accuracy)

Actual / Predicted | Pred. English | Pred. SLP1 | Recall
Actual English | 72294 | 1605 | 97.8%
Actual SLP1 | 213 | 25004 | 99.2%
Precision | 99.7% | 93.9% | 98.2%

4.3.1 Web-based Applications

One of the foremost uses of our transliteration tool is in web-based applications. A number of websites nowadays serve classical texts such as the Gita and the Ramayana, which were originally written in Sanskrit. Many websites also help people learn Sanskrit grammar. They explain conjugation, the splitting of words, and the various forms of Sanskrit verb roots. Such websites are at present available only in the Devanagari script. Our tool can be used at the backend to transliterate these pages into a user-defined script or encoding. The model for this use case is depicted in Figure 1. We insert our tool as middleware between the backend and the frontend. The user specifies the required script or encoding on the frontend. All outgoing pages from the server then pass through our tool and are converted to that script.
The frontend then renders the converted HTML to the user for a seamless experience.

4.3.2 Bilingual Texts

Numerous Sanskrit texts have been turned into bilingual and trilingual texts through translation into popular modern languages like English and Hindi. These works exist in commentary form and involve massive amounts of code-switching. Representing such a text in a script other than that of its origin is an ordeal, because the tool must decide at a micro level whether to transliterate each piece of text. The problem is exacerbated when the Sanskrit verses are written in romanised form and the translation language is English. Figure 2 depicts the model for this use case.

4.3.3 User Driven

The third use of our tool is along the lines of Google Input Tools. A user can enter a line of Sanskrit (in any script) intertwined with English, and the tool outputs the resulting sentence after transliteration. This gives the user considerable flexibility and is highly relevant in the growing age of multilingual social media.

Table 3: Confusion matrices of English vs Sanskrit using different romanisation schemes, Part 1 (top-left cell: baseline; bottom-right cell: overall accuracy)

(a) English vs SLP1
Baseline 67.2% | Pred. English | Pred. SLP1 | Recall
Actual English | 73178 | 721 | 99.0%
Actual SLP1 | 205 | 25012 | 99.2%
Precision | 99.7% | 97.2% | 99.1%

(b) English vs Velthuis
Baseline 58.6% | Pred. English | Pred. Velthuis | Recall
Actual English | 72649 | 1250 | 98.3%
Actual Velthuis | 860 | 24357 | 96.6%
Precision | 98.8% | 95.1% | 97.9%

Table 4: Confusion matrices of English vs Sanskrit using different romanisation schemes, Part 2 (top-left cell: baseline; bottom-right cell: overall accuracy)

(a) English vs ITRANS
Baseline 51.1% | Pred. English | Pred. ITRANS | Recall
Actual English | 72778 | 1121 | 98.5%
Actual ITRANS | 645 | 24572 | 97.4%
Precision | 99.1% | 95.6% | 98.2%

(b) English vs Harvard-Kyoto
Baseline 68.5% | Pred. English | Pred. HK | Recall
Actual English | 73269 | 630 | 99.1%
Actual HK | 199 | 25018 | 99.2%
Precision | 99.7% | 97.5% | 99.2%

(c) English vs ISO 15919
Baseline 73.4% | Pred. English | Pred. ISO | Recall
Actual English | 73576 | 323 | 99.6%
Actual ISO | 94 | 25123 | 99.6%
Precision | 99.9% | 98.7% | 99.6%

(d) English vs IAST
Baseline 71.5% | Pred. English | Pred. IAST | Recall
Actual English | 73368 | 531 | 99.3%
Actual IAST | 111 | 25106 | 99.6%
Precision | 99.8% | 97.9% | 99.4%

5 Design of the Script Detector

Differentiating English from Indian scripts, or one Indian script from another, is easy, because each script uses a different alphabet with its own Unicode range. Hence a word-level classifier with 100% accuracy is easily achieved for these cases. However, differentiating English from romanised Sanskrit or Hindi requires learning, especially when the classification must be made at the word level. For this we designed a modified Naive Bayes classifier, described next.

5.1 Modified Naive Bayes Classifier

While learning, two dictionaries are maintained. The first compiles all complete words seen. The second is an occurrence database of all possible substrings of length ≤ 10. The intuition is that certain sounds (or sequences of sounds) occur more frequently in some languages than in others.

The absolute frequency of a word is the actual number of its occurrences for a given language in the training dataset. The relative frequency of a word, on the other hand, is the fraction of its occurrences that fall in the given language, out of its occurrences across all languages under consideration. During classification, a word that has been seen in training is classified as a language if both its absolute and its relative frequency for that language exceed pre-set thresholds. We use the relative frequency metric to account for the mixed-language nature of the Wikipedia pages used as our dataset.

If the classifier encounters an unseen word, the word is broken into all possible substrings of length ≥ 2 and ≤ 10. Subsequently, using the trained substring dictionary, the probability of seeing each substring given a language, p(substr | lang), is computed over all substrings of the word.
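The classification procedure of this subsection can be sketched as follows. This is a minimal illustration, not the authors' implementation. The class and method names, the threshold values, the add-one smoothing and the use of summed log-probabilities as the per-language score are all our assumptions, since the text does not fix them. A word that was seen in training but fails the thresholds is a case the description leaves open; here it also falls back to substring scoring. The toy training words are hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import DefaultDict, Iterable, List

MIN_LEN, MAX_LEN = 2, 10  # substring lengths used for unseen words


def substrings(word: str) -> List[str]:
    '''All substrings of word with MIN_LEN <= length <= MAX_LEN.'''
    return [word[i:j]
            for i in range(len(word))
            for j in range(i + MIN_LEN, min(len(word), i + MAX_LEN) + 1)]


class ScriptDetector:
    '''Word-level language detector: word dictionary first, substrings second.'''

    def __init__(self, abs_threshold: int = 3, rel_threshold: float = 0.9):
        self.abs_threshold = abs_threshold  # hypothetical default
        self.rel_threshold = rel_threshold  # hypothetical default
        self.words: DefaultDict[str, Counter] = defaultdict(Counter)
        self.subs: DefaultDict[str, Counter] = defaultdict(Counter)

    def train(self, lang: str, tokens: Iterable[str]) -> None:
        for word in tokens:
            self.words[lang][word] += 1
            self.subs[lang].update(substrings(word))

    def _log_likelihood(self, lang: str, word: str) -> float:
        # Sum of log p(substr | lang) with add-one smoothing (an assumption).
        counts = self.subs[lang]
        denom = sum(counts.values()) + len(counts) + 1
        return sum(math.log((counts[s] + 1) / denom) for s in substrings(word))

    def classify(self, word: str) -> str:
        langs = list(self.words)
        total = sum(self.words[lang][word] for lang in langs)
        for lang in langs:
            count = self.words[lang][word]
            # Seen word: accept if absolute and relative frequency are high enough.
            if total and count >= self.abs_threshold and count / total >= self.rel_threshold:
                return lang
        # Unseen (or ambiguous) word: pick the language with the highest score.
        return max(langs, key=lambda lang: self._log_likelihood(lang, word))


det = ScriptDetector()
det.train('en', ['the', 'the', 'the', 'water', 'river', 'there'])
det.train('slp1', ['rAma', 'rAma', 'rAma', 'Darma', 'karma', 'kfzRa'])
print(det.classify('the'), det.classify('rAmaH'), det.classify('waters'))  # en slp1 en
```

Summing logarithms rather than multiplying raw probabilities avoids numerical underflow for long words with many substrings.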
This scoring is a simplified version of the Naive Bayes model for the problem at hand: the word is assigned to the language for which the metric is maximal.

5.2 Training and Test Data

Training data: One thousand random Wikipedia pages for both English and Sanskrit were used as the training data. The Sanskrit pages were converted into different romanised Sanskrit encodings (such as SLP1) using our universal encoder. We then stripped out the irrelevant HTML metadata and tags, leaving just the plain-text content.

Test data: One hundred further random pages for both languages were used as the test data.

6 Results and Future Work

We tested our word-level language detection model on 100 random Sanskrit Wikipedia pages. The pages were first converted into the 6 most popular romanisation schemes: SLP1, Velthuis, ITRANS, Harvard-Kyoto, ISO 15919 and IAST. During testing, we discovered that several English proper nouns, such as 'Bopanna' and 'Kannada', were being classified as SLP1, which lowered the recall for English. In our opinion, such a misclassification aligns with the intention of the tool, which classifies a word's origin based on the sounds prevalent in it. Indian proper nouns borrowed into English keep sounds similar to those of their Sanskrit roots, so they arguably should be classified as Sanskrit. These earlier results are presented in Table 2.

We then manually removed proper nouns from the training and test datasets; the final confusion matrices are shown in Tables 3 and 4. Each scheme has a corresponding baseline to compare our results with, shown in the top-left cell. For SLP1, this baseline was the presence of a capital letter in the middle of a word. For Velthuis, it was the presence of a full stop in the middle of a word or of doubled vowels. For ITRANS, the baseline was similar to Velthuis, with repeated 'L' and 'R' instead of the full stop.
For Harvard-Kyoto, the baseline was a capital letter in the middle of the word or a repeated 'L' or 'R'. Lastly, for ISO 15919 and IAST, the baseline was the presence within a word of a character other than the plain English letters and punctuation.

As can be seen in Tables 3 and 4, we generally attain high precision for English and high recall for the romanised words. A large proportion of the misclassified words in both the English and SLP1 cases are words of 2–3 letters. 'ati', 'ca', etc. are examples of SLP1 words misclassified as English, while 'Raj', 'Jan', 'are', etc. are examples of English words misclassified as SLP1. For such short words, the modified Naive Bayes model does not have enough information for correct classification.

We also tested our tool on a bilingual test case. We took an extract from an English commentary on the Ramayana from the Gita Supersite (Prabhakar 2005) and converted it into a mixture of SLP1 and English. We then converted the result back to Unicode Devanagari and English to compare it with the original text. As can be seen in Figure 3, the transliteration from Devanagari-English to SLP1-English is 100% accurate, because our tool exploits the difference in Unicode ranges between the two scripts.

Figure 3: Transliteration of bilingual texts. (a) Original bilingual paragraph; (b) Devanagari selectively transcribed to SLP1; (c) SLP1-English transcribed back to Devanagari-English.

Our tool is available at https://github.com/709nikhil/sanskrit-transliteration. It can be improved in several ways. The primary one is to break words down heuristically into syllables rather than substrings, providing a stronger basis for the phoneme intuition. One could also use machine learning approaches other than Naive Bayes, such as deep learning methods or conditional random fields (CRFs) (Lafferty, McCallum, and Pereira 2001).
One could also incorporate contextual history into the transliteration to deal with the incorrect classification of proper nouns, aiming at near-perfect accuracy.

Acknowledgments

We thank the anonymous referees and the editors for their meticulous comments on the manuscript, which helped significantly improve the quality of the final paper.

References

Auer, Peter. 2013. Code-switching in conversation: Language, interaction and identity. Daryaganj, Delhi, India: Routledge.
Bharati, Akshar, Vineet Chaitanya, Rajeev Sangal, and K. V. Ramakrishnamacharyulu. 1995. Natural language processing: a Paninian perspective. Delhi, India: Prentice-Hall of India, pp. 191–193.
Chopde, Avinash. 1991. Indian languages TRANSliteration (ITRANS). https://www.aczoom.com/itrans/.
Goyal, Pawan and Gérard Huet. 2013. “Completeness analysis of a Sanskrit reader”. In: Proceedings of 5th International Symposium on Sanskrit Computational Linguistics. DK Printworld (P) Ltd. IIT Bombay, India, pp. 130–171.
Huet, Gérard. 2009. “Formal structure of Sanskrit text: Requirements analysis for a mechanical Sanskrit processor”. In: Proceedings of 3rd International Symposium on Sanskrit Computational Linguistics (LNAI, vol 5402). University of Hyderabad, India, pp. 162–199.
ISO/TC-46. 2001. ISO 15919 - Transliteration of Devanagari and related Indic scripts into Latin characters. https://www.iso.org/standard/28333.html.
Jha, G. N. 2017. Sanskrit Sandhi recognizer and analyzer. http://sanskrit.jnu.ac.in/sandhi/viccheda.jsp.
Kapp, Dieter B. and Thomas Malten. 1997. Report on the Cologne Sanskrit Dictionary Project. Read at 10th International Sanskrit Conference, Bangalore.
Kulkarni, Amba. 2017. Samsadhani: A Sanskrit computational toolkit. http://sanskrit.uohyd.ac.in/scl/.
Lafferty, John, Andrew McCallum, and Fernando C. N. Pereira. 2001. “Conditional random fields: Probabilistic models for segmenting and labeling sequence data”. In: Proceedings of the 18th International Conference on Machine Learning (ICML'01). Williamstown, MA, USA, pp. 282–289.
Maheshwari, Krishna. 2011. “Features of Sanskrit”. Hindupedia.
Nair, R. Raman and L. Sulochana Devi. 2011. Sanskrit Informatics: Informatics for Sanskrit studies and research. Centre for Informatics Research and Development.
Nakatani, H. 1996. Harvard Kyoto. https://en.wikipedia.org/wiki/Harvard-Kyoto.
Prabhakar, T. V. 2005. Gita Supersite: Repository of Indian philosophical texts. https://www.gitasupersite.iitk.ac.in.
Prasad, V. K. 2015. Sanscript. http://www.learnsanskrit.org/tools/sanscript.
Rajan, Vinodh. 2001. Aksharamukha. http://www.virtualvinodh.com/wp/aksharamukha/.
Scharf, Peter. 2011. Sanskrit Library Phonetic Basic encoding scheme (SLP1). https://en.wikipedia.org/wiki/SLP1.
Scharf, Peter M. and Malcolm Donald Hyman. 2012. Linguistic issues in encoding Sanskrit. Kamla Nagar, New Delhi, India: Motilal Banarsidass Publishers.
Sohoni, Samir and Malhar Kulkarni. 2015. “Character Encoding for Computational Ashtadhyayi”. In: Proceedings of 16th World Sanskrit Conference (WSC'15): Sanskrit and the IT world. Bangkok.
Trevelyan, Charles, William Jones, and Monier Williams. 1894. International Alphabet of Sanskrit Transliteration. https://en.wikipedia.org/wiki/International_Alphabet_of_Sanskrit_Transliteration.
Vasu, Srisa Chandra. 1897. The Ashtadhyayi of Panini. Rabindra Nagar, New Delhi, India: Sahitya Akademi.
Velthuis, Frans. 1981. Velthuis. https://en.wikipedia.org/wiki/Velthuis.

Appendix A: Comparison of various Devanagari romanisations

Devanagari | Unicode | Velthuis | SLP1 | WX | ITRANS | Harvard-Kyoto | IAST | ISO 15919
अ | U+0905 | a | a | a | a | a | a | a
आ | U+0906 | aa | A | A | A/aa | A | ā | ā
इ | U+0907 | i | i | i | i | i | i | i
ई | U+0908 | ii | I | I | I/ii | I | ī | ī
उ | U+0909 | u | u | u | u | u | u | u
ऊ | U+090A | uu | U | U | U/uu | U | ū | ū
ए | U+090F | e | e | e | e | e | e | ē
ऐ | U+0910 | ai | E | E | ai | ai | ai | ai
ओ | U+0913 | o | o | o | o | o | o | ō
औ | U+0914 | au | O | O | au | au | au | au
ऋ | U+090B | .r | f | q | RRi/R^i | R | ṛ | r̥
ॠ | U+0960 | .rr | F | Q | RRI/R^I | RR | ṝ | r̥̄
ऌ | U+090C | .l | x | L | LLi/L^i | lR | ḷ | l̥
ॡ | U+0961 | .ll | X | – | LLI/L^I | lRR | ḹ | l̥̄
अं | U+0902 | .m | M | M | M/.n/.m | M | ṃ | ṁ
अः | U+0903 | .h | H | H | H | H | ḥ | ḥ
अँ | U+0901 | – | ~ | z | .N | – | m̐ | m̐
ऽ | U+093D | .a | ' | ' | .a | ' | ' | '
क | U+0915 | ka | ka | ka | ka | ka | ka | ka
ख | U+0916 | kha | Ka | Ka | kha | kha | kha | kha
ग | U+0917 | ga | ga | ga | ga | ga | ga | ga
घ | U+0918 | gha | Ga | Ga | gha | gha | gha | gha
ङ | U+0919 | ”na | Na | fa | ~Na | Ga | ṅa | ṅa
च | U+091A | ca | ca | ca | cha | ca | ca | ca
छ | U+091B | cha | Ca | Ca | Cha | cha | cha | cha
ज | U+091C | ja | ja | ja | ja | ja | ja | ja
झ | U+091D | jha | Ja | Ja | jha | jha | jha | jha
ञ | U+091E | ~na | Ya | Fa | ~na | Ja | ña | ña
ट | U+091F | .ta | wa | ta | Ta | Ta | ṭa | ṭa
ठ | U+0920 | .tha | Wa | Ta | Tha | Tha | ṭha | ṭha
ड | U+0921 | .da | qa | da | Da | Da | ḍa | ḍa
ढ | U+0922 | .dha | Qa | Da | Dha | Dha | ḍha | ḍha
ण | U+0923 | .na | Ra | Na | Na | Na | ṇa | ṇa
त | U+0924 | ta | ta | wa | ta | ta | ta | ta
थ | U+0925 | tha | Ta | Wa | tha | tha | tha | tha
द | U+0926 | da | da | xa | da | da | da | da
ध | U+0927 | dha | Da | Xa | dha | dha | dha | dha
न | U+0928 | na | na | na | na | na | na | na
प | U+092A | pa | pa | pa | pa | pa | pa | pa
फ | U+092B | pha | Pa | Pa | pha | pha | pha | pha
ब | U+092C | ba | ba | ba | ba | ba | ba | ba
भ | U+092D | bha | Ba | Ba | bha | bha | bha | bha
म | U+092E | ma | ma | ma | ma | ma | ma | ma
य | U+092F | ya | ya | ya | ya | ya | ya | ya
र | U+0930 | ra | ra | ra | ra | ra | ra | ra
ल | U+0932 | la | la | la | la | la | la | la
व | U+0935 | va | va | va | va/wa | va | va | va
श | U+0936 | ”sa | Sa | Sa | sha | za | śa | śa
ष | U+0937 | .sa | za | Ra | Sha | Sa | ṣa | ṣa
स | U+0938 | sa | sa | sa | sa | sa | sa | sa
ह | U+0939 | ha | ha | ha | ha | ha | ha | ha

Appendix B: Proposed Encoding Schema

The characters in our encoding schema are represented in 16 bits as follows: b15…b11 - b10 b9 b8 - b7 b6 b5 - b4 b3 b2 - b1 b0. The bits b15–b11 represent the script. The remaining bits are interpreted as given in Tables 5 and 6 below. A null consonant part represents a pure vowel, whereas a null vowel part represents a consonant without a vowel sound. For example, अ is represented as 00000-111-111-000-00.
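The examples in this appendix presuppose a particular tokenisation. A conjunct such as क्ष becomes a vowelless क् followed by ष with its inherent अ. That splitting can be sketched as follows. The function below is a hypothetical illustration covering only precomposed Devanagari letters; a combining nukta (U+093C) is not handled. Each unit is a (consonant, vowel) pair, with None standing for the null part. Following Table 5, marks such as anusvara and visarga are placed in the consonant slot.

```python
from typing import List, Optional, Tuple

VIRAMA = 0x094D
INHERENT_A = 0x0905
# Dependent vowel signs mapped to the independent vowel letters they stand for.
SIGN_TO_VOWEL = {0x093E: 0x0906, 0x093F: 0x0907, 0x0940: 0x0908,
                 0x0941: 0x0909, 0x0942: 0x090A, 0x0943: 0x090B,
                 0x0944: 0x0960, 0x0962: 0x090C, 0x0963: 0x0961,
                 0x0947: 0x090F, 0x0948: 0x0910, 0x094B: 0x0913,
                 0x094C: 0x0914}

# (consonant part, vowel part); None stands for the null part.
Unit = Tuple[Optional[str], Optional[str]]


def is_consonant(cp: int) -> bool:
    return 0x0915 <= cp <= 0x0939 or 0x0958 <= cp <= 0x095F


def is_vowel(cp: int) -> bool:
    return 0x0904 <= cp <= 0x0914 or cp in (0x0960, 0x0961)


def tokenise(word: str) -> List[Unit]:
    '''Split precomposed Devanagari into consonant+vowel units.'''
    units: List[Unit] = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = ord(word[i + 1]) if i + 1 < len(word) else None
        if is_consonant(ord(ch)):
            if nxt == VIRAMA:                 # dead consonant: null vowel part
                units.append((ch, None))
                i += 2
            elif nxt in SIGN_TO_VOWEL:        # explicit vowel sign
                units.append((ch, chr(SIGN_TO_VOWEL[nxt])))
                i += 2
            else:                             # inherent vowel a
                units.append((ch, chr(INHERENT_A)))
                i += 1
        elif is_vowel(ord(ch)):               # independent vowel: null consonant
            units.append((None, ch))
            i += 1
        else:                                 # marks such as anusvara, visarga
            units.append((ch, None))
            i += 1
    return units


print(tokenise('क्ष'))  # [('क', None), ('ष', 'अ')]
```

Each (consonant, vowel) pair would then be looked up in Tables 5 and 6 to obtain its 16-bit code, which is the universalisation step of Section 4.2.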
The symbol क्ष is broken up as (क् + ष = क् + ष् + अ). It is therefore represented as two 16-bit units: 00000-000-000-111-11 representing (क्) and 00000-010-101-000-00 representing (ष).

Table 5: Mapping for the 6 consonant bits (rows: b10 b9 b8; columns: b7 b6 b5, from 000 to 111; entries listed in column order, empty cells omitted)

000 | क् ख् ग् घ् ह् ङ्
001 | च् छ् ज् झ् श् ञ् य्
010 | ट् ठ् ड् ढ् ष् ण् र्
011 | त् थ् द् ध् स् न् ल्
100 | प् फ् ब् भ् म् व्
101 | क़् ख़् ग़् ज़् ड़् ढ़् फ़्
110 | Udatta, Anudatta, ◌ः, ◌ं, ◌ँ, ऽ
111 | Latin, Punc., Num., Vaid., Null

Table 6: Mapping for the 5 vowel bits (rows: b4 b3 b2; columns: b1 b0, from 00 to 11; entries listed in column order, empty cells omitted)

000 | अ आ
001 | इ ई ए
010 | ऋ ॠ ऐ
011 | ऌ ॡ ओ
100 | उ ऊ औ

Rows 101, 110 and 111 contain no vowels; the क् example above uses 111-11 for a null vowel part.

Modeling the Phonology of Consonant Duplication and Allied Changes in the Recitation of Tamil Taittirīyaka-s

Balasubramanian Ramakrishnan

Abstract: The phonetics of the Vedas are described by the prātiśākhya and śikṣā texts. Each Veda has its own prātiśākhya as well as specific śikṣā texts. There is also a Pāṇinīya śikṣā, which sets out general rules applicable to all Veda-s. There are similarities between the various prātiśākhya and śikṣā texts, but there also tend to be important differences, leading to differences in the modes of chanting the Veda-s. Some differences are obvious, but a significant percentage can be detected only by the trained ear, and consonant duplication falls in the latter category. Reciters of classical Sanskrit verses, even trained paṇḍita-s, do not always faithfully observe consonant duplication. It is, however, faithfully preserved in Vedic recitation, especially by the Tamil Taittirīyaka-s (TT), largely in accordance with the rules of the Taittirīya-Yajuḥ Prātiśākhyam (TYP).
Additional rules are found in various śikṣā texts, and some rules are known only from traditional practice. It is important to study the printed texts of the TTs, which use the Grantha script, since they offer a very concise representation of some unique duplication rules. Finally, analysis of actual recitation by experts, i.e., field study, is also required to understand the phonetic rules completely.

The aims of this paper can be summarized as follows:
1. describe the duplication rules among TTs, point out where they deviate from the TYP, and compare and contrast them with other prātiśākhya-s and Pāṇini;
2. develop an algebraic formulation of TT duplication rules;
3. develop a Non-Deterministic Finite State Transducer (ND-FST) model from the algebraic formulation; and finally
4. implement the model in Perl as a tool to study TT duplication rules.

182 B Ramakrishnan

1 Introduction

Consonant duplication is one of the truly arcane topics in Sanskrit. This paper is about consonant duplication among TTs, a phonological process triggered by conjunct consonants (saṃyutākṣara). Although it should be clear, it is worth noting that this differs from the reduplication of syllables in verbal forms, e.g., in third-class verbs or the reduplicated aorist, which are morphological processes. Consonant duplication takes place when specific sequences of vowels and consonants (specified later) occur. The duplicated sound is always a full consonant, never a vowel, and the phrase “consonant duplication” makes this explicit. A vowel can be duplicated under the long (dīrgha) svarita accent, but accentuation in the Veda is outside the scope of this paper. The inventories of vowels and consonants are mostly the same in classical and Vedic Sanskrit.
However, the inventories differ in a few respects, even among the prātiśākhya-s of different Veda-s. This paper will not discuss these differences in detail; a good resource is Whitney (1871). Nor will it examine the phonetics of the standard Sanskrit vowels and consonants, a well-researched topic: a good source for phonetic analysis, including many Vedic forms, is Allen (1953), and many more references can be found in Scharfe (1973). We will, however, discuss in the relevant sections the phonetics of a few specific vowel and consonant forms that are important in and peculiar to TT recitation, especially during duplication, and that differ from classical Sanskrit.

One of the first scholars to study duplication was Whitney, in his path-breaking studies of the prātiśākhya-s of the Atharva and Taittirīya Veda-s (Whitney 1863, 1871). Whitney had specific reservations about the value of the detailed discussions of duplication in the prātiśākhya-s, which he expressed rather forcefully, first in his translation of the Atharva prātiśākhya¹ (Whitney 1863):

“The subject of the duplicated pronunciation of consonants, or of the varṇakrama, as it is sometimes called, is one of the most peculiar in the whole phonetical science of the Hindus.
It is also the one, to my apprehension, which exhibits most strikingly their characteristic tendency to arbitrary and artificial theorizing; I have not succeeded in discovering the foundation of fact upon which their superstructure of rules is based, or explaining to myself what actual phonetic phenomena, liable to occur in a natural, or even a strained, mode of utterance, they supposed themselves to have noted, and endeavored thus to reduce to systematic form. The varṇakrama, however, forms a not inconspicuous part of the phonetic system of all the Prātiśākhyas, and is even presented by Pāṇini (viii. 4. 46-52), although the latter mercifully allows us our option as to whether we will or will not observe its rules.”²

¹ This can be found under his explanation of 3.28, page 470 (Whitney 1863).
² He also states on page 313 of his translation of the TYP (Whitney 1871), “Thus is brought to an end the tedious subject of duplication, the physical foundation of which is of the obscurest, although the pains with which the Hindu śākhinaḥ have elaborated it, and the earnestness with which they assert their discordant views respecting it, prove that it had for them a real, or what seemed like a real, value,” once again clearly expressing his reservations about the practical utility of the discussions on duplication.

Tamil Taittirīyaka-s 183

Whitney rightly raises two questions: 1) what the phonetic basis of duplication is, and 2) whether it really makes a difference in Vedic chanting. In this paper, I will concentrate mostly on the latter question. The concise answer is that duplication does matter quite a bit and is reflected in the recitation of paṇḍita-s trained in the orthodox manner. The first question is not the focus of this paper; however, in a few places I will point out some possible phonetic reasons for duplication.

Opinion of the later śikṣā-s

First, we can look at the texts written after the TYP, i.e., the śikṣā texts, and see that duplication is an important topic in these texts as well. The sarvasammata śikṣā (Finke 1886) actually begins with the invocatory verse

kṛpāluṃ varadaṃ devaṃ praṇipatya gajānanam |
dvitvādīnāṃ pravakṣyāmi lakṣaṇaṃ sarvasammatam ||

Having prostrated to the compassionate and wish-granting elephant-faced God,
I will expound the phonetics of duplication, etc., (which are) agreeable to all.

Since the sarvasammata śikṣā singles out duplication and is in fact largely about this topic, the later textual tradition evidently also held consonant duplication to be an important topic for reciters of the Veda. Furthermore, the śikṣā is somewhat ambitiously titled sarvasammata, i.e., agreeable to all, which implies that a variety of opinions, especially on duplication, existed at the time. This is very clear from the TYP itself, where several contradictory opinions are stated, sometimes attributed to a specific authority. Whitney had also pointed out very early on that while duplication was a topic treated by all prātiśākhya-s, the TYP is unique in presenting the contradictory opinions of many different authorities (Whitney 1871). The Vyāsa śikṣā (Ācārya Śrīpaṭṭābhirāmaśāstri 1976) and especially the sarvasammata śikṣā clearly try to present a unified theory of duplication by leaving out contradictory opinions and sometimes adding rules not found in the TYP.
The sarvasammata śikṣā ends with the cautionary verse³

śīkṣā ca prātiśākhyaṃ ca virudhyete parasparam |
śīkṣaiva durbaletyāhuḥ siṃhasyaiva mṛgī yathā ||

(If there is any) mutual contradiction between the śikṣā and prātiśākhyam,
They declare that the śikṣā is indeed the weaker, like the deer (in the presence) of a lion.

³ It is interesting that the text uses the word śīkṣā instead of śikṣā, as does the Taittirīya Upaniṣat, thus placing the author squarely within the Kṛṣṇa-yajur-veda tradition.

Clearly the author of the sarvasammata śikṣā was aware that at least some of the śikṣā-s hold opinions contradictory to the TYP. It should be noted, however, that the same text adds some rules not found in the TYP and also extends the scope of some rules that are found in it. This was also noted by Whitney in his treatment of the svarabhakti in the TYP, where he remarks that the commentator seemed to rely more on his śikṣā text than on the TYP itself (Whitney 1871). In this regard, Scharfe’s comment on the attitude of the Pāṇinīya-s is very perceptive, applies in this case as well, and deserves to be quoted in full (Scharfe 1973):

“The aṣṭādhyāyī can be compared to a code of law which is subject to legal interpretation when cases that were not or could not be foreseen by the lawmaker. The courts need a consistent and workable application even to such cases. Lawyers are used to obtaining this application by extrapolating principles embodied in the code which is presumed to be comprehensive and consistent to the minute technical details; seemingly redundant features must have their significance. If these extrapolations lead to opposing conclusions this contradiction must be resolved. As a last recourse, the law may be amended, The Pāṇinīya-s are like such lawyers and we miss the point when we castigate them for reading later theories into the original texts.”

Methodology of the Orthodox Practitioners

The methodology so aptly described by Scharfe can be expressed in general as a list of Boolean logic rules, as given below, and it also applies to the phonology of duplication:
• If condition V₁, then execute Rule 1.
• If conditions V₁ AND V₂, then execute Rule 2.
• ···
• If conditions V₁ AND V₂ AND ··· AND Vₖ, then execute Rule k.
A series of rules like the above can state the general, fundamental rule (vidhi), denoted by Rule 1, then modify it by an exception, followed by an exception to the exception, and so on. Note that the TYP itself follows such a formulation, first giving a general rule and then adding elements to the list that either discard Rule 1 or modify how it applies in certain cases. For example, Rule 2 could be “do not apply Rule 1” when conditions V₁ and V₂ hold together, thus providing an exception.
The śikṣā-s follow the lead of the TYP and add lawyerly emendations that either modify or discard rules within the TYP. In this manner, they can claim to be completely consistent with the TYP and yet offer modifications and emendations. Note that the question of why the TYP does not list all the rules followed by the orthodox tradition, or whether some of them were later innovations, does not arise within this formulation of the orthodox tradition. This is a sensible procedure for interpreting the TYP and the śikṣā texts as a unified whole. Such a unified reading is also important for the orthodox tradition because, while in some cases it is clear what the TYP considers an accepted doctrine, in several other cases the accepted doctrine is known only from the commentary, as noted by Whitney (1871).⁴ Thus, if the orthodox tradition accepts both the TYP and the śikṣā texts to begin with, the only logical route is the lawyerly one.

⁴ It can of course be hypothesized that the TYP represents the “original” rules and that the modifications and additions found in the śikṣā-s are later accretions. But this is not correct, because the TYP records a multitude of opinions, especially about duplication, and is clearly only one strand among the different schools of thought. There were evidently multiple schools of thought regarding certain phonetic and phonological processes, and the TYP is the first attempt to unify the different strands. It also seems unlikely that there was a single original way of recitation, unless it is assumed that the Taittirīya school was originally limited to a very small geographical area, which is not likely. Nor is a historical dating of different doctrines likely to be obtained by mere textual analysis.

Regarding the orthodox method of learning Vedic recitation, a recent study of the Vedic tradition in Maharashtra by Larios (2017) pointed out that the “vedamūrti-s”, i.e., the orthodox Vedic chanters, have rarely even seen a śikṣā text,⁵ and I have observed the same in the Tamil tradition.⁶ However, the phonological peculiarities of the TTs are very efficiently encoded in their texts published in the Grantha script (Nārāyaṇa Śāstrī 1930, 1931, 1935, Undated[a], [b]; Vaidyanāthasāstri 1905).

⁵ On page 5 (Larios 2017): “Notably, none of the brāhmaṇas I came in contact with had memorized a śikṣā or prātiśākhya text at the time of our meeting. Many of those I interviewed during my fieldwork did not possess a copy of such a text, and many others had never seen a śikṣā or prātiśākhya text in their lives. Yet, as will be shown below, the rules concerning the Vedic recitation are mainly learned through the system of oral transmission, and this includes the pronunciation rules stipulated in the śikṣā or prātiśākhya texts of each Vedic branch.”
⁶ In Staal (1961), TT recitation is referred to as Tamil Iyer recitation. I prefer the term TTs because the Iyers, Iyengars and Mādhvas settled in Tamil Nadu all follow the same schools, and the phonetics and phonology are the same across these subsects. The Iyengars usually, but not always, learn the Drāviḍa pāṭha, but this affects only the text of a few praśna-s of the Taittirīya Āraṇyaka and not the phonetics or phonology. I have observed that the paṇḍita-s from the Vedic school conducted by one of the four Advaita āmnāya maṭha-s, the Dakṣiṇāmnāya Śṛṅgerī Śāradā Pīṭham, are very similar to TTs in their recitation, but I do not know whether this is the case with all reciters from the Karnataka region. I also have not had the opportunity to interact with paṇḍita-s from the Andhra region, but the few recordings I have heard show at least some differences from the TTs. The Nambudiri recitation is of course quite different from that of the TTs. Comparing duplication among the different Southern schools, including the Nambudiris, would be an interesting field study.

This encoding should not be confused with an efficient, one-to-one mapping of the syllables actually chanted, e.g., of the kind described by Scharf and Hyman (2009) for the computer-based processing or representation of general Sanskrit texts. Rather, it is a concise encoding of only the most important śikṣā rules, which makes complete sense only to people trained within the oral tradition. It should be noted that if the texts were printed exactly as recited, with all duplicated consonants explicitly specified, there would be a prolixity of consonants.
This would have had a manifold effect: it would have significantly complicated the task of the scribe, increased the probability of error propagation, and hindered comprehension of the text. The printed texts serve a dual purpose, aiding both recitation and comprehension, and this fact was likely not lost on the TTs who started the practice of using the Grantha script. On the other hand, unless some of the more obscure rules are actually represented in the printed text, the result might be wrong recitation. The Grantha texts offer an admirable combination: for people trained within the tradition, they precisely encode arcane duplication rules, yet avoid prolixity in printing out consonants. The Grantha texts thus make the TYP and śikṣā texts largely unnecessary for an orthodox reciter, and an ability to read Grantha is a sine qua non for understanding consonant duplication among TTs.

The Grantha texts achieve this by writing any conjunct consonant from top to bottom, even if it consists of more than two consonants, or by inventing new characters for frequently used complex conjunct consonants. Standalone consonants signal something special, but the meaning is context-dependent, and thus the encoding is not one-to-one. For example, a standalone ‘n’ within a sentence can stand for different pronunciations, depending on the vowel-consonant cluster surrounding it (as will be seen in a subsequent section). Note again that some of these rules and exceptions are found neither in the TYP nor in the śikṣā texts and are known only from the orthodox tradition; however, they are faithfully reflected in the Grantha texts. In contrast, Devanāgarī typesets typically lack such complicated conjunct consonants, and the complex phonological rules are largely ignored and left unencoded. Another issue is that some conjunct consonants which occur in TT recitation due to phonological changes, e.g., śna, which is transformed into śña except in the Kāṭhaka borrowings, do not occur at all in classical Sanskrit and are thus not available in Devanāgarī typesets as standalone conjunct consonants.⁷ An important effort has been made to rectify these defects and reflect the TYP rules in the Devanāgarī script, but it still has some shortfalls compared to the original Grantha texts (Rā Kṛṣṇamūrti śāstri and Rā Gaṇeśvaradrāviḍaḥ 2003a,b). However, this series has a very learned introduction covering some, though not all, of the key phonological aspects of TT recitation. Furthermore, it has very few printing errors and uses a very pleasing font, and is thus well worth using as a reference text.

⁷ The Vaidikavardhini press in Kumbhakonam published these Grantha texts and many excellent prayoga texts, but unfortunately went out of business many decades ago. These books are hard to obtain, but facsimile copies of their Taittirīya corpus are available among the paṇḍita circles in Chennai. One of my teachers, Śrī Śrīkaṇṭha Ācāryāḥ, showed me his Kannada text, and it seemed to follow the same principle as the Grantha texts with regard to conjunct consonants and standalone consonants. However, I am not familiar with the Kannada script and did not perform a careful analysis. A reviewer also pointed out that the Telugu and Kannada texts follow the same principle as the Grantha texts. It would be interesting to compare texts in the Kannada and Telugu scripts with the Grantha texts popular in Tamil Nadu. Nor is it implied that Devanāgarī typesets lack sophisticated conjunct consonant representations; they are simply not as clear as the Grantha script, which follows a binary rule: conjunct consonants are always represented as a single top-to-bottom cluster, and standalone consonants represent exceptions to the duplication rules. This could certainly be corrected by introducing new conjunct consonants into existing Devanāgarī fonts.

The mere ability to read the Grantha texts is not enough; on its own, it would actually hinder the proper evaluation of duplication among TTs. I have personally interacted with a number of kramānta-svādhyāyinaḥ and have also learned to recite a good portion of the Taittirīya śākhā, which has been an invaluable help in understanding the duplication rules. Field study would also settle any question about whether duplication makes a difference in Vedic recitation. In my experience with laypeople and experts, most (but not all) laypeople who learn to recite the Vedas, especially as adults and from texts printed in the Devanāgarī script, mistake textual familiarity and seemingly fluent chanting for correct Vedic recitation. Relying on texts rather than listening keenly to expert recitation leaves laypeople, even those who listen to or practice Vedic chanting regularly, unaware of the phonological sophistication of Vedic chanting, which cannot be completely captured in textual form. When the correct consonants are not duplicated, or the wrong ones are, the expert recognizes this very quickly. This answers Whitney’s question of whether duplication makes a difference in Vedic recitation: it does to the expert, although laypeople may not perceive the subtleties. Finally, while actual field study is important and irreplaceable, a good source of authentic TT recitation is the set of audio recordings released by an organization called Vediclinks (Sarma et al. 2004), which unfortunately now seems to be defunct.⁸

⁸ This is available at https://archive.org/details/VedicLinks-SriKrishnaYajurVedam-TaitiriyaSamhita and I would like to thank Mr. N. E. Venkateswaran for bringing this to my attention.

Fundamental Rule and Notation

Consonant duplication is specified in fundamentally the same way in different texts, and the basic rule (Condition V₁ in the previous section) is common to all the prātiśākhya-s as well as Pāṇini:
• Taittirīya Prātiśākhyam 14.1 (Whitney 1871): svarapūrvaṃ vyañjanaṃ dvivarṇaṃ vyañjanaparam.
• Śukla-yajuḥ Prātiśākhyam 4.99 (Rastogi 1967): svarāt saṃyogādir dvirucyate sarvatra.
• Ṛk Prātiśākhyam 6.1 (Sastri 1931): svarānusvāropahito dvirucyate saṃyogādiḥ sa krame ’vikrame san.
• Atharva Prātiśākhyam 3.28 (Whitney 1863): saṃyogādi svarāt.
• Pāṇini 8.4.47 (Vasu 1898): anaci ca.
In this paper we will use the terminology of the TYP to refer to vowels, consonants, or groups of consonants. The following abbreviations will be used for a concise description of the rules:
1. k - all the consonants (vyañjana-s).
2. Xᵢ,ⱼ - the mutes (sparśaḥ); the indices i and j stand for the row and column in a 5 × 5 matrix with the ka, ca, ṭa, ta and pa series (varga) as the rows. For example, X₁,₄ is gha and X₅,₅ is ma.
3. V - the semivowels (antasthaḥ) ya, ra, la, and va.
4. j - the sibilants (ūṣman) śa, ṣa, sa, ha, jihvāmūlīya, and upadhmānīya.
5. ja - the four sibilants śa, ṣa, sa, and ha.
6. h - vowels and the svarabhakti-s (svara).⁹
7. hx and hh - long (dīrgha) and short (hrasva) vowels, respectively.
Specific consonants or vowels will also be used where the rules require specificity.

⁹ The svarabhakti is the sound the phonemes ‘r’ and ‘l’ attain in certain situations and will be explained in detail later.

The basic duplication rule of TYP 14.1, svara-pūrvaṃ vyañjanaṃ dvi-varṇaṃ vyañjana-param, requires a sequence of more than one consonant after a vowel. Since two or more consonants cannot occur after a vowel without a following vowel, except at the end of a sentence, which TYP 14.15 exempts from duplication, the rule can be concisely expressed by the algebraic expression h₁ k₁ ⋯ kₖ h₂, where k ≥ 2. The subscripts number the vowels and consonants in the required sequence.
The basic transformational rule is thus given by the following equation:

h₁ k₁ ⋯ kₖ h₂ ↦ h₁ k₁ k₁ ⋯ kₖ h₂   (1)

The rule applies whether the cluster occurs within a word or across two words within a sentence (vākya). In printed texts, there is white space between two words when they do not coalesce through sandhi, but this spacing is merely for the convenience of comprehension and does not affect pronunciation. There are, however, specific exceptions to duplication when certain consonants occur at the end of a word (pada). These word-end rules are found in the śikṣā texts and in the orthodox tradition, not in the TYP, and it is convenient to treat them separately. Unless otherwise specified, white space in the text is ignored in all these rules.

The next section gives a detailed description of all the duplication rules and exceptions in various situations, with examples to illustrate each rule. I use the abbreviations TS, TB, and TA for the Taittirīya Saṃhitā, Brāhmaṇa and Āraṇyaka, respectively. In some examples there may be multiple duplicated consonants; only the consonant that illustrates the rule in question is shown explicitly. Another important fact is that the duplication rules are largely independent of the accents udātta, anudātta and svarita, except in one case, so accent marks are not given in the illustrative examples unless the duplication rule demands it.
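Read as a rewrite over a phoneme sequence, the basic transformational rule can be sketched as follows. This is a hypothetical Python illustration, not the Perl implementation mentioned in the abstract. It assumes that the input is already segmented into romanized phonemes, with aspirates such as 'dh' as single tokens, and it uses a small assumed vowel set. It applies only the fundamental rule, with no exceptions, accents or word-end rules. A cluster with no following vowel, which is the sentence-end case exempted by TYP 14.15, is left unchanged.

```python
# Assumed, deliberately small vowel inventory (romanized, one token each).
VOWELS = {'a', 'ā', 'i', 'ī', 'u', 'ū', 'ṛ', 'ṝ', 'e', 'ai', 'o', 'au'}


def duplicate_fundamental(phonemes):
    '''Apply h1 k1 ... kk h2 -> h1 k1 k1 ... kk h2 (k >= 2) to a token list.'''
    out = []
    i, n = 0, len(phonemes)
    while i < n:
        token = phonemes[i]
        out.append(token)
        if token in VOWELS:
            # Collect the consonant cluster that follows this vowel.
            j = i + 1
            while j < n and phonemes[j] not in VOWELS:
                j += 1
            cluster = phonemes[i + 1:j]
            # Duplicate k1 only for a cluster of two or more consonants that
            # is followed by another vowel (j < n); a final cluster is left alone.
            if len(cluster) >= 2 and j < n:
                out.append(cluster[0])
            out.extend(cluster)
            i = j
        else:
            i += 1
    return out


print(''.join(duplicate_fundamental(['r', 'u', 'd', 'r', 'a'])))
print(''.join(duplicate_fundamental(['j', 'a', 'g', 'dh', 'v', 'ā'])))
```

The two calls print ruddra and jaggdhvā. Because the sketch copies k₁ verbatim, it does not perform the aspirate substitution discussed in the next section. For instance, āgnīdhram would come out with dh doubled rather than as āgnīddhram.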
The rules depend heavily on k₁, and the duplication rules are therefore enumerated by the class to which k₁ belongs.

2 Duplication Rules

k₁ is a mute

This case accounts for the bulk of duplications, since the 25 mutes form the majority of all consonants.
• k₂ is a mute: The mute is duplicated, and if the mute is aspirated, the corresponding unaspirated mute is used instead, as per the rule
dvitīya-caturthayostu vyañjana-uttarayoḥ pūrvaḥ |
Of the 2nd and 4th after which a consonant (occurs), the previous (consonant in the series, i.e., the 1st or 3rd, is duplicated).
There are important exceptions. If k₁ and k₂ belong to the same series, there is no duplication, unless k₂ is the fifth consonant of the series and k₁ is not. For example, if k₁ and k₂ are the phonemes ‘t’ and ‘th’ respectively, there is no duplication; if k₁ and k₂ are ‘dh’ and ‘n’, however, there is duplication. This is stated by TYP 14.13, an exception to the general rule, and 14.14, an exception to 14.13:¹⁰
savarṇa-savargīya-paraḥ |
The (consonant which is) of the same quality and series (as the) later (consonant is not duplicated).
nānuttama uttamaparaḥ |
(But) not in the case of a consonant (which is) not the last (in the series which has) the last (in the series following it).
Some examples of duplication in this case are
jyok ca → jyokkca (TS 1.8.5.3)
ahiṃ budhniyam → ahiṃ buddhniyam (TS 1.8.14.2)
jagdhvā → jaggdhvā (TS 2.2.6.2)
pāpmānam → pāppmānam (TS 2.1.10.3)
• k₂ is a semivowel: The fundamental rule applies, i.e., k₁ is duplicated.
āpyāyamānam → āppyāyamānam (TS 2.3.5.3)
apākrāmat → apākkrāmat (TS 2.3.7.1)
śukle → śukkle (TS 5.3.1.4)
āgnīdhram → āgnīddhram (TS 4.7.8.1)
• k₂ is a sibilant: Only the first mute in each series occurs before a sibilant. As per TYP 1.12,
prathama ūṣmaparo dvitīyam |
(The) first (in the series which has a) sibilant after it (becomes) the second in the series.
After this, the normal duplication rule applies. Some examples will clarify this situation:
ṛksāme → ṛkhsāme → ṛkkhsāme (TS 6.1.3.1)
hṛtsu → hṛthsu → hṛtthsu (TS 1.2.8.1)
dipsanta → diphsanta → dipphsanta (TS 1.2.14.5)
tat ṣoḍaśī → tathṣoḍaśī → tatthṣoḍaśī (TS 6.6.11.1)

¹⁰ The Vyāsa śikṣā 363 also gives the two rules concisely as vargīyānanuttarordhve halsavarṇottara eva sa. The same two rules are stated in the Śukla-yajuḥ Prātiśākhyam (SYP) 4.115 as well, by the exception svavargīye sānuttame.

We now briefly examine the phonetic basis of duplication, about which Whitney (1871) raised the questions quoted in the previous section. TYP 2.32 and 2.33 are key sūtra-s in this analysis:
yadupasaṃharati tatkāraṇam | 2.32
anyeṣāṃ tu yatra sparśanaṃ tatsthānam | 2.33
What comes close (the tip of the tongue), that (is the) cause (of the vowels).¹¹
The place of the production of others is where contact (with the tongue occurs).

¹¹ That the vowels are referred to in this sūtra is inferred from the previous sūtra-s.

The key issue is clear articulation and even flow, especially in Vedic recitation. Take the simple case of a word like rudra contrasted with the allied verb roditi. The former has a conjunct consonant, while the latter does not. When a consonant is followed by a vowel, the tongue is first “close” to the organ of production, and when the vowel sound is generated, it is not in contact. The movement of the tongue remains fluid, and the sounds can be articulated clearly and with an even flow, as long as only consonant-vowel pairs are repeated, as in roditi, which has three syllables, ro, di and ti. Now take the word rudra, consisting of two syllables, ru and dra. The first consonant-vowel pair, ru, lends itself to clear articulation and even flow. The conjunct consonant ‘dra’ requires the tip of the tongue first to be placed at the back of the teeth; the second phoneme, ‘r’, requires the tongue to be moved a little back towards the throat; and the ‘a’ phoneme requires no contact.
This can happen easily, without a pause and in a fluid manner, if the ‘d’ is duplicated, i.e., when the tongue makes contact with the back of the teeth.

It should be noted here that there is a strand of thought according to which there should be no duplication when C₁ and C₂ are both mutes. This is recorded in TYP 14.27, sparśa-sparśa-paraḥ, which is one of the exceptions to duplication. According to the commentary, this doctrine is not approved by the TYP (Whitney 1871). It is also clear from the commentary on the Vyāsa śikṣā¹², and from the examples of duplication cited in that commentary, that this rule is not approved by that text either (Ācārya Śrīpaṭṭābhirāmaśāstri 1976). However, this is indeed a sound doctrine, except when C₂ is the fifth mute in a series. Listening to actual recitations, the duplication of d in rudra is extremely clear, whereas the duplication of the mute g in vāgdevī, supposed to be vāggdevī, is not as clear. We may even say that some duplicated syllables are more duplicated than others! The phonetic basis of duplication certainly deserves a full treatment of its own.

C₁ is the phoneme ‘y’
As C₁, ‘y’ usually occurs only in a conjunct with itself, e.g., rayyai in the TS. In the TB, there is a single occurrence, tāñchaṃ yvanta (TB 1.5.9.3).
Since this is after an anusvāra, it does not undergo duplication. In the TA, there are two instances of the word vāyvaśvā (TA 1.1.2 and TA 1.21.1), which undergo duplication as per the fundamental rule:
vāyvaśvā → vāyyvaśvā

¹² The Vyāsa śikṣā itself omits this doctrine.

C₁ is the phoneme ‘l’
lavakārapūrvasparśaśca pauṣkarasādeḥ | 14.2
sparśa evaikeṣāmācāryāṇām | 14.3
(That) the mute preceded by the ‘l’ or ‘v’ phonemes (is duplicated is the opinion) of Pauṣkarasādi. 14.2
(That) the mute alone (is duplicated is the opinion) of some teachers. 14.3
The possibilities for C₂ when C₁ = l are as follows:
• C₂ is a mute: Only the mute is duplicated, and if the mute is aspirated, then the corresponding unaspirated mute is duplicated.
yajñena kalpatām → yajñena kalppatām (TS 4.5.7.2)
cāpagalbhāya ca → cāpagalbbhāya ca (TS 4.5.6.1)
This should be contrasted with the recitation of the Nambudiris. Kunhan Raja first pointed out that the Nambudiris pronounce the phoneme ‘t’ as ‘l’ in certain contexts, both in Vedic and in ordinary Sanskrit (Raja 1937). In Vedic recitation, the Nambudiris duplicate the ‘l’ when it has been substituted for a ‘t’, but follow the TYP rules otherwise.
For example, vatsa, which is pronounced valsa, would actually be said vallsa in Vedic recitation, while kalpayati would be pronounced kalppayati.
• C₂ is the phoneme ‘y’: The fundamental rule applies, i.e., the ‘l’ is duplicated, e.g.,
yā kalyāṇī bahurūpā → yā kallyāṇī bahurūpā (TS 7.1.5.7)
• C₂ is the phoneme ‘v’: The sarvasammata śikṣā (44) says:
lakāraśca vakāraśca saṃyoge svarito yadi |
saṃyuktau tu tadā jñeyāvasaṃyuktau tadanyathā ||
Where (there is) a svarita in the combination (of the) ‘l’ and ‘v’ phonemes,
There (is) a combination (of the two); in other cases they are known to be separate.
In TT recitation, a non-conjunct ‘l’ occurs in only four instances of this combination, all involving the word bailvaḥ or bilvaḥ, and nowhere else that this combination occurs. This distinction, however, is not specified by the sarvasammata śikṣā:
bailvo yūpo̍ → bail÷vo yūpo̍ (TS 2.1.8.1)
bilva̍ uda̍tiṣṭhat → billva̍ uda̍tiṣṭhat (TS 2.1.8.2)
bailvo vā̍ → bail÷vo vā̍ (TB 3.8.19.1)
ṣaḍbailvā → ṣaḍbail÷vā (TB 3.8.20.1)
khalvā̍huḥ → khallvā̍huḥ (TS 2.5.1.6)
khalvaindramityeva → khallvaindramityeva (TS 2.5.3.7)
Note that in the last example (TS 2.5.3.7), the absence of a svarita does not matter, and the ‘l’ does not stand separately. This is correctly reflected in the printed Grantha texts. It is not clear whether the author of the sarvasammata śikṣā belonged to a tradition in which the ‘l’ in TS 2.5.3.7 would be non-conjunct. In practice, there is also a very slight pause after the non-conjunct ‘l’, which is pronounced as ‘l’ followed by approximately the last quarter of the ṛ phoneme.¹³ Rā Kṛṣṇamūrti Śāstri and Rā Gaṇeśvaradrāviḍaḥ (2003a) introduced a notation to make the non-conjunct nature clear to readers: they print a ÷ sign after the ‘l’ to show that it is not the usual phoneme ‘l’. In their previous book (Rā Kṛṣṇamūrti Śāstri and Rā Gaṇeśvaradrāviḍaḥ 2003b), they followed the principle of the Grantha texts by printing a free-standing consonant, in this case an ‘l’, which, they note, was confusing to people used to Devanāgarī typesetting. Note again that a free-standing consonant in the Grantha typeset texts has a context-dependent meaning, and the fact that the ‘l’ in this case is non-conjunct is clear to practitioners. I have adopted the clarifying notation introduced in Rā Kṛṣṇamūrti Śāstri and Rā Gaṇeśvaradrāviḍaḥ (2003a) in this paper.
• C₂ is a sibilant: The Vyāsa śikṣā 381 says:
svarordhvoṣmaṇi rephasya lasyāpi svarabhaktitā
The ‘r’ or ‘l’ phoneme attains svarabhakti when (occurring before) a sibilant that is followed by a vowel.
The ‘l’ before an ‘h’ is called karviṇī, and before ‘ś’, ‘ṣ’ or ‘s’ it is called hāritā, as per the Vyāsa śikṣā 385-386. In practical chanting, however, the two svarabhakti-s are pronounced in a similar fashion. It may be noted here that the TTs pronounce these svarabhakti-s very much like an ‘l’ followed by the kuṟṟiyalukaram sound of Tamil (John Lazarus 1878).¹⁴ In the Grantha texts these svarabhakti-s are indicated by a free-standing ‘l’, and the fact that it is a svarabhakti and not a non-conjunct ‘l’ is clear only from the context. Some examples are:
sahasravalśāḥ → sahasraval÷śāḥ (TS 1.1.2.1)
malhām → mal÷hām (TS 1.8.19.1)

¹³ It should be noted that the TTs pronounce all non-conjunct or final consonants except ‘m’ in this manner in their chanting, Vedic or otherwise, unlike native Hindi speakers.

C₁ is the phoneme ‘v’
For ‘v’ as the first consonant in the sequence, there are two possibilities for C₂:
• C₂ is a mute: In this case C₂ can only be ‘n’ or ‘ṇ’.
manīṣāmoṣiṣṭhadāvne → manīṣāmoṣiṣṭhadāv÷nne (TS 1.6.12.3)
dadhikrāvṇo akāriṣam → dadhikrāv÷ṇṇo akāriṣam (TS 1.5.11.4)
• C₂ is the phoneme ‘y’, ‘r’ or ‘l’: The fundamental rule applies, i.e., the ‘v’ is duplicated, e.g.,
divyā āpo → divvyā āpo (TS 6.1.2.3)
tīvro raso → tīvvro raso (TS 5.6.1.3)
prayajuravlīnāt → prayajuravvlīnāt (TS 6.1.2.4)

¹⁴ There is a detailed and good study of the phonetics of svarabhakti-s as described in the śikṣā texts in Mohanty (2015). My main concern is not the phonetics of the svarabhakti-s, but rather where they occur and how they are treated in TT recitation. The svarabhakti of the ‘l’ is slightly different from the non-conjunct ‘l’, but even trained paṇḍita-s sometimes blur the distinction.

C₁ is the sibilant ‘ś’, ‘ṣ’ or ‘s’
• C₂ is a mute: As per TYP 14.9, after an unvoiced spirant, the first mute of the series is inserted as abhinidhāna:
aghoṣādūṣmāṇaḥ paraḥ prathamo’bhinidhāna-sparśaparāttasya sasthānaḥ |
After an unvoiced spirant, a first consonant (of the same series as the) mute following it (is inserted as) abhinidhāna.
Some illustrative examples are:
śuṣmeṇod → śuṣpmeṇod (TS 1.2.8.1)
prasnātīḥ → prastnātīḥ (TS 2.6.11.2)
viṣṇo → viṣṭṇo (TS 1.1.3.1)
• C₂ is a semivowel: The fundamental rule applies, i.e., the sibilant is duplicated. Some examples are:
ava sya vara → avassya vara (TS 1.2.3.3)
śiśriye → śiśśriye (TS 1.5.3.1)
uttamaśloko → uttamaśśloko (TS 5.7.4.3)
aśvamedhaḥ → aśśvamedhaḥ (TS 5.7.5.3)

C₁ is the phoneme ‘r’
The ‘r’ is not duplicated, as per TYP 14.15, avasāne ra-visarjanīya-jihvāmūlīyopadhmānīyāḥ.
The rules of Pauṣkarasādi (TYP 14.2) and of some unnamed teachers (TYP 14.3), which apply to ‘l’ and ‘v’, apply to ‘r’ as well, as per TYP 14.4, rephāt paraṃ ca. Note, however, that under TYP 14.4 all consonants after ‘r’ are duplicated, whereas only the mutes are duplicated after ‘l’ and ‘v’.
• C₂ is a mute: Only the mute is duplicated, and if the mute is aspirated, then the corresponding unaspirated mute is duplicated. In the Grantha texts, the duplicated ‘t’ before a ‘th’ and the duplicated ‘d’ before a ‘dh’ are explicitly printed, as mentioned previously.
sāśīrkeṇa → sāśīrkkeṇa (TS 1.6.10.4)
ūrdhvā yasyāmatiḥ → ūrddhvā yasyāmatiḥ (TS 1.2.6.1)
• C₂ is the phoneme ‘y’, ‘l’ or ‘v’: This case is similar to the previous one, where C₂ is a mute. However, the clarity of the duplicated syllable varies quite a bit, even among trained paṇḍita-s. There was quite probably a tradition of pronouncing an ‘r’ occurring before these semivowels as a svarabhakti, and this is reflected in the variation among the paṇḍita-s. This can be seen from the Kāṭhaka section of the Taittirīya brāhmaṇa, where the word sūrya is said to have three syllables, which would make sense if the ‘r’ was pronounced as a svarabhakti, which the TYP considers to be a vowel.¹⁵
paryāgata → paryyāgata (TS 1.6.10.3)
tairlokam → tairllokam (TS 5.2.1.7)
evā no dūrve → evā no dūrvve (TS 5.2.8.3)
• C₂ is the phoneme ‘ś’, ‘ṣ’ or ‘s’: The ‘r’ attains svarabhakti if the sequence has only two consonants, i.e., if it is of the form V₁ r [ś|ṣ|s] V₂; this type of svarabhakti is called hariṇī. This follows from the rule in TYP 14.16 listing exceptions to duplication, ūṣmā svaraparaḥ. Clearly, if a third consonant follows the ‘ś’, ‘ṣ’ or ‘s’, there is no svarabhakti of the ‘r’, and the other usual duplication rules take over. Some examples are:
darśapūrṇamāsau → dar÷śapūrṇamāsau (TS 1.6.7.1)
varṣavṛddham → var÷ṣavṛddham (TS 1.1.2.1)
barsam → bar÷sam (TS 2.5.7.1)
ubhayataśśīrṣṇī → ubhayataśśīrṣṭṇī (TS 1.2.4.2)
dārśyam → dārśśyam (TS 3.2.2.3)

¹⁵ See the Kāṭhaka 1.9 (Rā Kṛṣṇamūrti Śāstri and Rā Gaṇeśvaradrāviḍaḥ 2003a), where the eight-syllabled mantra of sūrya is described as ghṛṇiriti dve akṣare | sūrya iti trīṇi | āditya iti trīṇi |. The mantra is thus ghṛṇiḥ sūrya ādityaḥ, which has only 7 syllables under the regular mode of counting syllables. The later Sūryopaniṣat (A. Mahadeva Sastri 1921) obtains the eight-syllabled mantra by adding the salutation Om before this mantra, i.e., oṃ ghṛṇiḥ sūrya ādityaḥ, and thus gets the correct number of syllables. The verses from the Sūryopaniṣat 7 are omityekākṣaraṃ brahma | ghṛṇiriti dve akṣare | sūrya ityakṣaradvayam | āditya iti trīṇyakṣarāṇi | etasyaiva sūryasyāṣṭākṣaro manuḥ ||. If the Kāṭhaka śākhin-s actually pronounced the ‘r’ before a ‘y’ as a svarabhakti, then this feature is not preserved in the TT recitation of this praśna, although some other phonological peculiarities of the Kāṭhaka praśna-s are still preserved by the TTs. It seems quite likely that the ‘r’ before the semivowels received differing treatment from various Vedic groups, and at least some of these treatments continue to this day; e.g., the Tamil Ṛgvedin-s actually duplicate the ‘r’ before the semivowels.

• C₂ is the phoneme ‘h’: The ‘r’ again attains svarabhakti under conditions similar to those for ‘ś’, ‘ṣ’ or ‘s’, and this type is called kareṇū. Just like the svarabhakti-s of ‘l’, these two svarabhakti-s are pronounced identically, although they are classified under two different names.
barhiṣā → bar÷hiṣā (TS 1.6.7.2)
However, there is an important exception to the kareṇū svarabhakti. If C₁ has the svarita accent and C₂ has the anudātta accent, then the ‘r’ does not attain svarabhakti, and the ‘h’ is duplicated instead. This rule is not found in the TYP or the two main śikṣā-s, but it is followed in the Grantha texts and in TT recitation.
etadba̍rhirhye̍ṣaḥ → etadba̍rhhirhye̍ṣaḥ (TA 4.5.5)
• Special case of the form V₁rṛ: This form is also not explicitly described in the TYP or the main śikṣā texts. However, the ‘r’ stays non-conjunct and is pronounced very much like the svarabhakti.¹⁶

C₁ is the phoneme ‘h’
The ‘h’ is not excluded from duplication in the TYP and in TT recitation, although Pāṇini and the SYP exempt it.¹⁷
• C₂ is a mute: In this case C₂ has to be ‘ṇ’, ‘n’ or ‘m’. However, the TYP has a peculiarity when ‘ṇ’, ‘n’ or ‘m’ occurs after ‘h’: a so-called nāsikya is inserted after the ‘h’.
hakārānnaṇanamaparānnāsikyam | 21.14
From an ‘h’ (with the phonemes) ‘ṇ’, ‘n’ or ‘m’ following (it), a nasal sound (occurs).
Whitney (1871) translates it as “After h, when followed by n, ṇ or m, is inserted a nāsikya”, which is quite reasonable. However, as he points out, the commentator actually interprets this statement as the ‘h’ itself taking on a nasal sound.¹⁸ Whitney notes that this is not a straightforward interpretation of the sūtra, and his contention seems correct when the preceding sūtra-s and the commentator’s explanation of them are examined. There seems to be a variety of opinions on the pronunciation of the nasals after ‘h’, but the TTs pronounce these conjuncts with the nasal before the ‘h’, transitioning to the ‘h’ towards the tail end of the nasal.

¹⁶ The pronunciation of the ‘r’ in this situation should be contrasted with that of the Tamil Ṛgvedin-s, who do not pronounce it with a svarabhakti-like sound.

¹⁷ For example, SYP 4.100 enjoins duplication of any consonant following ‘r’ and ‘h’: paraṃ tu rephahakārābhyām.
This is what the commentator of the TYP seems to have in mind as well.
• C₂ is a semivowel: The fundamental rule applies, i.e., the ‘h’ is duplicated.
gṛhyate → gṛhhyate (TS 6.5.10.1)
prahriyate → prahhriyate (TS 7.5.15.1)
śītikāvati hlāduke → śītikāvatihhlāduke (TA 6.4.1)
bahvībhiḥ → bahhvībhiḥ (TS 6.5.9.2)

¹⁸ The commentary says: tasmān naṇama-paraṃ hakāraṃ āruhya nāsikyaṃ bhavati, “Thus, h when followed by n, ṇ or m becomes an inserted nāsikya.”

Anusvāra
The conversion of the anusvāra at the end of a word is generally the same as in Pāṇini, where the anusvāra is usually transformed into a nasal of the same group (savarṇa-anunāsika). TT recitation differs in how the anusvāra is treated before the sibilants and the ‘r’, and in some notable exceptions. The Grantha texts explicitly specify the transformation of the anusvāra, unlike the Devanāgarī texts, which simply use the anusvāra symbol (a dot above the line) before the mutes and leave the conversion to the reader, which is not trivial during recitation. In fact, one sure way of identifying a low-quality recitation is to check whether the reciter substitutes ‘m’ for the anusvāra. Since this is a phonologically important distinction, I point out the rules governing the anusvāra, although it is exempt from duplication, the main topic of this paper.
• C₁ is the ‘r’, ‘ś’, ‘ṣ’, ‘s’ or ‘h’ phoneme: The anusvāra remains and does not become a nasal, as per TYP 5.29, na repha paraḥ; the ‘r’ is specified explicitly because the anusvāra is transformed before the other semivowels. In these cases, where the anusvāra does not become a savarṇa-anunāsika, the sarvasammata śikṣā (43) specifies:
adhyāye taittirīyāṇāmanusvāro yadā bhavet |
tadādyardho gakāraḥ syādaparastvanunāsikaḥ ||
In the recitation of the Taittirīyaka-s, when the anusvāra is present,
Then a half gakāra sound followed by the anunāsika is (chanted).
This should be contrasted with the recitation of the Tamil Ṛgvedin-s.¹⁹ The Vyāsa śikṣā further specifies:
hrasvāddvittvamanusvāraḥ prāpnuyātsaṃyute pare | 341
tadanusvārapūrvaśca saṃyogādirdvirucyate || 342
After a short (vowel) the anusvāra attains doubling, if the following (ūṣman) is in a conjunct consonant.
That ūṣman which has the anusvāra before it is said twice, due to being conjoined (with a consonant).
Note that this is the case if the sibilant following the anusvāra is not followed by a mute.
If it is, then TYP 14.9 must be applied as well: the sibilant itself is not duplicated, but the corresponding first mute is inserted as abhinidhāna. In summary, the pronunciation of the anusvāra differs as follows:
– If C₁ is followed by a vowel, the sound is like “g-m”, where the “g” has a svarabhakti-like quality. This will always be the case with ‘r’. The following two cases apply only to the sibilants.
– If C₁ is followed by a consonant and V₁ was long, the sound is like “g”, an ardha-gakāra.
– If C₁ is followed by a consonant and V₁ was short, the sound is like “g-g”, i.e., a pure ‘g’ sound followed by an ardha-gakāra.
In the Grantha texts, the first two cases are denoted by the Vedic anusvāra symbol, whereas the last case is denoted by the duplicated Vedic anusvāra. Some examples are:
ādityānāṃ̐ sadasi → ādityānāṃ̐ sadasi (TS 1.1.11.2)
iṣaṃ̐ rayīṇām → iṣaṃ̐ rayīṇām (TS 1.1.14.1)
aśravaṃ̐ hi → aśravaṃ̐ hi (TA 1.1.14.1)
vayaṃ̐ syāma → vayaṃ̐ ssyāma (TB 2.11.4.28)
indraṃ̐ sthaviram → indraṃ̐ stthaviram (TB 2.4.2.20)

¹⁹ The Ṛk-prātiśākhya actually requires doubling of the sibilants after the anusvāra.

• C₁ is a mute or the phoneme ‘y’, ‘l’ or ‘v’: The TYP rules governing this behavior are similar to Pāṇini's and are as follows:
nakāro anunāsikam | TYP 5.26
makāra sparśaparastasya sasthānamanunāsikam | TYP 5.27
antasthāparaśca savarṇamanunāsikam | TYP 5.28
Examples are not provided for this case, since these rules are well known even in Classical Sanskrit.
• Special cases: The Vyāsa śikṣā says:
jñaghnottaro makāraścedanusvāro’tra kevalaḥ | 166
dvimātra iti vijñeyo hyanyadharmavivarjitaḥ | 167
The ‘m’ phoneme before a jña or ghna (exists) as just an anusvāra,
This is known to be two mātra-s (in length), and indeed obtains a different quality.
This lengthening of duration (mātra) is reflected in actual TT practice, and a distinct pause occurs after the anusvāra. The duration of the pause is somewhat variable. Distinct pauses can be heard in the chanting of such anusvāra-s in Sarma et al. (2004) and by the paṇḍita-s of the Sringeri vedapāṭhaśāla, but in my field study I observed that some paṇḍita-s do not pause very clearly, and the pause is clear only to the trained ear. The sarvasammata śikṣā (32) elaborates further:
nakiṣṭaṃ ghnanti saṃjñānaṃ priyaṃ jñātiṃ tathaiva ca |
dhūṃkṣṇā daṃkṣṇava ityatrānusvāro’pi vidharmakaḥ ||
In the places with nakiṣṭaṃ ghnanti, saṃjñānaṃ, priyaṃ jñātiṃ, dhūṃkṣṇā and daṃkṣṇava, the anusvāra (is pronounced with a) different quality.
Note that this vidharma quality is observed whenever the combination ṃjña occurs, as specified by the Vyāsa śikṣā, e.g., saṃjñapayantyaindraḥ in TS 6.3.11.2, which is not covered by the verse quoted above. However, the other cases quoted above, dhūṃkṣṇā and daṃkṣṇava, are pronounced the same way, i.e., with the duration lengthened. Interestingly, it is not an anusvāra but rather the consonant ṅ that has the extended mātra. This phonological flourish can be clearly heard in authentic TT recitations. Thus actual TT practice combines the specifications of the two śikṣā-s.

Phonological peculiarities with C₁ being the ‘n’ phoneme
Some phonological peculiarities of ‘n’ in TT recitation are known only from tradition and are reflected in the Grantha texts.
• The first is when ‘n’ occurs in the conjunct nts, which would become nths as per TYP 14.12. The ‘n’ obtains vidharma, just like the cases described previously. In practice, a pause occurs after the ‘n’.²⁰ The Grantha texts print a standalone ‘n’ in this case.
• The second peculiarity is when the conjunct npr occurs.
In this case, the Grantha texts print the ‘n’ phoneme as a standalone consonant, but it neither gets vidharma nor is it pronounced like a word-final ‘n’ phoneme that is not duplicated (described in the next section). This reflects the fact that the ‘n’ phoneme is simply not duplicated here, although the basic TYP rule would enjoin it.
²⁰ This pause is also reflected in the ‘n’ phoneme obtaining the svarita accent if the previous vowel had the svarita accent.
Thus a standalone ‘n’ phoneme is pronounced in different ways, depending on the context in which it occurs. These two rules are not reflected in Rā Kṛṣṇamūrti śāstri and Rā Gaṇeśvaradrāviḍaḥ (2003a).
V₁ is nakāra or ṅakāra at the end of a pada
This is a distinct feature of TT recitation, where in some cases the ‘n’ phoneme or the ṅakāra is not duplicated when it occurs at the end of a word and is followed by a consonant in the next word. The TYP has no explicit rules for this, but some rules can be found in the śikṣā-s. This case has been explained very well, with examples, in the introduction to Rā Kṛṣṇamūrti śāstri and Rā Gaṇeśvaradrāviḍaḥ (2003a), so the examples will not be repeated in this paper. However, I will summarize the account from the śikṣā-s and what happens in actual TT recitations. The sarvasammata śikṣā 45 says:
padāntasya nakārasya yavaheṣu pareṣu vai |
nakārayavahāstatra tvasaṃyuktāḥ prakīrtitāḥ ||
The ‘n’ phoneme at the end of a word, followed (by) ya, va or ha (in the next word):
there, the ‘n’ phoneme and the following ya, va or ha are well known as not conjoined.
The Vyāsa śikṣā, on the other hand, says:
yavahe parastheṣu nakāraścāntagastviti | 364
nasyāntagasya dīrghāttu yavahe he ca halpare | 365
parairebhirhi tasyaiva na syāt saṃyuktatā tathā | 366
The ‘n’ phoneme at the end (of a word) which comes prior to a yakāra, vakāra or hakāra (in the next word is not duplicated).
The ‘n’ phoneme at the end of (a word) which occurs after a long (vowel) and (is) followed by a yakāra, vakāra or hakāra, or an ‘h’ phoneme followed by a consonant:
(the ‘n’ phoneme) followed by these exists without (becoming a) conjunct (consonant).
The two śikṣā-s clearly differ here, since the Vyāsa śikṣā restricts the non-duplication to a word-end ‘n’ phoneme which is preceded by a long vowel. However, neither reflects the true state of affairs in actual TT recitation, which is much more complicated and includes the ṅakāra as well. That state of affairs is summarized in the sarvalakṣaṇamañjarīsaṅgrahaḥ
quoted under the above sūtra-s of the Vyāsa śikṣā (Ācārya Śrīpaṭṭābhirāmaśāstri 1976):
prāpnuto ’ntau ṅanau dvitvaṃ vya-vṛ-hṛ-vra-parau ca re |
hrasvānno vaparo dvitvaṃ sarvatra yottarastu ṅaḥ ||
The ṅa and na at the end of a word are duplicated when followed by vya, vṛ, hṛ, vra or the ‘r’ phoneme (in the next word).
The ‘n’ phoneme preceded by a short vowel and followed by a ‘v’ phoneme is duplicated, and the ṅakāra is always duplicated when followed by a ‘y’ phoneme.
The non-duplicated consonant is pronounced almost, but not quite, like a svarabhakti, and the TTs pause very slightly after the ‘n’ or ‘ṅ’ phoneme, although this may not be discernible to the untrained ear. In the Grantha texts, this non-conjunction is again denoted by writing the ‘n’ or ‘ṅ’ phoneme as a standalone consonant. Rā Kṛṣṇamūrti śāstri and Rā Gaṇeśvaradrāviḍaḥ (2003a) again use the ÷ symbol to mark this behavior.
Rules for the visarga
The visarga generally changes into the corresponding sibilant, except before the conjunct consonant²¹ kṣa. This means that before the ka and pa series, the visarga changes into the jihvāmūlīya and the upadhmānīya respectively. The key rule for duplication here is TYP 14.9, which was quoted previously.
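The visarga mapping just described (the following sibilant before a sibilant, the jihvāmūlīya before the ka series, the upadhmānīya before the pa series, and the visarga retained before kṣa) can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's Perl software: the function names are hypothetical, input is assumed to be IAST-style Unicode, the jihvāmūlīya and upadhmānīya are assumed to be written ẖ and ḫ, only the unvoiced ka-series (k, kh) and pa-series (p, ph) contexts are handled, and the abhinidhāna insertion of TYP 14.9 is deliberately left out.

```python
# Sketch of the visarga rule described above (hypothetical helper names,
# not the author's Perl software).
# Assumptions: IAST-style Unicode input; jihvamuliya written 'ẖ' and
# upadhmaniya 'ḫ'; only sibilant, unvoiced ka-series (k, kh) and
# pa-series (p, ph) contexts are handled; the abhinidhana of TYP 14.9
# is not applied here.
import unicodedata

VISARGA = 'ḥ'
JIHVAMULIYA = 'ẖ'
UPADHMANIYA = 'ḫ'
SIBILANTS = ('ś', 'ṣ', 's')


def visarga_before(next_word: str) -> str:
    # Return the sound that replaces a word-final visarga before next_word.
    nxt = unicodedata.normalize('NFC', next_word)
    if nxt.startswith('kṣ'):
        return VISARGA        # exception: kept before the conjunct kṣa
    first = nxt[:1]
    if first in SIBILANTS:
        return first          # becomes the following sibilant
    if first == 'k':
        return JIHVAMULIYA    # before k, kh
    if first == 'p':
        return UPADHMANIYA    # before p, ph
    return VISARGA            # any other context is outside this sketch


def replace_final_visarga(word: str, next_word: str) -> str:
    # Rewrite a word-final visarga according to the word that follows it.
    w = unicodedata.normalize('NFC', word)
    if not w.endswith(VISARGA):
        return w
    return w[:-1] + visarga_before(next_word)


if __name__ == '__main__':
    pairs = [('yaḥ', 'sāvitram'), ('yaḥ', 'kṛttikāsu'),
             ('śukrapāḥ', 'praṇayantu'), ('ghanāghanaḥ', 'kṣobhanaḥ')]
    for word, nxt in pairs:
        print(word, nxt, '->', replace_final_visarga(word, nxt), nxt)
```

Joining the result to the next word gives yassāvitram, matching the first of the examples below; reproducing yaḥkkṛttikāsu would additionally require the k insertion after the jihvāmūlīya and the Grantha convention of writing the visarga symbol.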
After the unvoiced sibilants, namely the jihvāmūlīya and the upadhmānīya, the first mute of the respective series, i.e., k or p, is inserted as abhinidhāna. Note that the jihvāmūlīya and the upadhmānīya themselves are exempted from duplication as per TYP 14.15. Regarding the Grantha texts, note that they convert the visarga to the corresponding sibilant if it is followed by śa, ṣa or sa. Otherwise, they use the visarga symbol, including before the conjunct consonant kṣa, and it is up to the reader to map it to the jihvāmūlīya, the upadhmānīya, or an actual visarga. Some examples are:
yaḥ sāvitram ↦ yassāvitram (TB 3.10.9.36)
śukraḥ śukraśociṣā ↦ śukraśśukraśociṣā (TB 1.1.1.2)
yaḥ kṛttikāsu ↦ yaḥkkṛttikāsu (TB 1.1.2.6)
śukrapāḥ praṇayantu ↦ śukrapāḥppraṇayantu (TB 1.1.1.1)
ghanāghanaḥ kṣobhanaḥ ↦ ghanāghanaḥ kṣobhanaḥ (TS 1.2.1.1)
²¹ See TYP na kṣaparaḥ. Note that the kṣa itself would be converted to khṣa as per TYP 14.12.
2.1 A note on insertion of consonants
There are several cases where consonants are inserted even in the absence of conjunct consonants, e.g., as described in TYP 14.8. Since these happen in the absence of conjunct consonants, most texts, including those in the Devanāgarī script, specify them explicitly. Thus, these cases are not treated in this paper.
Another instance of insertion is the so-called yama or twin, described in TYP 21.12-13, where a corresponding twin is inserted when a non-nasal mute is followed by a nasal mute. Strictly speaking, this describes the phonetic transition from a non-nasal to a nasal sound, and can be excluded from the category of consonant duplication.
3 Algebraic Formulation
For the sake of textual processing, it is assumed that white space exists between two consonants only in the case of a word-end ‘ṅ’ or ‘n’ phoneme, because only in these cases does the word boundary affect pronunciation. In practice, superfluous white space can easily be removed by software written in a high-level language such as Perl. White space may, however, remain between a vowel and a consonant, i.e., across a word boundary where there is no conjunction, to make the text more readable. The following notation is used to write concise equations:
• [ ] denotes a white space.
• [x₁|x₂| ··· |xₖ] represents k possible choices. If there are k possible choices on the LHS of an equation, there are k possible choices on the RHS, and corresponding choices occupy the same position (index).
• The ÷ sign represents non-conjunctivity.
• The {r} or {l} notation represents the svarabhakti-s.
• A standalone ‘n’ phoneme stands for the special case of it before the conjunct consonants beginning with pr.
• The ∗ symbol is used after the vyañjana to indicate vidharmatva.
• Wikner’s transliteration conventions for the anudātta and svarita are adopted.²²
²² See https://ctan.org/tex-archive/language/sanskrit.
A mathematical representation of the rules that affect a nakāra or ṅakāra before a white space, i.e., at the end of a pada, is as follows:
hn[ ]l = l̃l (.2)
h[n|ṅ][ ][vya|vṛ|hṛ|vra|r] = h[n|ṅ][vya|vṛ|hṛ|vra|r] (.3)
hhn[ ]v = hhnv (.4)
hxn[ ]v = hxn÷v (.5)
hn[ ]y = hn÷y (.6)
hṅ[ ]y = hṅṅy (.7)
hṅ[ ]v = hṅ÷v (.8)
hhn[ ]h = hhnn÷h (.9)
hxn[ ]h = hxn÷h (.10)
The rest of the duplication is as follows:
h₁X_{i₁,j₁}k₀ ··· kₗh₂ = h₁X_{i₁,j₁}k₀ ··· kₗh₂, if k₀ = X_{i₁,j₂} and j₂ ≠ 5, or k₀ = k₁ = X_{i₁,5}
 = h₁X_{i₁,j₁−1+(j₁ mod 2)}X_{i₁,j₁}k₀ ··· kₗh₂, otherwise (.11)
h₁[r|l|v]X_{i₁,j₁}k₀ ··· kₗh₂ = h₁[r|l|v]X_{i₁,j₁}k₀ ··· kₗh₂, if k₀ = X_{i₁,j₂} and j₂ ≠ 5, or k₀ = k₁ = X_{i₁,5}
 = h₁[r|l|v]X_{i₁,j₁−1+(j₁ mod 2)}X_{i₁,j₁}k₀ ··· kₗh₂, otherwise (.12)
h₁[r|l]jah₂ = h₁[{r}|{l}]jah₂ (.13)
h₁rṛ = h₁r÷ṛ (.14)
h₁ωrh_h₂ = h₁ωrhh_h₂ (.15)
h₁rVk₀k₁ ··· kₖh₂ = h₁rVk₀k₁ ··· kₖh₂, if k₀ = V
 = h₁rVVk₀k₁ ··· kₖh₂, otherwise (.16)
h₁rjX_{i₁,j₁}k₀ ··· kₗh₂ = h₁rjX_{i₁,1}X_{i₁,j₁}k₀ ··· kₗh₂ (.17)
h₁rjVk₀ ··· kₗh₂ = h₁rjjVk₀ ··· kₗh₂ (.18)
bailv[a|o] = bailv÷v[a|o] (.19)
h₁ṃ[jña|ghna] = h₁ṃ∗[jña|ghna] (.20)
[da|dhū]ṃkṣṇa = [da|dhū]ṅ∗khṣṇa (.21)
Figure 1: ND-FST for processing k₁ = mute.
4 Non-deterministic Finite State Transducer Representation
A Non-Deterministic Finite State Transducer (ND-FST) is a 7-tuple (Q, Σ, Γ, δ, ω, q₀, F) (Sipser 2006), where
1. Q is a finite set called the states,
2. Σ is a finite set called the alphabet,
3. Γ is a finite set called the output alphabet,
4. δ is the transition function,
5. ω is the output function,
6. q₀ is the start state, and
7. F is the set of accept states.
Processing is done one sentence at a time. The start state is the beginning of a new sentence, and the accept state is reached when the virāma symbol (|), i.e., the end of a sentence, is detected. The symbol ϕ stands for the white space between two words. Each sentence can be processed multiple times to apply all the rules; in actual software, this could be parallelized for faster processing. Two ND-FST processing engines are illustrated. Figure 1 shows the processing cycle when k₁ is a mute. Note that some “super-states” (gray filling) are used to simplify the diagram; each super-state has to be expanded into five normal states, one for each of the five series of mutes. Figure 2 shows the processing for a word-end ‘ṅ’ or ‘n’ phoneme. ND-FSTs can be developed in the same way for the other rule groups.
Figure 2: ND-FST for processing the ‘n’ phoneme at the end of a pada.
5 Software for Studying Duplication
Software written in the Perl programming language has been developed to study duplication and will be placed in the public domain. This software
This softwareaccepts an input file which contains Yajurveda sentences in the Wiknertransliteration format. Rules such as svarabhakti, etc., are implemented andan output file with text in Wikner\u00E2\u0080\u0099s transliteration format, using principlessimilar to the Grantha texts, is produced. As an option, all duplicatedconsonants can also be explicitly specified. This can be included in anskt file, which serves as the input to Wikner\u00E2\u0080\u0099s pre-processor (which waswritten in the C programming language). Note that the Wikner\u00E2\u0080\u0099s C codewas modified slightly to accommodate some TT accentuations and syllables,which are not available from the regular pre-processor. Finally, xelatex canbe used to generate pdfs from the tex output of the modified Wikner pre-processor.6 Some Examples of DuplicationExamples of typesetting are given below with regular Devana\u00C2\u00AFgar\u00C2\u00AF\u00C4\u00B1 typesettingfirst, Grantha-mode typesetting next, and finally the Grantha-mode withall duplicated syllables. This will clearly illustrate the differences betweenthe Grantha mode and traditional Devana\u00C2\u00AFgar\u00C2\u00AF\u00C4\u00B1 typesetting, as well as howconsonants are duplicated. It should be noted that these do not cover allthe TYP duplication rules.1. Example 1, TA 3.12.33.(a) sahasr \u00C4\u00B1as\u00C2\u00B4\u00C2\u00AF\u00C4\u00B1rs.a\u00C2\u00AF pur \u00C4\u00B1us.ah. | sahasra\u00C2\u00AFks.ah. sahasr \u00C4\u00B1apa\u00C2\u00AFt | sa bhu\u00C2\u00AFm \u00C4\u00B1\u00C4\u00B1m.vi\u00C2\u00B4svat\u00C4\u00B1o vr. tva\u00C2\u00AF | aty \u00C4\u00B1atis.t.haddas\u00C2\u00B4a\u00C2\u00AFm. gulam | pur \u00C4\u00B1us.a evedam\u00CB\u009C. sarv \u00C4\u00B1\u00C4\u00B1am| yadbhu\u00C2\u00AFtam. yacca bhavy \u00C4\u00B1\u00C4\u00B1am | uta\u00C2\u00AFm \u00C4\u00B1r. tattvasyes\u00C2\u00B4 \u00C4\u00B1a\u00C2\u00AFnah. 
| yadann \u00C4\u00B1ena\u00C2\u00AFtiroh\u00C4\u00B1ati | eta\u00C2\u00AFv \u00C4\u00B1a\u00C2\u00AFnasya mahima\u00C2\u00AF | atojya\u00C2\u00AFya\u00C2\u00AF \u00C4\u00B1m\u00CB\u009C. s\u00C2\u00B4ca pu\u00C2\u00AFr \u00C4\u00B1us.ah. ||(b) sahasr \u00C4\u00B1as\u00C2\u00B4\u00C2\u00AF\u00C4\u00B1r\u00C3\u00B7s.a\u00C2\u00AF pur \u00C4\u00B1us.ah. | sahasra\u00C2\u00AFkhs.assahasr \u00C4\u00B1apa\u00C2\u00AFt | sa bhu\u00C2\u00AFm\u00CB\u009C\u00C4\u00B1\u00C4\u00B1vvi\u00C2\u00B4svat\u00C4\u00B1ovr. tva\u00C2\u00AF | aty \u00C4\u00B1atis.t.haddas\u00C2\u00B4a\u00C2\u00AFn\u00CB\u0099gulam | pur \u00C4\u00B1us.a evedam\u00CB\u009C. sarv \u00C4\u00B1\u00C4\u00B1am |yadbhu\u00C2\u00AFta\u00CB\u009Cyyacca bhavy\u00C4\u00B1\u00C4\u00B1am | uta\u00C2\u00AFm \u00C4\u00B1r. tattvasyes\u00C2\u00B4 \u00C4\u00B1a\u00C2\u00AFnah. | yadann \u00C4\u00B1ena\u00C2\u00AFtiroh\u00C4\u00B1ati | eta\u00C2\u00AFv \u00C4\u00B1a\u00C2\u00AFnasya mahima\u00C2\u00AF | atojya\u00C2\u00AFya\u00C2\u00AF \u00C4\u00B1m\u00CB\u009C. s\u00C2\u00B4ca pu\u00C2\u00AFr \u00C4\u00B1us.ah. ||(c) sahassr \u00C4\u00B1as\u00C2\u00B4\u00C2\u00AF\u00C4\u00B1r\u00C3\u00B7s.a\u00C2\u00AF pur \u00C4\u00B1us.ah. | sahassra\u00C2\u00AFkkhs.assahassr \u00C4\u00B1apa\u00C2\u00AFt | sabhu\u00C2\u00AFm\u00CB\u009C\u00C4\u00B1\u00C4\u00B1vvi\u00C2\u00B4ss\u00C2\u00B4vat\u00C4\u00B1o vr. ttva\u00C2\u00AF | atty \u00C4\u00B1atis.t.t.haddas\u00C2\u00B4a\u00C2\u00AFn\u00CB\u0099gulam | pur \u00C4\u00B1us.aTamil Taittir\u00C4\u00AByaka-s 213evedam\u00CB\u009C. sarv\u00C4\u00B1\u00C4\u00B1am | yaddbhu\u00C2\u00AFta\u00CB\u009Cyyacca bhavvy \u00C4\u00B1\u00C4\u00B1am | uta\u00C2\u00AFm \u00C4\u00B1r. tatt-vassyes\u00C2\u00B4\u00C4\u00B1a\u00C2\u00AFnah. | yadann \u00C4\u00B1ena\u00C2\u00AF tiroh \u00C4\u00B1ati | eta\u00C2\u00AFv \u00C4\u00B1a\u00C2\u00AFnassya mahima\u00C2\u00AF |atojjya\u00C2\u00AFya\u00C2\u00AF\u00C4\u00B1m\u00CB\u009C. s\u00C2\u00B4ca pu\u00C2\u00AFr\u00C4\u00B1us.ah. ||2. 
Example 2, TB 1.1.1.(a) brahma sam. dh \u00C4\u00B1attam. tanm \u00C4\u00B1e jinvatam | ks.atram\u00CB\u009C. sam. dh \u00C4\u00B1attam.tanm\u00C4\u00B1e jinvatam | is.am\u00CB\u009C. sam. dh \u00C4\u00B1attam. ta\u00C2\u00AFm. m \u00C4\u00B1e jinvatam | u\u00C2\u00AFrjam\u00CB\u009C.sam. dh\u00C4\u00B1attam. ta\u00C2\u00AFm. m\u00C4\u00B1e jinvatam | rayim\u00CB\u009C. sam. dh \u00C4\u00B1attam. ta\u00C2\u00AFm. m \u00C4\u00B1ejinvatam | pus.t.im\u00CB\u009C. sam. dh \u00C4\u00B1attam. ta\u00C2\u00AFm. m \u00C4\u00B1e jinvatam | praja\u00C2\u00AFm\u00CB\u009C.sam. dh\u00C4\u00B1attam. ta\u00C2\u00AFm. m\u00C4\u00B1e jinvatam | pas\u00C2\u00B4u\u00C2\u00AFntsam. dh \u00C4\u00B1attam. ta\u00C2\u00AFnm \u00C4\u00B1e jin-vatam | stut \u00C4\u00B1o \u00E2\u0080\u00B2si jan \u00C4\u00B1adha\u00C2\u00AFh. | deva\u00C2\u00AFstv \u00C4\u00B1a\u00C2\u00AF s\u00C2\u00B4ukrapa\u00C2\u00AFh. pran. \u00C4\u00B1ayantu ||(b) brahma sandh \u00C4\u00B1attantanm \u00C4\u00B1e jinvatam | khs.atram\u00CB\u009C. sandh \u00C4\u00B1attantanm \u00C4\u00B1ejinvatam | is.am\u00CB\u009C. sandh \u00C4\u00B1attanta\u00C2\u00AFm. m \u00C4\u00B1e jinvatam | u\u00C2\u00AFrjam\u00CB\u009C.sandh\u00C4\u00B1attanta\u00C2\u00AFm. m\u00C4\u00B1e jinvatam | rayim\u00CB\u009C. sandh \u00C4\u00B1attanta\u00C2\u00AFm. m \u00C4\u00B1e jinvatam| pus.t.im\u00CB\u009C. sandh \u00C4\u00B1attanta\u00C2\u00AFm. m \u00C4\u00B1e jinvatam | praja\u00C2\u00AFm\u00CB\u009C. sandh \u00C4\u00B1attanta\u00C2\u00AFm.m\u00C4\u00B1e jinvatam | pas\u00C2\u00B4u\u00C2\u00AFnthsandh \u00C4\u00B1attanta\u00C2\u00AFnm \u00C4\u00B1e jinvatam | stut \u00C4\u00B1o \u00E2\u0080\u00B2sijan\u00C4\u00B1adha\u00C2\u00AFh. | deva\u00C2\u00AFstv \u00C4\u00B1a\u00C2\u00AF s\u00C2\u00B4ukrapa\u00C2\u00AFh. pran. \u00C4\u00B1ayantu ||(c) brahma sandh \u00C4\u00B1attantannm \u00C4\u00B1e jinnvatam | khs.attram\u00CB\u009C. sandh \u00C4\u00B1attanta-nnm\u00C4\u00B1e jinnvatam | is.am\u00CB\u009C. sandh \u00C4\u00B1attanta\u00C2\u00AFm. 
m \u00C4\u00B1e jinnvatam | u\u00C2\u00AFrjjam\u00CB\u009C.sandh\u00C4\u00B1attanta\u00C2\u00AFm. m\u00C4\u00B1e jinnvatam | rayim\u00CB\u009C. sandh \u00C4\u00B1attanta\u00C2\u00AFm. m \u00C4\u00B1e jin-nvatam | pus.t.t.im\u00CB\u009C. sandh \u00C4\u00B1attanta\u00C2\u00AFm. m \u00C4\u00B1e jinnvatam | praja\u00C2\u00AFm\u00CB\u009C.sandh\u00C4\u00B1attanta\u00C2\u00AFm. m\u00C4\u00B1e jinnvatam | pas\u00C2\u00B4u\u00C2\u00AFnthsandh \u00C4\u00B1attanta\u00C2\u00AFnnm \u00C4\u00B1e jin-nvatam | stut \u00C4\u00B1o \u00E2\u0080\u00B2si jan \u00C4\u00B1adha\u00C2\u00AFh. | deva\u00C2\u00AFsttv \u00C4\u00B1a\u00C2\u00AF s\u00C2\u00B4ukkrapa\u00C2\u00AFh.ppran. \u00C4\u00B1ayantu||3. Example 3, TA 1.22.86.1.(a) pus.karaparn. aih. pus.karadan.d. aih. pus.karai\u00C2\u00B4sc \u00C4\u00B1a sam\u00CB\u009C. sth\u00C2\u00AF\u00C4\u00B1rya |(b) pus.karaparn. aih. pus.karadan.d. aih. pus.karai\u00C2\u00B4sc \u00C4\u00B1a sam\u00CB\u009C. sth\u00C2\u00AF\u00C4\u00B1rya |(c) pus.kkaraparn.n. aih. ppus.kkaradan.d. aih. ppus.kkarai\u00C2\u00B4scc \u00C4\u00B1a sam\u00CB\u009C. stth\u00C2\u00AF\u00C4\u00B1ryya|7 ConclusionDuplication of consonants is uniquely complex and creates many interestingphonological flourishes in TT recitation. The duplication largely follows theTYP and the two main s\u00C2\u00B4iks.a\u00C2\u00AF -s, though not completely. The instances where214 B Ramakrishnanthe s\u00C2\u00B4iks.a\u00C2\u00AF -s add additional rules and where the two main s\u00C2\u00B4iks.a\u00C2\u00AF -s disagreewith each other have been pointed out. All aspects of duplication and theexceptions can be appreciated only by a trained ear. Duplication can beexpressed by algebraic equations as well as by an ND-FST. A Perl script hasbeen written to output the duplicated consonants in an exact manner. 
The phonetic basis of duplication and comparisons between different Taittirīya traditions would be interesting future projects.
Acknowledgments
I would like to first thank my father for introducing me to the fascinating subject of Vedic chanting at a very early age. I would like to thank all my teachers; in particular, the following teachers have offered invaluable insights on chanting: Śrī Nārāyaṇa Śāstrinaḥ of Mylapore, Chennai, Śrī Śrīkaṇṭha Ācāryāḥ of Los Angeles, California, Śrī Yajñeśvara Śāstrinaḥ of Chicago, Illinois, and Śrī Satyanārāyaṇa Bhaṭṭāḥ of Andover, Massachusetts. I would like to thank the reviewers, whose comments greatly helped improve the clarity of the presentation.
References
A. Mahadeva Sastri. 1921. Sāmānya Vedānta Upaniṣads with the Commentary of Sri Upanishad-Brahma-Yogin. Adyar Library.
Ācārya Śrīpaṭṭābhirāmaśāstri. 1976. Vyāsaśikṣā Śrīsūryanārāyaṇasūrāvadhāni-viracita-vedataijasākhyayā vyākhyayā Śrīrājāghanapāṭhi-viracita-sarvalakṣaṇamañjaryāssaṅgraheṇa ca sametā. Veda-mīmāṃsānusandhāna-kendra.
Allen, W. S. 1953. Phonetics in Ancient India. Oxford University Press.
Finke, Otto A. 1886. Die Sarvasammata Siksa mit Commentar, herausgegeben, übersetzt und erklärt. Druck der Dieterichschen Univ.-Buchdruckerei.
John Lazarus. 1878. A Tamil Grammar: Designed for Use in Colleges and Schools. John Snow and Co., Ludgate Hill.
Larios, Borayin. 2017. Embodying the Vedas: Traditional Vedic Schools of Contemporary Maharashtra. De Gruyter Open Access Hinduism.
Mohanty, Monalisa. 2015. An analysis of svarabhakti in Yajurveda with reference to Yajurvedic Siksa texts. Department of Sanskrit, Utkal University. http://hdl.handle.net/10603/128613.
Nārāyaṇa Śāstrī, T. S. 1930. Taitirīyayajurbrāhmaṇam prathamabhāgam, sasvaram. Śāradavilāsamudrākṣaraśālā, Kumbhaghoṇam.
Nārāyaṇa Śāstrī, T. S. 1931. Taitirīyayajurbrāhmaṇam dvitīyabhāgam, sasvaram. Śāradavilāsamudrākṣaraśālā, Kumbhaghoṇam.
Nārāyaṇa Śāstrī, T. S. 1935. Taitirīyayajurbrāhmaṇam tṛtīyabhāgam, sasvaram. Śāradavilāsamudrākṣaraśālā, Kumbhaghoṇam.
Nārāyaṇa Śāstrī, T. S. Undated(a). Facsimile copy of Kṛṣṇa yajurvedīya taittirīya-saṃhitā dvitīyabhāgam, sasvaram. Śāradavilāsamudrākṣaraśālā, Kumbhaghoṇam.
Nārāyaṇa Śāstrī, T. S. Undated(b). Facsimile copy of Kṛṣṇa yajurvedīya taittirīya-saṃhitā prathamabhāgam, sasvaram. Śāradavilāsamudrākṣaraśālā, Kumbhaghoṇam.
Rā Kṛṣṇamūrti śāstri and Rā Gaṇeśvaradrāviḍaḥ. 2003a. Kṛṣṇa yajurvedīya taittirīya-brāhmaṇam. Śrīnṛsiṃhapriyā.
Rā Kṛṣṇamūrti śāstri and Rā Gaṇeśvaradrāviḍaḥ. 2003b. Kṛṣṇa yajurvedīya taittirīya-saṃhitā. Śrīnṛsiṃhapriyā.
Raja, C. Kunhan. 1937. “Notes on Sanskrit-Malayalam Phonetics”. In: Journal of Oriental Research of the University of Madras, Vol. I, Part 2, pp. 1-4.
Rastogi, Shrimati Indu. 1967. The Śuklayajuḥ-Prātiśākhya of Kātyāyana, Critically edited from original manuscripts with English translation of the text. The Chowkhamba Sanskrit Series Office, Varanasi-1.
Sarma, Sridhara, Sundaresa Ghanapathi, Visvanatha Ghanapathi, and Gajanana Sarma. 2004. Sri Krishna Yajurvedam: Samhitai and Shakhai. Vediclinks, Chennai.
Sastri, Mangal Deva. 1931. The Ṛgveda with the commentary of Uvaṭa: Volume II, Text in Sūtra form and Commentary with Critical Apparatus. The Indian Press Limited, Allahabad.
Scharf, Peter and Malcolm Hyman. 2009. Linguistic Issues in Encoding Sanskrit. Motilal Banarsidass, Delhi.
Scharfe, Hartmut. 1973. Grammatical Literature in Sanskrit. Otto Harrassowitz, Wiesbaden.
Sipser, Michael. 2006. Introduction to the Theory of Computation, 2nd Edition. Cengage Learning, Boston, MA.
Staal, Frits. 1961. Nambudiri Veda Recitation. ’s-Gravenhage: Mouton.
Vaidyanāthasāstri. 1905. Taittirīya Āraṇyakam, kāṭhakabhāgasahitaṃ drāviḍapāṭhakramayutañca. Śāradavilāsamudrākṣaraśālā, Kumbhaghoṇam.
Vasu, Srisa Candra. 1898. The Aṣṭādhyāyī of Pāṇini interpreted according to The Kāśikāvṛtti of Jayāditya and Vāmana and translated into English, Book VIII. Sindhu Charan Bose.
Whitney, William Dwight. 1863. The Atharva Prātiśākhya or Śaunakīya Caturādhyāyikā. Journal of the American Oriental Society, Vol. 7, 1860-1863, pp. 333-615.
Whitney, William Dwight. 1871. The Taittirīya Prātiśākhya with its commentary, The Tribhāṣyaratna: Text, Translation, and Notes. American Oriental Society.
Word complementation in Classical Sanskrit
Brendan S. Gillon
Abstract: Classical Sanskrit has very flexible word order. This presents a challenge to the application to Classical Sanskrit of categorial grammars and their type-logical extensions, which generally assume a fixed total order on the words of the language. The paper outlines the facts pertaining to complementation in Classical Sanskrit and proposes a form of the cancellation rule which accommodates Classical Sanskrit’s free ordering of words and their complements.
1 Introduction
Syntacticians generally distinguish between complements and modifiers. Generative syntacticians, wittingly or unwittingly, use some form of a categorial grammar to handle complementation.
This approach works well enough for those fragments of a language where complement order is rigid, but it does not handle deviations from rigid word order satisfactorily. Moreover, for languages such as Classical Sanskrit, where complement word order appears to be completely free, off-the-shelf categorial grammars are utterly unsatisfactory. However, it is possible to alter the standard version of a categorial grammar so that it deftly accommodates the free ordering of a word’s complements, as found, for example, in Classical Sanskrit. The basic idea is to take advantage of the mathematically well-known equivalence between sequences of length n on a set and functions from the set of positive integers up to and including n into the set.
Like other Indo-European languages, Classical Sanskrit distinguishes between nouns, verbs, adjectives and prepositions. And, as in other languages, its words in each of these categories have complements, some obligatory, some optional and some even excluded. In what follows, I shall assume that the reader knows these distinctions and how they apply. I shall also assume that words are assigned lexical categories which are ordered pairs. The first coordinate is the word’s part of speech and the second is its complement list. For example, the English verb to greet is assigned the category ⟨V, ⟨NP⟩⟩, where V indicates that the word is a verb and ⟨NP⟩ indicates that it requires a noun phrase complement. The English verb to introduce is assigned the category ⟨V, ⟨NP, PP⟩⟩, where the complement list shows that the verb takes two complements, a noun phrase complement followed by a prepositional phrase complement. Intransitive verbs are assigned the category ⟨V, ⟨ ⟩⟩. In other words, intransitive verbs have empty complement lists. To enhance readability, I shall write the labels for these categories with the part of speech label to the left of a colon and the complement list to the right. Here are the categories of the three verbs just mentioned in this modified notation: V : ⟨NP⟩, V : ⟨NP, PP⟩ and V : ⟨ ⟩.
The remainder of the paper proceeds in two steps. The first step is to set out the data pertaining to complementation in Classical Sanskrit. The Classical Sanskrit data are drawn from Apte’s A Practical Sanskrit-English Dictionary (Apte 1957), Monier-Williams’s A Sanskrit-English Dictionary (Monier-Williams 1899) and Apte’s Student’s Guide to Sanskrit Composition (Apte 1885). To help fix ideas, I shall provide examples from English as well. These data are taken from Gillon (2018), which in turn draws on Quirk et al. (1985) and Huddleston (2002). The second step is to set out the proposal. I shall conclude the paper with an overview both of what has been covered and of what has not been.
2 Survey of the data
We begin with prepositions, since their complements are the simplest to describe. In English, in and into are both prepositions, and both admit a single complement. The preposition into requires a complement; the preposition in does not, since it permits its complement to be omitted. When the complement of a preposition is omitted, the argument corresponding to the omitted complement has its value determined contextually, either through its cotext or its setting, in the same ways in which third-person pronouns have their values fixed contextually.
(1) Dan stood in front of the house.
When the phone rang,
*he suddenly ran into.
he suddenly ran in.
(The asterisk is the usual sign indicating that the expression it prefixes is judged odd.)
English, as it happens, also has words, traditionally classified as adverbs, which are often, but not always, compounded from prepositions and which exclude a complement. For example, the English adverb afterwards, though it expresses a binary relation, excludes any complement.¹
(2) Alice lived in Montreal until 2010.
Afterwards, she moved to Vancouver.
After that, she moved to Vancouver.
Many Classical Sanskrit prepositions take a complement whose case, as is well known, has to be specified. For example, the preposition adhas takes a noun phrase complement whose head is in the sixth case, e.g., tarūṇām adhas (beneath the trees) (Apte 1885, §112). As in English, some Classical Sanskrit prepositions, for example upari, have optional complements:
(3.1) muhūrtād upari (Monier-Williams (1899) sv upari)
after a minute
(3.2) upari payaḥ pibet (Monier-Williams (1899) sv upari)
afterwards he should drink milk
Next come adjectives. Though not common, some English adjectives require a complement.
(4.1) *Max is averse.
(4.2) Max is averse to games.
(Quirk et al. 1985, ch. 16.69)
Others admit optional complements. When these are omitted, their construal is either contextually fixed, as illustrated by the examples in (5), or reciprocal, as illustrated by the examples in (6).
(5.1) Bill lives far away.
Bill lives far away from here.
(5.2) Although Bill lives far away, he visits his parents regularly.
Although Bill lives far away from them, he visits his parents regularly.
¹ Similar examples are found in French and are detailed in Grevisse (1964, §901).
Though cases where an adjective expresses a binary relation but excludes a complement are rare, they do exist.
The English adjective alike, which excludes any complement, is particularly instructive in this regard, since it has two synonyms: the adjective similar, whose complement is optional, and the preposition like, whose complement is obligatory.

(6.1) Bill and Carol are alike.
(6.2) Bill and Carol are similar (to each other).
(6.3) Bill and Carol are like each other.

Classical Sanskrit adjectives too take complements, and the case of their complements has to be specified. For example, the adjective sukha, in the sense of good for, takes a fourth case complement. Other adjectives include dūra (distant), which takes a sixth case complement; kovida (proficient), which takes a seventh case complement; sama (same), which takes a third case complement; prabhu (being a match for), which takes a fourth case complement; and kalpa (fit), nikaṭa (near) and samīpa (near), which all take sixth case complements.

English relational nouns are nouns with complements. A few require a complement, such as the word lack (Herbst 1988, p. 5) or sake (Chris Barker, p.c.). A few exclude complements, such as stranger and foreigner. Most, however, admit optional complements. This is true, for example, of kinship nouns. When their complements are omitted, the construal of the missing complement is indefinite, or existential, as is illustrated by the equivalence of the sentences in (7).

(7.1) Bill is a father.
(7.2) Bill is a father of someone.

Finally, many nouns resulting from the nominalization of a verb also admit optional complements, even if the complements of the verbs from which they derive are obligatory.

While it is unclear whether or not Classical Sanskrit has relational nouns whose complements are required or excluded, it is obvious that it has relational nouns with optional complements.
This is the case for kinship nouns. It is also true for nouns such as pārśva (side), paryanta (edge), viṣaya (object) and sthāna (object) (Apte 1885, §11b: U4). The default case assignment is the sixth case, but other case assignments are also possible, for example, the seventh case for kāraṇa (cause) or spṛha (desire).

We now come to verbs and their complements. To fix ideas, let me summarize the situation with verbs in English. As in other well-studied languages, English verbs may take none, one, two, or sometimes even three complements. An English verb which admits no complement (e.g., to bloom) is known as an intransitive verb. English linguistics has no term for verbs which take exactly one complement. However, traditional English grammar does have a term for verbs which take an adjective phrase as their sole complement: they are called copular, or linking, verbs. They include verbs such as to appear, to become and to sound. It also has a term for verbs which take a noun phrase as their sole complement: as the reader knows, they are called transitive verbs. But English also has verbs whose sole complements are prepositional phrases, such as to approve of, to rely on and to wallow in, among many, many others. In addition, English has verbs whose sole complements are clauses, where clauses include finite interrogative and declarative clauses, interrogative and declarative infinitival phrases, and gerundial phrases. Here are examples of each of these kinds of verbs: to note, to wonder, to decide, to recall and to enjoy.

English verbs may also take two complements. The best known of such verbs are verbs such as to give, one complement of which is known as the direct object and the second as the indirect object.
However, other pairs of complements are also common, for example, where one complement is a noun phrase and the other an adjective phrase (e.g., to consider), or where one is a noun phrase or a prepositional phrase and the other is a clause (e.g., to convince and to say respectively). Finally, there are a few verbs which take three complements, such as to bet, to trade and to transfer.

Let us now turn to Classical Sanskrit verbs. There has been no study of word complementation in Classical Sanskrit, not even of just verb complementation. Still, enough for my purposes here can be gleaned from the sources mentioned earlier. Classical Sanskrit, like other Indo-European languages, has verbs with no complements, one complement, two complements, and three complements. Verbs with just one complement include verbs whose sole complement is a clause, an adjective phrase, or a noun phrase. I do not know whether or not Classical Sanskrit has any verbs whose sole complement is either an adverbial phrase or a prepositional phrase. Verbs with two complements include verbs where one complement is a noun phrase and the other either another noun phrase, an adjective phrase, or a clause, where a clause may be an interrogative or declarative finite clause, a participial phrase or an infinitival phrase. There are also a few verbs with three complements.

I shall illustrate the principal features of verb complements with verbs taking just one complement. The noun phrase complements of single-complement Classical Sanskrit verbs, though typically in the second case, may, depending on the verb, be in any of the seven cases.
The verb ghrā (to smell) takes a second case nominal complement, while tuṣ (to be pleased with) takes a third case nominal complement, ruc (to please) a fourth case nominal complement, viyuj (to separate from) a fifth case complement, samjñā (to remember with regret) a sixth case nominal complement and vyavahṛ (to deal with) a seventh case complement. In addition, copular verbs such as vṛt (to be) take a first case complement, which may be either a noun or an adjective.

Copular verbs exhibit an important property of many words which take complements: the single complement they admit may be from any of several categories. I shall say that words which admit alternate complements have polyvalent complements. The copular verb is the best-known example. In English, the copular verb admits one complement which may be either an adjective phrase, a noun phrase, a prepositional phrase or an adverbial phrase; in Classical Sanskrit, the complement may be a noun phrase in any of the seven cases, an adjective phrase or a prepositional phrase.

Another polyvalent Classical Sanskrit verb, well known to students of Classical Sanskrit grammar, is the verb div (to play), whose complement is a noun phrase in either the third (instrumental) or the second (accusative) case (Apte 1885, §59 obs.).

(8.1) akṣair dīvyati.
He plays dice.
(8.2) akṣān dīvyati.
He plays dice.

English also has many, many such verbs.
A particularly compelling example is the English verb to appoint, which admits alternately a prepositional phrase complement and a noun phrase complement, for this verb is synonymous with another English verb, to choose, which admits only a prepositional phrase complement.

(9.1) Dan appointed Alice as chief minister.
(9.2) Dan appointed Alice chief minister.
(10.1) Dan chose Alice as chief minister.
(10.2) *Dan chose Alice chief minister.

Though polyvalency is common with verbs, it is not unique to verbs.

Another feature of verb complements is their optionality. I call words with optional complements polyadic. Optionality is a feature of all word classes in English, and it seems to be common in other languages, including both Chinese and Classical Sanskrit. (Details pertaining to polyadic Classical Sanskrit words are given in Gillon (2015).)

3 Proposal

Let me now turn to the proposal. To understand the proposal for Classical Sanskrit complementation, let me sketch out the proposal for complementation in English. The reader familiar with categorial grammar will immediately recognize the similarity between the phrase formation rule given below for English and the cancellation rule of categorial grammar.

Careful study of English complementation (Quirk et al. 1985; Huddleston 2002) shows that English verbs alone have nearly four dozen kinds of complements. (The details are found in Gillon (2018, ch. 10).) The general schema, to a first order approximation, is this:

(11) English phrase formation rule schema
If $e \mid X{:}\langle C_1, \ldots, C_n \rangle$ and $f_i \mid C_i$, for each $i \in \mathbb{Z}^+_n$, then $e f_1 \ldots f_n \mid X{:}\langle\,\rangle$ (where $1 \le i \le n$).

In the schema, $e, f_1, \ldots, f_n$ are expressions.
Each expression which is a simple word is assigned a category of the form $X{:}\langle C_1, \ldots, C_n \rangle$, where $X$ is either A (adjective), N (noun), P (preposition) or V (verb) and $C_i$ is a complement, which itself may be a phrase, such as a noun phrase, an adjective phrase, a verb phrase or a prepositional phrase, or a clause of some kind. In effect, the rule says that, when an expression assigned a lexical category which admits n complements is followed by n complements of corresponding categories, the resulting expression forms a phrase, that is, it forms a constituent requiring no complements. Obviously, this is an enriched cancellation rule, as familiar from categorial grammar. (See Gillon (2012) for details, as well as Gillon (2018, ch. 10).)

An application of an instance of the cancellation rule is given in the derivation below, where the expression greeted Bill is assigned the category VP, which is an abbreviation for V:⟨ ⟩.

[S [NP Alice] [VP [V:⟨NP⟩ greeted] [NP Bill]]]

In the derivation above, the total, or linear, order of the expressions matters, as one expects for English. Also, since we are talking about complementation here and subject noun phrases are not complements, the last step results from an additional rule for clause formation.

While every instance of phrase formation in English is an instance of this schema, not all instances of this schema are instances of phrase formation in English. For example, it is thought that no English word has more than three complements.
It therefore follows that no instance of the schema where n is greater than or equal to 4 forms an English phrase.

Associated with the syntactic rule in (11) is the following semantic rule.

(12) English phrase valuation rule schema
Let $\langle U, i \rangle$ be a structure for an English lexicon. If $e \mid X{:}\langle C_1, \ldots, C_n \rangle$ and $f_j \mid C_j$, for each $j \in \mathbb{Z}^+_n$, then
$$v_i(e f_1 \ldots f_n \mid X{:}\langle\,\rangle) = \{x : \langle x, y_1, \ldots, y_n \rangle \in v_i(e \mid X{:}\langle C_1, \ldots, C_n \rangle) \text{ and } y_j \in v_i(f_j \mid C_j), \text{ for each } j \in \mathbb{Z}^+_n\}.$$

Before seeing how these schemata can be generalized to handle words with alternate complements and optional complements, let us see how these simpler schemata can be applied to Classical Sanskrit. The basic idea is to change the correspondence between a complement list and the complements from the requirement that there be two n-tuples of the same length to the requirement that there be a bijective function from the complements onto the complement list which preserves various syntactic specifications of the complement expressions. In particular, the bijection must preserve phrasal and clausal categories and, in the case of phrases, preserve the case of a phrase’s head noun.

(13) Sanskrit phrase formation rule schema
Let $e \mid X{:}\langle C_1, \ldots, C_n \rangle$, let $f_j \mid A_j$, for each $j \in \mathbb{Z}^+_n$, and let $g$ be a bijection from $\{A_i : i \in \mathbb{Z}^+_n\}$ into $\{C_j : j \in \mathbb{Z}^+_n\}$ such that, for each $A_i$, the phrasal category and case (if the category has case) of $A_i$ is identical with that of $g(A_i)$.
Then $e f_1 \ldots f_n \mid X{:}\langle\,\rangle$.

Consider the verb trai (to rescue), which takes two complements:

(14) bhīmād duḥśāsanaṃ trātum.
to save Duḥśāsana from Bhīma.
(Apte 1885, §78: Ve 3)

Its lexical entry is trai|V:⟨NP₂, NP₅⟩, where the subscripts indicate the case requirement on the noun phrase complement. Now consider the expressions bhīmād, duḥśāsanaṃ and trātum: under any ordering, they form a constituent, since there is a bijection from the expressions bhīmād and duḥśāsanaṃ onto the complement list ⟨NP₂, NP₅⟩ which preserves the phrasal category and the case associated with the phrasal category.

[VP [V:⟨NP₂, NP₅⟩ trātum] [NP₂ duḥśāsanaṃ] [NP₅ bhīmād]]

In the derivation tree above, the expressions are not totally ordered with respect to one another. In other words, any permutation of the lexical items results in the same derivational step. This, of course, contrasts with English.

Let us now consider a conservative extension of the categories and the cancellation rule which permits a simple treatment of word polyvalence. The basic idea is to replace each specification of a complement in the complement list with a set of complement categories.
In the case where a word has just one complement for a position in its complement list, the position is filled with a singleton set whose sole member is the relevant category; if the position has more than one complement which can be associated with it, then it is assigned the set of all the categories associated with it. Cancellation occurs when a complement to the word belongs to the corresponding set in the complement list.

(15) Phrase formation rule schema (first extension)
For each $j \in \mathbb{Z}^+_n$, let $\mathcal{C}_j$ be a non-empty subset of the complement categories and let $C_j$ be a member of $\mathcal{C}_j$. If $e \mid X{:}\langle \mathcal{C}_1, \ldots, \mathcal{C}_n \rangle$ and $f_j \mid C_j$, for each $j \in \mathbb{Z}^+_n$, then $e f_1 \ldots f_n \mid X{:}\langle\,\rangle$.

For example, the verb to choose takes a noun phrase complement and a second complement which must be a prepositional phrase (headed by the preposition as), whereas the verb to appoint takes a noun phrase complement and a second complement which may be either a noun phrase or a prepositional phrase (headed by the preposition as).
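The cancellation rules (11), (13) and (15) can be made concrete with a small executable sketch. It is offered only as an illustration under stated assumptions, not as part of the proposal: categories are plain strings with a noun phrase’s case folded into the label ("NP2" for NP₂), the head word itself is left out, and the order-free check combines the bijection of (13) with the set-valued positions of (15), which is our assumption about how the two extensions compose when applied to Sanskrit entries such as (17.1). The names entry, cancels_ordered and cancels_unordered are hypothetical.

```python
from itertools import permutations
from typing import FrozenSet, Iterable, Sequence, Tuple

# A complement category such as "NP" or "PP"; for Sanskrit, the case of the
# head noun is folded into the label, e.g. "NP2" or "NP5" (an assumption).
Category = str
# A complement list: one set of admissible categories per position, as in (15).
# Singleton sets give back the simpler complement lists of (11) and (13).
ComplementList = Tuple[FrozenSet[Category], ...]


def entry(*positions: Iterable[Category]) -> ComplementList:
    """Build a complement list from one collection of categories per position."""
    return tuple(frozenset(p) for p in positions)


def cancels_ordered(comps: ComplementList, found: Sequence[Category]) -> bool:
    """English-style cancellation: the j-th complement after the head must
    belong to the j-th position of the complement list."""
    return len(found) == len(comps) and all(c in pos for c, pos in zip(found, comps))


def cancels_unordered(comps: ComplementList, found: Sequence[Category]) -> bool:
    """Order-free cancellation in the spirit of (13): some bijection from the
    complements onto the positions must send each complement to a position
    whose set contains its category (category and case preserved)."""
    if len(found) != len(comps):
        return False
    # Brute force over all n! bijections; adequate for n of three or so.
    return any(
        all(c in comps[p] for c, p in zip(found, perm))
        for perm in permutations(range(len(comps)))
    )


# Illustrative entries transcribed from (16.1), (16.2), (17.1) and the entry for trai.
APPOINT = entry({"NP"}, {"NP", "PP"})
CHOOSE = entry({"NP"}, {"PP"})
DIV = entry({"NP2", "NP3"})
TRAI = entry({"NP2"}, {"NP5"})

if __name__ == "__main__":
    print(cancels_ordered(APPOINT, ["NP", "NP"]))   # True: Dan appointed Alice chief minister
    print(cancels_ordered(CHOOSE, ["NP", "NP"]))    # False: *Dan chose Alice chief minister
    print(cancels_unordered(TRAI, ["NP5", "NP2"]))  # True: bhīmād duḥśāsanaṃ trātum
    print(cancels_ordered(TRAI, ["NP5", "NP2"]))    # False: order matters under (11)
```

The only difference between the two checks is whether the pairing of complements with positions is fixed by linear order or may be any bijection; brute-force permutation is chosen for transparency, since no word is expected to take more than three complements.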
The contrasting complement lists are illustrated below in the contrasting lexical entries for the two verbs.

(16.1) appoint|V:⟨{NP}, {NP, PP}⟩
(16.2) choose|V:⟨{NP}, {PP}⟩

Here are examples of Classical Sanskrit polyvalent verbs:

(17.1) div|V:⟨{NP₂, NP₃}⟩
(17.2) namaskṛ|V:⟨{NP₂, NP₄}⟩
(17.3) praṇam|V:⟨{NP₂, NP₄, NP₆}⟩

We must now adjust the semantic rule schema paired with the extended phrase formation rule schema so that it accommodates the revisions in the phrase formation rule schema.

(18) Phrase valuation rule schema (first extension)
Let $\langle U, i \rangle$ be a structure for an English lexicon. For each $j \in \mathbb{Z}^+_n$, let $\mathcal{C}_j$ be a non-empty subset of the complement categories and let $C_j$ be a member of $\mathcal{C}_j$. If $e \mid X{:}\langle \mathcal{C}_1, \ldots, \mathcal{C}_n \rangle$ and $f_j \mid C_j$ (for each $j \in \mathbb{Z}^+_n$), then
$$v_i(e f_1 \ldots f_n \mid X{:}\langle\,\rangle) = \{x : \langle x, y_1, \ldots, y_n \rangle \in v_i(e \mid X{:}\langle \mathcal{C}_1, \ldots, \mathcal{C}_n \rangle) \text{ and } y_j \in v_i(f_j \mid C_j), \text{ for each } j \in \mathbb{Z}^+_n\}.$$

The final complication, occasioned by polyadic words, or words with optional complements, has been addressed for English and it can be addressed for Classical Sanskrit, but it is not possible to set out the details in the brief time allotted for the paper.

4 Conclusion

As should be well known, the treatments in generative linguistics of the formation of a phrase from its head and the head’s complements amount to some form of the cancellation rule of Categorial Grammar.
While this rule works for phrases where the word order of the head word and its complements is total, or linear, it does not apply to languages where the order is not. Classical Sanskrit is such a language. In this paper, I have shown how a simple enrichment of lexical categories and a modest change in the cancellation rule permit languages where the head word and its complements are freely ordered to be brought within the purview of a cancellation rule. In addition, I showed how the enrichment can be conservatively extended to handle complement polyvalence, that is, where expressions of different syntactic categories may appear in the same complement position. Finally, I stated, but did not show, that the conservative extension can be still further extended conservatively to apply to complement polyadicity, that is, where complements to a word are optional.

Three important points were not addressed: how a clause is formed from a subject noun phrase and a verb phrase; how one distinguishes complements from modifiers in Classical Sanskrit; and how the cancellation rule introduced here affects computational complexity. Let me bring this brief article to a close by saying a few words about each point.

Since the focus of this paper is complementation, the question of the correct rule for clause formation does not arise, aside from providing an example derivation of an entire clause. Of course, in Categorial Grammar and its various type-logical enrichments, the usual way to handle the formation of a clause from a subject noun phrase and a verb phrase is through the cancellation rule, in effect treating the subject noun phrase as a kind of complement to the verb.
But this is not empirically satisfactory, as it collapses the distinction between complements and subjects, something which may in fact be warranted in the case of Classical Sanskrit.

The second point pertains to how to distinguish between modifiers and complements in Classical Sanskrit. There is some debate among linguists as to how to make this distinction. Distinguishing between modifiers and complements is an empirical question and one whose answer will vary from language to language. It is not a question which has been addressed either by scholars using contemporary linguistic ideas or by scholars of the Indian grammatical tradition. The latter fact is not surprising, since complementation is not a notion used in the Indian grammatical tradition. Nonetheless, languages do share properties, and in the absence of empirical work on the subject, I have followed the lead of what are generally regarded as complements in the study of various Indo-European languages, of which Classical Sanskrit is an example.

Finally, I have not addressed the question of the computational tractability of the enriched cancellation rules proposed here. This important question, raised by a reviewer, is not one I am currently in a position to address.²

² In this regard, let me bring to the reader’s attention work by Alexander Dikovsky and collaborators which seeks to address the problem which relatively free word order poses for a cancellation rule of the kind found in categorial grammar: Dekhtyar and Dikovsky (2008) and Dekhtyar, Dikovsky, and Karlov (2015). This work was kindly brought to my attention by a reviewer of this paper.

References

Apte, Vāman Shivarām. 1885. The Student’s Guide to Sanskrit Composition. A Treatise on Sanskrit Syntax for Use of Schools and Colleges. Poona, India: Lokasamgraha Press.
Apte, Vāman Shivarām. 1957. The practical Sanskrit-English dictionary. Poona, India: Prasad Prakashan.
Dekhtyar, Michael I. and Alexander Ja. Dikovsky. 2008. “Generalized categorial dependency grammars”. In: Pillars of Computer Science (Trakhtenbrot Festschrift). Vol. 4800. LNCS. Berlin, Germany: Springer, pp. 230–255.
Dekhtyar, Michael I., Alexander Ja. Dikovsky, and Boris Karlov. 2015. “Categorial dependency grammars”. Theoretical Computer Science 579, pp. 33–63.
Gillon, Brendan S. 2012. “Implicit complements: a dilemma for model theoretic semantics”. Linguistics and Philosophy 35.4, pp. 313–359.
Gillon, Brendan S. 2015. “Constituency and cotextual dependence in Classical Sanskrit”. In: Sanskrit syntax: selected papers presented at the seminar on Sanskrit syntax and discourse structures. Ed. by Peter M. Scharf. The Sanskrit Library, pp. 237–267.
Gillon, Brendan S. 2018. Grammatical structure and its interpretation: an introduction to natural language semantics. Cambridge, Massachusetts: MIT Press.
Grevisse, Maurice. 1964. Le Bon Usage. Gembloux, Belgium: Duculot.
Herbst, Thomas. 1988. “A valency model for nouns in English”. Journal of Linguistics 24, pp. 265–301.
Huddleston, Rodney. 2002. “The clause: complements”. In: The Cambridge grammar of the English language. Ed. by Rodney Huddleston and Geoffrey K. Pullum. Cambridge University Press, pp. 213–322.
Monier-Williams, Monier. 1899. A Sanskrit English dictionary. Oxford, England: Oxford University Press.
Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech, and Jan Svartvik. 1985. A comprehensive grammar of the English language. London, England: Longman.

TEITagger: Raising the standard for digital texts to facilitate interchange with linguistic software

Peter M. Scharf

Abstract: For several years, members of the International Sanskrit Computational Linguistics Consortium working to facilitate interchange between digital repositories of Sanskrit texts and digital parsers and syntactic analyzers have recognized the need to standardize reference to particular passages in digital texts. XML has emerged as the most important standard format for document structure and data interchange, and TEI as the most important standard for the XML markup of textual documents. TEI provides methods to precisely describe divisions in texts from major sections to individual morphemes, and to associate various versions with each other. Responsible text archives, such as TITUS and SARIT, have adopted the TEI standard for their texts. After a workshop in May 2017 to train doctoral candidates at the Rashtriya Sanskrit Sansthan to mark up texts in accordance with TEI, the Sanskrit Library developed software to semi-automate the process with extensive use of regular expressions and meter-identification software, and is currently marking up all of its texts using the TEITagger. The result will be a large repository of digital Sanskrit texts that can furnish text to the Sanskrit Heritage parser and the University of Hyderabad’s parser and syntax analyzer, to allow passages parsed and analyzed for dependency structure to be interlinked with their originals.

1 XML and TEI

In the age in which oral productions and hand-written documents were the predominant mode of expressing knowledge and exchanging information, each individual articulation or manuscript had its own format determined by the author and heard or read by other individuals. In the age of the print medium, presses produced multiple copies of individual productions which could be widely distributed to numerous other individuals. At the outset of the digital age, as Scharf and Hyman (2011, p. 2) and Scharf (2014, p. 16) noted, presentation of individual productions imitated the print medium. Document creators and software engineers created works to present knowledge to human readers. As Goldfarb (1990) noted, unfortunately the tendency persists as “their worst habits”, as if their productions were meant only for human eyes and had no need to coordinate with software developed by others. In 1969, however, Goldfarb, Mosher, and Lorie at International Business Machines Corporation (IBM) developed the Generalized Markup Language (GML), so called based on their initials (Goldfarb 1990, p. xiv), to mark up documents in terms of the inherent character of their constituents, such as prose, header, list, table, etc., to enable software to format the documents variously for various devices, such as printers and display screens, by specifying a display profile without changing the document itself (Wikipedia contributors 2017). Over the next decade, Goldfarb and others developed the international Standard Generalized Markup Language (SGML), International Organization for Standardization (ISO) standard 8879, to describe documents according to their structural and other semantic elements without reference to how such elements should be displayed. Thus, in contrast to the HyperText Markup Language (HTML), which was designed to specify the display format of a text, SGML separates the inherent structure of a document from how it is presented to human readers and “allows coded text to be reused in ways not anticipated by the coder” (Goldfarb 1990, p. xiii).

The eXtensible Markup Language (XML) is an open-source meta-language consisting of a stripped-down version of SGML, formally adopted as a standard by the World Wide Web Consortium (W3C) in February 1998. In the couple of decades since, XML has become the single most important standard format for document structure and data interchange.
Wüstner, Buxmann, and Braun (1998) noted, “XML has quickly emerged as an essential building block for new technologies, offering a flexible way to create and share information formats and content across the Internet, the World Wide Web, and other networks.” Benko (2000, p. 5) noted, “XML is expected to become the dominant format for electronic data interchange (EDI).” A few years ago, Zazueta (2014) noted, “XML emerged as a front runner to represent data exchanged via APIs early on;” whereas “Javascript Object Notation (JSON), emerged as a standard for easily exchanging Javascript object data between systems.” He continues,

API designers these days tend to land on one of two formats for exchanging data between their servers and client developers - XML or JSON. Though a number of different formats for data have been designed and promoted over the years, XML’s built in validation properties and JSON’s agility have helped both formats emerge as leaders in the API space.

Benko (2000, p. 2) also noted that two of the seven benefits the W3C defines for establishing XML include the following:

• Allow industries to define platform-independent protocols for the exchange of data.
• Deliver information to user agents in a form that allows automatic processing after receipt.

As a simple metalanguage consisting of just seven characters (<, >, /, =, ", ', ␣), XML allows users to develop markup languages of an unlimited variety. In order to facilitate interchange of textual documents, the Text Encoding Initiative (TEI) developed a community-based standard for the representation and encoding of texts in digital form.
The TEI Guidelines for Electronic Text Encoding and Interchange define and document a markup language for representing the structural, renditional, and conceptual features of texts. They focus (though not exclusively) on the encoding of documents in the humanities and social sciences, and in particular on the representation of primary source materials for research and analysis. The Text Encoding Initiative also makes the Guidelines and the XML schemas that validate them available under an open-source license. TEI has become the most important standard for the XML markup of textual documents. Hence, to facilitate the interchange, cross-reference, and unanticipated use of digital Sanskrit text, it is imperative that digital archives of Sanskrit texts make their texts available encoded in XML in accordance with the TEI Guidelines.

2 Sanskrit digital archives and the use of TEI

A number of organizations and individuals, such as Google Books, the Million Books Project, Archive.org, the Digital Library of India, and the Vedic Reserve at Maharishi International University, have made images and PDF documents of Sanskrit printed texts available, and a number of libraries, such as the University of Pennsylvania in Philadelphia and the Raghunath Temple Sanskrit Manuscript Library in Jammu, have made images of their Sanskrit manuscripts available. Such productions have greatly facilitated access to primary source materials; yet that access is limited exclusively to being read by a human being. Although Jim Funderburk developed software to search headwords in a list and highlight that headword in digital images of dictionary pages, and Scharf and Bunker developed software to approximate the location of passages in digital images of Sanskrit manuscripts, the results of such software are also merely displays for a human reader.
PDFs do not facilitate automatic processing after receipt.

Numerous groups and individuals of various backgrounds have created digital editions of Sanskrit texts and made them available on portable digital storage media and the Web. As opposed to image data, these documents consist of machine-readable character data. Most of these are structured in simple data structures, such as lines of text numbered with a composite chapter-section-line number, in text files or directly in HTML files. These documents are intended to permit a human to access passages by searching as well as by sequential reading. While the various providers of digital text are too numerous to mention, one site has emerged as a central registry: the Göttingen Register of Electronic Texts in Indian Languages (GRETIL) lists about eight hundred such Sanskrit texts. These texts are openly available for download so that others may subject them to various sorts of linguistic processing such as metrical, morphological, and syntactic analysis. As great a service as making these texts available in digital form is, GRETIL exerted minimal discipline on its early contributors, so that there is great variability in the specification of metadata. In many cases, the source edition of the text is unknown. In addition, each contributor was free to structure the document as he wished, so there is great variability in the manner of formatting verse and enumerating lines.

Although GRETIL offers the texts in a few common standard encodings, including UTF-8 Unicode Romanization, there is variability in how the contributors employed capitalization, encoded diphthongs versus contiguous vowel sequences, punctuation, etc. Texts available from other sources use Devanāgarī Unicode, different ASCII meta-encodings, or legacy pre-Unicode fonts. Scharf and Hyman (2011) and Scharf (2014) have already dealt with the issues regarding character encoding.
Here I address higher-level text and document structure encoding.

Even by 2006, at the start of the International Digital Sanskrit Library Integration project, the Thesaurus Indogermanischer Text- und Sprachmaterialien (TITUS), which contributed its texts for integration with dictionaries produced by the Cologne Digital Sanskrit Dictionaries project via morphological analysis software produced by Scharf and Hyman at Brown, had begun partially using TEI tags to mark up the structure of its texts and metadata. Over the past four years, the Search and Retrieval of Indic Texts project (SARIT) marked up all of the texts which had previously been made available in various ad hoc formats at the Indology website, and some twenty additional texts, in a consistent encoding in accordance with the TEI standard. The site (http://sarit.indology.info) currently houses fifty-nine Sanskrit TEI documents made available under a Creative Commons license and provides clear instructions for how to mark up Sanskrit texts in accordance with TEI.

3 TEI training

At the behest of the SARIT project, in an initial attempt to spur large-scale encoding of Sanskrit texts in accordance with the TEI standard, I conducted a one-week e-text tutorial at the Rashtriya Sanskrit Sansthan’s Jaipur campus in February 2010. While several participants produced TEI versions of small portions of texts, the workshop failed to instigate the collaboration of technical expertise and abundant Sanskrit-knowing labor that SARIT had hoped for. In May 2017, however, I was invited by the Rashtriya Sanskrit Sansthan to conduct a two-week TEI workshop at its Ganga Nath Jha campus in Allahabad. There I trained twenty Sanskrit doctoral candidates in how to encode texts and catalogue manuscripts in accordance with the TEI Guidelines.
In an additional week I worked with these students to encode twenty Sanskrit works in accordance with TEI, ten of which were delivered complete in the following month.

During the workshop, I trained students to analyze the structure of a plain-text data file with Sanskrit text in numbered lines or verses and to construct regular expressions to recognize strings of text with fixed numbers of syllables. We constructed regular expressions to recognize a few common verse patterns and had the students submit the verses found to the Sanskrit Library’s meter analyzer, produced and described by Melnad, Goyal, and Scharf (2015a,b). Once we knew that verses with a certain number of syllables were typically in a certain metrical pattern, we constructed replacement expressions to transform the recognized pattern into well-formed TEI line group elements (lg) with subordinate line (l) and segment (seg) elements for each verse quarter (pāda), and to insert type, analysis, and metrical pattern attributes (type, ana, met) in the lg tag. The replacement expressions inserted the enumeration provided by the source document in the n and xml:id attributes of the lg tag, and typed and lettered the verse quarters as well. Where a composite number combined the numbers of text divisions, subdivisions, and passages within subdivisions, the regular expression placed just the last of these in a separate group, and the replacement expression inserted that number in the value of the n attribute while putting the whole number in the value of the xml:id attribute.
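The procedure just described can be sketched in Python. This is an illustration, not the expression shown in Figure 1: it assumes SLP1 input in the numbered format of the source document (eight digits for parvan, adhyāya, and verse, then a line letter), counts syllables by counting SLP1 vowel letters (each vowel or diphthong is a single letter in SLP1), and uses a hypothetical xml:id prefix m and a hypothetical type value trizwuB. Group 2 captures only the last component of the composite number for n, while groups 1 and 2 together fill xml:id.

```python
import re

# SLP1 writes each vowel or diphthong with one letter, so every syllable
# contains exactly one of these characters as its nucleus.
VOWEL = "[aAiIuUfFxXeEoO]"
# Anything else, except the pāda separator ';' and a line break.
NON_VOWEL = "[^aAiIuUfFxXeEoO;\n]"


def syllables(n):
    """Regex fragment matching a stretch of text with exactly n vowel letters."""
    return f"{NON_VOWEL}*(?:{VOWEL}{NON_VOWEL}*){{{n}}}"


PADA = syllables(11)  # a Triṣṭubh pāda has 11 syllables

# Two source lines, 'NNNNNNNNa pāda; pāda' and 'NNNNNNNNc pāda; pāda'.
# Group 1 = parvan + adhyāya digits, group 2 = verse number only.
TRISTUBH = re.compile(
    rf"^(\d{{5}})(\d{{3}})a ({PADA}); ({PADA})\n\1\2c ({PADA}); ({PADA})$",
    re.MULTILINE,
)

# n receives only the verse number, xml:id the whole number.
# The 'm' prefix and the type value are hypothetical placeholders.
LG = (
    r'<lg n="\2" xml:id="m\1\2" type="trizwuB">' "\n"
    r'  <l><seg n="a">\3</seg><seg n="b">\4</seg></l>' "\n"
    r'  <l><seg n="c">\5</seg><seg n="d">\6</seg></l>' "\n"
    r"</lg>"
)

source = (
    "06024070a ApUryamARam acalapratizWaM; samudram ApaH praviSanti yadvat\n"
    "06024070c tadvat kAmA yaM praviSanti sarve; sa SAntim Apnoti na kAmakAmI"
)
print(TRISTUBH.sub(LG, source))
```

As with the workshop expressions, n receives 070 with its leading zeros, so a separate pass still has to remove them; a pāda with ten or twelve syllables simply fails to match.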
For example, the regular expression and replacement expression shown in Figure 1 were primarily responsible for transforming the following verse of the Bhagavadgītā (in Sanskrit Library ASCII encoding) into the well-structured TEI lg element with its subsidiaries shown in Figure 2:

06024070a ApUryamARam acalapratizWaM; samudram ApaH praviSanti yadvat
06024070c tadvat kAmA yaM praviSanti sarve; sa SAntim Apnoti na kAmakAmI

I say “primarily responsible” because in fact the leading zeroes on the number of the verse were captured by this regular expression, so that ‘070’ was inserted in the value of the n attribute; an additional regular expression removed them.

One will notice that the original text document conveniently indicated the break between the two verse quarters in each line of a Triṣṭubh verse by a semicolon and space. This indication allowed the regular expression to group just the text of each verse quarter without leading or trailing spaces. However, no such indication was given for the break between verse quarters in an Anuṣṭubh verse, because there is frequently no word break at the pāda boundary of the ubiquitous śloka. One would want to preserve the information whether or not there is a word break there, yet would not want a pāda to begin with a space. Hence, after a regular expression inserted each verse quarter in a seg element, subsequent regular expressions moved leading spaces, where found, from the beginning of the second seg to the end of the first, and set the second verse quarter on a separate line.
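The two clean-up passes described above might look like this in Python. The attribute layout (lg and seg elements carrying n attributes) and both expressions are illustrative assumptions, not the expressions actually used.

```python
import re

# Pass 1: strip leading zeros from n values (n="070" -> n="70").
# The lookbehind requires whitespace before 'n=' so xml:id is left untouched.
LEADING_ZEROS = re.compile(r'(?<=\s)n="0+(\d)')

# Pass 2: a space at the start of the second seg (a word break at the pāda
# boundary) moves to the end of the first seg, and the second seg goes on
# its own line; without a space, only the line break is inserted.
SEG_BOUNDARY = re.compile(r"</seg>(<seg[^>]*>)( ?)")


def tidy(xml):
    xml = LEADING_ZEROS.sub(r'n="\1', xml)
    return SEG_BOUNDARY.sub(r"\2</seg>\n\1", xml)


sample = (
    '<lg n="001" xml:id="m06023001"><l><seg n="a">Darmakzetre kurukzetre</seg>'
    '<seg n="b"> samavetA yuyutsavaH</seg></l></lg>'
)
print(tidy(sample))
```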
Thus the first verse of the Bhagavadgītā,

06023001a Darmakzetre kurukzetre samavetA yuyutsavaH
06023001c mAmakAH pARqavAS cEva kim akurvata saMjaya

was marked up in TEI and reformatted as shown in Figure 3, with each verse quarter in a separate seg element.

Figure 1: Regular expression and replacement expression to transform a plain text verse in Triṣṭubh meter to TEI
Figure 2: Bhagavadgītā 2.70 in Triṣṭubh meter

I also trained students in the workshop to compose regular expressions to capture the speaker lines that introduce speeches, such as Dhr̥tarāṣṭra uvāca, and to compose replacement expressions to put these in speaker elements. Similarly, I taught them to mark up prose sentences and paragraphs in s and p elements, to put speeches in sp elements, to insert head and trailer elements, to locate and capture the enumeration of divisions, to insert div elements, to enclose the whole in body and text elements, to insert page and line break elements, and to mark up bibliography. I then had them insert these elements into a TEI template containing a teiHeader, and validate the complete TEI document. Figure 4 shows the first short speech of the Bhagavadgītā with the speaker element in the context of the opening tags of its parent sp, div, body, and text elements.
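Speaker lines follow a fixed formula ending in uvāca (“said”), so a single line-anchored expression can capture them. A minimal Python sketch, assuming SLP1 text with each speaker line on a line of its own (in SLP1, Dhr̥tarāṣṭra uvāca is written DftarAzwra uvAca):

```python
import re

# A speaker line: one or two words followed by 'uvAca', alone on its line.
SPEAKER_LINE = re.compile(r"^(\S+(?: \S+)? uvAca)$", re.MULTILINE)


def tag_speakers(text):
    """Wrap each speaker line of plain SLP1 text in a TEI speaker element."""
    return SPEAKER_LINE.sub(r"<speaker>\1</speaker>", text)


sample = "DftarAzwra uvAca\nDarmakzetre kurukzetre samavetA yuyutsavaH"
print(tag_speakers(sample))
```

A verse line that happens to consist of one or two words plus uvAca would also match, so a real driver would probably need an extra constraint, for example on the line numbering.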
Let me remark that guidelines for marking up Sanskrit text in accordance with TEI are conveniently available on the SARIT website.[1]

[1] http://sarit.indology.info/exist/apps/sarit-pm/docs/encoding-guidelines-simple.html

Figure 3: Bhagavadgītā 1.1 in Anuṣṭubh meter
Figure 4: TEI markup of a speech in the context of division, body, and text elements

4 TEITagger software

After the experience of teaching Sanskrit students with minimal technical literacy to transform a plain-text document into well-structured XML in accordance with TEI in a series of well-ordered steps, it occurred to me that I could also teach a machine to do the same. Ralph Bunker, the technical director of the Sanskrit Library, had previously developed software called Linguistic Mapper at my request, so that I could compile a driver file containing a sequence of regular and replacement expressions that implemented historical sound change rules between a proto-language and a descendant language. We created TEITagger by modifying Linguistic Mapper to process a series of such sets of regular and replacement expressions, which matched specified numbers of syllables in arrangements that approximated metrical patterns. By creating a regular expression that counted the correct number of syllables per pāda, we could convert every such verse to proper TEI markup in lg elements, with each line in an l element and each pāda in a seg element. At the same time we could number the verse in an n attribute, insert an xml:id, and insert the presumed meter name and metrical pattern in a type attribute.
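TEITagger’s driver-file syntax is not reproduced in this paper, so the following Python sketch invents a minimal one (one tab-separated regular expression and replacement per line, # for comments) purely to show the architecture described above: an ordered list of regular and replacement expressions, each applied to the output of the previous one. The format and the two rules are hypothetical.

```python
import re


def parse_driver(lines):
    """Parse a hypothetical driver format: 'regex<TAB>replacement' per line;
    blank lines and lines starting with '#' are ignored."""
    rules = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        pattern, replacement = line.split("\t", 1)
        rules.append((re.compile(pattern, re.MULTILINE), replacement))
    return rules


def run_driver(text, rules):
    """Apply every rule in file order; each sees the previous rule's output."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


driver = [
    "# 1. collapse runs of spaces",
    " {2,}\t ",
    "# 2. tag speaker lines (relies on step 1)",
    r"^(\S+ uvAca)$" + "\t" + r"<speaker>\1</speaker>",
]
print(run_driver("DftarAzwra  uvAca", parse_driver(driver)))
```

Reversing the two rules leaves the speaker line untagged, which illustrates why the steps must be well ordered.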
In the first version of TEITagger, the meter name and metrical pattern were presumed on the basis of the syllable count; they were not automatically checked against a pattern of light and heavy syllables.

We then revised TEITagger so that it could submit a segment of text that matched a certain regular expression to our meter identification software, which identifies the meter of a whole verse by checking the passage against patterns of light and heavy syllables as defined by classical metrical texts. If a match is found, TEITagger version 2 automatically inserts the meter name, general type, and metrical pattern in the type, ana, and met attributes of the lg element. To simplify the formulation of regular expressions in the command driver file for this program, we composed macros to represent vowels, consonants, syllables, syllable codas, and the typical terms used in the lines that introduce speeches. These macros are shown in Figure 5.

To further simplify testing segments of text for any meter type with any number of syllables, we introduced an iterative loop command and iteration variable in version 3. Thus, for example, with a command that consists of the single regular expression and replacement expression shown in Figure 6, TEITagger can evaluate every segment of text in a file that has four verse quarters of n syllables each, where the variable n is tested in order from 28 down to 1, thereby testing for all verses with the same number of syllables in every verse quarter. Metrical patterns with the same number of syllables per verse quarter include all 468 of the samavr̥tta and upajāti types as well as some of the ardhasamavr̥tta and viṣamavr̥tta types. Similar expressions can be composed to match verses with unequal numbers of syllables per verse quarter.
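The macro and iteration ideas can be illustrated with a small Python sketch. The @NAME@ macro syntax, the macro definitions, and the one-pāda-per-line input layout are assumptions for illustration (Figures 5 and 6 show TEITagger’s own); as described above, n is tried from 28 down to 1, but the sketch only reports the syllable count instead of calling a meter analyzer.

```python
import re

# Hypothetical macro table; TEITagger's actual macros are shown in Figure 5.
MACROS = {
    "V": "[aAiIuUfFxXeEoO]",     # an SLP1 vowel letter
    "C": "[^aAiIuUfFxXeEoO\\n]",  # any other character on the same line
}
MACROS["SYL"] = f"(?:{MACROS['V']}{MACROS['C']}*)"  # a vowel and what follows it


def expand(template, n):
    """Substitute the iteration variable @n@ and the @NAME@ macros in a template."""
    out = template.replace("@n@", str(n))
    for name, body in MACROS.items():
        out = out.replace(f"@{name}@", body)
    return out


# One verse = four pādas, one per line, each with exactly n syllables.
PADA_TEMPLATE = "@C@*@SYL@{@n@}"
VERSE_TEMPLATE = r"\n".join([PADA_TEMPLATE] * 4)


def syllables_per_pada(segment, n_max=28):
    """Try n = n_max, n_max - 1, ..., 1 and return the first n that matches."""
    for n in range(n_max, 0, -1):
        if re.fullmatch(expand(VERSE_TEMPLATE, n), segment):
            return n
    return None


anustubh = ("Darmakzetre kurukzetre\nsamavetA yuyutsavaH\n"
            "mAmakAH pARqavAS cEva\nkim akurvata saMjaya")
print(syllables_per_pada(anustubh))
```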
Such metrical patterns include those of the ardhasamavr̥tta and mātrāvr̥tta types as well as irregular variations of more regular patterns. The current version (17) also passes verse lines and individual pādas to the meter analyzer to detect their patterns in irregular verses.

Figure 5: TEITagger macros
Figure 6: TEITagger iterative command to match verses with four pādas with n syllables per pāda, where an arbitrary range can be specified for n

The TEITagger driver file also accepts commands to insert header and footer files, so that one can add the opening XML file tags, open and close body and text tags, open and close TEI tags, and a teiHeader. Finally, TEITagger will pretty-print the file if it is a valid XML file.

5 Philological use of the TEITagger software

Metrical analysis of Vedic, epic, and classical Sanskrit texts is not new. For instance, metrical analysis of the Mahābhārata has produced interesting results that bear on the composition of the text and its history. Edgerton (1939) distinguished regular and irregular varieties of Triṣṭubh and Jagatī meters that were significantly divided between the Virāṭaparvan and the Sabhāparvan respectively, and thereby demonstrated the separate composition, and probable subsequent insertion, of the Virāṭaparvan in the text of the Mahābhārata. He also described several regular patterns in the hypermetric and hypometric irregular varieties based upon the location of the caesura.

Fitzgerald (2006) reported the results of analyzing a database of the Triṣṭubh and Jagatī verses he had assembled over the preceding couple of decades.
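A sketch of the final assembly step in Python, using only the standard library. The header and footer strings are placeholders (an empty teiHeader is not valid TEI), and minidom checks only that the document is well-formed; validating it against the TEI schema would require a separate validator.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

# Placeholder header and footer; a real teiHeader carries the file description.
HEADER = ('<TEI xmlns="http://www.tei-c.org/ns/1.0"><teiHeader/>'
          "<text><body>")
FOOTER = "</body></text></TEI>"


def assemble(body_xml, header=HEADER, footer=FOOTER):
    """Wrap tagged body markup and pretty-print it if it parses as XML."""
    document = header + body_xml + footer
    try:
        return minidom.parseString(document).toprettyxml(indent="  ")
    except ExpatError:
        # Not well-formed: return the text unchanged so it can be inspected.
        return document


print(assemble('<lg n="1"><l><seg n="a">x</seg></l></lg>'))
```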
He analyzed these metrical patterns into five segments: the initial syllable, three sets of three syllables each (the opening, break, and cadence), and the final syllable. He identified three standard varieties of Triṣṭubh: (1) a regular Upajāti consisting of alternating pādas of Indravajrā and Upendravajrā, (2) Śālinī, and (3) Vātormī; and one standard variety of Jagatī: an Upajāti consisting of alternating pādas of Vaṁśasthā and Indravaṁśā. Fitzgerald (2009) isolated two measurable variables: (1) the degree of uniformity among the pādas of the Triṣṭubh stanzas, and (2) the set of major Triṣṭubh features that were eliminated in the creation of the classical standard Triṣṭubh.
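Fitzgerald’s five-way segmentation of an 11-syllable pāda amounts to slicing its light/heavy pattern. The Python sketch below pairs that slicing with a deliberately simplified weight calculation for SLP1 text; it is an approximation for illustration, not the Sanskrit Library’s meter analyzer, and it ignores pāda-final anceps and other refinements.

```python
VOWELS = set("aAiIuUfFxXeEoO")
LONG = set("AIUFXeEoO")  # long vowels and diphthongs in SLP1


def weights(pada):
    """Return 'l' (light) or 'g' (heavy) for each syllable of an SLP1 pāda.

    Simplified rule: a syllable is heavy if its vowel is long, or if the vowel
    is followed by anusvāra (M), visarga (H), or two or more consonants before
    the next vowel. Spaces and punctuation are ignored.
    """
    letters = [c for c in pada if c.isalpha()]
    pattern = []
    for i, c in enumerate(letters):
        if c not in VOWELS:
            continue
        following = []  # consonants between this vowel and the next one
        j = i + 1
        while j < len(letters) and letters[j] not in VOWELS:
            following.append(letters[j])
            j += 1
        heavy = c in LONG or "M" in following or "H" in following or len(following) >= 2
        pattern.append("g" if heavy else "l")
    return "".join(pattern)


def segments(pattern):
    """Split an 11-syllable pattern into initial, opening, break, cadence, final."""
    if len(pattern) != 11:
        raise ValueError("expected 11 syllables")
    return pattern[0], pattern[1:4], pattern[4:7], pattern[7:10], pattern[10]


print(segments(weights("ApUryamARam acalapratizWaM")))
```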
He isolated passages on the basis of runs of Triṣṭubh and Jagatī verses and measured the uniformity within verses in these passages in an attempt to locate discontinuities that might signal different periods of composition. Fitzgerald (2004) argued, “if we are able to make reasonable arguments about historical fissures in the text, we thereby enrich our understanding of the text’s possible meanings … by distinguishing multiple voices, dialogical tension, and innovation within the otherwise synchronic, unitary, received text.” In his careful unpublished study of the episode of the dice match, he was able to counter the conclusions of Söhnen-Thieme (1999) and to conclude that “this whole episode, the Upajāti passage of chapter 60 in which Duḥśāsana drags Draupadī into the sabhā by the hair, is likely later than most or all of the rest of this episode.”

Work of the sort that Edgerton and Fitzgerald have done, with careful evaluation of statistics gathered with great effort over a long time, could be vastly simplified and assisted by the automation provided by TEITagger. After testing TEITagger version 2 on the Bhagavadgītā, I tagged the entire critical edition of the Mahābhārata within a week, including verses with irregular patterns such as hypermetric or hypometric pādas. A driver file of nearly a thousand lines individually matched every possible combination of syllable counts per pāda, in triple-line and single-line verses as well as in the normal double-line verses.
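Generating one expression per combination of syllable counts, which that driver file spelled out individually, is easy to automate. A Python sketch under stated assumptions: four pādas in a single string separated by “; ”, 10, 11, or 12 syllables per pāda, and syllables counted as SLP1 vowel letters. The real driver file also covered triple-line and single-line verses and assigned a meter type to each combination; this sketch only reports which combination matched.

```python
import re
from itertools import product

VOWEL = "[aAiIuUfFxXeEoO]"
NON_VOWEL = "[^aAiIuUfFxXeEoO;\n]"


def syllables(n):
    """Regex fragment for a pāda with exactly n syllables (SLP1 vowel letters)."""
    return f"{NON_VOWEL}*(?:{VOWEL}{NON_VOWEL}*){{{n}}}"


def combination_rules(counts=(10, 11, 12), padas=4):
    """One compiled regex per combination of syllable counts, pādas joined by '; '."""
    return [
        (combo, re.compile("; ".join(f"({syllables(n)})" for n in combo)))
        for combo in product(counts, repeat=padas)
    ]


RULES = combination_rules()


def classify(segment):
    """Return the syllable counts of the first rule that matches the whole segment."""
    for combo, rx in RULES:
        if rx.fullmatch(segment):
            return combo
    return None


# A schematic verse with a hypermetric first pāda ('ta' = one syllable).
print(classify("; ".join(["ta" * 12, "ta" * 11, "ta" * 11, "ta" * 11])))
```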
For example, one set of a regular expression and its replacement expression targets triple-line Triṣṭubh verses with a hypermetric first pāda, another targets such verses with a hypermetric second pāda, and so on. The driver file assumed that such deviant metrical patterns ought to be classified under a certain type despite the failure of the meter analyzer to find a regular type. The task preceded and inspired the development of our iteration command and of the commands to send verse lines and pādas to the meter analyzer described in the previous section. The driver file I developed to tag the Bhāgavatapurāṇa with these features added consists of only 318 lines.

TEITagger version 2 tagged 73,436 verses and 1,057 prose sentences in 386 paragraphs. The verses include 68,860 Anuṣṭubhs, 2,970 Triṣṭubhs, 431 Jagatī, 322 Indravajrā, 0 Upendravajrā, 496 of the standard Upajāti variety alternating the two preceding, 88 Śālā, 78 Vāṇī (other Upajātis), 31 Aparavaktra (an ardhasamavr̥tta meter), 22 Praharṣiṇī, 16 Rucirā, 9 Mālinī, 4 Vasantatilakā, 4 Puṣpitāgrā, 1 Śārdūlavikrīḍita, 1 Halamukhī, 1 Āryāgīti (a type of Āryā), 1 mixture of half Kāmakrīḍā and half Kāmukī, and a hundred unidentified.
The unidentified metrical patterns include, for instance, 1 mixture of half Kāmukī and half unidentified, 1 mixture of a deviant pāda with subsequent Anuṣṭubh, Jagatī, and Triṣṭubh pādas, as well as 98 other uninvestigated unidentified patterns.

The results of TEITagger version 2 are presented in Table 1 in comparison with some of the results Fitzgerald (2009) reported. There is a minor discrepancy of one passage in the enumeration of the prose passages, whose cause remains to be investigated. Otherwise there is astonishing consistency in the enumeration of the prose and verse passages. There is a discrepancy of just two verses of the Anuṣṭubh meter. The discrepancy of 41 Triṣṭubh/Jagatī verses and 52 fancy meters is probably largely due to TEITagger’s incorrect assumption that a number of irregular meters with 11–12 syllables per pāda were of this type rather than fancy metrical patterns: if the meter analyzer failed to identify a verse, TEITagger relied on syllable count alone to classify it.

Using TEITagger version 17, with the more refined feature of sending verse lines and quarters to the meter analyzer, and with some revision of the meter analyzer itself, I reevaluated the metrical patterns of the Mahābhārata. In this version, I made no assumptions about the conformity of deviant patterns to regular types; instead, where the meter analyzer failed to find a match for a verse, I permitted it to seek a match for each line of the verse and, failing to find a match for a line, to seek a match for each pāda in the line.
Where lines or pādas within a verse were identified as the same, the metrical information was combined so that, along with a single type classification for the verse, only the deviant lines or pādas are classified separately. Labels consisting of the meter names in SLP1 for each different meter found within a verse are separated by a forward slash in the value of the type attribute of the lg element that contains the verse in the TEI file. These labels are preceded by letters indicating the pādas so labeled.

Table 2 shows the numbers of verses with one to six metrical identifications for the verse as a whole or for parts of the verse individually. Table 3 shows the meters recognized. Column three of Table 3 shows the number of instances of the meter named in column one that were recognized as a whole verse. Column four shows the number of additional sets of double lines recognized within triple-line meters. Column five shows the number of lines recognized in verses not recognized as verses or sets of double lines. Column six shows the number of pādas recognized in lines not recognized as lines. The first line of each section divided by double horizontal lines tallies the numbers of that general metrical type. Rows beginning with Upajāti in bold in the Triṣṭubh and Jagatī sections tally the numbers for the Upajāti type patterns listed in subsequent rows within the same section. The Upajāti numbers are included in the tally for the section as a whole as well. At the bottom of the table, the row labeled Identified in bold summarizes the total number of verses, additional pairs of lines, additional lines, and additional verse quarters recognized. The row labeled No type shows the number of verses not recognized before querying the meter analyzer regarding lines and pādas, and the total number of pādas that remain unidentified.
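The verse, line, and pāda cascade and the merging of identical results can be sketched in Python. Everything here is an assumption for illustration: the toy analyzer recognizes only uniform groups of Indravajrā or Upendravajrā pādas (a real analyzer would already recognize a mixed verse as Upajāti at the first step), the weight patterns are given directly, and the label syntax letters:name is hypothetical; the paper states only that pāda letters precede each SLP1 meter name, that labels are separated by slashes, and that unidentified pādas receive no_type.

```python
# Toy stand-in for the meter analyzer: weight patterns of two pāda types.
TOY_PADAS = {
    "gglggllglgg": "indravajrA",
    "lglggllglgg": "upendravajrA",
}


def toy_identify(pattern):
    """Identify a '|'-joined verse, line, or pāda if all its pādas share one type."""
    names = {TOY_PADAS.get(p) for p in pattern.split("|")}
    if len(names) == 1 and None not in names:
        return names.pop()
    return None


def label_verse(padas, identify=toy_identify, letters="abcdef"):
    """Try the whole verse, then each line (pair of pādas), then each pāda,
    and merge identical results into one 'letters:name' label per meter."""
    whole = identify("|".join(padas))
    if whole:
        return whole
    found = []  # (pāda letters, meter name)
    for k in range(0, len(padas), 2):
        line = padas[k:k + 2]
        name = identify("|".join(line))
        if name:
            found.append((letters[k:k + len(line)], name))
        else:
            for i in range(k, k + len(line)):
                found.append((letters[i], identify(padas[i]) or "no_type"))
    merged = {}
    for pada_letters, name in found:
        merged[name] = merged.get(name, "") + pada_letters
    return "/".join(f"{ls}:{name}" for name, ls in merged.items())


I, U, X = "gglggllglgg", "lglggllglgg", "l" * 11
print(label_verse([I, U, I, I]))
```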
The pādas that remain unidentified are provided with the label no_type within the value of the type attribute in the TEI file. No lines or line pairs are so labeled, because if they are unidentified their pādas are sent to the meter analyzer individually for analysis. The row labeled Total in bold shows the total number of verses in the Mahābhārata in column three, but in column six just the total number of pādas analyzed individually.

Table 1: Metrical and non-metrical passages in the Mahābhārata identified by TEITagger v. 2 compared with those identified by Fitzgerald

passage type | syllables/pāda | TEITagger | Fitzgerald 2009
passages | | 73,822 | 73,821
prose paragraphs | | 386 | 385
prose sentences | | 1,057 |
verse | | 73,436 | 73,436
Anuṣṭubh | 8 | 68,860 | 68,858
Triṣṭubh/Jagatī | 11–12 | 4,385 | 4,426
Triṣṭubh | 11 | 2,970 |
Indravajrā | 11 | 322 |
Upendravajrā | 11 | 0 |
Upajāti | 11 | 662 |
Indravajrā/Upendravajrā | 11 | 496 |
Śālā | 11 | 88 |
Vāṇī | 11 | 78 |
Jagatī | 12 | 431 |
Fancy meters | | 100 | 152
Halamukhī | 9 | 1 |
Aparavaktra | 13/12 | 31 |
Puṣpitāgrā | 12/13 | 4 |
Praharṣiṇī | 13 | 22 |
Rucirā | 13 | 16 |
Vasantatilakā | 14 | 4 |
Mālinī | 15 | 9 |
Kāmakrīḍā/Kāmukī | 15/16 | 1 |
Śārdūlavikrīḍita | 19 | 1 |
Āryāgīti | 7 caturmātrās + 2 | 1 |
unidentified | | 100 |

Table 2: Mixed metrical patterns in the Mahābhārata identified by TEITagger v. 17

type | identified | not fully identified | total
single | 70,242 | 3,194 | 73,436
mixed | 689 | 2,505 | 3,194
double | 85 | 4 | 89
triple | 468 | 994 | 1,462
quadruple | 129 | 1,451 | 1,580
quintuple | 5 | 23 | 28
sextuple | 2 | 33 | 35

TEITagger version 17 found matches for each of the fourteen varieties of Triṣṭubh Upajāti patterns and the several Jagatī Upajāti patterns named separately. It also found several additional samavr̥tta metrical patterns for lines and verse quarters that were not found by analyzing whole verses. Rows headed by these meter names show blanks in the columns for verses and lines where no verses or lines of that type were found. These initial results of applying TEITagger to analyze the metrical patterns in the Mahābhārata demonstrate its capacity to reveal detailed information about a massive work and to mark up the results in a way that permits computational compilation, so that these results may be presented to scholars in ways that may inspire further insight.

Table 3: Metrical patterns in the Mahābhārata identified by TEITagger v. 17

meter type | syllables/pāda | verse | 2/3 lines | line | quarter
Anuṣṭubh | 8 | 68,360 | 10 | 521 | 633
Anuṣṭubh³ | 8 | 68,322 | 10 | 518 | 610
Pramāṇikā | 8 | 38 | 0 | 1 | 22
Vidyunmālā | 8 | | | 2 | 1
Vibhā | 8 | | | | 6
Haṁsaruta | 8 | | | | 1
=====
Triṣṭubh | 11 | 1,355 | 62 | 970 | 3,252
Indravajrā | 11 | 171 | 3 | 271 | 941
Upendravajrā | 11 | 94 | 0 | 174 | 805
Vātormī | 11 | 1 | 30 | 0 | 597
Rathoddhatā | 11 | 5 | 0 | 0 | 0
Śālinī | 11 | 38 | 0 | 0 | 909
Upajāti | 11 | 1,046 | 29 | 525 | 0
Bhadrā | 11 | 68 | 2 | 167 | 0
Haṁsī | 11 | 90 | 0 | 188 | 0
Kīrti | 11 | 114 | 3 | 0 | 0
Vāṇī | 11 | 98 | 4 | 0 | 0
Mālā | 11 | 73 | 1 | 0 | 0
Śālā | 11 | 82 | 0 | 170 | 0
Māyā | 11 | 50 | 3 | 0 | 0
Jāyā | 11 | 50 | 1 | 0 | 0
Bālā | 11 | 82 | 5 | 0 | 0
Ārdrā | 11 | 68 | 3 | 0 | 0
Rāmā | 11 | 62 | 1 | 0 | 0
R̥ddhi | 11 | 85 | 3 | 0 | 0
Buddhi | 11 | 67 | 2 | 0 | 0
Siddhi | 11 | 57 | 1 | 0 | 0
=====
Jagatī | 12 | 411 | 4 | 94 | 343
Vaṁśasthā | 12 | 359 | 3 | 73 | 181
Indravaṁśā | 12 | 1 | 0 | 5 | 95
Bhujaṅgaprayāta | 12 | 3 | 0 | 0 | 0
Kāmadattā | 12 | | | | 4
Vaiśvadevī | 12 | | | 3 | 55
Śruti | 12 | | | 2 | 8
Upajāti | 12 | 48 | 0 | 16 | 0
Śaṅkhanidhi | 12 | 1 | 0 | 2 | 0
Padmanidhi | 12 | 2 | 0 | 14 | 0
Vaṁśamālā | 12 | 45 | 1 | 0 | 0
=====
Fancy | | 116 | 0 | 37 | 32
Halamukhī | 9 | 1 | 0 | 0 | 0
Śuddhavirāj | 10 | | | | 1
Aparavaktra | 13/12 | 27 | 0 | 3 | 0
Puṣpitāgrā | 12/13 | 33 | 0 | 3 | 0
Praharṣiṇī | 13 | 8 | 0 | 1 | 1
Rucirā | 13 | 28 | 0 | 11 | 28
Prabhavatī | 13 | | | | 1
Vasantatilakā | 14 | 3 | 0 | 0 | 1
Praharaṇakalikā | 14 | | | 1 | 0
Mālinī | 15 | 9 | 0 | 0 | 0
Śārdūlavikrīḍita | 19 | 1 | 0 | 0 | 0
Upagīti | 5cm+l+1cm+g | 6 | 0 | 29 | 0
Āryāgīti | 7cm+gg | 0 | 0 | 1 | 0
=====
Identified | | 70,242 | 76 | 1,622 | 4,267
No type | | 3,194 | | | 4,297
Total | | 73,436 | | | 8,564

6 Communication between TEI files and linguistic software

As mentioned in section 1, one of the principal benefits of encoding Sanskrit texts in TEI XML is that it fulfills the need to coordinate directly, without human intervention, with software developed by others, possibly in ways not anticipated. In particular, by encoding Sanskrit texts in TEI we anticipate coordinating a large repository of digital Sanskrit texts with parsers and syntax analyzers, such as the Sanskrit Heritage parser and the University of Hyderabad’s Saṁsādhanī. TEI provides robust standardized methods to coordinate various versions of texts and to refer to particular divisions and segments within a text, so that parsed and syntactically analyzed passages may be interlinked with their originals. Naturally, the highest levels of coordination between versions would require standardized identification of the repository that houses the original file from which a passage was taken and submitted to a linguistic analysis tool on another site. An attribute-value pair such as simply repository='sl', or more officially repository='US-RiPrSl' using the International Standard Identifier for Libraries and Related Organizations (ISIL, ISO 15511), might identify the Sanskrit Library as the repository. Standardized identification of the file within the repository is obviously also required, either by collection and item identifiers or by filename.
These identifiers should be interpretable programmatically as a URL, or be a URL provided directly with a submission. For example, if I submit the first verse of the unanalyzed text of the Mahābhārata to the Sanskrit Heritage parser, I might provide the URL http://sanskritlibrary.org/texts/tei/mbh1.xml with my submission.

A second level of standardized identification is required to identify the type of analysis. When the Sanskrit Library analyzed the TITUS archive’s texts for inclusion in 2006, it discovered a surprising variety in the degree and type of analysis of sandhi. Some of these encoding practices can be specified in the encoding description of a document. However, standard designation of the various degrees of analysis is needed to coordinate versions. At the least, one might consider a standard designation for the types of analysis of Sanskrit texts described in Table 4. For clarity, it is strongly recommended that these different degrees of analysis be located in separate files, not combined in a single file. TEI provides simple means of coordinating such versions by synchronizing element identifiers (xml:id).

Once a file containing the version of a text with a specific degree of analysis is identified, standardized reference to particular sections and passages is required. TEI provides machine-readable methods for declaring the elements used and the structure of references within two elements of the teiHeader:

• tagsDecl
• refsDecl

The tagging declaration may be used to document the usage of specific tags in the text and their rendition.[2] Figure 7 shows the tagsDecl element used for the Sanskrit Library’s TEI edition of the critical edition of the Mahābhārata.
Because the value of the partial attribute is specified as false, the tags listed as values of the gi attribute of the tagUsage elements are all the elements, and the only elements, that occur under the text element. The lg, l, and seg elements are used to mark up verses as shown in Figures 2, 3, and 4, the last of which also shows the use of the body, div, sp, and speaker elements. The p and s elements are used to mark up paragraphs and sentences in prose. The numbers listed as values of the occurs attribute in the tagUsage elements indicate the number of occurrences of the element named in the value of the gi attribute; the numbers shown are those for the Svargārohaṇaparvan. The elements named as values of the selector attribute of the rendition element with xml:id='skt' are all the elements, and the only elements, that render Sanskrit text in SLP1 to be transcoded to Unicode Roman, Devanāgarī, or another Indic Unicode encoding for redisplay. These elements provide all that is necessary to extract Sanskrit text from the encoding for display in HTML, and for submission as a unit to metrical, morphological, and syntactic analysis software. The rendition element with xml:id='sktat' lists all the attributes, and the only attributes, whose values are Sanskrit text in SLP1 to be transcoded. These attribute values are Sanskrit terms that might be used to build menus in an HTML display for selecting divisions such as parvan and adhyāya.

[2] See the TEI P5 guidelines at http://www.tei-c.org/release/doc/tei-p5-doc/en/html/HD.html#HD57, and http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-tagsDecl.html

Table 4: Degrees of analysis of Sanskrit texts

1. continuous text (saṁhitā-pāṭha)
   a. with breaks only where permitted in Devanāgarī script, i.e. only after word-final vowels, visarga, or anusvāra
   b. with breaks where permitted in Roman script, i.e. after consonants as well
   c. with breaks where permitted in Roman script, with designation immediately following characters representing sounds that result from single replacement sandhi at word boundaries
2. sandhi-analyzed text (pada-pāṭha)
   a. with word-final visarga throughout, without designation of compound constituents
   b. distinguishing visarga originating in final s from visarga from final r
   c. with designation (but not analysis) of compound constituents as permitted in Devanāgarī script, i.e. after constituent-final vowels, visarga, or anusvāra
   d. with designation (but not analysis) of compound constituents as permitted in Roman script, i.e. after constituent-final consonants as well
   e. with designation (but not analysis) of compound constituents as permitted in Roman script, with designation immediately following characters representing sounds that result from single replacement sandhi at constituent boundaries
   f. with analysis of sandhi between compound constituents as well
3. morphologically analyzed text
4. lexically and morphologically analyzed text
5. syntactically analyzed text
   a. dependency structure
   b. phrase structure

The reference declaration describes the reference system used in the text.[3] TEI offers the possibility of describing the pattern of canonical references formally, in a manner amenable to machine processing. A regular expression describing the pattern of the canonical reference is paired with a replacement expression that describes the path to the attributes that contain the referenced numbers (n attributes of div and lg elements in verse in the Mahābhārata, and of p and s elements in prose).
Figure 8 shows the refsDecl element of the Sanskrit Library’s TEI edition of the Svargārohaṇaparvan. The pattern shown in the matchPattern attribute of the first cRefPattern element describes a canonical reference to any verse quarter in the Mahābhārata. The three sets of digits separated by periods refer to the parvan, adhyāya, and verse; the letter refers to the pāda. For example, 6.24.70a refers to the first pāda of the seventieth verse of the twenty-fourth adhyāya of the sixth parvan, shown in Figure 2. (The 24th adhyāya of that parvan is the second in the Bhagavadgītā.) The first of the two cRefPattern elements for verse quarters gives a replacement expression that matches a path in which verses are direct children of a div element; the second gives one in which verses are children of an intervening sp element within an adhyāya. Subsequent cRefPattern elements describe shorter references to whole verses, adhyāyas, and parvans. These elements and attributes directly provide an unambiguous method to resolve canonical references to particular passages.

[3] See the TEI P5 Guidelines at http://www.tei-c.org/release/doc/tei-p5-doc/en/html/HD.html#HD54, and http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-refsDecl.html

Figure 7: The tagsDecl element in the Sanskrit Library’s TEI edition of the Svargārohaṇaparvan of the Mahābhārata
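Resolution by ordered pattern pairs can be sketched in Python. TEI replacement patterns are typically XPath expressions written inside #xpath(...); this sketch instead uses ElementTree’s limited path syntax on a namespace-free sample, and the pattern pairs, paths, and sample markup are hypothetical stand-ins for Figure 8. As in the description above, the direct-child path is tried before the path through an sp element.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical (matchPattern, replacementPattern) pairs, tried in order.
CREF_PATTERNS = [
    (r"(\d+)\.(\d+)\.(\d+)([a-f])",
     "body/div[@n='$1']/div[@n='$2']/lg[@n='$3']/l/seg[@n='$4']"),
    (r"(\d+)\.(\d+)\.(\d+)([a-f])",
     "body/div[@n='$1']/div[@n='$2']/sp/lg[@n='$3']/l/seg[@n='$4']"),
    (r"(\d+)\.(\d+)\.(\d+)",
     "body/div[@n='$1']/div[@n='$2']/lg[@n='$3']"),
    (r"(\d+)\.(\d+)\.(\d+)",
     "body/div[@n='$1']/div[@n='$2']/sp/lg[@n='$3']"),
]


def resolve(ref, text):
    """Return the element a canonical reference points to, or None."""
    for match_pattern, replacement in CREF_PATTERNS:
        m = re.fullmatch(match_pattern, ref)
        if m is None:
            continue
        # Put the captured numbers into the $1, $2, ... slots of the path.
        path = re.sub(r"\$(\d+)", lambda g: m.group(int(g.group(1))), replacement)
        hit = text.find(path)
        if hit is not None:
            return hit
    return None


SAMPLE = ET.fromstring(
    "<text><body><div n='6'><div n='24'><sp><speaker>SrIBagavAn uvAca</speaker>"
    "<lg n='70'><l><seg n='a'>ApUryamARam acalapratizWaM</seg>"
    "<seg n='b'>samudram ApaH praviSanti yadvat</seg></l></lg>"
    "</sp></div></div></body></text>"
)
print(resolve("6.24.70a", SAMPLE).text)
```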
Yet, processed in the opposite direction, from the replacement path to the match expression, the references provide a means to compose canonical references from n attributes.

Once a standard system of exact references to specific passages in unanalyzed continuous text has been adopted, references to various versions of analyzed passages are easily constructed by additionally specifying the degree of analysis described in Table 4. One method of doing this in a TEI document would be to specify the degree of analysis as a value of the ana attribute of the text element. Another would be for archives to add a standard addition to the filename.

Linguistic software that produces TEI output would add elements subordinate to those containing text in the TEI document that contains the continuous text. A document that contains analyzed sandhi but no further analysis would insert each word (pada), including compounds (samasta-pada), in a w element. A document that contains compound analysis would insert the lexical constituents of compounds in w elements subordinate to the compound’s w element. Although the types of analysis described in Table 4 do not envision tagging non-lexical morphemes such as the infix a and suffix ti in the verb gacchati, such morphemes would be inserted in an m element. TEI provides attributes that may be used for lexical and morphological analysis of each word in a w element. The stem of the word is made the value of the lemma attribute. We have chosen to make the lexical identifier a value of the type attribute and the morphological identifier a value of the subtype attribute. Figure 9 shows our TEI markup of the sandhi analysis of the first verse of the Bhagavadgītā, MBh. 6.23.1, and Figure 10 shows our TEI markup of the lexical and morphological analysis of the same verse.
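A minimal sketch of this markup, using Python's standard xml.etree.ElementTree, builds a w element whose lemma, type, and subtype attributes carry the stem, the lexical identifier, and the morphological identifier, with compound constituents nested as w children. The SLP1 word chosen (Darmakzetre, the first word of MBh. 6.23.1), its division into constituents, and the identifier value are illustrative placeholders; the Sanskrit Library's actual identifiers are not reproduced here.

```python
import xml.etree.ElementTree as ET


def w(text=None, *, lemma=None, lex_id=None, morph_id=None, parts=()):
    """Build a TEI <w> element.

    lemma    -> @lemma   (stem of the word)
    lex_id   -> @type    (lexical identifier)
    morph_id -> @subtype (morphological identifier)
    parts    -> nested <w> elements for the constituents of a compound
    """
    el = ET.Element("w")
    for name, value in (("lemma", lemma), ("type", lex_id), ("subtype", morph_id)):
        if value is not None:
            el.set(name, value)
    if parts:
        el.extend(parts)  # a compound's text is carried by its constituents
    else:
        el.text = text
    return el


# "dharmakṣetre" (SLP1 Darmakzetre) analysed as a compound;
# the subtype value is a placeholder, not a real morphological identifier.
compound = w(
    lemma="Darmakzetra",
    parts=[
        w("Darma", lemma="Darma"),
        w("kzetre", lemma="kzetra", morph_id="MORPH-ID-PLACEHOLDER"),
    ],
)
print(ET.tostring(compound, encoding="unicode"))
```

Keeping the surface text only on the innermost w elements means that concatenating the text nodes of a compound reproduces the sandhi-analyzed word.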
Where authors deliberately compose passages that are amenable to more than one analysis (śleṣa), alternative analyses, whether of verses, lines, verse quarters, prose passages, or individual words, may be placed in separate files. To permit coordination, these may be given the same division numbers and xml:ids as their unanalyzed passages and the preferred analysis.

As a result of standardized coordination of markup and reference between Sanskrit text archives and Sanskrit computational software, HTML displays showing the unanalyzed version of a verse might include a set of links to various analyzed versions for the convenience of students and scholars of Sanskrit. Conversely, displays of the results of analysis of a passage might also provide links to the unanalyzed source.

Figure 8: The refsDecl element in the Sanskrit Library’s TEI edition of the Svargārohaṇaparvan of the Mahābhārata

Figure 9: TEI markup of the sandhi analysis of MBh. 6.23.1, the first verse of the Bhagavadgītā

Figure 10: TEI markup of the lexical and morphological analysis of MBh. 6.23.1, the first verse of the Bhagavadgītā

Preliminary Design of a Sanskrit Corpus Manager

Gérard Huet and Idir Lankri

Abstract: We propose a methodology for the collaborative annotation of a digitized Sanskrit corpus tagged with grammatical information. The main features of the proposal are a fine-grained view of the corpus at the sentence level, allowing expression of inter-textuality; a sparse representation, allowing acquisition that need not be sequential; and distributed collaborative development using Git technology. A prototype Sanskrit Corpus Manager has been implemented as a proof of concept, in the framework of the Sanskrit Heritage Platform. Possible extensions and potential problems are discussed.

1 Introduction

Several digital libraries of Sanskrit texts have been developed so far. We may mention the GRETIL site of Göttingen University,[1] which has fair coverage, in various formats.
The Sarit site,[2] developed by Dominik Wujastyk, Patrick McAllister and other Indology colleagues, contains a smaller corpus, but it follows a uniform format compliant with the Text Encoding Initiative (TEI) standard, and has a nice interface. Furthermore, it benefits from a collaborative acquisition framework using Git technology. The Sanskrit Library,[3] developed by Peter Scharf and colleagues, also follows the TEI, and benefits from the tagging services of the Sanskrit Heritage Platform, since individual sentences link to its segmentation cum tagging service. DCS,[4] developed at Heidelberg University by Oliver Hellwig, is the most advanced from the point of view of linguistic analysis, since it is fully annotated with morphological tags indexing a lexicon of stems. Its development involved several iterations of deep learning algorithms (Hellwig 2009, 2015, 2016). Covering at present 560,000 sentences, it is today the closest analogue for Sanskrit of the Perseus Digital Library for the Greek and Latin corpus.[5]

[1] Göttingen Register of Electronic Texts in Indian Languages http://gretil.sub.uni-goettingen.de/gretil.htm
[2] Search And Retrieval of Indic Texts http://sarit.indology.info
[3] Sanskrit Library http://www.sanskritlibrary.org
[4] Digital Corpus of Sanskrit http://kjc-sv013.kjc.uni-heidelberg.de/dcs/

Several other efforts are currently under development, although unfortunately with little effort at standardization. Not all digital libraries are publicly available. For instance, the TITUS Thesaurus of Indo-European texts[6] is accessible only to scholars participating in the acquisition effort.

Several computational linguistics tools now exist that process Sanskrit text in order to parse it into a grammatical representation that can be considered an approximation of a formal paraphrase of its meaning. Typically, a sentence will yield a stream of morphological tags.
The DCS analyser of Oliver Hellwig, based on statistical alignment on a database of lemmas trained from a seed of human-annotated tags, has the advantage of being fully automatic. The Sanskrit Heritage Platform, under development at Inria Paris, offers a service of segmentation with tagging (at two levels, inflection and morphology of stems), linking into a choice of two dictionaries. It also has a surface parser using kāraka analysis that can be used by learners on simple sentences, but is not sufficient for corpus processing (Huet 2007). It also links with Amba Kulkarni’s Saṃsādhanī analyser,[7] which helps produce a dependency graph (Kulkarni 2013). This structure captures the semantic role (kāraka) analysis of a sentence, provided it is not too dislocated. Furthermore, an auxiliary tool helps the annotator transform a dislocated sentence into its prose order by a suitable permutation of its segments.

Thus it seems that the time is ripe to consider establishing a common repository that would store digital Sanskrit libraries in annotated form, annotated either automatically or with the help of competent annotators using interactive tools. We present here a preliminary proposal for the design of a Sanskrit corpus manager that could serve as a seed repository for the collaborative editing of texts, and that could support navigation and search through appropriate further tools. We have developed a simplified implementation of the concept, using off-the-shelf free software.
We shall conclude by listing problems in managing a joint corpus repository.

[5] Perseus http://www.perseus.tufts.edu/hopper/
[6] TITUS http://titus.uni-frankfurt.de/index.htm
[7] Saṃsādhanī http://scl.samsaadhanii.in

2 Specificities of Sanskrit corpus

Processing Sanskrit by computer is in some sense easier than processing other natural languages, at least if we restrict ourselves to the classical language. It benefits from the grammatical tradition [vyākaraṇa] dating back to hoary times, since the grammar of Sanskrit was fixed by Pāṇini 25 centuries ago in his Aṣṭādhyāyī, which was initially descriptive but later became prescriptive. Classical Sanskrit was not the local vernacular prakrit, which is used only in the theater. It was the language of the educated [śiṣṭa]. It was thus assumed to be grammatically correct, which means that we may align our segmentations to a precise recursive definition.

Granted, there are many non-Paninian forms in epic literature, and there are many corrupted texts. But we may record exceptions, and corrupted texts may perhaps be amended. Of course philologists will shudder at the thought of amending texts, but they must excuse my irreverence, considering that in my professional trade programs with bugs must be corrected, and only secondarily treated as historical artifacts in the version-maintaining repository. The main merit of mistakes is that they help trace possible filiations between versions, since scribes often copied without amending their sources, so that errors were transmitted. But this assumption is not always met, and thus the classical phylogenetic tradition is challenged (Hanneder 2017).
In any case, I am making the assumption that the corpus recorded in the global repository has been edited to the point of being grammatically correct, possibly as a result of the interactive use of grammatical tools, inasmuch as they may be used as editing assistants.

Actually, the Sanskrit language is not that regular. Even seemingly regular processes such as compounding pose problems in the proper analysis of written texts, since compounding is not associative, and accent is not marked in writing. Furthermore, there are many different styles, not just prose and poetry. The grammatical sūtra style is very concise, closer to algebraic rules, with phonemes used both for linguistic and meta-linguistic notation. The śāstra style of scholastic Sanskrit (Tubb and Boose 2007) is also highly artificial. The Indian tradition of formal debate (vāda) (Tripathi 2016) produced texts that are layers upon layers of commentaries, with counter-arguments (pūrvapakṣa) alternating with upheld theses (uttarapakṣa, siddhānta). Poets indulged in obscure constructions, rare lexemes, very long compounds, and dislocated sentences. Furthermore, the inherent ambiguity of phonetic enunciation, where word boundaries are blurred by sandhi, gave rise to a whole new genre of śleṣa (double entendre), where ambiguous segmentation yields possibly opposite meanings (Bronner 2010). For instance, consider nakṣatrapathavartinā rājñā from Daṇḍin’s Kāvyādarśa.
It may mean a glorious king “following the path of the stars” (nakṣatra-patha-vartinā rājñā), or a despicable king “not following a noble path” (na kṣatra-patha-vartinā rājñā), playing on the oronyms nakṣatra and na kṣatra. Here a specific philological apparatus is needed to display the two readings: it is not just a matter of choosing between segmentations, since both readings are intended. But if linear text is given up in favor of a graphical display, we may visualise the two interleaved readings, as shown with our Reader tool in Figure 1.

Figure 1: Daṇḍin’s prototype śleṣa

Other difficulties in interpreting Sanskrit text are the absence of a distinctive sign for proper names (like capitals in Roman script), which makes e.g. kṛṣṇa ambiguous between the divine hero and the color black, and the ambiguity of prādi compounds such as nirvācya, which may mean “what should not be talked about” (with the preposition nis acting as negation) as well as “what should be explained” (in this case the compositional future participle of the verb nirvac). Another problem, this time at the discourse level, is indirect speech, whose end is marked with the particle iti, but whose beginning must be guessed from the context. All these reasons show that editing a text so as to express several possible meanings with distinct morphological annotations, explained through distinct grammatical analyses, is a much more difficult task than simply listing raw sentences in sandhied form.

Finally, Sanskrit literature abounds in inter-textuality.
Mantras are quoted, stories are retold in manifold ways, bards adapt orally transmitted tales, learned commentaries pile up on each other, numerous anthologies of poems and maxims (subhāṣita, nyāya) share a lot of material, mahākāvyas expand episodes of epics, etc.

Considering all these difficulties, we propose a set-up for progressive computer-aided tagging of selected portions of the corpus, with documented intertextuality, as an alternative to TEI-style full digitization of the corpus in raw form. Thus one of the important requirements is that the (partial) corpus be represented at a low level of granularity, typically a śloka for poetry, or a sentence for prose.

3 Available technology

The main paradigm of the proposed annotation scheme is that it should be a distributed service, not just available from a server for consultation by readers, but itself the locus of collaborative annotation activity. This is in line with the recommendation of Peter Robinson (Robinson 2009): “The most radical impact of the digital revolution is to transform scholarly editing from the work of single scholars, working on their own on single editions, to a collaborative, dynamic and fluid enterprise spanning many scholars and many materials”.

In the software development domain, Git technology (Chacon and Straub 2014) is now the de facto standard for such collaborative development. Originally designed to serve for versioning cum distribution in the Linux effort, it quickly replaced all previous software management systems. It has several implementations, one of which powers the GitHub site, popular for open-source development. The GitLab software offers additional functionalities, notably in terms of security.

A Git project consists of branches evolving with time, each branch carrying a hierarchy of files. The hierarchy corresponds directly to the structure of the file system of the host operating system.
The files typically contain the source code of software and its documentation, but they may be in any format. Collaborators of the project have a local copy of the relevant branch on their own workstation, so they may not only compile and install the software locally, but also modify it and add to it. After local testing, the developer may request the supervisor of the branch to update the global site with his modifications. On approval, the software merges the modifications with the current version, a possibly complex operation.

Git is a major change of paradigm in the collaborative development of massive efforts. It is now used for the dynamic management of documents of various kinds. This is a mature technology, with the finest available algorithms in distributed computing, alignment, compaction, and cryptography. It looks like the ideal collaborative tool for developers of a digital library.

The other obvious technological decision is to use Web technology for the user interface. HTML and XML support Unicode for presenting all writing systems, and Web services are now the absolute standard for distributed services.

4 Implementing a prototype as a proof of concept

A two-month effort in summer 2017 was defined as a student Master's project. The second author, a student in the Master's program of University Paris Diderot and an OCaml expert, was offered an internship at Inria to implement it. He rapidly familiarized himself with the sources of the Sanskrit Heritage Platform, which were put on this occasion on Inria’s GitLab site for distributed development under Git.
At the same time, a second Git project, the Sanskrit Heritage Resources, was launched to distribute the lexical resources used by the Platform machinery, as well as the Sanskrit morphology XML databanks that it produces.

The requirement was to implement a corpus manager as a Web service, using the Sanskrit Heritage Platform as interactive tagging service, and progressively producing an annotated corpus as a sub-branch of the Sanskrit Heritage Resources Git project. The hierarchical structure of the corpus is directly mapped onto the directory structure of the UNIX file system.

4.1 The Sanskrit Heritage Corpus Manager

Three levels of capabilities have been defined. The Reader capability is available to any user. As its name indicates, it only allows reading the library, not modifying it. The Annotator capability allows additions and corrections to the corpus files. The Manager capability allows additions and corrections to the directory structure. These three capabilities are mapped respectively to permissions of the UNIX file system, and to roles in the Sanskrit corpus project, initially located as a component of the Sanskrit Heritage Resources Git project.

Texts are available as leaves of the directory structure, such as “KAvya/BANa/KAdambarI/”. In Manager mode, one may add to this structure or edit it. In Reader mode one may navigate through it, via a simple Web interface with scrolling menus. In Annotator mode one may add to the text, or correct it. For instance, let us assume that, in Annotator mode, we input a sentence in the initially empty directory “KAvya/BANa/KAdambarI/”.
We are forwarded to the Sanskrit Heritage Reader page (Goyal and Huet 2016), where we input the following string in the text window:

rajojuṣe janmani sattvavṛttaye sthitau prajānāṃ pralaye tamaḥspṛśe |
ajāya sargasthitināśahetave trayīmayāya triguṇātmane namaḥ ||

The segmenter returns 155,520 solutions, represented on a single HTML page as a graphical display where the annotator, if familiar with the tool, very quickly converges to the desired solution (in 13 clicks). When this is done, a Save button prompts the annotator, who may save this segmentation in the corpus, or abort. On saving, he is returned to the corpus interface, which prompts him for the next sentence. The screen he sees at this point is represented in Figure 2. It indicates that branch “KAvya/BANa/KAdambarI/” now supports a text where sentence 1 is listed (in IAST notation according to local settings; it could be Devanāgarī or both).

Figure 2: After saving sentence 1

This sentence 1 is itself a link to the display of the Heritage Reader, as shown in Figure 3. Note that this is what we saved, showing the unique solution selected by the annotator. Note also that what we see is just the usual display of the Reader: you may click on segment rectangles in order to get their morphology. The morphology is itself linked to the dictionary, which can be set either to the Sanskrit-French Heritage dictionary maintained at the site, or to the electronic version of the Sanskrit-English Monier-Williams dictionary. It is not a static HTML page but a dynamic page where all services of the Platform are available, including backtracking on the annotator's choices via the Undo service.
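The size of such a solution space comes from the fact that a string can often be cut into dictionary words in more than one way. The toy Python sketch below enumerates every segmentation of a string over a small, hypothetical lexicon; unlike the Heritage segmenter it ignores sandhi and morphology entirely, so it only illustrates the combinatorics, here on the nakṣatra / na kṣatra śleṣa discussed in Section 2.

```python
from functools import lru_cache

# Toy lexicon (an assumption for the example): just enough words for the
# two readings of Daṇḍin's śleṣa.
LEXICON = frozenset({"na", "kṣatra", "nakṣatra", "patha", "vartinā"})


def segmentations(text: str, lexicon: frozenset = LEXICON) -> list[tuple[str, ...]]:
    """Return every way of cutting `text` into words of `lexicon` (no sandhi)."""

    @lru_cache(maxsize=None)
    def from_position(i: int) -> tuple[tuple[str, ...], ...]:
        if i == len(text):
            return ((),)  # exactly one way to segment the empty remainder
        results = []
        for j in range(i + 1, len(text) + 1):
            word = text[i:j]
            if word in lexicon:
                results.extend((word,) + rest for rest in from_position(j))
        return tuple(results)

    return list(from_position(0))


for reading in segmentations("nakṣatrapathavartinā"):
    print(" ".join(reading))
```

Memoization computes the segmentations of each suffix only once, but the number of results can still grow multiplicatively with the length of the input, which is one reason an interactive display that lets the annotator prune choices is preferable to listing every solution.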
You may also click on the Unique Solution tick and continue the analysis with gender agreement, or, by clicking on the UoH Analysis Mode tick, go further into kāraka analysis with Amba Kulkarni’s dependency analyser. Thus, it would be easy to extend the service with other displays under various analyses, provided with proper metadata.

Figure 3: Displaying sentence 1

Now let us return to Figure 2. Please note the “S 1” button. It invites the annotator to continue the tagging task with another sentence. If you click on it, a scrolling menu prompts you for the sentence number, with an Add button. You may at this point choose not to continue in sequence, and choose for instance sentence 4. On pressing Add we are back in the loop with the Reader.

After entering sentence 4, we get the view of this piece of corpus shown in Figure 4. There are now two mouse-sensitive buttons: one marked “2–3” for filling the hole between 1 and 4, the other one, marked “S 4”, for entering sentences after the 4th one. This illustrates the partial nature of the annotated corpus. Scholars may concentrate on often-quoted parts, or on verses they have a special interest in, without having to tag continuously from the beginning of the work. This is important, in view of the not fully automatic nature of the annotating task. It also allows splitting the work between annotators of the same text; merging their contributions will be managed by the Git mechanisms.

Figure 4: After annotating sentences 1 and 4

When a Reader browses the corpus, he will see exactly the same display, except that the adding buttons will not appear.

When an annotator has completed some tagging, he may call a routine that will store his annotations in the local directory of the Corpus Git project. He may then use the commit Git command to commit his annotations, with proper documentation.
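The buttons of Figure 4 can be derived from the set of sentence numbers already saved for a text. The Python sketch below is a hypothetical reconstruction, not the corpus manager's code: it offers one button per hole between saved sentences (and, as an assumption, one for a hole before the first saved sentence), plus an “S n” button for continuing after the last one.

```python
def add_buttons(saved: set[int]) -> list[str]:
    """Labels of the 'add' buttons shown for a sparsely annotated text.

    Hypothetical reconstruction: one button per hole between saved sentence
    numbers, and an "S n" button for continuing after the last saved one.
    Offering a button for a hole before the first saved sentence is an
    assumption. At least one saved sentence is required.
    """
    if not saved:
        raise ValueError("no sentence saved yet")
    labels = []
    previous = 0
    for n in sorted(saved):
        if n > previous + 1:  # sentences previous+1 .. n-1 are still missing
            first, last = previous + 1, n - 1
            labels.append(f"{first}–{last}" if first < last else str(first))
        previous = n
    labels.append(f"S {previous}")  # continue after the last saved sentence
    return labels


print(add_buttons({1}))     # the situation of Figure 2
print(add_buttons({1, 4}))  # the situation of Figure 4
```

Because the labels depend only on the saved numbers, two annotators working on different holes of the same text can each recompute them after a Git merge.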
Once in a while the annotator may distribute these contributions to the community by using the push Git command, in order to merge this work with that of the other annotators, under the control of the project Managers.

4.2 Application to citation analysis in the Heritage dictionary

This prototype of a corpus manager has been implemented as an auxiliary service of the Heritage Platform segmenter. It is currently being put to use to manage citations in the Heritage hypertext dictionary. Its current 800 citations are being progressively tagged and entered in a standalone branch “Heritage_citations” of the corpus. The corpus structure is implemented as a sub-project of the Heritage_Resources Git project, and as such is incorporated in the Heritage_platform server data at installation time. Thus the facility is available for testing to whoever installs the two software packages through Inria’s GitLab server, as projects https://gitlab.inria.fr/huet/Heritage_Resources.git and https://gitlab.inria.fr/huet/Heritage_Platform.git respectively. The public server site http://sanskrit.inria.fr has been updated with the corpus manager, available as a “Corpus” service from the site directory at the bottom of its pages. Of course only Reader mode is available at the public site, but the distribution version, available through Git, will allow annotators to develop their own tagged corpus, and possibly merge it into the common Git repository once registered as official Annotators.

An example of such an analysed citation may be viewed in our dictionary at the entry kunda, at URL http://sanskrit.inria.fr/DICO/21.html#kunda. This entry is illustrated by a quotation from śloka 6.25 of Kālidāsa’s Ṛtusaṃhāra, underlined as mouse-sensitive.
Clicking on it brings you to the corresponding corpus page, where the sentence is displayed as a list of colored segments, as shown in Figure 5. Clicking on a segment brings up its lemma, with lexicon access to the root items. Although it has the same look-and-feel as the segmentation tool, it is actually displayed by the corpus manager, navigating in Reader mode in its “Heritage_citations” branch. This can be verified by clicking on the “Continue reading” button, which brings you to this branch directory, where the śloka appears as item 10. This shows the smooth integration of this tool within other services.

Figure 5: Annotated quotation

5 Extending the prototype to other tools

The extreme simplicity of this design makes it easily extensible to other grammatical tools implemented as Web services. All that is needed to incorporate them is to include a save button in the pages that return the result of their analysis, with the functionality of saving their HTML source in the corpus hierarchy, or even, in the style of our own implementation, of storing the sentence analysis data as parameters for the invocation of a dynamic corpus crawler. Conversely, the Add facility of the corpus manager will have to be made aware of the variety of such services, and its display adapted to show all analyses of the given śloka by the various services. This assumes of course that these services are reachable by the putative annotators, either installed on their workstation’s own Web server, or available on public Internet servers. The Heritage set of services may be used both ways, since it is itself distributed as an open-source system from its Git project repository.
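To make the save-button idea concrete, here is a hedged Python sketch in which each tool stores the parameters of its analysis of a sentence as a small JSON file inside the corpus directory hierarchy, so that several analyses of the same sentence can coexist and be listed together. The file layout <text path>/<sentence>.<tool>.json, the function names, and the tool labels are hypothetical; the Heritage corpus manager's own designation scheme is not given in this paper.

```python
import json
from pathlib import Path


def save_analysis(root: Path, text_path: str, sentence: int, tool: str, params: dict) -> Path:
    """Store one tool's analysis of one sentence (hypothetical file layout)."""
    directory = root / text_path
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"{sentence}.{tool}.json"
    target.write_text(json.dumps(params, ensure_ascii=False, indent=2), encoding="utf-8")
    return target


def analyses_for(root: Path, text_path: str, sentence: int) -> dict[str, dict]:
    """Collect every tool's saved analysis of a sentence, keyed by tool name."""
    directory = root / text_path
    prefix = f"{sentence}."
    found = {}
    # "1.*.json" matches "1.heritage.json" but not "10.heritage.json".
    for f in sorted(directory.glob(f"{prefix}*.json")):
        tool = f.name[len(prefix):-len(".json")]
        found[tool] = json.loads(f.read_text(encoding="utf-8"))
    return found
```

A display for a given śloka could then call analyses_for and offer one link per tool, each tool keeping its own look-and-feel.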
Should the concept prove itself useful, it would be easy to separate the Corpus Manager from the Heritage distribution, and make it a stand-alone facility.

It should be noted that having several grammatical tools available for displaying the corpus in analysed form does not induce any commitment to a standard display: each tool may keep its look-and-feel, and links to its specific functionalities. Nor are we demanding that taggings effected by various tools be synchronized or aligned. Annotators using one tool may tag sentences irrespective of whether they have already been processed with some other tool. All we have to agree on is the directory structure and its metadata format (under the control of the Git users with Manager capability), and the designation scheme of individual files representing the analyses.

6 Design of inter-textuality functionalities

This simple prototype provides for the moment a strictly hierarchical view of the corpus. This is too restrictive, since it allows no sharing. For instance, in the skeleton corpus of “Heritage_citations”, we would like to link item 10 to its original in the Ṛtusaṃhāra. Of course we could enter its duplicate in its proper branch, say “KAvya/KAlidAsa/Ritusamhara/6/25”. But we would like to document this by recording its “absolute” link in the “Heritage_citations” branch at item 10. This would be an easy extension of the current mechanism. But this is only one simple example of inter-textuality. Some of the citations are not to a full śloka, but perhaps to a portion, a simplification, or a reordering of some original quotation.
Thus we would need to design a notation to document such partial sharing between different branches of the corpus.

6.1 Collating recensions and manuscript segments

We also want to be able to use the tool for recording and comparing various manuscript traditions of the same text. Actually, the idea of this low-granularity corpus representation arose from a presentation by Prof. Brockington at a seminar in Paris in December 2016 (Brockington 2016). He showed there two representations of various manuscripts of Sanskrit epics.

The first one, extracted from traditional phylogenetic methods (Phillips-Rodriguez, Howe, and Windram 2009), represents a tree of manuscripts of the Mahābhārata, expressing the growth of the material over time. It has been obtained through phylogenetic analysis performed on sargas 43-47, 51, 59-60 and 64-65 of the Dyūtaparvan by the Supernetwork method in the SplitsTree package. The sigla used are those of the Critical Edition, with J substituted for Ñ and Z for Ś. It is reproduced in Figure 6 (courtesy Wendy Phillips).

The second one is a Venn diagram of the Rāmāyaṇa’s manuscript relationships, reproduced in Figure 7 (taken from Brockington 2000, courtesy John Brockington). This Venn diagram representation (possibly completed by a suitable ordering of the verse portions) is a more informative view of relationships between manuscript groups, since it represents the (multi-)set of all ślokas of all manuscripts, each manuscript represented as a subset, possibly intersecting in complex ways with other manuscripts.
In other words, the Rāmāyaṇa is there considered as a Boolean expression in terms of its manuscript segments, a more detailed concept than the phylogenetic tree, although not currently producible automatically from recensions in an obvious manner.

This suggests that our śloka-level corpus ought to accommodate a notation able to express complex sharing relationships between manuscripts, such as:

V = W[1−250]; X[5−23]; W[251−300]

expressing that manuscript V is the same as W with an interpolation of a portion of X. Such sharing relationships ought to be turned into extra annotations on the corpus data representations, so that navigation through the various versions would be available.

Figure 6. Phylogenetic analysis
Figure 7. Venn diagram

6.2 Paths management on shared corpus

It should be obvious at this point that an extra level of abstraction is needed in order to name contiguous portions of corpus recensions that are shared across manuscript versions, such as X[5−23] in the notation above. This path in our corpus tree is shared between recensions V and X. If we want to express this sharing in our corpus structure, and thus avoid duplicating śloka annotations between V and X, we shall need to introduce the notion of a path through a dag⁸ of branches, of which our corpus structure is only a specific spanning tree. This induces a need to express the concept of the successor of a śloka node along a given path, since in our example node W:250 has successor W:251 along path W, but X:5 along path V.
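The range notation V = W[1−250]; X[5−23]; W[251−300] and the successor relation can be prototyped in a few lines. The following Python sketch is only an illustration under our own assumptions, not part of the Heritage Platform: a path is a list of (branch, first, last) segments with inclusive bounds, the names Segment, expand, successor and index_on_path are hypothetical, and W is assumed to run from śloka 1 to 300.

```python
from typing import Dict, List, NamedTuple, Optional, Tuple

Node = Tuple[str, int]  # (branch name, śloka number), e.g. ('W', 250)


class Segment(NamedTuple):
    branch: str
    first: int  # inclusive
    last: int   # inclusive


def expand(path: List[Segment]) -> List[Node]:
    # Flatten a path into the sequence of śloka nodes it traverses.
    return [(seg.branch, n) for seg in path for n in range(seg.first, seg.last + 1)]


def successor(path: List[Segment], node: Node) -> Optional[Node]:
    # Successor of a node along a given path (None at the end of the path).
    nodes = expand(path)
    i = nodes.index(node)
    return nodes[i + 1] if i + 1 < len(nodes) else None


def index_on_path(path: List[Segment], node: Node) -> int:
    # 1-based position of a (possibly shared) node along the path.
    return expand(path).index(node) + 1


paths: Dict[str, List[Segment]] = {
    'W': [Segment('W', 1, 300)],  # assumed extent of W
    'V': [Segment('W', 1, 250), Segment('X', 5, 23), Segment('W', 251, 300)],
}

print(successor(paths['W'], ('W', 250)))      # ('W', 251)
print(successor(paths['V'], ('W', 250)))      # ('X', 5)
print(successor(paths['V'], ('X', 23)))       # ('W', 251)
print(index_on_path(paths['V'], ('W', 251)))  # 270
```

Expanding the whole path on every query is linear in its length; storing the per-path successor directly in node W:250, as proposed next, avoids that. Note also that index() only finds the first occurrence of a node, one more reason to keep repeated verses within a text from creating looping paths.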
Thus we need to record this information in node W:250, so that we may later navigate along path V by following path W up to its 250th node, then continuing from node X:5 up to node X:23, which is followed by node W:251 along path V.

This of course assumes that the numbering of ślokas is now a function of the path, so that e.g. śloka W:251 appears at index 270 along path V (after the 250 ślokas W:1–W:250 and the 19 ślokas X:5–X:23), since śloka W:251 is shared with V:270. The same mechanism could allow us, for instance, to assign to index Bhagavadgītā.1.1 the same śloka as Mahābhārata.6.63.23.

⁸ directed acyclic graph

The determination of the portions of text that are amenable to sharing is made by the human corpus managers/annotators; it is not a fully automatic process, since we do not want to share maximally by identifying all identical ślokas across all texts. For instance, we shall not identify all the evocations of a ritual mantra across all texts, which would absurdly clutter a unique node with all possible successors in all the texts. Furthermore, we do not want two occurrences of the same verse in one text to lead to looping paths.

6.3 Cohabitation of readings

Representing the padapāṭha form of a Sanskrit utterance is the first level of its interpretation. Assigning morphology to its segments is a further level of interpretation. Assigning kāraka semantic roles consistent with nominal cases and verbal voices is a still deeper interpretation; linking anaphoric references to their antecedents and cataphoric ones to their postcedents, together with named-entity recognition, brings the analysis to the discourse level. Accommodating these various levels of analysis of a text will require adaptations to our corpus representation structure.
The basic idea is that a piece of corpus represents more than the raw text as a stream of phonemes, and that paths through the fine-grained structure represent not just a list of phonetic productions, but a specific reading of this text.

Thus we must admit paths that represent different glosses of a given text, possibly contradictory ones. For instance, we would need different path assignments for the Bhagavadgītā according to Śaṅkara and to Madhva respectively, so that e.g. BhG{24.2.17} appears as nāsato vidyate bhāvo nābhāvo vidyate sataḥ on the first path, and nāsato vidyate ’bhāvo nābhāvo vidyate sataḥ on the second.⁹ Note that in this example the use of the avagraha does disambiguate the two readings, but as streams of phonemes they are the same. This case shows that we need two different nodes in our corpus representation for common phonetic material, since their meanings are not compatible. Note that the two readings are oronyms, but this is not a case of śleṣa, where both meanings are intended. We could speak of XOR-oronyms, in contrast with AND-oronyms (the genuine śleṣas), for which we want to represent the two readings together in the same structure. The XOR/AND terminology stems from Boolean algebras in Stone form, such as Venn diagrams.

⁹ Communicated by Prof. Madhav Deshpande.

Genuine śleṣas are often used for expressing simile figures of style, as Bāṇa demonstrated ad nauseam in the Kādambarī. Their translation into languages such as French or English necessitates heavy paraphrases weaving the description and its metaphor together as coordinated phrases.¹⁰ Giving a notation to represent the two readings without duplication is an interesting challenge: we want to represent the two segmentations minimally while sharing their common segments. Note that the graphical interface of the Heritage Reader gives a possible solution to this problem, since we may keep the two sets of segments, without any duplication, by trimming all segments that appear in neither. See Figure 1.

The design of a proper notation for annotated corpora is beyond the scope of the present paper, and is the affair of professional philologists, but our prototype could provide a test bed for such experiments.

Actually, we could also include in the corpus directories information concerning studies of a particular śloka or portion of text, mentioning bibliographic references to the relevant literature. It could also refer to discussions concerning specific grammatical points, the respective validity of the various annotations, etc. Each Sanskrit śloka could have its own blog page, and the global corpus structure could evolve into social networking for Sanskrit text!

7 Remaining problems

Our toy corpus manager raises serious issues which will have to be carefully assessed before scaling up to a durable infrastructure.

First of all, we are suggesting that a common repository of analysed Sanskrit text be agreed upon by both developers of computational linguistics tools and scholars managing digital libraries. This raises issues of a legal and sociological nature. Certain institutions will want to control what they deem to be their intellectual property. Certain scholars will refuse to compromise their freedom of doing things their own way.
Even if a critical mass of individuals agree to share their work on a common source-free repository, we know from experience that committee work is not always optimal for designing a technical artifact. Apparently simple issues such as the naming of branches may reveal complex problems, whose solutions may not be easy to agree on.

¹⁰ In French, śleṣa is limited to curiosities like the holorime “Gal, amant de la Reine, alla, tour magnanime, galamment de l’arène à la tour Magne à Nîmes” (whose two halves sound nearly identical) and jokes like “mon frère est ma sœur” (‘my brother is my sister’) playing on the oronyms ma sœur/masseur.

Another important issue is durability. Our proposal assumes that the analyzing tools will be perennial, inasmuch as their availability is necessary for the display of their analyses. This follows from the fact that we are not restricting ourselves to displaying static XML or HTML pages, but allow the execution of Web services (cgi-bin executables in Web jargon), which demand the availability of programming languages and their compilers over the life span of the digital library. Thus the robustness and maintainability of the satellite tools are critical. Versioning is another issue, since our analysis tools are not finished products, but experimental software that keeps evolving and may depend on lexical resources that themselves evolve. Thus non-regression analysis tools will have to be developed in order to correct taggings that are no longer identified, or are no longer unique, after a change of version.
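One way to picture such a non-regression check: a stored tagging records a few features of the analysis the annotator selected, and after a version change we count how many of the new tool version’s analyses for that segment still agree with them. No match means the tagging is no longer identified; more than one means it is no longer unique. This Python sketch uses hypothetical names (matches, non_regression), hypothetical feature labels, and toy data; it is not an existing Heritage interface.

```python
from typing import Dict, List

Analysis = Dict[str, str]  # feature name -> value, e.g. {'lemma': 'gam'}


def matches(descriptor: Analysis, analysis: Analysis) -> bool:
    # A stored descriptor identifies an analysis if every recorded feature agrees.
    return all(analysis.get(k) == v for k, v in descriptor.items())


def non_regression(stored: Dict[str, Analysis],
                   new_output: Dict[str, List[Analysis]]) -> Dict[str, str]:
    # Classify each tagged segment as 'ok', 'lost' (no match) or 'ambiguous' (several).
    status: Dict[str, str] = {}
    for seg_id, descriptor in stored.items():
        hits = [a for a in new_output.get(seg_id, []) if matches(descriptor, a)]
        if not hits:
            status[seg_id] = 'lost'
        elif len(hits) > 1:
            status[seg_id] = 'ambiguous'
        else:
            status[seg_id] = 'ok'
    return status


stored = {
    's1': {'form': 'rAmaH', 'lemma': 'rAma'},
    's2': {'form': 'vanam', 'lemma': 'vana', 'case': 'acc'},
    's3': {'form': 'gacCati', 'lemma': 'gam'},
}
new_output = {
    's1': [{'form': 'rAmaH', 'lemma': 'rAma', 'case': 'nominative'}],
    # the new version renamed its case labels, so the stored 'acc' matches nothing
    's2': [{'form': 'vanam', 'lemma': 'vana', 'case': 'nominative'},
           {'form': 'vanam', 'lemma': 'vana', 'case': 'accusative'}],
    # finite verb vs. locative of the present participle: the descriptor fits both
    's3': [{'form': 'gacCati', 'lemma': 'gam', 'analysis': 'pr.3.sg'},
           {'form': 'gacCati', 'lemma': 'gam', 'analysis': 'ppr.loc.sg'}],
}
print(non_regression(stored, new_output))
# {'s1': 'ok', 's2': 'lost', 's3': 'ambiguous'}
```

Under this scheme, a new version that only removes spurious analyses (better precision, same recall, same feature vocabulary) can turn ambiguous into ok but can never produce lost.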
However, note that improvements in precision that do not compromise recall often do not require revisiting the analysed corpus, which should be robust to such upward-compatible improvements.

Finally, let us emphasize that our proposal concerns just the foundations of a collaborative framework for the grammatical annotation of Sanskrit text, and makes no pretense of providing philological tools such as collating software. Such tools will have to be rethought on top of this low-level representation of the corpus.

8 Conclusion

We have presented general ideas concerning a Sanskrit corpus manager, and implemented a prototype with the Sanskrit Heritage Platform to test the main concepts. The main design imperative is that corpus management ought to be a collaborative effort, allowing text annotation with a variety of grammatical analysis services. The prototype implementation, in a restricted setting, shows that the infrastructure development is actually rather simple if one uses off-the-shelf technology such as Web services and Git repositories. It is hoped that this proposal will spur interest from philologists and computational linguists, and contribute to their increased collaboration.

References

Brockington, John. 2000. “Textual Studies in Vālmīki’s Rāmāyaṇa”. In: Epic Threads: John Brockington on the Sanskrit Epics. Ed. by Greg Bailey and Mary Brockington. Oxford University Press, New Delhi, pp. 195–206.
Brockington, John. 2016. “Regions and recensions, scripts and manuscripts: the textual history of the Rāmāyaṇa and Mahābhārata”. In: Issues in Indian Philology: Traditions, Editions, Translations/Transfers. (Abstract) Collège de France.
Bronner, Yigal. 2010. Extreme poetry. Columbia University Press, New York.
Chacon, Scott and Ben Straub. 2014. Pro Git. Apress (available at https://git-scm.com/book/en/v2).
Goyal, Pawan and Gérard Huet. 2016. “Design and analysis of a lean interface for Sanskrit corpus annotation”. Journal of Linguistic Modeling 4.2, pp. 117–126.
Hanneder, Jürgen. 2017. To edit or not to edit. Pune Indological Series I, Aditya Prakashan, Pune.
Hellwig, Oliver. 2009. “SanskritTagger, a Stochastic Lexical and POS tagger for Sanskrit”. In: Sanskrit Computational Linguistics 1 & 2. Ed. by Gérard Huet, Amba Kulkarni, and Peter Scharf. Springer-Verlag LNAI 5402, pp. 266–277.
Hellwig, Oliver. 2015. “Using Recurrent Neural Networks for joint compound splitting and Sandhi resolution in Sanskrit”. In: Proceedings, 7th Language and Technology Conference. Ed. by Zygmunt Vetulani and Joseph Mariani. Springer-Verlag LNAI (to appear).
Hellwig, Oliver. 2016. “Improving the Morphological Analysis of Classical Sanskrit”. In: Proceedings, 6th Workshop on South and Southeast Asian Natural Languages. Association for Computational Linguistics, pp. 142–151.
Huet, Gérard. 2007. “Shallow syntax analysis in Sanskrit guided by semantic nets constraints”. In: Proceedings of the 2006 International Workshop on Research Issues in Digital Libraries. Kolkata, West Bengal, India: ACM. doi: http://doi.acm.org/10.1145/1364742.1364750. url: yquem.inria.fr/~huet/PUBLIC/IWRIDL.pdf.
Kulkarni, Amba. 2013. “A Deterministic Dependency Parser with Dynamic Programming for Sanskrit”. In: Proceedings of the Second International Conference on Dependency Linguistics (DepLing 2013). Prague, Czech Republic: Charles University in Prague, Matfyzpress, pp. 157–166. url: http://www.aclweb.org/anthology/W13-3718.
Phillips-Rodriguez, Wendy J., Christopher J. Howe, and Heather F. Windram. 2009. “Chi-Squares and the Phenomenon of ‘Change of Exemplar’ in the Dyūtaparvan”. In: Sanskrit Computational Linguistics 1 & 2. Ed. by Gérard Huet, Amba Kulkarni, and Peter Scharf. Springer-Verlag LNAI 5402, pp. 380–390.
Robinson, Peter. 2009. “Towards a Scholarly Editing System for the Next Decades”. In: Sanskrit Computational Linguistics 1 & 2. Ed. by Gérard Huet, Amba Kulkarni, and Peter Scharf. Springer-Verlag LNAI 5402, pp. 346–357.
Scharf, Peter. 2018. “TEITagger: Raising the standard for digital texts to facilitate interchange with linguistic software”. In: Computational Sanskrit & the Digital Humanities. Ed. by Gérard Huet and Amba Kulkarni. D.K. Publishers, New Delhi.
Tripathi, Radhavallabh. 2016. Vāda in Theory and Practice. D.K. Printworld, New Delhi.
Tubb, Gary A. and Emery R. Boose. 2007. Scholastic Sanskrit. Columbia University, New York.

Enriching the digital edition of the Kāśikāvr̥tti by adding variants from the Nyāsa and Padamañjarī

Tanuja P. Ajotikar, Anuja P. Ajotikar, and Peter M. Scharf

1 Introduction

1.1 Importance of the present work

As is well known, the Kāśikāvr̥tti (KV.), written by Jayāditya and Vāmana in the seventh century CE, is the oldest extant complete running commentary on Pāṇini’s Aṣṭādhyāyī (A.).
Although several complete editions of the text have been published, including a critical edition by Sharma, Deshpande, and Padhye (1969–1970), the KV. is known to have textual problems. Sharma, Deshpande, and Padhye’s critical edition is based on only nine manuscripts as well as four previous editions, while in the New Catalogus Catalogorum Raghavan and Kunjunni Raja (1968, 116b–188a) have noted more than two hundred manuscripts. Efforts begun nearly thirty years ago to produce a truly critical edition led to the publication of an edition of the pratyāhāra section by Haag and Vergiani (2009). Now, with new funding under the direction of Malhar Kulkarni at the Indian Institute of Technology Bombay, a project promises to produce an edition of the first pāda of the first adhyāya.

Manuscripts in India last no more than about five hundred years. The oldest readable manuscript of the KV. dates only to the early fifteenth century. Yet several centuries earlier, the KV. was commented upon in the Kāśikāvivaraṇapañjikā (commonly called the Nyāsa) by Jinendrabuddhi in the eighth or ninth century, and then in the Padamañjarī by Haradatta in the thirteenth century. These commentators provide information about the constitution of the text of the KV. in several ways: directly, by citations and incipits, and less directly, by their discussion of the text. The information provided by commentators hundreds of years prior to the oldest manuscript is invaluable for reliably establishing the text of the KV. It would be extremely helpful to the community of Sanskrit grammarians if an edition supplemented with the readings available in the commentaries on the KV. were made available.

The Osmania edition (the critical edition of Sharma, Deshpande, and Padhye) seldom mentions variants reported in the commentaries, and, when it does, occasionally does so erroneously.
Kulkarni et al. (2016) include an appendix indicating which readings of the Osmania edition of the KV. on the pratyāhāra sūtras are supported by the Nyāsa (NY.) and the Padamañjarī (PM.). In that appendix, they use various signs to indicate which reading is supported by the NY., which by the PM., and which by both. It is a useful appendix, yet it covers only a small fraction of the text, and it lacks information on readings in the commentaries that differ from the Osmania edition, on whether the PM., which is a later commentary, is aware of the reading given by the NY., on how many readings are regarded as wrong by these commentators, etc. Therefore, there is a need for an edition that presents this information accurately, for the whole text, to the community of Sanskrit grammarians in particular and to Sanskrit scholars in general.

2 Method of data collection

The complex, diffuse, and extensive nature of the data in the commentaries regarding readings in the KV. calls for a systematic digital methodology. The digital medium provides a means to collect and organize complex information reliably, and to present that information in multiple uncomplicated views. The Text Encoding Initiative (TEI) provides a means to indicate supporting and variant readings in a critical apparatus. The digital text of the Osmania edition (1970) of the KV. is available from the Sanskrit Library in sandhi-analyzed form in the Sanskrit Library Phonetic Encoding (SLP1). Hence we have chosen to produce a digital edition of the sandhi-analyzed KV. in SLP1, with a critical apparatus tagged in accordance with TEI. We proceed with this undertaking even though the source digital text is not yet reliably proofread and is not yet itself marked up according to TEI.
We propose to mark it up in the course of our work, in accordance with the method demonstrated by Scharf (2018).

2.1 TEI critical apparatus tags

TEI offers the following elements and attributes to mark up a critical apparatus:
1. The app (apparatus) element is used to group together each lemma and all its variations; it has two child elements: lem and rdg.
2. The lem (lemma) element is an optional child of the app element. In this context, the term lemma signifies the accepted reading in the base text.
3. The rdg (reading) element is a required child of the app element, used to indicate variations from the base text.
4. The loc (location) attribute of the app element specifies the location of the lemma in the base text.
5. The wit (witness) attribute specifies which commentary supports the reading. This attribute is used on both the lem and the rdg elements.
6. The type attribute is used to specify whether the reading is termed occasional, wrong, or desired in the apparatus.

At present the loc attribute specifies only the canonical number of the sūtra under which the lemma occurs. In a sandhi-analyzed TEI text fully marked up as described by Scharf (2018), the location will in addition be specified precisely down to the paragraph, the sentence, and possibly the word.

2.2 Sigla

The values of the wit attribute are sigla that indicate which commentary, or which variant reported in a commentary, witnesses a particular reading. The following sigla are used:
1. ny stands for the reading given by the Nyāsa.
2. pm stands for the reading given by the Padamañjarī.

2.3 Types of readings

Four different types of readings are found in each of the two commentaries. We indicate these by the following values of the type attribute, in SLP1 encoding:
1. apapAWa indicates that the reading is considered wrong by the commentator.
2. kvacit indicates that the reading is mentioned as a variant found by the commentator somewhere other than in his principal text.
3. yukta indicates that the reading in question is not received by the commentator, but suggested by him as the correct reading.
4. pratIka indicates that the reading is an incipit that supports but is not identical to the lemma.

2.4 Samples

Below are shown three samples of TEI tagging in our critical apparatus. The first shows a lemma supported by both commentaries. The second shows a lemma for which each commentator has given a different reading; the readings are assumed to have been found by each commentator in his principal manuscript of the KV., since they are provided without any comment regarding their source. The third shows a lemma supported by Jinendrabuddhi’s principal text and partially supported by Haradatta’s, yet for which Jinendrabuddhi remarks that the reading in another manuscript is incorrect.

(1) vfdDiSabdaH
(2) jayantipaWanti paWanpacanti jayanti
(3) tvayakA kftam / tvakayA / tvayakA

3 Issues

3.1 Data representation

Gathering comments regarding readings from commentaries differs from the collation of manuscripts. When a critical edition is prepared, the assumption is that each manuscript covers the entire span of text edited, unless comments to the contrary are made in the manuscript description in the introduction, or a comment regarding an omission is made in the critical apparatus. Hence only variants are reported in the critical apparatus, and silence regarding a witness is taken to report its support for the adopted text; readings identical to the lemma are not reported. In contrast, commentaries on scientific texts, and on grammatical texts in particular, generally do not mention or comment upon every word of their base text.
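Because a commentary is silent on much of its base text, its support for a lemma has to be stated explicitly rather than inferred. Below is a minimal Python sketch of how the elements, sigla, and type values listed in §§2.1–2.3 might be assembled with the standard xml.etree.ElementTree module. The make_app helper is hypothetical, and the example is only one plausible encoding of the third sample, reconstructed from its prose description; the loc value n.n.n is a placeholder, since the sūtra number is not given.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

SIGLA = {'ny', 'pm'}                                       # sigla of section 2.2
READING_TYPES = {'apapAWa', 'kvacit', 'yukta', 'pratIka'}  # type values of section 2.3

# A variant reading: (text, witnessing sigla, type or None); text '' encodes an omission.
Reading = Tuple[str, List[str], Optional[str]]


def make_app(loc: str, lemma: str, lemma_wits: List[str],
             readings: List[Reading]) -> ET.Element:
    # Explicit support goes on the lem element's wit; each variant becomes a rdg.
    for wits in [lemma_wits] + [r[1] for r in readings]:
        unknown = set(wits) - SIGLA
        if unknown:
            raise ValueError(f'unknown siglum: {sorted(unknown)}')
    app = ET.Element('app', loc=loc)
    lem = ET.SubElement(app, 'lem')
    lem.text = lemma
    if lemma_wits:
        lem.set('wit', ' '.join(lemma_wits))
    for text, wits, rtype in readings:
        if rtype is not None and rtype not in READING_TYPES:
            raise ValueError(f'unknown reading type: {rtype}')
        rdg = ET.SubElement(app, 'rdg', wit=' '.join(wits))
        if rtype is not None:
            rdg.set('type', rtype)
        rdg.text = text or None  # an empty string yields an empty rdg element
    return app


# Third sample: lemma in the NY. principal text, a variant the NY. rejects (apapAWa),
# and the PM. incipit that partially supports the lemma (pratIka).
sample3 = make_app('n.n.n', 'tvayakA kftam', ['ny'],
                   [('tvakayA', ['ny'], 'apapAWa'), ('tvayakA', ['pm'], 'pratIka')])
print(ET.tostring(sample3, encoding='unicode'))
```

Validating sigla and type values at construction time is a design choice of this sketch: it keeps a typo in a siglum from silently producing a reading that no statistic will ever attribute to a commentary.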
Even the KV., as a commentary on every sūtra of the Aṣṭādhyāyī, does not mention every word of every sūtra. Subcommentaries as a rule explicitly mention only a small proportion of the words in the base text. Since the full text is not always cited, one cannot assume that silence regarding a reading in the base text indicates support. Therefore, while collecting readings from the commentaries, it is necessary to note explicit comments regarding support along with variants. Noting positive readings as well as variants has the additional advantage of allowing us to analyze how much of the existing text in the Osmania edition is supported by each of these commentaries.

Compiling statistics on the percentage of text covered, supported, and departed from by the commentaries calls for a consistent unit of enumeration. Traditional accounting of the extent of a text in India used the orthographic syllable (akṣara) as the basic unit. The most accurate modern method would be to use the character. We plan to count characters in the phonetic encoding scheme SLP1. Neither a word nor a reading can accurately serve as such a unit, as will become clear shortly; however, tabulating lemmata and calculating the number of characters in each will provide an accurate measure of the extent covered by each commentary.

3.2 Omissions

Omissions are recorded in TEI with an empty rdg tag, optionally supplied with a cause attribute that explains the reason for the deletion of the text. Possible values of the cause attribute suggested by the TEI editors include, for example, the following:
1. homeoarchy, which indicates the accidental skipping of a line of text because its beginning is similar to that of preceding or following lines, and
2. haplography, which indicates the inadvertent omission of a repeated written letter or letters.

While such explanations may be relevant for omissions in manuscripts, they are hardly relevant to edited commentaries, where the editors have presumably corrected such errors. The reason for the absence of a certain word or sentence in a commentary is usually inexplicable; hence it is not useful to employ the cause attribute. To represent the absence of a segment of the base text in the commentary, an empty rdg element is used, as in the following example.

On A. 1.1.51 uraṇ raparaḥ, while explaining the importance of the first word uḥ in the sūtra, the KV. as in the Osmania edition gives two counterexamples, kheyam and geyam. Both the NY. and the PM. witness and explain the first counterexample. The text of the NY. quotes the example and states its derivation as follows: kheyam iti | ī ca khanaḥ iti kyap | ikāraś cāntādeśaḥ | ād guṇaḥ |¹ The PM. says: kheyam iti | ī ca khanaḥ |. Both then proceed directly to the derivation of the counterexample to the second word in the sūtra, saudhātakiḥ, skipping any mention or discussion of the word geyam given in the Osmania edition. The fact that both commentaries proceed from the explanation of the first example relevant to the first word in the sūtra directly to the discussion relevant to the second word implies that the second counterexample on the first word in the Osmania edition was not in the text of the KV. referred to by the NY. and PM.
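With support recorded explicitly, the character-based measure proposed in §3.1 becomes a simple tally. A minimal Python sketch under stated assumptions: each lemma carries, per siglum, a verdict supports or differs (an omission such as geyam counts as differs, and an absent siglum means the commentary is silent), and characters are counted in the SLP1 string with spaces ignored, which is our choice rather than the authors’. The three lemmata are the A. 1.1.51 examples just discussed; the function names are hypothetical.

```python
from typing import Dict, List, Tuple

# (lemma in SLP1, {siglum: 'supports' or 'differs'}); an absent siglum = silence.
Apparatus = List[Tuple[str, Dict[str, str]]]


def char_count(slp1: str) -> int:
    # Extent of a lemma in SLP1 characters, spaces excluded (an assumption).
    return len(slp1.replace(' ', ''))


def coverage(apparatus: Apparatus, siglum: str) -> Dict[str, int]:
    # Characters of base text covered, supported, and departed from by one commentary.
    totals = {'text': 0, 'covered': 0, 'supported': 0, 'departed': 0}
    for lemma, verdicts in apparatus:
        n = char_count(lemma)
        totals['text'] += n
        verdict = verdicts.get(siglum)
        if verdict is None:
            continue  # silence: neither support nor departure
        if verdict not in ('supports', 'differs'):
            raise ValueError(f'unknown verdict: {verdict}')
        totals['covered'] += n
        totals['supported' if verdict == 'supports' else 'departed'] += n
    return totals


a_1_1_51: Apparatus = [
    ('Keyam', {'ny': 'supports', 'pm': 'supports'}),
    ('geyam', {'ny': 'differs', 'pm': 'differs'}),  # omitted by both (empty rdg)
    ('sODAtakiH', {'ny': 'supports', 'pm': 'supports'}),
]
stats = coverage(a_1_1_51, 'ny')
print(stats)  # {'text': 19, 'covered': 19, 'supported': 14, 'departed': 5}
share = stats['supported'] / stats['text']
print(f'{share:.1%} of the characters supported')  # 73.7% of the characters supported
```

Counting characters rather than lemmata keeps a long sentence-level lemma from weighing the same as a single word, which matters once lemmata of different granularity are mixed.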
The omission of the second word by the commentators is represented by an empty rdg element as follows:

lem: geyam
rdg (wit ny pm): empty

¹ We cite the text of the Osmania edition with sandhi as is, but drop quotes and references, and use only the daṇḍa as punctuation.

3.3 Problem in lemmatizing words as examples

It is not always appropriate to select individual words as the unit of a lemma. Frequently sentences are used as examples, particularly where interword phonetic changes are demonstrated or where syntax is relevant. Each such example should be understood as a single unit rather than as a series of individual words. For instance, A. 1.1.12 adaso māt terms pragṛhya a vowel ī or ū preceded by m in forms of the demonstrative pronoun adas, thereby preventing, by A. 6.1.125, sandhi with a following vowel such as would occur by A. 6.1.77. If A. 1.1.12 did not include the word adasaḥ, then any vowel ī or ū preceded by m would be termed pragṛhya and would not undergo sandhi. The KV. on this rule shows the importance of the word adasaḥ by citing two counterexamples: śamyatra and dāḍimyatra. If each of these counterexamples were represented as a sequence of two individual words with sandhi analyzed, śamī atra and dāḍimī atra, as is currently the case in the sandhi-analyzed digital edition, the significance of the counterexamples would vanish. Hence sandhi is restored in these and similar cases, and each such example is treated as a single lemma.

3.4 Problems in lemmatizing altered sequences of examples

The KV. cites two examples on A. 1.1.51: kartā and hartā.
The order of the examples is significant in establishing the correct text; hence how that order is attested in both manuscripts and commentaries is pertinent. The NY. quotes these examples in the same order, kartā | hartā iti, before further explaining each form. If there were a variant that inserted another example between these two, that variant would certainly be post-NY. It would be possible to represent each of these examples in a separate app element, and to represent an addition by an app element between them that pairs a rdg element with an empty lem element. Conversely, it would be possible to represent an omission by an app element that pairs a lem element with an empty rdg element. However, such a method is more cumbersome and is generally not adopted in critical editing. Hence, where the sequence of examples is an issue showing some variation in the commentaries, the sequence is represented by tagging those examples in a single app element:

lem: kartA hartA

Similarly, in many cases it is simpler and more comprehensible to annotate variants of a sentence by taking the whole sentence as a single unit rather than its phrases or individual words. For example, on A. 1.1.57, the Osmania edition reads tuki kartavye na sthānivad bhavati | and the NY. reads tuki na sthānivad bhavatīti |. Since positive readings are reported as well as variants, there are three ways to report this reading. The first is to tag every word and report the absence of the word kartavye as an omission. The second would be to tag the phrase tuki kartavye, and the third would be to tag the entire sentence. Under either of the first two methods, we still require app elements to represent the support of the manuscripts for the other words or phrases in the sentence.
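The bookkeeping behind this comparison can be made concrete. The Python sketch below is our own illustration (the functions encode and witness_text and the tuple format are hypothetical): it aligns the Osmania sentence with the NY. reading at a chosen granularity, emits one app element per unit, and then reassembles the witness’s sentence from the resulting fragments.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# (unit of the Osmania text, the witness's text for that unit; '' marks an omission)
Unit = Tuple[str, str]


def encode(units: List[Unit], loc: str, wit: str) -> List[ET.Element]:
    # One app per unit: agreement puts wit on the lem, disagreement adds a rdg.
    apps = []
    for base, witnessed in units:
        app = ET.Element('app', loc=loc)
        lem = ET.SubElement(app, 'lem')
        lem.text = base
        if witnessed == base:
            lem.set('wit', wit)
        else:
            rdg = ET.SubElement(app, 'rdg', wit=wit)
            rdg.text = witnessed or None
        apps.append(app)
    return apps


def witness_text(apps: List[ET.Element], wit: str) -> str:
    # What the reader must do: rebuild the witness's sentence from the fragments.
    parts = []
    for app in apps:
        chosen = next(child for child in app if child.get('wit') == wit)
        if chosen.text:
            parts.append(chosen.text)
    return ' '.join(parts)


methods = {
    'word': [('tuki', 'tuki'), ('kartavye', ''), ('na', 'na'),
             ('sTAnivat', 'sTAnivat'), ('Bavati', 'Bavati')],
    'phrase': [('tuki kartavye', 'tuki'), ('na', 'na'),
               ('sTAnivat', 'sTAnivat'), ('Bavati', 'Bavati')],
    'sentence': [('tuki kartavye na sTAnivat Bavati', 'tuki na sTAnivat Bavati')],
}
for name, units in methods.items():
    apps = encode(units, '1.1.57', 'ny')
    print(name, len(apps), witness_text(apps, 'ny'))
# word 5 tuki na sTAnivat Bavati
# phrase 4 tuki na sTAnivat Bavati
# sentence 1 tuki na sTAnivat Bavati
```

All three encodings yield the same NY. reading; they differ only in how many app elements must be written and then pieced together by the reader.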
Hence it is simpler to tag the whole sentence as a single unit and to treat the reading available in the NY. as a single variant, as follows:

lem: tuki kartavye na sTAnivat Bavati
rdg (wit ny): tuki na sTAnivat Bavati

Moreover, if an edition selects small units such as individual words and represents variants in the form of the omission of those words, the reader often needs more effort to understand the exact reading of the witness, because he has to reconstruct the sentence from fragments. Sanskrit commentators themselves describe such additional effort as prolixity of understanding (pratipattigaurava). Thus we tag the data at the level of the word, the phrase, or the sentence, as the situation demands. The following are a couple of additional examples of the omission of words handled as variants of phrases or sentences.

Under A. 1.1.47, the Osmania edition reads sthāne-yoga-pratyayaparatvasya ayam apavādaḥ |. The PM. omits the word ayam and reads sthāneyogapratyayaparatvasyāpavādaḥ |. Instead of representing this omission in three app elements (the first and third taking sthāneyogapratyayaparatvasya and apavādaḥ as lemmata with the PM. as witness, and the second taking ayam as lemma with an empty rdg element witnessed by the PM.), we treat the whole sentence as a single variant and tag it in a lem element under a single app element, as follows:

lem: sTAneyogapratyayaparatvasya ayam apavAdaH
rdg (wit pm): sTAneyogapratyayaparatvasya apavAdaH

On A. 1.1.48 the KV. reads: rai atiri | nau atinu | The NY.
reads atiri | atinu iti |. According to the Osmania edition, the KV. supplies the examples atiri and atinu together with rai and nau, the base words of the final constituents of the compounds, whose final vowels are replaced with a short vowel by A. 1.2.47. Both the NY. and the PM. omit these base words and attest only the examples. This can be represented in three ways: (1) by taking each word individually and representing rai and nau as omitted, (2) by taking the set of both examples as a single unit and representing the omission of the two base words as one variant, or (3) by the middle course of taking each pair of base word plus example as a unit and representing the omission of the base word in each as a variant consisting of just the example. Here we chose the third course: we placed each pair of base word and example in a lem element under an app element, and the reading in a rdg element witnessed by the NY. and PM., as follows:

lem: rE atiri
rdg (wit ny pm): atiri
lem: nO atinu
rdg (wit ny pm): atinu

3.5 Difference in order

On A. 1.1.47 the Osmania edition has three examples: viruṇaddhi, muñcati, and payāṃsi. The NY. has the variant ruṇaddhi, without the preverb vi, instead of viruṇaddhi, and places this example last, in an order different from that of the Osmania edition: muñcati, payāṃsi, and finally ruṇaddhi. Two differences are relevant: the change in the order of the examples, and a variant for one of them. As above, these differences could be represented as an omission and an addition. However, it is simpler to tag all three words in the KV.
in a single lem element under one app element, to treat the reading in the NY. as a single variant, and to record it in a single rdg element.

viruRadDi . muYcati . payAMsi .
muYcati . payAMsi . ruRadDi

3.6 Inferring readings from explanations

Jinendrabuddhi and Haradatta often provide explanations that permit one to infer that they had certain readings of the Kāśikāvṛtti even though they do not directly cite the reading. For example, the Osmania edition on A. 1.3.63, āmpratyayavat kṛño 'nuprayogasya, cites two examples: īkṣāñcakre and īhāñcakre. The NY. comments on these examples as follows:

īkṣāñcakre ityādi | īkṣa darśane | īha ceṣṭāyām | ūha vitarke | liṭ | ijādeḥ ityādinā''m | āmaḥ iti lerluk |

īkṣāñcakre etc. After the roots īkṣ ‘see’, īh ‘strive’, and ūh ‘conjecture’, the affix liṭ is introduced (by A. 3.2.115); ām is introduced by A. 3.1.36 ijādeś ca gurumato 'nṛcchaḥ; and the affix liṭ is deleted by A. 2.4.81 āmaḥ.

Here the NY. refers to three verbal roots, namely īkṣ, īh, and ūh. The Osmania edition gives only two forms, which are derived from the roots īkṣ and īh. The citation of the additional verbal root ūh in the NY.
is relevant to the form ūhāñcakre, which must have been an additional example. Hence the text of the KV. received by the NY. must have had three examples, the third of which is lacking in the established text of the Osmania edition. We tag such an inferred reading in the same way we tag a direct reading. An addition is tagged conversely to the way an omission is tagged, by providing an empty lem element with an associated reading in a separate app element (cf. §3.2). Thus the present case is tagged as follows:

IkzAYcakre
UhAYcakre

Similarly, under A. 1.4.20 ayasmayādīni chandasi, the KV. explains the purpose of the rule in the following words: bhapadasaṃjñādhikāre vidhānāt tena mukhena sādhutvam ayasmayādīnām vidhīyate | ‘By means of the inclusion of this rule in the section headed by the terms pada and bha, the fact that the words included in the list beginning with ayasmaya are correct is provided.’ In this explanation, the Osmania edition includes the phrase tena mukhena. The NY.
comments on this sentence as follows: kathaṃ punareṣāṃ sādhutvaṃ vidhīyata ityāha bhapadasaṃjñādhikāre ityādi | dvāram | mukham | upāya ityanarthāntaram | ‘In answer to the question, “But how is the validity of these words established?” he says, “By means of the inclusion of this rule in the section headed by the terms pada and bha etc.” dvāra ‘door’, mukha ‘mouth’, upāya ‘means’: there is no difference in meaning.’ Because the words mukha and upāya follow the word dvāra, they may serve to explain the latter. In that case, the word mukha would not be a quotation from the base text. Hence, Jinendrabuddhi’s comment may indicate that the word dvāra was read instead of the word mukha in the version of the KV. available to him. The sentence in the reading received by Jinendrabuddhi would then have been the following: bhapadasaṃjñādhikāre vidhānāttena dvāreṇa sādhutvam ayasmayādīnāṃ vidhīyate | The PM. demonstrates that this supposition is correct and that Haradatta received the same reading as Jinendrabuddhi.
For Haradatta states: yadi saṃjñā vidhīyeta ānantaryād bhasaṃjñāvidhānadvāreṇaiva nipātanaṃ syāt … bhapadasaṃjñādhikāre ityādi | dvāram upāyaḥ | ‘If this rule provided a term, then, because it occurs immediately after (the provision of the term bha in A. 1.4.18), mention would be made only of words that occur by the provision of the term bha. The term pada would not occur, nor would the conjunction of the terms bha and pada. … bhapadasaṃjñādhikāre etc. The word dvāra means upāya.’ The PM. explicitly mentions the word dvāreṇa and does not mention the word mukha at all. Instead, it explains the word dvāra by the word upāya. Hence, the PM. clarifies the statement in the NY. and must be based on the same text that inspired the statement in the NY. Although neither Jinendrabuddhi nor Haradatta refers to the word dvāreṇa directly as a citation by using the word iti after it, their comments are a direct indication of a variant of the reading in the Osmania edition. We represent this case as follows:

muKena
dvAreRa

The following is another case where Haradatta’s comments imply a variant reading. Under A. 1.4.3 yū stryākhyau nadī, the KV. explains the word yū in the sūtra as ī ca ū ca yū. The PM. quotes this statement in the KV.
and further says, kvacittu vibhaktyantameva paṭhyate ‘But in some places the form is read ending in a nominal termination.’ This statement indicates that the nominative dual form yvau was read in some manuscript available to Haradatta. We represent this inferred reading in the apparatus as follows:

I ca U ca yU
yvO

3.7 Mistakes in the editions of the commentaries

Unfortunately the editions of the NY. and PM. include mistakes. In our work so far we have discovered errors of sandhi analysis, sentence division, and quotation. The following are three examples.

On A. 1.1.39, there is a set of counterexamples: ādhaye, cikīrṣave, and kumbhakārebhyaḥ. The Osmania edition of the NY. reads cikīrṣavaḥ iti |. At first glance, this seems to be a variant for cikīrṣave. cikīrṣave is the dative singular of the nominal base cikīrṣu, whereas cikīrṣavaḥ is the nominative plural. The NY., however, explicitly describes the form as the dative singular of the nominal base cikīrṣu, formed by applying the fourth-triplet nominal termination ṅe and the guṇa replacement of the final vowel u by A. 7.3.111 gher ṅiti. Thus the nominative plural form cikīrṣavaḥ does not fit the description given by the NY., and the correct form is cikīrṣave. Hence the text of the NY. attests no variant for the word cikīrṣave in the KV.

How did the erroneous word cikīrṣavaḥ come to be found in the edition of the NY.? The editors of the Osmania editions often analyze sandhi in an attempt to be helpful to readers.
Their original manuscripts must all have read cikīrṣava iti with regular sandhi. In the Osmania edition of the NY., the editors regularly analyze the sandhi of examples and quotations followed by iti and place them in quotation marks. The sandhi of cikīrṣava iti can be analyzed in two ways: cikīrṣave iti and cikīrṣavaḥ iti. Thus a wrong sandhi dissolution created what appears to be a variant in the NY. when in fact the text has no such variant. On the basis of internal evidence, we infer the correct reading and report it as follows:

cikIrzave

The same sandhi error is made by the editors of the Osmania edition of the NY. on A. 1.1.67. The Osmania edition of the KV. states tasmāt iti pañcamyarthanirdeśa uttarasyaiva kāryaṃ bhavati | na pūrvasya |. Here the Sanskrit Library’s sandhi-analyzed text reads pañcamyarthanirdeśe in the locative. The Osmania edition of the NY. states tasmāt iti | pañcamyarthanirdeśaḥ iti |. First of all, there should not be a full stop after iti in the phrase tasmāt iti. Moreover, the sandhi dissolution pañcamyarthanirdeśaḥ iti is wrong. As in the preceding example, the proper dissolution is pañcamyarthanirdeśe, as in the Sanskrit Library’s sandhi-analyzed text. Hence we do not report this case as a variant, but take it as support for the text of the KV.
as analyzed in the Sanskrit Library edition and report it as follows:

tasmAt iti paYcamyarTanirdeSe

On A. 1.1.56 the Osmania edition of the NY. includes an erroneous sentence break and an erroneous indication of a quotation of the base text. The Osmania edition of the KV. reads na alvidhir analvidhiḥ ityarthaḥ |. In the Osmania edition of the NY., Jinendrabuddhi’s explanation of the compound analvidhi is edited as follows: sa punaḥ samāso mayūravyaṃsakāditvāt samāsaṃ kṛtvā nañsamāsaḥ kṛtaḥ | ‘na alvidhir analvidhiḥ’ iti |. The editors of the NY. put a daṇḍa after kṛtaḥ and put single quotes around na alvidhir analvidhiḥ to indicate that it is a quotation from the KV. This is a mistake.
Careful reading of the text indicates that the daṇḍa should be removed and the passage ending with iti read as a single sentence, as follows: sa punaḥ samāso mayūravyaṃsakāditvāt samāsaṃ kṛtvā nañsamāsaḥ kṛto nālvidhir analvidhir iti | “But that compound, formed because it is included in the list beginning with mayūravyaṃsaka, is formed as a negative tatpuruṣa compound (nañsamāsa): na ‘not’ alvidhi ‘a phonetic operation’ = analvidhi.” The cited phrase na alvidhir analvidhiḥ is not a citation of the KV.; it does not refer to the base text. It is a typical compound analysis of a nañtatpuruṣa compound. Such an analysis may have been made originally by a commentator on the KV., even by Jinendrabuddhi himself, rather than by the authors of the KV. Hence, without independent support from manuscripts, it should not be adopted in the text of the KV. on the basis of the explanation provided in the NY. However, since the editors of the Osmania edition of the KV. have adopted the sentence na alvidhir analvidhiḥ ityarthaḥ | in their base text, the editors of the Osmania edition of the NY. marked na alvidhiḥ analvidhiḥ as a quotation from the base text. Unfortunately this is misleading.
If it were a quotation from the base text, it would have included the closing words ityarthaḥ |. We do not accept that the text of the NY. supports the reading na alvidhir analvidhiḥ ityarthaḥ in the KV., and hence we refrain from including it in our critical apparatus.

3.8 Discrepancies in quotations within different sections in the same commentary

There are many occasions where the commentary on the KV. on one sūtra cites text from the KV. on another sūtra. Both the NY. and PM. do this. We mark these cases as support for, or variants of, the text they cite, just as we mark citations of the base text in commentaries on that base text under the same sūtra. If the citation does not differ from the commentary on the base text on the same sūtra, we make no addition. However, if the citation constitutes a variant that differs from one under the base text on the same sūtra, or supports a reading of the base text that received no support from the commentary on the same sūtra, we add a further rdg element containing the new reading, with a source attribute indicating the sūtra under which that reading was found.

For example, on A. 2.3.19, the Osmania edition of the KV. reads pituratra kriyādisambandhaḥ śabdenocyate | putrasya tu pratīyamāna iti tasyāprādhānyam |. On A. 1.1.56, the NY. quotes the text exactly as given in the KV. on A. 2.3.19. However, while commenting on A. 2.3.19, instead of putrasya tu pratīyamāna iti tasyāprādhānyam, the NY. quotes putrasya tu pratīyamānatvādaprādhānyam, adding the affix tvāt and omitting iti tasya. Thus the NY.
gives two different readings for the same base text at two different places. We report both of these readings as follows:

pituH atra kriyAdisambanDaH Sabdena ucyate. putrasya tu pratIyamAnaH iti tasya aprADAnyam
putrasya tu pratIyamAnatvAt aprADAnyam

4 Sample results

Below we report the results for 578 readings gleaned from our tagged data for the third quarter of the first chapter of the Aṣṭādhyāyī (A. 1.3). The list indicates the number of times the commentators agree with or differ from the base text, agree with or differ from each other, and report, approve of, or disapprove of variants.

1. Only the NY. agrees with the base text: 227
2. Only the PM. agrees with the base text: 131
3. The NY. and the PM. share the same reading, which agrees with the base text: 155
4. Only the NY. differs from the base text: 24
5. Only the PM. differs from the base text: 23
6. The NY. and the PM. share the same reading, which differs from the base text: 9
7. The NY. and the PM. each mention a reading which differs from the reading of the other: 6
8. The PM. is aware of variants: 9
9. The PM. received a different reading for which it suggests a better option: 1

Ten percent (10%) of the readings gleaned from the commentators in A. 1.3 support a change in the base text of the KV. The project of collecting readings from commentators therefore promises to contribute significantly to the establishment of a more correct text of the KV.

5 Conclusion

The issues discussed demonstrate the depth of understanding required to determine what each commentator must have read and the care required to represent that information accurately. The method described above for preparing a critical apparatus of the readings of the KV. attested in the NY. and PM. provides a reliable and well-structured database of valuable information about the text of the KV.
and its historical transmission that is both human-readable and machine-readable. This database will serve as a valuable resource for producing a critical edition of the KV. The results of this project will also reveal the textual history of the KV. between the time when Jinendrabuddhi wrote his commentary (eighth or ninth century), the time when Haradatta wrote his (thirteenth century), and the more recent dates of the extant manuscripts of the text. The database will permit one to determine systematically how much of the text of the KV. was known to each of the commentators. It will reveal how many variations occurred in the transmission of the text and how many readings have been lost to us in the course of time. The methods used in this project are applicable to similar philological work to prepare an edition and determine the textual history of any Sanskrit text with commentaries, or indeed of any commented text extant in the form of manuscripts.

References

Haag, Pascale, and Vincenzo Vergiani, eds. 2009. Studies in the Kāśikāvṛtti. The section on pratyāhāras; critical edition, translation and other contributions. Firenze: Società Editrice Fiorentina. Reprint: London; New York: Anthem, 2011.

Kulkarni, Malhar, Anuja Ajotikar, Tanuja Ajotikar, and Eivind Kahrs. 2016. “Discussion on some important variants in the pratyāhārasūtras in the Kāśikāvṛtti”. In: Vyākaraṇaparipṛcchā. Proceedings of the Vyākaraṇa section of the 16th World Sanskrit Conference, 28 June–2 July 2015, Sanskrit Studies Center, Silpakorn University, Bangkok. Ed. by George Cardona and Hideyo Ogawa. New Delhi: D. K. Publishers, pp. 209–236.

Pullela, Ramachandra, ed. 1981a.
Śrīharadattamiśraviracitā padamañjarī kāśikāvyākhyā. Prathamo Bhāgaḥ, 1–4 adhyāyāḥ. Saṃskṛtapariṣadgranthamālā 25. Hyderabad: Sanskrit Parishad, Osmaniya University.

Pullela, Ramachandra, ed. 1981b. Śrīharadattamiśraviracitā padamañjarī kāśikāvyākhyā. Dvitīyo Bhāgaḥ, 5–8 adhyāyāḥ. Saṃskṛtapariṣadgranthamālā 26. Hyderabad: Sanskrit Parishad, Osmaniya University.

Pullela, Ramachandra, ed. 1985. Nyāsaparākhyā kāśikāvivaraṇapañjikā. Prathamo Bhāgaḥ, 1–4 adhyāyāḥ. Sanskrit Parishad Granthamala 33. Hyderabad: Sanskrit Parishad, Osmaniya University.

Pullela, Ramachandra, ed. 1986. Nyāsaparākhyā kāśikāvivaraṇapañjikā. Dvitīyo Bhāgaḥ, 5–8 adhyāyāḥ. Sanskrit Parishad Granthamala 35. Hyderabad: Sanskrit Parishad, Osmaniya University.

Raghavan, V., and K. Kunjunni Raja. 1968. New Catalogus Catalogorum. An alphabetical register of Sanskrit and allied works and authors. Vol. 4. Chennai: University of Madras.

Scharf, Peter M. 2018. “Raising the standard for digital texts to facilitate interchange with linguistic software”. In: Computational Sanskrit and Digital Humanities.
Papers accepted for presentation in the Computational Sanskrit and Digital Humanities section of the Seventeenth World Sanskrit Conference, Vancouver, 9–13 July 2018. Ed. by Gérard P. Huet and Amba P. Kulkarni. New Delhi: D. K. Publishers. Forthcoming.

Sharma, Aryendra, Khanderao Deshpande, and D. G. Padhye, eds. 1969–1970. Kāśikā. A commentary on Pāṇini’s grammar by Vāmana & Jayāditya. Sanskrit. 2 vols. Sanskrit Academy Series 17, 20. Hyderabad: Sanskrit Academy, Osmania University. Reprinted in one volume, 2008.

From the Web to the desktop: IIIF-Pack, a document format for manuscripts using Linked Data standards

Timothy Bellefleur

Abstract: This paper describes the implementation of a document file format for the representation of composite image, text, and additional data, focusing on the use case of manuscripts. The organizational methodology follows emerging standards for Linked Data, as well as some standards already in use by scholars and projects in Sanskrit Digital Humanities. It also presents a model that lets scholars who need to organize such data begin doing so in a manner that facilitates a future transition into online spaces.

1 Introduction

Textual scholars face a number of practical challenges when organizing and working with their materials in the digital space. While common, established formats exist to serve the needs of the individual artifacts of the process (principally images and text), few adequately provide for the compilation of these pieces, and fewer still for connections between them. As a result, scholars typically must make do with collections of related files, deficient formats, and even ad hoc strategies for organizing and referencing their data. These issues can be particularly severe for textual projects in Indology, which often deal with large amounts of data.
Solutions to these sorts of challenges have been developed, but they are targeted towards the Internet, where interconnected networks of disparate data resources are commonplace and navigating through them requires standards that are both robust and extensible. However, as theoretically ideal as working in a networked online space might be, practical concerns of access, convenience, distribution, and portability dictate that we still often require offline documents, even if those documents are of a particular composite variety.

In this paper, I describe the implementation of a document file format for the representation of structured, composite image, text, and extensible additional data, focusing on the use case of manuscripts. The organizational methodology follows emerging standards for Linked Data, employing the International Image Interoperability Framework (Appleby et al. 2017b), JSON-LD (Sporny, Kellogg, and Lanthaler 2014), and Web Annotation Data Model (Sanderson, Ciccarese, and Young 2017) standards, as well as their related ontologies for data description and linking. It also incorporates some standards already in use by scholars and projects in Sanskrit Digital Humanities, such as the XML-based Text Encoding Initiative Guidelines (TEI Consortium 2017) for textual representation.

The overall objectives of this project are twofold. First, it presents an offline-friendly document format that can serve in many cases as a replacement for composite document containers such as Adobe Corporation’s Portable Document Format (PDF), while being more readily extensible for the inclusion of textual and other related data as well as interconnections between the document components. Second, it presents a model that lets scholars who need to organize such data begin doing so in a manner that facilitates a future transition into online spaces with as little friction as possible.

To provide a specific use case for this project: In Dr.
Adheesh Sathaye’s ongoing digital critical editing work on the medieval Sanskrit story collection Vetālapañcaviṃśati, the primary dataset includes scans or photographs of some 90 manuscripts in several different Indic scripts, along with electronic text transcriptions using IAST romanization. Thus far, the most convenient solution for organizing and storing these resources has been to compile the manuscript images into PDF files, to keep each transcription as a separate file, and to annotate each folio in the transcription with its corresponding PDF page number. While this method is simple enough to implement and requires no special software besides Adobe Acrobat, it is not ideal for a number of reasons. Links between text and images are defined, but there is no easy way to navigate from one to the other; despite being representations of the same text (facsimile image and electronic text), the files for each representation of a manuscript are essentially isolated from each other. The PDF format, however ubiquitous and relatively efficient it is as a simple multi-image file format, is difficult to extend, parse, and embed into other programs, and it relies on commercial software for optimal creation and editing. As it becomes necessary to retrieve specific folio images, to link textual transcription data more directly with these images, and to navigate these links, the project will likely need to abandon its current PDF-based scheme. Adopting existing standards for Linked Data serves the project’s goal of describing its data, and the relationships within that data, in a well-defined, extensible way that can accommodate secondary editorial data in an eventual online space.
However, even in their simplest useful implementations, these standards currently do not offer simplicity or convenience comparable to that of a single, editable document file. Furthermore, an online-only method of storage, retrieval, and editing imposes its own complexities and restrictions on interacting with the data, especially since the primary data objects can be conceived of simply as discrete documents of text and image data. By adapting Linked Data models to the extent necessary for use in an offline idiom, we can maintain and even increase ease of use for this project (and other document-centric projects) while employing the same methodologies used in online spaces.

2 Background

The term Linked Data comes from the World Wide Web Consortium’s Semantic Web project and has come to encompass a variety of standards for identifying, describing, representing, and exchanging structured data and the relationships between these data using standard Web technologies. Linked Data formats follow the model of the Resource Description Framework (RDF), which represents data as a graph of resources (nodes or vertices) and the relationships (edges) between them. A basic premise of Linked Data is that every node should possess a unique URI (Uniform Resource Identifier, like a web address), which may be used to retrieve the resource or information about it, or be used by other objects as an unambiguous reference to the resource.
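To make the graph model concrete, the following minimal Python sketch (standard library only) stores a few statements as subject, predicate, and object triples and groups them into a JSON-LD document in which each node is keyed by its URI. The example.org URIs and the manuscript and folio names are hypothetical, the Dublin Core terms used as predicates are just one plausible vocabulary, and the function name to_jsonld is illustrative rather than part of any library.

```python
import json

# Hypothetical identifiers for illustration only.
BASE = "https://example.org/vetala/"
DCT = "http://purl.org/dc/terms/"  # Dublin Core Terms namespace

# An RDF-style graph as (subject, predicate, object) triples:
# subjects and URI-valued objects are nodes; predicates are the edges.
triples = [
    (BASE + "ms/A", DCT + "title", "Manuscript A"),
    (BASE + "ms/A", DCT + "hasPart", BASE + "ms/A/folio/1r"),
    (BASE + "ms/A/folio/1r", DCT + "isPartOf", BASE + "ms/A"),
]


def to_jsonld(triples):
    """Group triples by subject into JSON-LD node objects keyed by @id."""
    nodes = {}
    for subject, predicate, obj in triples:
        node = nodes.setdefault(subject, {"@id": subject})
        # Objects that look like absolute URIs become node references;
        # everything else is kept as a plain string literal.
        is_uri = obj.startswith(("http://", "https://"))
        value = {"@id": obj} if is_uri else obj
        node.setdefault(predicate, []).append(value)
    return {"@graph": list(nodes.values())}


print(json.dumps(to_jsonld(triples), indent=2))
```

Because the folio carries its own URI, any other document can point at it without ambiguity, which is the property the Linked Data premise relies on. A real project would more likely use a dedicated RDF library and a JSON-LD @context to compact these long property IRIs.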
Description of the information in these resources is standardized by the use of a variety of controlled vocabularies (widely referred to as “ontologies”), each of which defines a set of terms and the interpretation of their values.¹ A number of syntax notations exist for serializing these RDF graphs in common formats used in data interchange, such as XML (via RDF/XML) and JSON (via JSON-LD). Employing this model, further standards have been developed for the representation of specific complex data structures in a well-defined manner. Among these, the International Image Interoperability Framework and the Web Annotation Data Model are central to the design of this project.

¹ Among the most widespread of these ontologies are the Dublin Core Metadata Initiative Terms for general metadata (DCMI Usage Board 2012, also used widely in digital libraries and archival studies), the Simple Knowledge Organization System for classification and taxonomy (Miles and Bechhofer 2009), and the Friend of a Friend ontology for describing persons and agents (Brickley and Miller 2014).

The International Image Interoperability Framework (IIIF) is a comprehensive Linked Data standard providing a core model for representing image collections in the JSON-LD format (the Presentation API), as well as a set of additional APIs for dealing with the querying and presentation of these resources. IIIF resources may be described in terms of their often multiple structural and logical arrangements and divisions, and extensively linked with related resources such as text transcriptions and annotations. These linkages are facilitated through the use of the World Wide Web Consortium’s Web Annotation Data Model, which defines a schema for creating detailed metadata about, and relationships between, resources, and specifies a flexible set of methods for targeting the specific portions of a given resource relevant to the annotation or relationship involved.

In the IIIF’s core model (see Figure 1), a document consists of a series of virtual canvases with which content such as images is associated by means of annotations (using the Web Annotation Data Model). These canvases are organized into one or more sequences that provide a logical ordering. Although each of these component resources has its own unique identifier, a set of required structural resources is defined within a single manifest resource, which also contains overall metadata for the document. Additional resource types such as collections (groupings of manifests), ranges (groupings of canvases), annotation lists (collections of annotation data), and layers (groupings of annotation lists) provide logical structures for organizing associated data. In some cases, these additional resources may be defined directly in the manifest. However, most resources representing data beyond the basic structure of a document are defined externally, and their identifiers are referenced in the manifest so that they may be retrieved as necessary. The most ubiquitous of these external resources are annotation lists, which contain all annotations and resource linkages for canvases besides the basic associated images.

Figure 1: IIIF resource hierarchy (Appleby et al.
2017b).300 BellefleurThe IIIF model satisfies all the basic needs of a structured, image-baseddocument, much like Adobe\u00E2\u0080\u0099s Portable Document Format but in a more eas-ily parseable and extensible way. For composite documents like manuscripts,which may not only have multiple divisions and organizational structuresbut also associated transcription data, the advantages of the IIIF becomemore pronounced. However, since it is designed for use online Linked Dataservices and to be efficient for large quantities of variable annotations andmetadata, the IIIF specification only permits a limited number of resourcesto be fully described within its manifest and does not specify a method ofpackaging additional associated resources together. This leads to a signifi-cant quantity of individual resource files within an IIIF document\u00E2\u0080\u0099s struc-ture. For an online service, where these resources can be stored in a databaseor as a complex file structure that manages user queries, this poses a minimalchallenge to accessibility in most cases. However, if a user wishes to retrievea document\u00E2\u0080\u0099s entire collection of resources all at once, a solution must bedevised. It is a primary aim of this paper to propose such a method thatbridges the divide between complex online structure and simple documentfiles while maintaining the benefits of the IIIF model.3 IIIF-Pack Format StructureThe format proposed here, provisionally named IIIF-Pack, provides amethod for packaging an IIIF resource into a single file along with its relatedparts and optional external resources. Taking inspiration from the Open-Document (Durusau and Brauer 2011) and Microsoft Office Open XML(ISO/IEC 2012) formats, the separate resources are compiled together un-der a well-defined file and folder structure using the ubiquitous Zip archiveformat originally developed by PKWARE and implemented widely in open-source libraries such as Zlib (Gailly and Adler 2017). 
This file archive strategy solves the problem of compiling numerous files while also providing fast, adequate compression for text-based resources and an index through which individual files may be efficiently extracted, appended, or removed from the archive container. While the internal path structure of the Zip format does not depend on the order of the files it contains (this is managed by the central directory section at the end of the file), it is prudent for the IIIF-Pack format to store static resources at the beginning of the archive, to facilitate performant modification of its contents.

Rather than identifying individual resources with URIs in the form of Internet-style addresses, identifiers for resources within the IIIF-Pack file are defined as absolute pathnames, with the root of the archive acting as the root of the virtual path. IIIF resources within this structure follow the recommended nomenclature for the given resource types (see Figure 2), with the leading {scheme}://{host}/{prefix}/{identifier} segments omitted.² All resource types except “content”³ are assumed to be in the JSON-LD graph description format. Whenever one IIIF resource is defined within another, such as in the manifest, it is loaded into the resource graph along with its parent and becomes directly accessible by its identifier name. Where a resource is requested by its identifier but is not already present in the loaded graph, the parser assumes the local file extension .json and dereferences the file from the archive. Accordingly, the document manifest is stored in the root of the archive as manifest.json and assigned the URI /manifest.

Directly adopting the IIIF’s organizational model gives IIIF-Pack more than a standardized method of describing complex document structures.
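The packaging and lookup rules above can be sketched with Python’s standard zipfile module. This is a minimal illustration under stated assumptions, not the IIIF-Pack reference implementation: the helper names, the res/ folder for images, and the example base URI are hypothetical. The sketch writes static content before the JSON-LD resources, maps an identifier such as /manifest to manifest.json, and converts between in-archive identifiers and full URIs by adding or removing the {scheme}://{host}/{prefix}/{identifier} base.

```python
import io
import json
import zipfile

# Hypothetical helpers illustrating the IIIF-Pack conventions; they are not
# part of any published library.


def member_name(identifier: str) -> str:
    # '/canvas/f1v' -> 'canvas/f1v.json': the .json extension is assumed for
    # JSON-LD resources, and the archive root acts as the path root.
    return identifier.lstrip('/') + '.json'


def write_pack(target, static_files: dict, json_resources: dict) -> None:
    # Static content (e.g. images) is written first and JSON-LD resources after
    # it, so later edits mostly touch the end of the archive.
    with zipfile.ZipFile(target, 'w', compression=zipfile.ZIP_DEFLATED) as zf:
        for path, data in static_files.items():
            # Already-compressed images gain little from Deflate; store as-is.
            zf.writestr(path, data, compress_type=zipfile.ZIP_STORED)
        for identifier, resource in json_resources.items():
            zf.writestr(member_name(identifier), json.dumps(resource))


def dereference(zf: zipfile.ZipFile, identifier: str) -> dict:
    # Load a resource that is not yet in the in-memory graph from the archive.
    return json.loads(zf.read(member_name(identifier)))


def to_online(identifier: str, base: str) -> str:
    # base stands for {scheme}://{host}/{prefix}/{identifier}.
    return base.rstrip('/') + identifier


def to_local(uri: str, base: str) -> str:
    # Reverse mapping: strip the base to recover the in-archive identifier.
    base = base.rstrip('/')
    if not uri.startswith(base + '/'):
        raise ValueError(f'{uri} is not under {base}')
    return uri[len(base):]


buffer = io.BytesIO()
write_pack(
    buffer,
    static_files={'res/f1v.jb2': b'\x97JB2\r\n\x1a\n'},
    json_resources={'/manifest': {'@type': 'sc:Manifest'},
                    '/canvas/f1v': {'@type': 'sc:Canvas'}},
)
with zipfile.ZipFile(buffer) as pack:
    print(pack.namelist())  # ['res/f1v.jb2', 'manifest.json', 'canvas/f1v.json']
    print(dereference(pack, '/manifest')['@type'])  # sc:Manifest
print(to_online('/canvas/f1v', 'https://example.org/iiif/ms1'))
```

Because ZipFile does not close a file object it did not open, the same in-memory buffer can be reopened for reading after the archive is written.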
Adopting the IIIF model also facilitates easy translation between online and offline spaces for projects that may want to make that transition at some future point. In that case, all that is required is to extract the files and add the appropriate {scheme}://{host}/{prefix}/{identifier} prefix to each resource’s identifier. The process also works in reverse, should a project wish to package an existing IIIF resource into a single document file. Furthermore, the format remains closely compatible with software developed to work with the IIIF standards.

² Implementing IIIF collections in a single package is certainly possible as well, though it is not explored here. In that case, the IIIF document “identifier” segment of the path would need to be defined and retained.
³ The IIIF considers “Content” type resources to be any internal resources that are not IIIF structural resources or annotations. In practice this usually means images, but it may include additional associated data.

Figure 2. IIIF recommended URI patterns (Appleby et al. 2017b).

For the initial use case of IIIF-Pack as a document format for manuscripts, the two principal types of content resources are images and text transcriptions. In virtually all cases, images make up the bulk of the data in a document, so efficient image compression strategies are critical. While the IIIF Presentation API does not restrict the set of image formats that may be associated with its canvases, the IIIF Image API (Appleby et al. 2017a) identifies seven commonly supported formats which users may want to extract from a document: JPEG, TIFF, PNG, GIF, JPEG2000, PDF, and WebP. The inclusion of PDF as an image format in this list is notable not only because of its wide support but also because the PDF standard supports a variety of highly efficient algorithms for compressing bi-level (black and white) images, in particular the JBIG2 standard (Ono et al. 2000).
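Because the Presentation API leaves image formats open, a program reading an IIIF-Pack file needs some way to choose a decoder for each content resource. One simple option, assumed here for illustration rather than specified by IIIF-Pack, is to check the signature bytes at the start of each file. The format names follow the IIIF Image API 2.1 extensions (jpg, tif, png, gif, jp2, pdf, webp); the jb2 label for standalone JBIG2 files is a hypothetical addition, since JBIG2 is not an Image API format.

```python
# Leading signature bytes ("magic numbers") mapped to short format names.
_SIGNATURES = [
    (b'\xff\xd8\xff', 'jpg'),
    (b'\x89PNG\r\n\x1a\n', 'png'),
    (b'GIF87a', 'gif'),
    (b'GIF89a', 'gif'),
    (b'II*\x00', 'tif'),                         # little-endian TIFF
    (b'MM\x00*', 'tif'),                         # big-endian TIFF
    (b'\x00\x00\x00\x0cjP  \r\n\x87\n', 'jp2'),  # JP2 signature box
    (b'\xff\x4f\xff\x51', 'jp2'),                # raw JPEG 2000 codestream
    (b'%PDF-', 'pdf'),
    (b'\x97JB2\r\n\x1a\n', 'jb2'),               # standalone JBIG2 file header
]


def sniff_format(data: bytes):
    # Return a short format name for an image payload, or None if unknown.
    # WebP is a RIFF container whose form type sits at byte offset 8.
    if data[:4] == b'RIFF' and data[8:12] == b'WEBP':
        return 'webp'
    for signature, name in _SIGNATURES:
        if data.startswith(signature):
            return name
    return None


for payload in (b'%PDF-1.7\n', b'\x97JB2\r\n\x1a\n\x01'):
    print(sniff_format(payload))  # prints pdf, then jb2
```

A viewer could then hand pdf payloads to a PDF library and jb2 payloads to a dedicated JBIG2 decoder.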
JBIG2 provides the highest lossless compression of bi-level images currently available by employing symbol recognition and arithmetic encoding, and is thus invaluable for the efficient storage of manuscript images, many of which may be acquired in black and white as photocopies or printouts of microfilm. Individual JBIG2 images may be stored in their own file containers with a minor gain in efficiency over single-image PDFs; however, this method is less well supported, whereas all major PDF libraries support decoding the format.

For textual data resources, the prevailing schema in Sanskrit Digital Humanities is the Text Encoding Initiative’s TEI Guidelines (TEI Consortium 2017), along with related standards such as EpiDoc (Elliott et al. 2017). The SARIT project (Wujastyk et al. 2017) is one of the leading contributors to the adoption of this format in digital Indology today. These XML-based standards define the representation of tagged, annotated textual content, structure, and emendation in a variety of flexible ways. Associating a given part of an XML document with an IIIF canvas is effectively accomplished using the Web Annotation Data Model’s selector system, which supports identifier-based selection through XPath (Robie, Dyck, and Spiegel 2017), CSS-style selectors (Çelik et al. 2011), and the XPointer Framework (Grosso et al. 2003), for which the TEI standard has contributed several registered schemes.

4 IIIF-Pack Example Document

The figures below describe a simple example IIIF-Pack file structure and its components, using the image and textual data for a single manuscript from the aforementioned Vetālapañcaviṁśati digital critical edition project. The manuscript data includes 217 folio images, extracted from a PDF document into individual JBIG2-compressed files, along with a single-file TEI XML transcription of the text.
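For illustration, an annotation linking one folio canvas of such a manuscript to the matching part of its TEI transcription might look roughly like the sketch below. It is a deliberate simplification under stated assumptions: it uses the Web Annotation Data Model’s XPathSelector rather than the XPointer-based selectors of the example document, it omits the TEI namespace, and it assumes each folio’s text sits in its own element, whereas real TEI transcriptions usually mark folio boundaries with pb milestones (one reason richer schemes such as XPointer are useful). All identifiers and file names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation in the Web Annotation Data Model's JSON-LD form: the
# body is a selected region of a TEI file stored in the package, the target a canvas.
annotation = {
    '@context': 'http://www.w3.org/ns/anno.jsonld',
    'id': '/annotation/f1v-transcription',
    'type': 'Annotation',
    'body': {
        'type': 'SpecificResource',
        'source': '/res/transcription.xml',
        'selector': {'type': 'XPathSelector', 'value': '//div[@n="1v"]'},
    },
    'target': '/canvas/f1v',
}

# Simplified, namespace-free stand-in for a TEI transcription (hypothetical).
TEI_SAMPLE = '''<TEI><text><body>
<div n="1r"><p>text of folio 1 recto</p></div>
<div n="1v"><p>text of folio 1 verso</p></div>
</body></text></TEI>'''


def resolve_xpath_selector(xml_text: str, selector: dict) -> list:
    # ElementTree implements only a subset of XPath and evaluates paths relative
    # to an element, so an absolute '//' expression is rewritten as './/'.
    if selector.get('type') != 'XPathSelector':
        raise ValueError('only XPathSelector is handled in this sketch')
    root = ET.fromstring(xml_text)
    path = selector['value']
    if path.startswith('//'):
        path = '.' + path
    return root.findall(path)


for element in resolve_xpath_selector(TEI_SAMPLE, annotation['body']['selector']):
    print(annotation['target'], '->', ''.join(element.itertext()))
    # /canvas/f1v -> text of folio 1 verso
```

A full implementation would need a complete XPath or XPointer processor; the standard library’s subset is enough only for simple attribute-based selection like this.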
Figure 3 illustrates the file and folder structure inside the archive. Figure 4 illustrates the opening portion of the IIIF manifest file, including the full definition of the first canvas resource in the document. Figure 5 shows the annotation list for a single canvas, which associates transcription data with the manuscript folio it represents using XPointer selectors on the associated TEI document. Finally, Figure 6 contains the beginning of the definition file for the transcription layer resource, which groups together all of the individual annotation lists that provide granular links to each folio’s transcription. Although it is omitted from Figure 4, the manifest resource is also directly associated with an annotation list identifying the full TEI document as a transcription of the full IIIF resource. The resulting IIIF-Pack file (9.3 MB) is about 20% smaller than the source image PDF (11.8 MB), despite including IIIF structural and annotation data as well as a text transcription.

Figure 3. Example IIIF-Pack internal file layout.
Figure 4. Beginning of manifest file for example document.
Figure 5. Annotation list for canvas of folio 1 (verso) in example document.
Figure 6. Beginning of transcription layer definition in example document.

5 Software Support

As with many Linked Data standards, general-use software tools for IIIF resources are fairly sparse. The most mature of these tools is Mirador, an open-source, web-based image and annotation viewer (Project Mirador Contributors 2017) designed for digital libraries and museums. The rerum.io project, based at the Center for Digital Humanities at St. Louis University, has begun to develop online tools and services that facilitate user onboarding as part of its early adoption of the IIIF model (Cuba, Hegarty, and Haberberger 2017).
The most noteworthy of these is a tool for generating IIIF manifests and other Presentation API structures from a list of images. Two actively maintained web server frameworks exist for serving images according to the IIIF Image API specification, supporting the retrieval of specific canvases or regions of IIIF resources in multiple formats. Overall, the state of the software suggests that while the IIIF standards have gained traction, significant work remains to broaden their scope of use. For the IIIF-Pack format, I have developed rudimentary parsing and viewing components to test its viability as a proof of concept. These include support for a separate JBIG2-format image decoder, which eliminates the overhead of using single-page PDF files where JBIG2 compression is desired.

The development of two tools is critical to the success of this project. The first is a cross-platform IIIF-Pack file viewer that can display the canvases along with their associated textual data according to the document’s defined sequences and annotations (see the mock-up in Figure 7). I am examining the use of a subset of Mirador as a basis for this application, built with the cross-platform Electron framework (GitHub Inc. et al. 2017), which is widely employed in desktop software built on web technologies. The second tool simplifies the workflow of producing a complete IIIF resource structure from source images and transcriptions and packaging them into an IIIF-Pack file. Following these, a library that facilitates performant, in-place editing of the IIIF-Pack file’s resources will provide the full suite of functionality for the document format.
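The kind of manifest generation from a list of images described above can be sketched as follows. This is a minimal sketch assuming the IIIF Presentation API 2.1 JSON-LD vocabulary (sc:Manifest, sc:Sequence, sc:Canvas, and oa:Annotation with the sc:painting motivation) together with IIIF-Pack style identifiers in place of full URIs; the function name, the folio tuples, and the pixel dimensions are hypothetical.

```python
import json

PRESENTATION_CONTEXT = 'http://iiif.io/api/presentation/2/context.json'


def build_manifest(label: str, folios: list) -> dict:
    # folios: (name, filename, width, height) tuples in reading order (hypothetical).
    canvases = []
    for name, filename, width, height in folios:
        canvas_id = f'/canvas/{name}'
        canvases.append({
            '@id': canvas_id,
            '@type': 'sc:Canvas',
            'label': name,
            'width': width,
            'height': height,
            # The folio image is attached to its canvas by a painting annotation.
            'images': [{
                '@id': f'/annotation/{name}-image',
                '@type': 'oa:Annotation',
                'motivation': 'sc:painting',
                'resource': {
                    '@id': f'/res/{filename}',
                    '@type': 'dctypes:Image',
                    'width': width,
                    'height': height,
                },
                'on': canvas_id,
            }],
        })
    return {
        '@context': PRESENTATION_CONTEXT,
        '@id': '/manifest',
        '@type': 'sc:Manifest',
        'label': label,
        # A single sequence gives the canvases their logical reading order.
        'sequences': [{
            '@id': '/sequence/normal',
            '@type': 'sc:Sequence',
            'canvases': canvases,
        }],
    }


manifest = build_manifest('Example manuscript', [
    ('f1r', 'f1r.jb2', 2400, 3600),
    ('f1v', 'f1v.jb2', 2400, 3600),
])
print([c['@id'] for c in manifest['sequences'][0]['canvases']])
# ['/canvas/f1r', '/canvas/f1v']
manifest_json = json.dumps(manifest, indent=2)  # text for manifest.json
```

Transcription links would then be added as separate annotation lists referenced from each canvas, rather than inline in the manifest.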
I expect these tools to begin public-facing testing in the first half of 2018 and plan to demonstrate them at the 17th World Sanskrit Conference in July.

Figure 7. Side-by-side view of manuscript folio image and its TEI-XML transcription.

6 Conclusion

While this project is at an early stage, I hope that its utility is already apparent. Textual scholars need not abandon the convenience of offline documents to benefit from advances in structuring and connecting their data. Given the promise of emerging standards and the shortcomings of existing document formats in meeting a variety of needs, some way of bringing current advances to bear on the practical needs of scholars is sorely needed. Furthermore, the broad extensibility of these standards leaves many additional possibilities unexplored here, for example the inclusion of digital stemmatic and other philological data, or metadata integration with growing online databases such as PANDiT (Bronner et al. 2017). By adopting Linked Data standards designed for the web, we may benefit from their evident organizational strengths and future developments without requiring either that textual work happen natively online or that the idea of a document be significantly reconceived. In doing so, we can also prepare our work for the ongoing movement into interconnected online spaces in a less onerous, more effective way.

References

Appleby, Michael, Tom Crane, Robert Sanderson, Jon Stroop, and Simeon Warner. 2017a. International Image Interoperability Framework Image API. Version 2.1.1. http://iiif.io/api/image/2.1/. IIIF Consortium.
Appleby, Michael, Tom Crane, Robert Sanderson, Jon Stroop, and Simeon Warner. 2017b. International Image Interoperability Framework Presentation API. Version 2.1.1. http://iiif.io/api/presentation/2.1/. IIIF Consortium.
Brickley, Dan and Libby Miller. 2014. FOAF Vocabulary Specification. Version 0.99 (Paddington Edition). http://xmlns.com/foaf/spec/20140114.html.
Bronner, Yigal, Omer Kesler, Andrew Ollett, Sheldon Pollock, Karl Potter, et al. 2017. PANDiT: Prosopographical Database for Indic Texts. http://www.panditproject.org/.
Çelik, Tantek, Elika J. Etemad, Daniel Glazman, Ian Hickson, Peter Linss, and John Williams. 2011. Selectors Level 3. Recommendation. http://www.w3.org/TR/2011/REC-css3-selectors-20110929/. World Wide Web Consortium (W3C).
Cuba, Patric, Donal Hegarty, and Bryan Haberberger. 2017. rerum.io. Center for Digital Humanities, St. Louis University. http://rerum.io/.
DCMI Usage Board. 2012. DCMI Metadata Terms. http://dublincore.org/documents/2012/06/14/dcmi-terms/. Dublin Core Metadata Initiative.
Durusau, Patrick and Michael Brauer. 2011. Open Document Format for Office Applications. Version 1.2. OASIS Standard. http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os.html.
Elliott, Tom, Gabriel Bodard, Elli Mylonas, Simona Stoyanova, Charlotte Tupman, Scott Vanderbilt, et al. 2017. EpiDoc Guidelines: Ancient documents in TEI XML. Version 8. http://www.stoa.org/epidoc/gl/latest/.
Gailly, Jean-loup and Mark Adler. 2017. Zlib Compression Library. Version 1.2.11. https://zlib.net/.
GitHub Inc. et al. 2017. Electron framework. http://electronjs.org.
Grosso, Paul, Eve Maler, Jonathan Marsh, and Norman Walsh. 2003. XPointer Framework. Recommendation. http://www.w3.org/TR/2003/REC-xptr-framework-20030325/. World Wide Web Consortium (W3C).
ISO/IEC. 2012. Office Open XML File Formats, Part 2: Open Packaging Conventions. ISO/IEC Standard 29500-2:2012. http://standards.iso.org/ittf/PubliclyAvailableStandards/c061796_ISO_IEC_29500-2_2012.zip. International Organization for Standardization.
Miles, Alistair and Sean Bechhofer. 2009. SKOS Simple Knowledge Organization System. Recommendation. http://www.w3.org/TR/2009/REC-skos-reference-20090818/. World Wide Web Consortium (W3C).
Ono, Fumitaka, William Rucklidge, Ronald Arps, and Cornel Constantinescu. 2000. “JBIG2: The Ultimate Bi-level Image Coding Standard”. In: Proceedings of the International Conference on Image Processing, Sep 10–13, 2000. Institute of Electrical and Electronics Engineers (IEEE).
Project Mirador Contributors. 2017. Mirador: Open-source, web based, multi-window image viewing platform. http://projectmirador.org/.
Robie, Jonathan, Michael Dyck, and Josh Spiegel. 2017. XML Path Language (XPath). Version 3.1. Recommendation. https://www.w3.org/TR/2017/REC-xpath-31-20170321/. World Wide Web Consortium (W3C).
Sanderson, Robert, Paolo Ciccarese, and Benjamin Young. 2017. Web Annotation Data Model. Recommendation. https://www.w3.org/TR/2017/REC-annotation-model-20170223. World Wide Web Consortium (W3C).
Sporny, Manu, Gregg Kellogg, and Markus Lanthaler. 2014. JSON-LD 1.0: A JSON-based Serialization for Linked Data. Recommendation. https://www.w3.org/TR/2014/REC-json-ld-20140116/. World Wide Web Consortium (W3C).
TEI Consortium. 2017. TEI P5: Guidelines for Electronic Text Encoding and Interchange. Version 3.2.0. http://www.tei-c.org/Guidelines/P5. TEI Consortium.
Wujastyk, Dominik, Patrick McAllister, Liudmila Olalde, Andrew Ollett, et al. 2017. SARIT: Search and Retrieval of Indic Texts. http://sarit.indology.info/.

New Vistas to study Bhartṛhari: Cognitive NLP
Jayashree Aanand Gajjam, Diptesh Kanojia and Malhar Kulkarni

Abstract: A sentence is an important notion in the Indian grammatical tradition. A collection of definitions of the sentence can be found in the Vākyapadīya, written by Bhartṛhari in the fifth century C.E. The grammarian-philosopher Bhartṛhari and his authoritative work, the Vākyapadīya, have been a subject of study for modern scholars for at least 50 years, since Ashok Aklujkar submitted his Ph.D. dissertation at Harvard University.
The notions of a sentence and a word as meaningful linguistic units of language have been the subject of discussion in many later works. While some scholars have applied philological techniques to critically establish the text of Bhartṛhari’s works, others have devoted themselves to exploring their philosophical insights. Still others have studied his works from the point of view of modern linguistics and psychology, and a few have tried to justify his views through logical discussion.

In this paper, we present a fresh way to study Bhartṛhari and his works, especially the Vākyapadīya. This view comes from the field of Natural Language Processing (NLP), more specifically from what is called Cognitive NLP. We have studied the definitions of a sentence given by Bhartṛhari at the beginning of the second chapter of the Vākyapadīya, and investigated one of these definitions by conducting an experiment that follows the methodology of silent reading of Sanskrit paragraphs. We collect the gaze-behavior data of participants and analyze it to understand the underlying comprehension process in the human mind, and we present our results. We evaluate the statistical significance of our results using the t-test and discuss the caveats of our work. We also present some general remarks on this experiment and on the usefulness of this method for gaining more insight into the work of Bhartṛhari.

312 Gajjam et al

1 Introduction

Language is an integral part of the human communication process. It is made up of structures: there are sentences, which are made up of words, which in turn are made up of syllables. There has been much discussion about which of these is the minimal meaningful unit of language.
The notions of a sentence and a word have been described in different fields of knowledge such as grammar, linguistics, philosophy, and cognitive science. Some provide a formal definition of a sentence, while others give a semantic definition. The Vyākaraṇa, Mīmāṃsā and Nyāya schools of thought in Sanskrit literature each hold views about the nature of a sentence. The grammarian-philosopher Bhartṛhari enumerated eight definitions of a sentence, given by earlier grammarians and Mīmāṃsakas, in the second Kāṇḍa (Canto) of his authoritative work, the Vākyapadīya.

The question of how a human being understands a sentence has been addressed in the field of psycholinguistics for the last 20 years. Various studies conducted in the last decade have approached this question using several experimental methods. There are many off-line tasks,¹ such as the Grammaticality Judgement task and the Thematic Role Assignment task, which are helpful in examining how language users process complete sentences. In addition to these off-line techniques, psycholinguists have investigated a number of sophisticated on-line language comprehension methodologies. Some of them are behavioral methods such as Acceptability Judgement, Speed-Accuracy Trade-off, Eye-Movement Behavior, and Self-Paced Reading.
Others are neuro-cognitive methods such as the electroencephalogram (EEG),² Event-Related brain Potentials (ERPs),³ functional Magnetic Resonance Imaging (fMRI),⁴ and Positron Emission Tomography (PET).⁵ These on-line methods study the ongoing, real-time cognitive process while a participant performs a task.

¹ These methodologies are called ‘off-line’ because they study the comprehension process after the participant has performed the task; most of them are pen-and-paper methods.
² EEG measures the electrical activity of the brain, through electrodes applied to the scalp, while the participant performs a task.
³ ERPs provide very high temporal resolution. The electrical activity of the brain is measured non-invasively by means of electrodes applied to the scalp (Choudhary 2011).
⁴ fMRI is a BOLD (Blood Oxygen Level Dependent) technique, used to study both neurologically healthy adults and people with reading disabilities, mostly brain-damaged patients.
⁵ PET is a neuroimaging technique based on the assumption that areas of high radioactivity are correlated with brain activity.

Bhartṛhari: Cognitive NLP 313

This paper addresses one of the eight definitions given by Bhartṛhari. The main goal is to study this definition from the cognitive point of view, i.e. to study the underlying comprehension process in human beings, taking this definition as the foundation. This also allows us to identify the cases of readers’ linguistic behavior in which the definition holds true. We use an Eye-Tracker device to collect the gaze (eye) movement data of readers during silent reading⁶ of Sanskrit paragraphs.

Gaze Tracking: An Introduction

Gaze tracking is the process of measuring the point of gaze or the movement of the participants’ eyes. The device that measures eye movements is called an Eye-Tracker.
We use an ‘SR-Research Eyelink-1000 Plus’,⁷ which mainly comprises two PCs (a Host PC and a Display PC), a camera and an infrared illuminator. It performs monocular eye-tracking with a sampling rate of 500 Hz (one sample every 2 milliseconds). The Host PC is used by the supervisor to navigate through the experiment: the supervisor can set up the camera, perform the eye-calibration process, check and correct drifts, present the paragraphs to the readers, and record the session. The Display PC is used by the reader to read the paragraphs and answer the questions. The camera captures the participant’s pupil under infrared illumination, and the resulting eye movements are mapped, with the help of image-processing algorithms, to the data presented to the participant on the Display PC.

⁶ Oral and silent reading represent the same cognitive process. However, readers spend less processing time on difficult words in silent reading than in oral reading (Juel and Holmes 1981). For the current paper, we focus on the silent-reading methodology.
⁷ More information can be found at http://www.sr-research.com.

The Eye-Tracker records several eye-movement parameters on an Area of Interest (AOI), such as pupil size, Fixations and Saccades. An AOI is an area of the display that is of concern, such as a word, a sentence or a paragraph; in our case it is a word. A Fixation occurs when the gaze is focused on a particular interest area for 100-500 milliseconds. A Saccade⁸ is the movement of gaze between two fixations, which occurs at intervals of 150-175 milliseconds.⁹ Owing to its high sampling rate, the Eye-Tracker is also able to capture saccadic Regressions and, similarly, Progressions. A Regression (a.k.a. back-tracking) is a backward-moving saccade in which the reader looks back at something they had read earlier.
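To make these definitions concrete, here is a minimal sketch of how raw gaze samples might be turned into word-level Fixations and labelled saccades. It does not reproduce the Eyelink software’s own event parser: it uses a generic dispersion-threshold (I-DT) fixation algorithm, swapped in for illustration, and assumes the 500 Hz rate and 100 ms minimum Fixation duration stated above (the 500 ms upper bound is not enforced). The pixel coordinates, word AOI boxes and dispersion threshold are hypothetical. A saccade to an earlier word is labelled a Regression and one to a later word a Progression; refixations within the same word are ignored.

```python
from dataclasses import dataclass

SAMPLE_MS = 2                               # 500 Hz: one sample every 2 ms
MIN_FIXATION_MS = 100                       # lower bound of the 100-500 ms range
MIN_SAMPLES = MIN_FIXATION_MS // SAMPLE_MS  # 50 samples


@dataclass
class Fixation:
    x: float
    y: float
    duration_ms: int


def dispersion(points):
    # I-DT dispersion: horizontal spread plus vertical spread, in pixels.
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations(samples, max_dispersion=25.0):
    # samples: (x, y) gaze positions in pixels, one per 2 ms (hypothetical data).
    fixations = []
    i = 0
    while i + MIN_SAMPLES <= len(samples):
        end = i + MIN_SAMPLES
        if dispersion(samples[i:end]) > max_dispersion:
            i += 1  # the first sample belongs to a saccade; slide the window
            continue
        # Grow the window while the points stay tightly clustered.
        while end < len(samples) and dispersion(samples[i:end + 1]) <= max_dispersion:
            end += 1
        window = samples[i:end]
        cx = sum(p[0] for p in window) / len(window)
        cy = sum(p[1] for p in window) / len(window)
        fixations.append(Fixation(cx, cy, len(window) * SAMPLE_MS))
        i = end
    return fixations


def word_index(fixation, aois):
    # aois: one (x0, y0, x1, y1) box per word, in reading order.
    for index, (x0, y0, x1, y1) in enumerate(aois):
        if x0 <= fixation.x <= x1 and y0 <= fixation.y <= y1:
            return index
    return None  # the fixation fell outside every word AOI


def classify_saccades(fixations, aois):
    words = [word_index(f, aois) for f in fixations]
    words = [w for w in words if w is not None]
    labels = []
    for previous, current in zip(words, words[1:]):
        if current > previous:
            labels.append('progression')
        elif current < previous:
            labels.append('regression')
    return labels


# Hypothetical trace: word 0, jump ahead to word 2, then back to word 1.
samples = [(10.0, 10.0)] * 60 + [(210.0, 10.0)] * 60 + [(110.0, 10.0)] * 60
aois = [(0, 0, 50, 20), (100, 0, 150, 20), (200, 0, 250, 20)]
fixations = detect_fixations(samples)
print([f.duration_ms for f in fixations])  # [120, 120, 120]
print(classify_saccades(fixations, aois))  # ['progression', 'regression']
```

Real gaze data contains noise and transitional samples, so thresholds like these would need tuning against the tracker’s own parsed events.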
A Progression, by contrast, is a forward-moving saccadic path.

Embedding inexpensive eye-trackers in hand-held devices is now close to reality. This opens avenues to obtain eye-tracking data non-intrusively from inexpensive mobile devices used by a huge population of online readers, and to derive cognitive features from it. For instance, Cogisen holds a patent (ID: EP2833308-A1)¹⁰ on eye-tracking using an inexpensive mobile webcam.

To date, a great deal of research has been carried out using eye-movement data on various tasks such as reading (texts, poetry, musical notes, numerals), typing, scene perception, face perception, mathematics, physics, analogies, arithmetic problem-solving and various other dynamic situations (driving, basketball foul shooting, golf putting, table tennis, baseball, gymnastics, walking on uneven terrain, mental rotation, interacting with computer screens, video game playing, etc.), as well as media communication (Lai et al. 2013). Reading researchers have applied eye-tracking in behavioral studies, as surveyed by Rayner (1998). Recently, some researchers have even used this technique to explore learning processes in complex learning contexts such as emergent literacy, multimedia learning, and science problem-solving strategies.

In Section 2, we discuss related work in the fields of the Sanskrit grammatical tradition and Cognitive NLP. In Section 3, we present our approach, focusing on the details of the experiment, and we present the analysis and results in Section 4. Section 5 gives the evaluation of our work, followed by a discussion in Section 6. We conclude the paper in Section 7 by suggesting possible future work.

⁸ The word ‘Saccade’ is of French origin. It was Louis Émile Javal (a French eye specialist and politician) who first named the movements of the eyes ‘Saccades’, in the 19th century.
⁹ As far as human anatomy is concerned, the eyes are never still; there are small movements or tremors of the eyes all the time. They are called ‘Nystagmus’ (Rayner 1998). These eye movements are involuntary and hence not measured by the machine. The deliberate movements of the eyes, which occur at intervals of 150-175 ms, are considered as the features for the analysis.
¹⁰ http://www.sencogi.com

2 Related Work

In this section, we discuss the work done on the notions of sentence and sentence-meaning by Indian and Western scholars in subsection 2.1. Studies carried out in the field of Cognitive NLP are presented in subsection 2.2. We also present a bird’s-eye view of our research area in the figure at the end of this section.

2.1 Sentence Definitions and Comprehension

The Sanskrit grammatical tradition begins with Pāṇini’s Aṣṭādhyāyī. Pāṇini does not define a sentence explicitly in his work. However, a few modern scholars regard the sentence as the base of the derivational process in Pāṇini’s grammar (Kiparsky and Staal 1969). This view is criticized by Houben (2008) and S. D. Joshi and Roodbergen (2008). According to some scholars, the notion of Kāraka (Huet 2006) or the notion of Sāmarthya (Deshpande 1987; Devasthali 1974) is Pāṇini’s contribution to syntax. The latter view is opposed by Mahavir (1984).
After Pāṇini, Kātyāyana, who wrote Vārttikas on the rules of the Aṣṭādhyāyī, gave two definitions of the sentence for the first time. The first¹¹ (P.2.1.1 Vt. 9) states that a sentence is chiefly the action-word, accompanied by particles, nominal words, and adjectives; the second, ekatiṅ vākyaṃ (P.2.1.1 Vt. 10), states that a sentence is that [cluster of words] which contains a finite verb [as an element]. Both are said to be formal in nature, not referring to the meaning content (Laddu 1980; Matilal 1966; Pillai 1971). Deshpande (1987) argued that Kātyāyana’s claim that each sentence must have a finite verb relates to the deeper derivational level and not to its surface expression. Hence, a sentence may or may not contain a finite verb at the surface level, and there can be purely nominal sentences (Bronkhorst 1990; H. Coward 1976; Tiwari 1997). Patañjali, in his Mahābhāṣya, discussed the integrity of a sentence in terms of its having only one finite verb. According to him, a sentence must have only one finite verb, and purely nominal sentences may not be considered complete: the word asti (is) should be understood in such sentences (Bronkhorst 1990). Modern scholars have argued that a sentence with two identical finite verbs¹² does not violate the integrity of a sentence (Deshpande 1987; Jha 1980; Laddu 1980; Pillai 1971).

¹¹ ākhyātaṃ sāvyayakārakaviśeṣaṇaṃ vākyaṃ
¹² The definition ekatiṅ vākyaṃ is explained by Patañjali through the illustration brūhi brūhi, which indicates that a repeated verb is to be regarded as the same verb. Kaiyyaṭa, the commentator on the Mahābhāṣya, also takes the term eka as ‘identical’.

Bhartṛhari, for the first time, deals with semantic issues in the second Kāṇḍa, i.e. the Vākyakāṇḍa, of the Vākyapadīya (VP). There we find a comprehensive treatment of various theories of the sentence and its meaning, along with their philosophical discussion. He enumerates eight views on the notion of a sentence held by earlier theorists in India. The verse is:

Ākhyātaśabdaḥ saṅghāto jātiḥ saṅghātavartinī
Eko’navayavaḥ śabdaḥ kramo buddhyanusaṃhṛtiḥ |
Padamādyaṃ pṛthaksarvaṃ padaṃ sākāṅkṣamityapi
Vākyaṃ prati matirbhinnā bahudhā nyāyavādinām || (VP.II.1-2)

The definitions are as follows: (1) Ākhyātaśabdaḥ, the verb; (2) Saṅghātaḥ, a combination of words; (3) Jātiḥ saṅghātavartinī, the universal in the combination of words; (4) Eko’navayavaḥ śabdaḥ, an utterance which is one and devoid of parts; (5) Kramaḥ, a sequence of words; (6) Buddhyanusaṃhṛtiḥ, the single whole meaning principle in the mind; (7) Padamādyam, the first word; and (8) Pṛthak sarvam padam sākāṅkṣam, each word having expectancy for one another. These eight views on the sentence are held by earlier grammarians and Mīmāṃsakas, who look at the sentence from different angles depending on the mental dispositions formed by their training in different Śāstras.¹³

The definitions jātiḥ saṅghātavartinī, eko’navayavaḥ śabdaḥ and buddhyanusaṃhṛtiḥ can be categorized under Bhartṛhari’s theory of Sphoṭa, which holds that a sentence is ‘a single undivided utterance’ and that its meaning is ‘an instantaneous flash of insight’. This view has been studied by various modern scholars in their respective works (H. Coward 1976; Loundo 2015; Pillai 1971; Raja 1968; Sriramamurti 1980; Tiwari 1997). Some modern scholars have studied the theory of Sphoṭa from different perspectives. G. H. Coward (1973) showed the logical consistency and psychological experience¹⁴ of the Sphoṭa theory, while Houben (1989) compared Bhartṛhari’s Śabda to Saussure’s theory of the sign.¹⁵ Later on, Akamatsu (1993) examined this theory in the philosophical and historical context of linguistic theory in India.

¹³ Avikalpe’pi vākyārthe vikalpā bhāvanāśrayāḥ | (VP II.116)
¹⁴ Coward argues that, according to traditional Indian Yoga, the Sphoṭa view of language is practically possible. It is both logically consistent and psychologically realizable.

In contrast with the theory of Sphoṭa, the Mīmāṃsakas hold that a syllable has a reality of its own, that the word is the sum total of its syllables, and that the sentence is only words added together. The remaining definitions, namely ākhyātaśabdaḥ, saṅghātaḥ, kramaḥ, padamādyam and pṛthak sarvam padam sākāṅkṣam, are categorized under this view. Various modern Indian scholars (Bhide 1980; Choudhary 2011; Gangopadhyay 1993; Iyer 1969; Jha 1980; Sriramamurti 1980) have discussed the compositionality of a sentence. This view has also been studied by Western psycholinguists such as Sanford and Sturt (2002), and criticized by Pagin (2009), who asserts that understanding the meanings of the words is not enough to understand the meaning of the whole sentence. Studies by Foss and Hakes (1978), Davison (1984), Glucksberg and Danks (2013) and Levy et al. (2012) showed that word sequence is an important parameter in understanding English sentences.
Similar studies by McEuen (1946) and Davison (1984) have shown that people usually tend to skip the first word of a sentence unless it is semantically loaded.

We study the very first definition, ākhyātaśabdaḥ, which states that a single word, the ākhyāta (‘the verb’), is the sentence. Bhartṛhari’s own explanation of this definition in VP.II.326 suggests that if a mere verb denotes the definite means of the action (i.e. the agent and the accessories) in the sentence, then that verb should also be regarded as a sentence.[16] In his introduction to the Ambākartrī commentary on the VP, Pt. Raghunatha Sarma discusses this view with examples such as pidhehi. He notes that when someone utters the mere verb pidhehi (‘close’ [imperative]), it necessarily also conveys the karma of the action, dvāram (‘the door’); in such a case the mere verb pidhehi can be considered a complete sentence[17] (Sarma 1980).
[15] Houben suggested that in both works a purely mental signifier plays an important role.
[16] ākhyātaśabde niyataṃ sādhanaṃ yatra gamyate | tadapyekaṃ samāptārthaṃ vākyamityabhidhīyate || (VP.II.326)
[17] pidhehīti … atra dvāramiti karmākṣepāt paripūrṇārthatve ‘dvāraṃ pidhehi’ iti vākyaṃ bhavatyeva |

Later modern scholars emphasize this view, saying that if a linguistic string is to be considered a sentence, it should have expectancy at the level of semantics and not just at the level of words (Laddu 1980; Pillai 1971). As stated by the commentator Puṇyarāja, this definition holds that the meaning of a sentence is of the nature of an action,[18] which means that the meaning of the finite verb becomes the chief qualificand in the resulting cognition, while the other words in the sentence confirm the understanding of a particular action[19] (Huet 2006; Pillai 1971). Moreover, as the commentary says, this definition does not deny sentence status to a linguistic string that contains other words besides the verb; it emphasizes that a single verb can sometimes convey the complete meaning and hence can be regarded as a sentence.[20] Following these views of the commentary, we can interpret the word ākhyātaśabdaḥ in two ways: the compound ākhyātaśabdaḥ is analyzed either as ākhyātaḥ eva śabdaḥ (i.e.
Karmadhāraya Samāsa: ‘the verb’ [itself can also be considered a sentence]) or as ākhyātaḥ śabdaḥ yasmin tat (i.e. Bahuvrīhi Samāsa: ‘the linguistic string containing the verb’ [is a sentence]),[21] both of which are qualified as ‘a sentence’. However, it cannot be decided whether this definition excludes purely nominal sentences when it comes to assigning sentence status.[22]

Some earlier psycholinguistic work on this view, such as McEuen (1946), shows that in English, sentence cognition takes place even if the verb is unavailable. The same view was later put forward by Choudhary (2011), who showed that in verb-final languages such as Hindi, comprehenders do not wait for the verb when they have not yet reached it, but process the sentence incrementally. The study by Osterhout, Holcomb, and Swinney (1994) showed that the verb has complement-taking properties; hence it is the major element in the process of sentence comprehension.

Motivated by these studies, we test the ākhyātaśabdaḥ definition with an experimental method, namely readers’ eye-movement behavior on data that contains verbs, data that contains purely nominal sentences, and data that lacks agents. We are aware that this definition may have shortcomings: there can be cases or situations in which it does not hold, or holds only partially.[23] The aim of this paper is to find the cases in which it does hold. We therefore carry out an experiment to identify a situation in which this definition is valid, and provide statistical evidence for it.

[18] kriyā vākyārthaḥ |
[19] Kriyā kriyāntarādbhinnā niyatādhārasādhanā | Prakrāntā pratipattṝṇāṃ bhedaḥ sambodhahetavaḥ || (VP.II.414)
[20] tatrākhyātaśabdo vākyamiti vādinām ākhyātaśabda eva vākyamiti nābhiprāyaḥ … kintu kvacid ākhyātaśabdo’pi vākyam, yatra kārakaśabdaprayogaṃ vinā kevalākhyātaśabdaprayoge’pi vākyārthāvagatiḥ … (Ambākartrī on VP.II.1-2)
[21] In this paper we have studied the latter view, and presented sentences containing verbs and other words as stimuli to the participants. Studying the first view would require presenting verb-only sentences, which would have led to a loss of context in written-language comprehension. Hence, instead of presenting verb-only sentences, we dropped the agent-denoting word from the sentence, which helps us find out whether the verbs express their means of action and are as comprehensible as sentences that also have their complements.
[22] We also presented such sentences, to study whether nominal sentences are as comprehensible as sentences having verbs, or whether they impose an excessive cognitive load that makes readers look for the verb to understand them better.

2.2 Cognitive NLP

It is clear from a large number of studies that eye-movement behavior can be used to infer cognitive processes (Groner 1985; Rayner 1998; Starr and Rayner 2001).
The eye has been called a window into the brain (Majaranta and Bulling 2014). Rayner (1998) notes that reading experiments have been carried out in languages such as English, French, Dutch, Hebrew, German (Clematide and Klenne 2013), Finnish, Japanese and Chinese. Only a few studies exist on Indian languages, such as Hindi (Ambati and Indurkhya 2009; Choudhary 2011; Husain, Vasishth, and Srinivasan 2014; Salil Joshi, Kanojia, and Bhattacharyya 2013) and Telugu (Ambati and Indurkhya 2009). The scripts studied are mainly written from left to right, except for Hebrew (right to left); Khan, Loberg, and Hautala (2017) studied eye-movement behavior on Urdu numerals, which are written bidirectionally. The orthographies studied have been both horizontal and vertical (Japanese and Chinese). These studies have addressed various levels of language: typographical, orthographical, phonological (Miellet and Sparrow 2004), lexical (Husain, Vasishth, and Srinivasan 2014), syntactic (Fodor, Bever, and Garrett 1974), semantic and discourse levels, stylistic factors, and anaphora and coreference (Rayner 1998). A few studies compared fast versus poor readers, children versus adults versus elderly adults, multilinguals versus monolinguals (De Groot 2011), and normal readers versus people with reading disabilities such as dyslexia, aphasia (Levy et al. 2012), brain damage or clinical disability (Rayner 1998), schizophrenia, Parkinson’s disease (Caplan and Futter 1986) or oculomotor diseases.

[23] For example, in poetry some attention must also be paid to the sequence (kramaḥ) of the words; while learning a new language, every word, including the first word (padamādyaṃ), seems to play a major role.
Various methodologies have been used, such as eye-contingent display change, the moving-window technique, the moving-mask technique, the boundary paradigm, naming tasks, Rapid Serial Visual Presentation (RSVP) versus self-paced reading, and silent versus oral reading.

Reading experiments have mainly been used to understand the levels underlying the comprehension process. Beyond that, Salil Joshi, Kanojia, and Bhattacharyya (2013) performed a word sense disambiguation study for Hindi, discussing the cognitive load and difficulty of disambiguating verbs compared with other part-of-speech categories; they also present a brief analysis of disambiguating words of different ontological categories. Martinez-Gómez and Aizawa (2013) use Bayesian learning to quantify reading difficulty from readers’ eye-gaze patterns. Mishra, Bhattacharyya, and Carl (2013) propose a framework to predict translation difficulty from a translator’s eye-gaze patterns. Similarly, A. Joshi et al. (2014) introduce a system for measuring the difficulty humans perceive in understanding the sentiment expressed in texts. From a computational perspective, features obtained from eye-tracking have been used to predict readers’ sarcasm understandability (Mishra, Kanojia, and Bhattacharyya 2016), to detect sarcasm in text (Mishra, Kanojia, Nagar, et al. 2017a) and to analyze the sentiment of a given sentence (Mishra, Kanojia, Nagar, et al. 2016).

Apart from the immense number of studies in psycholinguistics, eye-tracking has been used extensively for Natural Language Processing (NLP) applications in computer science. Mishra, Kanojia, Nagar, et al. (2017b) model the complexity of a scan path and propose a quantification of lexical and syntactic complexity.
They also perform sentiment and sarcasm classification with a convolutional neural network (CNN) (LeCun et al. 1998) trained on eye-tracking data (Mishra, Dey, and Bhattacharyya 2017). They refer to this confluence, attempting to solve NLP problems through cognitive psycholinguistics, as Cognitive NLP. Our analysis of eye-movement patterns in Sanskrit is, to our knowledge, the first of its kind, and is inspired by these recent advances.

A bird’s-eye view of our research area is presented in Figure 1; the highlighted, bold text marks the research interest of the current paper.

Figure 1: A brief analysis of our research area

3 Our Approach

We describe dataset creation in Subsection 3.1, experiment details including participant selection in Subsection 3.2, and the features in Subsection 3.3, followed by the methodology of the experiment in Subsection 3.4.

3.1 Dataset Creation

We prepare a dataset of 20 Sanskrit documents, each either prose (13 in total) or poetry, i.e. a subhāṣita (7 in total). The prose documents are mainly stories taken from texts such as the Pañcatantra, Vaṃśavṛkṣaḥ and Bālanītikathāmālā; the subhāṣitas are taken from the Subhāṣitamañjūṣā. Each story is 10-15 lines long, and each subhāṣita is 2-4 verses long. As the experiment demands, we create three copies of the 20 paragraphs and manipulate them as follows:

• Type A: These 20 documents contain no changes; they are kept exactly as in the original.
• Type B: In this set of documents, we remove the finite and non-finite verbs completely, which results in a syntactic violation in the respective sentences; these become purely nominal sentences. In poetry, instead of removing the verbs, we replace each verb with a synonymous verb to maintain the form of the verse. The motivation behind this modification is to test how much a verb contributes to the comprehension of a sentence, both syntactically and semantically. There are 20 documents of this kind.
• Type C: Here, the verbs are kept but we drop the kartā (agent) from the sentences. Since the kartā is semantically loaded, we drop it to investigate whether a mere verb without its agent can convey the meaning of the whole sentence. Kartās are not removed from sentences that had no finite or non-finite verb in the original document, to avoid insufficient information. This modification should shed light on the view that the verb itself can be considered a sentence. In Type C poetry, the stimulus is degraded by replacing the original finite verbs with finite verbs of distant meaning in the same grammatical category. Although these verbs preserve the syntactic integrity of the sentence, they tend to be semantically incompatible with the other words in the string. This incompatibility causes semantic inhibition during processing, which in turn forces the reader to reconstruct the meaning of the sentence all over again. There are 20 documents of this kind.

The paragraphs do not contain text that readers might find difficult to comprehend. We normalize the text to avoid vocabulary issues, and we control the orthographical, typographical and lexical variables that might affect the outcome of the experiment. We maintain a constant orthography throughout the dataset.
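The prose manipulations for Types B and C, together with the counterbalancing of Datasets 1, 2 and 3 described below, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors’ procedure: tokens are assumed to be pre-annotated with hypothetical tags (FV, NFV, AGENT, OTHER), the poetry variants (synonym or distant-meaning verb substitution) are not modeled, and the Latin-square rotation in the hypothetical function assign_versions is just one simple way to guarantee that no reader sees two versions of the same paragraph; the paper does not state how its shuffle was done.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical annotation scheme (an assumption, not the paper's format):
# 'FV' = finite verb, 'NFV' = non-finite verb (krdanta), 'AGENT' = karta,
# 'OTHER' = any other word.
VERB_TAGS = ('FV', 'NFV')


@dataclass(frozen=True)
class Token:
    form: str
    tag: str


def make_type_b(sentence: List[Token]) -> List[Token]:
    # Type B (prose): remove every finite and non-finite verb.
    return [t for t in sentence if t.tag not in VERB_TAGS]


def make_type_c(sentence: List[Token]) -> List[Token]:
    # Type C (prose): remove the agent, but only in sentences that contain
    # a verb; verbless sentences keep their agent to avoid losing information.
    if not any(t.tag in VERB_TAGS for t in sentence):
        return list(sentence)
    return [t for t in sentence if t.tag != 'AGENT']


def assign_versions(n_paragraphs: int, n_datasets: int = 3) -> List[List[str]]:
    # Latin-square rotation: paragraph i appears as type (i + d) mod 3 in
    # dataset d, so every dataset holds exactly one version of each paragraph
    # and, across the three datasets, each paragraph occurs once per type.
    types = ['A', 'B', 'C']
    return [[types[(i + d) % len(types)] for i in range(n_paragraphs)]
            for d in range(n_datasets)]


# Toy sentence in simplified ASCII transliteration: simhah vanam gacchati
s = [Token('simhah', 'AGENT'), Token('vanam', 'OTHER'), Token('gacchati', 'FV')]
print([t.form for t in make_type_b(s)])  # ['simhah', 'vanam']
print([t.form for t in make_type_c(s)])  # ['vanam', 'gacchati']
print(assign_versions(6)[1])             # ['B', 'C', 'A', 'B', 'C', 'A']
```

With 20 paragraphs, this rotation gives each dataset a 7, 7 and 6 split of the three types; that split is an artifact of the sketch, not a figure reported in the paper.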
The passages are shown in Devanāgarī script, written from left to right. We keep the font size large, set the line spacing to an optimal value, and adjust the screen brightness for the participant’s comfort. We minimize lexical complexity in the prose by splitting sandhis (70 in total), separating compound words with hyphens (51 in total), and adding commas at appropriate places for easier reading. The verses are not subject to this kind of modification. This forms our original document.

Sentences in the original dataset vary with respect to their verbs: there are 7 purely nominal sentences, 33 sentences with no finite verb but with kṛdantas, and 70 sentences with at least one finite verb. There are no single-sentence paragraphs, which eliminates the possibility of insufficient contextual information while reading. The poetry contains 26 finite verbs in total, each verse having 3 to 4 finite verbs. Two linguists validated our dataset, agreeing 100% that the documents are comprehensible; this forms the ground truth for our experiment.

All three types of documents (Types A, B and C) are shuffled so that no reader reads more than one version of the same paragraph; this counterbalancing removes paragraph-specific bias. Twenty such shuffled paragraphs make up one final dataset, and there are three final datasets: Datasets 1, 2 and 3. Of the 20 participants, 7 are presented with Dataset 1, 6 with Dataset 2, and the remaining 7 with Dataset 3.

We formulated two multiple-choice questions for each paragraph. The first question is the same for all paragraphs and elicits the reader’s view of the meaningfulness of the paragraph.
The second question is based on the gist of the paragraph and works as a comprehension test, which also ensures that participants have read attentively and rules out mindless reading. We use the participants’ answers to both questions to compute the inter-annotator agreement and the accuracy rate.

3.2 Experiment Details

We chose 20 participants[24] with a background in Sanskrit.[25] They have been learning Sanskrit for between 2 and more than 10 years. The participants are neurologically healthy adults aged 22 to 38. They are well acquainted with Sanskrit; however, they were not told about the modifications made to the datasets beforehand. All participants can understand, read and speak multiple languages. Most are native speakers of Marathi; a few have Kannada, Telugu or Hindi as their native language.

The participants are given a set of instructions beforehand describing the nature of the task, the annotation input method, and the need to minimize head movement during the experiment. We also compensate them financially for their effort. Before the experiment, they are given two sample documents so that they become familiar with the procedure.

[24] The number of participants is small because we needed our readers to know Sanskrit. We chose readers with normal or corrected-to-normal vision, since readers who use bifocal eyeglasses would pose a small risk of erroneous eye-movement data. Moreover, other human factors, such as very dark or very light irises, downward-pointing eyelashes, naturally droopy eyelids, a headrest that does not fit the person’s head, or uncontrollable head motion, lead to calibration failures and errors while reading. We aim to increase the number of participants in future experiments.
[25] We chose to present Sanskrit data rather than the participants’ native languages because it is more faithful to study the definition in the same language that was the lingua franca when these definitions were formulated. Nonetheless, we also aim to test the same definition with native speakers and carry out a contrastive study for a better understanding of the definition.

3.3 Feature Description

The eye-tracking device records the activity of the participant’s eyes on the screen and derives various features from the gaze data. We do not use all the feature values provided by the device, but only those that indicate the prominence of a word (interest area) and, in turn, the importance of words belonging to the same category. The following features, calculated from the participant’s gaze behavior, are used in our analysis:

1. Fixation-based features: Studies have shown that attentional movements and fixations are obligatorily coupled. More fixations on a word result from incomplete lexical processing, and a greater cognitive load leads to more time spent on the word. Variables that affect the time spent on a word include word frequency, word predictability, number of meanings and word familiarity (Rayner 1998). We consider fixation duration, total fixation duration and fixation count, following Mishra, Kanojia, and Bhattacharyya (2016).
(a) Fixation Duration (or First Fixation Duration): First fixations are fixations occurring during first-pass reading. Intuitively, an increased first fixation duration is associated with more time spent on the word, which accounts for lexical complexity.
(b) Total Fixation Duration (or Gaze Duration): This is the sum of all fixation durations on the interest area. Sometimes, when there is syntactic ambiguity, a reader re-reads the already-read part of the text in order to disambiguate it. Total fixation duration accounts for the sum of all such fixation durations over the whole reading span.
(c) Fixation Count: This is the number of fixations in the interest area. If the reader reads fast, the first fixation duration may not be high even if the lexical complexity is high, but the number of fixations may increase. Fixation count may therefore help capture lexical complexity in such cases.
2. Regression-based feature: Regressions are very common in complicated sentences, and many regressions are due to comprehension failures. Short leftward saccades are part of efficient reading: short within-word saccades show that the reader is processing the currently fixated word, whereas longer regressions (back along the line) occur because the reader did not understand the text. Syntactic ambiguity (e.g. garden-path sentences), syntactic violation (missing or replaced words) and syntactic unpredictability lead to shorter saccades and longer regressions. We consider the feature Regression Count, i.e. the total number of gaze regressions around the AOI (Area of Interest).
3. Skip Count: Readers do not fixate every word; while reading, people keep jumping ahead to the next word, and a predictable target word is more likely to be skipped than an unpredictable one. Skip count records whether an interest area was skipped, i.e. not fixated, during reading, and is calculated as the number of skipped words divided by the total word count. Intuitively, a higher skip count should correspond to lower semantic processing requirements (assuming that skipping is not intentional). Two factors have a big impact on skipping: word length and contextual constraint. First, short words are much more likely to be skipped than long words.
Second, words that are highly constrained by the prior context are much more likely to be skipped than unpredictable ones. Word frequency also affects word skipping, but the effect is smaller than that of predictability.
4. Run Count: Run count is the number of times an interest area was read, i.e. the number of separate visits (runs) to it.
5. Dwell Time-based feature: Dwell time is the amount of time spent on an interest area, and dwell-time percentage expresses that time as a percentage of the total reading time.

3.4 Methodology

As described in Section 3.1, we modified the documents in order to test the syntactic and semantic prominence of a verb in both prose and poetry. Such modifications may cause a syntactic violation or semantic inhibition, or leave insufficient information to comprehend the document at the surface level of the language, which forces the reader to re-analyze the text. The time taken to analyze a document depends on the context (Ivanko and Pexman 2003). While analyzing a text, the human brain processes it sequentially, with the aim of comprehending the literal meaning. When an incongruity is perceived, the brain may initiate a re-analysis to resolve the disparity (Kutas and Hillyard 1980). Since information during reading reaches the brain through the eyes, incongruity may affect the way the gaze moves through the text. Hence, distinctive eye-movement patterns may be observed when a verb is successfully found, in contrast to an unsuccessful attempt. This hypothesis forms the crux of our analysis, and we test it by creating and analyzing an eye-movement database for sentence semantics.

4 Analysis & Results

As stated above, we collect gaze data from 20 participants and use it for our analysis.
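To make the interest-area features of Section 3.3 concrete, the sketch below derives them from a chronologically ordered sequence of fixations that has already been mapped to word indices. It is an illustration under assumptions, not the authors’ pipeline (their values come from the eye-tracking device’s own software, whose exact definitions may differ): dwell time is approximated by summed fixation durations, a word counts as skipped if it is never fixated, and a regression is any move from a later word back to an earlier one, counted both into the target word and out of the source word. The names Fixation, word_features and skip_rate are hypothetical.

```python
from typing import Dict, List, NamedTuple, Optional


class Fixation(NamedTuple):
    word: int         # index of the word (interest area) that was fixated
    duration: float   # fixation duration in milliseconds


def word_features(fixations: List[Fixation], n_words: int) -> List[Dict[str, float]]:
    # Per-word gaze features from a chronologically ordered fixation sequence.
    feats = [dict(first_fixation=0.0, total_fixation=0.0, fixation_count=0,
                  run_count=0, regressions_in=0, regressions_out=0,
                  skipped=1, dwell_pct=0.0) for _ in range(n_words)]
    prev: Optional[int] = None
    for fx in fixations:
        w = feats[fx.word]
        if w['fixation_count'] == 0:        # first fixation on this word
            w['first_fixation'] = fx.duration
            w['skipped'] = 0
        w['fixation_count'] += 1
        w['total_fixation'] += fx.duration
        if fx.word != prev:                 # the gaze entered the word: new run
            w['run_count'] += 1
            if prev is not None and fx.word < prev:  # backward jump = regression
                w['regressions_in'] += 1
                feats[prev]['regressions_out'] += 1
        prev = fx.word
    total = sum(fx.duration for fx in fixations)
    for w in feats:
        # Dwell time approximated by summed fixation durations (saccades ignored).
        w['dwell_pct'] = 100.0 * w['total_fixation'] / total if total else 0.0
    return feats


def skip_rate(feats: List[Dict[str, float]]) -> float:
    # Number of words never fixated divided by the total word count.
    return sum(w['skipped'] for w in feats) / len(feats)


# Toy trial: four words, the reader skips word 2 and jumps back from 3 to 1.
fixations = [Fixation(0, 200), Fixation(1, 250), Fixation(3, 180),
             Fixation(1, 220), Fixation(3, 150)]
feats = word_features(fixations, n_words=4)
print(feats[1]['run_count'], feats[1]['regressions_in'])  # 2 1
print(skip_rate(feats))                                   # 0.25
```

Comparing verbs with non-verbs then amounts to splitting these per-word dictionaries by a part-of-speech label and averaging each feature within the two groups.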
We try to verify the first sentence definition given by Bhartṛhari. We find that the verb is the chief contributor to sentence semantics and receives more attention than other words during sentence comprehension. To study how a reader uses a verb in constructing the meaning of a linguistic string, we analyze the time spent on the verb (dwell-time percentage), the number of times the reader backtracks from it (regression-out count) or skips it (skip count), the number of times it is read (run count), and the number of fixations on it (fixation count). We analyze these features on verbs vs. non-verbs in Datasets 1, 2 and 3 and present the results as graphs in Figures 2 (dwell-time percentage), 3 (regression count) and 4 (skip count).

The analysis of dwell-time percentage, regression count and skip count supports our point that verbs are prominent elements in constructing sentence meaning: verbs receive more reading time, more regressions and fewer skips than non-verbs. The data of all but a few participants are consistent with our hypothesis. Figure 2 shows that Participant 5 (P5) spent less time on verbs, but, as shown in Table 1, P5 also has the lowest agreement on Question 1 (0.45). Participants 11 (P11), 12 (P12) and 18 (P18) do not have low agreement, yet they do not read verbs as much as the other, consistent participants, and hence are clear outliers. Even though these four participants did not fixate on verbs for longer, they regressed around verbs more often than around other words, as shown in Figure 3; in Tables 4, 5 and 6 this regression-count difference is significant at the 0.05 level for P11, P12 and P18, and borderline for P5 (p = 0.050).
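The verb versus non-verb comparisons rest on the two-sample t-test assuming unequal variances (Welch’s test) that the evaluation in Section 5 reports. As a minimal sketch, the hypothetical function welch_t below computes the mean difference, the t statistic and the Welch-Satterthwaite degrees of freedom in plain Python; the input lists are made-up per-word regression counts, not data from this study. The p-value itself is easiest to obtain from a statistics library, for example scipy.stats.ttest_ind(verbs, others, equal_var=False).

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(verbs: Sequence[float], others: Sequence[float]) -> Tuple[float, float, float]:
    # Returns (mean difference, t statistic, Welch-Satterthwaite df).
    n1, n2 = len(verbs), len(others)
    m1, m2 = mean(verbs), mean(others)
    v1, v2 = variance(verbs), variance(others)   # sample variances (n - 1)
    a, b = v1 / n1, v2 / n2                      # squared standard errors
    t = (m1 - m2) / math.sqrt(a + b)             # no pooled variance
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return m1 - m2, t, df


# Made-up per-word regression counts for one reader (not data from this study).
verbs = [2, 3, 4, 3, 5]
others = [1, 2, 1, 2, 1, 3]
diff, t, df = welch_t(verbs, others)
print(round(diff, 3), round(t, 2), round(df, 1))  # 1.733 2.85 7.1
```

Not pooling the variances matters here because verbs are far fewer than non-verbs in each document, so the two groups can differ considerably in both size and spread.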
Figure 4 shows that, for almost all participants, verbs are skipped less often than non-verbs, which suggests that a reader cannot afford to skip verbs while constructing the sentence meaning.

Figure 2: A comparison of dwell-time percentage on verbs and non-verbs for all datasets and all participants
Figure 3: A comparison of regression count on verbs and non-verbs for all datasets and all participants
Figure 4: A comparison of skip count on verbs and non-verbs for all datasets and all participants

We further strengthen this view in Section 6 by comparing Type A, Type B and Type C documents and by considering the answers provided by the readers.

5 Evaluation

We evaluate our work by calculating the inter-annotator agreement (IAA) of each participant with all the others on the same dataset, separately for each of the two questions posed to the participants. We also evaluate the participants’ answers to ensure that none of them read the documents inattentively. Tables 1, 2 and 3 show the evaluation for Datasets 1, 2 and 3 respectively. Overall, participant agreement ranges from 0.45 (moderate agreement) to 0.95 (almost perfect agreement) for Question 1, and from 0.5 (moderate) to 0.95 (almost perfect) for Question 2. Accuracy (Acc), as shown in the tables, ranges from 0.6 to 1, which indicates that the participants were attentive during the experiment. The inter-annotator agreement points out tentative outliers and helps us analyze the results of our experiment.
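The paper does not spell out which agreement statistic underlies Tables 1, 2 and 3. As a hedged illustration only, the sketch below assumes simple observed agreement, averaged over all other participants who read the same dataset, and computes accuracy as the share of correct answers to the comprehension question. The verbal labels used in the text (moderate, almost perfect) match the Landis and Koch (1977) bands, which were originally proposed for kappa statistics. All function names and the toy answers are hypothetical.

```python
from typing import Dict, List


def agreement(a: List[str], b: List[str]) -> float:
    # Observed agreement: share of paragraphs answered identically.
    return sum(x == y for x, y in zip(a, b)) / len(a)


def iaa(answers: Dict[str, List[str]], who: str) -> float:
    # Mean agreement of one participant with every other reader of the
    # same dataset (one answer per paragraph, in paragraph order).
    others = [p for p in answers if p != who]
    return sum(agreement(answers[who], answers[p]) for p in others) / len(others)


def accuracy(given: List[str], key: List[str]) -> float:
    # Share of comprehension (Question 2) answers matching the answer key.
    return agreement(given, key)


def landis_koch(value: float) -> str:
    # Verbal bands of Landis and Koch (1977), originally defined for kappa.
    if value < 0:
        return 'Poor'
    for upper, label in [(0.20, 'Slight'), (0.40, 'Fair'),
                         (0.60, 'Moderate'), (0.80, 'Substantial')]:
        if value <= upper:
            return label
    return 'Almost perfect'


# Toy answers of three readers to four paragraphs (not data from this study).
answers = {'P1': ['a', 'b', 'c', 'a'],
           'P2': ['a', 'b', 'c', 'b'],
           'P3': ['a', 'c', 'c', 'a']}
print(iaa(answers, 'P1'), iaa(answers, 'P2'))  # 0.75 0.625
print(landis_koch(0.45), landis_koch(0.95))    # Moderate Almost perfect
```

A chance-corrected measure such as Cohen’s or Fleiss’ kappa would be the stricter choice if the answer categories are unevenly used.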
We find that both the inter-annotator agreement and the accuracy in our experiment are substantial.

We also perform statistical significance tests for all participants using the standard two-sample t-test formulation assuming unequal variances, and display the p-values in Tables 4, 5 and 6 for Datasets 1, 2 and 3 respectively. For these datasets, we compare verbs with all other words on the features Regression Count (RC) and Skip Count (SC). We find that participants perform considerably more regressions around verbs than around other words. For each feature we report the difference between the means for verbs and non-verbs (x̄) and the p-value (P). The t-test was run on the feature values for verbs and non-verbs, the hypothesized mean difference was set to zero, and the significance threshold was 0.05. The differences are statistically significant for most participants: for regression count, for all participants except P19 (p = 0.247) and the borderline P5 (p = 0.050); for skip count, for 12 of the 20 participants.

Inter-annotator agreement (IAA) and accuracy (Acc) scores

Table 1: Dataset 1
      Q1 IAA   Q2 IAA   Q2 Acc
P1    0.7      0.5      0.6
P2    0.8      0.9      0.95
P3    0.8      0.9      0.9
P4    0.95     0.95     0.95
P5    0.45     0.85     0.9
P6    0.9      0.55     0.6
P7    0.85     0.7      0.8

Table 2: Dataset 2
      Q1 IAA   Q2 IAA   Q2 Acc
P8    0.85     0.9      0.95
P9    0.75     0.6      0.75
P10   0.75     0.8      1
P11   0.65     0.75     0.85
P12   0.7      0.8      0.85
P13   0.85     0.95     1

Table 3: Dataset 3
      Q1 IAA   Q2 IAA   Q2 Acc
P14   0.8      0.8      0.75
P15   0.65     0.65     0.75
P16   0.85     0.9      0.95
P17   0.9      0.8      0.7
P18   0.75     0.85     0.85
P19   0.5      0.9      0.9
P20   0.8      0.7      0.8

Mean difference (x̄) and p-values (P) from the t-test for Regression Count (RC) and Skip Count (SC)

Table 4: Dataset 1
      RC x̄     RC P     SC x̄     SC P
P1    0.159    0.000    0.061    0.038
P2    0.234    0.000    0.078    0.012
P3    0.250    0.000    0.180    0.000
P4    0.126    0.001    0.112    0.001
P5    0.062    0.050    0.029    0.194
P6    0.183    0.001    0.064    0.029
P7    0.091    0.029    0.089    0.005

Table 5: Dataset 2
      RC x̄     RC P     SC x̄     SC P
P8    0.141    0.001    0.129    0.000
P9    0.147    0.001    0.134    0.000
P10   0.112    0.005    0.143    0.000
P11   0.194    0.000    0.025    0.237
P12   0.163    0.003    0.012    0.364
P13   0.211    0.000    0.106    0.001

Table 6: Dataset 3
      RC x̄     RC P     SC x̄     SC P
P14   0.188    0.000    0.058    0.053
P15   0.072    0.033    0.058    0.053
P16   0.244    0.001    0.077    0.015
P17   0.129    0.003    0.055    0.059
P18   0.120    0.030    -0.030   0.189
P19   0.021    0.247    0.044    0.106
P20   0.253    0.002    0.059    0.049

6 Discussion

We discussed the core features of our work, i.e. dwell-time percentage, regression count, skip count, run count and fixation count, in Section 4. In this section, we further analyze the results by exploring the answers provided by our participants, dividing our documents into prose and poetry. Figures 5 and 6 show the counts of participants’ answers marking the documents as absolutely non-meaningful or as lacking information (i.e. somewhat meaningful). For all participants, across document Types A, B and C, Type A (original data) is marked non-meaningful the fewest times.

In prose (Figure 5), Type B documents lack verbs. Our participants clearly fail to understand these documents most of the time, marking them either as completely non-meaningful or as lacking information. We did not prompt them to look for verbs, since psycholinguistic methodology forbids biasing the participants. The absence of verbs in Type B documents affects both the syntax and the semantics of the documents, and purely nominal sentences evidently fail to convey the complete semantics of the sentence. In Type C prose (Figure 5), our participants are confused by the removal of agent-denoting words but are still able to grasp the context, so their answers do not indicate absolute meaninglessness of the documents. Even though verbs are retained in Type C documents, the removal of agent words leads to insufficient information.

For poetry (Figure 6), Type B documents contain synonymous verbs, and Type C documents contain verbs with very distant meanings that bear no relation to the semantics of the original verb.
Hence, our participants mark Type B documents as lacking in information many more times than Type A documents. However, they do not mark even one of them as absolutely meaningless, because a synonym of the verb is present and they are still able to grasp the context; this observation strongly informs the conclusion we draw. Similarly, Type C documents, which contain verbs with very distant meanings, are marked as lacking in information most often, because no correlation can be established between the expected sense of the original verb and the verb actually present in the document.

We explore further by manually analyzing the saccadic paths of our participants, and find that across document Types A, B, and C, the saccadic regressions vary as our hypothesis predicts. We present a sample in Figures 8, 9, and 10. For a single randomly chosen participant with above-average IAA and good accuracy, the amount of regression on Type C documents increases in comparison to Type A, since these documents lack an agent in some sentences. For Type B, the regressions increase further, because the verb is completely removed from the document.

As stated before, the definition we have studied might not be valid in all cases; our aim is to find the cases in which it is. In conclusion, we have found one such case in which Bhart\u1E5Bhari\u2019s definition \u0100khy\u0101ta\u015Babda\u1E25 is valid: when lexical complexity is minimized in Sanskrit texts, readers rely on the verbs to understand the complete meaning of the sentence, without which the sentence meaning seems incomplete. Hence, we conclude that verbs play the most important role in the syntax and semantics of a sentence; nonetheless, in most cases, they require their complements (i.e.,
means of action) to represent the complete semantics of a sentence. We can also conclude that purely nominal sentences in Sanskrit are less meaningful than the corresponding original sentences.

Figures 12 (Run Count) and 13 (Fixation Count) further strengthen our discussion: both figures show that the number of times a verb is read is always greater than the number of times other words are read.

Figure 5: For prose. Figure 6: For poetry. Figure 7: Meaninglessness of documents as reported by participants on different document sets.
Figure 8: Regressions on Type A. Figure 9: Regressions on Type B. Figure 10: Regressions on Type C. Figure 11: Regression sample from a participant.
Figure 12: A comparison of Run Count on verbs and non-verbs for all datasets and all participants.
Figure 13: A comparison of Fixation Count on verbs and non-verbs for all datasets and all participants.

Limitations

The data selected for our experiment does not vary in nature: we use only prose stories, and the poetry is taken from the same text. We acknowledge that this is a limitation of our work; it would be more insightful to conduct similar experiments on different kinds of texts. For the same experiment on \u2018verbs\u2019, the data could also be modified in many other ways. Moreover, a spoken word, when accompanied by gesture and facial expression and given a special intonation, can convey much more than the written word. This experiment is limited to written sentences and tests comprehension only from the reader\u2019s point of view.

7 Conclusion & Future Work

We present a fresh approach to studying Bhart\u1E5Bhari\u2019s V\u0101kyapad\u012Bya, especially the definitions of the sentence he gives at the syntactic and semantic levels. We pick the first sentence definition, viz.
\u0100khy\u0101ta\u015Babda\u1E25, according to which the \u201Cverb\u201D can also be considered a sentence. We discuss his work briefly and perform an experiment to study this definition from a cognitive point of view. To gain a better understanding of the definition, we employ eye tracking and follow a silent-reading methodology with Sanskrit paragraphs. We aim to extend our work within the purview of Cognitive NLP and use it to solve computational problems. With this work, we open a new vista for studying sentence definitions from a cognitive point of view using an experimental technique.

Our results show that humans tend to read verbs more than other words, and that verbs are deemed the most important words. We assert that verbs play a prominent role in the syntax and semantics of a sentence; nonetheless, in most cases, they require their complements to represent the complete semantics of a sentence. Our findings indicate that a human reader cognitively searches for a verb in a sentence, without which the unity of the sentence tends to be incomplete. Purely nominal sentences in Sanskrit are less meaningful than the original sentences. We establish the statistical significance of our results using the standard t-test formulation. We also discuss the manual analysis of saccadic paths and of the answers given by our participants to verify our results. We are aware that the method we followed is one way of justifying Bhart\u1E5Bhari\u2019s definition, and that there could be other ways to strengthen the same results.

In the future, we aim to conduct more experiments on different kinds of Sanskrit texts with different sentence-construction styles.
For the same experiment on \u2018verbs\u2019, the data could also be modified in other ways, such as changing the position of the verb in the sentence, removing sentence-boundary markers, or replacing conjunctions, negatives, discourse markers, etc. We also aim to verify other sentence definitions using eye tracking. We would like to employ other tools, such as EEG, and to work in multilingual settings to delve deeper into human cognition, so that we can understand the definition from a broader perspective. We would also like to compare comprehension between native speakers and bilinguals, to study whether Bhart\u1E5Bhari\u2019s definitions are generic in nature. We hope that our work will yield further insights into the field of Cognitive NLP.

Acknowledgements

We thank our senior colleague Dr. Abhijit Mishra, who provided insights and expertise that greatly assisted this research. We are grateful to Vasudev Aital for his assistance in the data-checking process, and to all the participants for taking part in this research. We would also like to thank the reviewers for their comments on an earlier version of the manuscript; any errors are our own.

References

Akamatsu, Akihiko. 1993. \u201CPratibh\u0101 and the Meaning of the Sentence in Bhart\u1E5Bhari\u2019s V\u0101kyapad\u012Bya\u201D. In: Bhart\u1E5Bhari: Philosopher and Grammarian. Ed. by S. Bhate and J. Bronkhorst, pp. 37\u201344.
Ambati, Bharat Ram and Bipin Indurkhya. 2009. \u201CEffect of jumbling intermediate words in Indian languages: An eye-tracking study\u201D. In: Proceedings of the 31st Annual Conference of the Cognitive Science Society, Amsterdam, Netherlands.
Bhide, V. V. 1980. \u201CThe Concept of the Sentence and the Sentence-Meaning according to the P\u016Brva-M\u012Bm\u0101\u1E43s\u0101\u201D. In: Proceedings of the Winter Institute on Ancient Indian Theories on Sentence-Meaning. University of Poona.
Bronkhorst, Johannes. 1990. \u201CP\u0101\u1E47ini and the nominal sentence\u201D. Annals of the Bhandarkar Oriental Research Institute 71.1/4, pp. 301\u2013304.
Caplan, David and Christine Futter. 1986. \u201CThe roles of sequencing and verbal working memory in sentence comprehension deficits in Parkinson\u2019s disease\u201D. Brain and Language 27.1, pp. 117\u2013134.
Choudhary, Kamal Kumar. 2011. \u201CIncremental argument interpretation in a split ergative language: neurophysiological evidence from Hindi\u201D. PhD thesis. Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany.
Clematide, Simon and M Klenne. 2013. \u201CDisambiguation of the semantics of German prepositions: A case study\u201D. In: Proceedings of the 10th International Workshop on Natural Language Processing and Cognitive Science, France, pp. 137\u2013150.
Coward, George Harold. 1973. \u201CA Philosophical and Psychological Analysis of the Sphota Theory of Language as Revelation\u201D. PhD thesis. McMaster University, Hamilton, Ontario.
Coward, Harold. 1976. \u201CLanguage as Revelation\u201D. Indian Philosophical Quarterly 4, pp. 447\u2013472.
Davison, Alice. 1984. \u201CSyntactic markedness and the definition of sentence topic\u201D. Language, pp. 797\u2013846.
De Groot, Annette MB. 2011. Language and cognition in bilinguals and multilinguals: An introduction. Psychology Press, London, U.K.
Deshpande, Madhav M. 1987. \u201CP\u0101\u1E47inian syntax and the changing notion of sentence\u201D. Annals of the Bhandarkar Oriental Research Institute 68.1/4, pp. 55\u201398.
Devasthali, GV. 1974. \u201CVakya according to the Munitraya of Sanskrit Grammar\u201D. Charudeva Shastri felicitation volume, pp. 206\u2013215.
Fodor, Jerry Alan, Thomas G Bever, and Merrill F Garrett. 1974. The psychology of language. McGraw Hill, New York.
Foss, Donald J and David T Hakes. 1978. Psycholinguistics: An introduction to the psychology of language. Prentice Hall, New Jersey, U.S.
Gangopadhyay, Malaya. 1993. \u201CTraditional views on sentential meaning and its implication on language pedagogy\u201D. Indian linguistics 54.1-4, pp. 87\u201396.
Glucksberg, Sam and Joseph H Danks. 2013. Experimental Psycholinguistics (PLE: Psycholinguistics): An Introduction. Vol. 3. Psychology Press, London, U.K.
Groner, Rudolf. 1985. Eye movements and human information processing. Vol. 9. North-Holland Publishing Co.
Houben, Jan EM. 1989. \u201CThe sequencelessness of the signifier in Bhartrhari\u2019s theory of language\u201D. In: Proceedings of the Seventh World Sanskrit Conference, Leiden. Indologica Taurinensia, pp. 119\u2013129.
Houben, Jan EM. 2008. \u201CP\u0101\u1E47ini\u2019s grammar and its computerization: a construction grammar approach\u201D. In: Sanskrit Computational Linguistics. Springer, pp. 6\u201325.
Huet, G\u00E9rard. 2006. \u201CShallow syntax analysis in Sanskrit guided by semantic nets constraints\u201D. In: Proceedings of the 2006 international workshop on Research issues in digital libraries. ACM, p. 6.
Husain, Samar, Shravan Vasishth, and Narayanan Srinivasan. 2014. \u201CIntegration and prediction difficulty in Hindi sentence comprehension: evidence from an eye-tracking corpus\u201D. Journal of Eye Movement Research 8.2.
Ivanko, Stacey L and Penny M Pexman. 2003. \u201CContext incongruity and irony processing\u201D. Discourse Processes 35.3, pp. 241\u2013279.
Iyer, KA Subramania. 1969. Bhart\u1E5Bhari: A study of the V\u0101kyapad\u012Bya in the light of the ancient commentaries. Vol. 68. Deccan College Postgraduate and Research Institute, Poona.
Jha, V. N. 1980. \u201CNaiy\u0101yikas Concept of Pada and V\u0101kya\u201D. In: Proceedings of the Winter Institute on Ancient Indian Theories on Sentence-Meaning. University of Poona.
Joshi, Aditya, Abhijit Mishra, Nivvedan Senthamilselvan, and Pushpak Bhattacharyya. 2014. \u201CMeasuring Sentiment Annotation Complexity of Text\u201D. In: Association of Computational Linguistics (Daniel Marcu 22 June 2014 to 27 June 2014). Vol. 2. Association for Computational Linguistics.
Joshi, Salil, Diptesh Kanojia, and Pushpak Bhattacharyya. 2013. \u201CMore than meets the eye: Study of Human Cognition in Sense Annotation.\u201D In: HLT-NAACL, pp. 733\u2013738.
Joshi, SD and JAF Roodbergen. 2008. \u201CSome observations regarding P\u0101\u1E47ini\u2019s A\u1E63\u1E6D\u0101dhy\u0101y\u012B\u201D. Annals of the Bhandarkar Oriental Research Institute 89, pp. 109\u2013128.
Juel, Connie and Betty Holmes. 1981. \u201COral and silent reading of sentences\u201D. Reading Research Quarterly, pp. 545\u2013568.
Khan, Azizuddin, Otto Loberg, and Jarkko Hautala. 2017. \u201COn the Eye Movement Control of Changing Reading Direction for a Single Word: The Case of Reading Numerals in Urdu\u201D. Journal of Psycholinguistic Research, pp. 1\u201311.
Kiparsky, Paul and Johan F Staal. 1969. \u201CSyntactic and semantic relations in P\u0101\u1E47ini\u201D. Foundations of Language, pp. 83\u2013117.
Kutas, Marta and Steven A Hillyard. 1980. \u201CReading senseless sentences: Brain potentials reflect semantic incongruity\u201D. Science 207.4427, pp. 203\u2013205.
Laddu, S. D. 1980. \u201CThe Concept of V\u0101kya According to K\u0101ty\u0101yana and Pata\u00F1jali\u201D. In: Proceedings of the Winter Institute on Ancient Indian Theories on Sentence-Meaning. University of Poona.
Lai, Meng-Lung, Meng-Jung Tsai, Fang-Ying Yang, Chung-Yuan Hsu, Tzu-Chien Liu, Silvia Wen-Yu Lee, Min-Hsien Lee, Guo-Li Chiou, Jyh-Chong Liang, and Chin-Chung Tsai. 2013. \u201CA review of using eye-tracking technology in exploring learning from 2000 to 2012\u201D. Educational Research Review 10, pp. 90\u2013115.
LeCun, Yann et al. 1998. \u201CLeNet-5, convolutional neural networks\u201D. URL: http://yann.lecun.com/exdb/lenet, p. 20.
Levy, Joshua, Elizabeth Hoover, Gloria Waters, Swathi Kiran, David Caplan, Alex Berardino, and Chaleece Sandberg. 2012. \u201CEffects of syntactic complexity, semantic reversibility, and explicitness on discourse comprehension in persons with aphasia and in healthy controls\u201D. American Journal of Speech-Language Pathology 21.2, S154\u2013S165.
Loundo, Dilip. 2015. \u201CBhart\u1E5Bhari\u2019s Linguistic Ontology and the Semantics of \u0100tmanepada\u201D. Sophia 54.2, pp. 165\u2013180.
Mahavir. 1984. Samartha Theory of P\u0101\u1E47ini and Sentence Derivation. Munshiram Manoharlal Publishers, New Delhi.
Majaranta, P\u00E4ivi and Andreas Bulling. 2014. \u201CEye tracking and eye-based human\u2013computer interaction\u201D. In: Advances in physiological computing. Springer, pp. 39\u201365.
Martinez-G\u00F3mez, Pascual and Akiko Aizawa. 2013. \u201CDiagnosing causes of reading difficulty using bayesian networks\u201D. In: Proceedings of the Sixth International Joint Conference on Natural Language Processing, Nagoya, Japan, pp. 1383\u20131391.
Matilal, Bimal Krishna. 1966. \u201CIndian Theorists on the Nature of the Sentence (v\u0101kya)\u201D. Foundations of Language, pp. 377\u2013393.
McEuen, Kathryn. 1946. \u201CIs the Sentence Disintegrating?\u201D The English Journal 35.8, pp. 433\u2013438.
Miellet, S\u00E9bastien and Laurent Sparrow. 2004. \u201CPhonological codes are assembled before word fixation: Evidence from boundary paradigm in sentence reading\u201D. Brain and language 90.1, pp. 299\u2013310.
Mishra, Abhijit, Pushpak Bhattacharyya, and Michael Carl. 2013. \u201CAutomatically Predicting Sentence Translation Difficulty\u201D. In: Proceedings of the 51st Annual Conference of Association for Computational Linguistics (ACL), Sofia, Bulgaria.
Mishra, Abhijit, Kuntal Dey, and Pushpak Bhattacharyya. 2017. \u201CLearning Cognitive Features from Gaze Data for Sentiment and Sarcasm Classification using Convolutional Neural Network\u201D. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, Canada. Vol. 1, pp. 377\u2013387.
Mishra, Abhijit, Diptesh Kanojia, and Pushpak Bhattacharyya. 2016. \u201CPredicting Readers\u2019 Sarcasm Understandability by Modeling Gaze Behavior.\u201D In: The 30th AAAI Conference on Artificial Intelligence, pp. 3747\u20133753.
Mishra, Abhijit, Diptesh Kanojia, Seema Nagar, Kuntal Dey, and Pushpak Bhattacharyya. 2016. \u201CLeveraging Cognitive Features for Sentiment Analysis\u201D. In: Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, Berlin, Germany. Association for Computational Linguistics, pp. 156\u2013166.
Mishra, Abhijit, Diptesh Kanojia, Seema Nagar, Kuntal Dey, and Pushpak Bhattacharyya. 2017a. \u201CHarnessing Cognitive Features for Sarcasm Detection\u201D. CoRR abs/1701.05574.
Mishra, Abhijit, Diptesh Kanojia, Seema Nagar, Kuntal Dey, and Pushpak Bhattacharyya. 2017b. \u201CScanpath Complexity: Modeling Reading Effort Using Gaze Information.\u201D In: The 31st AAAI conference on Artificial Intelligence, pp. 4429\u20134436.
Osterhout, Lee, Phillip J Holcomb, and David A Swinney. 1994. \u201CBrain potentials elicited by garden-path sentences: evidence of the application of verb information during parsing.\u201D Journal of Experimental Psychology: Learning, Memory, and Cognition 20.4, p. 786.
Pagin, Peter. 2009. \u201CCompositionality, understanding, and proofs\u201D. Mind 118.471, pp. 713\u2013737.
Pillai, K. Raghavan. 1971. Studies in the V\u0101kyapad\u012Bya, Critical Text of Cantos I and II. Motilal Banarsidass, Delhi.
Raja, K Kunjanni. 1968. Indian theories of meaning. The Adyar Library and Research Center, Chennai, Tamil Nadu.
Rayner, Keith. 1998. \u201CEye movements in reading and information processing: 20 years of research.\u201D Psychological bulletin 124.3, p. 372.
Sanford, Anthony J and Patrick Sturt. 2002. \u201CDepth of processing in language comprehension: Not noticing the evidence\u201D. Trends in cognitive sciences 6.9, pp. 382\u2013386.
Sarma, Raghunatha. 1980. Vakyapadiya Part II. Sampurnanand Sanskrit Vishvavidyalaya, Varanasi.
Sriramamurti, P. 1980. \u201CThe Meaning of a Sentence is Pratibh\u0101\u201D. In: Proceedings of the Winter Institute on Ancient Indian Theories on Sentence-Meaning. University of Poona.
Starr, Matthew S and Keith Rayner. 2001. \u201CEye movements during reading: Some current controversies\u201D. Trends in cognitive sciences 5.4, pp. 156\u2013163.
Tiwari, DN. 1997. \u201CBhartrhari on the Indivisibility of Single-word Expressions and Subordinate Sentences\u201D. Indian Philosophical Quarterly 24, pp. 197\u2013216."@en . "Conference Paper"@en . "10.14288/1.0391834"@en . "eng"@en . "Reviewed"@en . "Vancouver : University of British Columbia Library"@en . "Attribution-NonCommercial-NoDerivatives 4.0 International"@* . "http://creativecommons.org/licenses/by-nc-nd/4.0/"@* . "Faculty"@en . "Computational Sanskrit & Digital Humanities : Selected Papers Presented at the 17th World Sanskrit Conference, July 9-13, 2018"@en . "Text"@en . "http://hdl.handle.net/2429/74653"@en .