Digital Sanskrit philology tutorial
Wednesday, 18 December 2019
Introduction
Given the enormous body of knowledge composed in the Sanskrit language over the past three millennia (the largest body of literature in the world prior to the invention of the printing press), it is imperative that the texts that convey this knowledge be brought into the digital medium. Making Sanskrit texts accessible in the digital medium will expose them to digital finding aids and to modern computational methods of knowledge discovery, thereby facilitating the use of this knowledge in current research and its application in society.
The tutorial requires only minimal familiarity with Sanskrit: some school-level instruction in India or a first-year university course abroad. It will instruct participants in how to create digital editions of Sanskrit texts that precisely document characters, prose sentences, paragraphs, verses, and larger document divisions; that precisely indicate inflectional identification and morphological analysis; that precisely indicate the source of the text in a printed edition or elsewhere; and that precisely document the language, script, and reference structure of the text. All of this is done in accordance with the most widely accepted guidelines for machine-readable access, the Text Encoding Initiative (TEI) Guidelines for XML markup.
Specific skills
In the tutorial participants will learn how to:
- encode Sanskrit texts precisely in the Sanskrit Library Phonetic basic encoding (SLP1), a simple ASCII encoding scheme that covers all of Sanskrit, including Vedic.
- mark up the structure of the text and the document format in parallel in XML in accordance with the Text Encoding Initiative (TEI) Guidelines.
- validate the structure of XML and TEI documents against a document type definition (DTD) using XML validation tools.
- use regular expressions and replacement expressions to search and replace within a text document to facilitate markup.
- use meter analysis software to identify verses.
- create bibliographic information in accordance with the TEI guidelines.
- create a TEI document header that describes the editor, contents, source, structure, conventions, revision history, and other features of a TEI document.
- use the Sanskrit Library's TEITagger software, which utilizes the preceding technologies to create TEI documents semi-automatically from text files.
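As a rough illustration of how regular expressions can facilitate markup, the following sketch wraps plain-text verse lines in TEI `<lg>` and `<l>` elements. The input convention (a line inside a verse ends with `|`, and the verse closes with `|| n ||`) and the function name `mark_up_verses` are assumptions made for this example; it is not the TEITagger implementation.

```python
import re
from xml.sax.saxutils import escape

# Assumed input convention (hypothetical): the last line of each verse
# ends with "|| <number> ||"; earlier lines end with a single "|".
VERSE_END = re.compile(r"\s*\|\|\s*(\d+)\s*\|\|\s*$")
HALF_END = re.compile(r"\s*\|\s*$")

def mark_up_verses(text):
    """Wrap verse lines in TEI <lg>/<l> elements (illustrative only)."""
    out, pending = [], []
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        end = VERSE_END.search(raw)
        if end:
            # Verse-final line: strip the number and emit the whole verse.
            pending.append(VERSE_END.sub("", raw).strip())
            out.append('<lg n="%s">' % end.group(1))
            out.extend("  <l>%s</l>" % escape(line) for line in pending)
            out.append("</lg>")
            pending = []
        else:
            # Verse-internal line: strip the trailing single danda.
            pending.append(HALF_END.sub("", raw).strip())
    # Lines after the last numbered verse are left out of this sketch.
    return "\n".join(out)

# Bhagavadgita 1.1 in SLP1
sample = "\n".join([
    "Darmakzetre kurukzetre samavetA yuyutsavaH |",
    "mAmakAH pARqavAS cEva kim akurvata saYjaya || 1 ||",
])
tei = mark_up_verses(sample)
print(tei)
```

The verse number captured by the regular expression becomes the `n` attribute of `<lg>`, so the reference structure of the source is carried into the markup.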
Speakers
- Peter M. Scharf, Fellow, Indian Institute of Advanced Study, Shimla
- Tanuja P. Ajotikar, Assistant Professor, Vyākaraṇa Vibhāga, Shree Somnath University, Veraval
Venue
The sixteenth International Conference on Natural Language Processing (ICON-2019), pre-conference tutorial
Language Technologies Research Centre (LTRC), Seminar Room
International Institute of Information Technology
Professor C. R. Rao Road, Gachibowli
Hyderabad, Telangana 500032 INDIA
Pre-tutorial preparation
Download and install a good text editor:
- for macOS: https://www.barebones.com/products/bbedit/
- for Windows, Linux, or macOS: https://www.geany.org
Download and install an XML validator:
- https://www.oxygenxml.com [for purchase, or]
- https://sourceforge.net/projects/camprocessor/ [free]
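Before running a full validator, a quick well-formedness check can catch unclosed or mismatched tags. The sketch below uses only Python's standard library; the function name `check_well_formed` is hypothetical. It tests well-formedness only: validation against a DTD or a TEI schema still requires a validating tool such as those listed above.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message).

    This checks well-formedness only; it does NOT validate against a DTD.
    """
    try:
        ET.fromstring(xml_text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)

good = "<lg n='1'><l>Darmakzetre kurukzetre</l></lg>"
bad = "<lg n='1'><l>Darmakzetre kurukzetre</lg>"  # <l> is never closed

print(check_well_formed(good))
print(check_well_formed(bad))
```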
Schedule
Time | Topic |
---|---|
10:00am | Introduction to SLP1, XML, TEI and their use in digital philology |
11:00am | Using TEI for critical editing and morphological analysis |
11:30am | Practicum: Encode a text |
12:30pm | Lunch |
1:30pm | Regular expressions introduction and practicum |
2:00pm | Metrical identification |
2:30pm | Practicum: Using TEITagger |
3:30pm | Group presentations |
4:00pm | Follow-up and outlook: Text encoding and related computing projects |
4:30pm | End |
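
Metrical identification starts from the pattern of light and heavy syllables in a line. The sketch below derives that pattern from SLP1 text using the standard rule: a syllable is heavy if its vowel is long, or if the vowel is followed by anusvāra, visarga, or more than one consonant. The function name `scan` is hypothetical, only the classical sound inventory is handled, and the line-final syllable is reported by the same rule rather than treated as optionally heavy. It is not the Sanskrit Library's meter analysis software.

```python
SHORT_VOWELS = set("aiufx")
LONG_VOWELS = set("AIUFXeEoO")
VOWELS = SHORT_VOWELS | LONG_VOWELS
CONSONANTS = set("kKgGNcCjJYwWqQRtTdDnpPbBmyrlvSzsh")
CODA = set("MH")  # anusvara and visarga

def scan(slp1):
    """Return one 'L' (light) or 'G' (heavy) per syllable of an SLP1 line."""
    # Drop spaces and punctuation: syllable weight ignores word boundaries.
    chars = [c for c in slp1 if c in VOWELS or c in CONSONANTS or c in CODA]
    weights = []
    for i, c in enumerate(chars):
        if c not in VOWELS:
            continue
        # Collect everything between this vowel and the next one.
        following = []
        j = i + 1
        while j < len(chars) and chars[j] not in VOWELS:
            following.append(chars[j])
            j += 1
        heavy = (
            c in LONG_VOWELS
            or any(f in CODA for f in following)
            or len(following) >= 2
        )
        weights.append("G" if heavy else "L")
    return "".join(weights)

# First two quarters of Bhagavadgita 1.1 in SLP1
print(scan("Darmakzetre kurukzetre"))  # GGGGLGGG
print(scan("samavetA yuyutsavaH"))     # LLGGLGLG
```

Because SLP1 writes every sound, including aspirates and diphthongs, as a single character, counting the consonants after a vowel needs no special handling of digraphs.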
Readings
Character encoding
- Sanskrit Library Phonetic ASCII encoding help page
- Linguistic Issues in Encoding Sanskrit, Appendix B
- Linguistic Issues in Encoding Sanskrit
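To give a sense of how SLP1 represents each Sanskrit sound with a single ASCII character, the sketch below converts basic SLP1 to IAST romanization. It covers only the classical sound inventory (no Vedic accents or other extensions), and the function name `slp1_to_iast` is hypothetical; the readings listed here remain the authoritative description of the scheme.

```python
# SLP1 characters that differ from their IAST equivalents.
# Characters not listed (a i u e o k g c j t d n p b m y r l v s h)
# are the same in both schemes.
SLP1_TO_IAST = {
    "A": "ā", "I": "ī", "U": "ū",
    "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "E": "ai", "O": "au",
    "M": "ṃ", "H": "ḥ",
    "K": "kh", "G": "gh", "N": "ṅ",
    "C": "ch", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "T": "th", "D": "dh",
    "P": "ph", "B": "bh",
    "S": "ś", "z": "ṣ",
}

def slp1_to_iast(text):
    """Convert basic SLP1 text to IAST, one character at a time."""
    return "".join(SLP1_TO_IAST.get(ch, ch) for ch in text)

print(slp1_to_iast("saMskftam"))
print(slp1_to_iast("Darmakzetre kurukzetre"))
```

The conversion is a plain character-by-character lookup because SLP1 never uses two characters for one sound; the reverse direction would need to handle IAST digraphs such as `kh` and `ai`.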
Higher-level encoding
- Peter M. Scharf, “TEITagger: Raising the standard for digital texts to facilitate interchange with linguistic software”
- Gérard Huet and Idir Lankri, “Preliminary Design of a Sanskrit Corpus Manager”
- Tanuja P. Ajotikar, Anuja P. Ajotikar, and Peter M. Scharf, “Enriching the digital edition of the Kāśikāvr̥tti by adding variants from the Nyāsa and Padamañjarī”
Metrical analysis
References
- Ajotikar, Tanuja P., Anuja P. Ajotikar, and Peter M. Scharf. 2018. “Enriching the digital edition of the Kāśikāvr̥tti by adding variants from the Nyāsa and Padamañjarī.” In Computational Sanskrit and Digital Humanities: selected papers presented at the 17th World Sanskrit Conference, University of British Columbia, Vancouver, 9–13 July 2018, ed. by Gérard P. Huet and Amba P. Kulkarni, pp. 207–18.
- TEI Consortium, ed. 2007. TEI P5: Guidelines for electronic text encoding and interchange. Version 3.2.0. TEI Consortium. URL: http://www.tei-c.org/Guidelines/.
- Huet, Gérard P. and Amba P. Kulkarni, eds. Computational Sanskrit and Digital Humanities: selected papers presented at the 17th World Sanskrit Conference, University of British Columbia, Vancouver, 9–13 July 2018.