Digital Sanskrit philology tutorial

Wednesday, 18 December 2019

Introduction

The body of knowledge composed in the Sanskrit language over the past three millennia is enormous: it is the largest body of literature in the world prior to the invention of the printing press. It is therefore imperative that the texts that convey this knowledge be brought into the digital medium. Making Sanskrit texts accessible in the digital medium will expose them to digital finding aids and to modern computational methods of knowledge discovery, making this knowledge easier to use in current research and to apply in society.

The tutorial requires only minimal familiarity with Sanskrit: some school-level instruction in India or a first-year university course abroad. It will instruct participants in how to create digital editions of Sanskrit texts in accordance with the most widely accepted guidelines for machine-readable texts, the Text Encoding Initiative (TEI) Guidelines for XML markup. Such editions precisely document the use of characters, prose sentences, paragraphs, verses, and larger document divisions; inflectional identification and morphological analysis; the source of the text in a printed edition or elsewhere; and the language, script, and reference structure of the text.

Specific skills

In the tutorial participants will learn how to:

encode Sanskrit texts precisely in the Sanskrit Library Phonetic basic encoding (SLP1), a simple ASCII encoding scheme that covers all of Sanskrit including Vedic.
mark up the structure of the text and the document format in parallel in XML in accordance with the Text Encoding Initiative (TEI) Guidelines.
validate the structure of XML and TEI documents against a document type definition (DTD) using XML validation tools.
use regular expressions and replacement expressions to search and replace within a text document to facilitate markup.
use meter analysis software to identify verses.
create bibliographic information in accordance with the TEI guidelines.
create a TEI document header that describes the editor, contents, source, structure, conventions, revision history, and other features of a TEI document.
use the Sanskrit Library's TEITAgger software, which uses the preceding technologies to create TEI documents semi-automatically from text files.
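
As a rough illustration of the first skill in the list above, the following Python sketch converts SLP1 to IAST romanization. SLP1 assigns exactly one ASCII character to each Sanskrit sound, so a character-by-character lookup is enough. The function name slp1_to_iast is hypothetical and not part of any Sanskrit Library tool, and the table covers only the basic classical inventory; Vedic accents and other extended symbols are left out.

```python
import unicodedata

# Basic SLP1 -> IAST table (classical Sanskrit only; Vedic accent marks
# and other extended SLP1 symbols are deliberately omitted in this sketch).
SLP1_TO_IAST = {
    "a": "a", "A": "ā", "i": "i", "I": "ī", "u": "u", "U": "ū",
    "f": "ṛ", "F": "ṝ", "x": "ḷ", "X": "ḹ",
    "e": "e", "E": "ai", "o": "o", "O": "au",
    "M": "ṃ", "H": "ḥ",
    "k": "k", "K": "kh", "g": "g", "G": "gh", "N": "ṅ",
    "c": "c", "C": "ch", "j": "j", "J": "jh", "Y": "ñ",
    "w": "ṭ", "W": "ṭh", "q": "ḍ", "Q": "ḍh", "R": "ṇ",
    "t": "t", "T": "th", "d": "d", "D": "dh", "n": "n",
    "p": "p", "P": "ph", "b": "b", "B": "bh", "m": "m",
    "y": "y", "r": "r", "l": "l", "v": "v",
    "S": "ś", "z": "ṣ", "s": "s", "h": "h",
}


def slp1_to_iast(text: str) -> str:
    """Map each SLP1 character to IAST; unknown characters pass through."""
    converted = "".join(SLP1_TO_IAST.get(ch, ch) for ch in text)
    # Normalize so that accented letters are in precomposed (NFC) form.
    return unicodedata.normalize("NFC", converted)


print(slp1_to_iast("saMskftam"))
print(slp1_to_iast("Darmakzetre kurukzetre"))
```

Because every sound is a single character, the same table can be inverted for the opposite direction only with extra care: IAST digraphs such as "kh" or "ai" must be matched before their single-letter parts.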

Speakers

Peter M. Scharf, Fellow, Indian Institute of Advanced Study, Shimla
Tanuja P. Ajotikar, Assistant Professor, Vyākaraṇa Vibhāga, Shree Somnath University, Veraval

Venue

The sixteenth International Conference on Natural Language Processing (ICON-2019), pre-conference tutorial

Language Technologies Research Centre (LTRC), Seminar Room

International Institute of Information Technology
Professor C. R. Rao Road, Gachibowli
Hyderabad, Telangana 500032 INDIA

Pre-tutorial preparation

Download and install a good text editor:

for Mac OS X: https://www.barebones.com/products/bbedit/
for Windows, Linux, or Mac OS X: https://www.geany.org

Download and install an XML validator:

https://www.oxygenxml.com [for purchase, or]
https://sourceforge.net/projects/camprocessor/ [free]
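
Independently of the validators listed above, a quick well-formedness check can be sketched with Python's standard library. This is only an illustration: it tests whether the XML parses at all and does not validate against a DTD or a TEI schema, which still requires a validating tool. The helper name is_well_formed and the sample fragment are hypothetical, and the fragment is not a complete, valid TEI document (it has no header, for example).

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False otherwise.

    This checks well-formedness only (balanced tags, quoted attributes);
    it does NOT check the document against a DTD or TEI schema.
    """
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Hypothetical minimal fragment with one verse line in SLP1.
GOOD = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <lg n="1.1">
      <l>Darmakzetre kurukzetre samavetA yuyutsavaH</l>
    </lg>
  </body></text>
</TEI>"""

# Same fragment with the closing </lg> tag removed.
BAD = GOOD.replace("</lg>", "")

print(is_well_formed(GOOD))
print(is_well_formed(BAD))
```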

Schedule

Time	Topic
10:00am	Introduction to SLP1, XML, TEI and their use in digital philology
11:00am	Using TEI for critical editing and morphological analysis
11:30am	Practicum: Encode a text
12:30pm	Lunch
1:30pm	Regular expressions introduction and practicum
2:00pm	Metrical identification
2:30pm	Practicum: Using TEITAgger
3:30pm	Group presentations
4:00pm	Followup and Outlook: Text encoding and related computing projects
4:30pm	End
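
The kind of search-and-replace practised in the regular expressions session can be sketched in Python as follows. The input convention is an assumption made for illustration only: each verse occupies two lines, the first ending in a single bar and the second ending in a verse number between double bars. The names VERSE and mark_up_verses are hypothetical, and the sketch assumes the text contains no characters that need XML escaping.

```python
import re

# Assumed plain-text convention (hypothetical):
#   first half-verse |
#   second half-verse || 1.1 ||
VERSE = re.compile(
    r"^(?P<a>.+?)[ \t]*\|[ \t]*\n"                                # first line
    r"(?P<b>.+?)[ \t]*\|\|[ \t]*(?P<n>[\d.]+)[ \t]*\|\|[ \t]*$",  # second line
    re.MULTILINE,
)


def _to_lg(match):
    """Build a TEI line group (<lg>) with two lines (<l>) from one match."""
    return (
        f'<lg n="{match["n"]}">\n'
        f'  <l>{match["a"]}</l>\n'
        f'  <l>{match["b"]}</l>\n'
        f'</lg>'
    )


def mark_up_verses(text: str) -> str:
    """Replace every two-line verse in text with <lg>/<l> markup."""
    return VERSE.sub(_to_lg, text)


sample = (
    "Darmakzetre kurukzetre samavetA yuyutsavaH |\n"
    "mAmakAH pARqavAScEva kimakurvata saMjaya || 1.1 ||\n"
)
print(mark_up_verses(sample))
```

The named groups keep the replacement readable; the same pattern could be used in a text editor's search-and-replace dialog with numbered groups instead.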

Readings

Character encoding

Higher-level encoding

Metrical analysis

References

Ajotikar, Tanuja P., Anuja P. Ajotikar, and Peter M. Scharf. 2018. “Enriching the digital edition of the Kāśikāvṛtti by adding variants from the Nyāsa and Padamañjarī.” Computational Sanskrit and Digital Humanities: selected papers presented at the 17th World Sanskrit Conference, University of British Columbia, Vancouver, 9–13 July 2018, ed. by Gérard P. Huet and Amba P. Kulkarni, pp. 207–18.
TEI Consortium, ed. 2007. TEI P5: Guidelines for electronic text encoding and interchange. Version 3.2.0. TEI Consortium. URL: http://www.tei-c.org/Guidelines/.
Huet, Gérard P. and Amba P. Kulkarni, eds. Computational Sanskrit and Digital Humanities: selected papers presented at the 17th World Sanskrit Conference, University of British Columbia, Vancouver, 9–13 July 2018.