The Sanskrit Library | Vedic Unicode Proposal

Extending the Unicode Standard to accommodate Vedic

Unicode is a standard for encoding the world's scripts that in the last few years has become the minimum implementation standard universally recognized by software and hardware developers. Although the Unicode Standard already encoded several Indic scripts, it lacked characters necessary for the adequate representation of Vedic texts, the most ancient texts of India, which are of the greatest interest to linguists and of enormous cultural importance to Hindus. The International Digital Sanskrit Library Integration project, funded by the U.S. National Science Foundation's Division of Information and Intelligent Systems under grant number 0535207, developed a successful proposal to include Vedic characters in the Unicode Standard. It did so by coordinating its activities with those of the Script Encoding Initiative at the University of California at Berkeley; with Evertype, a typesetting firm that serves as the Irish delegate to the International Organization for Standardization's Working Group 2 (ISO WG2); with the Ministry of Communications & Information Technology, Department of Information Technology, Government of India; and with the Government of India's Centre for Development of Advanced Computing (C-DAC) in Mumbai.

After undertaking a study of Indian phonetic treatises, the Sanskrit Library hosted a workshop at Brown University, 14-17 January 2007, to draft a Vedic character proposal to be presented to the International Organization for Standardization and the Unicode Technical Committee at meetings during the ensuing year. Significant progress in collaborating with C-DAC was achieved in meetings that Peter Scharf, the director of the Sanskrit Library, held with Swaran Lata, the Government of India representative to Unicode, at the Unicode Technical Committee (UTC) meeting in Redmond, Washington, 4-10 August 2007, and with Professor R. K. Joshi, C-DAC's Vedic encoding team leader, in Versailles, 1 November 2007, at the conclusion of the First International Sanskrit Computational Linguistics Symposium. Owing to close collaboration in the intervening months, the UTC recommended accepting 59 characters in our joint Vedic Unicode Proposal at its meeting in Cupertino, California, 4-8 February 2008 (see N3383R = L2/08-050). At the WG2 meeting in Seattle, 21-25 April 2008, these were moved onto Amendment 6.2 of ISO/IEC 10646:2003 and slated for balloting by the International Organization for Standardization's JTC 1/SC 2 Working Group 2 (see N3456R = L2/08-176). Six additional characters that complete the set required for Vedic (four Gomukhas, Yajurvedic Kashmiri svarita, and anusvāra ubhayato mukha) were accepted by the UTC at its meeting in San Jose, 12-16 May 2008, in which Scharf participated by telephone (see L2/08-218). Three further characters were accepted by the UTC at its meeting in Redmond, 11-15 August 2008, in which Scharf again participated by telephone: a single pṛṣṭhamātrā e character, used as a combining character for four vowels in pṛṣṭhamātrā notation, and two characters (headstroke and gap filler) necessary for the proper representation of primary cultural heritage documents (manuscripts). At the request of the US representative, ISO Working Group 2 added these nine additional characters to Amendment 6 at its meeting in Hong Kong, 13-17 October 2008. A total of 68 new characters for Vedic and historical Indic were thus slated to become part of the Unicode Standard 5.2, tentatively scheduled for publication in the fall of 2009, and of Amendment 6 of ISO/IEC 10646:2003 (see N3488R3 = L2/08-273R3, and N3546).

After a year-long comment period during which Vedic scholars and Indologists worldwide were invited to review the proposed characters, the 68 characters became part of the Unicode Standard 5.2 in October 2009. They appear in the Devanagari Extended and Vedic Extensions code charts under South Asian Scripts on the Unicode 5.2 Character Code Charts page.

Unicode is an evolving standard. Evidence demonstrating the occurrence, significance, and use of additional Devanāgarī and Vedic characters not included in the relevant code pages continues to be welcome. Please e-mail your comments to the Sanskrit Library Director.

Vedic Unicode proposal

Supporting documentation

Supplementary supporting documentation

This material is based upon work supported by the National Science Foundation under Grant No. 0535207. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.