Extending the Unicode Standard to accommodate Vedic
Unicode is a standard for encoding the world's scripts that in the last few years has become the minimum implementation standard recognized by software and hardware developers. Although the Unicode Standard already encoded several Indic scripts, it lacked characters necessary for the adequate representation of Vedic texts, the most ancient texts of India, which are of the greatest interest to linguists and of enormous cultural importance to Hindus. The International Digital Sanskrit Library Integration project, funded by the U.S. National Science Foundation's Division of Information and Intelligent Systems under grant number 0535207, developed a successful proposal to include Vedic characters in the Unicode Standard. It did so by coordinating its activities with those of the Script Encoding Initiative at the University of California at Berkeley; with Evertype, a typesetting firm that serves as the Irish delegate to Working Group 2 of the International Organization for Standardization (ISO WG2); with the Ministry of Communications & Information Technology, Department of Information Technology, Government of India; and with the Government of India's Centre for Development of Advanced Computing (C-DAC) in Mumbai.
After undertaking a study of Indian phonetic treatises, the Sanskrit Library hosted a workshop at Brown, 14-17 January 2007, to draft a Vedic character proposal to be presented to the International Organization for Standardization and the Unicode Technical Committee at meetings during the ensuing year. Significant progress in collaborating with C-DAC was achieved in meetings that Peter Scharf, the director of the Sanskrit Library, held with Swaran Lata, the Government of India representative to Unicode, at the Unicode Technical Committee (UTC) meeting in Redmond, Washington, 4-10 August 2007, and with Professor R. K. Joshi, C-DAC's Vedic encoding team leader, in Versailles, 1 November 2007, at the conclusion of the First International Sanskrit Computational Linguistics Symposium. Owing to close collaboration in the intervening months, the UTC recommended accepting 59 characters in our joint Vedic Unicode Proposal at its meeting in Cupertino, California, 4-8 February 2008 (see N3383R = L2/08-050). At the WG2 meeting in Seattle, 21-25 April 2008, these were moved onto Amendment 6.2 of ISO/IEC 10646:2003 and slated for balloting by the International Organization for Standardization's JTC 1/SC2 Working Group 2 (see N3456R = L2/08-176). Six additional characters that complete the set of characters required for Vedic (4 Gomukhas, Yajurvedic Kashmiri svarita, and anusvāra ubhayato mukha) were accepted by the UTC at its meeting in San Jose, 12-16 May 2008, in which Scharf participated by telephone (see L2/08-218). A single pṛṣṭhamātrā e character, used as a combining character for four vowels in pṛṣṭhamātrā notation, and two additional characters (headstroke and gap filler) necessary for the proper representation of primary cultural heritage documents (manuscripts) were accepted by the UTC at its meeting in Redmond, 11-15 August 2008, in which Scharf participated by telephone.
At the request of the US representative, ISO Working Group 2 added these 9 additional characters to Amendment 6 at the WG2 meeting in Hong Kong, 13-17 October 2008. A total of 68 new characters for Vedic and historical Indic were slated to become part of the Unicode Standard 5.2, tentatively scheduled for publication in the fall of 2009, and of Amendment 6 of ISO/IEC 10646:2003 (see N3488R3 = L2/08-273R3, and N3546).
After a year's comment period during which Vedic scholars and Indologists worldwide were invited to review the proposed characters, the 68 characters became part of The Unicode Standard 5.2 in October 2009. They appear in the Devanagari Extended and Vedic Extensions code charts under South Asian Scripts on the Unicode 5.2 Character Code Charts page.
Unicode is an evolving standard. Evidence demonstrating the occurrence, significance, and use of additional Devanāgarī and Vedic characters not included in the relevant code pages continues to be welcome. Please e-mail your comments to the Sanskrit Library Director.
Vedic Unicode proposal
- Proposal to encode additional characters for Vedic in the UCS ("the final 68") (N3488R3 = L2/08-273R3)
- Proposal to encode characters for Vedic Sanskrit in the BMP of the UCS (N3366 = L2/07-343)
- Eric Muller's Report of the South Asia Subcommittee's Encoding of Vedic: A detailed comparison of the Sanskrit Library's and C-DAC proposals and the evolution of their concurrence
- Consensus adopted by the Unicode Technical Committee at its meeting 4-8 February 2008 in Cupertino, CA (N3383R = L2/08-050R)
- Consensus adopted by the Unicode Technical Committee at its meeting 12-16 May 2008 in San Jose, CA (L2/08-218)
- Characters accepted for balloting on amendment 6 by the WG2 committee of the International Organization for Standardization at its meeting 21-25 April 2008 in Seattle, WA (N3456R = L2/08-176)
- Summary of characters accepted for balloting on amendment 6 by the WG2 committee of the International Organization for Standardization at its meetings 21-25 April 2008 in Seattle, WA and 13-17 October 2008 in Hong Kong (N3546 = L2/08-366)
- Yajurvedic mid-character svarita proposal
- Materials for a Devanāgarī headstroke proposal
- Character placement criteria and gap-filler character evidence
- Technical document specifying Vedic character context and usage
Supporting documentation
- The Scharf/Everson proposal considered at the UTC meeting 6-10 August 2007 (n3290-vedic.pdf)
- Outline of the development of WG2/n3366 = L2/07-343
- Comments on R. K. Joshi’s documents L2/07-386 and L2/07-388
- Equivalences between L2/07-396 and L2/07-397 Draft Proposal for Encoding of Vaidika Character & Symbols in Unicode, dated 10 October 2007 by R. K. Joshi and Alka Irani, and L2/07-343 (N3366) dated 18 October 2007 edited by Michael Everson and Peter Scharf
- Encoding Sāmaveda with Ruby (Sanskrit Library Technical Note 1/rev. 2, Malcolm D. Hyman, November 15, 2007)
- VedicMarks2007Mar10V.pdf
Supplementary supporting documentation