Developing automated text-image alignment to enhance access to heritage manuscript images

Major activities

This project aims to enhance access to Sanskrit manuscripts by developing human-validated, automated text-image alignment techniques. These techniques will provide access to digital images via related machine-readable texts, lexical resources, linguistic software, and a sophisticated search interface. Digital images of manuscripts written in Sanskrit will be integrated into the Sanskrit Library. This integration will allow generalized information extraction and search techniques to reach enormous reservoirs of Sanskrit manuscripts. Integrating primary cultural materials with the Sanskrit Library will thus enable broad use of Indic collections for research and education where Indic materials are grossly underrepresented.

The project builds upon the 160-manuscript, 25,000-image prototype and test-bed of Sanskrit manuscripts that were digitally imaged and correlated with corresponding machine-readable texts in the project conducted at Brown University and the University of Pennsylvania from 2009 to 2013. The result will be extendable to the collections of Sanskrit manuscripts housed in American libraries and throughout the world and to archives of scanned Sanskrit books.

Project personnel

  • Peter M. Scharf, Project Director
  • Ralph E. Bunker, Technical Director
  • Malhar Kulkarni, Associate Professor, Indian Institute of Technology
  • Anuja Ajotikar, Post-doctoral Research Associate, Indian Institute of Technology
  • Tanuja Ajotikar, Post-doctoral Research Associate, Indian Institute of Technology

Grant details

  • period: 1 July 2013 to 30 June 2015
  • U.S. funding: National Endowment for the Humanities, Division of Preservation and Access, grant number PR-50178-13
  • funding: $280,000
  • location: The Sanskrit Library