Sunday, July 21, 2013

Mid-Term Summary : Why BookWorm

I have paused coding for a while and have been concentrating on documentation. Part of that is this multi-part entry, which summarises my thoughts and efforts so far to help with the mid-term evaluation.
This interlude begins here. 
  • Part 1 - Background and Motivation

Tesseract has undoubtedly been the most prominent open-source OCR tool, and over the past few years, support for many languages has been added to it. The most notable of these efforts (to me personally) are:
  1. The Eutypon Project
  2. Debayan's work on IndicOCR
  3. Shirorekha chopping for Hindi
My motivation for taking up this project is rooted in the fact that Bengali is my mother tongue, and I have first-hand experience of situations where an OCR for Bengali would have helped me.
With this project, my aim is to help digitize the huge literary heritage of the language and, at the same time, make the language itself more easily accessible to people across the world.

So I read up on the existing projects, talked to some of the awesome people who originally took this initiative, and decided to go forward and try to improve OCR accuracy for Bengali. And thus began BookWorm.
