BookWorm - the BongOCR: Mid-Term Summary: Part 3

My Working Methods

This post probably needs a bit more structure, and hence, I will use some formatting styles Blogger provides.

The work I have done during the previous weeks, and how I have done it, can be summarised in the following steps:

Objective A. Get practical with Tesseract

I started the project by getting a first-hand feel of Tesseract. From all the reading I had done by this point, I was familiar with the working theory, but actually trying out the software gave me better insight into the current state of things.
Another valuable lesson from this was that all my reading and the previous work were based on earlier versions of Tesseract. Tesseract 3.02 is a major upgrade as far as BookWorm is concerned: it already has some base support for Indic languages, and the training method has also changed a bit from the previous versions (Tesseract 2.x).
I updated my plan several times along the way, and finally arrived at the matrix mentioned in my previous post. I also finalised my version of the shirorekha chopping algorithm (the shirorekha being the headline stroke that joins the letters of a word), which I call the WhiteWashing Algorithm.
Discussions with Abhishek Gupta also made it clear that our projects are closely related: his project acts as a post-processor for OCR, while mine is a pre-processor.

Objective B. Decide on Imaging Libraries

After familiarising myself practically with Tesseract, I tried out different imaging libraries in Python, and finally decided to use PIL and scikit-image. Some entries about the same can be found here.

Objective C. WhiteWashing Algorithm

Having made sure that the chosen libraries were suitable and sufficient for my purpose, I moved on to the WhiteWashing Algorithm.
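This post does not spell out the steps of the WhiteWashing Algorithm, so the following is only a rough sketch of one common way to chop a shirorekha, not necessarily what BookWorm does. It assumes the word image has already been binarised into rows of 0 (paper) and 1 (ink), counts the ink pixels in each row (a horizontal projection profile), and paints the densest rows white. The function name `whitewash_headline` and the `band_fraction` parameter are made up for illustration.

```python
def whitewash_headline(rows, ink=1, paper=0, band_fraction=0.9):
    """Return a copy of a binary word image with its densest rows blanked.

    Assumption: the shirorekha shows up as the rows holding the most ink,
    which is a heuristic and will fail on some words and scans.
    """
    if not rows:
        return []

    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(1 for px in row if px == ink) for row in rows]
    peak = max(profile)
    if peak == 0:
        # Blank image: nothing to whitewash, just copy it.
        return [list(row) for row in rows]

    # Rows whose ink count is close to the peak are treated as the headline.
    cutoff = band_fraction * peak
    return [
        [paper] * len(row) if count >= cutoff else list(row)
        for row, count in zip(rows, profile)
    ]


# A tiny hand-made "word": row 1 is the headline, the rows below are strokes.
word = [
    [0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
cleaned = whitewash_headline(word)
```

With real scans, the rows would come from an image loaded and thresholded with an imaging library instead of being typed in by hand, and the cutoff would likely need tuning per font and scan quality.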
The coding for whitewashing an image is now complete. After the midterm, I will turn my focus back to the filters, and start with forming the matrix.

Monday, July 22, 2013
