WhiteWashing and box files

Sunday, July 14, 2013

Finally, I have fixed the padding values for the WhiteWash algorithm, and it seems to be working nicely.
Here is a preview of the current output:

However, the padding currently uses fixed integer values. It would be better if I could somehow relate it to the document's statistics, which should be interesting to try out later.
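The post doesn't describe WhiteWash's internals, so the following is only a minimal sketch under an assumption: that it whitens every pixel outside padded text bounding boxes. It contrasts a fixed integer padding with one derived from document statistics, here the median box height times a factor, which is a hypothetical choice rather than the statistic the project settled on. The names `whitewash`, `padding_from_stats` and the box representation are all hypothetical.

```python
from statistics import median
from typing import List, Tuple

# Hypothetical representation: a grayscale image is a list of rows (0 = black,
# 255 = white); a text region is (left, top, right, bottom) in top-left-origin
# pixel coordinates, with right and bottom exclusive.
Image = List[List[int]]
Box = Tuple[int, int, int, int]


def whitewash(image: Image, boxes: List[Box], padding: int) -> Image:
    """Return a copy of `image` with every pixel outside the padded boxes set to white."""
    height = len(image)
    width = len(image[0]) if image else 0
    keep = [[False] * width for _ in range(height)]
    for left, top, right, bottom in boxes:
        # Grow each box by `padding` pixels on every side, clamped to the image.
        for y in range(max(0, top - padding), min(height, bottom + padding)):
            for x in range(max(0, left - padding), min(width, right + padding)):
                keep[y][x] = True
    return [[image[y][x] if keep[y][x] else 255 for x in range(width)]
            for y in range(height)]


def padding_from_stats(boxes: List[Box], factor: float = 0.25) -> int:
    """Hypothetical statistic: padding as a fraction of the median box height."""
    if not boxes:
        return 0
    heights = [bottom - top for _, top, _, bottom in boxes]
    return max(1, round(median(heights) * factor))


page = [[0] * 4 for _ in range(4)]  # an all-black 4x4 "page"
cleaned = whitewash(page, [(1, 1, 2, 2)], padding=1)
print(sum(row.count(0) for row in cleaned))  # 9: the 1x1 box grown to 3x3
```

Scaling the padding with the median glyph height means the same setting would adapt to both large and small print, which a fixed integer cannot do.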

This success with the algorithm lets me focus on box file generation again. I have now managed to generate box files using my workaround in the virtual machine, after tweaking Debayan and Sayamindu's original Pango-based algorithm.
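As an illustration of the last step of box file generation, here is a minimal sketch (not Debayan and Sayamindu's code) that formats glyph rectangles as Tesseract box-file lines. Each line has the form `symbol left bottom right top page`, with the origin at the bottom-left of the image. The sketch assumes the renderer reports rectangles in top-left-origin pixel coordinates, so the y values are flipped; the `Glyph` tuple and `to_box_lines` are hypothetical names.

```python
from typing import Iterable, List, Tuple

# Hypothetical input: (symbol, left, top, right, bottom) in top-left-origin
# pixel coordinates, as a text renderer typically reports them.
Glyph = Tuple[str, int, int, int, int]


def to_box_lines(glyphs: Iterable[Glyph], image_height: int, page: int = 0) -> List[str]:
    """Format glyph rectangles as Tesseract box-file lines.

    Box files use a bottom-left origin, so y values are flipped:
    box_bottom = image_height - bottom, box_top = image_height - top.
    """
    lines = []
    for symbol, left, top, right, bottom in glyphs:
        lines.append(
            f"{symbol} {left} {image_height - bottom} {right} {image_height - top} {page}"
        )
    return lines


for line in to_box_lines([("ক", 10, 5, 30, 25)], image_height=100):
    print(line)  # ক 10 75 30 95 0
```

The symbol is kept as an arbitrary string, so a Bengali conjunct made of several code points can still occupy a single box.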

The generated box file now looks like this:

This is an improvement over my last attempt.
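One quick way to check that a generated box file has improved is to parse it and flag rectangles that are empty or inverted. This is a minimal sketch assuming the standard `symbol left bottom right top page` line layout; `BoxEntry`, `parse_box_file` and `suspicious_boxes` are hypothetical helpers, not part of Tesseract.

```python
from typing import List, NamedTuple


class BoxEntry(NamedTuple):
    symbol: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_file(text: str) -> List[BoxEntry]:
    """Parse box-file text ("symbol left bottom right top page" per line)."""
    entries = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        # Split from the right so only the five trailing integers are separated.
        fields = line.strip().rsplit(maxsplit=5)
        if len(fields) != 6:
            raise ValueError(f"line {line_no}: expected 6 fields, got {len(fields)}")
        symbol, *numbers = fields
        entries.append(BoxEntry(symbol, *(int(n) for n in numbers)))
    return entries


def suspicious_boxes(entries: List[BoxEntry]) -> List[BoxEntry]:
    """Entries whose rectangle is empty or inverted (left >= right or bottom >= top)."""
    return [e for e in entries if e.left >= e.right or e.bottom >= e.top]


sample = "ক 10 75 30 95 0\nখ 40 90 35 95 0\n"
for entry in suspicious_boxes(parse_box_file(sample)):
    print(entry.symbol, entry.left, entry.right)  # খ 40 35
```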

Now I need to combine these two methods to try out my plan.

Sometime this week, I will write a post about my updated plans and goals for the midterm evaluation period.

So long.

2 comments:

sankarshan, July 14, 2013 at 9:05 PM
We now need to try this with some other text samples as well, especially closely typeset ones. Do you have access to a newspaper you could scan?
Unknown, July 15, 2013 at 3:04 AM
Unfortunately, I don't have access to a Bengali newspaper at the moment, but I am planning to test the code on some of the results of a simple Google Image search for "bengali newspapers".
I will also arrange for some scans of Bengali magazines and newspapers.

BookWorm - the BongOCR
