Tuesday, June 18, 2013

making box files

After trying for a few days, I succeeded in making a few box files today.
First, a view of the excerpt I used. I came across this piece when searching for Bengali pangrams. As far as I can tell at first glance, this passage serves my purposes.
So I started out and made a simple box file, which, after lots of manual corrections, looks like this.
Now, this makes it clear that I need the shirorekha-chopping in place even before making box files. Otherwise, the box-file maker detects each whole word as a single blob. I'll try another box-file maker to see if the problem persists.
If it does, that could be a problem, as it means I have to stop the training procedure until my pre-processing module is complete.
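
To cut down on the manual checking, here is a rough sketch of how one might flag the boxes that swallowed a whole word. It assumes the common Tesseract-style box file layout, one line per box as "glyph left bottom right top page", which may not match every box-file maker. The names parse_box_line and suspicious_boxes, the sample coordinates and the 2.5 aspect-ratio threshold are all made up for illustration, not taken from any tool.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    glyph: str
    left: int
    bottom: int
    right: int
    top: int
    page: int


def parse_box_line(line: str) -> Box:
    # Assumed layout: "<glyph> <left> <bottom> <right> <top> <page>".
    # Split from the right so the glyph field is kept whole.
    glyph, left, bottom, right, top, page = line.rstrip("\n").rsplit(" ", 5)
    return Box(glyph, int(left), int(bottom), int(right), int(top), int(page))


def suspicious_boxes(boxes: List[Box], max_aspect: float = 2.5) -> List[Box]:
    # A box much wider than it is tall probably covers a whole word
    # joined by its shirorekha. The threshold is a guess, not a tuned value.
    flagged = []
    for box in boxes:
        width = box.right - box.left
        height = box.top - box.bottom
        if height > 0 and width / height > max_aspect:
            flagged.append(box)
    return flagged


# Invented sample: one single-glyph box and one word-sized box.
sample = "ক 10 20 40 60 0\nআমার 50 20 210 60 0\n"
boxes = [parse_box_line(line) for line in sample.splitlines() if line.strip()]

for box in suspicious_boxes(boxes):
    print("possible whole-word blob:", box.glyph, box.left, box.right)
```

Running something like this over a freshly generated box file would at least show how many boxes need splitting before and after the pre-processing step.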

On the other hand, I spoke to Abhishek Gupta today, and he helped me out regarding the datasets I have been thinking about. It seems, however, that they carry no information about the "font" associated with them. Still, I'll have a look at them over the week.
Also, the trouble with Notepad++ is fixed, though it does not use the encoding I need by default, and since I change it manually every time, it asks me to save the file, which is a bit irritating.
So Long.
