Many things

From: Antonio Carlini
Date: Mon Jan 31 17:49:29 2005

> Although it doesn't really know what text is per se, one of its
> algorithms is to find glyph-like things. Once it has all glyph-like
> things isolated on a page, it compares them all to each other and if
> two glyphs are similar enough, it will just represent them both (or N
> of them) with one compressed glyph image.

That looks like information loss to me. If one of those glyph-like
things was not the same symbol as the others, then the algorithm
has just introduced an error.

> So for OCR purposes, I don't think this type of compression really
> hurts -- it replaces one plausible "e" image with another one.

But one of them might have been something other than an "e".
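
To make the concern concrete, here is a rough sketch of how such a
substitution scheme might behave. It is not the actual algorithm of any
particular codec: the 5x5 bitmaps, the pixel-difference measure, the
greedy matching and the names hamming, compress and max_diff are all
assumptions made up for illustration.

```python
# Illustrative sketch only: a toy glyph-substitution compressor.
# The bitmaps, distance measure and threshold are assumptions, not
# taken from any real codec.

def hamming(a, b):
    """Count differing pixels between two equal-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def compress(glyphs, max_diff):
    """Map each glyph to the first stored representative that differs
    by at most max_diff pixels; otherwise store it as a new one.

    Returns (representatives, indices), where indices[i] is the
    representative that will be drawn in place of glyphs[i].
    """
    reps, indices = [], []
    for g in glyphs:
        for i, r in enumerate(reps):
            if hamming(g, r) <= max_diff:
                indices.append(i)
                break
        else:
            reps.append(g)
            indices.append(len(reps) - 1)
    return reps, indices

E = [".###.",
     "#...#",
     "#####",
     "#....",
     ".###."]

E_NOISY = [".###.",
           "#...#",
           "#####",
           "#....",
           ".##.."]   # same letter, one pixel lost in scanning

C = [".###.",
     "#....",
     "#....",
     "#....",
     ".###."]         # a different letter, 5 pixels away from E

page = [E, E_NOISY, C]

# Strict threshold: the noisy "e" is merged, the "c" stays separate.
print(compress(page, max_diff=2)[1])   # [0, 0, 1]

# Loose threshold: the "c" is also replaced by the stored "e" image.
print(compress(page, max_diff=5)[1])   # [0, 0, 0]
```

With the looser threshold the page decodes as three "e" glyphs, and
nothing in the output records that the third was ever a "c". Whether a
real encoder gets this wrong depends entirely on how conservative its
similarity test is.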


Antonio Carlini
Received on Mon Jan 31 2005 - 17:49:29 GMT
