> Although it doesn't really know what text is per se, one of its
> algorithms is to find glyph-like things. Once it has all the
> glyph-like things isolated on a page, it compares them all to each
> other, and if two glyphs are similar enough, it will just represent
> them both (or N of them) with one compressed glyph image.
That looks like information loss to me. If one of those glyph-like
things was not the same symbol as the others, then the algorithm
has just introduced an error.
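To make the point concrete, here is a toy sketch of that kind of match-and-substitute step. It assumes the glyphs are already isolated as equal-sized bitmaps and that "similar enough" means "at most a fixed number of differing pixels". The bitmaps, the threshold and the greedy first-match rule are made up for illustration; they are not taken from any real encoder. With a loose enough threshold, a "c" that differs from an "e" only in its crossbar is stored as the "e", and nothing in the output records that it was ever a "c".

```python
from typing import List, Optional, Tuple

# A glyph bitmap: one string per row, '#' = ink, '.' = paper.
Bitmap = Tuple[str, ...]

def hamming(a: Bitmap, b: Bitmap) -> int:
    """Count the pixels that differ between two same-sized bitmaps."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

def substitute(glyphs: List[Bitmap], threshold: int) -> List[Bitmap]:
    """Replace each glyph by the first stored representative that differs
    from it in at most `threshold` pixels; otherwise store it as a new
    representative. Only the representatives survive compression."""
    reps: List[Bitmap] = []
    out: List[Bitmap] = []
    for g in glyphs:
        match: Optional[Bitmap] = next(
            (r for r in reps if hamming(r, g) <= threshold), None)
        if match is None:
            reps.append(g)
            match = g
        out.append(match)
    return out

E = (".###.",
     "#...#",
     "#####",
     "#....",
     ".###.")
E_NOISY = (".####",  # the same "e" with one stray pixel
           "#...#",
           "#####",
           "#....",
           ".###.")
C = (".###.",        # a "c": only the crossbar row differs from the "e"
     "#...#",
     "#....",
     "#....",
     ".###.")

page = [E, E_NOISY, C]
print(hamming(E, C))                               # 4
print(substitute(page, threshold=4)[2] == E)       # True: the "c" now decodes as an "e"
print(substitute(page, threshold=1)[2] == C)       # True: a stricter match keeps the "c"
```

The threshold is the whole story: set it loose enough to absorb scanning noise and it can also swallow a genuinely different symbol.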
> So for OCR purposes, I don't think this type of compression really
> hurts -- it replaces one plausible "e" image with another one.
But one of them might have been something other than an "e".
Antonio
--
---------------
Antonio Carlini arcarlini_at_iee.org
Received on Mon Jan 31 2005 - 17:49:29 GMT