Lossy compression vs. archiving and OCR (was Re: Many things)

From: emanuel stiebler <emu_at_ecubics.com>
Date: Mon Jan 31 15:42:44 2005

Eric Smith wrote:

> Even at those resolutions, it can be difficult to tell some characters
> apart, especially from poor quality originals. But usually I can do
> it if I study the scanned page very closely. No, OCR today cannot do
> as good a job at that as I can. Someday OCR may be better. But
> arbitrarily replacing the glyphs with other ones the software considers
> "good enough" is going to f*&# up any possibility of doing this by
> either a human OR OCR.
> And all to make the file a little smaller. DVD-R costs about $0.25
> to store 4.7GB of data, so I just can't get excited about using lossy
> encoding for text and line art pages that usually don't encode with
> lossless G4 to more than 50K bytes per page.
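
To put rough numbers on the storage argument quoted above, here is a
back-of-the-envelope sketch. It uses only the figures from the quote
($0.25 per 4.7GB DVD-R, about 50K bytes per G4 page); reading "4.7GB" as
decimal gigabytes and "50K" as 50,000 bytes are assumptions, and the
function names are made up for illustration.

```python
# Figures taken from the quoted text; the unit interpretations are assumptions.
DISC_BYTES = 4_700_000_000   # 4.7 GB, assuming decimal gigabytes
DISC_COST_USD = 0.25         # approximate price of one DVD-R
PAGE_BYTES = 50_000          # upper-end G4 page, assuming "50K" = 50,000 bytes


def pages_per_disc(disc_bytes=DISC_BYTES, page_bytes=PAGE_BYTES):
    """Whole pages of the given size that fit on one disc."""
    return disc_bytes // page_bytes


def cost_per_page(disc_cost=DISC_COST_USD, disc_bytes=DISC_BYTES,
                  page_bytes=PAGE_BYTES):
    """Media cost in USD for storing a single page."""
    return disc_cost / pages_per_disc(disc_bytes, page_bytes)


print(pages_per_disc())              # 94000
print(f"{cost_per_page():.2e} USD")  # media cost per page
```

Even taking the 50K figure as typical rather than as an upper bound, one
disc holds on the order of 94,000 pages, so the media cost per page is a
tiny fraction of a cent.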

I'm completely with Eric here. However, we should probably distinguish
between how we actually scan the stuff and how we distribute the scans.

Since most of the work is in setting up the scanner, naming the files,
checking that all pages are there, etc., I don't like to do it twice. So
I scan at 300-400 dpi at least, most of the time with two bits per
dot/pixel, put the result on DLT as an original, and then play with what
I got. And every few years I check whether OCR is good enough yet, or
still not ...
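
For a feel of what such an original weighs before any compression, here
is a small sketch of the raw size of one page at the resolutions
mentioned. The page size (US Letter, 8.5 x 11 inches), tight bit packing
with no row padding, and the function name are all assumptions; a real
scanner file format will add headers and may pad rows.

```python
def raw_scan_bytes(dpi, bits_per_pixel=2, width_in=8.5, height_in=11.0):
    """Uncompressed size in bytes of one scanned page.

    Assumes a US Letter page and tightly packed pixels (no row padding).
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


for dpi in (300, 400):
    print(dpi, "dpi:", raw_scan_bytes(dpi), "bytes")
# 300 dpi: 2103750 bytes
# 400 dpi: 3740000 bytes
```

Under these assumptions a raw page comes to roughly 2 to 4 MB before any
lossless compression.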

Received on Mon Jan 31 2005 - 15:42:44 GMT
