Lossy compression vs. archiving and OCR (was Re: Many things)

From: Randy McLaughlin <randy_at_s100-manuals.com>
Date: Mon Jan 31 15:46:30 2005

From: "Eric Smith" <eric_at_brouhaha.com>
Sent: Monday, January 31, 2005 3:15 PM
<snip>
> But that's what you yourself said that the DjVu software does. It
> replaces glyphs with other glyphs that it thinks are similar. No matter
> how good a job it thinks it can do of that, I DO NOT WANT IT FOR
> ARCHIVAL DOCUMENTS.
>
> I normally scan at 300 or 400 DPI; when there is very tiny text I
> sometimes use 600 DPI.
>
> Even at those resolutions, it can be difficult to tell some characters
> apart, especially from poor quality originals. But usually I can do
> it if I study the scanned page very closely. No, OCR today cannot do
> as good a job at that as I can. Someday OCR may be better. But
> arbitrarily replacing the glyphs with other ones the software considers
> "good enough" is going to f*&# up any possibility of doing this by
> either a human OR OCR.
>
> And all to make the file a little smaller. DVD-R costs about $0.25
> to store 4.7GB of data, so I just can't get excited about using lossy
> encoding for text and line art pages that usually don't encode with
> lossless G4 to more than 50K bytes per page.
>
> Eric
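
[A rough sketch of the storage arithmetic in the quoted paragraph, using only
the figures given there: $0.25 per DVD-R, 4.7GB per disc, and about 50K bytes
per G4 page as the upper end. Treating "GB" and "K" as decimal units is an
assumption, and the function names are hypothetical.]

```python
# Figures from the quoted message; decimal units are an assumption.
DISC_COST_USD = 0.25            # quoted price of one DVD-R
DISC_BYTES = 4_700_000_000      # 4.7GB, assuming 1GB = 10**9 bytes
PAGE_BYTES = 50_000             # "50K bytes per page", assuming K = 1000


def pages_per_disc(disc_bytes=DISC_BYTES, page_bytes=PAGE_BYTES):
    """Whole pages that fit on one disc, ignoring filesystem overhead."""
    return disc_bytes // page_bytes


def cost_per_page(disc_cost=DISC_COST_USD, disc_bytes=DISC_BYTES,
                  page_bytes=PAGE_BYTES):
    """Media cost in dollars for one page at the quoted worst-case size."""
    return disc_cost / pages_per_disc(disc_bytes, page_bytes)


if __name__ == "__main__":
    print(pages_per_disc())             # pages per disc
    print(f"{cost_per_page():.8f}")     # dollars per page
```

[Under those assumptions a disc holds on the order of tens of thousands of
pages, so media cost per page is a tiny fraction of a cent. This covers media
only, not scanning time or bandwidth.]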

The point is not you or your preferences. You can store the documents any
way you want, and you can decide to share them or not. If you decide to share,
you can ship the documents on DVDs or offer them on a website.

My documents are not perfect, but I believe they are the best I can provide
given the constraints of convenience and cost.

These questions face every archivist: if I decided to archive only "perfect
documents", how many could I archive?


Randy
www.s100-manuals.com
Received on Mon Jan 31 2005 - 15:46:30 GMT
