manual file types

From: Paul Williams
Date: Tue Feb 1 11:52:38 2005

Jules Richardson wrote:
> [bi-level scanning is]
> horrible for any source documentation with pictures
> in it. Quite often I find someone's scanned something without proofing
> it first and then saved it with bi-level, completely trashing any photos
> inside :-( It's annoying when it's some hard-to-find document and you
> not only want the information but the presentation of the original too.

I'm sure it breaks your heart that this hard-to-find document is
available for free download, in any state. When you find that the photos
have been "completely trashed", have you ever approached the person who
scanned the document and asked for better copies of the affected pages?
I have occasionally asked, and I have received better scans by email.

These documents are being preserved simply by being acquired by people
who have an interest in them. The cost of their acquisition, scanning
and hosting online is often considerable, but you are not (for the most
part) being asked to fund this preservation effort. The documents are
saved, and you get to see them.

> Plus I'm still dubious about bi-level for text that might need to be
> OCRed at a later date - in theory it's fine, but in practice when a
> document's well-used and may contain scuff marks, dirt etc. then
> encoding at bi-level might destroy information (by treating a scuff mark
> over the text as black) that could otherwise have been preserved.

I've only noticed Omnipage Pro struggling with bi-level scans of pages
that had grey backgrounds to some tables, if care hadn't been taken to
drop them out during scanning. Scuff marks and dirt are rare, and don't
generally stop you understanding the word, taken in context.
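
To make the scuff-mark concern concrete, here is a minimal Python sketch of
fixed-threshold bi-level conversion. The grey levels, the threshold of 128 and
the to_bilevel name are all made up for illustration; real scanning software
may well use adaptive thresholding rather than a single fixed cut-off.

```python
# Hypothetical 8-bit grey levels (0 = black, 255 = white) for clean paper,
# a scuff mark, and printed ink. These values are illustrative assumptions.
PAPER, SCUFF, INK = 230, 110, 30

# One scan line: paper, scuff, ink under the scuff, scuff, paper.
row = [PAPER, SCUFF, INK, SCUFF, PAPER]

def to_bilevel(pixels, threshold=128):
    """Map each grey level to 0 (black) or 1 (white) with a fixed threshold."""
    return [0 if p < threshold else 1 for p in pixels]

# At the default threshold the scuff and the ink both come out black.
print(to_bilevel(row))                # [1, 0, 0, 0, 1]

# A greyscale original still allows a different cut-off to be chosen later.
print(to_bilevel(row, threshold=70))  # [1, 1, 0, 1, 1]
```

Once only the first result has been saved, the scuff and the ink cannot be
told apart from the file alone; a rescan of the paper original is what
recovers the difference.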

What is this "otherwise have been preserved"? I can't speak for anyone
else who scans documents, but I haven't thrown out any document that
I've scanned. If mistakes have been made, the affected pages can be rescanned
at any later time.
