Scanning microfiche?

Tim wrote:
> Each 8.5" x 11" printed page on the microfiche is about 0.15" x 0.2",
> so the magnification factor is about 50.

Yup, standard magnifications are 24x, 42x, and 48x.

> That means if I want the equivalent
> of a 75 DPI scan of the full-size version, that I need to scan the microfiche
> at about 4000 DPI. The el-cheapo (i.e. a couple hundred $) scanners I see
> on the shelves here seem to top out at 2400 DPI.

It's worse than that. The high resolutions they quote are *interpolated*.
Usually the real resolution (which they call "optical resolution") is
no more than 600 DPI, or sometimes even 300. A few scanners can do 1200 DPI
optical on one axis.

> And 4000 DPI is the "minimum acceptable" number in my above calculation. If
> I can do 4 times better than that, so much the better. In my experience
> most 75 DPI scans of 8.5" x 11" text don't OCR well at all, you need more
> resolution.

Talk to Al Kossow. Having looked at a lot of DEC field service fiche, I
didn't expect that anything under 300 DPI effective would even be reasonable
to look at, much less to OCR. He thought 200 DPI effective might be OK.

He sent a few samples off to a service bureau in southern CA, and they
scanned them at several effective resolutions including 200 and 300 DPI.
The 200 DPI scans were basically worthless. The conclusion was that
it really needed *400* DPI effective.

If you save greyscale data, 200 or 300 DPI effective might be OK. But
greyscale doesn't compress anywhere near as well as bilevel (using either
Group 4 fax compression [*], or JBIG [**]).
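For a rough sense of the raw data volumes involved, here is a small
sketch that computes the *uncompressed* size of one 8.5" x 11" page at
various effective resolutions. The function name and the 8 bits per
pixel figure for greyscale are my assumptions for illustration, and row
padding is ignored. It says nothing about compressed sizes, which depend
heavily on what's on the page.

```python
# Sketch: uncompressed size of one scanned page.
# Assumptions (not from any scanner spec): 1 bit/pixel for bilevel,
# 8 bits/pixel for greyscale, no row padding.

def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel):
    """Raw image size in bytes for a page scanned at the given effective DPI."""
    pixels = round(width_in * dpi) * round(height_in * dpi)
    return pixels * bits_per_pixel // 8

if __name__ == "__main__":
    for dpi in (200, 300, 400):
        bilevel = uncompressed_bytes(8.5, 11, dpi, 1)
        grey = uncompressed_bytes(8.5, 11, dpi, 8)
        print(f"{dpi} DPI: bilevel {bilevel} bytes, greyscale {grey} bytes")
```

Under those assumptions a 200 DPI greyscale page starts out about twice
the size of a 400 DPI bilevel page, before any compression is applied.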

The service bureau charges around $0.07 per page (not per sheet of fiche)
to do this. They seem to miss a lot of pages in the process, so I assume
that they do NOT use a method that scans the entire fiche, but rather a
step-and-repeat sort of process.

Canon makes a fiche viewer/scanner for about $10K:

I've seen a similar product from another company, but it was a bolt-on
attachment to a user-supplied fiche viewer, and it still cost about $10K.
For that money I'd rather have the Canon since it was designed from the
ground up to be a scanner.

> So what are my choices for higher-resolution scanners? My *other* hobby
> happens to be large-format photography, so if the resulting scanner is also
> good for 4" x 5" negatives and/or transparencies I won't complain :-).
> It looks like there are 35mm film scanners with 2700 or 3000 DPI resolutions
> available for a few thousand, but I think I need to do better than that.

The only way to do a good job of it in bulk would be a drum scanner with
a resolution of at least 14400 DPI (300 DPI effective at 48x), and preferably
near 19200 DPI (400 DPI effective). I've seen drum scanners
for around $50,000, but I've never found any that can do 15000 DPI.
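The arithmetic behind those numbers is just the effective resolution you
want on the original page multiplied by the fiche reduction ratio. A
minimal sketch (the function name is made up for illustration):

```python
# Sketch: scan resolution needed at the film to get a given effective
# resolution on the original page. Function name is illustrative only.

def required_scan_dpi(effective_dpi, reduction_ratio):
    """DPI at the fiche = effective DPI on the page * reduction ratio."""
    return effective_dpi * reduction_ratio

if __name__ == "__main__":
    for ratio in (24, 42, 48):
        for effective in (300, 400):
            needed = required_scan_dpi(effective, ratio)
            print(f"{ratio}x reduction, {effective} DPI effective: "
                  f"{needed} DPI at the film")
```

At 48x this gives 14400 DPI for 300 DPI effective and 19200 DPI for
400 DPI effective. Fiche shot at 24x needs only half that.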

> Finally, do *any* scanners have documented interfaces? i.e. say I
> find myself a nice SCSI-connected high-speed high-resolution scanner.

Not many. A lot of them use the SCSI 2 scanner commands, but with
undocumented extensions.

High-end professional scanners usually have a dedicated PCI interface.


[*] The nice thing about G4 is that not only does it do very good
compression, but it's also lossless, so it doesn't have unpleasant
fringing and smearing effects like JPEG. I still can't believe that
people try to use JPEG for text and line art, it's a bloody STUPID thing
to do. JPEG is designed for CONTINUOUS TONE IMAGES. Oh well, don't get
me started on that, I could rant for hours.

The bad news about G4 is that most people don't have software that can
process raw G4 files, or TIFF Class F G4 files. People will try to claim
that your TIFF files are broken. The good news is that G4 is a native
Acrobat compression format, so you can supply PDF files that will work for
everyone. That's how I supply all my scanned documents now.

I still get complaints from a few people that they can't read my PDF
files on their PDP-11/05 or Commodore 64. I ignore those people. I
like old computers as much as anyone, but I don't expect to view 300 DPI
scanned images on a computer with 64K of RAM and a low resolution
display. I have more modern computers for that.

[**] JBIG is also lossless and typically compresses 10-15% better than
G4. Unfortunately it's patented and not widely supported.
