20,046 page doc archive still available

From: John Foust <jfoust_at_threedee.com>
Date: Fri Sep 3 20:01:51 2004

Back in March 2001 I posted about a cache of 20,046 pages
of scanned docs I received from someone on the net.
See the TOC below, followed by his explanation of how
he did it.

It consumed several CD-Rs, compressed. I now have a DVD burner
as well, so I'd be glad to make copies on new or old media.
(It is actually all available on a hidden web page whose address
I'll disclose to anyone who emails me asking for it, but I'd
hate to stress my little T-1.)

Anyone care to upgrade it to OCR'd PDF or whatever would be
considered a next-best method of preservation and searchability?
I know it's possible with a handful of Linux tools, but my to-do list
is already too long.

- John

I've made contact with a guy who's scanned 20,046 pages of the
docs listed below, at 300 to 400 DPI. He first told me about the
UCSD p-System docs he'd scanned. Below the list is his description
of the process he followed.

I'm planning to get a copy of what he has and burn it to CD-R.
Does anyone else have an interest in these docs, or have any
ideas about distribution without massive copyright violation?

- John

        MOS 6502 datasheet
        6502 Assembly Language Subroutines (Leventhal)

        AMD 29000 Memory Design Handbook
        Am29027 Arithmetic Accelerator
        Am29C327 Floating Point Processor

    Data General
        C Language Reference Manual
        GATE User's Manual
        AOS/VS Internals Manual
        AOS/VS Programmer's Manual, volume 1
        AOS/VS System Calls Dictionary
        CEO User's Manual
        Eclipse 32-bit Principles of Operation
        Eclipse 32-bit System Functional Characteristics
        Fortran-77 Environment Manual
        Fortran-77 Reference Manual

        Clipper User's Manual

        RISC System Programmer's Guide
        R3000 Assembly Language Programmer's Guide
        R3000 Hardware User Manuals
        R3000 Language Programmer's Guide
        High-speed CMOS databook

        68000 Family Reference
        68020 User's Manual
        68851 User's Manual
        88100 User's Manual
        88200 User's Manual
        Linear Interface Integrated Circuits

        53C90A/B Advanced SCSI Controller (2 different manuals)
        53C94/5/6 databook
        53CF94/96-2 Fast SCSI Controller
        Disk Array Controller Firmware
        Disk Array Controller Hardware
        Disk Array Controller Software
        Floppy Disk Controller (SCSI-to-FD)

    National Semiconductor
        NS32532 Datasheet
        Series 32000 Programmer's Reference Manual
        DP8490 Enhanced Asynchronous SCSI Interface
        NS32CG16 Programmer's Reference Supplement
        Graphics Handbook
        Series 32000 Databook
        DRAM Management databook
        Embedded Controller Databook

    Ohio Scientific
        C4P User's Manual (2 different manuals)
        65V Programmer's manual
        Schematics for:
            502 CPU board
            505 CPU board
            527 24K memory board
            540 Video board
            542 Polled Keyboard

    Pinnacle Systems
        2 User's manuals for their 68k machine (My P-system machine)

    P-system manuals IV.12

        Operating System Reference
        Program Development Reference
        Application Development Guide
        Fortran 77 Reference
        Assembler Reference

        WTL4167 Floating-Point Coprocessor datasheet

Most of these are from about 1988 to 1992, with the exception of the OSI
documentation, of course, which is from 1979.

> What sort of process did you follow?  What sort of devices?
As far as the process, I scanned a manual in and checked to make sure
all the pages were there. If they weren't, I'd scan the pages that
didn't make it, and go through all the pages again. I'll admit this is a
little anal, but better safe than sorry. (When you're using a lot of
shell scripts, you never know if you accidentally deleted a page with an
"mv" command.) When all the pages were there, I'd go through the manual
one more time to check for general quality (no folded corners, no torn
pages, etc.). If all was good, the manual would be moved to the directory
that would be the root directory of my CD-ROM. That's pretty much it.
The big manuals of more than 1000 pages really sucked, because I'd
generally have to make 3 or more passes to get those completely correct.
If I was going to do it again, I'd probably break the larger manuals
into smaller chunks to avoid this problem.
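
A rough sketch of that kind of completeness check, written in Python rather than the shell scripts mentioned above (which aren't shown here). The file naming scheme `page-0001.tif` and the function names `missing_pages` and `check_manual` are assumptions made up for illustration, not anything taken from the original scripts.

```python
import re
from pathlib import Path

# Hypothetical naming scheme: one file per scanned page, e.g. page-0001.tif
PAGE_RE = re.compile(r"page-(\d+)\.tif$")

def missing_pages(filenames, expected_count):
    """Return the sorted page numbers in 1..expected_count that have no file."""
    found = set()
    for name in filenames:
        match = PAGE_RE.search(name)
        if match:
            found.add(int(match.group(1)))
    return [n for n in range(1, expected_count + 1) if n not in found]

def check_manual(directory, expected_count):
    """List a manual's directory and report which pages still need scanning."""
    names = [path.name for path in Path(directory).iterdir()]
    return missing_pages(names, expected_count)
```

For example, `missing_pages(["page-0001.tif", "page-0003.tif"], 3)` returns `[2]`. Running a check like this per chunk of a big manual, rather than per manual, is one way to avoid repeated full passes.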

One thing that made the whole process a lot easier was the netpbm
utilities. I wrote a script to convert the manuals from ~2500x3300 TIFs
to ~500x600 GIFs. My machine takes about 2 seconds to process a 300-400
DPI TIF, but only a fraction of a second for a 75 DPI GIF. I'd run my
script, then do something else for a while. When it was done, I could
flip through the GIFs with GQview and inspect about 2-4 pages per
second. That saved a lot of time.
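
The conversion script itself isn't included, but a preview pipeline along those lines might look something like the sketch below. It assumes the classic netpbm programs `tifftopnm`, `pnmscale`, `ppmquant` and `ppmtogif` are installed and on the PATH; the scale factor of 0.2 is just a guess that takes a ~2500-pixel-wide page down to ~500 pixels, and the function names are invented for illustration.

```python
import subprocess

def preview_commands(src, scale=0.2, colors=256):
    """Return the netpbm command lines for one TIFF -> small GIF conversion."""
    return [
        ["tifftopnm", str(src)],    # decode the TIFF to PNM on stdout
        ["pnmscale", str(scale)],   # shrink both dimensions by `scale`
        ["ppmquant", str(colors)],  # reduce to a palette a GIF can hold
        ["ppmtogif"],               # encode the result as GIF
    ]

def make_preview(src, dst, scale=0.2):
    """Run the pipeline and write the GIF to `dst`. Needs netpbm installed."""
    procs = []
    upstream = None
    with open(dst, "wb") as out:
        cmds = preview_commands(src, scale)
        for i, argv in enumerate(cmds):
            is_last = i == len(cmds) - 1
            proc = subprocess.Popen(
                argv,
                stdin=upstream,
                stdout=out if is_last else subprocess.PIPE,
            )
            if upstream is not None:
                upstream.close()  # the child holds its own copy of the pipe
            upstream = proc.stdout
            procs.append(proc)
        codes = [proc.wait() for proc in procs]
    if any(codes):
        raise RuntimeError(f"netpbm pipeline failed for {src}: exit codes {codes}")
```

Looping `make_preview` over every TIFF in a manual's directory gives a set of small GIFs that a viewer can page through quickly, while the full-resolution scans stay untouched.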

I assume that, by "devices", you mean what type of scanners I used. I
started with an HP 6350cse (with ADF) that I bought for this very
purpose. However, having never owned a scanner before, I was a little
disappointed with how slow the "fast" scanners are. Fortunately, imaging
is an integral part of the software my company sells and, as luck would
have it, we were demoing a new scanner from Fujitsu. This thing
literally does 60 pages/min at 300 dpi - *both* sides. It's about half
that fast at 400 dpi, which I had to use for the IC databooks to get the
fine print. Needless to say, I did most of my scanning on that.

By the way, to date, I've processed 20,046 pages. I'm kinda burned out,
though, so it'll be a while before I do any more.
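
For a sense of scale, the scanner speeds and page count quoted above give a rough bound on pure feed time. This is only back-of-envelope arithmetic under loud assumptions: it pretends every page went through the Fujitsu (some were done on the HP), takes "about half that fast" as exactly half, and ignores rescans, jams and file handling.

```python
def feed_minutes(pages, pages_per_minute):
    """Minutes of pure feed time, ignoring jams, rescans and file handling."""
    return pages / pages_per_minute

TOTAL_PAGES = 20046
RATE_300_DPI = 60                # pages/min, as quoted for 300 dpi
RATE_400_DPI = RATE_300_DPI / 2  # assumption: "about half" taken as exactly half

fastest = feed_minutes(TOTAL_PAGES, RATE_300_DPI)  # everything at 300 dpi
slowest = feed_minutes(TOTAL_PAGES, RATE_400_DPI)  # everything at 400 dpi
print(f"{fastest:.0f} to {slowest:.0f} minutes")   # prints "334 to 668 minutes"
```

That is somewhere between about five and a half and eleven hours of feeding, so most of the real effort was in the checking passes rather than the scanning itself.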
Received on Fri Sep 03 2004 - 20:01:51 BST
