Tape dumping programs for Unix/Linux... from Stan Sieler on 2002-05-02 (2002-May)

From: Stan Sieler <sieler_at_allegro.com>
Date: Thu May 2 14:06:37 2002

Re:

> The emulator community is vigorously using a tape image container
> format known as TAP for precisely this purpose.
>
> Each record from tape is written to file prefixed *and* suffixed
> by a four-byte record length in little-endian format. A zero-
> length record is represented by a 4-byte value of zero; although
> intuition might call for 8-bytes (a prefix & suffix with nothing
> in between), this is not the case. The convention appears to come
> directly from FORTRAN 77's handling of unformatted sequential files.
>
> And EOF is represented by two consecutive zero-length records.
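
A minimal sketch of a reader for the layout described in the quoted text, in Python. It follows only that description (four-byte little-endian length before and after each record, a single four-byte zero for a zero-length record, two of those in a row as the end marker). The function name `read_tap_records` and the choice to report a zero-length record as `None` are illustrative assumptions, not part of any TAP specification.

```python
import io
import struct

def read_tap_records(stream):
    """Yield each record as bytes, and None for a zero-length record.

    Stops after two consecutive zero-length records, or when the
    stream runs out before that marker is seen.
    """
    consecutive_zero = 0
    while True:
        prefix = stream.read(4)
        if len(prefix) < 4:
            return  # image ended without the double zero-length marker
        (length,) = struct.unpack("<I", prefix)
        if length == 0:
            # A zero-length record is a single 4-byte zero, with no suffix.
            consecutive_zero += 1
            yield None
            if consecutive_zero == 2:
                return
            continue
        consecutive_zero = 0
        data = stream.read(length)
        suffix = stream.read(4)
        if len(data) != length or suffix != prefix:
            raise ValueError("record length prefix and suffix disagree")
        yield data

# Build a tiny image: one 5-byte record, then the two-zero end marker.
image = struct.pack("<I", 5) + b"hello" + struct.pack("<I", 5)
image += struct.pack("<I", 0) * 2
records = list(read_tap_records(io.BytesIO(image)))
```

Checking the suffix against the prefix is what lets a reader detect a damaged image, and it is also what would allow reading the file backwards.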

Darn...sounds like a subset of what I use. I'd be interested
in knowing more about TAP (with an eye towards adopting use of it),
and would suggest some possibly missing features might be:

  - ability to flag if the current tape record had a
    recovered read error (obviously, not all hardware/OS's support
    getting that information)

  - ability to flag if a particular tape record had a hard error
    while reading it

  - ability to flag if an End-of-tape marker/indicator was seen
    while reading the current record (again, not all
    hardware/OS's support getting that information)

  - overall header at start of file, recording information about the
    tape

I have a program called "tapedisk", which reads an arbitrary tape
and "copies" it to a single disk file (or set of files, if the size of the
output file exceeds the maximum disk file size). The purpose of
tapedisk is to allow a future exact copy of the tape to be recreated
on another tape (of equal or larger size). (That future copy would be
made with the companion program, disktape.) Additionally, most of
my other tape-reading utilities know how to read a tapedisk format
disk file as though it were a tape.

This is what I store on a per-tape-record basis:

   record # (since start of tape (not since last EOF), 1-based)
   record size (# bytes, not including this 16-byte header)
   (reserved word) (to allow for future information)
   flag word:
       bit for "recovered error while reading"
       bit for "hard error while reading"
       bit for "encountered an EOF" (in this case, record size is 0)
       bit for "encountered an EOT" (in this case, if data was returned,
               record size will be non-0)
       compression type:
         0 ... record not compressed
         1 ... record compressed (with zlib)
         ...
   ...tape record...
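
The per-record layout above could be sketched in Python roughly as follows. Only the field order and the 16-byte total come from the list above; the 32-bit word size, the big-endian byte order, the bit positions in the flag word, and every name in the code are assumptions made for illustration, not the actual tapedisk layout.

```python
import struct

# ASSUMPTION: four 32-bit big-endian words, which gives the 16-byte header.
HEADER = struct.Struct(">IIII")

# ASSUMPTION: hypothetical bit assignments within the flag word.
FLAG_RECOVERED_ERROR = 0x01
FLAG_HARD_ERROR = 0x02
FLAG_EOF = 0x04
FLAG_EOT = 0x08
COMPRESSION_SHIFT = 8      # compression type kept in the next 8 bits
COMPRESSION_MASK = 0xFF

def pack_record(record_number, payload, flags=0, compression=0):
    """Return header + payload; the size field excludes the header itself."""
    flag_word = flags | ((compression & COMPRESSION_MASK) << COMPRESSION_SHIFT)
    return HEADER.pack(record_number, len(payload), 0, flag_word) + payload

def unpack_record(buf, offset=0):
    """Return (fields, next_offset) for the record starting at offset."""
    number, size, _reserved, flag_word = HEADER.unpack_from(buf, offset)
    start = offset + HEADER.size
    fields = {
        "number": number,
        "size": size,
        "recovered_error": bool(flag_word & FLAG_RECOVERED_ERROR),
        "hard_error": bool(flag_word & FLAG_HARD_ERROR),
        "eof": bool(flag_word & FLAG_EOF),
        "eot": bool(flag_word & FLAG_EOT),
        "compression": (flag_word >> COMPRESSION_SHIFT) & COMPRESSION_MASK,
        "data": buf[start:start + size],
    }
    return fields, start + size

# Record 1 carries data and a recovered-error flag; record 2 is an EOF.
blob = pack_record(1, b"abc", FLAG_RECOVERED_ERROR) + pack_record(2, b"", FLAG_EOF)
first, pos = unpack_record(blob)
second, end = unpack_record(blob, pos)
```

Because the record number counts from the start of the tape rather than from the last EOF, a reader can notice a missing or duplicated record simply by checking that the numbers increase by one.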

This is some of what I store as a "header" at the start of the overall output:

   version ID of tapedisk

   72 character description of the tape, as entered by the user

   name of tape device
       (Useful if questions arise later about the condition of the
       hardware used)

   # full records
       If one requests X bytes in a read-from-tape, and the current
       tape record happens to be X + Y bytes in size, most
       tape drives / OS's seem to quietly throw away the extra Y
       bytes without telling the user.
       Obviously, that's *CRITICALLY* important information.
       So, if I expect X bytes, I always request X+2 (or more) bytes...
       if I get the full X+2, then I know there's some chance that
       the tape record was MORE than X+2 bytes long. Any time I
       get a tape record of exactly what I requested, I increment
       the "# full records" counter, so I will know in the future how
       many tape records *might* have been silently truncated.

   date/time tape was read (starting time, not ending time)

   size of largest tape record (in bytes)
   size of shortest tape record (in bytes, not counting 0)

   # of EOFs in a row that the user defined to be a logical
      End Of Tape. (0 means that logical End Of Tape wasn't used)
      (This applies to some OS's that put a double EOF at the logical
      end of data ... on a 9-track tape, it often realllllly helps
      to know that one should stop reading at that point.)

   # of EOFs seen on the tape

   # of EOFs the user wants to limit the copy to (0 means no limit)
      Thus, if the user said "copy the first 5 files to a tapedisk file",
      I can tell later that I stopped because the fifth EOF was hit,
      and not because the EOT was hit.

   # of short records in a row that constitute a logical EOT
      On some OS's, with some devices (usually DDS/DATs), when you reach
      the end of the tape, the hardware/software starts returning
      reads of length 0 or 1 bytes, but without returning an EOF condition.
      My tapedisk code defaults to assuming that 100 records of 1 byte or
      less in length mean I encountered this condition. A value of 0
      disables this.

   # bytes read from the tape (64 bit integer)
   # of records (other than EOFs) read from the tape

   CPU time required to read tape (millisecs)
   Elapsed time required to read tape (millisecs)

   compression type (0 means none)
      My tapedisk defaults to compressing the tape data, using zlib.
      My disktape automatically uncompresses it, of course, as does
      my library of "read a tapedisk file" routines.

   # bytes saved by compression (64 bit integer)

   # of 256-byte sectors of data read from the tape
      (Thus giving the user an idea as to how large a tape would be
      needed to copy the data back to tape in the future)

   Code indicating how the tapedisk run was ended:
      - did we appear to get to the physical end of tape?
      - did the user interrupt us? (e.g., control-C)
      - did we hit the maximum number of logical EOTs?
      - is the output file full, and we weren't allowed to make
        continuation output files?
      - did we hit the user-specified maximum number of EOFs?
      - did a catastrophic tape error occur?
        (e.g., a read failed, and the user elected/specified not to
        record the error but to quit instead)
      - did we encounter too many "short" records in a row?
      - did we get a "disk full" (not "file full") while writing
        to disk?
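
Several of the header fields above come out of the read loop itself: the "# full records" counter (request X+2 bytes and count any read that comes back completely full), the run of EOFs treated as a logical End Of Tape, and the run of short records treated as a logical EOT. A rough Python sketch of such a loop follows. It runs against a simulated tape rather than a real device, and `copy_tape`, `make_reader`, `EOF_MARK`, and the end-reason strings are all hypothetical names; the real tapedisk code is not shown in this message.

```python
EOF_MARK = object()  # stand-in for the driver reporting a tape mark

def make_reader(items):
    """Simulated drive: returns each item in turn, None at physical end."""
    it = iter(items)
    def read_record(max_bytes):
        item = next(it, None)
        if item is None or item is EOF_MARK:
            return item
        return item[:max_bytes]  # excess bytes are silently thrown away
    return read_record

def copy_tape(read_record, expected_size, eofs_for_logical_eot=2, short_limit=100):
    request = expected_size + 2  # always ask for more than we expect
    stats = {"records": 0, "full_records": 0, "eofs": 0, "end_reason": None}
    eofs_in_row = 0
    shorts_in_row = 0
    while True:
        rec = read_record(request)
        if rec is None:
            stats["end_reason"] = "physical end of tape"
            break
        if rec is EOF_MARK:
            stats["eofs"] += 1
            eofs_in_row += 1
            shorts_in_row = 0
            # A limit of 0 means the logical EOT check is disabled.
            if eofs_for_logical_eot and eofs_in_row >= eofs_for_logical_eot:
                stats["end_reason"] = "logical EOT"
                break
            continue
        eofs_in_row = 0
        stats["records"] += 1
        if len(rec) == request:
            # The read came back full, so the record may have been longer.
            stats["full_records"] += 1
        if len(rec) <= 1:
            shorts_in_row += 1
            # A limit of 0 disables the short-record check as well.
            if short_limit and shorts_in_row >= short_limit:
                stats["end_reason"] = "too many short records"
                break
        else:
            shorts_in_row = 0
    return stats

# Expecting 80-byte records: the 100-byte one comes back as a full 82 bytes.
tape = [b"a" * 80, b"b" * 100, EOF_MARK, EOF_MARK]
stats = copy_tape(make_reader(tape), expected_size=80)
```

In this run `stats["full_records"]` ends up as 1, flagging the 100-byte record as one that might have been silently truncated, and the loop stops on the double EOF rather than at the physical end of the tape.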

As you can see...a lot of info, but all necessary to faithfully
reproduce the tape later and/or to understand *why* the file might
(or can't) be used to completely reproduce the tape.

This is part of a product we sell on MPE/iX systems, but I'd be happy
to document the structures and/or work with other people on standards
in this area.

There's room for potential improvement. For example, is it worthwhile
to have a table at the front of the output file that specifies where
the first N EOFs are?

Stan

Stan Sieler sieler_at_allegro.com
www.allegro.com/sieler/wanted/index.html www.allegro.com/sieler
Received on Thu May 02 2002 - 14:06:37 BST
