Defining Disk Image Dump Standard

From: Pete Turnbull <pete_at_dunnington.u-net.com>
Date: Thu Jun 1 03:13:10 2000

On May 31, 22:05, Sellam Ismail wrote:
>
> Responding to an older message...
> On Tue, 30 May 2000, Tony Duell wrote:
[...]
> > Which means the archive format would have to allow for :
[...]
> > It may be a _very_ unusual format, but a proposed archive format should
> > be able to handle _anything_.

Well, I agree with that, so far as it's possible.

> I suppose a sub-format byte wouldn't hurt. What I don't like about it is
> that it will require that someone be maintaining a database of all the
> sub-formats. But I guess since we have a machine identifier and this will
> have to be maintained as well, a sub-format byte is not too demanding.
> Of course, there will have to be a central person who is responsible for
> receiving input for new machine and sub-format types, updating the
> database with the new computer types and sub-format types, and
> disseminating this from a website.

That's part of the reason I think an encoded format is a bad idea. Hans'
suggestion of a tagged format using XML (or something else) is much better.
It allows for decoding without referring to a central archive, and it's
much more flexible and extensible. Sure, it takes more space, but is that
a problem? The tags don't all need to be ASCII text; things like the
sector size could be integers, and field lengths could be limited. I'd
envisage something like nested objects (borrowing from Sellam's slightly
later mail):

{
  Disk Descriptor Header, containing:

    Host computer type string
    "Hard"/"soft" sector flag
    Number of tracks (1 byte)
    Disk drive RPM
    ...
    {
      Track Descriptor Header, containing:

        Track number (with fraction)
        Track format "logical"/"raw"
        Track size in bytes
        Sectors in this track (1 byte)
        Offset to next Track Descriptor Header
        ...
        {
          Sector header descriptor, containing:

            Sector header format FM/MFM/GCR/...
            Sector data format FM/MFM/GCR/... [1]
            Sector number as encoded on the original disk
            Track number as encoded on original disk
            Head number as encoded on original disk
            Physical sector number
            Sector size
            ...
            {
              sector data (binary, hex-coded, whatever)
            }
            {
              Sector header descriptor
            }
            {
              sector data
            }
        }
    }
}
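
As a minimal sketch of the nesting outlined above, assuming Python
dataclasses: the class and field names below (Disk, Track, Sector and so
on) are hypothetical, chosen only to mirror the outline, and the sample
values are illustrative rather than taken from any real disk.

```python
from dataclasses import dataclass, field
from fractions import Fraction
from typing import List


@dataclass
class Sector:
    header_format: str    # encoding of the sector header: "FM", "MFM", "GCR", ...
    data_format: str      # encoding of the sector data; may differ from the header
    sector_id: int        # sector number as encoded on the original disk
    track_id: int         # track number as encoded on the original disk
    head_id: int          # head number as encoded on the original disk
    physical_sector: int  # where the sector physically sits on the track
    data: bytes = b""

    @property
    def size(self) -> int:
        return len(self.data)


@dataclass
class Track:
    number: Fraction      # a fraction, so half tracks can be described
    track_format: str     # "logical" or "raw"
    sectors: List[Sector] = field(default_factory=list)

    @property
    def size_in_bytes(self) -> int:
        return sum(s.size for s in self.sectors)


@dataclass
class Disk:
    host: str             # host computer type string
    hard_sectored: bool
    rpm: int
    tracks: List[Track] = field(default_factory=list)


# Illustrative values only: one track holding two sectors of different
# sizes, with the encoded track number deliberately not matching the
# physical track.
track = Track(number=Fraction(3, 2), track_format="logical")
track.sectors.append(Sector("FM", "MFM", sector_id=1, track_id=40,
                            head_id=0, physical_sector=0, data=bytes(256)))
track.sectors.append(Sector("FM", "FM", sector_id=2, track_id=40,
                            head_id=0, physical_sector=1, data=bytes(128)))
disk = Disk(host="example-host", hard_sectored=False, rpm=300, tracks=[track])
```

Because each sector carries its own formats, sizes and encoded IDs, nothing
at the disk or track level has to assume that the sectors are uniform.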

The nested tagging allows you to specify things like RX02 floppies, where
the headers are FM but the data is MFM. It also allows you to specify
different sector sizes on different tracks, or data written in the headers
that doesn't match physical track/sector/side on the original. It also
means that if the database is lost, damaged, incomplete or otherwise
inaccessible, an archive can still be understood, and there's no chance of
inconsistency because two people tried to add new formats at about the same
time, or someone rolled their own.
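
To make the self-describing idea concrete, here is a minimal sketch assuming
XML as the tagged format and hex-coded sector data. The element and
attribute names ("sector", "header-format", "data-format" and so on) and the
two helper functions are hypothetical; no schema was agreed in this thread.

```python
import xml.etree.ElementTree as ET


def encode_sector(header_format: str, data_format: str,
                  sector_id: int, data: bytes) -> str:
    """Write one sector as a tagged element that names its own encodings."""
    element = ET.Element("sector", {
        "header-format": header_format,
        "data-format": data_format,
        "id": str(sector_id),
        "size": str(len(data)),
    })
    element.text = data.hex()  # hex-coded sector data
    return ET.tostring(element, encoding="unicode")


def decode_sector(xml_text: str) -> dict:
    """Recover the fields using only what is stored in the element itself."""
    element = ET.fromstring(xml_text)
    info = dict(element.attrib)
    info["data"] = bytes.fromhex(element.text or "")
    return info


# An RX02-style sector: FM header, MFM data (the payload is a placeholder).
xml_text = encode_sector("FM", "MFM", 1, b"\x00\xff\x10")
decoded = decode_sector(xml_text)
```

The decoder never consults a table of machine or sub-format codes; the
header and data encodings travel with the sector.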

I've seen too many data formats where the decoding information was
unavailable, or was hard to get, or was "location unknown at this time", or
the prospective user simply didn't know where to look. If the information
is in the archive itself, anyone can work out what to do with it, any time.

-- 
Pete						Peter Turnbull
						Dept. of Computer Science
						University of York
Received on Thu Jun 01 2000 - 03:13:10 BST
