Let's develop an open-source media archive standard

From: Jules Richardson <julesrichardsonuk_at_yahoo.co.uk>
Date: Wed Aug 11 04:41:47 2004

On Tue, 2004-08-10 at 23:04, Vintage Computer Festival wrote:

> The format should be:
>
> 1) Well Documented (with such documentation actively preserved in all four
> corners of the globe and beyond)
> 2) Not constrained to any particular hardware
> 3) Be inclusive of all physical (and logical?) manner of recording media
> 4) Be implementable on even the simplest architectures (because the
> original media source will in many cases have to be read on the hardware
> it is connected to)
> 5) Open source, public domain, etc. (although a copyright may be held if
> it makes sense to do so)
> 6) Adaptable, expandable, revisable (for future extensions)
> 7) Text-based and storable in commonly accessible character formats (i.e.
> a suitable subset of Unicode, i.e. ASCII)
> 8) Allow for the representation of media in either logical or physical
> (raw bit stream) formats
>

Agreed with all those. Can I add some important things:

o) The format should be able to note media errors (I still want to
archive damaged disks whilst some data can be recovered from them)

o) The format should be able to cope with hard disk images too (i.e. not
make assumptions about only 1 or 2 media surfaces!)

o) Aside from the archive obviously recording geometry of the source
media, the following data is important per archive:
  who created it
  when it was created
  what hardware it was created on (at least drive model,
    and any protocol bridge that the drive was connected to)
  what the archive represents
  what platform / system the archive is for
  freeform description field (for any other notes)

o) Archive format should not be tied to any particular software (i.e. no
assumption is made about what utility is used to read/write the archive
files)

o) Each archive should record the version of the spec against which it
was created. Yes, you said that yourself in point 6, but it's darn
important :)
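
To make the list above concrete, here is a minimal sketch of the per-archive data as a Python structure. All of the class and field names (ArchiveHeader, SectorError, protocol_bridge and so on) are my own illustrative assumptions, not part of any agreed spec, and the cylinder/head/sector addressing of errors is just one possible way to note damaged areas.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SectorError:
    """One unreadable or suspect area on the source media (hypothetical layout)."""
    cylinder: int
    head: int
    sector: int
    note: str = ""


@dataclass
class ArchiveHeader:
    """Per-archive data from the list above; every name here is an assumption."""
    spec_version: str        # version of the spec the archive was written against
    created_by: str          # who created it
    created_on: str          # when it was created (free text, e.g. "2004-08-11")
    drive_model: str         # drive the media was read on
    represents: str          # what the archive represents
    platform: str            # platform / system the archive is for
    cylinders: int
    heads: int               # any count, so hard disks fit as well as floppies
    sectors_per_track: int
    protocol_bridge: Optional[str] = None   # e.g. a SCSI-ST506 bridge board
    description: str = ""    # freeform notes
    errors: List[SectorError] = field(default_factory=list)

    def is_damaged(self) -> bool:
        """True if any media errors were noted while reading."""
        return bool(self.errors)
```

A damaged disk would then still produce a usable archive: the header simply carries a non-empty errors list alongside whatever data was recovered.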

I'm actually currently playing around with this sort of thing anyway,
and for the human-readable side of configuration I'm favouring simple
parameter = value ASCII data contained within declared sections.
Currently sections are non-nestable, and I'm hoping they don't need to
be, as it makes things a lot cleaner; we shall see. Configuration is
line-based (UNIX \n only); whitespace and case in parameter names are
*not* significant (so "A Parameter", "aparameter" and "a paraM eter"
are all equivalent as far as parsing's concerned - can't remember if I
borrowed that idea from the Apache or Samba config!). Comments are
supported, and yadda yadda yadda...
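
As a rough illustration of the parsing rules just described, here is a minimal Python sketch. The "[section]" header syntax and the "#" comment marker are assumptions made for the example (the text above doesn't say how sections or comments are written), and the function names normalise and parse_config are hypothetical.

```python
def normalise(name):
    """Drop all whitespace and fold case, so "A Parameter" == "aparameter"."""
    return "".join(name.split()).lower()


def parse_config(text):
    """Parse 'parameter = value' lines grouped under non-nested sections.

    Assumed syntax (not from any spec): "[name]" opens a section and
    lines starting with "#" are comments. Lines are split on "\n" only.
    """
    sections = {}
    current = None
    for lineno, raw in enumerate(text.split("\n"), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            # Sections are flat: a new header simply ends the previous one.
            current = sections.setdefault(line[1:-1].strip(), {})
            continue
        if current is None:
            raise ValueError(f"line {lineno}: parameter outside any section")
        name, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"line {lineno}: expected 'parameter = value'")
        current[normalise(name)] = value.strip()
    return sections
```

With this sketch, "A Parameter = 1" followed by "a paraM eter = 2" in the same section leaves a single key, "aparameter", holding the later value. Values are kept as strings; interpreting them is left to whatever reads the section.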

I've only got around to defining the config for the SCSI floppy
controller board to which the floppy drives are connected, so that I can
read the hardware configuration from a config file - but my plan is/was
to use the same config style for the floppy disk images.

Disk image archive format would be separate from the hardware of course;
indeed I've got ST506 drives hooked up to various SCSI-ST506 bridge
boards in old machines (Omti, Adaptec etc.) and intend to use the same
archive format for those.

I still like the idea of embedding the file format definition itself as
commented data within each archive file - or at least a subset of it.
That way in 30 years when someone comes across an archive in this format
but the spec's long disappeared off the web, there's enough information
within the file itself telling them how to read it.
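
One way this could work, sketched in Python: write the spec text (or a subset) as comment lines at the top of each archive, so a reader that skips comments is unaffected and a human opening the file sees the description first. The "#" comment marker and the function names embed_spec and extract_spec are assumptions for illustration only.

```python
def embed_spec(spec_text, archive_body, marker="#"):
    """Prefix an archive with the spec text, one comment line per spec line."""
    commented = [f"{marker} {line}".rstrip() for line in spec_text.split("\n")]
    return "\n".join(commented) + "\n" + archive_body


def extract_spec(archive_text, marker="#"):
    """Recover the leading comment block (the embedded spec) from an archive."""
    lines = []
    for line in archive_text.split("\n"):
        if not line.startswith(marker):
            break  # first non-comment line ends the embedded spec
        text = line[len(marker):]
        # Remove the single space that embed_spec adds after the marker.
        lines.append(text[1:] if text.startswith(" ") else text)
    return "\n".join(lines)
```

The cost is some repeated text in every file, which seems a small price for an archive that explains itself.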

Ahh, fond memories of the days of upsetting customers because I designed
everything to have a lifetime of at least decades - and they couldn't
see beyond the next six months :-)

cheers

Jules
Received on Wed Aug 11 2004 - 04:41:47 BST
