Let's develop an open-source media archive standard

From: Jules Richardson <julesrichardsonuk_at_yahoo.co.uk>
Date: Wed Aug 11 07:08:22 2004

On Wed, 2004-08-11 at 10:50, Steve Thatcher wrote:
> Hi all, after reading all this morning's posts, I thought I would throw out some thoughts.
>
> XML as a readable format is a great idea.

I haven't done any serious playing with XML in the last couple of years,
but back when I did, my experience was that XML is not a good format for
mixing human-readable and binary data within the XML structure itself.

To make matters worse, the XML spec (at least at the time) did not
define whether it was possible to pass several XML documents down the
same data stream (or, as we'd likely need for this, XML documents mixed
with raw binary). Typically, parsers of the day expected to take control
of the data stream and expected it to contain one XML document only -
often closing the stream themselves afterwards.
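
As a small illustration of the one-document-per-stream assumption, here
is a minimal sketch using Python's standard xml.etree.ElementTree. That
is a present-day parser, not one of the parsers I was using back then,
and the element names are made up for the example. Two well-formed
documents placed back to back are rejected as a single parse:

```python
import xml.etree.ElementTree as ET

# Two well-formed documents back to back in one stream.
# The element names are hypothetical, chosen only for illustration.
single = "<image id='1'/>"
stream = "<image id='1'/><image id='2'/>"

def parses_as_one_document(text):
    """Return True if the parser accepts the text as a single XML document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # The parser reports content after the first document element as an error.
        return False

single_ok = parses_as_one_document(single)
double_ok = parses_as_one_document(stream)
print(single_ok, double_ok)  # True False
```

So anything that wants several documents (or documents plus raw binary)
in one stream has to do its own framing before handing data to a parser
like this.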

I did end up writing my own parser in a couple of KB of code which was a
little more flexible in data stream handling (so XML's certainly not a
heavyweight format, and could likely be handled on pretty much any
machine), but it would be nice to make use of off-the-shelf parsers for
platforms that have them where possible.

As you've also said, my initial thought for a data format was to keep
human-readable config separate from binary data. The human-readable
config would contain a table of lengths/offsets into the binary data,
and that table is what actually defines the layout. This does have the
advantage that if the binary data happens to be a linear sequence of
blocks (sectors in the case of a disk image) then the raw image can
easily be extracted if need be (say, to allow conversion to a different
format).
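
A rough sketch of what I mean, under stated assumptions: the text layout
below ("sector offset length" per line, with # comments) is purely
hypothetical and not a proposed format, and the function names are
invented for the example. It parses the human-readable table and uses it
to pull the raw image out of a separate binary blob:

```python
# Hypothetical human-readable layout table: one line per sector.
LAYOUT_TEXT = """\
# sector  offset  length   (illustrative format only)
0 0 256
1 256 256
2 512 256
"""

def parse_layout(text):
    """Parse 'sector offset length' lines, ignoring blanks and # comments."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        sector, offset, length = (int(field) for field in line.split())
        entries.append((sector, offset, length))
    return entries

def extract_image(blob, entries):
    """Concatenate the blocks named by the table, in sector order."""
    return b"".join(blob[off:off + length] for _, off, length in sorted(entries))

# Stand-in binary data: three 256-byte sectors with recognisable fill bytes.
blob = bytes([0xAA]) * 256 + bytes([0xBB]) * 256 + bytes([0xCC]) * 256

entries = parse_layout(LAYOUT_TEXT)
image = extract_image(blob, entries)
print(len(image))  # 768
```

When the blocks are already stored linearly, as here, the extracted image
is byte-for-byte the binary part, which is the case where ordinary
command-line tools could do the extraction too.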

Personally, I'm not a fan of mixing binary data in with the
human-readable parts because then there are issues of character escaping
as well as the structure detracting from the readability. And if encoded
binary data is used instead (say, hexadecimal representation) then
there's still an issue of readability, plus the archive ends up bloated
and extra CPU cycles are needed to decode data. Neither of those two
approaches lends itself to simply being able to use common
command-line utilities to extract the data, either. I'm perfectly
willing to be convinced, though :)
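
To put a number on the bloat for the hexadecimal case, a quick sketch
using Python's standard binascii module. The 512-byte sector is just a
stand-in size chosen for the example:

```python
import binascii

# Stand-in for one 512-byte sector of raw data.
raw = bytes(range(256)) * 2

# Hex encoding uses two ASCII characters per byte of input.
hex_text = binascii.hexlify(raw)

# Getting the data back needs an explicit decode step.
decoded = binascii.unhexlify(hex_text)

print(len(raw), len(hex_text))  # 512 1024
```

That is a straight doubling in size before any XML markup or line
breaks are added around the encoded data.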

> I looked at the CAPS format and in part that would be okay. I would like
> to throw in an idea of whatever we create as a standard actually have
> three sections to it.

So, first section is all the 'fuzzy' data (author, date, version info,
description etc.), second section describes the layout of the binary
data (offsets, surfaces, etc.), and the third section is the raw binary
data itself? If so, I'm certainly happy with that :-)

One aside - what's the natural way of defining data on a GCR floppy? Do
heads/sectors/tracks still make sense as an addressing mode, but it's
just that the number of sectors per track varies according to the track
number? Or isn't it that simple?

cheers

Jules
Received on Wed Aug 11 2004 - 07:08:22 BST
