Let's develop an open-source media archive standard

From: Steve Thatcher <melamy_at_earthlink.net>
Date: Wed Aug 11 08:13:46 2004

As Sellam had suggested, the size of the representation is not that important. I would encode binary data as hex to keep everything ASCII. The data size would expand, but the data would also be compressible, so images could be kept in ZIP files or whatever container a person chooses for archiving purposes.
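
Just to illustrate the point, here is a quick Python sketch (purely illustrative, not part of any proposal) - hex doubles the size, but typical formatted-disk fill compresses right back down:

    import binascii, zlib

    raw = b"\xe5" * 512                   # stand-in for one freshly formatted 512-byte sector
    hexed = binascii.hexlify(raw)         # pure-ASCII representation, exactly 2x the raw size
    assert len(hexed) == 2 * len(raw)
    packed = zlib.compress(hexed)         # the kind of compression a ZIP container applies
    assert binascii.unhexlify(zlib.decompress(packed)) == raw   # round-trips losslessly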

XML has become a storage format of choice for a lot of different commercial packages. My knowledge is mostly based on the Windows world, but I doubt that other software houses are avoiding XML. Sun/Java certainly embraces it.

I don't quite understand why representing binary as hex would affect the ability to have command-line utilities. Certainly more CPU cycles are needed for conversion and the image file is larger, but we need a readable format, and I would think that neither CPU cycles nor file size is much of a concern. A utility to create a disk would only have to run through the conversion once to buffer a representation of the floppy disk (unless we are talking about a hard drive image, of course). The file needed to re-create a floppy disk is only going to be 2 to 3 MB at the most (thinking of a 1.2 MB floppy with formatting info).
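
To put numbers on it: a 1.2 MB floppy is 80 tracks x 2 heads x 15 sectors x 512 bytes = 1,228,800 bytes, so the hex payload would be about 2.4 MB of text plus the formatting info. Decoding is a single linear pass, something like this sketch (the whitespace handling is just an assumption about how the hex might be laid out in the file):

    import binascii

    def load_image(hex_text):
        # decode the ASCII-hex payload once into an in-memory disk buffer,
        # skipping whatever whitespace/newlines the archive text contains
        return binascii.unhexlify("".join(hex_text.split()))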

The only difference I see from the sections that were described is that the first one encompasses the format info and the data. My description had the first section as a big block that contained the other two sections as well as length and CRC info to verify data consistency. Adding author, etc. to the big block would make perfect sense.
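
Something along these lines is what I had in mind (element and attribute names invented purely for illustration):

    <archive length="2457600" crc32="...">
      <info>author, date, version, description, ...</info>
      <format>tracks, heads, sectors per track, encoding, ...</format>
      <data>hex-encoded sector data ...</data>
    </archive>

The length and CRC on the outer block would let a utility verify consistency before trusting anything inside.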

As for GCR, that would have been covered under the "etc."... I am not familiar with GCR, but I would guess that it at least involves physical tracks and heads. In that case, a track would consist of whatever the format needed plus the data blocks required for that track.


best regards, Steve Thatcher



-----Original Message-----
From: Jules Richardson <julesrichardsonuk_at_yahoo.co.uk>
Sent: Aug 11, 2004 8:08 AM
To: "General Discussion: On-Topic and Off-Topic Posts" <cctalk_at_classiccmp.org>
Subject: Re: Let's develop an open-source media archive standard

On Wed, 2004-08-11 at 10:50, Steve Thatcher wrote:
> Hi all, after reading all this morning's posts, I thought I would throw out some thoughts.
>
> XML as a readable format is a great idea.

I haven't done any serious playing with XML in the last couple of years,
but back when I did, my experience was that XML is not a good format for
mixing human-readable and binary data within the XML structure itself.
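
As a quick illustration (Python here just because it's handy - the limitation is XML's, not the parser's), most raw byte values simply aren't legal XML 1.0 characters, escaped or not:

    import xml.etree.ElementTree as ET

    # raises ParseError: a NUL byte is not a well-formed XML character,
    # and even the &#0; character reference is forbidden by the spec
    ET.fromstring("<data>\x00</data>")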

To make matters worse, the XML spec (at least at the time) did not
define whether it was possible to pass several XML documents down the
same data stream (or, as we'd likely need for this, XML documents mixed
with raw binary). Typically, parsers of the day expected to take control
of the data stream and expected it to contain one XML document only -
often closing the stream themselves afterwards.
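
For instance, a stock parser chokes the moment a second document follows the first on the same stream:

    import xml.etree.ElementTree as ET

    # raises ParseError ("junk after document element") -
    # one root element per stream, full stop
    ET.fromstring("<disk/><disk/>")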

I did end up writing my own parser in a couple of KB of code which was a
little more flexible in data stream handling (so XML's certainly not a
heavyweight format, and could likely be handled on pretty much any
machine), but it would be nice to make use of off-the-shelf parsers for
platforms that have them where possible.

As you've also said, my initial thought for a data format was to keep
human-readable config separate from binary data. The human-readable
config would contain a table of lengths/offsets for the binary data,
giving the actual definition. This has the advantage that if the
binary data happens to be a linear sequence of blocks (sectors in the
case of a disk image) then the raw image can easily be extracted if
need be (say, to allow conversion to a different format).
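
For example, given a hypothetical layout of ASCII header, blank line, then raw payload (all invented for the sake of argument), extraction is a few lines of Python:

    def extract_raw(archive_path, image_path):
        data = open(archive_path, "rb").read()
        sep = data.find(b"\n\n")        # assumed header/payload boundary
        # data[:sep] holds the offset/length table; everything after it is
        # the linear sector dump, so the raw image copies straight out
        open(image_path, "wb").write(data[sep + 2:])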

Personally, I'm not a fan of mixing binary data in with the
human-readable parts because then there are issues of character escaping
as well as the structure detracting from the readability. And if encoded
binary data is used instead (say, hexadecimal representation) then
there's still an issue of readability, plus the archive ends up bloated
and extra CPU cycles are needed to decode the data. Neither of those
two approaches lends itself to simply using common command-line
utilities to extract the data, either. I'm perfectly willing to be
convinced, though :)

> I looked at the CAPS format and in part that would be okay. I would like
> to throw in an idea of whatever we create as a standard actually have
> three sections to it.

So, first section is all the 'fuzzy' data (author, date, version info,
description etc.), second section describes the layout of the binary
data (offsets, surfaces, etc.), and the third section is the raw binary
data itself? If so, I'm certainly happy with that :-)

One aside - what's the natural way of defining data on a GCR floppy? Do
heads/sectors/tracks still make sense as an addressing mode, but it's
just that the number of sectors per track varies according to the track
number? Or isn't it that simple?
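
(My vague recollection of the Commodore 1541 - numbers from memory, so treat them as an assumption - is that track/sector addressing does survive, with the zone just setting the sector count:

    # Commodore 1541 GCR zones - numbers from memory, treat as an assumption
    SECTORS_PER_TRACK = {}
    for (first, last), sectors in (((1, 17), 21), ((18, 24), 19),
                                   ((25, 30), 18), ((31, 35), 17)):
        for track in range(first, last + 1):
            SECTORS_PER_TRACK[track] = sectors

But I don't know whether every GCR format is that regular.)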

cheers

Jules
Received on Wed Aug 11 2004 - 08:13:46 BST
