Let's develop an open-source media archive standard

From: Vintage Computer Festival <vcf_at_siconic.com>
Date: Wed Aug 11 16:15:40 2004

On Wed, 11 Aug 2004, Jules Richardson wrote:

> So, what difference does it make to a human analyst whether the data is
> stored as hex pairs or binary data? Both need decoding by some process
> to make them usable. A human eye is no better off viewing a stream of
> hex digits than they are a stream of arbitrary ASCII data.

I disagree. I can see "1A". I might not be able to see a CTRL-Z.
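To make the visibility point concrete, here is a minimal Python sketch. It assumes the archive stores each byte as a two-digit uppercase hex pair separated by spaces; the function names `to_hex_pairs` and `from_hex_pairs` are hypothetical, not part of any agreed spec. A CTRL-Z (0x1A) that prints as nothing in raw form shows up plainly as "1A":

```python
# Hypothetical helpers: store bytes as space-separated uppercase hex pairs.
def to_hex_pairs(data: bytes) -> str:
    """Return data as hex pairs, e.g. b'\\x1a' -> '1A'."""
    return " ".join(f"{b:02X}" for b in data)


def from_hex_pairs(text: str) -> bytes:
    """Invert to_hex_pairs; bytes.fromhex skips the separating spaces."""
    return bytes.fromhex(text)


if __name__ == "__main__":
    raw = b"END\x1a"           # ends in CTRL-Z, invisible when printed raw
    print(to_hex_pairs(raw))   # 45 4E 44 1A
```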

> Actually, binary data could possibly be more useful to the human eye in
> a browsing scenario, as at least the eye can quickly pick out meaningful
> strings - such as filenames on the original media - from a sea of binary
> data, without needing to do any decoding. At least viewing a file
> containing binary data could give a clue as to what it contained if the
> archive metadata (eg. description) wasn't up to much.

In that case, I would suggest someone develop a browser program to
interpret the archive data on the fly if they want to run a "strings" on
it.
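A rough sketch of such a browser's "strings" step, in Python. It assumes the payload is stored as hex pairs and reports runs of printable ASCII at least four characters long, the default minimum used by the Unix `strings` utility. The function name `strings_from_hex` is hypothetical:

```python
import re


def strings_from_hex(hex_payload: str, min_run: int = 4) -> list[str]:
    """Decode a hex-pair payload and return its printable ASCII runs.

    Printable here means space (0x20) through tilde (0x7E); runs shorter
    than min_run are ignored, as `strings` does by default.
    """
    data = bytes.fromhex(hex_payload)
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_run)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

For example, `strings_from_hex("00 01 43 4F 4D 4D 41 4E 44 2E 43 4F 4D 00 FF 41 42 00")` picks out `["COMMAND.COM"]`; the trailing two-character "AB" run is too short to report.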

> As raised earlier though, I do wonder if it's an idea to define several
> possible encoding methods as part of the spec. Maximum flexibility
> always seems the key to long-lived data formats, so it perhaps makes a
> lot of sense to do so anyway. Who's to tell what use such archives might
> be put to in the future - but if the spec covers a reasonable base for
> now (with extensibility in mind such that others can be added if needs be
> in future versions) then everyone's happy, and future generations can
> always convert between formats as they see fit.
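One way several encodings could coexist is a registry keyed by spelled-out names, plus a helper that converts a payload from one encoding to another. This Python sketch assumes the names "hex", "base64" and "uuencode"; the `CODECS` registry and `convert` function are hypothetical, and the uuencode variant produces body lines only, without the usual "begin"/"end" wrapper lines:

```python
import base64
import binascii


def _uu_encode(data: bytes) -> str:
    # binascii.b2a_uu accepts at most 45 bytes per call, so encode in chunks.
    # Output is uuencoded body lines only (no "begin"/"end" wrapper).
    return "".join(binascii.b2a_uu(data[i:i + 45]).decode("ascii")
                   for i in range(0, len(data), 45))


def _uu_decode(text: str) -> bytes:
    return b"".join(binascii.a2b_uu(line) for line in text.splitlines())


# Hypothetical registry: encoding name -> (encode, decode).
CODECS = {
    "hex": (lambda d: d.hex().upper(), bytes.fromhex),
    "base64": (lambda d: base64.b64encode(d).decode("ascii"), base64.b64decode),
    "uuencode": (_uu_encode, _uu_decode),
}


def convert(payload: str, src: str, dst: str) -> str:
    """Re-encode a payload from one named encoding to another."""
    decode = CODECS[src][1]
    encode = CODECS[dst][0]
    return encode(decode(payload))
```

Because every codec round-trips through raw bytes, adding a new encoding in a later spec version only means adding one registry entry.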

I would be perfectly fine with enabling the embedding of binary data
into the archive by having a tag to specify such. But I would strongly
discourage its use.

> Oh, next random thought (which I expect someone's already raised) - the
> addressing format for the data on the original media needs to be
> flexible enough to cope with different classes of data. Or rather, I'd
> expect different addressing classes. For hard disk and floppy archives,
> head/sector/track seem a logical addressing scheme. But for say a ROM
> image, there's no concept of head/sector/track; maybe just an index to
> the data and a length. Maybe someone will want to add scans of
> documentation pages to an archive, in which case chapter / page
> addressing is logical.

Right, and there will be appropriate tags for each type of medium.
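Per-medium addressing could be modeled as one small record type per addressing class. In this Python sketch the class names and the attribute-style output (`medium="disk"` and so on) are hypothetical illustrations, not proposed tag syntax:

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class DiskAddress:   # floppy or hard disk: track, head, sector
    track: int
    head: int
    sector: int


@dataclass(frozen=True)
class RomAddress:    # ROM image: byte offset into the image and a length
    offset: int
    length: int


@dataclass(frozen=True)
class PageAddress:   # scanned documentation: chapter and page
    chapter: str
    page: int


Address = Union[DiskAddress, RomAddress, PageAddress]


def describe(addr: Address) -> str:
    """Render an address as hypothetical human-readable tag attributes."""
    if isinstance(addr, DiskAddress):
        return f'medium="disk" track="{addr.track}" head="{addr.head}" sector="{addr.sector}"'
    if isinstance(addr, RomAddress):
        return f'medium="rom" offset="{addr.offset}" length="{addr.length}"'
    if isinstance(addr, PageAddress):
        return f'medium="document" chapter="{addr.chapter}" page="{addr.page}"'
    raise TypeError(f"unknown address class: {type(addr).__name__}")
```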

> I'd say that important field values like the compression type should
> always be human-readable rather than a numeric id, just rigidly defined
> by the spec (e.g. 'none', 'base64', 'uuencode' etc.). That makes life a
> lot easier for someone potentially looking at this in the future if they
> don't happen to have a copy of the spec handy!

Totally agreed. There should not be anything cryptic in the tags, and to
the extent possible, they should make sense to a smart human.
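A spec-defined vocabulary of readable names is also easy to enforce mechanically. This Python sketch assumes a hypothetical set of encoding names and, following the earlier suggestion that a binary tag be allowed but discouraged, accepts "binary" with a warning:

```python
import warnings

# Hypothetical vocabulary; the real spec would fix these names rigidly.
ALLOWED_ENCODINGS = {"none", "hex", "base64", "uuencode", "binary"}
# Permitted but discouraged, per the suggestion earlier in the thread.
DISCOURAGED = {"binary"}


def check_encoding(value: str) -> str:
    """Return the encoding name if it is in the vocabulary, else raise."""
    if value not in ALLOWED_ENCODINGS:
        raise ValueError(
            f"unknown encoding {value!r}; expected one of {sorted(ALLOWED_ENCODINGS)}")
    if value in DISCOURAGED:
        warnings.warn(f"encoding {value!r} is permitted but discouraged", stacklevel=2)
    return value
```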

> Hmm, I miss the old days of everyone chucking ideas around like this :-)

Well, they're back! ;)

Sellam Ismail                                        Vintage Computer Festival
International Man of Intrigue and Danger                http://www.vintage.org
[ Old computing resources for business || Buy/Sell/Trade Vintage Computers   ]
[ and academia at www.VintageTech.com  || at http://marketplace.vintage.org  ]
Received on Wed Aug 11 2004 - 16:15:40 BST