Let's develop an open-source media archive standard

From: Vintage Computer Festival <vcf_at_siconic.com>
Date: Wed Aug 11 13:41:56 2004

On Wed, 11 Aug 2004, Jules Richardson wrote:

> Maybe zip's not the ideal example. My point really is that if the
> archives are enormous, people are going to be tempted to compress them.
> If they compress them, what guarantee is there that a) the compression
> method is going to be around when someone totally unrelated wants to
> handle these files in x years, and b) is it going to be obvious to
> someone in x years what compression method was even used to compress the
> file?

Yep. Good points. Primarily, the human psychology aspect: large file
sizes will compel people to want to compress the images, quite possibly
ruining the effort of making the images to begin with. The spec should
be designed such that it allows for the smallest filesize possible.

> Again, it's back to longevity of the archives themselves. If something's
> needed for the short term (next ten years, say), it's not a problem. But
> it'd be nice if a future generation, upon discovering one of these
> archives, could know exactly what it was (and stand a good chance of
> decoding it) just by looking at it (hence the human-readable part)

Right. This is the whole reason for designing a spec like this.

> Again, I don't like the idea of anything happening to the archive files
> after creation though. I suppose the data from the raw device (floppy,
> hard disk, whatever) within the archive could be encoded somehow
> (leaving the config section as plain-text) - providing it's in a common
> enough format that we think someone will be able to find the spec for
> the encoding method in x years and so be able to get at the data. That's
> somewhat hard to say for sure though!

Or at least be able to figure it out. Encoding the data as text, such as
in hex or Base64, still allows a smart human to figure it out. If we
add meta compression, this will also need to be readily decodable by a
smart human.
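
To make the trade-off concrete, here is a minimal sketch that encodes the
same bytes both ways and compares the sizes. The sample bytes are made up
for illustration and are not from any real image.

```python
import base64

# Hypothetical 16-byte fragment of sector data, including bytes that are
# neither printable nor valid ASCII.
raw = bytes(range(0xF8, 0x100)) + b"VINTAGE\x00"

# Base16 (hex): two text characters per byte, readable digit by digit.
hex_text = base64.b16encode(raw).decode("ascii")

# Base64: four text characters per three bytes, more compact but harder
# to decode by eye.
b64_text = base64.b64encode(raw).decode("ascii")

print(hex_text)
print(b64_text)
print(len(raw), len(hex_text), len(b64_text))
```

Hex doubles the size and Base64 adds roughly a third, but both survive
printing and both can be decoded by hand from a description of the scheme.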

> > Again, producing paper copies of stuff with non-printable characters
> > becomes "problematic".
> That's actually an extremely good point, and perhaps the best argument
> (IMHO) for not using binary data so far :-) Hmmm...


> Seriously, if there's a good argument for having CRC's in more than x
> (50?) percent of cases because corrupted data is expected to be a real
> possibility, then make them mandatory. If not, then make them an
> optional extra. I certainly can't see a good reason why they'll *never*
> be needed, that's for sure.

I'd make them an optional extra, with the default assuming no CRCs. In
fact, the spec should be designed in such a way that as little as possible
is assumed. Any encoding features should have to be explicitly invoked in
the image header.
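
As a rough sketch of "nothing assumed unless the header says so": the
"key: value" header layout and the "crc32" field name below are
assumptions made up for this example, not part of any agreed spec.

```python
import zlib

def parse_header(text):
    """Parse 'key: value' lines into a dict (hypothetical header layout)."""
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip().lower()] = value.strip()
    return fields

def check_payload(header_text, payload):
    """Return True if the payload passes every check the header asks for."""
    fields = parse_header(header_text)
    declared = fields.get("crc32")  # hypothetical field; absent means no CRC
    if declared is None:
        return True  # nothing was invoked, so nothing is checked
    return zlib.crc32(payload) == int(declared, 16)

payload = b"sector data"
plain = "media: floppy"                                   # no CRC declared
checked = plain + "\ncrc32: %08X" % zlib.crc32(payload)   # CRC declared

print(check_payload(plain, payload))
print(check_payload(checked, payload))
print(check_payload(checked, b"sector dat4"))
```

A reader that finds no "crc32" line simply skips the check, so the
default costs nothing and imposes nothing.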

> With you on the longevity side of things. Hmm, off the wall suggestion,
> but it's only the storage format for the raw data that's an issue,
> right? So does it make sense to define both binary and ASCII
> representation as valid storage formats, and the format in use within a
> particular archive is recorded as a parameter within the human-readable
> config section?

I still don't like it. As Roger M. pointed out, what will the binary data
look like after it's been paraded through several different platforms?

> In this way those wanting compact archives to save space, run against
> various existing utilities etc. can have them containing binary data;
> those who think they need ASCII representation of the data due to tool
> or transmission medium limitations can use that format - all whilst
> maintaining compatibility with the spec. (potentially the 'encoding
> method' parameter could include other defined types - uuencode, base64
> etc. but let's not get ahead of ourselves...)

I'd rather give the option of being able to specify which text-based
encoding scheme was used (i.e. base16, base64, etc.)
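
One way this could look in a reader, as a sketch only: the encoding name
comes from the header and selects a decoder from a small table. The
names "base16" and "base64" as header values are assumed here for
illustration.

```python
import base64

# Hypothetical table of text encodings a reader understands, keyed by the
# name the image header would declare.
DECODERS = {
    "base16": lambda text: base64.b16decode(text, casefold=True),
    "base64": lambda text: base64.b64decode(text, validate=True),
}

def decode_payload(encoding, text):
    """Decode payload text using the scheme named in the header."""
    try:
        decoder = DECODERS[encoding.lower()]
    except KeyError:
        raise ValueError("unknown encoding: %r" % encoding) from None
    # Strip whitespace first so line-wrapped payloads decode cleanly.
    return decoder("".join(text.split()))

print(decode_payload("base16", "48 65 6c\n6c 6f"))
print(decode_payload("base64", "SGVs\nbG8="))
```

An unknown name fails loudly instead of guessing, which fits the idea
that the header has to state explicitly what was done to the data.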

> (funny how someone mentioned IFF files earlier; I keep on thinking of
> TIFF images where the data's structured and the format both versioned
> and maintained under strict control)

So it should be with this specification.

Sellam Ismail                                        Vintage Computer Festival
International Man of Intrigue and Danger                http://www.vintage.org
[ Old computing resources for business || Buy/Sell/Trade Vintage Computers   ]
[ and academia at www.VintageTech.com  || at http://marketplace.vintage.org  ]
Received on Wed Aug 11 2004 - 13:41:56 BST
