Let's develop an open-source media archive standard

From: Vintage Computer Festival <vcf_at_siconic.com>
Date: Wed Aug 11 13:28:16 2004

On Wed, 11 Aug 2004, Patrick Finnegan wrote:

> > I'm not quite sure what having binary data represented as hex for the
> > original disk data gives you over having the raw binary data itself -
> > all it seems to do is make the resultant file bigger and add an extra
> > conversion step into the decode process.
>
> Again, producing paper copies of stuff with non-printable characters
> becomes "problematic".

Another point worth highlighting: the image files should be readily
printable, so that a complete image could be recovered by "manual OCR"
(i.e. typing it back in) from a printout of the image file. Keep in mind
that a printout, on the right physical medium, will probably outlast any
magnetic or optical media.
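
For instance, a raw sector full of unprintable bytes becomes perfectly
safe to print once it's rendered as hex. A rough sketch in Python, purely
illustrative (nothing here is part of any proposed format):

    # Render a raw sector as printable hex; trivially reversible by hand or machine.
    sector = bytes([0x00, 0xFF, 0x1B, 0x7F] * 4)   # 16 arbitrary, mostly unprintable bytes
    hex_text = sector.hex()                        # 32 printable ASCII characters
    print(hex_text)                                # safe to print, photocopy, or retype
    assert bytes.fromhex(hex_text) == sector       # round-trips back to the raw bytes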

> > As for file size, if encoding as hex that at least doubles the size
> > of your archive file compared to the original media (whatever it may
> > be). That's assuming no padding between hex characters. Seems like a
> > big waste to me :-(
>
> Then use uuencode or similar that does a bit less wasteful conversion.
> Anyways, the only computer media type where KB/in^2 (or KB/in^3) isn't
> increasing rapidly is paper.

I like the idea of Base64, because it's something a human can still
decode by hand (converting between number bases is easy once you
understand the concept). I'm not familiar with how the uuencode format
works.
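
For example, here's a quick sketch in Python, just to show the size cost
and that the mapping is simple enough to work out by hand with the
64-character alphabet table (the data is arbitrary):

    import base64

    raw = bytes(range(16))                   # 16 arbitrary bytes of "disk data"
    encoded = base64.b64encode(raw)          # printable ASCII, decodable by hand
    print(len(raw), len(encoded))            # prints "16 24"; roughly 4 chars per 3 bytes
    assert base64.b64decode(encoded) == raw  # lossless round trip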

The type of encoding used could be specified as a meta tag, perhaps even
per data segment. Say it's more efficient to encode one part of the data
in hex and another in Base64; each part would then carry a meta tag
saying which encoding was used. Just an idea...
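
A decoder would then just dispatch on that tag. A rough sketch in Python
(the tag names are made up for illustration, not a proposal):

    import base64

    # Hypothetical per-segment meta tag naming the encoding used for that segment.
    DECODERS = {
        "hex":    bytes.fromhex,
        "base64": base64.b64decode,
    }

    def decode_segment(encoding_tag, text):
        # Return the raw bytes of one data segment, given its encoding meta tag.
        return DECODERS[encoding_tag](text)

    assert decode_segment("hex", "48656c6c6f") == b"Hello"
    assert decode_segment("base64", "SGVsbG8=") == b"Hello"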

> I don't really understand why you're quite so concerned about archive
> size bloat, at least over things like CRC's (which if applied liberally
> might add a 4% bloat in size) or plain-text encoding (which would add
> between 33% to about 100% to the size). I'd rather give up some
> efficiency in this case for ensuring that the data is stored correctly,
> and can be properly read (and easily decoded) in 50+ years.

Exactly. Given current trends in hard drive capacity, image file size is
not an issue. Human readability is the prime concern.
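
To put rough numbers on it: hex is two ASCII characters per byte (+100%),
Base64 packs 3 bytes into 4 characters (+33%), and a 4-byte CRC per
128-byte sector adds about 3%. Even a 1.2 MB floppy image, say, only
grows to roughly 2.5 MB as hex text, which is nothing on a modern drive.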

Of course, if a file is too large to print onto paper practically, that
could also be problematic. But again, this gets back to the issue of a
perpetual medium, which is a separate project.

-- 
Sellam Ismail                                        Vintage Computer Festival
------------------------------------------------------------------------------
International Man of Intrigue and Danger                http://www.vintage.org
[ Old computing resources for business || Buy/Sell/Trade Vintage Computers   ]
[ and academia at www.VintageTech.com  || at http://marketplace.vintage.org  ]