C source code to extract CP/M ARK archives?

From: Tom Jennings <tomj_at_wps.com>
Date: Fri Feb 4 14:51:46 2005

On Fri, 4 Feb 2005, Antonio Carlini wrote:

> My point was not information density. My question really was,
> just how self-describing is ASCII? There have been plenty of ways
> to represent letters and numbers (ASCII, EBCDIC, Morse) in the
> past using various media (different punched card formats,
> various paper tape formats, plenty of electronic formats).

Actually self-describing sounds mathematically self-cancelling.

A "Rosetta Stone" is one method; another is plain old
sleuthing:

Assuming you have some bit-serial string (and the wherewithal to
know that), you would likely need some a priori knowledge. You
suspect it's not noise because its bits aren't distributed the way
random noise would be (establishing that is a project in itself).

* Do you know it to be an encoding of a written human language?
* Can you estimate its age?
* Do surroundings imply country/human language of origin? (labels
   on floppy, tape, etc)
* etc?
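One crude way to put a number on "doesn't distribute like noise" is Shannon entropy over byte values. This is an illustration under my own assumptions, not a recipe: `shannon_entropy`, `looks_like_noise` and the 7.5 bits/byte cutoff are made-up names and an arbitrary threshold. Uniformly random bytes come out close to 8 bits per byte; plain text sits far lower. Compressed or encrypted data also scores near 8, so a high value only means "can't tell it from noise", not "is noise".

```python
import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of `data` in bits per byte, between 0.0 and 8.0."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)  # byte value -> number of occurrences
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def looks_like_noise(data: bytes, threshold: float = 7.5) -> bool:
    """Hypothetical heuristic: flag data whose byte entropy is close to the
    8-bit maximum that uniformly random bytes would reach."""
    return shannon_entropy(data) >= threshold
```

For example, `shannon_entropy(b"ETAOIN SHRDLU " * 50)` is under 4 bits per byte because only 13 distinct byte values occur, while a large `os.urandom` buffer lands just below 8.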

Assuming that you have some grokkable machine abstraction (e.g. a
disk-file-like thing): is the number of symbols an integer
multiple of some once-popular word or character size?
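To make the "integer multiple" question concrete: given a count of bits, list which historically common widths divide it with nothing left over. The width list is my own assumption and far from exhaustive: 5 (Baudot/ITA2 teleprinter code), 6 (six-bit character codes), 7 (ASCII), 8 (bytes), and 9, 12, 16, 18 and 36 (assorted byte and word sizes, e.g. the 12-bit PDP-8 and the 36-bit PDP-10). `candidate_widths` is a hypothetical name.

```python
# Assumed, non-exhaustive list of once-popular character/word widths in bits.
COMMON_WIDTHS = (5, 6, 7, 8, 9, 12, 16, 18, 36)

def candidate_widths(bit_count: int, widths=COMMON_WIDTHS) -> list[int]:
    """Return the widths that divide bit_count exactly, i.e. the chunk sizes
    that would leave no partial symbol at the end."""
    return [w for w in widths if bit_count % w == 0]
```

This only narrows things down: 35 bits gives `[5, 7]`, but 72 bits is divisible by six of the nine widths, and padding or a truncated tail defeats the test entirely.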

Try slicing it into N-bit chunks (I just made about a dozen
assumptions there). If you suspect it is American and from, say,
1970 to 2020, it is likely English. A simple, well-known
letter-frequency check will tell you.
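The slicing step plus one well-known letter-distribution check might look like the sketch below. I've swapped in the index of coincidence (the chance that two symbols picked at random are identical) because renaming the symbols doesn't change it, so it can be computed before you know which code is which letter. Most-significant-bit-first ordering is an assumption (one of that dozen), and the function names are made up. For reference, letters-only English text gives roughly 0.066, against about 0.038 for uniformly random letters; uniformly random n-bit chunks give about 1/2**n.

```python
from collections import Counter

def bits_to_chunks(bits: str, n: int) -> list[int]:
    """Slice a string of '0'/'1' characters into n-bit integers, most
    significant bit first (an assumption), dropping any trailing partial chunk."""
    return [int(bits[i:i + n], 2) for i in range(0, len(bits) - n + 1, n)]

def index_of_coincidence(symbols: list[int]) -> float:
    """Probability that two symbols drawn without replacement are equal.
    Independent of what the symbols mean, so no code table is needed."""
    n = len(symbols)
    if n < 2:
        return 0.0
    counts = Counter(symbols)
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))

def survey(bits: str, widths=(5, 6, 7, 8)) -> dict[int, float]:
    """Hypothetical helper: index of coincidence for each chunk width tried."""
    return {n: index_of_coincidence(bits_to_chunks(bits, n)) for n in widths}
```

A width whose chunks score well above 1/2**n is worth a closer look; a width that scores near it is probably slicing across character boundaries, or the data isn't text.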

(If it's a .DOC file or some other encoding, you're screwed,
unless someone has worked out characterizations for that sort of
thing, ad nauseam.)

If, and only if, you get to that point, ASCII (real ASCII) would be
self-decoding, though of course it really isn't: you imported all
this knowledge. Once there, though, standard baby-crypto techniques
will work out the code::letter mappings.
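The most basic of those baby-crypto techniques is frequency matching against a simple substitution: rank the codes by how often they occur and pair them with characters in expected English frequency order. Treat the result as a first guess to be corrected by hand, since short texts rarely follow the table exactly. Putting space first assumes running text with word spaces, and "ETAOINSHRDLCUMWFGYPBVKJXQZ" is one commonly quoted ranking that varies a little between corpora; the function names are hypothetical.

```python
from collections import Counter

# Assumed target order: space first (running text with word breaks), then a
# commonly quoted English letter-frequency order.
ENGLISH_ORDER = " ETAOINSHRDLCUMWFGYPBVKJXQZ"

def first_guess_mapping(codes: list[int], order: str = ENGLISH_ORDER) -> dict[int, str]:
    """Pair the most frequent code with the first character of `order`, the
    second most frequent with the second, and so on. Codes beyond the length
    of `order` are left unmapped."""
    ranked = [code for code, _ in Counter(codes).most_common()]
    return dict(zip(ranked, order))

def apply_mapping(codes: list[int], mapping: dict[int, str]) -> str:
    """Decode with the guessed mapping; unmapped codes come out as '?'."""
    return "".join(mapping.get(c, "?") for c in codes)
```

From there the usual routine is to look for near-words in the output and swap pairs until they become real words.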

> There must still be many, many paper-based descriptions of
> ASCII format lying around today. But how many will exist
> in a short period of time: say 500 years? Even now, I bet
> there are more descriptions of ASCII in ASCII than there
> are in EBCDIC!

Printing on paper or microfiche is still really the only
"universal" way. All digital media are ephemeral except paper tape
and punched cards, and since their holes can be read by eye, those
count as non-digital media in the common sense.
Received on Fri Feb 04 2005 - 14:51:46 GMT
