Defining Disk Image Dump Standard (ACHTUNG very long!)

From: Hans Franke <Hans.Franke_at_mch20.sbs.de>
Date: Mon Jun 5 13:55:36 2000

I've been off for the holidays, so now some comments.

I'd still favour an XML-defined markup for this. Someone (John ?)
said cutting edge .. why ? XML itself is nothing; it's only a set
of rules to describe markup languages. In fact, XML is just a
simplified (and bastardised) SGML, possibly to promote the idea.

Nobody has to know anything about XML (although it helps) to use
one of these languages. Just as one can write well-formed HTML without
knowing about XML, a disk image markup language can be used without
XML - but defining it in XML will give a huge advantage in tool
usage. There are plenty of libraries and tool collections to handle
markup languages with an XML-based definition. These tools are there
to help development, not to restrict it.

There is no need to build an XML parser for an Apple ][ or a TRS
(or a CoCo - Hi Bill :), unless someone likes to. The target systems
don't need any knowledge about XML either - they need to decipher
the markup language. Read: they have to know that <TRACK LFD=0>
identifies the start of the description of track zero of said
disk. Of course the usage of a parser framework may reduce
programming, but there's no _must_ to do so. The same is true
for a system writing an image file - it doesn't need to know
XML - if the program produces well-formed output (and that's
a basic must for every format) all will work fine.
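To illustrate that no XML machinery is required, here is a minimal sketch in Python - only the <TRACK LFD=0> tag comes from the example above; the rest is hypothetical - that pulls track data out of such a markup stream with plain string handling:

```python
# Minimal sketch: read a hypothetical disk-image markup with plain
# string handling, no XML parser. Only <TRACK LFD=n>...</TRACK> is
# taken from the example above; the payload format is made up.
def read_tracks(lines):
    tracks = {}
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith("<TRACK LFD="):
            # e.g. "<TRACK LFD=0>" -> track number 0
            current = int(line[len("<TRACK LFD="):].rstrip(">"))
            tracks[current] = []
        elif line == "</TRACK>":
            current = None
        elif current is not None:
            tracks[current].append(line)
    return tracks

image = [
    "<TRACK LFD=0>",
    "A5A5A5A5",
    "</TRACK>",
    "<TRACK LFD=1>",
    "DEADBEEF",
    "</TRACK>",
]
print(read_tracks(image))  # {0: ['A5A5A5A5'], 1: ['DEADBEEF']}
```

The same dozen lines of string matching could be written in BASIC or assembly on the target machine itself; that is the whole point.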

XML changes nothing at first - like any other general standard.
The benefit is in the long term, when people are able to build
advanced tools based on this standard without knowing about
every single specific usage. My favourite example is still
the ordinary screw and bolt: standardizing this simple thing
has allowed toolmakers to produce a wide variety of tools handling
them, from simple manual wrenches over pneumatic or electric ones
to 100% robotic devices ... and these tools work in (almost)
every situation ... we don't need a special wrench for Atari
screws or Commodore ones ... Of course I prefer the metric screw
system, but when looking closer it could be different, maybe
better in some aspects for different situations ... but that's
no real issue; the fact of having one system offers more
benefit than anything else. BTW: Did I already mention that
I _personally_ think XML is a poor design ?

And if I sound like an old XML freak, I'm not; until some
8 months ago I ignored XML (and I even fought it). Then
I had to define some markup and tried XML ... no big deal
(except the really crappy standard document) - and an instant
gain through independent structure checking and problem reporting,
without writing my own tools to check everything.

Well, back to our topic:
Pete, I really agree with your idea about a sensible editor, but
we are living in the real world, where real software is to be used.
And since this is supposed to be an open standard, a sensible
editor can't be assumed... Even if we tried, I doubt that
such a thing is available on every obscure home computer system.
Even the chances for a simple text editor can be bad. So including
binary as the default is a bad idea - I would even go further and
restrict all markup-specific parts to using only the characters
A-Z, 0-9 and some well-defined (read: only the absolutely necessary
minimum of) characters. Remember, not every computer system offers
lower case, or ASCII at all (this is also the reason why my
definition included only upper-case tags, while XML encourages
lower case).

I can't stress this fact enough: don't assume that binary
data transfer is possible between two systems - as soon as
you do, you will exclude possible usages. A format to be chosen
should at least allow transporting the net data in non-binary
form - a better way would be to allow different encodings, so that
_neutral_ converters may change the data representation without
interpreting the content. Example: to save disk space, data
chunks (sectors) are stored in binary - but since the encoding
is just another parameter of the data tag, it may be converted
to a base64 encoding _without_ interpreting the content beyond
converting the encoding. This may be done by a third-party tool
(remember the advantages of standards for tool variety?).
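A sketch of such a neutral conversion in Python - the SECTOR tag and ENCODING attribute are invented names for illustration, not part of any proposed standard:

```python
import base64

# Neutral converter sketch: re-encode a sector's raw bytes as base64
# without interpreting the data itself. SECTOR/ENCODING are made-up
# names; only the encoding changes, never the content.
raw = bytes([0xA5, 0x5A, 0xFF, 0x00])           # binary sector contents
text = base64.b64encode(raw).decode("ascii")    # 7-bit-safe representation

print('<SECTOR ENCODING="BASE64">%s</SECTOR>' % text)

# decoding restores the exact bytes - the content was never touched
assert base64.b64decode(text) == raw
```

A converter in the other direction (base64 back to binary) is equally content-blind, which is what lets third-party tools do the job.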

After all, we are talking about putting _yesterday's_ data
on _today's_ storage (and not the other way around).

Let's just assume we would need three times - oh well, let's
say four times the space to encode, so an Apple disk will
need a whopping 600 KB ... Boy, we are talking about 2000s
storage; that's the year when 40 GB drives dropped below
250 USD including taxes. That's the equivalent of 10 GB of
markup-coded data, or about 70,000 Apple disks (143 KB each).
Somehow I doubt that anyone will ever collect that amount.

And even Windows is now able to compress data on the fly.
This may reduce it close to the original size, maybe even
less (in our example we would now store something like a quarter
million disks on one 250-dollar IDE drive...).
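The back-of-the-envelope numbers above, checked in a few lines of Python (assuming 1 GB = 1,000,000 KB and the 4x markup overhead from the example):

```python
# Rough check of the storage arithmetic above (all sizes in KB).
disk_kb = 143                         # one Apple II floppy
encoded_kb = disk_kb * 4              # assumed 4x markup overhead, ~600 KB
drive_kb = 40 * 1_000_000             # a year-2000 40 GB drive

markup_disks = drive_kb // encoded_kb     # disks per drive, markup-encoded
compressed_disks = drive_kb // disk_kb    # if compression gets back to ~1x

print(encoded_kb, markup_disks, compressed_disks)
# 572 69930 279720 -> "about 600 KB", "about 70,000 disks",
# and "a quarter million disks" with on-the-fly compression
```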

As far as I see this call, it's to define a format for more
than just one specific system or format - so reducing it to a
hard-coded thing with just a few numbers would turn away most
possible usage. Even further, restricting it to floppy-disk-like
structures would render it unusable. Of course I may describe a CBM
disk written on a 4040 drive - but what about the same data
stored on tape drives ? Also, looking at Sellam's definition,
even some floppy structures are excluded - where is variable
speed, where are possible tracks of more than 64 KB, and how
does one encode spiral tracks (no, I'm not talking about Apple copy
protection schemes, but rather floppy drives writing only one
big track, like on a CD) ? In fact, media like these micro
drives (as used by some Sharp machines) are in more danger than
some Apple disks ... you can still get enough drives and disks to
replicate those. The same is true for other once-common media
which fall into similar categories. Just remember the Sinclair
Microdrives - maybe never common in the US, but for sure on
the Island and within Europe.

Defining only a standard for FD data means closing one's eyes
to all the fast-fading data storage history. One may define
one or maybe two standards, but soon you'll lose - and the
forgotten media will lose. If there's some effort to invest,
it should be spent on a standard to cover as many media as
possible, and it should be extensible to add missing media on
the fly.

A format to be chosen should be capable of the following 10
things (in no special order):

1. Be able to handle all kinds of stuff from card storage and
   paper tapes, over real tapes and audio cassettes, spiral
   floppies and micro drives, to FDs, HDs and CDs... and whatever
   is coming (although I believe that the number of new
   concepts is shrinking).

2. Define an abstract view (like tracks and sectors).

3. Allow the addition of physical descriptions when needed.

4. (Possibly) Allow the definition of hardware-dependent structures
   (although this now crosses into the path of emulator definitions).

5. Be expandable in several ways (including content encoding).

6. Be transportable between machines, codes and OSes.

7. Allow robust encoding.

8. Be able to be integrated into other definitions
   (like being integrated into a storage situation description).

9. Be simple to encode / simple to decode.

10. Allow the encoding of multiple media within a single transfer
    unit (aka file).

(Especially numbers 8 and 10 are important when going to do more
than just storing a disk image.)

I still think an XML-based markup language is a good choice.
I would suggest a multiple level design:

- Level one is a language to describe the logical
  content of a media.
- Level two would define 'physical' descriptions, like disk
  encoding etc.
- Level three defines a storage landscape

In the end these 3 levels should interoperate.
For example: one would encode the logical content of a disk
using the Level 1 tags. A top-level tag (like 'MEDIA') may
include a reference to a Level 2 description telling that
the disk is MFM encoded, but track 0 side 0 is FM. This will
be good for 99.99% of all encoded data disks - but for special
situations, like a copy protection scheme where track 17 is
also FM, the relevant tags will carry override information
to tell the difference. Level 3 may be used to define a
situation where a computer has 4 disk drives and tell
which media (disks) are mounted in which drive. Or to define
a set of media belonging to one situation (like putting
all 23 OS/2 2.0 disks into one image).
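The interplay might look something like this - every tag and attribute name here is invented for illustration only, and the unquoted upper-case attributes follow the restricted character set argued for above rather than real XML syntax:

```xml
<!-- Level 3: a storage situation - which disks sit in which drives -->
<SITUATION>
  <DRIVE NR=0 MEDIA=BOOTDISK/>
  <DRIVE NR=1 MEDIA=DATADISK/>
</SITUATION>

<!-- Level 2: a physical description, referenced by name -->
<PHYSICAL NAME=STDMFM DEFAULT=MFM>
  <OVERRIDE TRACK=0 SIDE=0 CODE=FM/>
</PHYSICAL>

<!-- Level 1: logical content, pointing at its physical description -->
<MEDIA NAME=BOOTDISK PHYSICAL=STDMFM>
  <TRACK LFD=0> ... </TRACK>
  <TRACK LFD=17 CODE=FM> ... </TRACK>  <!-- per-tag override -->
</MEDIA>
```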

Anyway, I'm getting tired ...

Regards
H.

P.S.: still nobody to decode my disk example ?


--
VCF Europa 2.0 on 28/29 April 2001 in Muenchen
http://www.vintage.org/vcfe
http://www.homecomputer.de/vcfe
Received on Mon Jun 05 2000 - 13:55:36 BST

This archive was generated by hypermail 2.3.0 : Fri Oct 10 2014 - 23:33:00 BST