Charset/CCDD (was: Let's develop an open-source media archive

From: Sean 'Captain Napalm' Conner <>
Date: Thu Aug 12 15:10:53 2004

It was thus said that the Great Hans Franke once stated:
> Rather than restricting the encoding of the XML file to a
> specific charset, we need to restrict the USAGE within the
> standard to certain characters, regardless of the encoding.

  Unless otherwise noted, XML files are assumed to be encoded in UTF-8 (or
UTF-16, if the file starts with a byte order mark), *but* an XML parser is
required to abort at the first error in the XML file. If a parser is
reading an XML file without an explicit encoding declaration (which means
it's assuming UTF-8) and it reads a byte sequence that is illegal in UTF-8
(say the file was actually encoded in ISO-8859-3) it gives up (usually with
an "illegal character at such-n-such position" error).
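
  To make that failure concrete, here is a minimal sketch in Python using
the standard library's xml.etree.ElementTree. The sample element and the
helper name try_parse are made up for illustration; the point is only that
the same ISO-8859-3 bytes are rejected without a declaration and accepted
with one.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample text; U+0109 (c with circumflex) is byte 0xE6 in
# ISO-8859-3, which is not a valid UTF-8 sequence on its own.
BODY = "<title>e\u0109o</title>".encode("iso-8859-3")
DECL = b'<?xml version="1.0" encoding="ISO-8859-3"?>'


def try_parse(data):
    """Return the root element's text, or None if the parser gives up."""
    try:
        return ET.fromstring(data).text
    except ET.ParseError:
        return None


# No declaration: the parser assumes UTF-8 and aborts on the stray byte.
undeclared = try_parse(BODY)

# Same bytes with an explicit declaration: the parser decodes them.
declared = try_parse(DECL + BODY)
```

Here undeclared ends up as None and declared as the original three-character
string, which is the whole argument for making the declaration mandatory.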

  Right now, this is a real problem with XML deployment (it gets even
weirder when XML files are transported via HTTP but I'm getting ahead of
myself), so when I suggested that (if we are using XML) each file *must*
start with:

        <?xml version="1.0" encoding="US-ASCII"?>

it was a way of self-defense. Perhaps it can be relaxed somewhat to require:

        <?xml version="1.0" encoding="some XML defined character encoding scheme"?>

and if the encoding scheme isn't defined, it's an error and further
processing of the archive should stop.
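
  A rough sketch of that check, again in Python. The function name
declared_encoding and the regular expression are my own invention, not part
of any proposed standard, and the sketch assumes the declaration itself is
written in an ASCII-compatible encoding. "Defined" is approximated here by
"Python has a codec for that name", which is an assumption too.

```python
import codecs
import re

# Matches an XML declaration that carries an explicit encoding attribute.
_DECL = re.compile(
    rb'^<\?xml\s+version\s*=\s*["\']1\.[0-9]+["\']'
    rb'\s+encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(data: bytes) -> str:
    """Return the declared encoding name, or raise ValueError.

    Raises if the file does not start with an XML declaration naming an
    encoding, or if the named encoding is unknown to Python's codecs.
    """
    match = _DECL.match(data)
    if match is None:
        raise ValueError("missing XML declaration with explicit encoding")
    name = match.group(1).decode("ascii")
    try:
        codecs.lookup(name)
    except LookupError:
        raise ValueError(f"unknown encoding {name!r}") from None
    return name
```

An archive reader would call this on the first bytes of the file and stop
processing on ValueError, before handing anything to the real XML parser.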

> I suggest to restrict the characters used in tags, attribute
> names and attributes to 'A-Z' (uppercase), '0-9' and '-'.

  Unfortunately, XML is case sensitive, so uppercase tags are legal but are
not the same names as their lowercase counterparts. I do know that all XML
I've seen is with lowercase tags, and it's pretty much a standard.
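
  On the case question, a quick check with Python's xml.etree.ElementTree
(the helper name well_formed is made up for this sketch) suggests that
uppercase names built from 'A-Z', '0-9' and '-' parse fine, while mixing
cases between the open and close tag does not:

```python
import xml.etree.ElementTree as ET


def well_formed(text):
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Uppercase letters, digits and '-' are allowed in element names
# (though a name may not start with a digit or '-').
upper_ok = well_formed("<TITLE-1>x</TITLE-1>")

# Names are compared case sensitively, so this close tag does not match.
mixed_ok = well_formed("<TITLE>x</title>")
```

So the restriction is workable in principle; it just runs against the
lowercase convention.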

  -spc (hmmm ... )
Received on Thu Aug 12 2004 - 15:10:53 BST