XML is xhite
xhite is pronounced zh-ite.
We wrote up this page after we realised we had a common theme on several parts of our site all related to issues with XML. We don't really hate XML. We just think it is over-rated and not as simple to use as advertised. Sorry to all of you who worship at the altar of the great, infallible god XML.
In principle, XML is a brilliantly simple idea. A straight text file with elements defined by tags.
<?xml version="1.0"?>
<document>
  <paragraph>
    Hello world.
  </paragraph>
</document>
From our page on XML Validation.
An XML Schema provides a means for defining the structure, content and semantics of XML documents. [XML Schema, W3C].
In the bad old days, the structure of an XML document was defined by a Document Type Definition (DTD). These definitions had a simple syntax and were easy to understand but, sadly, were not written in XML and were therefore an evil thing. So now we have to use the XML Schema (XSD), which has the advantage of adding the notion of type - 19 primitives and three type constructors at the last count - but at the expense of extreme verbosity and complexity.
We agree with Rick Jelliffe's comparison between DTD and XSD:
A DTD is a terse thing with a simple macro mechanism that makes people yearn for an XML syntax and more powerful abstractions. It can only handle simple structures.
An XSD is a verbose thing with multiple abstractions that makes people yearn for a terser syntax and less to learn. It still can only handle simple structures.
A simple DTD expression
<!ELEMENT ROOT (A?,B+,C*)>
becomes, in XSD (Ref: A Conversion Tool from DTD to XML Schema),
<schema xmlns='http://www.w3.org/2000/10/XMLSchema'
        targetNamespace='http://www.w3.org/namespace/'
        xmlns:t='http://www.w3.org/namespace/'>
  <element name='ROOT'>
    <complexType content="elementOnly">
      <sequence>
        <element ref='t:A' minOccurs='0' maxOccurs='1'/>
        <element ref='t:B' maxOccurs='unbounded'/>
        <element ref='t:C' minOccurs='0' maxOccurs='unbounded'/>
      </sequence>
    </complexType>
  </element>
</schema>
That's a bloat of some 1400% from 26 characters to about 400. As you can see, XML schemas very quickly become unreadable by mere humans because they are overly complex and overly verbose.
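To see how little the DTD line is really saying, note that its content model is essentially a regular expression over the names of the child elements. The sketch below is our own illustration in Python, not part of either specification: the names `ROOT_MODEL` and `matches_root_model` are hypothetical, and the check ignores namespaces, attributes and text content.

```python
import re
import xml.etree.ElementTree as ET

# The DTD content model (A?,B+,C*) rewritten as a regular expression.
# Each child name is followed by a comma so that names cannot run together.
ROOT_MODEL = re.compile(r"(A,)?(B,)+(C,)*")


def matches_root_model(xml_text: str) -> bool:
    """Return True if the root is ROOT and its children fit (A?,B+,C*)."""
    root = ET.fromstring(xml_text)
    if root.tag != "ROOT":
        return False
    child_names = "".join(child.tag + "," for child in root)
    return ROOT_MODEL.fullmatch(child_names) is not None


# One optional A, at least one B, any number of C: accepted.
print(matches_root_model("<ROOT><A/><B/><B/><C/></ROOT>"))
# No B at all: rejected.
print(matches_root_model("<ROOT><A/><C/></ROOT>"))
```

The whole constraint fits in one short pattern, which is roughly the point the DTD syntax was making all along.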
Interestingly, if you go to the W3C schema-hack site referenced above, the examples they give on their page are actually invalid XML. Furthermore, the original document we got this example from (sorry, Joe) had not only copied the invalid XML verbatim but had copied the wrong DTD part and nobody noticed - and this in a doctoral thesis as well. You could not ask for a better way to demonstrate that such a verbose method of describing what should be a simple concept leads to mistakes.
Our hypothesis is that people just switch off when they see an XSD specification. We certainly do.
>>I have some questions related to XML-Dsig:
>
>Argghh!! Run away!

A near-universal reaction.
- from Why XML Security is Broken by Peter Gutmann.
More quotes from Peter Gutmann:
- XML is an inherently unstable and therefore unsignable data format. XML-Dsig attempts to fix this via canonicalisation rules, but they don't really work.
- The use of an "If it isn't XML, it's crap" design approach that leads to the rejection of conventional, proven designs in an attempt to prove that XML was more flexible than existing stuff.
- XML security gives you the flexibility to shoot yourself in the foot in a dozen different ways without even knowing it.
- "Secure XML", the definitive reference on the topic, spends fully half of its 500-odd pages trying to come to grips with XML and its canonicalisation problems, without really ever resolving things. In fact it reads more like a 250-page essay on how not to do things than a solution.
- Since there's only one logical way to structure secured data, it'd be obvious to anyone that all they'd done was reinvent the wheel in XML. To avoid this problem as well, they reinvented the wheel in XML, but made it square to avoid accusations that they'd just reinvented the wheel.
- It's impossible to create something that's simply a security component that you can plug in wherever you need it, because XML security is inseparable from the underlying XML processing system. This breaks the basic principle of modularity, and ensures that XML security toolkits will be created either by XML vendors with little knowledge of security or security vendors with little knowledge of XML, a recipe for disaster.
We agree with all of the above. As members of the latter group named in that final quote (security vendors with little knowledge of XML), we have managed at great effort to make XML-Dsig work in particular circumstances, and we have written up some how-to pages on the subject to help others. We still don't like it.
Meanwhile, here is some assistance from [XML-C14N] to help you do canonicalization:
The XPath 1.0 Recommendation defines the term node-set and specifies a data model for representing an input XML document as a set of nodes of various types (element, attribute, namespace, text, comment, processing instruction, and root). The nodes are included in or excluded from a node-set based on the evaluation of an expression. Within this specification, a node-set is used to directly indicate whether or not each node should be rendered in the canonical form (in this sense, it is used as a formal mathematical set). A node that is excluded from the set is not rendered in the canonical form being generated, even if its parent node is included in the node-set. However, an omitted node may still impact the rendering of its descendants (e.g. by augmenting the namespace context of the descendants).
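For those who would rather see the underlying problem than read about node-sets, here is a minimal sketch of why canonicalisation is needed at all. It assumes Python 3.8 or later, whose standard library provides `xml.etree.ElementTree.canonicalize`; note that this function implements C14N 2.0, not the original Canonical XML specification quoted above, and it is nowhere near a full XML-Dsig implementation. The helper `sha256_hex` and the two sample documents are our own inventions for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET


def sha256_hex(text: str) -> str:
    """Hash the exact bytes of a serialised document."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Two serialisations that any XML parser treats as the same document:
# different attribute order, quote style, spacing and empty-element form.
doc_a = '<doc b="2" a="1"><empty/></doc>'
doc_b = "<doc a='1'   b='2'><empty></empty></doc>"

# A signature is computed over bytes, so the raw forms do not agree.
raw_match = sha256_hex(doc_a) == sha256_hex(doc_b)

# After canonicalisation both documents serialise identically.
canonical_match = sha256_hex(ET.canonicalize(doc_a)) == sha256_hex(ET.canonicalize(doc_b))

print("raw digests match:", raw_match)
print("canonical digests match:", canonical_match)
```

This toy case works. The trouble Gutmann describes starts with namespaces, document subsets and everything else the quoted paragraph is trying to pin down.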
From About this site.
Back in 2004 it looked like XHTML was the way to go. After all, it had an "X" in it, so it must be cool. However, at this time (2010), IOHO it is actually more worthwhile to create pages in Strict HTML 4.01 than in XHTML Transitional. W3C is now looking at HTML 5 and recommends using plain HTML 5 in most cases. XHTML2 is being abandoned.
XHTML is over-rated
For the "X" part of XHTML to work properly you need two things to happen:
- Your web server must serve up XHTML documents as proper XML, not as HTML.
- Your browser must be capable of handling true XHTML documents.
Well, number 1 doesn't happen unless you manage your own server and number 2 never happens at all for any popular browser, so
your XHTML document is almost certainly going to be interpreted as HTML.
Worse, if it actually were interpreted as pure XHTML and there was any error whatsoever, no matter how slight,
you'd just get a big **ERROR** message across your screen in the so-called Yellow Screen Of Death.
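The difference between the two kinds of parsing can be sketched in a few lines of Python using only the standard library. This is an illustration under our own assumptions, not a model of any particular browser: the sample `page` and the helpers `parses_as_xml` and `tags_seen_as_html` are hypothetical names made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# A page with one tiny slip by XML rules: the <br> is never closed.
page = "<html><body><p>Hello<br>world</p></body></html>"


def parses_as_xml(text: str) -> bool:
    """Strict XML parsing: any well-formedness error is fatal."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Lenient HTML parsing: just record the start tags encountered."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def tags_seen_as_html(text: str) -> list:
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return collector.tags


print(parses_as_xml(page))      # the XML parser refuses the whole page
print(tags_seen_as_html(page))  # the HTML parser carries on regardless
```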
Oh, and all those convenient HTML entities are invalid in XHTML.
And older browsers will get upset if you include a
<?xml version="1.0"?> declaration.
So our advice is to concentrate on getting your HTML right and make sure you conform to the latest HTML Strict specification.
And once you've done that, it's a trivial exercise* to convert to XHTML, should that ever become really useful to do.
* Write all your tags in lower case; change <img ...> and the like to <img ... />; and make sure you've closed all your <p> elements, and, er, that's just about it.
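As a rough sketch of those three steps, the Python below re-serialises an HTML fragment with lower-case tags, self-closed empty elements and closed paragraphs. It is our own illustration and makes several assumptions: the names `XhtmlRewriter`, `VOID_ELEMENTS` and `to_xhtml_fragment` are hypothetical, comments and doctypes are dropped, and an open <p> is closed only when the next <p> starts or the input ends, so real pages would need more care.

```python
import html
from html.parser import HTMLParser

# Elements that have no content and no end tag in HTML 4.01.
VOID_ELEMENTS = {"area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param"}


class XhtmlRewriter(HTMLParser):
    """Re-emit parsed HTML using XHTML-style syntax."""

    def __init__(self):
        # Keep entity and character references as written in the source.
        super().__init__(convert_charrefs=False)
        self.parts = []
        self.paragraph_open = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag == "p" and self.paragraph_open:
            self.parts.append("</p>")
        attr_text = "".join(
            ' {}="{}"'.format(name, html.escape(name if value is None else value))
            for name, value in attrs
        )
        if tag in VOID_ELEMENTS:
            self.parts.append("<{}{} />".format(tag, attr_text))
        else:
            self.parts.append("<{}{}>".format(tag, attr_text))
            if tag == "p":
                self.paragraph_open = True

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return  # already written in self-closing form
        if tag == "p":
            self.paragraph_open = False
        self.parts.append("</{}>".format(tag))

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append("&{};".format(name))

    def handle_charref(self, name):
        self.parts.append("&#{};".format(name))


def to_xhtml_fragment(html_text: str) -> str:
    rewriter = XhtmlRewriter()
    rewriter.feed(html_text)
    rewriter.close()
    if rewriter.paragraph_open:
        rewriter.parts.append("</p>")
    return "".join(rewriter.parts)


print(to_xhtml_fragment('<P>One<BR><P>Two <IMG SRC="a.png">'))
```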
- Armstrong, Joe. Making reliable distributed systems in the presence of software errors (PDF), The Royal Institute of Technology Stockholm, Sweden, December 2003.
To comment or flame please send us a message. Yes, we know we are behind the times and not getting with it with the best thing to come out in computer science since sliced bread, but we don't care, because it's not.
This page last updated 2 January 2013