
XML is xhite


xhite is pronounced zh-ite.

We wrote up this page after we realised we had a common theme on several parts of our site all related to issues with XML. We don't really hate XML. We just think it is over-rated and not as simple to use as advertised. Sorry to all of you who worship at the altar of the great, infallible god XML.

In principle, XML is a brilliantly simple idea. A straight text file with elements defined by tags.

<?xml version="1.0"?>
<document>
  <paragraph>
    Hello world.
  </paragraph>
</document>

Ah, but... XSD Schemas   XML-Dsig   XHTML vs HTML

XSD Schemas

From our page on XML Validation.

An XML Schema provides a means for defining the structure, content and semantics of XML documents. [XML Schema, W3C].

In the bad old days, the structure of an XML document was defined by a Document Type Definition (DTD). These definitions had a simple syntax and were easy to understand but, sadly, were not in XML and therefore an evil thing. So now we have to use the XML Schema (XSD) which has the advantage of adding the notion of type - 19 primitives and three type constructors at the last count - but at the expense of extreme verbosity and complexity.

We agree with Rick Jelliffe's comparison between DTD and XSD:

A DTD is a terse thing with a simple macro mechanism that makes people yearn for an XML syntax and more powerful abstractions. It can only handle simple structures.

An XSD is a verbose thing with multiple abstractions that makes people yearn for a terser syntax and less to learn. It still can only handle simple structures.

A simple DTD expression

<!ELEMENT ROOT (A?,B+,C*)>

becomes, in XSD (Ref: A Conversion Tool from DTD to XML Schema),

<schema
  xmlns='http://www.w3.org/2000/10/XMLSchema'
  targetNamespace='http://www.w3.org/namespace/'
  xmlns:t='http://www.w3.org/namespace/'>

 <element name='ROOT'>
  <complexType content="elementOnly">
   <sequence>
    <element ref='t:A' minOccurs='0' maxOccurs='1'/>
    <element ref='t:B' maxOccurs='unbounded'/>
    <element ref='t:C' minOccurs='0' maxOccurs='unbounded'/>
   </sequence>
  </complexType>
 </element>
</schema>

That's a bloat of some 1400% from 26 characters to about 400. As you can see, XML schemas very quickly become unreadable by mere humans because they are overly complex and overly verbose.
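The arithmetic is easy to check. A minimal sketch in Python, assuming we count every character of the two fragments exactly as printed above, indentation and line breaks included (the variable and function names are ours, and the total shifts a little if you count the whitespace differently):

```python
# Compare the size of the DTD declaration with its XSD equivalent.
dtd = "<!ELEMENT ROOT (A?,B+,C*)>"

xsd = """<schema
  xmlns='http://www.w3.org/2000/10/XMLSchema'
  targetNamespace='http://www.w3.org/namespace/'
  xmlns:t='http://www.w3.org/namespace/'>

 <element name='ROOT'>
  <complexType content="elementOnly">
   <sequence>
    <element ref='t:A' minOccurs='0' maxOccurs='1'/>
    <element ref='t:B' maxOccurs='unbounded'/>
    <element ref='t:C' minOccurs='0' maxOccurs='unbounded'/>
   </sequence>
  </complexType>
 </element>
</schema>"""


def size_ratio(small: str, large: str) -> float:
    """How many times longer `large` is than `small`, in characters."""
    return len(large) / len(small)


print(len(dtd))  # 26
print(len(xsd))
print(f"{size_ratio(dtd, xsd):.1f}x")
```

Whichever way you count the whitespace, the XSD comes out well over ten times the size of the DTD it replaces.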

Interestingly, if you go to the W3C schema-hack site referenced above, the examples they give on their page are actually invalid XML. Furthermore, the original document we got this example from (sorry, Joe) had not only copied the invalid XML verbatim but had copied the wrong DTD part and nobody noticed - and this in a doctoral thesis as well. You could not ask for a better way to demonstrate that such a verbose method of describing what should be a simple concept leads to mistakes.

Our hypothesis is that people just switch off when they see an XSD specification. We certainly do.

XML-Dsig

From our pages "Signing an XML document using XMLDSIG" and "XML-Dsig and the Chile SII".

  >>I have some questions related to XML-Dsig:
  >
  >Argghh!! Run away!

  A near-universal reaction.

- from Why XML Security is Broken by Peter Gutmann.

More quotes from Peter Gutmann:

We agree with all of the above. As a member of the latter set in the final paragraph, we have managed at great effort to make XML-Dsig work in particular circumstances, and we have written up some how-to pages on the subject to help others. We still don't like it.

Meanwhile, here is some assistance from [XML-C14N] to help you do canonicalization:

The XPath 1.0 Recommendation defines the term node-set and specifies a data model for representing an input XML document as a set of nodes of various types (element, attribute, namespace, text, comment, processing instruction, and root). The nodes are included in or excluded from a node-set based on the evaluation of an expression. Within this specification, a node-set is used to directly indicate whether or not each node should be rendered in the canonical form (in this sense, it is used as a formal mathematical set). A node that is excluded from the set is not rendered in the canonical form being generated, even if its parent node is included in the node-set. However, an omitted node may still impact the rendering of its descendants (e.g. by augmenting the namespace context of the descendants).
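In practice the effect of canonicalization is easier to see than to read about. A minimal sketch, assuming Python 3.8 or later: its standard library offers xml.etree.ElementTree.canonicalize(), which implements the later C14N 2.0 rather than the [XML-C14N] version quoted above and knows nothing of XPath node-sets, so treat it as an illustration of the idea only. The two sample documents are our own.

```python
from xml.etree.ElementTree import canonicalize

# Two documents a human would call identical but which differ byte for byte:
# attribute order, quote style, spacing inside the tag, empty-element syntax.
doc_a = "<doc b='1' a=\"2\"/>"
doc_b = '<doc   a="2" b="1"></doc>'

# canonicalize() returns the canonical form as a string when no output
# file is given.
canon_a = canonicalize(doc_a)
canon_b = canonicalize(doc_b)

print(canon_a)
print(canon_a == canon_b)  # True
```

A signature is computed over bytes, not over what the document "means", which is why XML-Dsig has to drag this step along with it.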

XHTML vs HTML

From About this site.

Back in 2004 it looked like XHTML was the way to go. After all, it had an "X" in it, so it must be cool. However, at this time (2010), IOHO it is actually more worthwhile to create pages in Strict HTML 4.01 than in XHTML Transitional. W3C is now looking at HTML 5 and recommends using plain HTML 5 for most purposes. XHTML2 is being abandoned.

XHTML is over-rated

For the "X" part of XHTML to work properly you need two things to happen:

  1. Your web server must serve up XHTML documents as proper XML (application/xhtml+xml), not text/html.
  2. Your browser must be capable of handling true XHTML documents.

Well, number 1 doesn't happen unless you manage your own server and number 2 never happens at all for any popular browser, so your XHTML document is almost certainly going to be interpreted as HTML. Worse, if it actually were interpreted as pure XHTML and there was any error whatsoever, no matter how slight, you'd just get a big **ERROR** message across your screen in the so-called Yellow Screen Of Death. Oh, and all those convenient HTML entities like &nbsp; are invalid in XHTML. And older browsers will get upset if you include a <?xml version="1.0"?> declaration. So our advice is to concentrate on getting your HTML right and make sure you conform to the latest HTML Strict specification. And once you've done that, it's a trivial exercise* to convert to XHTML, should that ever become really useful to do.

* Write all your tags in lower case; change <br>, <img ...> and the like to <br/>, <img ... />; and make sure you've closed all your <p> elements, and, er, that's just about it.
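To see the strictness for yourself without waiting for a browser to do it, feed the same markup to an XML parser. A minimal sketch in Python, assuming the standard-library parser xml.etree.ElementTree stands in for a browser's XML mode (it does not read the XHTML DTD, so named entities like &nbsp; are simply undefined); the helper is_well_formed and the sample strings are our own.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if an XML parser accepts the markup, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


samples = [
    "<p>Hello<br>world</p>",         # unclosed <br>, fine in HTML
    "<p>Hello&nbsp;world</p>",       # HTML named entity, undefined in plain XML
    "<p>Hello&#160;world<br/></p>",  # numeric reference and a closed <br/>
]

for markup in samples:
    print(is_well_formed(markup), markup)
```

The first two samples are rejected outright and only the third gets through, which is the all-or-nothing behaviour described above.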

We agree entirely with Peter-Paul Koch and his opinion to "X It Off Your List" and with Zytrak's more recent comments in "XHTML vs HTML".

Reference

Contact

To comment or flame please send us a message. Yes, we know we are behind the times and not getting with it with the best thing to come out in computer science since sliced bread, but we don't care, because it's not.

This page last updated 2 January 2013