Design Techniques

XSL: Extensible Stylesheet Language

Validating to XHTML using XML

Creating a valid XHTML web page is quite simple now: get the right DOCTYPE declaration, make sure all tags are closed and use CSS for layout (to make it really simple). Macromedia's Dreamweaver MX even does a lot of this for us. But what if you have gone the next step and are using XML and XSL files to create an XHTML page? The XML tutorial talked about how to actually build a site using XML and XSL but did not go into validation in any depth. This tutorial looks briefly at why validation is important and then at how I have achieved it using an XSL stylesheet.

Why Validation is Important

There are plenty of technical explanations for this (as I have found) but, put simply, validation matters to web designers because a valid page should render consistently in all standards-compliant browsers. Designing to standards is a good enough reason by itself, but it also makes our job much easier: we no longer have to guess how a page will appear in a specific browser, or test in every browser and device we want the site to appear on. It's a no-brainer.

I thought I had this sorted out until I tested an XML-based web site, which used CSS for layout and was supposedly validating to XHTML Transitional, on the Mac version of Internet Explorer 5. The site worked well in the PC versions of Mozilla, IE 5 and Opera 6 and in the Linux versions of Mozilla and Konqueror, so I figured it was valid until I saw the Mac view of it...and then ran it through the W3C validator (which I should have done right at the start). My XSL was creating HTML that forced the browsers into "quirks" mode, which some browsers rendered well while others didn't. There were three main issues:

  1. I was telling the XSL to output HTML and not XML. XHTML is designed to bring HTML in line with XML so the browsers view it as XML. My HTML output command was confusing them.
  2. I was giving the browsers two different encoding instructions - UTF-8 (in the XML) and iso-8859-1 (in the HTML). This was really causing some problems.
  3. I was not specifying the namespace in the XHTML which gave it an identity crisis.
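To make the three issues concrete, here is a rough sketch of what such a problem stylesheet might look like. It is a hypothetical reconstruction for illustration, not the original file: the template, the title and the paragraph text are all assumptions.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Issue 1: HTML serialization, so empty elements such as <br/>
       may be written as <br> instead of being closed -->
  <xsl:output method="html"/>

  <xsl:template match="/">
    <!-- Issue 3: no xmlns="http://www.w3.org/1999/xhtml" on the root element -->
    <html>
      <head>
        <!-- Issue 2: this charset disagrees with the UTF-8 source XML -->
        <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"/>
        <title>Example page (placeholder)</title>
      </head>
      <body>
        <p>Example content (placeholder)</p>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```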

The Solution

The solution lies completely in the XSL which makes fixing the problem incredibly easy...one file, one change. Here is the start of the XSL that now creates a valid XHTML page.

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes" omit-xml-declaration="yes" doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN" doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" />

The first line is standard to XSL stylesheets as it simply says "I am an XSL stylesheet" by binding the xsl: prefix to the XSLT namespace (the xmlns:xsl attribute). The second line, the xsl:output, is the key to all of this. It defines a set of rules for how the final page will be written out. Let's have a look at it:

method="xml": tells the XSL processor to write the result as XML rather than HTML, so every tag in the output is closed properly (<br /> rather than <br>) and the browser receives well-formed XHTML. This is the one I had set to HTML originally.

indent="yes": tells the processor to lay the output out with line breaks and indents, which basically makes the source code easier to read and understand. If you set it to 'no' the output is written without the extra wrapping or indents.

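omit-xml-declaration="yes": leaves the <?xml version="1.0" encoding="UTF-8"?> line out of the XHTML output. This matters because some browsers of the time, IE 6 among them, drop into quirks mode when anything comes before the DOCTYPE.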

doctype-public and doctype-system: the two parts of a valid DOCTYPE, the public identifier and the URL of the DTD. This should look fairly familiar.

The next part to this is to tell the XHTML output that it is XHTML. This is done in the <html> tag and looks like this: <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">. Looks nasty but all it's doing is placing the page in the XHTML namespace and declaring English as the language.
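As a rough guide, with the xsl:output settings and <html> tag described above, the page the processor writes out should start with the DOCTYPE (no XML declaration in front of it) followed by the namespaced <html> element. The sketch below uses placeholder title and body text, and the exact line breaks and indentation will vary from one XSL processor to another.

```xml
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
  <head>
    <!-- placeholder content, for illustration only -->
    <title>Example page</title>
  </head>
  <body>
    <p>Example content</p>
  </body>
</html>
```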

All that was required now was to put a meta tag in the <head> that specified the encoding I wanted to use (in this case UTF-8 which matched the XML file encoding) which looks like this:

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

...and that's it. After checking that all the XHTML in the XSL was actually valid I was good to go. Another check in the W3C validator showed no errors and only valid XHTML. As I am using XSLs to define all my web sites now I have a template that will create valid XHTML. As the browsers improve and the specs change the XSL files make updating the sites a realistic job and the valid code should be sound for a few years yet.
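Pulling the pieces together, a minimal complete stylesheet might look something like the sketch below. The source structure (a <page> element containing <title> and <para> children) is hypothetical, invented only for this example, and the explicit encoding="UTF-8" on xsl:output is my addition so that the serialized bytes match the meta tag; it is not part of the snippet shown earlier.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Serialize as XML so every element is closed, and emit the XHTML DOCTYPE.
       encoding="UTF-8" is an assumption added to match the meta tag below. -->
  <xsl:output method="xml" indent="yes" omit-xml-declaration="yes"
      encoding="UTF-8"
      doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>

  <!-- Hypothetical source: <page><title>...</title><para>...</para></page> -->
  <xsl:template match="/page">
    <!-- The xmlns here also applies to every element nested inside <html> -->
    <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
      <head>
        <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
        <title><xsl:value-of select="title"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="title"/></h1>
        <!-- One XHTML paragraph per source <para> -->
        <xsl:for-each select="para">
          <p><xsl:value-of select="."/></p>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

One thing to watch if the markup is split across several templates: the xmlns on <html> only covers elements written inside that same tag in the stylesheet. Elements created in other templates need the XHTML namespace too, for example by declaring it as the default namespace on xsl:stylesheet.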

Update:

The site now uses multiple namespaces to define the news items and content, which has changed the requirements for validation slightly. The new namespaces are referenced in the XML with the prefixes dw: (Docbook-web subset) and rss: (a format for syndicated news which works equally well for my latest updates section).

For validation (and to render) these new namespaces need to be referenced in the XSL:

<xsl:stylesheet version="1.0" exclude-result-prefixes="dw rss" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:dw="http://iopen.net/schema/docbook-web/0.9" xmlns:rss="http://purl.org/rss/1.0/">

The exclude-result-prefixes="dw rss" attribute is important for validation as it stops the dw and rss namespace declarations from being copied onto output tags where they don't belong (eg. <html>). Multiple prefixes can be excluded by separating them with a space. Note that on the xsl:stylesheet element itself the attribute is written without the xsl: prefix; the prefixed form, xsl:exclude-result-prefixes, is only for use on literal result elements such as <html>.
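Here is a small sketch of how the extra namespaces might be used in templates. The element names dw:para and rss:item, and the mapping to <p> and <li>, are assumptions for illustration rather than the site's actual templates. The default XHTML namespace is declared on xsl:stylesheet so that the elements created in each template land in the XHTML namespace, and the exclusion attribute is written without a prefix because it sits on the xsl:stylesheet element.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    exclude-result-prefixes="dw rss"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:dw="http://iopen.net/schema/docbook-web/0.9"
    xmlns:rss="http://purl.org/rss/1.0/"
    xmlns="http://www.w3.org/1999/xhtml">

  <!-- Hypothetical: turn each dw:para into an XHTML paragraph -->
  <xsl:template match="dw:para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Hypothetical: turn each rss:item into a list entry linking to the story -->
  <xsl:template match="rss:item">
    <li><a href="{rss:link}"><xsl:value-of select="rss:title"/></a></li>
  </xsl:template>

</xsl:stylesheet>
```

Without the exclusion, the processor would copy xmlns:dw and xmlns:rss declarations onto the output elements, and the XHTML DTD does not allow those attributes. With it, only the XHTML namespace declaration is carried through to the output.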
