Hack 74. Get a Taste of E4X Scripting

Learn the future of XML scripting techniques.

The JavaScript language is officially named ECMAScript. E4X (ECMAScript for XML) is the new ECMA-357 standard (at http://www.ecma-international.org) that extends Edition 3 of ECMAScript. It adds a drop of extra syntax that makes it easy to manipulate XML. This hack presents a brief tour of these features.

E4X is new, and it isn't available in Firefox 1.0. It is being implemented rapidly, though, and is likely to be present in the 1.1 release or thereabouts. It's fun to play with, and it's the future of XML scripting.

6.18.1. Where E4X Fits In

JavaScript code is just one way of manipulating the content of an XML document in the browser. To use it, you must embed a script in the document or attach one to it. That remains the case for E4X.

Once JavaScript is running inside a web document, there are four established ways to interact with that document's content. To briefly review, you can:

Use widely supported but nonstandard DOM 0 JavaScript host objects, such as document.images[3]. Strengths: simple. Weaknesses: limited to HTML.
Use W3C DOM 1, 2, or 3 interfaces, usually starting with document.getElementById() or Microsoft's nonstandard document.all collection. Strengths: generic to all XML. Weaknesses: verbose.
Use nonstandard .innerHTML features to turn a string containing XML into DOM content, or more rarely use the DOM 3 Load & Save interfaces. Strengths: simple. Weaknesses: nonstandard.
Use XPath query patterns [Hack #63], perhaps inside XSLT [Hack #64]. Strengths: powerful. Weaknesses: limited coding features.

E4X uses syntax as simple as the first method to achieve all of the most common uses of the other three. It consists of:

A starting point integrated with the familiar (to many) JavaScript environment
New, simple syntax that makes HTML and XML element access easy
Four new native JavaScript objects (XML, XMLList, Namespace, and QName), if you happen to like objects
A near-invisible connector to the non-E4X DOM objects in the XML document

6.18.2. Setting Up a Playpen for E4X

You can play with E4X in any old web page, with one restriction. If your script wants to take advantage of E4X, then this won't work for you:

<script type="text/javascript" src="test.js"></script>

This is the new way forward:

<script type="text/javascript;e4x=1" src="test.js"></script>

For this simple exploration, it doesn't matter much what your test page looks like. Since E4X is intended for XML first and foremost, we'll do the right thing and use XHTML 1.0 instead of HTML 4.01. Here's a simple test document:

<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
     "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <script type="text/javascript;e4x=1" src="test.js" />
  </head>
  <body>
    <p>Test me!</p>
    <p id="tag-with-no-content" />
  </body>
</html>

Notice the special value for the type attribute. Suppose we also have both the document and a specific tag available in two convenient variables. The test.js script might set them up like this:

// Run this from a load handler: while <head> is being parsed, the <p> doesn't exist yet.
var doc = window.document;
var tag = doc.getElementById("tag-with-no-content");

So far, this is all plain JavaScript 1.5, or ECMAScript Edition 3. Remember that you need an E4X-enabled version of Firefox; check the release notes for 1.1 and later for details. At worst, you can compile SpiderMonkey's standalone test shell, called simply js. But that's another story.

6.18.3. Experiment with E4X Features

So let's play. We'll start by adding two XML tags the old ways, using non-XHTML tags just for fun. First, the dirty nonstandard trick:

tag.innerHTML = '<list><item type="round">ball</item></list>';

Note how two types of quotes are required, but at least it's short. Next, let's use existing DOM standards:

var text = doc.createTextNode("ball");   // a text node can't be appended to the document itself
var list = doc.createElement("list");
var item = doc.createElement("item");
    item.setAttribute("type", "round");
    item.appendChild(text);
    list.appendChild(item);
tag.appendChild(list);

That's quite verbose, but at least it's portable. Now, use E4X standard object syntax:

// 'tag' is an E4X object as well as a DOM one.
tag.list.item = "ball";        // add the two tags, and innermost content.
tag.list.item.@type = "round"; // add an attribute, and give it a value.

The left-hand side uses E4X's quick syntax for stepping down through a tag (element) hierarchy. That's convenient by itself. Even better, if the tags don't exist, they're automatically created as they're referenced (because the tag object is an XML object). Best of all, the right-hand side automatically becomes the content of the element named on the left-hand side. Very simple.
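To make "created as they're referenced" concrete, here is a plain-JavaScript sketch of the same effect. It is not E4X and says nothing about how E4X is implemented; it assumes a hypothetical in-memory model where each element is a `{ name, attributes, children }` object and text nodes are plain strings. The helper names `element`, `findChild`, and `setPath` are likewise hypothetical.

```javascript
// Hypothetical element model (an assumption, not E4X's internals):
// { name, attributes, children }, with plain strings as text nodes.
function element(name) {
  return { name: name, attributes: {}, children: [] };
}

// Return the first child element with the given name, or null.
function findChild(node, name) {
  for (var i = 0; i < node.children.length; i++) {
    var c = node.children[i];
    if (typeof c !== "string" && c.name === name) return c;
  }
  return null;
}

// Step down through `names`, creating any missing element on the way,
// then make `text` the innermost element's only content.
function setPath(root, names, text) {
  var node = root;
  for (var i = 0; i < names.length; i++) {
    var next = findChild(node, names[i]);
    if (next === null) {
      next = element(names[i]);
      node.children.push(next);
    }
    node = next;
  }
  node.children = [String(text)];
  return node;
}

var tag = element("p");
var item = setPath(tag, ["list", "item"], "ball"); // like tag.list.item = "ball"
item.attributes.type = "round";                    // like tag.list.item.@type = "round"
```

Calling setPath a second time with the same names reuses the existing <list> and <item> rather than adding new ones; it only replaces the innermost content.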

E4X also provides an alternative, XML-based syntax. Here's the same addition as the two tag.list.item assignments, written that way:

var list = <list>
             <item type="round">ball</item>
           </list>;
tag += list;

In this example, the XML is literally and lexically part of the JavaScript syntax. No quotes or translation functions are required; it's all automatic. Note how the XML isn't trapped inside a string. The second statement relies on special, convenient E4X semantics for the += operator. It's equivalent to this DOM 1 code:

tag.appendChild(list);

Suppose the data added isn't all static. E4X XML content can also be constructed using variables or expressions, as shown in this final alternative:

var shape = "round";
var type = 1;
var thing = "item";
var list = <list>
             <{thing} type={shape}>
               { (type == 1) ? "ball" : "stick" }
             </{thing}>
           </list>;
tag += list;

Everywhere you see braces (curly brackets), you can put a JavaScript expression. Doing the same thing with traditional notations, even .innerHTML, used to require complicated string concatenation. Not anymore.
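For comparison, here is roughly what the string-concatenation approach looks like in plain JavaScript, building the same markup from the same three variables as the E4X literal. The helper name `buildListMarkup` is hypothetical, and the resulting string would still have to be handed to something like .innerHTML.

```javascript
// Hypothetical helper: builds the same markup as the E4X literal by
// concatenating strings by hand. It does no escaping, so values
// containing &, < or " would produce broken markup.
function buildListMarkup(thing, shape, type) {
  return '<list>' +
           '<' + thing + ' type="' + shape + '">' +
             ((type == 1) ? 'ball' : 'stick') +
           '</' + thing + '>' +
         '</list>';
}

var markup = buildListMarkup("item", "round", 1);
// markup is now '<list><item type="round">ball</item></list>'
```

Every quote, angle bracket, and closing tag has to be managed by hand here, which is exactly the bookkeeping the brace syntax removes.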

You can also query the XML content very simply using E4X, with no XPath required. This line returns any and all tags named <item> within the tag hierarchy held by list, no matter how deeply nested they are:

var items = list..item;

A set of objects is returned if there's more than one match. For our sample data, there's only one match. This syntax is equivalent to the use of // in XPath.
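The descendant search itself is easy to picture. Here is a plain-JavaScript sketch of what list..item computes, assuming a hypothetical in-memory tree where each element is a `{ name, attributes, children }` object and text nodes are plain strings; this is an illustration of the result, not how E4X stores or searches XML. The function name `descendants` is also hypothetical.

```javascript
// Collect every element named `name` below `node`, at any depth, in
// document order. `node` itself is not included, just as list..item
// looks only beneath <list>.
function descendants(node, name) {
  var found = [];
  (node.children || []).forEach(function (child) {
    if (typeof child === "string") return;          // skip text nodes
    if (child.name === name) found.push(child);     // match at this level
    found = found.concat(descendants(child, name)); // then search deeper
  });
  return found;
}

// Hypothetical plain-object version of the sample data.
var list = {
  name: "list",
  attributes: {},
  children: [
    { name: "item", attributes: { type: "round" }, children: ["ball"] }
  ]
};

var items = descendants(list, "item"); // one match, as with list..item
```

Because each match is recorded before its own subtree is searched, the results come back in the same order a // query in XPath would report them.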

This next line returns all immediate child tags of the <list> tag, which for our sample data happens to be the same result as list..item:

var items = list.*;

This final line returns all the attributes of the <item> tag as a list:

var atts = list.item.@*;
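The two wildcards can be sketched the same way. This plain-JavaScript version uses the same kind of hypothetical `{ name, attributes, children }` tree (an assumption for illustration only): `childElements` returns the immediate child elements of a node, as the text describes for list.*, and `attributeList` returns an element's attributes as name/value pairs, roughly what list.item.@* yields. Both helper names are hypothetical.

```javascript
// Hypothetical plain-object version of the sample data.
var list = {
  name: "list",
  attributes: {},
  children: [
    { name: "item", attributes: { type: "round" }, children: ["ball"] }
  ]
};

// Immediate child elements of `node` (no deeper descendants).
function childElements(node) {
  return node.children.filter(function (c) { return typeof c !== "string"; });
}

// Every attribute of one element as a [name, value] pair.
function attributeList(node) {
  return Object.keys(node.attributes).map(function (k) {
    return [k, node.attributes[k]];
  });
}

var kids = childElements(list);    // the single <item>
var atts = attributeList(kids[0]); // [["type", "round"]]
```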

E4X supports XML namespaces and a few other goodies as well. Unlike the ECMAScript standard, the E4X standard is easy to read. Download a copy today from http://www.ecma-international.org/.