As a ColdFusion developer, hopefully by now you have heard at least a little about XML (eXtensible Markup Language). Wikipedia defines XML as a "general-purpose markup language" designed to "facilitate the sharing of data" and also designed to be "relatively human-legible."
If you need a primer on the basics of what XML is and the rules of how it's structured, a good place to start is http://en.wikipedia.org/wiki/XML.
Nowadays we see XML in regular daily use. RSS feeds, podcasts, and Web Services are all XML, or use XML to exchange information. Even HTML is a close relative: XHTML is HTML rewritten to follow XML's rules (the full story is a long discussion for another time). It's no coincidence that we see XML in use for mass-distributed or syndicated content - XML is designed with two key goals: 1) to be a format that software can easily interpret, and 2) to contain only the data and its structure, with no formatting.
A complete collection of data as shown in Listing 1 is called an XML Document. XML Documents should be "well-formed" - meaning that they conform to all the rules of XML syntax, such as having a single root element and no unclosed or improperly nested tags. (Note: The "<street2/>" element uses a trailing slash to denote that it is an empty element. This is equivalent to typing out "<street2></street2>.")
XML Documents that are meant to comply with a standard format will have an XML Schema definition or a DTD (Document Type Definition), which defines a tight set of rules as to which content and tags can and can't appear in an XML Document of that type. XML Documents are considered "valid" if they conform to their DTD or schema. Many XML documents used by individual sites have neither, and having one is not required.
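As a quick check of the points above, the following sketch uses ColdFusion's IsXML() function, which reports whether a string is well-formed XML, and XmlParse() to compare the two ways of writing an empty element. The <address> and <street2> fragments are made up for illustration and may not match Listing 1.

```cfml
<!--- Hypothetical fragments for illustration; not taken from Listing 1 --->
<cfset strShortForm = "<address><street2/></address>">
<cfset strLongForm = "<address><street2></street2></address>">
<cfset strBroken = "<address><street2></address>">

<!--- Parse the two well-formed fragments so we can inspect the empty element --->
<cfset xmlShort = XmlParse(strShortForm)>
<cfset xmlLong = XmlParse(strLongForm)>

<cfoutput>
  <!--- Both ways of writing the empty element are well formed --->
  Short form well formed: #IsXML(strShortForm)#<br>
  Long form well formed: #IsXML(strLongForm)#<br>
  <!--- The unclosed street2 tag should make this one fail the check --->
  Broken fragment well formed: #IsXML(strBroken)#<br>
  <!--- Either way, the parsed element holds no text --->
  Text length (short form): #Len(xmlShort.address.street2.XmlText)#<br>
  Text length (long form): #Len(xmlLong.address.street2.XmlText)#
</cfoutput>
```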
So how do ColdFusion programmers use and work with XML? ColdFusion is supposed to empower programmers to build sites and Web applications faster, right? So what tools are available to help us get the job done faster when it comes to XML? That's the primary focus of this article. We'll focus on the building blocks of how ColdFusion works with XML and then show how to use some of the other useful functions.
The built-in functions in CFML that we'll be covering are XmlParse(), XmlNew(), XmlElemNew(), XmlSearch(), and XmlValidate().
Reading XML Documents
Reading and using data from an XML document is the first, and easiest, lesson. The key to coding a project that uses XML is understanding the mechanics of XML. In the past (pre-XML days) if we wanted to pull data from inside text (such as a first name from the data in Listing 1), we'd have needed to search the string and use mid() functions to extract what we want. This is a tedious process. Now with XML we can translate a well-formed XML document into something ColdFusion can understand. We use the XmlParse() function to convert this from structured text into an XML Object, which is basically a collection of structures and arrays.
Take a look at Listing 2 and Listing 3. In this example, we read details of an order in XML format and display the results, formatted with HTML.
On line 1, we read the contents of the XML document into a string named strXmlOrder. In this example we are reading a file from the local file system. However, you could just as easily read this data from a database or a remote server using CFHTTP or a Web Service call.
On line 5 we parse the XML using XmlParse(). This function transforms the XML from its textual form into a ColdFusion XML Object called XmlOrder. Try doing a <cfdump var="#XmlOrder#"> here and take a look at the results. What you'll see is that the XmlOrder object is an organized collection of structures and arrays.
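If you don't have the listings at hand, a minimal sketch of the same parse step might look like the following. The XML here is a hypothetical stand-in built from the element and attribute names described in this article (the customer name, member id, and item data are invented), and it is held in a string rather than read from a file.

```cfml
<!--- Hypothetical stand-in for the order document; the real Listing 2 may differ --->
<cfsavecontent variable="strXmlOrder">
<order id="E10645">
  <customer fname="Jane" lname="Doe" memberid="1001"/>
  <items>
    <item id="A1"><name>Widget</name><quantity>2</quantity><price>9.99</price></item>
    <item id="B7"><name>Gadget</name><quantity>1</quantity><price>24.50</price></item>
    <item id="C3"><name>Gizmo</name><quantity>5</quantity><price>3.25</price></item>
  </items>
</order>
</cfsavecontent>

<!--- Trim() drops the line breaks that cfsavecontent captures around the XML --->
<cfset XmlOrder = XmlParse(Trim(strXmlOrder))>

<!--- Inspect the resulting tree of structures and arrays --->
<cfdump var="#XmlOrder#">
```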
The XML Object is organized as a tree of XML Elements (an element is an opening and closing tag and any of its contents - including other elements) and attributes (values that are part of the opening tag).
Look again at Listing 2 and let's put things into perspective:
<order> is the xmlroot, and also happens to be an XML Element because it has an opening and closing tag. The <order ... > element has one attribute called "id," which has a value of "E10645."
The <customer ...>, <items>, and each of the <item> tags are also XML Elements. The attributes of the <customer ...> element are "fname," "lname," and "memberid," and so on; you get the idea.
Under the xmlroot element, all other "child" elements are organized in an array called XmlChildren. So <customer...> and <items> are both children of <order>, and as such are stored in the XmlChildren array as elements 1 and 2, respectively.
The <items> element has three child elements (the three individual items), and each of those items has three child elements (name, quantity, and price).
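The parent/child layout described above can be explored with ordinary array functions. This is only a sketch: the XML is a shortened, hypothetical order with two items instead of three, and the variable name xnRoot is my own.

```cfml
<!--- Hypothetical, shortened order document for illustration --->
<cfsavecontent variable="strXmlOrder">
<order id="E10645">
  <customer fname="Jane" lname="Doe" memberid="1001"/>
  <items>
    <item id="A1"><name>Widget</name><quantity>2</quantity><price>9.99</price></item>
    <item id="B7"><name>Gadget</name><quantity>1</quantity><price>24.50</price></item>
  </items>
</order>
</cfsavecontent>
<cfset XmlOrder = XmlParse(Trim(strXmlOrder))>
<cfset xnRoot = XmlOrder.xmlroot>

<cfoutput>
  Root element: #xnRoot.XmlName#<br>
  <!--- customer and items are the two children of the root --->
  Children of root: #ArrayLen(xnRoot.XmlChildren)#<br>
  First child: #xnRoot.XmlChildren[1].XmlName#<br>
  Second child: #xnRoot.XmlChildren[2].XmlName#<br>
  <!--- Each item element is a child of items --->
  Number of items: #ArrayLen(xnRoot.XmlChildren[2].XmlChildren)#<br>
  <!--- name, quantity, and price are children of each item --->
  Children of first item: #ArrayLen(xnRoot.XmlChildren[2].XmlChildren[1].XmlChildren)#
</cfoutput>
```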
Now that you have a feel for how this data is organized, let's go back to Listing 3.
On line 11 we display the order id. Since "id" is an attribute of the <order ...> element, it's neatly put in a structure called XmlAttributes. So we can reference it like so:
XmlOrder.order.XmlAttributes.id
Since the <order> element also happens to be the xmlroot, we can reference the same attribute like so:
XmlOrder.xmlroot.XmlAttributes.id
Since XmlAttributes is a standard ColdFusion structure, any functions for struct manipulation will work on it as well, so for example if you wanted to retrieve the list of attributes from the <order...> element, you could use StructKeyList(XmlOrder.order.XmlAttributes).
On lines 15 and 19, attributes of the <customer ...> element are displayed. The customer element is a child of xmlroot, so we refer to it as "XmlOrder.xmlroot.XmlChildren[1]" (the array index is 1 since it appears as the first child of the xmlroot), or we can refer to it by a name like "XmlOrder.xmlroot.customer." To access attributes of the customer element we simply refer to the XmlAttributes struct.
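To make the different reference styles concrete, here is a small sketch that reads the same attributes by element name, through xmlroot, and by array index. The customer values are invented for the example.

```cfml
<!--- Hypothetical order fragment; attribute values are made up --->
<cfset strXml = '<order id="E10645"><customer fname="Jane" lname="Doe" memberid="1001"/></order>'>
<cfset XmlOrder = XmlParse(strXml)>

<cfoutput>
  <!--- The same attribute, reached by element name and through xmlroot --->
  Order id (by name): #XmlOrder.order.XmlAttributes.id#<br>
  Order id (via xmlroot): #XmlOrder.xmlroot.XmlAttributes.id#<br>
  <!--- The customer element, reached by array index and by name --->
  First name (by index): #XmlOrder.xmlroot.XmlChildren[1].XmlAttributes.fname#<br>
  Last name (by name): #XmlOrder.xmlroot.customer.XmlAttributes.lname#<br>
  <!--- XmlAttributes behaves like a struct, so struct functions apply --->
  Customer attributes: #StructKeyList(XmlOrder.xmlroot.customer.XmlAttributes)#
</cfoutput>
```

The order of the keys returned by StructKeyList() is not something to rely on, so treat the last line as a list of names rather than a fixed sequence.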
Jump down to line 30 where we are doing a loop over each item in the <items> element. Since each <item> is a child of <items>, it is organized in an array called XmlChildren. Unlike the previous example, we can't refer to each of the items by its name because there are three elements with the same name. Instead, since XmlChildren is an array, we just loop over the array and refer to each <item> with its array index.
On line 31 we are setting a local variable xnItem to each item of the XmlChildren Array. As you may have noticed, XML notation can get long. You can use variables like this to shorten references to your XML Elements. This makes the code a little more human-readable and a lot easier to maintain.
Lines 33-36 use the same techniques as above to reference the information in the XML Document. Note that the item ID is an attribute, while the other pieces of information are child elements. To read the contents of these child elements we refer to a property called XmlText. Alongside XmlAttributes and XmlChildren, every XML Element has the properties XmlName and XmlText. XmlName always contains the name of the element (or tag), while XmlText is the text between the opening and closing tags that isn't inside any other elements. So for an XML Element like "<material>wood</material>," XmlName would be "material" and XmlText would be "wood." Applying this to our example, XmlOrder.xmlroot.XmlName would be "order."
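The loop described above might be sketched as follows. This is not Listing 3 itself: the XML is a hypothetical order, and arrItems and i are names I've introduced, while xnItem follows the article's naming.

```cfml
<!--- Hypothetical order document; item data is made up --->
<cfsavecontent variable="strXmlOrder">
<order id="E10645">
  <items>
    <item id="A1"><name>Widget</name><quantity>2</quantity><price>9.99</price></item>
    <item id="B7"><name>Gadget</name><quantity>1</quantity><price>24.50</price></item>
    <item id="C3"><name>Gizmo</name><quantity>5</quantity><price>3.25</price></item>
  </items>
</order>
</cfsavecontent>
<cfset XmlOrder = XmlParse(Trim(strXmlOrder))>

<!--- A short variable keeps the long XML notation out of the loop body --->
<cfset arrItems = XmlOrder.xmlroot.items.XmlChildren>

<cfoutput>
<cfloop index="i" from="1" to="#ArrayLen(arrItems)#">
  <cfset xnItem = arrItems[i]>
  <!--- id is an attribute; name, quantity, and price are child elements read via XmlText --->
  #xnItem.XmlAttributes.id#: #xnItem.name.XmlText#,
  quantity #xnItem.quantity.XmlText#,
  price #xnItem.price.XmlText#<br>
</cfloop>
</cfoutput>
```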
Creating New XML Documents
So far the only XML-specific ColdFusion function we've used is XmlParse(). I highlight this to emphasize the point that once you parse your XML into an XML Object, it is basically a ColdFusion structure of arrays and structures, and you don't really need any more XML-specific tags or functions to read the data.
Now, let's move on to creating new XML documents. There are a few ways to do this. The first way - which some consider to be "cheating" - is fast and easy.
You can simply hand-code your XML in a CFML document and wrap <cfsavecontent> tags around it (see Listing 4). This is extremely easy because you can drop in your own variables and any other CFML you want right inline with your XML. Note that with this method, you're basically hand-creating a string, which then has to be parsed into an XML Object.
You can also use the <cfxml> tag, which works pretty much the same, except that instead of ending up with a string, you get the XML Object without having to call XmlParse() (see Listing 5).
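For readers without the listings, here is a rough side-by-side sketch of both approaches. The variable names (cdTitle, strXml, xmlFromString, xmlDirect) are my own, and XmlFormat() is used to escape any special characters in the inserted value.

```cfml
<!--- Hypothetical value to drop into the XML --->
<cfset cdTitle = "The best of Billy Joel">

<!--- Approach 1: build a string with cfsavecontent, then parse it --->
<cfsavecontent variable="strXml">
<cfoutput>
<collection name="My CDs">
  <cd title="#XmlFormat(cdTitle)#"/>
</collection>
</cfoutput>
</cfsavecontent>
<cfset xmlFromString = XmlParse(Trim(strXml))>

<!--- Approach 2: cfxml produces the XML Object directly, no XmlParse() call needed --->
<cfxml variable="xmlDirect">
<cfoutput>
<collection name="My CDs">
  <cd title="#XmlFormat(cdTitle)#"/>
</collection>
</cfoutput>
</cfxml>

<cfoutput>
  <!--- IsXmlDoc() reports whether a variable holds a ColdFusion XML document object --->
  String approach is an XML Object: #IsXmlDoc(xmlFromString)#<br>
  cfxml approach is an XML Object: #IsXmlDoc(xmlDirect)#<br>
  Title read back: #xmlDirect.collection.cd.XmlAttributes.title#
</cfoutput>
```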
Functionally there's not too much difference between these methods; both are quick and dirty and get the job done.
Now, here's the programmatic way to create a new XML Document:
<cfset myXml = XmlNew()>
<cfset myXml.xmlroot = XmlElemNew(myXml,"collection")>
The first line creates a new empty XML Object. The second line creates a new XML Element and assigns it to the xmlroot of your XML Object. Note that XmlElemNew() takes two parameters: the first is the XML Object, and the second is the name of the element you're creating. Try doing a <cfdump var="#myXml#"> and look at the results. Note that when the XML Object is created, it's already a ColdFusion object, not a plain text format. If you want to see what the XML looks like in its text format, try doing a <cfdump var="#toString(myXml)#">.
Now let's add an attribute to our new XML Element.
<cfset myXml.xmlroot.XmlAttributes.name = "My CDs">
Remember that XmlAttributes is a struct, so you can also use any of ColdFusion's struct functions here, such as StructInsert or StructUpdate.
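As a small illustration of treating XmlAttributes as a struct, the sketch below sets one attribute by direct assignment and another with StructInsert(). The "owner" attribute and its value are invented for the example.

```cfml
<cfset myXml = XmlNew()>
<cfset myXml.xmlroot = XmlElemNew(myXml, "collection")>

<!--- Direct assignment, as in the article --->
<cfset myXml.xmlroot.XmlAttributes.name = "My CDs">

<!--- The same result using a struct function; "owner" is a hypothetical attribute --->
<cfset StructInsert(myXml.xmlroot.XmlAttributes, "owner", "Jane")>

<cfoutput>
  Attributes on collection: #StructKeyList(myXml.xmlroot.XmlAttributes)#<br>
  Has an owner attribute: #StructKeyExists(myXml.xmlroot.XmlAttributes, "owner")#
</cfoutput>
```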
Now let's create a new element for our CD Collection and give it some attributes:
<cfset xnCD = XmlElemNew(myXml,"cd")>
Here we're creating an XML Element that isn't attached to the XML Document right now. If you did a <cfdump var="#myXml#">, you wouldn't see this new element appear in myXml anywhere.
Creating "unattached" XML elements is an important technique to understand. When we create an XML Element, we can refer to it with its short variable name and manipulate it as much as we want, including adding attributes and child elements to its XmlChildren array.
Let's add a few attributes to our xnCD element:
<cfset xnCD.XmlAttributes.title = "The best of Billy Joel">
<cfset xnCD.XmlAttributes.cover = "billjoel_best.jpg">
<cfset xnCD.XmlAttributes.genre = "Easy Listening">
Now that we have added some attributes, take a look at your XML Element by doing a <cfdump var="#xnCD#"> and make sure that the values you expected show up properly.
Once you're happy with this element, let's plug it into the XML Document. If you remember, I mentioned earlier that each XML Element has a property called XmlChildren, which is an array of XML Elements. Since it's an array, let's use ColdFusion's ArrayAppend() function to add our new element to the XML Document.
<cfset ArrayAppend(myXml.xmlroot.XmlChildren,xnCD)>
Now let's take a look at the XML Object by doing a <cfdump var="#myXml#"> and see that our xnCD Element has been added to our XML Document, and now appears in XMLChildren under collection. This technique of assembling XML elements and then "plugging them into" the XML Document is a structured and reliable way for you to programmatically build XML Documents. You can use queries and loop constructs to build XML from database results or from FORM data and ensure that the resulting XML is well formed. As an exercise, try adding some "audio track" XML elements to the above CD by creating new XML elements and appending them to xnCD.XmlChildren before adding it to your XML Document.
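One possible take on the exercise is sketched below. The track titles and the "track" element name are placeholders of my own; in a real application the loop would more likely run over a query or FORM data.

```cfml
<cfset myXml = XmlNew()>
<cfset myXml.xmlroot = XmlElemNew(myXml, "collection")>
<cfset myXml.xmlroot.XmlAttributes.name = "My CDs">

<!--- Build the unattached cd element first --->
<cfset xnCD = XmlElemNew(myXml, "cd")>
<cfset xnCD.XmlAttributes.title = "The best of Billy Joel">

<!--- Placeholder track titles; replace with real data from a query or form --->
<cfloop list="Track one,Track two,Track three" index="trackTitle">
  <cfset xnTrack = XmlElemNew(myXml, "track")>
  <cfset xnTrack.XmlText = trackTitle>
  <!--- Attach each track to the cd element while it is still unattached --->
  <cfset ArrayAppend(xnCD.XmlChildren, xnTrack)>
</cfloop>

<!--- Plug the finished cd element into the document --->
<cfset ArrayAppend(myXml.xmlroot.XmlChildren, xnCD)>

<!--- Escape the markup so the browser shows the XML text instead of rendering it --->
<cfoutput><pre>#HTMLEditFormat(toString(myXml))#</pre></cfoutput>
```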
Now that we're done creating our XML Document, getting the actual XML Text is as simple as calling ColdFusion's toString() function. So to write the XML to a local file we could do:
<cffile action="write" file="#ExpandPath('.')#/collection.xml" output="#toString(myXml)#">
Validating XML Documents
The examples we've worked with so far are very basic, and most of us will work with much more complex data. So what happens when we have XML data that may not be valid (that is, it doesn't conform to its DTD, as defined earlier) or, even worse, isn't well formed?
When you try to parse an XML document, occasionally there will be problems, specifically if the document itself is not well formed. Unfortunately, the error you receive is not always very helpful; it may be as vague as "An error occurred while parsing an XML document."
If you have XML that is 10 lines long, chances are you can just take a quick look at it and figure out the problem, but if your XML Document has 2,000 rows of data, it's no longer a trivial process. That's where XmlValidate() comes into play.
This valuable and under-used function checks whether an XML Document is well formed and valid, and it also returns a list of what's wrong with the XML Document. That list is exactly what we need to "fix up" or verify that the data we have is usable.
XmlValidate() takes two arguments. The first is the XML you want to check. This can be text you read from an XML file, a ColdFusion XML Object you created, or the path and filename of the XML file on the Web server. The second argument is optional and identifies the DTD (or XML Schema) to validate against; it can also be a string, URL, or path and filename.
When you call XmlValidate(), it checks your XML Document and returns a struct with several valuable parts:
• Errors - an array that contains any errors in the document that prevent it from being valid (i.e., complying with the DTD)
• FatalErrors - an array that contains any errors in the document that prevent it from being well formed (and parsable using XmlParse())
• Status - tells us whether all the tests were passed or not, returned as YES or NO
• Warning - an array of any warnings that XmlValidate() found while checking the XML Document
The FatalErrors array we get from XmlValidate is incredibly valuable. It will tell you which line and at which character it found a problem, and in my experience provides relatively helpful error messages.
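Here is a minimal sketch of how the returned struct could be inspected. The XML string is a deliberately broken, made-up example (the customer tag is never closed), and no DTD is supplied, so I'd expect the interesting messages to land in FatalErrors; the exact wording of the messages, and whether entries also appear in Errors because there is nothing to validate against, may vary by ColdFusion version.

```cfml
<!--- Hypothetical malformed XML: the customer element is never closed --->
<cfset strBadXml = '<order id="E10645"><customer fname="Jane"></order>'>

<cfset stResult = XmlValidate(strBadXml)>

<cfoutput>
  Passed all checks: #stResult.status#<br>

  <!--- Problems that keep the document from being well formed --->
  <cfloop index="i" from="1" to="#ArrayLen(stResult.fatalErrors)#">
    Fatal error: #HTMLEditFormat(stResult.fatalErrors[i])#<br>
  </cfloop>

  <!--- Problems that keep the document from being valid --->
  <cfloop index="i" from="1" to="#ArrayLen(stResult.errors)#">
    Error: #HTMLEditFormat(stResult.errors[i])#<br>
  </cfloop>
</cfoutput>
```

Running a check like this before calling XmlParse() gives you a chance to log or display the specific line and character position instead of a generic parsing error.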
Searching XML Documents Using XmlSearch()
We now understand that we can parse an XML Document and get a ColdFusion XML Object that is a bunch of arrays and structures, so we can read pieces as we want.
What if you only want to pull out a few parts of an XML Document, or if you are only interested in one type of XML Element from your document?
XmlSearch is designed to let you search your XML document for specifically named XML Elements. Let's look again at Listing 2. If we wanted to get an array of all the <item> elements, we could refer to the XmlChildren array as we did before. But what if this XML Document contained multiple orders? Then for each order there would be a different set of <item> elements, one set for each order. Getting a list of all the <item> elements is a whole lot more complex in this scenario. Instead of trying to pull apart the information we want, let's use XmlSearch() instead (see Listing 6).
XmlSearch() takes two parameters. The first is your ColdFusion XML Object and the second is an XPath expression. XPath, according to Wikipedia, is an expression language for addressing portions of an XML Document. We are only going to use it for basic purposes, but if you want to read up on XPath, a good place to start is www.w3.org/TR/xpath.
On line 8 we call XmlSearch(). The first argument is the XML Object we are working with, and the second is the XPath expression. This simple expression states "all XML Elements where the XmlName of the element is item." Keep in mind that XPath expressions are case sensitive, so the name must match the case used in the document.
The return type from XmlSearch is an array, so we save the results in an array variable called myResultsArray.
In the results, each item of the array is the actual XML Element that XmlSearch() found. This means that each search result will have the same Structure values as any other XML element, including XMLName, XMLText, XMLAttributes, and XMLChildren.
Once you have your results, you can use any of the techniques above to read and use the data that was found.
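The multiple-order scenario can be sketched as follows. The <orders> wrapper, the second order, and the item data are all invented for the example, and the XPath expression "//item" (every item element anywhere in the document) is my assumption about the kind of expression Listing 6 uses.

```cfml
<!--- Hypothetical document containing two orders --->
<cfsavecontent variable="strXmlOrders">
<orders>
  <order id="E10645">
    <items>
      <item id="A1"><name>Widget</name></item>
    </items>
  </order>
  <order id="E10646">
    <items>
      <item id="B7"><name>Gadget</name></item>
      <item id="C3"><name>Gizmo</name></item>
    </items>
  </order>
</orders>
</cfsavecontent>
<cfset XmlOrders = XmlParse(Trim(strXmlOrders))>

<!--- "//item" selects item elements at any depth, across all orders --->
<cfset myResultsArray = XmlSearch(XmlOrders, "//item")>

<cfoutput>
  Items found: #ArrayLen(myResultsArray)#<br>
  <cfloop index="i" from="1" to="#ArrayLen(myResultsArray)#">
    <!--- Each result is a regular XML Element with XmlAttributes, XmlChildren, and so on --->
    #myResultsArray[i].XmlAttributes.id#: #myResultsArray[i].name.XmlText#<br>
  </cfloop>
</cfoutput>
```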
What to Study Next
We certainly can't cover every aspect of programming with XML here, but luckily you have the Internet at your fingertips.
Some of the recommended topics for study are listed below:
• XPath Language - http://en.wikipedia.org/wiki/XPath
• CDATA (character data embedded in XML) - http://en.wikipedia.org/wiki/CDATA
• XSL - http://en.wikipedia.org/wiki/Extensible_Stylesheet_Language
• XmlTransform() - ColdFusion function - search Google
Summary
I hope this has been a useful overview of coding with XML and the various built-in tools in ColdFusion. These tools should help you get started with reading and creating well-formed XML.