Converting XMod XML Export to CSV
Jun
14
Written by:
6/14/2007 3:29 PM
There have been a number of questions recently about how to export XMod data to a comma-delimited format - also called a CSV format (for Comma Separated Values). Figuring it out is fairly tricky, but once you get it working, the process is pretty simple. In this article, I'll describe how to take the XML export file that XMod creates and convert it, using XSLT, to a CSV file.
There have been a number of questions recently about how to export XMod data to a comma-delimited format - also called a CSV format (for Comma Separated Values). Figuring it out is fairly tricky, but once you get it working, the process is pretty simple. In this article, I'll describe how to take the XML export file that XMod creates and convert it, using XSLT, to a CSV file.
The XMod XML Export File
For this example we'll be using a sample export file that contains basic user information such as first and last name, address, city, state, and postal code (zip code). The file's content is listed below:
<?xml version="1.0" encoding="UTF-8"?>
<xmodexport version="4.0" generationdate="2006-02-27 22:52:04.984" portalid="0">
<records>
<record id="25" formid="83" portalid="0" adduser="2" updateuser="2"
approved="true" dateadded="2006-04-18 00:00:00.000" datemodified="2006-04-18 00:00:00.000"
displaydate="2006-04-17 00:00:00.000" expirydate="9999-12-31 23:59:59.000">
<InstanceData><![CDATA[<instance>
<FirstName>Kelly</FirstName>
<LastName>Ford</LastName>
<Address>123 Main St</Address>
<City>MyCity</City>
<State>WA</State>
<Zip>12345</Zip>
</instance>]]>
</InstanceData>
</record>
<record id="23" formid="83" portalid="0" adduser="2" updateuser="2"
approved="true" dateadded="2006-04-18 00:00:00.000" datemodified="2006-04-18 00:00:00.000"
displaydate="2006-04-17 00:00:00.000" expirydate="9999-12-31 23:59:59.000">
<InstanceData><![CDATA[<instance>
<FirstName>John</FirstName>
<LastName>Adams</LastName>
<Address>234 Main St</Address>
<City>MyCity</City>
<State>WA</State>
<Zip>12345</Zip>
</instance>]]>
</InstanceData>
</record>
<record id="165" formid="83" portalid="0" adduser="2" updateuser="2"
approved="true" dateadded="2006-04-18 00:00:00.000" datemodified="2006-04-18 00:00:00.000"
displaydate="2006-04-17 00:00:00.000" expirydate="9999-12-31 23:59:59.000">
<InstanceData><![CDATA[<instance>
<FirstName>George</FirstName>
<LastName>Washington</LastName>
<Address>345 Main St</Address>
<City>MyCity</City>
<State>WA</State>
<Zip>12345</Zip>
</instance>]]>
</InstanceData>
</record>
</records>
</xmodexport>
Since the export file's format is designed primarily for backup and re-importation into XMod, most of the data for each record is enclosed in a CDATA tag, as highlighted in the first record. This makes it more difficult to process. So, the first thing we'll need to do is extract that data and save it as a separate XML file.
Pre-Processing the Export File
To do this, we need to create an XSLT stylsheed that will output the XML content but do so as plain text. It's a small, but important distinction. Our XSLT stylesheet is below:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/xmodexport/records">
<![CDATA[<records>]]>
<xsl:for-each select="record">
<![CDATA[<record id="]]><xsl:value-of select="@id"/><![CDATA[">]]>
<xsl:value-of select="InstanceData"/>
<![CDATA[</record>]]>
</xsl:for-each>
<![CDATA[</records>]]>
</xsl:template>
</xsl:stylesheet>
This stylesheet is similar to most you'll see. After the obligatory declarations, the <xsl:output> tag has its method set to "text"
<xsl:output method="text" version="1.0" encoding="UTF-8" indent="yes"/>
. It then includes a template which matches the <records> tag in the export file.
<xsl:template match="/xmodexport/records">
Next the stylesheet outputs an opening <records> tag. This will be the root element in our output XML. To keep the XSLT processor from balking at the temporarily mal-formed XML, we cloak the text using a CDATA element.
<![CDATA[<records>]]>
Next, we loop through each <record> tag in the export file and output its contents. Notice that we are manually creating our output <record> tag, cloaking it with CDATA, but grabbing the record's ID using the <xsl:value-of> tag. The end result (assuming the record ID is 25) would be <record id="25">. Next, we grab the contents of the exported record's <InstanceData> tag. This is where our form's data is held. Finally, we close the <record> tag - again, using CDATA to hide the XML from the XSLT processor, close the for-each loop, and close the <records> tag.
<xsl:for-each select="record">
<![CDATA[<record id="]]><xsl:value-of select="@id"/><![CDATA[">]]>
<xsl:value-of select="InstanceData"/>
<![CDATA[</record>]]>
</xsl:for-each>
<![CDATA[</records>]]>
Running our stylesheet against the sample export file results in output like this:
<records>
<record id="25">
<instance>
<FirstName>Kelly</FirstName>
<LastName>Ford</LastName>
<Address>123 Main St</Address>
<City>MyCity</City>
<State>WA</State>
<Zip>12345</Zip>
</instance>
</record>
<record id="23">
<instance>
<FirstName>John</FirstName>
<LastName>Adams</LastName>
<Address>234 Main St</Address>
<City>MyCity</City>
<State>WA</State>
<Zip>12345</Zip>
</instance>
</record>
<record id="165">
<instance>
<FirstName>George</FirstName>
<LastName>Washington</LastName>
<Address>345 Main St</Address>
<City>MyCity</City>
<State>WA</State>
<Zip>12345</Zip>
</instance>
</record>
</records>
Save this file. In the next step we'll be running it through a different XSLT stylesheet to generate our CSV file.
Generating the CSV File
Now that we have our data in a usable XML format, we can run it against a new stylesheet for converting it to comma-separated-values. Here's the stylsheet
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="records">
<xsl:text disable-output-escaping="yes">ID,FirstName,LastName,Address,City,State,Zip
</xsl:text>
<xsl:for-each select="record">
<xsl:text/><xsl:value-of select="@id"/>,<xsl:text/>
<xsl:value-of select="normalize-space(instance/FirstName)"/>,<xsl:text/>
<xsl:value-of select="normalize-space(instance/LastName)"/>,<xsl:text/>
<xsl:value-of select="normalize-space(instance/Address)"/>,<xsl:text/>
<xsl:value-of select="normalize-space(instance/City)"/>,<xsl:text/>
<xsl:value-of select="normalize-space(instance/State)"/>,<xsl:text/>
<xsl:value-of select="normalize-space(instance/Zip)"/><xsl:text/>
<xsl:text disable-output-escaping="yes">
</xsl:text>
</xsl:for-each>
</xsl:template>
<xsl:template match="records/record">
</xsl:template>
</xsl:stylesheet>
The first thing you may notice is our stylesheet isn't indented, like most XML examples you see. That's because the spacing in our stylesheet will affect the spacing in our output file, since it is text. Most of the items in the stylesheet are designed to keep all the values from a single record on the same line. We make extensive use of the <xsl:text> tag, which outputs plain text.
<xsl:output method="text"/>
This line, tells the XSLT processor that the stylesheet should output plain text.
<xsl:template match="records">
<xsl:text disable-output-escaping="yes">ID,FirstName,LastName,Address,City,State,Zip
</xsl:text>
This section matches the <records> tag - the root tag we created in our previous stylesheet. The <xsl:text> tag in this section creates a field-name row and is optional.
<xsl:for-each select="record">
<xsl:text/><xsl:value-of select="@id"/>,<xsl:text/>
<xsl:value-of select="normalize-space(instance/FirstName)"/>,<xsl:text/>
<xsl:value-of select="normalize-space(instance/LastName)"/>,<xsl:text/>
<xsl:value-of select="normalize-space(instance/Address)"/>,<xsl:text/>
<xsl:value-of select="normalize-space(instance/City)"/>,<xsl:text/>
<xsl:value-of select="normalize-space(instance/State)"/>,<xsl:text/>
<xsl:value-of select="normalize-space(instance/Zip)"/><xsl:text/>
<xsl:text disable-output-escaping="yes">
</xsl:text>
</xsl:for-each>
The bulk of the work is done in this section. It loops through each child <record> tag in the <records> root tag. For each <record>, it outputs the record's ID - from the "id" attribute. Since we're building a comma-separated values file, each field is separated by a comma. If your data will contain commas you'll need to consider some other character(s) to use. After the comma, you'll see another <xsl:text/> tag. This is used for readability. It allows us to place each field value on a separate line but not have that line-break output. Otherwise, we would need to put all the fields on one line.
<xsl:value-of select="normalize-space(instance/FirstName)"/>,<xsl:text/>
All field values follow the same basic pattern as the line above. We use the "normalize-space" function for those fields whose values in the XML data file may span more than one line or contain extra space. This function eliminates that extra whitespace and helps keep the values on one line, as required by the CSV format. Since all our fields are contained by the <instance> tag, the expression used to retrieve them follows the pattern: instance/FieldName. So, the FirstName field is "instance/FirstName".
<xsl:text disable-output-escaping="yes">
</xsl:text>
Finally, we use another <xsl:text> tag that has output escaping disabled. By doing this and having the tag span two lines, we effectively insert a line break, which leaves the output ready for the next line of field values.