Developer Guide to the SAXAdapter Project

Mark Priest

Introduction

XML, the Extensible Markup Language, is a useful tool for the language- and platform-independent exchange and organization of data. When you process XML documents in the Java programming language, there are a number of well-supported APIs to choose from, depending on your application. The most common APIs for XML in Java are SAX, the Simple API for XML; DOM, the Document Object Model; and JDOM, a more Java-friendly alternative to DOM.

The SAX interface provides a simple, efficient, and powerful way to process XML documents. Ironically, SAX is underutilized because many developers find it difficult and cumbersome to use. The tree-oriented APIs such as DOM and JDOM are often used instead because they are viewed as simpler to work with and as having less of a learning curve. In this document I will present a utility that sits on top of the SAX interface and greatly simplifies the use of SAX while preserving most of its efficiency and all of its power. With a utility like the SAXAdapter, SAX truly becomes a "simple" XML API with less of a learning curve than the more complex tree-based APIs such as DOM.

The SAX API

David Brownell and David Megginson currently maintain the SAX API as a SourceForge open source project at http://www.saxproject.org/. SAX is an event-based API rather than a tree-based API. This means that parser events are reported to your application code in the order they occur during the parsing process. Tree APIs such as DOM first parse an entire XML document and then present an API that application programs can use to navigate a tree structure that represents the XML document content. One advantage of using SAX is that only a small portion of the data resides in memory during parsing. A SAX parser reports new elements and attributes as they are encountered, rather than preloading them into memory as in DOM. The garbage collector can reclaim elements and attributes that have already been reported while parsing continues. Another advantage of SAX is that parsing proceeds as expected up to the point that a syntax error is encountered. If you choose to handle the error yourself you may even be able to continue parsing after handling the error. With tree-based APIs such as DOM, parsing fails when syntax errors are encountered and it is not possible to recover any information from the source document.
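The point about syntax errors can be illustrated with a small sketch. It is not part of the SAXAdapter project: the class name PartialParseDemo and the sample input are made up for illustration, and the XMLReader is obtained through the JAXP SAXParserFactory rather than through XMLReaderFactory. The handler records every element reported before the parser reaches a mismatched end tag.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the SAXAdapter project.
class PartialParseDemo {

  /** Returns the names of the elements that were reported before parsing stopped. */
  static List<String> elementsSeen(String xml) throws Exception {
    final List<String> seen = new ArrayList<String>();
    DefaultHandler handler = new DefaultHandler() {
      @Override
      public void startElement(String uri, String localName, String qName, Attributes atts) {
        seen.add(qName);
      }
    };
    XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    reader.setContentHandler(handler);
    // DefaultHandler.fatalError() rethrows the SAXParseException it is given.
    reader.setErrorHandler(handler);
    try {
      reader.parse(new InputSource(new StringReader(xml)));
    } catch (SAXParseException e) {
      // The document is not well formed; everything reported so far is still in 'seen'.
    }
    return seen;
  }

  public static void main(String[] args) throws Exception {
    // The end tag </oops> does not match <property>, so parsing fails at that point.
    String broken = "<properties><property><name>n</name></oops></properties>";
    System.out.println(elementsSeen(broken));
  }
}
```

The three start tags that precede the error are delivered to the handler even though the document as a whole is rejected.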

The current version of SAX is SAX2, which is a fairly significant change to the original SAX interface. The SAX web site provides a good, brief summary of the history of the SAX interface. The main motivation for the revision of the SAX interface was adding support for the W3C XML namespaces recommendation. The XML Namespaces Tutorial provides additional information about XML Namespaces and SAX2.

SAX2 defines four core interfaces, ContentHandler, DTDHandler, EntityResolver, and ErrorHandler in the org.xml.sax package; and two optional extension interfaces, DeclHandler and LexicalHandler in the org.xml.sax.ext package, that are used to implement callbacks for the various parsing events. Most of the time you will only be concerned with a subset of the methods in one of these interfaces, org.xml.sax.ContentHandler. I have included a subset of the ContentHandler source with the most useful callback methods in Listing 1 (txt).

Listing 1. A Useful Subset of Methods in ContentHandler

  /**
   * Receive notification of the beginning of a document.
   *
   * <p>The SAX parser will invoke this method only once, before any
   * other methods in this interface or in
   * {@link org.xml.sax.DTDHandler DTDHandler}
   * (except for {@link #setDocumentLocator
   * setDocumentLocator}).</p>
   */
  public void startDocument ()
	  throws SAXException;


  /**
   * Receive notification of the end of a document.
   *
   * <p>The SAX parser will invoke this method only once, and it
   * will be the last method invoked during the parse.
   * The parser shall not invoke this method until it has either
   * abandoned parsing (because of an unrecoverable error) or
   * reached the end of input.</p>
   */
  public void endDocument()
	  throws SAXException;


  /**
   * Receive notification of the beginning of an element.
   *
   * <p>The Parser will invoke this method at the beginning of every
   * element in the XML document; there will be a corresponding
   * {@link #endElement endElement} event for every startElement event
   * (even when the element is empty). All of the element's content
   * will be reported, in order, before the corresponding endElement
   * event.</p>
   */
  public void startElement (String namespaceURI, String localName,
          String qName, Attributes atts)throws SAXException;

  /**
   * Receive notification of the end of an element.
   *
   * <p>The SAX parser will invoke this method at the end of every
   * element in the XML document; there will be a corresponding
   * {@link #startElement startElement} event for every endElement
   * event (even when the element is empty).</p>
   */
  public void endElement (String namespaceURI, String localName,
        String qName) throws SAXException;


  /**
   * Receive notification of character data.
   *
   * <p>The Parser will call this method to report each chunk of
   * character data.  SAX parsers may return all contiguous character
   * data in a single chunk, or they may split it into several
   * chunks; however, all of the characters in any single event
   * must come from the same external entity so that the Locator
   * provides useful information.</p>
   */
  public void characters (char ch[], int start, int length)
  	throws SAXException;
  

A SAX2 parser is represented by an implementation of the org.xml.sax.XMLReader interface. You implement the SAX2 parser event interfaces for which you are interested in getting callbacks and register them with the XMLReader using the appropriate set methods. The core interfaces are registered using setContentHandler(), setDTDHandler(), setEntityResolver(), and setErrorHandler(), respectively, and the extension interfaces are registered using the setProperty() method with the extension interface name ("http://xml.org/sax/properties/declaration-handler" and "http://xml.org/sax/properties/lexical-handler", respectively) and the implementing object as arguments. When you call the XMLReader's parse() method and specify an XML data source, the XMLReader becomes the producer of a SAX event stream and the event handler interface objects you registered with the reader are consumers of that event stream. You can obtain a reference to an XMLReader using the static methods of the XMLReaderFactory.
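As a rough illustration of the registration steps just described, the following sketch registers one object as ContentHandler, ErrorHandler, and LexicalHandler. The class names ReaderSetupDemo and Collector are hypothetical, and the reader comes from the JAXP SAXParserFactory instead of XMLReaderFactory; DefaultHandler2 is the standard org.xml.sax.ext convenience class that provides empty implementations of the core and extension interfaces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical demo class, not part of the SAXAdapter project.
class ReaderSetupDemo {

  /** Collects element names (ContentHandler) and comments (LexicalHandler). */
  static class Collector extends DefaultHandler2 {
    final List<String> elements = new ArrayList<String>();
    final List<String> comments = new ArrayList<String>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
      elements.add(qName);
    }

    @Override
    public void comment(char[] ch, int start, int length) {
      comments.add(new String(ch, start, length));
    }
  }

  static Collector parse(String xml) throws Exception {
    XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    Collector collector = new Collector();
    // Core interfaces are registered with dedicated set methods.
    reader.setContentHandler(collector);
    reader.setErrorHandler(collector);
    // Extension interfaces are registered as properties.
    reader.setProperty("http://xml.org/sax/properties/lexical-handler", collector);
    // parse() turns the reader into the producer of the event stream.
    reader.parse(new InputSource(new StringReader(xml)));
    return collector;
  }

  public static void main(String[] args) throws Exception {
    Collector c = parse("<a><!--hi--><b/></a>");
    System.out.println(c.elements + " " + c.comments);
  }
}
```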

Example 1 (txt) shows a simple XML document with the associated callbacks that would be generated by a SAX2 parser. This example is oversimplified because it only shows the ContentHandler callbacks that are of primary interest. Basically, the parser calls startDocument() at the beginning of the parsing process and endDocument() at the end. For each element there is a call to startElement() and endElement() with calls to characters() in between. Calls to startElement() and endElement() are nested according to the way tags are nested in the XML document.

Example 1. Simple XML Document With Important SAX Callbacks

<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <property>
    <name>name</name>
    <value>George</value>
  </property>
</properties>

SAX2 callbacks
  1. startDocument()
  2. startElement() for properties tag
  3. characters() for properties tag with whitespace (spaces and CR/LFs)
  4. startElement() for property tag
  5. characters() for property tag with whitespace
  6. startElement() for name tag
  7. characters() for name tag with "name"
  8. endElement() for name tag
  9. startElement() for value tag
  10. characters() for value tag with "George"
  11. endElement() for value tag
  12. endElement() for property tag
  13. endElement() for properties tag
  14. endDocument()
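A trace like the one above can be produced with a handler along the following lines. This is only a sketch: the class name SaxTraceDemo is made up, and the exact number of characters() calls depends on the parser, because a parser is free to split character data into several chunks.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the SAXAdapter project.
class SaxTraceDemo {

  /** Parses the document and returns one line per ContentHandler callback. */
  static List<String> trace(String xml) throws Exception {
    final List<String> events = new ArrayList<String>();
    DefaultHandler handler = new DefaultHandler() {
      @Override
      public void startDocument() {
        events.add("startDocument()");
      }

      @Override
      public void endDocument() {
        events.add("endDocument()");
      }

      @Override
      public void startElement(String uri, String localName, String qName, Attributes atts) {
        events.add("startElement() for " + qName);
      }

      @Override
      public void endElement(String uri, String localName, String qName) {
        events.add("endElement() for " + qName);
      }

      @Override
      public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length);
        events.add(text.trim().isEmpty()
            ? "characters() with whitespace"
            : "characters() with \"" + text + "\"");
      }
    };
    XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    reader.setContentHandler(handler);
    reader.setErrorHandler(handler);
    reader.parse(new InputSource(new StringReader(xml)));
    return events;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<properties>\n  <property>\n    <name>name</name>\n"
        + "    <value>George</value>\n  </property>\n</properties>";
    for (String event : trace(xml)) {
      System.out.println(event);
    }
  }
}
```

Running the sketch also shows the whitespace-only characters() calls that occur between end tags, which the simplified list above leaves out.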

If you are going to do any serious work with SAX2 then I highly recommend that you get David Brownell's book, SAX2 (O'Reilly, 2002). Brownell does an excellent job of presenting SAX and gives some good examples.

Difficulty of Using SAX

When you start working with SAX you will quickly realize why this "simple" interface isn't so simple to use in practice. By design, SAX does not keep track of any state for you. For example, the characters() method is called some number of times and reveals the text between tags for a given element. However, you must keep track of the last call to startElement() to know which tag the text belongs to. Start dealing with typical XML documents, which are more complicated than Example 1, and keeping track of all of the SAX parsing state quickly becomes tedious and error-prone. This is the reason why many developers abandon SAX in favor of tree-based APIs after experimenting with SAX.

If you persist in using SAX you may find that you are writing code like that in Listing 2 (txt) in order to handle the events for each element and to match text with the right tags. If so you will spend a lot of time dealing with details of element state and not enough time implementing your application logic. Fortunately, this problem can be overcome by using the principle of abstraction, as is often the case. Extract the element state management code into a common utility class and you are free to write only the code that you need to implement your business logic. One way to implement this abstraction is to use the SAXAdapter utility.

Listing 2. Example of Typical ContentHandler Code

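  // State that the handler must track by hand between callbacks
  StringBuffer textBuffer = new StringBuffer();
  String name, value, nameType, valueType, currentText;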

  public void startElement (String namespaceURI, String localName,
			      String qName, Attributes atts)
	throws SAXException
  {
    textBuffer.setLength(0);
    // store off attributes
    if (qName.equals("name"))
    {
      nameType = atts.getValue("nameType");
    }
    else if (qName.equals("value"))
    {
      valueType = atts.getValue("valueType");
    }
    else if .......
  }

  public void characters (char ch[], int start, int length)
	throws SAXException
  {
    textBuffer.append(ch, start, length);
  }

  public void endElement (String namespaceURI, String localName,
			    String qName)
	throws SAXException
  {
    currentText = textBuffer.toString();
    if (qName.equals("name"))
    {
      name = currentText;
      handleNameTag();
    }
    else if (qName.equals("value"))
    {
      value = currentText;
      handleValueTag();
    }
    else if .......
  }
  

SAXAdapter Utility

The SAXAdapter utility both simplifies the ContentHandler interface and provides a parsing model that is a natural fit to most XML processing tasks. You begin by instantiating a SAXAdapter, which functions as a SAX2 parser (XMLReader implementation), or producer, of SAX events. Next, you register SAXTagHandler implementations, which are simplified versions of ContentHandler implementations, with the adapter for each tag you are interested in handling. This simplifies the use of SAX in two ways. First, it pares down the ContentHandler interface from eleven methods to two and provides a simple mechanism for keeping track of the parsing state that is generally of interest. Second, it allows you to partition handling code by tag, which is usually what you want to do. Instead of writing nested if-then-else structures, you provide different implementations of SAXTagHandler and the adapter calls them for the appropriate tags. Listing 3 (txt) shows the SAXTagHandler interface and Figure 1 shows a class diagram including important methods for selected utility classes.

Listing 3. SAXTagHandler Interface

/**
 * This interface represents a callback for XML element begin and end
 * events during SAX parsing of an XML document.
 * It is a simplified version of the <code>ContentHandler</code> interface.
 *
 * @author Mark Priest
 */
public interface SAXTagHandler
{

  /**
   * Called by the <code>SAXAdapter</code> when a start tag, for which this
   * callback interface has been registered, has been encountered.
   * @param argTagNamespace namespace URI of tag
   * @param argLocalTagName unqualified tag name
   * @param argQName qualified name of tag
   * @param argAttributes tag attributes
   * @param argText a StringBuffer that contains the text between the start
   * and end tags.
   * @param argContext the Map that represents parsing context.
   * This context must be understood by each handler
   * @param argNamespaceContext an interface that allows the handler to access
   * namespace data associated with the current parsing state
   * @throws org.xml.sax.SAXException
   */
  public void onStartTag(String argTagNamespace, String argLocalTagName,
    String argQName, Attributes argAttributes, StringBuffer argText,
    Map argContext, NamespaceContext argNamespaceContext)
    throws SAXException;

  /**
   * Called by the <code>SAXAdapter</code> when an end tag, for which this
   * callback interface has been registered, has been encountered.
   * @param argTagNamespace namespace URI of tag
   * @param argLocalTagName unqualified tag name
   * @param argQName qualified tag name
   * @param argContext the Map that represents parsing context.  This context
   * must be understood by each handler
   * @param argNamespaceContext an interface that allows the handler to access
   * namespace data associated with the current parsing state
   * @throws org.xml.sax.SAXException
   */
  public void onEndTag(String argTagNamespace, String argLocalTagName,
    String argQName, Map argContext, NamespaceContext argNamespaceContext)
    throws SAXException;
}
  

Figure 1. Key XML Utility Classes and Methods

The SAXAdapter stores parsing state in three different ways to simplify application development.

  1. A StringBuffer that holds the text accumulated from calls to characters() is passed to each tag handler, saving you from matching text with tags.
  2. A Map that is shared by all handlers is provided with each callback and is an easy way to share application state.
  3. A NamespaceContext interface object is passed into both callbacks and provides context-specific namespace information about the parsed document.

With this simple infrastructure it is surprising how much work can be done with just a little application code.

The SAXAdapter creates an internal stack of SAXTagHandlers as it parses an XML document. Every time a handler is identified for a particular tag, that handler is added to the stack. The attributes collected in startElement() and the element content collected in characters() are presented to the registered handler's onStartTag() method. The onStartTag() method is called when either the end tag of the original element is encountered or when a nested element is discovered, whichever comes first. The onEndTag() method is called when the end tag is encountered. Handlers can be registered mid-parse and the adapter will scan the stack to replace any handlers previously associated with the specified tag with the new handler. If no handler is found for a given tag, nothing is added to the stack and that tag is ignored.
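The stack-based dispatch described above can be sketched in a few dozen lines. This is not the SAXAdapter source. MiniTagHandler, MiniAdapter, and MiniAdapterDemo are hypothetical, cut-down stand-ins that match handlers by qualified name only, omit the namespace arguments, and do not support registering handlers mid-parse.

```java
import java.io.StringReader;
import java.util.*;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical, simplified stand-in for SAXTagHandler (no namespace arguments). */
interface MiniTagHandler {
  void onStartTag(String qName, Attributes atts, StringBuffer text, Map<String, Object> context);
  void onEndTag(String qName, Map<String, Object> context);
}

/** Hypothetical sketch of the dispatch logic; not the actual SAXAdapter. */
class MiniAdapter extends DefaultHandler {
  private static final class Frame {
    final String qName;
    final MiniTagHandler handler;
    final Attributes atts;
    final StringBuffer text = new StringBuffer();
    boolean started; // true once onStartTag() has been delivered
    Frame(String qName, MiniTagHandler handler, Attributes atts) {
      this.qName = qName; this.handler = handler; this.atts = atts;
    }
  }

  private final Map<String, MiniTagHandler> handlers = new HashMap<>();
  private final Map<String, Object> context = new HashMap<>();
  private final Deque<Frame> stack = new ArrayDeque<>();

  void registerHandler(String qName, MiniTagHandler handler) { handlers.put(qName, handler); }

  void parse(String xml) throws Exception {
    XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    reader.setContentHandler(this);
    reader.setErrorHandler(this);
    reader.parse(new InputSource(new StringReader(xml)));
  }

  private void deliverStart(Frame f) {
    if (f != null && !f.started) {
      f.started = true;
      f.handler.onStartTag(f.qName, f.atts, f.text, context);
    }
  }

  @Override public void startElement(String uri, String localName, String qName, Attributes atts) {
    deliverStart(stack.peek()); // a nested element ends the parent's text collection
    MiniTagHandler h = handlers.get(qName);
    // Copy the attributes because a parser may reuse the object it passed in.
    if (h != null) stack.push(new Frame(qName, h, new AttributesImpl(atts)));
  }

  @Override public void characters(char[] ch, int start, int length) {
    Frame top = stack.peek();
    if (top != null && !top.started) top.text.append(ch, start, length);
  }

  @Override public void endElement(String uri, String localName, String qName) {
    Frame top = stack.peek();
    if (top != null && top.qName.equals(qName)) { // tags without a handler were never pushed
      deliverStart(top);
      stack.pop().handler.onEndTag(qName, context);
    }
  }
}

class MiniAdapterDemo {
  static List<String> loadProperties(String xml) throws Exception {
    final List<String> result = new ArrayList<>();
    MiniTagHandler values = new MiniTagHandler() {
      public void onStartTag(String qName, Attributes a, StringBuffer text, Map<String, Object> ctx) {
        ctx.put(qName, text.toString());
      }
      public void onEndTag(String qName, Map<String, Object> ctx) { }
    };
    MiniTagHandler property = new MiniTagHandler() {
      public void onStartTag(String qName, Attributes a, StringBuffer text, Map<String, Object> ctx) {
        ctx.remove("name");
        ctx.remove("value");
      }
      public void onEndTag(String qName, Map<String, Object> ctx) {
        result.add(ctx.get("name") + "=" + ctx.get("value"));
      }
    };
    MiniAdapter adapter = new MiniAdapter();
    for (String tag : new String[] {"name", "value"}) adapter.registerHandler(tag, values);
    adapter.registerHandler("property", property);
    adapter.parse(xml);
    return result;
  }
}
```

Calling MiniAdapterDemo.loadProperties() on the document from Example 1 yields a one-element list containing "name=George". Because text is collected only until onStartTag() has been delivered, the sketch shares the behavior of ignoring text that follows a nested element.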

When a tag is encountered, the adapter searches for a handler using the following logic:

  1. Return a registered handler that matches namespace and name; else
  2. Return a registered default namespace handler that matches the namespace; else
  3. Return a handler registered by qualified-name that matches the qName; else
  4. Return the default handler, if one has been registered.
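The lookup order in the list above can be expressed as a chain of map lookups. The following is a sketch under stated assumptions: HandlerRegistry and its method names are hypothetical and do not reflect the SAXAdapter's real registration API, and the handler type is left generic so that the example stands alone.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry illustrating the four-step lookup; not the SAXAdapter API.
class HandlerRegistry<H> {
  private final Map<String, H> byNamespaceAndName = new HashMap<String, H>();
  private final Map<String, H> byNamespace = new HashMap<String, H>();
  private final Map<String, H> byQName = new HashMap<String, H>();
  private H defaultHandler;

  void register(String namespaceURI, String localName, H handler) {
    byNamespaceAndName.put(key(namespaceURI, localName), handler);
  }

  void registerNamespaceDefault(String namespaceURI, H handler) {
    byNamespace.put(namespaceURI, handler);
  }

  void registerQName(String qName, H handler) {
    byQName.put(qName, handler);
  }

  void registerDefault(H handler) {
    defaultHandler = handler;
  }

  /** Returns the most specific handler, or null if the tag should be ignored. */
  H find(String namespaceURI, String localName, String qName) {
    H handler = byNamespaceAndName.get(key(namespaceURI, localName)); // step 1
    if (handler == null) {
      handler = byNamespace.get(namespaceURI);                        // step 2
    }
    if (handler == null) {
      handler = byQName.get(qName);                                   // step 3
    }
    if (handler == null) {
      handler = defaultHandler;                                       // step 4
    }
    return handler;
  }

  private static String key(String namespaceURI, String localName) {
    return "{" + namespaceURI + "}" + localName;
  }
}
```

A null result corresponds to the case in which no handler is found and the tag is ignored.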

The adapter keeps track of namespace mappings using the SAX NamespaceSupport helper class and exposes this information through the NamespaceContext interface in the onStartTag() and onEndTag() callbacks.
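For readers unfamiliar with the helper class, the following sketch shows how org.xml.sax.helpers.NamespaceSupport resolves qualified names. The class name NamespaceSupportDemo is made up, the prefix declarations are taken from the xCBL sample order used later in this guide, and the sketch says nothing about how the adapter wires the helper into its own NamespaceContext interface.

```java
import org.xml.sax.helpers.NamespaceSupport;

// Hypothetical demo class, not part of the SAXAdapter project.
class NamespaceSupportDemo {

  /**
   * Resolves a qualified element name against two sample declarations.
   * Returns {namespaceURI, localName, qName}, or null if the prefix is undeclared.
   */
  static String[] resolve(String qName) {
    NamespaceSupport support = new NamespaceSupport();
    // A new context is pushed for each element that is opened.
    support.pushContext();
    support.declarePrefix("", "publicid:org.xCBL:schemas/XCBL35/Order.xsd");
    support.declarePrefix("xsi", "http://www.w3.org/2001/XMLSchema-instance");

    // false: treat the name as an element name, so the default namespace applies.
    String[] parts = support.processName(qName, new String[3], false);

    // The context is popped again when the element is closed.
    support.popContext();
    return parts;
  }

  public static void main(String[] args) {
    String[] order = resolve("Order");
    System.out.println(order[0] + " " + order[1] + " " + order[2]);
    System.out.println(resolve("foo:bar")); // undeclared prefix
  }
}
```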

If no parent XMLReader is supplied in the constructor for a SAXAdapter, then the adapter uses the SAX2 bootstrap mechanism in XMLReaderFactory to create a parent parser. If the system property "net.sourceforge.saxadapter.xmlreaderimpl" is defined (e.g., by passing "-Dnet.sourceforge.saxadapter.xmlreaderimpl=<classname>" on the command line of the JRE), then its value is interpreted as the fully-qualified classname of the desired XMLReader implementation. This feature is useful in situations where third-party software sets a system default implementation for XMLReader that you want to bypass.
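A lookup of this kind might be implemented roughly as follows. The sketch is an assumption about the general approach, not the adapter's actual code: ParentReaderFactory is a hypothetical name, the class is instantiated reflectively through its no-argument constructor, and the fallback uses the JAXP SAXParserFactory where the adapter itself is described as using XMLReaderFactory.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

// Hypothetical helper, not part of the SAXAdapter project.
class ParentReaderFactory {
  static final String IMPL_PROPERTY = "net.sourceforge.saxadapter.xmlreaderimpl";

  /** Creates the parent reader, honoring the system property when it is set. */
  static XMLReader createParentReader() throws Exception {
    String className = System.getProperty(IMPL_PROPERTY);
    if (className != null && className.length() > 0) {
      // The named class must implement XMLReader and have a public no-arg constructor.
      return (XMLReader) Class.forName(className).getDeclaredConstructor().newInstance();
    }
    // Fallback chosen for this sketch only (assumption; see the lead-in).
    return SAXParserFactory.newInstance().newSAXParser().getXMLReader();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(createParentReader().getClass().getName());
  }
}
```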

One limitation of the SAXAdapter is that it is not capable of correctly handling mixed element content. In Example 2 (txt), the string "some text..." in the property tag that precedes the name tag would get reported in the onStartTag() callback for the property tag, but the string "some more text..." after the value tag would not get reported. The SAXAdapter is not designed to process XML documents that have this type of mixed content. It is expecting XML documents that have elements with empty content, element content, or parsed character data (PCDATA) content.

Example 2. Mixed Element Content Not Supported by SAXAdapter

<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <property>
    some text...
    <name>name</name>
    <value>George</value>
    some more text...
  </property>
</properties>
  

Simple Properties File Example

The best way to explain how to use the SAXAdapter is to give a simple example. Example 3 (txt) shows the DTD for an XML properties file that can store properties of type String or of one of the base types by way of the wrapper classes (e.g., Integer, Float), and Example 4 (txt) shows a sample file. The name tag provides the name of the property and the value tag is a String representation of the property. The javaType tag is the fully-qualified name of the class of the property (e.g., java.lang.String, java.lang.Integer). If the javaType tag is not present, the property is assumed to be of type String.

Example 3. DTD for Simple XML Properties File

<!ELEMENT environment (properties)>

<!ELEMENT properties (property+)>

<!ELEMENT property (name,value,javaType?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT value (#PCDATA)>
<!ELEMENT javaType (#PCDATA)>
  

Example 4. Sample XML Properties File

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE environment SYSTEM "env.dtd">
<environment>
   <properties>
      <property>
         <name>message.SubscriptionImpl.mapLimit</name>
         <value>2</value>
         <javaType>java.lang.Long</javaType>
      </property>
      <property>
         <name>message.subscriptionManager.DeliveryHandler</name>
         <value>RemoveMessage</value>
      </property>
      <property>
         <name>windows.client.timeout</name>
         <value>20000</value>
         <javaType>java.lang.Integer</javaType>
      </property>
      <property>
         <name>windows.resourceport</name>
         <value>24001</value>
         <javaType>java.lang.Integer</javaType>
      </property>
   </properties>
</environment>
  

The EnvironmentLoader class parses a single input file, or a set of files, and creates an Environment implementation. When the name, value, and javaType tags are encountered for each property, we need to store the element content for use when building the property. The ValuesTagHandler in Listing 4 (txt) handles this task by storing the String representing element content in the Map, keyed by element name, for later use. The ValuesTagHandler is the most common handler that I use for SAX applications. For most tags you just want to store the content for later processing. This handler is registered for each of the three tags: name, value, and javaType.

Listing 4. ValuesTagHandler Implementation

/**
 * This handler simply stores the text between tags in the Map under
 * the tag name
 */

class ValuesTagHandler implements SAXTagHandler
{
  public void onStartTag(String argTagNamespace, String argLocalTagName,
    String argQName, Attributes argAttributes, StringBuffer argText,
    Map argContext, NamespaceContext argNamespaceContext)
    throws SAXException
  {
    argContext.put(argLocalTagName, argText.toString());
  }

  public void onEndTag(String argTagNamespace, String argLocalTagName,
    String argQName, Map argContext, NamespaceContext argNamespaceContext)
    throws SAXException
  {
    // ignore
  }
}
  

The EnvPropertyTagHandler in Listing 5 (txt) takes the values that have been stored for each property tag and constructs a property object from them. Most of the action occurs in the onEndTag() method in this type of handler because when we reach the end tag we know that the state of each of the child tags has been stored in the map. This is the second most common type of handler that I use in my development. The EnvPropertyTagHandler calls a utility method on the EnvironmentImpl to construct the property. The onStartTag() callback is used to reset state for the next property instance.

Listing 5. EnvPropertyTagHandler Implementation

/**
 * This handler collects the stored values for name, value, and type and
 * calls the setProperty() method on EnvironmentImpl
 */
class EnvPropertyTagHandler implements SAXTagHandler
{
  EnvironmentImpl m_env;

  EnvPropertyTagHandler(EnvironmentImpl argEnv)
  {
    m_env = argEnv;
  }

  public void onStartTag(String argTagNamespace, String argLocalTagName,
    String argQName, Attributes argAttributes, StringBuffer argText,
    Map argContext, NamespaceContext argNamespaceContext)
    throws SAXException
  {
    argContext.put(EnvironmentLoader.NAME_TAG, "");
    argContext.put(EnvironmentLoader.PROPERTY_VALUE_TAG, null);
    argContext.put(EnvironmentLoader.PROPERTY_JAVATYPE_TAG, "java.lang.String");
  }

  public void onEndTag(String argTagNamespace, String argLocalTagName,
    String argQName, Map argContext, NamespaceContext argNamespaceContext)
    throws SAXException
  {
    try
    {
      m_env.setProperty((String) argContext.get(EnvironmentLoader.NAME_TAG),
                  (String) argContext.get(EnvironmentLoader.PROPERTY_VALUE_TAG),
                  (String) argContext.get(EnvironmentLoader.PROPERTY_JAVATYPE_TAG));
    }
    catch(Exception e)
    {
      throw new SAXException(e);
    }
  }
}
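
The source of EnvironmentImpl.setProperty() is not shown in this guide. As a rough idea of how a String value and a javaType name could be turned into a typed object, here is a sketch that relies on the static valueOf(String) method offered by most wrapper classes. The class TypedValueDemo and its convert() method are hypothetical and make no claim about how EnvironmentImpl actually works.

```java
// Hypothetical helper, not part of the SAXAdapter project.
class TypedValueDemo {

  /**
   * Converts the text of a value tag into an instance of the class named by
   * the javaType tag. A missing javaType is treated as java.lang.String.
   */
  static Object convert(String value, String javaType) throws Exception {
    if (javaType == null || javaType.equals("java.lang.String")) {
      return value;
    }
    Class<?> type = Class.forName(javaType);
    // Integer, Long, Float, Double, Short, Byte, and Boolean all declare
    // a public static valueOf(String); other classes fail with NoSuchMethodException.
    return type.getMethod("valueOf", String.class).invoke(null, value);
  }

  public static void main(String[] args) throws Exception {
    Object limit = convert("2", "java.lang.Long");
    Object timeout = convert("20000", "java.lang.Integer");
    Object handler = convert("RemoveMessage", null);
    System.out.println(limit.getClass().getName() + " " + limit);
    System.out.println(timeout.getClass().getName() + " " + timeout);
    System.out.println(handler.getClass().getName() + " " + handler);
  }
}
```

Loading classes by a name read from a file is only appropriate when the file comes from a trusted source.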
  

Most of the work in using the SAXAdapter utility is in writing the SAXTagHandler implementations. Invoking the adapter is meant to be fairly simple. In the load() method of Listing 6 (txt), a SAXAdapter is instantiated, possibly with a parent XMLReader. Both of the SAXTagHandler implementations are instantiated and registered with the adapter and then the parse() method is called on the adapter with the InputStream wrapped in a SAX InputSource object. Repeated calls can be made to parse() with the same set of registered handlers.

Listing 6. EnvironmentLoader's load() Method

  /**
   * This method loads the environment based on the configuration
   * data located in the resource at the provided URLs.
   *
   * @param argResourceURL - delimited list of URLs of the
   * configuration resources (: or ; delimited)
   * @param argReader parent XMLReader in pipeline
   * @param argEnv Environment impl to append properties
   * @return com.teti.telematics.core.resource.Environment
   */
  public static Environment load(String argResourceURL,
    XMLReader argReader, EnvironmentImpl argEnv)
    throws Exception
  {
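    // NOTE: the declarations of adapter and tok do not appear in the
    // original listing; the two below are assumed reconstructions.
    SAXAdapter adapter = (null == argReader)
      ? new SAXAdapter() : new SAXAdapter(argReader);
    StringTokenizer tok = new StringTokenizer(argResourceURL, ":;");

    SAXTagHandler valHandler = new ValuesTagHandler();
    EnvPropertyTagHandler envHandler =
      new EnvPropertyTagHandler(argEnv);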

    adapter.registerHandler(PROPERTY_TAG,
      envHandler);
    adapter.registerHandler(NAME_TAG,
      valHandler);
    adapter.registerHandler(PROPERTY_VALUE_TAG,
      valHandler);
    adapter.registerHandler(PROPERTY_JAVATYPE_TAG,
      valHandler);

    while(tok.hasMoreTokens())
    {
      String url = tok.nextToken();
      System.out.println("Loading environment from: " + url);

      InputStream input =
        EnvironmentLoader.class.getResourceAsStream(url);
      adapter.parse(new InputSource(input));
    }
    return argEnv;
  }
  

Note that there are only about ten lines of actual parsing code needed to process the properties file. Most of the tedious parsing code is hidden in the SAXAdapter. The real usefulness of the utility is not apparent, however, until a more complex real-world example is presented.

Real-World Example

A more realistic example of using the SAXAdapter is processing electronic commerce purchase orders in CommerceOne's XML Common Business Library (xCBL) dialect. Briefly, xCBL is an electronic commerce XML dialect that is used for documents such as purchase orders, availability requests, price checks, etc. CommerceOne has made xCBL available free of charge to the public and it is maintained online at http://www.xcbl.org/. I will use xCBL version 3.5 in this example to show how the SAXAdapter utility is useful in a complex, real-world situation.

The reason I chose xCBL is that it involves the processing of fairly complicated and extremely verbose XML documents. The xCBL dialect defines hundreds, if not thousands, of tags in an effort to create a universal language for e-commerce. This ambitious requirement is what makes xCBL challenging to use and process. I have experience working with xCBL using the DOM interface and it is very tedious. It is difficult to use DOM or other tree-based APIs with xCBL because of the large number of tags and the fact that the response documents use different tag names than the request documents. The SAXAdapter utility and SAX greatly simplify this kind of task.

Figure 2 shows a set of JavaBeans-style classes that represent an order, its associated line items, contacts, and references in a hypothetical business, Dunn Manufacturing. The OrderProcessor class takes an xCBL 3.5 Order document as input and constructs the Dunn Manufacturing business objects. The approach is similar to that of the previous example in that a set of handlers is created to process the key tags of interest. Additionally, a default handler is registered that gets called when no other handler is registered for a given tag. The DefaultPropHandler, shown in Listing 7 (txt), functions like the ValuesTagHandler in the previous example. Instead of registering a handler by name for each of the hundreds of elements, a single registration of this handler covers them all.

Figure 2. Dunn Manufacturing Order Classes

Listing 7. DefaultPropHandler Implementation

/**
 * This handler saves the text between tags under the tag name in the map
 * to be used by the other handlers
 */
class DefaultPropHandler implements SAXTagHandler
{
  public void onStartTag(String argTagNamespace, String argLocalTagName,
    String argQName, Attributes argAttributes, StringBuffer argText,
    Map argContext, NamespaceContext argNamespaceContext)
    throws SAXException
  {
    argContext.put(0 == argTagNamespace.length() ? argQName : argLocalTagName,
      argText.toString());
  }


  public void onEndTag(String argTagNamespace, String argLocalTagName,
    String argQName, Map argContext, NamespaceContext argNamespaceContext)
    throws SAXException
  {
  }
}
  

The other handlers function much like EnvPropertyTagHandler in the first example. The onStartTag() callback is used to clear state associated with previous elements, and the onEndTag() callback is used to construct an object and store it for later use. Ultimately, the OrderHandler, which is registered with the Order tag, finishes construction of the graph of order-related objects. A typical example of one of these handlers is the OrderReferenceHandler shown in Listing 8 (txt). The OrderReferenceHandler constructs an OrderReference from the order type, number, and description tags processed by DefaultPropHandler and adds the resulting reference to a running list of references for this order. The OrderHandler then uses this list to construct an array of OrderReference objects that it stores internally.

Listing 8. OrderReferenceHandler Implementation

/**
 * This handler constructs a List of references from the reference elements
 */
class OrderReferenceHandler implements SAXTagHandler
{
  public void onStartTag(String argTagNamespace, String argLocalTagName,
    String argQName, Attributes argAttributes, StringBuffer argText,
    Map argContext, NamespaceContext argNamespaceContext)
    throws SAXException
  {
    // clear any previous state
    argContext.put(OrderConstants.ORDER_REFERENCE_TYPE_TAG, null);
    argContext.put(OrderConstants.ORDER_REFERENCE_NUM_TAG, null);
    argContext.put(OrderConstants.ORDER_REFERENCE_DESC_TAG, null);
  }


  public void onEndTag(String argTagNamespace, String argLocalTagName,
    String argQName, Map argContext, NamespaceContext argNamespaceContext)
    throws SAXException
  {
    String type = (String)
      argContext.get(OrderConstants.ORDER_REFERENCE_TYPE_TAG);
    String ref = (String)
      argContext.get(OrderConstants.ORDER_REFERENCE_NUM_TAG);
    String desc = (String)
      argContext.get(OrderConstants.ORDER_REFERENCE_DESC_TAG);
    OrderReference newRef = new OrderReference(type, ref, desc);
    List list = (List) argContext.get(OrderConstants.ORDER_REFERENCE_TAG);
    if (null == list)
    {
      list = new LinkedList();
      argContext.put(OrderConstants.ORDER_REFERENCE_TAG, list);
    }
    list.add(newRef);
  }
}
  

Key excerpts from a sample xCBL order are shown in Example 5 (txt). The entire order is quite verbose and is contained in the examples directory of the saxadapter project. This sample order comes from the XML schemas xCBL distribution at http://www.xcbl.org/. Example 6 (txt) shows the result of calling toString() on the Order object after processing the sample order with the OrderProcessor.

Example 5. Sample xCBL 3.5 Order Document

<?xml version="1.0"?>
<!-- © Copyright Commerce One, Inc. 2000
All Rights Reserved -->
<!--The Buyer is ABC Enterprises, they are purchasing parts
from Dunn Manufacturing-->
<!--This is the first Order from ABC to Dunn-->
<!--There are two line items that are being ordered-->
<Order xmlns="publicid:org.xCBL:schemas/XCBL35/Order.xsd"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation=
       "publicid:org.xCBL:schemas/XCBL35/Order.xsd Order.xsd">
  <OrderHeader>
    <OrderNumber>
      <BuyerOrderNumber>4500005693</BuyerOrderNumber>
    </OrderNumber>
    <OrderIssueDate>20010203T12:00:00</OrderIssueDate>
    <OrderParty>
      <BuyerParty>
        <Name1>ABC Enterprises</Name1>
        <POBox POBoxPostalCode="249"/>
        <PostalCode>20012</PostalCode>
        <City>Alpine</City>
        <RegionCoded>USNY</RegionCoded>
        <ContactName>Dietl,B.</ContactName>
        <ContactFunctionCoded>PO contact</ContactFunctionCoded>
        <ContactNumber>
          <ContactNumberValue>655-456-8911</ContactNumberValue>
          <ContactNumberTypeCoded>
            TelephoneNumber
          </ContactNumberTypeCoded>
        </ContactNumber>
        <ContactNumber>
          <ContactNumberValue>bdietl@ABC.com</ContactNumberValue>
          <ContactNumberTypeCoded>
            EmailAddress
          </ContactNumberTypeCoded>
        </ContactNumber>
      </BuyerParty>
      <SellerParty>
        <Name1>Dunn Manufacturing</Name1>
        <Department>Order Department</Department>
        <PostalCode>95006</PostalCode>
        <City>Orange</City>
        <RegionCoded>USCA</RegionCoded>
        <ContactName>Ms Black</ContactName>
        <ContactFunctionCoded>
          AccountsReceivable
        </ContactFunctionCoded>
        <ContactNumber>
          <ContactNumberValue>212-345-4784</ContactNumberValue>
          <ContactNumberTypeCoded>
            TelephoneNumber
          </ContactNumberTypeCoded>
        </ContactNumber>
        <ContactNumber>
          <ContactNumberValue>Hblack@DunnMfg.com</ContactNumberValue>
          <ContactNumberTypeCoded>
            EmailAddress
          </ContactNumberTypeCoded>
        </ContactNumber>
      </SellerParty>
    </OrderParty>
  </OrderHeader>
  <OrderDetail>
    <ItemDetail>
      <BuyerLineItemNum>00010</BuyerLineItemNum>
      <SellerPartNumber>
        <PartID>R-5000</PartID>
      </SellerPartNumber>
      <QuantityValue>111</QuantityValue>
      <Price>
        <UnitPriceValue>10.00</UnitPriceValue>
        <CurrencyCoded>USD</CurrencyCoded>
      </Price>
    </ItemDetail>
    <ItemDetail>
      <BuyerLineItemNum>00011</BuyerLineItemNum>
      <SellerPartNumber>
        <PartID>R-3456</PartID>
      </SellerPartNumber>
      <QuantityValue>1</QuantityValue>
      <Price>
        <UnitPriceValue>1000.00</UnitPriceValue>
        <CurrencyCoded>USD</CurrencyCoded>
      </Price>
    </ItemDetail>
  </OrderDetail>
  <OrderSummary>
    <TotalAmount>
      <MonetaryValue>
        <MonetaryAmount>2110.00</MonetaryAmount>
        <Currency>
          <CurrencyCoded>USD</CurrencyCoded>
        </Currency>
      </MonetaryValue>
    </TotalAmount>
  </OrderSummary>
</Order>

  

Example 6. Result of Calling toString() on Order in Example 5

Order:
 date: Sat Feb 03 12:00:00 EST 2001
Item:  buyerLineNum: 10 sellerLineNum: -1 buyerPartNum: R-5000
  sellerPartNum: R-5000 quantity: 111 UOM: EA price: 1000
Item:  buyerLineNum: 11 sellerLineNum: -1 buyerPartNum: R-3456
  sellerPartNum: R-3456 quantity: 1 UOM: EA price: 100000
Party:  type: BuyerParty Address:
  name: ABC Enterprises dept: null city: Alpine zip: 20012
  contacts:
  Contact:  name: Dietl,B. type: PO Contact
    email: bdietl@ABC.com phone: 655-456-8911

Party:  type: SellerParty Address:
  name: Dunn Manufacturing dept: Order Department city: Orange
    zip: 95006
  contacts:
  Contact:  name: Ms Black type: AccountsReceivableContact
    email: Hblack@DunnMfg.com phone: 212-345-4784
  Contact:  name: George Walsh type: DeliveryContact
    email: gwalsh@DunnMfg.com phone: 212-345-4701

Party:  type: ShipToParty Address:
  name: ABC Enterprises dept: null city: New York zip: 20001
  contacts:
  Contact:  name: Ms. Audra Murphy type: null
    email: Amurphy@ABC.com phone: 655-456-8901

reference:  ref: REF-002-99-0-3000 type: RequestNumber
  desc: Buyers OrderRequest Number
reference:  ref: REF002-44556677 type: RequestNumber
  desc: Sellers OrderRequest Number

  

SAX Pipelines

The SAXAdapter is designed as a SAX pipeline component and can be used anywhere in a chain of XMLReader implementations. A SAX pipeline is a series of objects that implement XMLReader (typically through XMLFilter, a subinterface of XMLReader) arranged in a chain, so that each component receives SAX events from the previous component, processes them, and forwards them to the next component. SAX pipelines are a powerful idiom because they allow code reuse and support efficient, single-pass parsing algorithms. Figure 3 illustrates the concept of a SAX pipeline. See the GNU JAXP project for another approach to creating SAX pipelines.

Figure 3. Illustration of a SAX Pipeline
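
The pipeline idea can be sketched with nothing but the standard SAX2 classes. The following is a minimal illustration, not code from the SAXAdapter project: the class names PipelineSketch, UpperCaseFilter, and RecordingFilter are hypothetical, and the upper-casing step is only a stand-in for real processing. It assumes the parser delivers the short text content in a single characters() call, which SAX does not guarantee in general.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

public class PipelineSketch {

  /** First stage (hypothetical): upper-cases character content, then forwards it. */
  static class UpperCaseFilter extends XMLFilterImpl {
    UpperCaseFilter(XMLReader parent) {
      super(parent);
    }

    @Override
    public void characters(char[] ch, int start, int length)
        throws SAXException {
      char[] upper =
          new String(ch, start, length).toUpperCase(Locale.ROOT).toCharArray();
      super.characters(upper, 0, upper.length);
    }
  }

  /** Second stage (hypothetical): records the events it receives from the first stage. */
  static class RecordingFilter extends XMLFilterImpl {
    final List<String> events = new ArrayList<>();

    RecordingFilter(XMLReader parent) {
      super(parent);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
        Attributes atts) throws SAXException {
      events.add("start:" + qName);
      super.startElement(uri, localName, qName, atts);
    }

    @Override
    public void characters(char[] ch, int start, int length)
        throws SAXException {
      String text = new String(ch, start, length).trim();
      if (!text.isEmpty()) {
        events.add("text:" + text);
      }
      super.characters(ch, start, length);
    }
  }

  /** Builds parser -> UpperCaseFilter -> RecordingFilter and runs the chain. */
  static List<String> run(String xml)
      throws SAXException, IOException, ParserConfigurationException {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true);
    XMLReader parser = factory.newSAXParser().getXMLReader();

    UpperCaseFilter first = new UpperCaseFilter(parser);
    RecordingFilter last = new RecordingFilter(first);

    // parse() is called on the last component; the request travels up the
    // chain to the parser, and the events travel back down through each stage.
    last.parse(new InputSource(new StringReader(xml)));
    return last.events;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(run("<city>Alpine</city>"));
  }
}
```

Because each stage only sees events from its parent, either filter could be reused in a different chain without modification.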

Both of the previous examples can be expanded to illustrate the power of using SAXAdapter as a SAX pipeline component. Figure 4 is a class diagram that shows selected SAXAdapter utility classes and their relationship to the SAX2 XMLFilterImpl and extension handlers. The PipelineBase class contains all of the logic necessary to form a SAX pipeline component that forwards all events, including those of the two extension handlers, to the next pipeline component. You would subclass PipelineBase to provide the desired business logic as in the XMLWriter class. Since the ContentHandler methods are generally customized more than the others, PipelineBase provides a simple way to add functionality to these. For each ContentHandler method there is a corresponding protected method with the prefix "on" that is invoked prior to executing the pipeline logic (e.g. onStartElement() is called at the beginning of startElement()). Overriding these methods is safe since they will not affect the passing of events to the next pipeline component.

Figure 4. Relationship of SAXAdapter utility to XMLFilterImpl and Extension Handlers
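
The "on" hook convention described above might look roughly like the following sketch. This is not the PipelineBase source: HookedFilter and ElementCounter are hypothetical names, only the startElement() callback is shown, and making startElement() final is my own assumption about how forwarding could be protected from subclasses.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class HookSketch {

  /** Hypothetical base class that mimics the described "on" hook convention. */
  static class HookedFilter extends XMLFilterImpl {
    HookedFilter(XMLReader parent) {
      super(parent);
    }

    /** Hook invoked before the event is forwarded; does nothing by default. */
    protected void onStartElement(String uri, String localName, String qName,
        Attributes atts) throws SAXException {
    }

    @Override
    public final void startElement(String uri, String localName, String qName,
        Attributes atts) throws SAXException {
      onStartElement(uri, localName, qName, atts);
      // Forwarding happens here, so a subclass cannot accidentally skip it.
      super.startElement(uri, localName, qName, atts);
    }
  }

  /** Subclass that adds business logic by overriding only the hook. */
  static class ElementCounter extends HookedFilter {
    int count;

    ElementCounter(XMLReader parent) {
      super(parent);
    }

    @Override
    protected void onStartElement(String uri, String localName, String qName,
        Attributes atts) {
      count++;
    }
  }

  /** Returns { elements seen by the hook, elements forwarded downstream }. */
  static int[] run(String xml)
      throws SAXException, IOException, ParserConfigurationException {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true);
    XMLReader parser = factory.newSAXParser().getXMLReader();

    ElementCounter counter = new ElementCounter(parser);
    final int[] forwarded = {0};
    // A downstream handler shows that events still pass through the filter.
    counter.setContentHandler(new DefaultHandler() {
      @Override
      public void startElement(String uri, String localName, String qName,
          Attributes atts) {
        forwarded[0]++;
      }
    });
    counter.parse(new InputSource(new StringReader(xml)));
    return new int[] {counter.count, forwarded[0]};
  }

  public static void main(String[] args) throws Exception {
    int[] counts = run("<a><b/><c/></a>");
    System.out.println("hook: " + counts[0] + ", forwarded: " + counts[1]);
  }
}
```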

Enhanced Environment Properties Example

I expanded the functionality of the properties example by adding the ability to define certain properties dynamically in terms of others, following the property model used by the Ant build tool. A property with a value defined as "http://${this.hostname}/" would be rendered as "http://localhost/", assuming that the property this.hostname was set to "localhost". I implemented the class AntPropertyResolver as a pipeline component that overrides the SAX characters() callback and modifies the character array passed to subsequent pipeline components, so that they receive element content that appears to be the transformed property. The overridden characters() method parses the character array provided by the callback and looks for the "${}" construction. If it is found, a StringBuffer is used to construct the transformed content, and this transformed content is passed in the subsequent characters() callback. If it is not found, the event is passed on unchanged.

The AntPropertyResolver is used with EnvironmentLoader in this example, but it could be used wherever Ant-like property transformation is needed in XML parsing. Note that this class does not follow the PipelineBase convention of overriding the protected "on" methods. In this case I want to override characters() completely, since I am changing the contents of the original character array. Example 7 shows a sample XML properties file and how the enhanced EnvironmentLoader would transform it.

Example 7. A Sample Properties File and the Transformed Properties

<?xml version="1.0" encoding="UTF-8"?>
<environment>
   <properties>
      <property>
         <name>dbuser</name>
         <value>username</value>
         <javaType>java.lang.String</javaType>
      </property>
      <property>
         <name>dbpassword</name>
         <value>${dbuser}_password</value>
         <javaType>java.lang.String</javaType>
      </property>
      <property>
         <name>dburl</name>
         <value>jdbc:oracle:thin:@${this.hostname}:1521:sid</value>
         <javaType>java.lang.String</javaType>
      </property>
   </properties>
</environment>

Key: this.hostname		Value: mpriest2k288l
Key: dburl		Value: jdbc:oracle:thin:@mpriest2k288l:1521:sid
Key: dbuser		Value: username
Key: dbpassword		Value: username_password
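
The kind of characters() override described for AntPropertyResolver can be sketched as follows. This is not the AntPropertyResolver source: the names PropertyResolverSketch, ResolvingFilter, resolve(), and parseText() are hypothetical, and the properties come from a fixed Map rather than from values collected during the parse. The sketch also assumes that a "${...}" reference arrives within a single characters() call; a production version would need to buffer content, since a parser may split text across calls.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class PropertyResolverSketch {

  /** Replaces each ${name} with its value; unknown names are left as-is. */
  static String resolve(String text, Map<String, String> props) {
    StringBuilder out = new StringBuilder();
    int pos = 0;
    while (true) {
      int open = text.indexOf("${", pos);
      int close = (open < 0) ? -1 : text.indexOf('}', open + 2);
      if (close < 0) {
        // No further complete reference: copy the rest and stop.
        out.append(text, pos, text.length());
        return out.toString();
      }
      out.append(text, pos, open);
      String value = props.get(text.substring(open + 2, close));
      out.append(value != null ? value : text.substring(open, close + 1));
      pos = close + 1;
    }
  }

  /** Hypothetical filter that rewrites character content before forwarding it. */
  static class ResolvingFilter extends XMLFilterImpl {
    private final Map<String, String> props;

    ResolvingFilter(XMLReader parent, Map<String, String> props) {
      super(parent);
      this.props = props;
    }

    @Override
    public void characters(char[] ch, int start, int length)
        throws SAXException {
      String original = new String(ch, start, length);
      if (!original.contains("${")) {
        super.characters(ch, start, length); // pass the event on unchanged
        return;
      }
      char[] resolved = resolve(original, props).toCharArray();
      super.characters(resolved, 0, resolved.length);
    }
  }

  /** Parses xml through the filter and returns the text seen downstream. */
  static String parseText(String xml, Map<String, String> props)
      throws SAXException, IOException, ParserConfigurationException {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true);
    XMLReader parser = factory.newSAXParser().getXMLReader();

    ResolvingFilter filter = new ResolvingFilter(parser, props);
    final StringBuilder seen = new StringBuilder();
    filter.setContentHandler(new DefaultHandler() {
      @Override
      public void characters(char[] ch, int start, int length) {
        seen.append(ch, start, length);
      }
    });
    filter.parse(new InputSource(new StringReader(xml)));
    return seen.toString();
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> props = new HashMap<>();
    props.put("dbuser", "username");
    System.out.println(parseText("<value>${dbuser}_password</value>", props));
  }
}
```

Downstream components never see the "${...}" syntax, so they need no knowledge of the property model.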

  

Enhanced xCBL Parsing Example

The xCBL parsing example can also be enhanced using the SAX pipeline approach. The xCBL specification requires that an xCBL Order document be followed by an OrderResponse document from the seller. The OrderResponse document is similar to the Order document, but differs in that many of the tags are prefixed with "OrderResponse" instead of "Order" and there is some additional information from the seller, such as whether the order was successfully processed and whether there were errors. This type of processing is tedious and inefficient with a tree-based model: the tree has to be written back out to XML in a second pass, and the change in tag names makes this hard to do cleanly, at least with DOM.

The XCBLResponseWriter is a SAXTagHandler implementation that creates an OrderResponse for a given xCBL Order document. When I chain the SAXAdapter from the original xCBL example to another SAXAdapter that invokes the XCBLResponseWriter, the order objects are populated at the same time that the response is being generated. Everything occurs in a single pass. Listing 9 shows how the writer is chained to the order processor.

Listing 9. Chaining the XCBL Writer to the Order Processor

  /**
   * This method takes an InputSource that represents an order in
   * XCBL 3.5 format and converts it to an Order object and also
   * creates an XCBL 3.5 OrderResponse document to be returned
   * to the buyer
   *
   * @param argIn the order in XCBL 3.5 format
   * @param argWriter the writer to which the OrderResponse document
   * is written
   * @throws SAXException if a parsing error occurs
   * @throws IOException if an I/O error occurs
   */
  public Order processXCBL35Order(InputSource argIn, Writer argWriter)
    throws SAXException, IOException
  {
    m_parserManager = new SAXAdapter();
    m_writerManager = new SAXAdapter(m_parserManager);
    m_writerHandler = new XCBLResponseWriter(m_writerManager);
    m_writerManager.registerDefaultHandler(m_writerHandler);
    m_writerHandler.setWriter(argWriter);

    // always call parse() on the last pipeline element
    Map map = new HashMap();
    m_parserManager.setMap(map);
    m_writerManager.parse(argIn);
    Order order = (Order) map.get(OrderConstants.ORDER_TAG);
    map.clear();

    return order;
  }
  

The XCBLResponseWriter registers some helpers with the SAXAdapter when it is created, as shown in Listing 10. These helpers handle the details of writing out the XML. All of the actual XML serialization is done using the methods of the XMLCallbackWriter, which is invoked directly rather than through the adapter. The XCBLResponseWriter handles simple name changes by keeping a Map from original element names to their transformed names. Many elements are ignored because they are not used in the Dunn Manufacturing business model. These are handled by keeping a set of ignored element names; matching elements are simply skipped. Example 8 shows an excerpt of the response document that was generated by XCBLResponseWriter.

Listing 10. Registering Helper Classes With SAXAdapter

  public XCBLResponseWriter(SAXAdapter argAdapter)
  {
    m_adapter = argAdapter;

    // define the tags that can be handled by a simple name change
    m_map.setProperty(OrderConstants.ORDER_TAG,
      OrderConstants.ORDER_RESPONSE_TAG);
    m_map.setProperty(OrderConstants.ORDER_BUYER_NUMBER_TAG,
      OrderConstants.ORDER_RESPONSE_BUYER_NUMBER_TAG);
    m_map.setProperty(OrderConstants.ORDER_NUMBER_TAG,
      OrderConstants.ORDER_RESPONSE_NUMBER_TAG);
    m_map.setProperty(OrderConstants.ORDER_ISSUE_DATE_TAG,
      OrderConstants.ORDER_RESPONSE_ISSUE_DATE_TAG);
    m_map.setProperty(OrderConstants.ORDER_DETAIL_TAG,
      OrderConstants.ORDER_RESPONSE_DETAIL_TAG);
    m_map.setProperty(OrderConstants.ORDER_LIST_DETAIL_TAG,
      OrderConstants.ORDER_RESPONSE_LIST_DETAIL_TAG);
    m_map.setProperty(OrderConstants.ORDER_ITEM_TAG,
      OrderConstants.ORDER_RESPONSE_ITEM_TAG);

    // ignore these tags in the response
    m_ignoreSet.add(OrderConstants.ORDER_REFERENCES_TAG);
    m_ignoreSet.add(OrderConstants.ORDER_PURPOSE_TAG);
    m_ignoreSet.add(OrderConstants.ORDER_CURRENCY_TAG);
    m_ignoreSet.add(OrderConstants.ORDER_LANGUAGE_TAG);
    m_ignoreSet.add(OrderConstants.ORDER_DATES_TAG);
    m_ignoreSet.add(OrderConstants.ORDER_SHIPTO_PARTY_TAG);
    m_ignoreSet.add(OrderConstants.ORDER_TERMS_OF_DELIVERY_TAG);
    m_ignoreSet.add(OrderConstants.ORDER_HEADER_NOTE_TAG);
    m_ignoreSet.add(OrderConstants.ORDER_LIST_OF_STRUCTURED_NOTE_TAG);
    m_ignoreSet.add(OrderConstants.ORDER_REQUEST_RESPONSE_CODED_TAG);

    // set these on the adapter as requiring special handling
    m_adapter.registerHandler(OrderConstants.ORDER_HEADER_TAG,
      new OrderHeaderTagHandler());
    m_adapter.registerHandler(OrderConstants.ORDER_ISSUE_DATE_TAG,
      new OrderDateHandler());
    m_adapter.registerHandler(
      OrderConstants.ORDER_RESPONSE_REQUESTED_TAG,
      new OrderResponseHandler());
    m_adapter.registerHandler(OrderConstants.ORDER_BUYER_PARTY_TAG,
      new BuyerPartyTagHandler());
    m_adapter.registerHandler(OrderConstants.ORDER_SELLER_PARTY_TAG,
      new SellerPartyTagHandler());
    m_adapter.registerHandler(OrderConstants.ORDER_ITEM_TAG,
      new ItemDetailTagHandler());
    m_adapter.registerHandler(OrderConstants.ORDER_SUMMARY_TAG,
      new OrderSummaryTagHandler());
    m_adapter.registerHandler(OrderConstants.ORDER_PARTY_TAG,
      new IgnoreTagHandler());
  }
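
The rename map and ignore set used above can be illustrated independently of the SAXAdapter with a plain SAX filter. The sketch below is not the XCBLResponseWriter: RenameSketch, RenamingFilter, and transform() are hypothetical names, the element names passed in main() are illustrative, and the serializer is deliberately naive (no attributes, namespaces, or character escaping).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

public class RenameSketch {

  /** Hypothetical filter: renames mapped elements, drops ignored subtrees. */
  static class RenamingFilter extends XMLFilterImpl {
    private final Map<String, String> renames;
    private final Set<String> ignored;
    private int skipDepth = 0; // greater than 0 while inside an ignored element

    RenamingFilter(XMLReader parent, Map<String, String> renames,
        Set<String> ignored) {
      super(parent);
      this.renames = renames;
      this.ignored = ignored;
    }

    @Override
    public void startElement(String uri, String localName, String qName,
        Attributes atts) throws SAXException {
      if (skipDepth > 0 || ignored.contains(localName)) {
        skipDepth++;
        return;
      }
      String name = renames.getOrDefault(localName, localName);
      super.startElement(uri, name, name, atts);
    }

    @Override
    public void endElement(String uri, String localName, String qName)
        throws SAXException {
      if (skipDepth > 0) {
        skipDepth--;
        return;
      }
      String name = renames.getOrDefault(localName, localName);
      super.endElement(uri, name, name);
    }

    @Override
    public void characters(char[] ch, int start, int length)
        throws SAXException {
      if (skipDepth == 0) {
        super.characters(ch, start, length);
      }
    }
  }

  /** Runs xml through the filter and serializes what comes out (naively). */
  static String transform(String xml, Map<String, String> renames,
      Set<String> ignored)
      throws SAXException, IOException, ParserConfigurationException {
    SAXParserFactory factory = SAXParserFactory.newInstance();
    factory.setNamespaceAware(true); // needed so that localName is populated
    XMLReader parser = factory.newSAXParser().getXMLReader();

    RenamingFilter filter = new RenamingFilter(parser, renames, ignored);
    final StringBuilder out = new StringBuilder();
    filter.setContentHandler(new DefaultHandler() {
      @Override
      public void startElement(String uri, String localName, String qName,
          Attributes atts) {
        out.append('<').append(qName).append('>');
      }

      @Override
      public void endElement(String uri, String localName, String qName) {
        out.append("</").append(qName).append('>');
      }

      @Override
      public void characters(char[] ch, int start, int length) {
        out.append(ch, start, length);
      }
    });
    filter.parse(new InputSource(new StringReader(xml)));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    Map<String, String> renames = new HashMap<>();
    renames.put("Order", "OrderResponse");
    Set<String> ignored = new HashSet<>();
    ignored.add("OrderReferences");
    System.out.println(transform(
        "<Order><OrderReferences>x</OrderReferences></Order>",
        renames, ignored));
  }
}
```

Tracking a skip depth rather than a boolean flag keeps the filter correct when an ignored element contains nested children.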
  

Example 8. Excerpt of Response Document Generated by Response Writer

© Copyright Commerce One, Inc. 2000
All Rights Reserved

<OrderResponse xmlns="publicid:org.xCBL:schemas/XCBL35/Order.xsd"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation=
               "publicid:org.xCBL:schemas/XCBL35/Order.xsd Order.xsd">
  <OrderResponseHeader>
    <BuyerOrderResponseNumber>4500005693</BuyerOrderResponseNumber>
    <OrderResponseIssueDate>20010203T12:00:00</OrderResponseIssueDate>
    <OrderReference>
      <Reference>
        <RefNum>REF0</RefNum>
        <RefDate>20020729T10:32:16</RefDate>
      </Reference>
    </OrderReference>
    <Purpose>
      <PurposeCoded>Confirmation</PurposeCoded>
    </Purpose>
    <ResponseType>
      <ResponseTypeCoded>Accepted</ResponseTypeCoded>
    </ResponseType>
  </OrderResponseHeader>
  <OrderResponseDetail>
    <OrderResponseItemDetail>
      <ItemDetailResponseCoded>
        ApprovedAsSubmitted
      </ItemDetailResponseCoded>
      <OriginalItemDetail>
        Entire contents of original order line item...
      </OriginalItemDetail>
    </OrderResponseItemDetail>
    <OrderResponseItemDetail>
      <ItemDetailResponseCoded>
        ApprovedAsSubmitted
      </ItemDetailResponseCoded>
      <OriginalItemDetail>
        Entire contents of original order line item...
      </OriginalItemDetail>
    </OrderResponseItemDetail>
  </OrderResponseDetail>
  <OrderResponseSummary>
    <OriginalOrderSummary>
      <TotalAmount>
        <MonetaryAmount>2110.00</MonetaryAmount>
        <CurrencyCoded>USD</CurrencyCoded>
      </TotalAmount>
    </OriginalOrderSummary>
  </OrderResponseSummary>
</OrderResponse>
  

Conclusion

The SAX2 API is a very powerful, yet underutilized API for processing XML documents. SAX should seriously be considered in cases where memory is at a premium, where documents are fairly complex, or where using a SAX pipeline is useful. By using a utility such as the SAXAdapter outlined in this document the SAX API can be made more accessible to developers while retaining its power and flexibility.