Thursday, 17 September 2009 17:54

XML for Small Devices


15.1 What Is XML?

XML stands for eXtensible Markup Language. Like other markup languages, it uses nested text tags to enclose content and represent data structure. XML itself does not define a specific set of tags or structures. You can define any tags you like in your XML document, as long as your applications and the other communicating parties understand them. Hence, XML is extensible. For example, you can use the following XML document to describe a small computer parts inventory.

Listing 15.1. A sample XML document
    Athlon 1.5GHz    AMD    100.0    10000  

    Inkjet color printer    HP    120.0    1000  

Markup languages have been around since the 1970s. In fact, HTML itself is a markup language. What makes XML so special? There are several reasons.

  • XML has rich expressive power. Nested tag elements allow us to easily express hierarchical data structures. More importantly, together with technologies such as XML Schema, XML supports strong data typing. Strongly typed data is fundamental to object-oriented systems (e.g., Java applications). Please see Chapter 16 for more on XML Schema and XML data types.

  • XML is both machine and human friendly. Unlike HTML, XML has strict syntax requirements for easy and fast parsing. For example, every XML start tag must have a matching end tag, and they must be properly nested. Since an XML document contains descriptive tag information, it is easy for humans to read and understand.

  • XML promotes open standards. XML's extensibility and flexibility allow it to be adopted across many industries. To become a ubiquitous data exchange format, XML must be standardized. XML namespaces and schema are already standardized by the W3C. Many industry-specific and application-specific XML formats are also being standardized. Today, XML is among the most interoperable data formats across platforms.
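To make the strict-syntax point concrete, here is a minimal sketch that checks whether a piece of text is well-formed XML. It uses the standard Java SE JAXP parser rather than a J2ME parser, and the class name, method name, and the item and name tags are invented for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.helpers.DefaultHandler;

class WellFormedCheck {

  // Returns true if the text parses as well-formed XML.
  static boolean isWellFormed(String xml) {
    try {
      DocumentBuilder builder =
          DocumentBuilderFactory.newInstance().newDocumentBuilder();
      // DefaultHandler rethrows fatal errors, so problems surface
      // only as exceptions.
      builder.setErrorHandler(new DefaultHandler());
      builder.parse(new ByteArrayInputStream(
          xml.getBytes(StandardCharsets.UTF_8)));
      return true;
    } catch (Exception e) {
      // A syntax violation (e.g., crossed tags) ends up here.
      return false;
    }
  }

  public static void main(String[] args) {
    // Properly nested tags.
    System.out.println(isWellFormed("<item><name>Athlon</name></item>"));
    // End tags in the wrong order.
    System.out.println(isWellFormed("<item><name>Athlon</item></name>"));
  }
}
```

The first call prints true and the second prints false, because the end tags of the second document are not properly nested.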

15.2 Challenges for Small Devices

XML and Java champion similar ideas such as open interfaces, platform independence, and object-oriented data. XML support on the Java platform has traditionally been very strong. All popular XML standards are supported on the J2SE and J2EE platforms. However, XML support on the J2ME platform, especially the CLDC/MIDP platform, is still in development. XML-compatible mobile clients have been slow to emerge because of the additional processing time and bandwidth that XML tags require. But eventually, XML will become the ubiquitous data format for mobile applications. Throughout this and the following chapters, we will see several real-world examples that showcase the capabilities of mobile XML clients.

Since CDC and PersonalJava have full support of core Java libraries such as IO and String, we can port J2SE and J2EE XML libraries to those platforms or even run compiled J2SE XML libraries directly. Having said that, specially optimized, fast, and lightweight XML parsers are still preferred on the mobile platform.

On the CLDC platform, we need specially written lightweight XML parsers. Several CLDC-compatible XML parsers are available from various commercial and open source projects. In this chapter, we focus on the open source kXML package. For more parsers, please refer to the "Resources" section. CLDC parsers provide only the most basic functionality, and none of them validates XML messages against document type definitions (DTDs) or schemas.

15.3 XML Parsing Models

The XML parser converts text-based XML documents to memory objects accessible to computer programs. There are several ways to parse an XML document.

15.3.1 SAX

SAX (Simple API for XML) is an event-based parsing model. The parser goes through the entire document in a linear pass. When the parser encounters an XML construct (such as a tag, a piece of enclosed text, or an attribute), it emits an event. The events are captured and processed by event-handler methods. The application developer implements the event handler to do application-specific tasks with those events. kXML v1.2 supports SAX. Figure 15.1 illustrates the SAX parsing process.
 

Figure 15.1. SAX parsing process.
 

 

A simple SAX application looks like the following (Listing 15.2).

Listing 15.2. A simple SAX program
// Create a SAX parser.
// This is implementation specific.
Parser p = new SAXParser ();

// "handler" is the callback object that processes SAX events.
p.setDocumentHandler( handler );

// "source" is the org.xml.sax.InputSource of the document to parse.
p.parse( source );

Although SAX is simple to implement and very popular, it has been superseded by the newer pull-based parsing APIs. For more examples using SAX, please refer to Section 17.2.
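Listing 15.2 leaves the handler itself out. As a rough sketch of what one looks like, the following program uses the SAX2 classes bundled with Java SE (not kXML v1.2, whose setup differs) to collect the text of every name element. The class name, method name, and the inventory, part, and name tags are assumptions made for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxNames {

  // Collects the text of every <name> element using SAX callbacks.
  static List<String> collectNames(String xml) {
    final List<String> names = new ArrayList<>();
    DefaultHandler handler = new DefaultHandler() {
      private StringBuilder text;  // non-null only while inside <name>

      @Override
      public void startElement(String uri, String local,
                               String qName, Attributes atts) {
        if (qName.equals("name")) text = new StringBuilder();
      }

      @Override
      public void characters(char[] ch, int start, int length) {
        // Text may arrive in several chunks, so accumulate it.
        if (text != null) text.append(ch, start, length);
      }

      @Override
      public void endElement(String uri, String local, String qName) {
        if (qName.equals("name")) {
          names.add(text.toString().trim());
          text = null;
        }
      }
    };
    try {
      // The parser pushes events into the handler until the document ends.
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
    } catch (Exception e) {
      throw new IllegalStateException("SAX parsing failed", e);
    }
    return names;
  }

  public static void main(String[] args) {
    String xml = "<inventory><part><name>Athlon 1.5GHz</name></part>"
        + "<part><name>Inkjet color printer</name></part></inventory>";
    System.out.println(collectNames(xml));
  }
}
```

Note how the handler needs a field (text) just to remember whether it is inside a name element. This kind of state tracking grows quickly as documents become more complex.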

15.3.2 XMLPull

One big problem with the SAX model is that it is push based: once parsing has started, parsing events are pushed in continuously. The parser runs through the entire XML document in one pass, and developers have no control over the flow of the parsing process. For example, suppose that you are looking for a specific piece of information located in the middle of an XML document. Under the SAX model, you cannot stop parsing after you retrieve the data. The parser keeps going until it finishes the entire document. This is inefficient, especially for mobile clients.

The XmlPull API gives developers more control over the parsing flow. Since the parser is pull based, the application can pause the parsing to take care of other things and come back later, or it can even stop the parsing before the end of the document is reached. kXML v2.0 supports the XmlPull v1.0 API. Figure 15.2 illustrates the XmlPull parsing process.

Figure 15.2. XmlPull parsing process.
 

The heart of the XmlPull API is the XmlPullParser interface. XmlPull providers supply their own XmlPullParser implementation through the XmlPullParserFactory factory class. XmlPullParser defines a number of event types (e.g., the START_TAG event) and data access methods (e.g., the getAttributeValue() method). Core methods to control the parsing flow are next() and nextToken().

  • The next() method advances the parser to the next event. Event types seen by the next() method are START_TAG, TEXT, END_TAG, and END_DOCUMENT.

  • The nextToken() method gives developers a finer control. It sees all the events the next() method sees. In addition, it sees and reports the following events: COMMENT, CDSECT, DOCDECL, ENTITY_REF, PROCESSING_INSTRUCTION, and IGNORABLE_WHITESPACE.

The use of an XmlPull parser is illustrated in Section 15.5.
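Before moving on, here is a small sketch of the "stop early" idea. It does not use XmlPull itself: it uses StAX (javax.xml.stream), the pull-style parser that ships with Java SE, whose method and constant names differ from XmlPullParser. The class name, method name, and sample tags are invented for the example.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class PullEarlyStop {

  // Returns the text of the first element named tag, or null if
  // there is none. The loop stops pulling events once it is found.
  static String firstText(String xml, String tag) {
    try {
      XMLStreamReader r = XMLInputFactory.newInstance()
          .createXMLStreamReader(new StringReader(xml));
      try {
        while (r.hasNext()) {
          // The application asks for the next event; nothing is pushed.
          int event = r.next();
          if (event == XMLStreamConstants.START_ELEMENT
              && r.getLocalName().equals(tag)) {
            // Found it: return without reading the rest of the document.
            return r.getElementText().trim();
          }
        }
        return null;
      } finally {
        r.close();
      }
    } catch (Exception e) {
      throw new IllegalStateException("Pull parsing failed", e);
    }
  }

  public static void main(String[] args) {
    String xml = "<part><name>Athlon 1.5GHz</name>"
        + "<manufacturer>AMD</manufacturer><price>100.0</price></part>";
    System.out.println(firstText(xml, "manufacturer"));
  }
}
```

The return statement inside the loop is the whole point: with a push parser, the handler would keep receiving events for the price element and everything after it.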

15.3.3 Document Model

Both SAX and XmlPull treat the hierarchical XML data structure as a linear flow. Reconstructing the data into a logical tree structure requires the developer to control the event handler carefully with flags and case statements. This kind of code is hard to write and hard to debug. Also, SAX and XmlPull support only serial access: we do not have random access to any node in the document. The document model parsers come to the rescue.

A document model parser is essentially a SAX or XmlPull parser with a predefined event handler that stores XML information into an in-memory tree. The application can then access and manipulate any data in the tree model using a set of API methods. Figure 15.3 illustrates the DOM (Document Object Model) parsing process.

Figure 15.3. DOM parsing process.
 

The building block of any document model is the Node object. The Node class defines methods that allow multiple Node objects to be linked into a tree structure. The XML document is represented by such a tree. Objects representing other XML constructs, such as elements, attributes, and text, inherit from Node.

A standard XML document model API is DOM, which is defined by the W3C. However, the standard DOM is quite complex. There are several easy-to-use DOM-like APIs, such as the JDOM API, in the Java world. For mobile devices, full support for DOM has proven too expensive. A particularly interesting lightweight XML object model is kDOM, supported by both kXML v1.2 and v2.0. Code examples for kDOM are given in Section 15.6.
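As a point of comparison with the event-based models, the following sketch shows random access with the standard W3C DOM classes in Java SE. This is the full DOM, not kDOM, and the class name, method name, and sample tags are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DomLookup {

  // Builds the whole tree in memory, then asks directly for the
  // index-th element named tag. Returns null if there is no such node.
  static String textOf(String xml, String tag, int index) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      Node node = doc.getElementsByTagName(tag).item(index);
      return node == null ? null : node.getTextContent().trim();
    } catch (Exception e) {
      throw new IllegalStateException("DOM parsing failed", e);
    }
  }

  public static void main(String[] args) {
    String xml = "<inventory>"
        + "<part><name>Athlon 1.5GHz</name></part>"
        + "<part><name>Inkjet color printer</name></part>"
        + "</inventory>";
    // Random access: ask for the second name directly.
    System.out.println(textOf(xml, "name", 1));
  }
}
```

The application code is short, but the entire document sits in memory while it runs, which is the cost that makes full DOM expensive on small devices.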

15.4 Introducing Amazon XML Services

The Amazon Web Services interface (v2.0) is designed to provide Amazon associates programmatic access to Amazon catalog data and search functionalities. Amazon Web Services is available in two flavors.


 

  • A standard SOAP Web Service. The application architecture here is SOAP RPC. Query and response data are all enclosed in SOAP messages.

  • A literal XML service. In this model, the query is encoded as parameters in the request URL. The Amazon server returns an XML document containing the response data. The tags and structure of the returned document are defined in DTD files provided by Amazon.

The Amazon service can operate in lite or heavy mode. A lite mode response is smaller but contains less information. Due to the bandwidth and processing power limits of mobile clients, we use the lite mode in this chapter. To use the Amazon service, you have to first obtain an authentication token from the Amazon Web site. Then, you encode query parameters into a URL string. For example, the following URL (without line breaks)

http://xml.amazon.com/onca/xml?v=1.0

&t=webservices-20&dev-t=ABCD123456

&KeywordSearch=mobile%20java

&mode=books&type=lite&page=1&f=xml

specifies that a user identified by the token ABCD123456 wants to search for the keywords "mobile java" in the books store. Note that the space between the keywords is encoded as %20 per URL encoding rules. The first page of results is returned as a lite-version, literal XML document. Listing 15.3 shows the returned XML document. I have added some white space and replaced long strings with ellipses (...) to make it more readable.
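Assembling such a URL in code might look like the sketch below. The class name, method name, and parameter names are invented for this example, and it relies on java.net.URLEncoder, a Java SE class that may not be available on CLDC/MIDP devices, where a hand-written encoder would be needed.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class AmazonQueryUrl {

  // Builds a lite-mode keyword search URL for the books store.
  static String build(String associateTag, String devToken,
                      String keywords, int page) {
    String encoded;
    try {
      // URLEncoder targets HTML forms and writes a space as '+',
      // so convert it to the %20 form used in the chapter's URL.
      // A literal '+' in the input is already encoded as %2B.
      encoded = URLEncoder.encode(keywords, "UTF-8").replace("+", "%20");
    } catch (UnsupportedEncodingException e) {
      throw new IllegalStateException("UTF-8 is not supported", e);
    }
    return "http://xml.amazon.com/onca/xml?v=1.0"
        + "&t=" + associateTag
        + "&dev-t=" + devToken
        + "&KeywordSearch=" + encoded
        + "&mode=books&type=lite&page=" + page + "&f=xml";
  }

  public static void main(String[] args) {
    System.out.println(
        build("webservices-20", "ABCD123456", "mobile java", 1));
  }
}
```

With the arguments shown in main(), the program prints the same URL as above on a single line.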

Listing 15.3. A sample Amazon search response message
      
0471034657 Mobile Information Device Profile for Java 2 Micro Edition (J2ME): Professional Developer's Guide Book C. Enrique Ortiz Eric Giguere 15 January, 2001 John Wiley & Sons http://images.amazon.com/...THUMBZZZ.jpg http://images.amazon.com/...MZZZZZZZ.jpg http://images.amazon.com/...LZZZZZZZ.jpg $49.99 $49.99 $28.99
... ...
... ...

Our sample MIDlet, AmazonLite, demonstrates how to parse the Amazon lite document. AmazonLite first prompts the user for the keywords to search. Then, the user chooses which XML parsing mode to test: the Pull button is for XmlPull, and the kDOM button is for the document model. The MIDlet then sends out the query, receives the data, parses the response document, and displays the extracted data in a new form. Figure 15.4 shows the program in action.

Figure 15.4. Mobile access to Amazon XML.
 

XML parsing in AmazonLite is done by the kXML parser. kXML is an open source XML parser that is compatible with all J2ME platforms. It is developed under the Enhydra ME project and released under the CPL license. kXML v2.0 supports the XmlPull interfaces as well as the kDOM document model.

15.5 Amazon Services via XmlPull

Using XmlPull, we parse the document linearly. The application has to remember the state information to retrieve XML contents based on the context. The code is listed in Listings 15.4 and 15.5. The logic flow is the following:

  1. When method getBooksViaPull() encounters a Details start tag, it passes the parser control to method getBookDetailsViaPull().

  2. Method getBookDetailsViaPull() instantiates a new BookDetails object and stores the value of the url attribute in it.

  3. When method getBookDetailsViaPull() encounters ProductName, Authors, OurPrice, and UsedPrice tags, it stores their text values into the appropriate fields of the BookDetails object.

  4. At the Details close tag, method getBookDetailsViaPull() returns the populated BookDetails object. Method getBooksViaPull() stores the BookDetails object in the Vector books and moves forward to the next Details start tag.

After the parsing is done, all useful data has been extracted and stored in the books Vector.
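The BookDetails class itself is not listed in this chapter. A plausible minimal version, inferred only from the fields that the listings assign, could look like the following sketch. The toString() method is an addition for illustration, and the real class may well differ.

```java
// A plain data holder for one search result. The field names are
// taken from the assignments in the parsing listings; everything
// else here is an assumption.
class BookDetails {
  String url;          // value of the url attribute on the Details tag
  String title;        // text of ProductName
  String firstAuthor;  // text of the first author under Authors
  String newPrice;     // text of OurPrice
  String usedPrice;    // text of UsedPrice

  // Convenience method for debugging output (not in the original).
  public String toString() {
    return title + " by " + firstAuthor
        + " (new: " + newPrice + ", used: " + usedPrice + ")";
  }

  public static void main(String[] args) {
    BookDetails bd = new BookDetails();
    bd.title = "Sample Title";
    bd.firstAuthor = "Sample Author";
    bd.newPrice = "$49.99";
    bd.usedPrice = "$28.99";
    System.out.println(bd);
  }
}
```

Package-private fields with no accessors keep the class small, which matters on devices where every class adds to the application's footprint.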

Listing 15.4. The AmazonLite.getBooksViaPull() method
Vector getBooksViaPull (InputStream is) throws Exception {
  Vector books = new Vector ();
  InputStreamReader reader = new InputStreamReader(is);
  KXmlParser parser = new KXmlParser();
  parser.setInput(reader);

  int eventType = parser.getEventType();
  while (eventType != parser.END_DOCUMENT) {
    // Only respond to the <Details> start tag
    if (eventType == parser.START_TAG) {
      if ( parser.getName().equals("Details") ) {
        BookDetails bd = getBookDetailsViaPull(parser);
        books.addElement( bd );
      }
    }
    eventType = parser.next();
  }
  return books;
}

Listing 15.5. The AmazonLite.getBookDetailsViaPull() method
BookDetails getBookDetailsViaPull (XmlPullParser parser)
                                       throws Exception {
  BookDetails bd = new BookDetails ();

  // Get the url attribute value from the <Details> start tag
  bd.url = parser.getAttributeValue(null, "url");

  int eventType = parser.next();
  while ( true ) {
    // Break out of the loop at the </Details> end tag
    if ( eventType == parser.END_TAG ) {
      if ( parser.getName().equals("Details") ) {
        break;
      }
    }

    if ( eventType == parser.START_TAG ) {
      String tagname = parser.getName();

      if ( tagname.equals("ProductName") ) {
        // Proceed to the enclosed Text node
        parser.next();
        bd.title = parser.getText().trim();
      }
      if ( tagname.equals("Authors") ) {
        // This assumes white space between the tags:
        // White space after the <Authors> start tag
        parser.next();
        // First <Author> start tag
        parser.next();
        // Proceed to the enclosed Text node
        parser.next();
        bd.firstAuthor = parser.getText().trim();
      }
      if ( tagname.equals("OurPrice") ) {
        // Proceed to the enclosed Text node
        parser.next();
        bd.newPrice = parser.getText().trim();
      }
      if ( tagname.equals("UsedPrice") ) {
        // Proceed to the enclosed Text node
        parser.next();
        bd.usedPrice = parser.getText().trim();
      }
    }
    eventType = parser.next();
  }
  return bd;
}

15.6 Amazon Services via kDOM

Using the kDOM document model, parsing is very simple. Just a few lines of code build the kDOM tree in a Document object, doc, from the input XML stream.

InputStreamReader reader = new InputStreamReader(is);
KXmlParser parser = new KXmlParser();
parser.setInput(reader);
Document doc = new Document ();
doc.parse (parser);

The remaining methods in the AmazonLite class, getBooksViaDOM() (Listing 15.6) and getBookDetailsViaDOM() (Listing 15.7), demonstrate how to traverse the tree to retrieve useful information. By default, all ignorable white space is built into Text nodes. We have to be careful not to mistake them for real Element nodes. Since the tree object is already in memory, you can access any node at any time. You can even change the content of any node and have kDOM write out the new tree to an I/O stream.
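The white space issue is not specific to kDOM. The following sketch shows the same effect with the standard W3C DOM parser in Java SE, whose API differs from kDOM's isText() and getElement() methods. The class name, method names, and the list root tag are made up for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomWhitespace {

  // Parses the document and returns the children of its root element.
  static NodeList rootChildren(String xml) {
    try {
      return DocumentBuilderFactory.newInstance().newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)))
          .getDocumentElement().getChildNodes();
    } catch (Exception e) {
      throw new IllegalStateException("DOM parsing failed", e);
    }
  }

  // Counts every child node, including white space Text nodes.
  static int childCount(String xml) {
    return rootChildren(xml).getLength();
  }

  // Returns only the names of the children that are elements.
  static List<String> childElementNames(String xml) {
    List<String> names = new ArrayList<>();
    NodeList kids = rootChildren(xml);
    for (int i = 0; i < kids.getLength(); i++) {
      Node n = kids.item(i);
      if (n.getNodeType() == Node.ELEMENT_NODE) {
        names.add(n.getNodeName());
      }
      // Anything else here is a white space Text node: skip it.
    }
    return names;
  }

  public static void main(String[] args) {
    String xml = "<list>\n  <Details/>\n  <Details/>\n</list>";
    System.out.println(childCount(xml));         // all children
    System.out.println(childElementNames(xml));  // elements only
  }
}
```

For this input the root has five children but only two of them are elements, which is why the listings below test each child before treating it as an element.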

Listing 15.6. The AmazonLite.getBooksViaDOM() method
Vector getBooksViaDOM (InputStream is) throws Exception {
  Vector books = new Vector ();

  InputStreamReader reader = new InputStreamReader(is);
  KXmlParser parser = new KXmlParser();
  parser.setInput(reader);
  Document doc = new Document ();
  doc.parse (parser);

  // Use the following code to write
  // in memory doc object to a stream
  // KXmlSerializer serializer = new KXmlSerializer ();
  // serializer.setOutput (System.out, null);
  // doc.write (serializer);
  // serializer.flush ();

  // The root element, whose child elements are <Details>
  Element prods = doc.getRootElement();

  int numOfEntries = prods.getChildCount ();
  for (int i = 0; i < numOfEntries; i++) {
    if ( prods.isText(i) ) {
      // Text here are all insignificant white spaces.
      // We are only interested in children elements
    } else {
      // Not text, must be a <Details> element
      Element e = prods.getElement (i);
      BookDetails bd = getBookDetailsViaDOM( e );
      books.addElement( bd );
    }
  }
  return books;
}

Listing 15.7. The AmazonLite.getBookDetailsViaDOM() method
BookDetails getBookDetailsViaDOM (Element e) throws Exception {
  BookDetails bd = new BookDetails ();

  // Get the url attribute value from the <Details> start tag
  bd.url = e.getAttributeValue(null, "url");

  int numOfChildren = e.getChildCount ();
  for (int i = 0; i < numOfChildren; i++) {
    if ( e.isText(i) ) {
      // Ignore
    } else {
      Element c = e.getElement(i);
      String tagname = c.getName();

      if ( tagname.equals("ProductName") ) {
        // First child is a text node
        bd.title = c.getText(0).trim();
      }
      if ( tagname.equals("Authors") ) {
        // Goes down the tree: The second child
        // is the first element. Get the
        // first child of that element.
        bd.firstAuthor =
            c.getElement(1).getText(0).trim();
      }
      if ( tagname.equals("OurPrice") ) {
        // First child is a text node
        bd.newPrice = c.getText(0).trim();
      }
      if ( tagname.equals("UsedPrice") ) {
        // First child is a text node
        bd.usedPrice = c.getText(0).trim();
      }
    }
  }
  return bd;
}

15.7 A Mobile RSS Client

The last example of XML parser use in this chapter is a mobile Really Simple Syndication (RSS) client. RSS is a widely used XML format for news and blog sites to feed their headline contents to aggregators. For news sites, RSS makes it possible to advertise their headlines and provides links back to their sites; for aggregators, the use of RSS avoids HTML screen scraping and makes it possible to automatically aggregate a large number of sites.

15.7.1 A Simple RSS Example

RSS is designed to be simple. It is readable by humans and can be easily parsed by machines. Listing 15.8 shows a simple RSS feed. The channel element represents a content source, and it can contain multiple item elements. Each item element represents a headline or a story. Under the channel or the item element, the title, link, and description nodes contain the title, URL link, and synopsis of the content. There are many more optional elements defined in the RSS specification (see "Resources" for more information).
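To show how little code the channel/item structure demands, here is a sketch that extracts the headlines from a feed. It uses StAX (javax.xml.stream) from Java SE rather than kXML, and the class name, method name, the feed text in main(), and the rss version attribute are all assumptions made for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class RssHeadlines {

  // Returns "title -> link" for every <item>. The channel's own
  // <title> and <link> are skipped because they sit outside <item>.
  static List<String> headlines(String rss) {
    List<String> result = new ArrayList<>();
    try {
      XMLStreamReader r = XMLInputFactory.newInstance()
          .createXMLStreamReader(new StringReader(rss));
      boolean inItem = false;
      String title = null, link = null;
      while (r.hasNext()) {
        int event = r.next();
        if (event == XMLStreamConstants.START_ELEMENT) {
          String name = r.getLocalName();
          if (name.equals("item")) {
            inItem = true;
            title = null;
            link = null;
          } else if (inItem && name.equals("title")) {
            title = r.getElementText().trim();
          } else if (inItem && name.equals("link")) {
            link = r.getElementText().trim();
          }
        } else if (event == XMLStreamConstants.END_ELEMENT
            && r.getLocalName().equals("item")) {
          // One complete headline has been read.
          result.add(title + " -> " + link);
          inItem = false;
        }
      }
      r.close();
    } catch (Exception e) {
      throw new IllegalStateException("RSS parsing failed", e);
    }
    return result;
  }

  public static void main(String[] args) {
    // A hypothetical feed with one item.
    String rss = "<rss version=\"0.91\"><channel>"
        + "<title>My Site</title><link>http://www.mysite.com</link>"
        + "<description>Latest Content</description>"
        + "<item><title>Product shipped</title>"
        + "<link>http://www.mysite.com/shipped</link>"
        + "<description>Details here</description></item>"
        + "</channel></rss>";
    System.out.println(headlines(rss));
  }
}
```

The inItem flag is what separates an item's title from the channel's title, since both use the same element name.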

Listing 15.8. A real simple RSS feed
http://www.mysite.com
Latest Content en-us Copyright 2003 myself 24/02/03 12:34:56 http://www.mysite.com/pic.gif http://www.DevX.com
Product shipped http://www.mysite.com/shipped 22/02/03 11:22:33
etc etc
etc

15.7.2 PeekAndPick

Jonathan Knudsen's PeekAndPick (v2.0) is an MIDP RSS client. It aggregates headlines from a number of news/blog sites specified by the user. The user can read headlines and have the link emailed to him via the mobile phone (see Figure 15.5).

Figure 15.5. PeekAndPick in action.
 
 

The complete code and design documentation of PeekAndPick are available (see "Resources"). In the source code package, classes under the rss package provide an RSS parser based on kXML's pull parser and its interface to other programming components. Listing 15.9 shows a code snippet from the kXML12Parser.parse() method, which parses the RSS stream. The flow is very simple. The program goes into a channel element, iterates through the item elements, and reads out the contents of the title, link, and description nodes. All other advanced RSS elements are ignored.

Listing 15.9. The kXML12Parser.parse() method
public void parse(InputStream in)
                 throws IOException {
  mCancel = false;

  Reader reader = new InputStreamReader(in);
  XmlParser parser = new XmlParser(reader);
  ParseEvent pe = null;

  parser.skip();
  pe = parser.read();
  String root = pe.getName();
  if (root.equals("rss")) {
    parser.skip();
    parser.read(Xml.START_TAG, null, "channel");
  }

  boolean trucking = true;
  boolean first = true;
  while (trucking && mCancel == false) {
    pe = parser.read();
    if (pe.getType() == Xml.START_TAG) {
      String name = pe.getName();
      if (name.equals("item")) {
        String title = null, link = null;
        String description = null;
        while ((pe.getType() != Xml.END_TAG) ||
            (pe.getName().equals(name) == false)) {
          pe = parser.read();
          if (pe.getType() == Xml.START_TAG &&
              pe.getName().equals("title")) {
            pe = parser.read();
            title = pe.getText();
          } else if (pe.getType() == Xml.START_TAG
              && pe.getName().equals("link")) {
            pe = parser.read();
            link = pe.getText();
          } else if (pe.getType() == Xml.START_TAG
              && pe.getName().equals("description")) {
            pe = parser.read();
            description = pe.getText();
          }
        }
        if (first) {
          if (mCancel == false) fireFirstItem();
          first = false;
        }
        if (mCancel == false)
          fireItemParsed(title, link, description);
      } else {
        while ((pe.getType() != Xml.END_TAG) ||
            (pe.getName().equals(name) == false))
          pe = parser.read();
      }
    }
    if (pe.getType() == Xml.END_TAG &&
        pe.getName().equals(root))
      trucking = false;
  }
  if (mCancel == false) fireFinished();
  mCancel = false;
}

The spirit of RSS is very similar to the concept of XML Web Services: offering services through a standard, interoperable interface. In the next chapter, we dive into SOAP XML Web Services.

Last modified on Wednesday, 23 September 2009 20:43
Vicky
