CS118 Programming Languages

Lecture 12


Contents

  1. XML in the large
  2. DOM
  3. SAX
  4. JavaScript
  5. Assignments

XML in the large

eXtensible Markup Language

                  CSS XSL XSLT XHTML
                          ^
                          |
                          | presentation
XPointer                  |
XPath   <--navigation--  XML ----api------> DOM (Java, JavaScript...)
XLink                     |                 SAX
                          | definitions
                          |
                          v
                      DTD Schema
Not all of the above are freeware. Some are "real expensive".

Scott McNealy says "the net is the computer". Well, these are its programming languages. And, as in the 1960s for conventional programming languages, chaos reigns and nice guys do not necessarily finish first. Alas, poor Algol-60, I knew thee well...

DOM

Document Object Model

Here is a bit of Java code that prints out the name of the root node of file example.xml:

import java.io.File;
import org.w3c.dom.Document;  // note URL style library names
import org.w3c.dom.Node;
import javax.xml.parsers.DocumentBuilderFactory;

public class Example {
  public static void
  main(String[] args) throws Exception {
    File xmlfile = new File("example.xml");
    Document doc
      = DocumentBuilderFactory
      . newInstance()
      . newDocumentBuilder()
      . parse(xmlfile);             // what if it is a BIG file?

    doc.normalize();                // merge adjacent text nodes
    Node root = doc.getDocumentElement();
    System.out.println("node name = " + root.getNodeName());
  }
}

One can write similar code in JavaScript, Perl, Visual Basic, or any other language with access to an XML parser. The actual API is different for each programming language, but all should implement the W3C DOM spec (that is, provide its primitives). Anyone in this class could implement DOM -- it is just a parser and some readouts. But others have already done it, so why bother (unless money matters)?

The organizing feature is that the document (defined in XML and perhaps DTD) is represented as a walkable object. There are two ways to look at an XML element: by position and by name. The two ways are reflected into two object types. By position:

  NodeList nl = root.getChildNodes();
  Node     a  = nl.item(0); 
  Node     b  = nl.item(1);            // null if none
  Node     c  = root.getFirstChild();  // same as a
  Node     d  = a.getNextSibling();    // same as b
By name:
  NamedNodeMap nnm = root.getAttributes();  // used for attributes
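
The two views can be combined in one small program. What follows is only a sketch, not part of the original lecture code: the class name DomWalk, its helper methods, and the inline XML string are made up for illustration, and the document is parsed from a string so that no file is needed. The DOM calls themselves (getChildNodes, item, getAttributes, getNamedItem) are the standard org.w3c.dom ones.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class DomWalk {
  // Hypothetical helper: parse XML held in a string and return its root element.
  static Element parseRoot(String xml) throws Exception {
    Document doc = DocumentBuilderFactory
      .newInstance()
      .newDocumentBuilder()
      .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    return doc.getDocumentElement();
  }

  // By position: names of the element children, in document order.
  static List<String> childElementNames(Node parent) {
    List<String> names = new ArrayList<>();
    NodeList nl = parent.getChildNodes();
    for (int i = 0; i < nl.getLength(); i++) {
      Node child = nl.item(i);
      if (child.getNodeType() == Node.ELEMENT_NODE) {  // skip text, comments...
        names.add(child.getNodeName());
      }
    }
    return names;
  }

  // By name: value of one attribute, or null if it is absent.
  static String attribute(Node element, String name) {
    NamedNodeMap attrs = element.getAttributes();  // null for non-element nodes
    if (attrs == null) return null;
    Node attr = attrs.getNamedItem(name);
    return attr == null ? null : attr.getNodeValue();
  }

  public static void main(String[] args) throws Exception {
    Element root = parseRoot("<course id=\"CS118\"><lecture/><assignment/></course>");
    System.out.println("children = " + childElementNames(root));
    System.out.println("id       = " + attribute(root, "id"));
  }
}
```

Note that the child list holds every kind of node, including whitespace text between tags, which is why the positional walk filters on the node type.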

There is a rich library of W3C standard interfaces, each with its own functions, to walk around the document object. E.g.

Attr CDATASection Comment 
Document DocumentFragment DocumentType
DOMImplementation 
Element Entity EntityReference 
Notation ProcessingInstruction Text
There are also many Microsoft extensions which may, or may not, find their way into the standard.
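
To give a feel for how the standard node types are used while walking a document, here is a minimal sketch of a recursive walk that builds an indented outline. The class name DomOutline and its methods are invented for this example, and it only distinguishes Element, Text and Comment nodes; everything else is lumped together as "Other".

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

class DomOutline {
  // Append one line for this node, indented by depth, then recurse into its children.
  static void outline(Node node, int depth, StringBuilder out) {
    for (int i = 0; i < depth; i++) out.append("  ");
    switch (node.getNodeType()) {
      case Node.ELEMENT_NODE:
        out.append("Element ").append(node.getNodeName());
        break;
      case Node.TEXT_NODE:
        out.append("Text \"").append(node.getNodeValue().trim()).append('"');
        break;
      case Node.COMMENT_NODE:
        out.append("Comment");
        break;
      default:
        out.append("Other (type ").append(node.getNodeType()).append(')');
    }
    out.append('\n');
    // Walk the children by following sibling links.
    for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
      outline(c, depth + 1, out);
    }
  }

  // Hypothetical convenience wrapper: outline of a document given as a string.
  static String outlineOf(String xml) throws Exception {
    Document doc = DocumentBuilderFactory
      .newInstance()
      .newDocumentBuilder()
      .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    StringBuilder out = new StringBuilder();
    outline(doc.getDocumentElement(), 0, out);
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.print(outlineOf("<a><b>hi</b><!--note--></a>"));
  }
}
```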

The call to parse is the fundamental weakness of DOM. It is an all or nothing proposition, which is OK for small XML files, but infeasible for large XML files, especially those pulled over the net.

It is inefficient to represent large databases as XML files, but it is possible to give query results as (smaller) XML files. This leaves data presentation to the XSLT or DOM recipient, which is as it should be.

SAX

Simple API for XML

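Sequential and event-driven; no formal (W3C) standard yet. The parser reports each piece of the document to the application as it is encountered, so it is analogous to the canonical parse instead of the parse tree. You could implement SAX based on DOM, but that would be doubly inefficient.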
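
For contrast with the DOM code earlier, here is a minimal sketch of the event-driven style, using the SAX parser that ships with Java (javax.xml.parsers.SAXParserFactory and org.xml.sax.helpers.DefaultHandler). The class name SaxEvents and the idea of logging events into a list are assumptions made for illustration. The handler is called as the input is read, and no tree is ever built.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxEvents {
  // Parse the XML text and return the events in the order the parser reported them.
  static List<String> events(String xml) throws Exception {
    final List<String> log = new ArrayList<>();
    DefaultHandler handler = new DefaultHandler() {
      @Override
      public void startElement(String uri, String localName, String qName, Attributes atts) {
        log.add("start " + qName);
      }
      @Override
      public void endElement(String uri, String localName, String qName) {
        log.add("end " + qName);
      }
      @Override
      public void characters(char[] ch, int start, int length) {
        // A parser may deliver one run of text in several chunks.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) log.add("text " + text);
      }
    };
    SAXParserFactory
      .newInstance()
      .newSAXParser()
      .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
    return log;
  }

  public static void main(String[] args) throws Exception {
    for (String e : events("<a><b>hi</b><c/></a>")) {
      System.out.println(e);
    }
  }
}
```

A real application would act on each event instead of logging it, keeping only as much state as it needs, which is what makes this style workable for BIG files.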

JavaScript

It once was LiveScript, but Sun got it changed to look like Java and be named like Java. So, Microsoft hates it and refused to accept the name. It has a standard, called ECMAScript since nobody objected to it being named after the European Computer Manufacturers Association standards body.

Example: ~mckeeman/src/javascript/minesweeper/minesweeper.html
