CS118 Programming Languages
Lecture 12
XML in the large
eXtensible Markup Language
                CSS  XSL  XSLT  XHTML
                          ^
                          |
                          | presentation
XPointer                  |
XPath    <--navigation-- XML ----api------> DOM (Java, JavaScript...)
XLink                     |                 SAX
                          | definitions
                          |
                          v
                     DTD  Schema
Not all of the above are freeware. Some are "real expensive".
Scott McNealy says "the net is the computer". Well, these are its programming languages. And, as in the 1960s for conventional programming languages, chaos reigns and nice guys do not necessarily finish first. Alas, poor Algol-60, I knew thee well...

DOM
Document Object Model

Here is a bit of Java code that prints out the name of the root node of the file example.xml:
import java.io.File;
import org.w3c.dom.Document; // note URL style library names
import org.w3c.dom.Node;
import javax.xml.parsers.DocumentBuilderFactory;

public class Example {
    public static void main(String[] args) throws Exception {
        File xmlfile = new File("example.xml");
        Document doc
            = DocumentBuilderFactory
                .newInstance()
                .newDocumentBuilder()
                .parse(xmlfile);     // what if it is a BIG file?
        doc.normalize();             // merge adjacent text nodes
        Node root = doc.getDocumentElement();
        System.out.println("node name = " + root.getNodeName());
    }
}
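
The program above stops at the root element. As a minimal sketch of going one step further, the following walks the root's children by position (NodeList) and reads an attribute by name (NamedNodeMap). The class name DomWalk, the helpers childNames and attribute, and the sample XML string are illustrative assumptions, not part of the lecture; the XML is parsed from a string so that the example needs no file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper class, for illustration only.
public class DomWalk {

    // Parse an XML string into a DOM tree; the whole document is held in memory.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    // By position: names of the element children, in document order.
    // Text and comment nodes are skipped.
    public static String childNames(Node parent) {
        StringBuilder sb = new StringBuilder();
        NodeList nl = parent.getChildNodes();
        for (int i = 0; i < nl.getLength(); i++) {
            Node n = nl.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                if (sb.length() > 0) sb.append(",");
                sb.append(n.getNodeName());
            }
        }
        return sb.toString();
    }

    // By name: value of the named attribute, or null if it is absent.
    public static String attribute(Node element, String name) {
        NamedNodeMap attrs = element.getAttributes();
        if (attrs == null) return null;   // only elements carry attributes
        Node attr = attrs.getNamedItem(name);
        return attr == null ? null : attr.getNodeValue();
    }

    public static void main(String[] args) {
        // Made-up sample document.
        String xml = "<course id=\"CS118\"><lecture/><lecture/><exam/></course>";
        Element root = parse(xml).getDocumentElement();
        System.out.println("root = " + root.getNodeName());
        System.out.println("children = " + childNames(root));
        System.out.println("id = " + attribute(root, "id"));
    }
}
```

Filtering on Node.ELEMENT_NODE matters in practice: whitespace between tags shows up as Text nodes in the child list, so item(0) is not always the first element.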
One can write similar code in JavaScript, Perl, Visual Basic, or any other language with access to an XML parser. The actual API is different for each programming language, but all should implement the W3C DOM spec (provide the primitives). Anyone in this class could implement DOM; it is just a parser and some readouts. But others have already done it, so why bother (unless money matters)?

The organizing feature is that the document (defined in XML and perhaps DTD) is represented as a walkable object. There are two ways to look at an XML element: by position and by name. The two ways are reflected in two object types.

By position:
    NodeList nl = root.getChildNodes();
    Node a = nl.item(0);
    Node b = nl.item(1);           // null if none
    Node c = root.getFirstChild(); // same as a
    Node d = a.getNextSibling();   // same as b
By name:
    NamedNodeMap nnm;              // used for attributes

There is a rich library of W3C standard interfaces for walking around the document object. E.g.
    Attr, CDATASection, Comment, Document, DocumentFragment, DocumentType,
    DOMImplementation, Element, Entity, EntityReference, Notation,
    ProcessingInstruction, Text
There are also many Microsoft extensions which may, or may not, find their way into the standard.

The call to parse is the fundamental weakness of DOM. It is an all-or-nothing proposition: the whole document must be read and built into a tree before any of it can be used. That is OK for small XML files but infeasible for large ones, especially those pulled over the net. It is inefficient to represent large databases as XML files, but it is possible to give query results as (smaller) XML files. This leaves data presentation to the XSLT or DOM recipient, which is as it should be.

SAX
Simple API for XML

Sequential, event driven, no standard yet. Analogous to the canonical parse instead of the parse tree. You could implement SAX based on DOM, but that would be doubly inefficient.

JavaScript

It once was LiveScript, but Sun got it changed to look like Java and be named like Java. So, Microsoft hates it and refused to accept the name.
It has a standard, called ECMAScript, since nobody objected to it being named after the European Computer Manufacturers Association standards body.
Example: ~mckeeman/src/javascript/minesweeper/minesweeper.html
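
Returning to the SAX description above: a minimal sketch of the sequential, event-driven style in Java, assuming the org.xml.sax interfaces that ship with current Java releases. The class name SaxEvents, the events helper, and the sample XML are illustrative assumptions. The parser calls back once per start tag and end tag as it reads, and no tree is ever built.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, for illustration only.
public class SaxEvents {

    // Record the start/end element events in the order the parser reports them.
    public static List<String> events(String xml) {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                log.add("start " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end " + qName);
            }
        };
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                           handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return log;
    }

    public static void main(String[] args) {
        // Made-up sample document.
        for (String event : events("<a><b/><c>text</c></a>")) {
            System.out.println(event);
        }
    }
}
```

The event log is the "canonical parse" in miniature: the same information a tree would hold, delivered in reading order, so a handler can discard each piece as soon as it has dealt with it.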