“XQuery is to XML as SQL is to relational databases.”
XQuery 1.0: An XML Query Language became a W3C Candidate Recommendation on 3 November 2005. The specification's abstract says,
XML is a versatile markup language, capable of labeling the information content of diverse data sources including structured and semi-structured documents, relational databases, and object repositories. A query language that uses the structure of XML intelligently can express queries across all these kinds of data, whether physically stored in XML or viewed as XML via middleware. This specification describes a query language called XQuery, which is designed to be broadly applicable across many types of XML data sources.
Charles Goldfarb invented SGML (1974), which had its beginnings in IBM's GML (1960s). Tim Berners-Lee used SGML to make HTML (1993). SGML is also the basis of the XML specification (1998), edited by Tim Bray, Jean Paoli and Michael Sperberg-McQueen. Michael Sperberg-McQueen is also one of the editors of the original TEI specification (1994).
XML has now become a popular medium for data storage and exchange, and a number of other specifications have been produced to help people work with it. One of these is XQuery, which grew out of a conference held in Boston in 1998 to discuss a query language for XML.
The following borrows heavily from the Wikipedia entries for XQuery and Functional Programming.
An XML document can be viewed as a tree with seven kinds of nodes (document, element, attribute, text, comment, processing instruction, namespace). Besides these seven node kinds, the XQuery data model has 50 “atomic” types. The 50 may be arranged under the headings of untyped, Boolean, numeric, string, calendar, qualified name, and other types.
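To make the node kinds and atomic types concrete, here is a small sketch, written for XQuery 1.0 but not tested against any particular processor. It builds a hypothetical book element and checks items against sequence types with "instance of"; every expression in the final sequence should return true. Namespace nodes, the seventh kind, are left out because XQuery 1.0 does not support the namespace axis that would select them.

```xquery
xquery version "1.0";

(: Hypothetical sample element. The processing instruction and the comment
   inside it become child nodes of their own kinds. :)
declare variable $book :=
  <book year="1994"><?format draft?><!-- sample --><title>TCP/IP Illustrated</title></book>;

(: Every test below should return true. :)
(
  document { $book } instance of document-node(),                      (: document node :)
  $book instance of element(),                                         (: element node :)
  $book/@year instance of attribute(),                                 (: attribute node :)
  $book/title/text() instance of text(),                               (: text node :)
  $book/comment() instance of comment(),                               (: comment node :)
  $book/processing-instruction() instance of processing-instruction(), (: PI node :)
  data($book/@year) instance of xs:untypedAtomic,                      (: untyped :)
  true() instance of xs:boolean,                                       (: Boolean :)
  65.95 instance of xs:decimal,                                        (: numeric :)
  "Stevens" instance of xs:string,                                     (: string :)
  xs:date("2005-11-03") instance of xs:date,                           (: calendar :)
  node-name($book) instance of xs:QName                                (: qualified name :)
)
```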
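Documents are navigated and queried using XPath expressions. XQuery 1.0 is built as an extension of XPath 2.0, so almost any XPath expression can be used inside a query. (See the Wikipedia entry for XPath.)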
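A few path expressions show the idea. This sketch runs against a hypothetical, trimmed copy of the bibliography data used in the use cases further on; the comments say what each path should yield.

```xquery
xquery version "1.0";

(: A trimmed, hypothetical copy of the sample bibliography. :)
declare variable $bib :=
  <bib>
    <book year="1994"><title>TCP/IP Illustrated</title><price>65.95</price></book>
    <book year="2000"><title>Data on the Web</title><price>39.95</price></book>
  </bib>;

(
  $bib/book/title/string(),                (: child steps: both titles :)
  $bib/book[@year > 1995]/title/string(),  (: predicate on an attribute: "Data on the Web" :)
  $bib/book[1]/price/string(),             (: positional predicate: "65.95" :)
  count($bib//price)                       (: descendant axis: 2 :)
)
```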
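It is very easy to construct new XML using XQuery: element and attribute syntax can be written directly in a query, with any expressions to be evaluated enclosed in curly braces.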
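As a sketch of the two constructor styles in XQuery 1.0: a direct constructor writes the XML literally with expressions in braces, while a computed constructor builds the same node with the element and attribute keywords. The variable $title is a hypothetical input; the two results should be deep-equal.

```xquery
xquery version "1.0";

declare variable $title := "Data on the Web";

(
  (: Direct constructor: XML written literally; expressions go in braces,
     including inside attribute values. :)
  <book year="{ 1998 + 2 }"><title>{ $title }</title></book>,

  (: Computed constructor: the same node built with keywords. :)
  element book {
    attribute year { 2000 },
    element title { $title }
  }
)
```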
In XQuery, all input and output must be an instance of the XQuery data model. Every instance is a sequence of zero or more items, and every item is either a node of one of the seven kinds or an atomic value of one of the 50 atomic types.
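The sketch below shows that sequences are flat: nesting parentheses does not put one sequence inside another, and the empty sequence simply disappears. The values are arbitrary examples; the query should return 5, 0 and 1.

```xquery
xquery version "1.0";

(: Nested parentheses flatten away and the empty sequence contributes nothing. :)
declare variable $s := (1, (2, 3), (), "four", <five/>);

(
  count($s),   (: 5 items: 1, 2, 3, "four" and the element five :)
  count(()),   (: the empty sequence holds zero items :)
  count(42)    (: a lone item is the same as a one-item sequence :)
)
```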
Some of you are old enough to remember the “GOTO” statement, which became extinct because of an article by Edsger Dijkstra called “A Case against the GO TO Statement” (1968). The 1977 Turing Award lecture by John Backus (who invented FORTRAN and BNF), entitled “Can Programming Be Liberated From the von Neumann Style?”, may send the assignment statement the same way. [Kay 2004, 625-9]
What!? How then do I program? To become functional, you would do best to forget everything you know about procedural programming. Functional programs are like that frightful mathematics you learned at university, with statements such as “for every,” “there exists,” “let N be the set of all positive integers.” Functions do everything interesting. Side effects are forbidden (i.e. existing values are never clobbered). If the procedural left side of your brain tells you to use iteration, the functional right side must come up with a recursive solution instead.
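Here is what “recursive instead of iterative” can look like in XQuery 1.0. The function name local:total is hypothetical, and the built-in fn:sum already does this job; the point is the shape of the solution: a base case for the empty sequence, and a recursive case that adds the first item to the total of the rest. With the four prices from the sample bibliography it should return 301.8.

```xquery
xquery version "1.0";

(: Hypothetical example: total a sequence of prices recursively.
   No loop counter and no running total is ever updated. :)
declare function local:total($prices as xs:decimal*) as xs:decimal
{
  if (empty($prices))
  then 0                                                  (: base case :)
  else $prices[1] + local:total(subsequence($prices, 2))  (: head + total of tail :)
};

local:total((65.95, 65.95, 39.95, 129.95))
```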
Advantages of this approach are:
invariance: the result of a function depends only on its arguments, so it is the same no matter when or how it is evaluated. This makes it easier to prove program correctness and to evaluate parts of a program in parallel.
closure: no side effects (i.e. no changes of state) occur. Consequently, pure functional programs are guaranteed to be thread safe.
modularity: order of execution no longer matters, so a result built from independent parts can be constructed as the parts come to hand. One need not wait for all of the parts before beginning to produce the output, and one need not worry about other parts when one part is updated.
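To illustrate invariance and the absence of side effects, this small sketch binds $price twice (the tax rate of 1.1 is made up). The second let does not overwrite the first; it introduces a new variable with the same name, so $with-tax keeps the value computed from 65.95. The query should return 72.545 and 39.95.

```xquery
xquery version "1.0";

(: let binds a name to a value once and for all. A second let with the same
   name introduces a new variable that hides the first; nothing is overwritten,
   so $with-tax keeps the value it was computed from. :)
let $price := 65.95
let $with-tax := $price * 1.1
let $price := 39.95
return ($with-tax, $price)
```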
The following are not in the current specification, but will be in a future edition:
The original XML document can be transformed into a new document but not updated in place.
Pattern matching can only be applied to individual text nodes, so searching the text of a whole document requires a program that visits each node.
It would also be nice to make XQuery object-oriented (without getters and setters, which require assignment), but I don't even know if it makes sense to try.
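In practice, “transformed but not updated” means building a modified copy. The sketch below uses a hypothetical helper, local:with-price, that returns a new book element with a different price while the original stays as it was. It is one way to write such a transformation in XQuery 1.0, not a standard function.

```xquery
xquery version "1.0";

declare variable $book :=
  <book year="1994"><title>TCP/IP Illustrated</title><price>65.95</price></book>;

(: Hypothetical helper: return a NEW book element that copies the old one's
   attributes and children, swapping in a different price. The argument is
   left untouched, because XQuery 1.0 cannot modify it in place. :)
declare function local:with-price($b as element(book), $p as xs:decimal)
  as element(book)
{
  element book {
    $b/@*,                        (: copy the attributes :)
    $b/node()[not(self::price)],  (: copy every child except the old price :)
    <price>{ $p }</price>         (: add the new price last :)
  }
};

(local:with-price($book, 59.95), $book)
```

Because the helper appends the new price after the other children, it would reorder the children of a book whose price was not already last.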
Joins and other algebraic operations are expressed with the FLWOR construct (for, let, where, order by, return). This is an operation, not a loop! It transforms one sequence into another. Don't ask how, just believe. The “return” of a FLWOR says what to produce for each binding of its variables; it has nothing to do with returning from a subroutine in a procedural language.
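The sketch below uses all five clauses on a hypothetical, trimmed copy of the sample bibliography; the element name cheap is made up for the example. Read it as a pipeline rather than a loop: for produces one binding per book, let adds a derived value, where discards bindings, order by sorts what is left, and return builds one output item per surviving binding. It should produce two cheap elements, Data on the Web (39.95) first and TCP/IP Illustrated (65.95) second.

```xquery
xquery version "1.0";

(: A trimmed, hypothetical copy of the sample bibliography. :)
declare variable $bib :=
  <bib>
    <book year="1994"><title>TCP/IP Illustrated</title><price>65.95</price></book>
    <book year="2000"><title>Data on the Web</title><price>39.95</price></book>
    <book year="1999"><title>The Economics of Technology and Content for Digital TV</title><price>129.95</price></book>
  </bib>;

(: Titles of books priced under 100, cheapest first. :)
for $b in $bib/book                  (: for: bind $b to each book :)
let $p := xs:decimal($b/price)       (: let: name a value derived from $b :)
where $p lt 100                      (: where: keep only some bindings :)
order by $p                          (: order by: sort what survives :)
return <cheap price="{ $p }">{ string($b/title) }</cheap>  (: return: one item per binding :)
```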
The following sample data, queries, and results are taken from XML Query Use Cases:
The sample bibliography, which the queries load as http://bstore1.example.com/bib.xml:

```xml
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
    <author><last>Stevens</last><first>W.</first></author>
    <publisher>Addison-Wesley</publisher>
    <price>65.95</price>
  </book>
  <book year="2000">
    <title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author>
    <publisher>Morgan Kaufmann Publishers</publisher>
    <price>39.95</price>
  </book>
  <book year="1999">
    <title>The Economics of Technology and Content for Digital TV</title>
    <editor>
      <last>Gerbarg</last><first>Darcy</first>
      <affiliation>CITI</affiliation>
    </editor>
    <publisher>Kluwer Academic Publishers</publisher>
    <price>129.95</price>
  </book>
</bib>
```
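Query: list the books published by Addison-Wesley after 1991, with their year and title.

```xquery
<bib>
  {
    for $b in doc("http://bstore1.example.com/bib.xml")/bib/book
    where $b/publisher = "Addison-Wesley" and $b/@year > 1991
    return
      <book year="{ $b/@year }">
        { $b/title }
      </book>
  }
</bib>
```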
Result:

```xml
<bib>
  <book year="1994">
    <title>TCP/IP Illustrated</title>
  </book>
  <book year="1992">
    <title>Advanced Programming in the Unix environment</title>
  </book>
</bib>
```
Query: for each author, list the author's name and the titles of all of that author's books, grouped inside a result element.

```xquery
<results>
  {
    let $a := doc("http://bstore1.example.com/bib.xml")//author
    for $last in distinct-values($a/last),
        $first in distinct-values($a[last = $last]/first)
    order by $last, $first
    return
      <result>
        <author>
          <last>{ $last }</last>
          <first>{ $first }</first>
        </author>
        {
          for $b in doc("http://bstore1.example.com/bib.xml")/bib/book
          where some $ba in $b/author
                satisfies ($ba/last = $last and $ba/first = $first)
          return $b/title
        }
      </result>
  }
</results>
```
Result:

```xml
<results>
  <result>
    <author>
      <last>Abiteboul</last>
      <first>Serge</first>
    </author>
    <title>Data on the Web</title>
  </result>
  <result>
    <author>
      <last>Buneman</last>
      <first>Peter</first>
    </author>
    <title>Data on the Web</title>
  </result>
  <result>
    <author>
      <last>Stevens</last>
      <first>W.</first>
    </author>
    <title>TCP/IP Illustrated</title>
    <title>Advanced Programming in the Unix environment</title>
  </result>
  <result>
    <author>
      <last>Suciu</last>
      <first>Dan</first>
    </author>
    <title>Data on the Web</title>
  </result>
</results>
```
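Query: for each book that has at least one author, list the title and the first two authors, plus an empty et-al element if the book has more authors.

```xquery
<bib>
  {
    for $b in doc("http://bstore1.example.com/bib.xml")//book
    where count($b/author) > 0
    return
      <book>
        { $b/title }
        {
          for $a in $b/author[position() <= 2]
          return $a
        }
        {
          if (count($b/author) > 2)
          then <et-al/>
          else ()
        }
      </book>
  }
</bib>
```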
Result:

```xml
<bib>
  <book>
    <title>TCP/IP Illustrated</title>
    <author>
      <last>Stevens</last>
      <first>W.</first>
    </author>
  </book>
  <book>
    <title>Advanced Programming in the Unix environment</title>
    <author>
      <last>Stevens</last>
      <first>W.</first>
    </author>
  </book>
  <book>
    <title>Data on the Web</title>
    <author>
      <last>Abiteboul</last>
      <first>Serge</first>
    </author>
    <author>
      <last>Buneman</last>
      <first>Peter</first>
    </author>
    <et-al/>
  </book>
</bib>
```
The book by Brundage and the one edited by Katz given in the references will get you started. Look out for a forthcoming book by Priscilla Walmsley, possibly from O'Reilly.
Note: Priscilla Walmsley's book on XQuery was published the year after this short introduction was written. The book is now listed in the reference list.
Boag, Scott, Don Chamberlin, Mary F. Fernández, Daniela Florescu, Jonathan Robie, and Jérôme Siméon. XQuery 1.0: An XML Query Language. http://www.w3.org/TR/xquery/.
Chamberlin, Don, Peter Fankhauser, Daniela Florescu, Massimo Marchiori, and Jonathan Robie. XML Query Use Cases. http://www.w3.org/TR/xquery-use-cases/.
Functional Programming. Wikipedia. http://en.wikipedia.org/wiki/Functional_language.
Katz, Howard, ed. 2004. XQuery from the Experts: A Guide to the W3C XML Query Language. Boston: Addison-Wesley.
———. An Introduction to XQuery. http://www-128.ibm.com/developerworks/xml/library/x-xquery.html.
XPath. Wikipedia. http://en.wikipedia.org/wiki/Xpath.
XQuery. Wikipedia. http://en.wikipedia.org/wiki/Xquery.