Oxygene by Example - XML
Quick introduction to XML
Since its introduction at the end of the 20th century, XML has become pretty much the standard for storing and transferring small amounts of machine- and human-readable structured data. An XML file generally starts with a header (or has an implied one) and contains nested tags with data in them. The format is pretty simple and strict:
<?xml version="1.0" encoding="UTF-8" ?>
<data>
  <item attribute="15">data</item>
  <item attribute="16" />
</data>
In this file, the <?xml version="1.0" encoding="UTF-8" ?> header is optional, and the encoding it declares has to match the encoding actually used for the rest of the file. Every XML file has exactly one root element. Each element has to be closed, in the exact reverse order of how the elements were opened. In this sample, <data> and <item attribute="15"> open a tag; the </item> and </data> tags close them. When an opening tag ends with a /, the element has no children and is closed right away. The attribute="16" part is called an attribute, which is a simple name='string value' pair. Text can go in attributes and between the tags, and has to be escaped if a character is one of < > & ' ". These are escaped with entity references that start with the & character and look like (same order) &lt; &gt; &amp; &apos; &quot;.
Namespaces
While not part of the original specification of XML, XML Namespaces are used in most XML documents. A namespace is used to uniquely identify an element or attribute by including a URI with each one of them.
<?xml version="1.0" encoding="UTF-8" ?>
<book xmlns="http://schemas.remobjects.com/book">
  <chapter id="1" xmlns="http://schemas.remobjects.com/book">
    <paragraph xmlns="http://schemas.remobjects.com/book"> ... </paragraph>
    <paragraph xmlns="http://schemas.remobjects.com/book"> ... </paragraph>
  </chapter>
  <chapter id="2" xmlns="http://schemas.remobjects.com/book">
    <paragraph xmlns="http://schemas.remobjects.com/book"> ... </paragraph>
    <paragraph xmlns="http://schemas.remobjects.com/book"> ... </paragraph>
  </chapter>
</book>
The URI used does not have to exist; it doesn't even have to start with http; however it should be unique. Writing documents this way is rather verbose, so namespace prefixes were introduced:
<?xml version="1.0" encoding="UTF-8" ?>
<bk:book xmlns:bk="http://schemas.remobjects.com/book">
  <bk:chapter id="1">
    <bk:paragraph> ... </bk:paragraph>
    <bk:paragraph> ... </bk:paragraph>
  </bk:chapter>
  <bk:chapter id="2">
    <bk:paragraph> ... </bk:paragraph>
    <bk:paragraph> ... </bk:paragraph>
  </bk:chapter>
</bk:book>
This document is considered to be exactly the same as the one above. The name of the prefix does not matter; it is only used as a reference to the xmlns:bk declaration on the root element.
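As a rough illustration of this equivalence, the sketch below uses the System.Xml.Linq classes covered later on this page to parse a one-element version of each document and compare the resulting element names. The namespace and class names (PrefixSample, Program) are made up for the example.

```oxygene
namespace PrefixSample;

interface

uses
  System.Xml.Linq;

type
  Program = class
  public
    class method Main(args: array of String);
  end;

implementation

class method Program.Main(args: array of String);
begin
  // The same namespace URI, once bound to the prefix "bk" and once used as the default namespace.
  var lPrefixed := XElement.Parse('<bk:book xmlns:bk="http://schemas.remobjects.com/book" />');
  var lDefault := XElement.Parse('<book xmlns="http://schemas.remobjects.com/book" />');

  // An XName consists of the namespace URI and the local name; the prefix is not part of it.
  System.Console.WriteLine(lPrefixed.Name.LocalName);
  System.Console.WriteLine(lPrefixed.Name = lDefault.Name);
end;

end.
```

Both elements report the local name book, and the two names compare as equal, since only the namespace URI and the local name take part in the comparison.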
Working with XML in Oxygene for .NET
While there are lots of ways to deal with XML, for example by reading it manually or by using the XmlDocument class, the easiest way is to use the XLinq (LINQ to XML) classes in the System.Xml.Linq namespace. These classes make reading and writing XML very easy.
Reading an XML File
Opening an XML file with XLinq is simple: call XDocument.Load(input), where input can be a filename, a stream or one of several other things. The returned object is an XDocument, which holds the whole file. Its Root property contains the root element (book in the above document) and offers several methods for querying the data.
var lDoc := XDocument.Load('r:\data.xml');
var lChapterWithID := lDoc.Root.Elements(XName.Get('chapter', 'http://schemas.remobjects.com/book'))
  .FirstOrDefault(a -> a.Attribute('id'):Value = '2');
The above code first asks the root for all direct child elements named 'chapter' in the given XML namespace, which yields a sequence of chapter elements. Then it calls FirstOrDefault on that sequence, which returns the first element that matches the condition passed as parameter, or nil if there is none. The parameter is a lambda expression: an inline (anonymous) function that is called for each 'chapter' element and asks for the XAttribute named 'id'. Given that the result of this can be nil, we use the colon operator to access the Value property (the colon operator doesn't raise an exception when the object it is applied to is nil, but returns nil instead) and compare that with '2', which is the chapter id we're looking for. The result is the first chapter whose 'id' attribute has the value '2'. After this code, lChapterWithID will contain an XElement with
<bk:chapter id="2">
  <bk:paragraph> ... </bk:paragraph>
  <bk:paragraph> ... </bk:paragraph>
</bk:chapter>
in it. Another thing we might want to do is get the text of all paragraphs as a single string.
// Requires System.Linq, System.Xml and System.Xml.Linq in the uses clause.
var lDoc := XDocument.Load('r:\data.xml');
// Descendants returns all matching elements at any level below this node, while Elements only returns its direct children.
var lParagraphs := lDoc.Root.Descendants(XName.Get('paragraph', 'http://schemas.remobjects.com/book'));
var lChapterText := String.Join(#13#10,
  lParagraphs.Nodes()                           // returns all child nodes, including text nodes
    .Where(a -> a.NodeType = XmlNodeType.Text)  // keep only text nodes; the result is a sequence of XNode that are all XText instances
    .Select(a -> XText(a).Value)                // select the text values
    .ToArray);                                  // convert to an array for String.Join, which joins the values with line breaks
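To make the difference between Elements and Descendants more concrete, here is a small sketch that counts paragraphs both ways. It parses a cut-down inline document rather than loading a file; the sample content and the names DescendantsSample and Program are assumptions made for this example.

```oxygene
namespace DescendantsSample;

interface

uses
  System.Linq,
  System.Xml.Linq;

type
  Program = class
  public
    class method Main(args: array of String);
  end;

implementation

class method Program.Main(args: array of String);
begin
  var ns: XNamespace := 'http://schemas.remobjects.com/book';

  // A cut-down, inline version of the book document (hypothetical sample data).
  var lDoc := XDocument.Parse(
    '<bk:book xmlns:bk="http://schemas.remobjects.com/book">' +
    '<bk:chapter id="1"><bk:paragraph>one</bk:paragraph><bk:paragraph>two</bk:paragraph></bk:chapter>' +
    '</bk:book>');

  // Elements only looks at the direct children of the root, so it finds no paragraphs.
  System.Console.WriteLine(lDoc.Root.Elements(ns + 'paragraph').Count());
  // Descendants searches every level below the root and finds both paragraphs.
  System.Console.WriteLine(lDoc.Root.Descendants(ns + 'paragraph').Count());
end;

end.
```

With this sample data the first call counts 0 elements and the second counts 2, because the paragraphs sit one level below the chapter rather than directly under the root.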
Writing XML
Creating the above document using XLinq can be done in multiple ways, but the simplest is by using the constructor overloads in a nested call:
var ns: XNamespace := 'http://schemas.remobjects.com/book';
var lNewDoc := new XDocument(
  new XElement(ns + 'book',
    new XElement(ns + 'chapter',
      new XAttribute('id', 1),
      new XElement(ns + 'paragraph', ' ... '),
      new XElement(ns + 'paragraph', ' ... ')
    ),
    new XElement(ns + 'chapter',
      new XAttribute('id', 2),
      new XElement(ns + 'paragraph', ' ... '),
      new XElement(ns + 'paragraph', ' ... ')
    )
  )
);
lNewDoc.Save('R:\file.xml');
The first parameter to XElement is always the name of the element. Anything after it becomes the content of the element: any XAttribute instances end up as attributes and the rest as child nodes, with strings automatically converted to XText nodes.
Instead of using a nested call like above, it's also possible to store each element and use the Add method to add children. The XLinq family of classes are all mutable, so the same APIs can be used to modify existing XML documents.
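The following sketch shows how the constructor arguments are sorted into attributes and child nodes, and that text is escaped automatically when the element is written out. The element content and the names ContentSample and Program are invented for this example.

```oxygene
namespace ContentSample;

interface

uses
  System.Xml.Linq;

type
  Program = class
  public
    class method Main(args: array of String);
  end;

implementation

class method Program.Main(args: array of String);
begin
  // One attribute and one string; the string contains characters that need escaping.
  var lItem := new XElement('item',
    new XAttribute('attribute', 15),
    'a < b & c');

  // The string argument was stored as a text node.
  System.Console.WriteLine(lItem.FirstNode is XText);
  // Writing the element out escapes the special characters.
  System.Console.WriteLine(lItem);
end;

end.
```

The second line of output should read <item attribute="15">a &lt; b &amp; c</item>, so there is no need to escape text by hand when building a document this way.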
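A minimal sketch of that step-by-step style might look like the following. It builds a one-chapter version of the book document with Add and then changes it afterwards; the names AddSample and Program, the new id value and the added paragraph text are assumptions for illustration only.

```oxygene
namespace AddSample;

interface

uses
  System.Xml.Linq;

type
  Program = class
  public
    class method Main(args: array of String);
  end;

implementation

class method Program.Main(args: array of String);
begin
  var ns: XNamespace := 'http://schemas.remobjects.com/book';

  // Build the tree step by step instead of in one nested constructor call.
  var lBook := new XElement(ns + 'book');
  var lChapter := new XElement(ns + 'chapter');
  lChapter.Add(new XAttribute('id', 1));
  lChapter.Add(new XElement(ns + 'paragraph', ' ... '));
  lBook.Add(lChapter);
  var lDoc := new XDocument(lBook);

  // The same objects can still be changed after they are part of the document.
  lChapter.SetAttributeValue('id', 3);
  lChapter.Add(new XElement(ns + 'paragraph', 'added later'));

  // Print the resulting XML to the console.
  System.Console.WriteLine(lDoc);
end;

end.
```

The same Add and SetAttributeValue calls work on elements obtained from XDocument.Load, which is what makes this style convenient for editing existing files.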
