Working with XML

If you open an XML file in, for instance, your web browser, you will see that it is not human friendly. It is, however, computer friendly. Different tools and libraries exist to deal with it. These tools use different methods, each with its own strengths and weaknesses:

  1. DOM (Document Object Model) reads the XML data and creates a tree in memory. This DOM tree can be processed and then converted back to XML.

  2. XPath and XPointer can address data in a DOM tree.

  3. XSLT uses a stylesheet (actually a program written in XSLT, the transformation part of the Extensible Stylesheet Language, and itself an XML document) to transform an XML document into another document format. Common output formats are XML and XHTML.
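The first two items can be sketched with the Python standard library alone. This is only an illustration under stated assumptions: the sample document and the helper names add_currency and title_of are made up, and xml.etree.ElementTree understands just a subset of XPath, so the query is an approximation of what a full XPath tool offers.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this sketch.
SAMPLE = """<library>
  <book id="b1"><title>Alpha</title><price>10</price></book>
  <book id="b2"><title>Beta</title><price>25</price></book>
</library>"""


def add_currency(xml_text, currency):
    """DOM round trip: build the tree in memory, change it, convert back to XML."""
    dom = xml.dom.minidom.parseString(xml_text)
    for price in dom.getElementsByTagName("price"):
        price.setAttribute("currency", currency)
    return dom.documentElement.toxml()


def title_of(xml_text, book_id):
    """Address one piece of data in the tree with an XPath-style expression."""
    root = ET.fromstring(xml_text)
    # ElementTree supports only a limited XPath subset; this predicate is within it.
    node = root.find(f"./book[@id='{book_id}']/title")
    return None if node is None else node.text


if __name__ == "__main__":
    print(add_currency(SAMPLE, "EUR"))
    print(title_of(SAMPLE, "b2"))
```

The DOM helper holds the whole document in memory, which is convenient for edits but costly for very large files. The XPath-style helper only reads, which is usually all a script needs.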

XmlStarlet

XmlStarlet is a command line tool for working with XML. It is a good way to learn XPath expressions and to use them in scripts.

xmldiff

Xmldiff from http://www.logilab.org/ can be used to compare two XML files or two directories containing XML files. It can also compare HTML.

Html-xml-utils

Html-xml-utils from http://www.w3.org/Tools/HTML-XML-utils/ is a collection of tools for manipulating XML and HTML data.

XML2

Xml2 has introduced a format called flat. It can convert XML or HTML into flat and vice versa, using four programs: xml2, html2, 2xml and 2html. Flat works well with the standard line-oriented command line tools: it has no closing tags, and each start tag is expanded to hold the complete path, so every line stands on its own. Xml2 reads from standard input and is therefore typically fed through a pipe from wget or cat. The output can then be filtered with grep, awk or sed and, if needed, formatted with cut:

cat <filename>.xml | xml2 | grep <string> | cut <options>

The text after the = is the data of an element, and the text before the = is the path of tags leading to that data. If the path contains an @ character, the data belongs to an attribute instead, and the attribute name is the text between the @ and the =.
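To make the path=data rule concrete, here is a small Python sketch that produces and reads lines in the spirit of the flat format. It is an approximation written for illustration, not a reimplementation of xml2: the function names flatten and parse_line are hypothetical, and the real tool's output may differ in details such as whitespace handling and repeated sibling elements.

```python
import xml.etree.ElementTree as ET


def flatten(xml_text):
    """Return 'path=data' lines that approximate the flat format."""
    lines = []

    def walk(element, parent_path):
        path = f"{parent_path}/{element.tag}"
        # Attribute data: the attribute name sits between '@' and '='.
        for name, value in element.attrib.items():
            lines.append(f"{path}/@{name}={value}")
        # Element data: the full tag path sits before '='.
        text = (element.text or "").strip()
        if text:
            lines.append(f"{path}={text}")
        for child in element:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return lines


def parse_line(line):
    """Split one flat line into (element path, attribute name or None, data)."""
    path, _, data = line.partition("=")
    if "/@" in path:
        path, _, attribute = path.rpartition("/@")
        return path, attribute, data
    return path, None, data


if __name__ == "__main__":
    for line in flatten('<feed><entry id="7"><title>Hello</title></entry></feed>'):
        print(line, "->", parse_line(line))
```

Because every line carries its complete path, a plain grep on a path fragment is enough to pick out one kind of element without parsing any nesting.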

Validate xml

To save time, XML is not always validated. To validate an XML document, type:

xmllint --valid --noout <my file>.xml

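xmllint is part of libxml2, the library that xsltproc (from libxslt) depends on, so it is usually already installed wherever xsltproc is.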
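Skipping validation does not mean skipping every check: a parser still has to confirm that the document is well formed. The Python sketch below shows only that cheaper check. It does not validate against a DTD the way xmllint --valid does, since the Python standard library has no validating parser, and the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the text parses as XML. This is NOT DTD validation."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


if __name__ == "__main__":
    print(is_well_formed("<a><b/></a>"))    # properly nested tags
    print(is_well_formed("<a><b></a>"))     # <b> is never closed
```

A document can pass this check and still be invalid, for example when it uses an element its DTD does not allow. That second kind of error is what the xmllint call above catches.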

