Publishing XML Documents

XSL processing

An XSL processor performs the XSLT (XSL Transformation) that can produce various commonly known formats.

The goal is to get an HTML page by typing a command such as:

<XSLTprocessor> <mydoc>.xml <mystyle>.xsl <myhtmlpage>.html

  1. The xml file contains absolutely nothing about how the content is visually presented.

  2. The xsl file (style sheet file) describes how the xml is visually presented.

  3. Finally, in this example, html is the output.

This is also the way DocBook works. DocBook is xml that can be converted into HTML or other formats using a stylesheet. The XML vocabulary of DocBook might be too extensive for simple projects. Note: various formats, XML among them, can be imported into DocBook.

You can put a link (a processing instruction starting with <?) to the xsl style sheet inside your xml file:

      <?xml version="1.0" encoding="ISO-8859-1"?>
      <?xml-stylesheet type="text/xsl" href="project.xsl"?>
      <yourxml>

This might be a good idea, but it might also be a bad idea when there are multiple stylesheets for the same xml file. It can also be considered a violation of the concept of separating the visual appearance from the semantics.

The XSLT processor does not do a lot, except that it transforms XML into XHTML. It basically replaces the XML tags with XHTML tags and adds extra material to format the page, such as the background image.

There are many different xsl processors:


The XSLT processor xsltproc is probably already installed with the libxslt package.

Call xsltproc as xsltproc <my>.xml when the stylesheet is referenced inside the xml file. The output appears on the screen; to put it into a file do:

xsltproc <my>.xml > <my>.html

And to name a stylesheet on the command line:

xsltproc <my>.xsl <my>.xml

To write into a file and ignore downloading the dtd:

xsltproc --novalid -o <output-file>.html <stylesheet>.xsl <input-xml-file>.xml

Here is an example that uses a DocBook xml file and stylesheet and produces multiple html files, with the file names coming from the <sect1 id="<filename>"> tags. This is done by passing parameters that modify the stylesheet defaults using --stringparam:

xsltproc --stringparam 1 /usr/share/sgml/docbook/xsl-stylesheets/html/chunk.xsl <name>.xml

Or to put the resulting html files somewhere else:

xsltproc \
  --stringparam 1 \
  --stringparam base.dir /<destination directory>/ \
  /usr/share/sgml/docbook/xsl-stylesheets/html/chunk.xsl <filename>.xml

To see what happens some options can be added:

--profile shows what matched in a sequence

--load-trace shows what files got loaded

--noout can be added when no file should be created and no output is desired


Saxon uses Java and is therefore much slower than xsltproc. It is well documented and used in other tools. It also has a GUI, Kernow.

To call saxon with a hello.xml that contains the link to hello.xsl:

saxon8 -a -o hello.html hello.xml

Or when all 3 files are separate:

saxon8 -o hello.html hello.xml hello.xsl


Many other XSLT processors are around, such as:

  1. xalan from the apache project.

  2. gorg used to make the homepage, but has poor documentation.

  3. sablotron

Debugging XSLT

Many commercial XSLT debuggers exist, but there are not many non-commercial options.

XSLT debugging with eclipse

The biggest problem is installing a working version of eclipse and its WTP (eclipse web developer tools). A great many combinations of versions and already installed plug-ins exist. It is not very important which version gets installed, but it is important that XSLT support is included or otherwise gets installed manually using the eclipse GUI.

As usual in eclipse, a project needs to be created, which puts at least an empty subdirectory into the workspace (eclipse directory ~/eclipse-workspace). A good start is opening XMLExample in the project explorer. The files of this project are under ~/eclipse-workspace/XMLExample.

Then add a copy of your xsl and xml files and, if required, the dtd files underneath ~/eclipse-workspace/<some dir>, using import and creating a new directory.

An xsl file can be run directly; an xml file (from the eclipse-workspace) then has to be passed to it.

However, it is best to create an XSL configuration with all the settings. This XSL configuration can then be run and debugged with a few mouse clicks. In this configuration the output file can be configured to be html, and the standard web browser then opens it.

Eclipse has perspectives, which are basically view configurations, so when debugging, the Debug perspective opens. Perspectives can be changed under Window > Perspective.

When debugging with the default JRE XSLT processor, which is not debug capable, a request pops up to switch to Xalan.

Figure 11.4. xslt debugging with eclipse


During single step debugging an error might occur that does not occur when running without single stepping.

One such error occurs when an attribute is added to an already processed element. The error reported is: Cannot add attribute content after child nodes or before an element is produced. Attribute will be ignored.

It looks as if single step debugging cannot revert what it has already put in its result window. The solution is not to single step such lines.

Alternative ways of debugging

Instead of using a debugger you can generate messages (similar to printf when developing a C program).

Text that ends up in the output file


Or add it to a template to create a function:

        <xsl:template name="hello">
          <xsl:text> Hello </xsl:text>
        </xsl:template>

And then call it from anywhere you want as:

        <xsl:call-template name="hello"/>

More advanced is the use of xsl:message.

Text that ends up in the console

        <xsl:message>some text</xsl:message>

Some item

        <xsl:message>
          <xsl:copy-of select="$<what to look at>"/>
        </xsl:message>

Terminate here

        <xsl:message terminate="yes" />

XSL Stylesheets

The previous section looked simple, but the topic very quickly gets very complex, since many tools and articles are around that do not focus on the concept of how to deal with xml data.

In a modern environment, it is preferred to write the material once and publish it in various forms, places and media (manual pages, HTML on the web, PDF to be printed, speech output, ...). If you are a programmer, it is wise to have your data available as XML so you can make use of those tools to have your program support output in HTML, PDF or whatever else. This explains why XML gained importance, but also why so many different and complex tools exist.

XML usually does not contain anything about how the data is displayed. XSL (Extensible Stylesheet Language) is a programming language used to transform XML into readable formats.

XSL stylesheets are regular xml files. Since xsl needs to deal with tags from the processed file, there is potential confusion about whether a tag belongs to xsl or to the processed xml file. This is resolved by using the namespace xsl for all tags belonging to xsl. In simple words, tags for xsl start with <xsl:xxx and tags for the xml file being processed do not have this prefix.
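The separation by prefix can be made visible with a few lines of Python. This is only a sketch using ElementTree from the standard library, not an XSLT processor; the small stylesheet and the function name split_by_namespace are made up for illustration, and the namespace URI is the one defined by the XSLT standard.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"  # standard XSLT namespace URI

# Illustrative stylesheet: <p> belongs to the output, everything else to xsl.
STYLESHEET = f"""<xsl:stylesheet version="1.0" xmlns:xsl="{XSL_NS}">
  <xsl:template match="text">
    <p><xsl:apply-templates/></p>
  </xsl:template>
</xsl:stylesheet>"""


def split_by_namespace(source):
    """Return (xsl_tags, other_tags) in document order."""
    xsl_tags, other_tags = [], []
    for element in ET.fromstring(source).iter():
        # ElementTree expands a prefixed name to {namespace-uri}localname
        if element.tag.startswith("{" + XSL_NS + "}"):
            xsl_tags.append(element.tag.split("}", 1)[1])
        else:
            other_tags.append(element.tag)
    return xsl_tags, other_tags


xsl_tags, other_tags = split_by_namespace(STYLESHEET)
print(xsl_tags)    # ['stylesheet', 'template', 'apply-templates']
print(other_tags)  # ['p']
```

The prefix xsl itself is only a shorthand; what counts is the namespace URI it is bound to.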


The xsl style sheets are xml and therefore also start with the xml version used and the character encoding:

        <?xml version="1.0" encoding="ISO-8859-1"?>

The next line uses the xsl prefix for the stylesheet tag and adds the attributes version and xmlns (namespace).


Finally the stylesheet closes with the </xsl:stylesheet> tag. The xmlns attribute creates the namespace xsl, and therefore all xsl tags inside the stylesheet must be prefixed with xsl.

How the xslt processor works is difficult to understand since it is not sequential but rule based. With such an empty stylesheet the xsl processor would just remove all tags from the elements and print out all the text of all elements one after another.
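The effect of an empty stylesheet can be imitated in Python. The sketch below does not run an XSLT processor; it assumes a small sample document and uses ElementTree from the standard library to drop the tags and keep the text, which is roughly what an empty stylesheet would produce (apart from an xml declaration that a processor may put in front).

```python
import xml.etree.ElementTree as ET

# Assumed sample document, not taken from a real project.
SOURCE = "<www><head><title>Hello</title></head><body><text>World</text></body></www>"


def built_in_rules_output(xml_source):
    """Drop every tag and keep every piece of text in document order."""
    return "".join(ET.fromstring(xml_source).itertext())


print(built_in_rules_output(SOURCE))  # HelloWorld
```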

To do something different and be able to replace xml, rules have to be added. A rule is an element with the xsl prefix and the template tag.

<xsl:template ....>
  <some stuff>
</xsl:template>

The xsl processor does not read and process an xsl file sequentially. It checks for rules defined in template elements as seen above and processes their contents. Templates are usually nested: from within one template, rules set in other templates are evaluated, so the xslt processor recursively processes a tree of templates.
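How rules call each other recursively may be easier to see in a small imitation. The following Python sketch is not how an XSLT processor is implemented; the rule table, the function names and the sample document are assumptions for illustration. Each rule writes some output and decides itself whether to descend to the child elements, which is the role of <xsl:apply-templates/>.

```python
import xml.etree.ElementTree as ET

# Assumed sample document.
SOURCE = "<www><title>Hi</title><text>red text</text><secret>hidden</secret></www>"


def apply_templates(node, rules):
    """Process the text and the child elements of node."""
    parts = [node.text or ""]
    for child in node:
        parts.append(process(child, rules))
        parts.append(child.tail or "")
    return "".join(parts)


def process(node, rules):
    """Use the rule for this tag name, else fall back to the default behaviour."""
    rule = rules.get(node.tag)
    if rule is None:
        return apply_templates(node, rules)  # default: keep the text, descend
    return rule(node, rules)


# Hypothetical rules keyed by element name, standing in for match="...".
RULES = {
    "www": lambda n, r: "<html>" + apply_templates(n, r) + "</html>",
    "title": lambda n, r: "<h1>" + apply_templates(n, r) + "</h1>",
    "text": lambda n, r: "<p>" + apply_templates(n, r) + "</p>",
    "secret": lambda n, r: "",  # never descends, so the element is dropped
}

result = process(ET.fromstring(SOURCE), RULES)
print(result)  # <html><h1>Hi</h1><p>red text</p></html>
```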

The details of xpath expressions do not need to be understood to read on, and it is easy to get lost when reading about xpath. xpath is nothing more than selecting elements in a tree of elements, much like selecting a file inside a tree of directories using a (relative or absolute) path to the file. However, you need to learn the syntax when writing your own non-trivial stylesheets.
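The comparison with a file system can be tried out with ElementTree from the Python standard library, which supports a limited subset of xpath. The sample document is an assumption made for this sketch.

```python
import xml.etree.ElementTree as ET

# Assumed sample document.
SOURCE = """<www>
  <head><title>Demo</title></head>
  <body>
    <text>first</text>
    <section><text>nested</text></section>
  </body>
</www>"""

root = ET.fromstring(SOURCE)

# Relative path, step by step, like head/title below the current directory
title = root.find("./head/title").text

# Only direct children of body, like listing a single directory
direct = [element.text for element in root.findall("./body/text")]

# Any depth below the current element, like a recursive file search
anywhere = [element.text for element in root.findall(".//text")]

print(title)     # Demo
print(direct)    # ['first']
print(anywhere)  # ['first', 'nested']
```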


The most common rule is the match rule.

        <xsl:template match="<xpath-expression>">

When the xml file is processed, the xml tags are examined and basically thrown away, and the rest is printed to the screen or into a file. If a match template matches the xml tag being processed, then not just the tag but the complete element and its child elements are thrown away. This can also be done intentionally to skip an element and its child elements.

        <xsl:template match="/">

selects the root of the document, and therefore everything. Alternatively the match could select the root tag, since all xml files must have exactly one root element.

Instead of throwing everything away, something should be done. The easiest thing is just adding text (which could also contain <tag>) to the match element. Other things are:

Read the attributes and write them out

        <xsl:attribute name="href">
          <xsl:value-of select="@href"/>
        </xsl:attribute>

Write the text of the element

        <xsl:value-of select="."/>

Process all child elements

        <xsl:apply-templates/>

Process just some child elements; this can be used as a fork to go down one branch of a tree

        <xsl:apply-templates select="<some tag>"/>

Sequence of processing

With match rules, everything between the template tags of the stylesheet is written to the output, but the child elements of the matched element are not processed.

To have the child elements processed, an instruction has to be added inside the match rule:

  1. <xsl:apply-templates/> is necessary to process the child elements, but also the text of the current element. Having this and nothing else prints the text of the current element plus all child elements.

  2. On the other hand, if you want to drop all child elements, do not add the <xsl:apply-templates/> statement.

  3. Or just omit the current text, since it might be used as an attribute:

                  <xsl:apply-templates select="*"/>

  4. Alternatively it can be restricted to certain child elements with <xsl:apply-templates select="<xpath-expression>"/>. Then just the elements selected with the xpath expression, and their subelements, are processed further.

  5. The <xsl:apply-templates select="<xpath-expression>"/> statement can also be used to alter the sequence in which the subelements are processed.

        <xsl:template match="echo">
          <h1 align="center"><xsl:apply-templates/></h1>
        </xsl:template>

element or text

Elements can be added as text

        <xsl:template match="www">
          <html xmlns="" lang="en">
            <xsl:apply-templates />
          </html>
        </xsl:template>

or with xsl:element, so that they are checked and of higher quality

        <xsl:template match="www">
          <xsl:element name="html">
            <xsl:attribute name="lang">en</xsl:attribute>
            <xsl:apply-templates />
          </xsl:element>
        </xsl:template>

Adding line breaks and blanks

HTML and XML do not care whether line breaks exist. But if you open an automatically produced HTML file, you do not want to read everything on a single line. To be able to insert line breaks, first add

        <xsl:template name="Newline"><xsl:text>&#10;</xsl:text></xsl:template>

and then you can produce line breaks in your code by inserting

        <xsl:call-template name="Newline" />

Another problem is that you may want to insert one or more blanks, but the xslt processor ignores the blanks. Therefore insert the following where you want the blank:

        <xsl:text> </xsl:text>

As you can see, this works the same way as for the line break, where a line break character is inserted instead.

Printing out text and attributes

Anything can be printed out using an xpath expression

        <xsl:value-of select="<xpath-expression>"/>

To print out the value/text of the selected element use:

<xsl:value-of select="."/> It often has the same effect as <xsl:apply-templates/>; the difference is that it does not apply any templates further down the tree. It prints the text of the element, including the text of included elements, as plain text instead of processing the included elements with their own rules.

And the value of an attribute:

<xsl:value-of select="@<attribute name>"/>

Or the value of a child element:

<xsl:value-of select="./<child element>"/>

Or a neighbor element:

<xsl:value-of select="../<neighbor element>"/>
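The same four selections can be imitated in Python to see what they return. This is a sketch with an assumed sample document; ElementTree from the standard library only implements a subset of xpath, so the parent step is written starting from the root. Note that in XSLT the value of an element includes the text of the elements nested inside it.

```python
import xml.etree.ElementTree as ET

# Assumed sample document.
SOURCE = ('<site><homepage lang="en">Linurs <owner>Urs</owner></homepage>'
          '<mirror>backup</mirror></site>')

root = ET.fromstring(SOURCE)
homepage = root.find("homepage")

# like select="."        : own text plus the text of nested elements
string_value = "".join(homepage.itertext())

# like select="@lang"    : value of an attribute
attribute = homepage.get("lang")

# like select="./owner"  : value of a child element
child = homepage.findtext("./owner")

# like select="../mirror": up to the parent, then down to the neighbor
neighbor = root.findtext("homepage/../mirror")

print(string_value)  # Linurs Urs
print(attribute)     # en
print(child)         # Urs
print(neighbor)      # backup
```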

Write an attribute

The following example shows how to insert an attribute. The text of the matched element has to be written as an attribute:

        <xsl:template match="homepage">
          <xsl:attribute name="url">
            <xsl:value-of select="."/>
          </xsl:attribute>
          <xsl:value-of select="."/>
          <xsl:apply-templates select="*"/>
        </xsl:template>

The example inserts the url text in the desired places with the <xsl:value-of select="."/> command and avoids printing the text elsewhere by processing just the child elements using <xsl:apply-templates select="*"/>.

Other xslt elements

Other useful commands are: for-each, sort, if, choose

Sample stylesheet

And here is a simple xsl style sheet:

Example 11.7. xsl stylesheet

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html"/>
  <xsl:template match="/">
    <xsl:apply-templates/>
  </xsl:template>
</xsl:stylesheet>

<xsl:output method="html"/> uses the output tag of the xsl namespace. The output tag has all the attributes necessary to create or modify an html header.

Here are the attributes of the output tag:

<xsl:output
  method = "xml" | "html" | "text"
  version = "string"
  encoding = "string"
  omit-xml-declaration = "yes" | "no"
  standalone = "yes" | "no"
  doctype-public = "string"
  doctype-system = "string"
  indent = "yes" | "no"
  media-type = "string" />

Embedding XSL stylesheets in XML

Usually you have one stylesheet (or a set of them) and you apply it to many xml files. This is the way it is supposed to be: write the content once (without worrying about how it is later visually presented) and then transform it into whatever is needed using stylesheets.

The disadvantage is that you cannot simply send a single file to somebody who does not know how to deal with XML, and that is most humans on our planet. Therefore it is possible to add an xsl stylesheet to your xml data and have a regular web browser convert it.

Since XSL is xml, it can easily be included using another namespace, the xsl namespace. However, it must also be processed when the file is opened; this is done with a processing instruction such as

        <?xml-stylesheet type="text/xml" href="#stylesheet"?>

The href points to the stylesheet and uses its id attribute.


A little trick remains: without a DTD the id attribute of the stylesheet is not recognized as an ID, and resolving the reference would fail. This can be fixed with:

<!DOCTYPE www [
<!ATTLIST xsl:stylesheet
  id ID #REQUIRED>
]>
So a simple xml file with an embedded stylesheet looks as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xml" href="#stylesheet"?>
<!DOCTYPE www [
<!ATTLIST xsl:stylesheet
  id ID #REQUIRED>
]>
<www>
<xsl:stylesheet id="stylesheet" version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="xsl:stylesheet" />
  <xsl:template match="www">
    <html xmlns="">
     <head>
      <meta name="author" content="Urs Lindegger" />
      <title><xsl:value-of select="./head/title" /></title>
      <style type="text/css">
       p.text {color: red; background-color: white;}
      </style>
     </head>
     <body>
      <h1><xsl:value-of select="./head/title"/></h1>
      <xsl:apply-templates select="body"/>
     </body>
    </html>
  </xsl:template>
  <xsl:template match="text">
   <p xmlns="" class="text">
     <xsl:apply-templates />
   </p>
  </xsl:template>
</xsl:stylesheet>
  <head>
    <title>Example of an embedded xsl stylesheet</title>
    <description>Embed xsl in xml</description>
    <keywords>xsl, xml</keywords>
  </head>
  <body>
    <text>If it works this text is changed to red</text>
  </body>
</www>

Formatting Object

When publishing XML data, an intermediate step can be made by converting it into FO (formatting objects), which can then be converted to formats such as PDF.

Formatting Object Processor

To use the Apache formatting object processor, emerge fop.

Other formatting object processors exist, such as XMLmind XSL FO, which can convert FO to Office Open, OpenOffice and RTF and therefore opens the door to the Microsoft world.

An other FO is

Converting to HTML is much less picky than converting to pdf. One reason is that pdf is meant for paper and html for the computer screen. If the content does not fit on the screen, there is no problem, you just get scroll bars. However, if it does not fit on the paper, you get a serious error.

To convert docbook to formatting objects, the fo file:

xsltproc -o <my file>.fo /usr/share/sgml/docbook/xsl-stylesheets/fo/docbook.xsl <my file>.docbook

And to add a paper format parameter

xsltproc --output --stringparam paper.type A4 /usr/share/sgml/docbook/xsl-stylesheets/fo/docbook.xsl BootFromUsb.xml

and then to pdf:

fop -fo <my file>.fo -pdf <my file>.pdf

Fop can also convert xml to fo and then to pdf with one call: fop -xml <my file>.xml -xsl <my stylesheet>.xsl -pdf <my file>.pdf

Fop and fonts

A common warning is:

WARNING: Font "Symbol,normal,700" not found. Substituting with "Symbol,normal,400".

700 is the font weight, telling how heavy the bold face should be. The installed font does not offer weight 700, so 400 is taken, which probably gives a nicer output. Possible weights are: normal | bold | 100 | 200 | 300 | 400 | 500 | 600 | 700 | 800 | 900, where normal = 400 and bold = 700. So the warning shows that FOP does not find the bold face and replaces it with the normal one. This is rated as WARNING but could also be considered INFO and can therefore be ignored.
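The three comma separated fields in the warning are the font family, the style and the weight. As a small illustration, the following Python sketch takes the warning text from above apart; the function names are made up, and the keyword table only covers the two keywords named above.

```python
import re

WEIGHT_KEYWORDS = {"normal": 400, "bold": 700}

WARNING = ('WARNING: Font "Symbol,normal,700" not found. '
           'Substituting with "Symbol,normal,400".')


def parse_triplet(triplet):
    """Split "family,style,weight" and return the weight as a number."""
    family, style, weight = triplet.split(",")
    return family, style, int(WEIGHT_KEYWORDS.get(weight, weight))


def substitution(warning):
    """Return the (requested, used) triplets quoted in the warning."""
    requested, used = re.findall(r'"([^"]+)"', warning)
    return parse_triplet(requested), parse_triplet(used)


requested, used = substitution(WARNING)
print(requested)  # ('Symbol', 'normal', 700)
print(used)       # ('Symbol', 'normal', 400)
```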

However it is nice to process data without getting warnings.

A radical way of getting rid of these warnings is to tell xsltproc not to use those fonts, by font substitution: xsltproc --output --stringparam paper.type A4 --stringparam serif --stringparam serif /usr/share/sgml/docbook/xsl-stylesheets/fo/docbook.xsl doc.xml

It is possible to pass an xml config file to fop with the -c command line option. The file fop.xconf can be found inside Apache's fop package; on gentoo it can be taken from /usr/portage/distfiles/fop.

The file holds the default values and therefore should have no effect until it gets modified. However, it changes the base path for relative links to the location of this file, so problems such as pictures not being found might occur. It also prints some messages saying that it changes the page size.

Using font substitution (added right at the top, not in the <renderer> sections) the warning can be resolved:

<fop version="1.0">
  <fonts><substitutions>
    <substitution><from font-family="Symbol" font-weight="bold"/><to font-family="Symbol" font-weight="400"/></substitution>
    <substitution><from font-family="ZapfDingbats" font-weight="bold"/><to font-family="ZapfDingbats" font-weight="400"/></substitution>
  </substitutions></fonts>
</fop>

Using this file fop is called as: fop -c fop.xconf ...............

PDF must support the following fonts: Helvetica (normal, bold, italic, bold italic), Times (normal, bold, italic, bold italic), Courier (normal, bold, italic, bold italic), Symbol and ZapfDingbats.

It can happen that fop does not find the desired fonts; it then defaults to Times-Roman and shows a # character in the pdf.

If other fonts (such as Asian ones) are to be used, they first need to be installed on the PC and then imported into the config file.

        <renderer mime="application/pdf">
  <font kerning="yes"
  <font-triplet name="DejaVuSans" 

The fo file contains the fonts to be used, however the new font will not be in the fo file. So a font substitution is required.

<fop version="1.0">
  <fonts><substitutions><substitution>
    <from font-family="serif" />
    <to font-family="kochi-gothic-subst" />
  </substitution></substitutions></fonts>
</fop>

Now the font is fine for Asian text, but the Euro sign (unicode €) is no longer understood and is replaced by a dot.

To fix that, both fonts, the default serif and kochi-gothic-subst, need to be used. This can be achieved in xsltproc with the command line option:

--stringparam serif,kochi-gothic-subst

As result the fo file will contain both fonts and give serif priority:


Fop and hyphenation

To avoid hyphenation errors, emerge offo-hyphenation for apache fop. The xml files from offo-hyphenation are not used directly; they are put into fop-hyph.jar during the fop compilation (if everything goes well). aunpack -l /usr/share/fop/lib/fop-hyph.jar from atool shows which hyphenation patterns are available.

Fop and pictures

Pdf means being able to print on paper, and this means the pictures must fit within the paper dimensions. So the print size of a picture must be smaller than the printable paper size. This sounds easy but:


For most pixel based formats the print size is a computed value: the number of pixels divided by the pixels per inch. This explains why resizing a picture with gimp to 14 cm might result in 14.0001 cm. Imperial dimensions are still common, and unfortunately some programmers still struggle with inch to metric conversion.

A good strategy for portrait format is to agree on x pixel resolutions of 640, 320 and 160 for the picture widths 14cm, 7cm and 3.5cm, and then to choose the y resolution according to the x/y ratio.

High resolution pictures are nice but take some time to download on a web page, and when printing on paper the printer must support the resolution.

This gives a dots per inch resolution of 640/14cm*2.54cm/1inch = 116 dpi, or for people who like it simple, 100dpi.
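The arithmetic can be checked with a few lines of Python. The function names are made up for this sketch; the numbers are the ones from the text.

```python
CM_PER_INCH = 2.54


def dpi_for(pixels, print_size_cm):
    """Dots per inch needed so that the pixels span print_size_cm on paper."""
    return pixels / (print_size_cm / CM_PER_INCH)


def print_size_cm(pixels, dpi):
    """Printed size in cm of the given number of pixels at a given dpi."""
    return pixels / dpi * CM_PER_INCH


for pixels, width_cm in [(640, 14.0), (320, 7.0), (160, 3.5)]:
    print(pixels, width_cm, round(dpi_for(pixels, width_cm), 1))  # 116.1 each

print(round(print_size_cm(640, 100), 2))  # 16.26
```

At exactly 100 dpi a picture 640 pixels wide prints 16.26 cm wide, so rounding 116 dpi down to 100 dpi makes the picture larger on paper than the 14 cm aimed for.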

identify -units "PixelsPerInch" -format "%w x %h %x x %y" <picture>.<ext> from imagemagick prints out both resolution and dpi

mogrify -resize 320x236! <picture>.<ext> changes the resolution; for exact pixel counts the ! character makes it ignore the x/y ratio of the original picture.

mogrify -units "PixelsPerInch" -density 100 <picture>.<ext> fixes the dpi (mogrify works on the same file, convert would require two files an input file and an output file)

Fop and accessibility

Newer fop versions have the command line option -a, which enables accessibility features (Tagged PDF, plus additional warnings that help to improve the document). Alternatively fop.xconf can get


However, pictures then need alternate text, and this is a bit complicated. If the <graphic> tag is used, it needs to be replaced by <mediaobject>:

  <mediaobject>
    <imageobject><imagedata align="center" fileref="pics/<some>.png"/></imageobject>
    <textobject><phrase><some text></phrase></textobject>
  </mediaobject>

This works for html but not for fop. For fop the fop extension fox must be enabled, as with xsltproc --stringparam fop1.extensions 1 ...., resulting in:

        fop <fo:root xmlns:fo="" 

Then in the fo document it must be made sure that the graphics get the alt-text attribute:

        <fo:external-graphic src="url(pics/pibs.png)" fox:alt-text="logo"/>

Unfortunately this is not done automatically by the stylesheets, and there is a reason for it: not all formatting object processors use fox:alt-text.

A fundamental question: when having XML, why not focus on accessibility friendly HTML and keep pdf targeted just at printing on paper?

Formatting Object Structure

An FO file is pure xml and, after its xmlns declaration, can be separated into different parts. The first part defines page layout templates and gives those templates names:

<?xml version="1.0" encoding="UTF-8"?>
<fo:root xmlns:fo="">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="DefaultPage"
        margin-right="1in">
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="DefaultPage">
    <fo:flow flow-name="xsl-region-body">
      <fo:block>Hello World</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>

After the printable region is defined on the sheet, the region can be divided further into region-before, region-after, region-start and region-end, and what is left over is the region-body that holds the main contents of the page. In the simplest form the empty element

        <fo:region-body/>

needs to be added. The individual pages can then be added using the page-sequence tag, passing the layout template name using master-reference.

        <fo:flow flow-name="xsl-region-body"> 

tells that the text has to go into the region body.

The block element holds what appears on the page. In the simplest case it is text. A picture can be added as:

      <fo:external-graphic src="pic.jpg"/>

There are many more options, such as choosing different page templates depending on odd or even page numbers.

Finally fop -fo -pdf sample.pdf converts it to pdf
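A file with this structure can also be generated by a program. The following Python sketch builds a Hello World document with ElementTree from the standard library; the function names are made up, and the namespace URI is the one defined by the XSL-FO standard.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"  # standard XSL-FO namespace URI
ET.register_namespace("fo", FO_NS)


def fo(tag):
    """Qualified name of an element in the fo namespace."""
    return "{" + FO_NS + "}" + tag


def hello_fo(text, master="DefaultPage"):
    """Return a minimal FO document with one page layout and one block."""
    root = ET.Element(fo("root"))
    # Part 1: page layout templates, each with a name
    layouts = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(layouts, fo("simple-page-master"),
                         {"master-name": master, "margin-right": "1in"})
    ET.SubElement(page, fo("region-body"))
    # Part 2: pages that refer to a layout template by its name
    sequence = ET.SubElement(root, fo("page-sequence"),
                             {"master-reference": master})
    flow = ET.SubElement(sequence, fo("flow"),
                         {"flow-name": "xsl-region-body"})
    ET.SubElement(flow, fo("block")).text = text
    return ET.tostring(root, encoding="unicode")


document = hello_fo("Hello World")
print(document)
```

The printed string can be saved to a file and passed to fop like any other fo file.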

Links: and there is also a WYSIWYG editor


Jade (or OpenJade) is a tool that can convert SGML to RTF, TeX, MIF, and SGML using DSSSL stylesheets

Cleanup XML

Since an xml file contains a huge number of tags, there are endless formatting styles. Two semantically identical files might look completely different and cause confusion for humans and diff programs. Therefore tools such as tidy can be used to format xml. See man tidy
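If tidy is not at hand, the idea can be tried with Python 3.8 or newer. The sketch below is an alternative to tidy, not a replacement for it: it assumes two small sample documents and uses canonicalize from the standard library to bring both into the same form before comparing them.

```python
import xml.etree.ElementTree as ET

# Two assumed files with the same content but different layout.
FILE_A = "<www><head><title>Demo</title></head></www>"
FILE_B = """<www>
    <head>
        <title>Demo</title>
    </head>
</www>"""


def normalized(xml_source):
    """Canonical form with the whitespace around text removed."""
    return ET.canonicalize(xml_source, strip_text=True)


print(FILE_A == FILE_B)                          # False
print(normalized(FILE_A) == normalized(FILE_B))  # True
```

strip_text also removes leading and trailing blanks inside text, so it is only suitable where that whitespace does not matter.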

Advanced features: Accessibility checks can be enabled tidy --accessibility-check <n>

Other tools

Many tools are available to convert xml, but many of them depend on other tools.

To convert xml to pdf format using the latex infrastructure:

xmlto pdf mydoc.xml

Or to convert xml (docbook) to html and store the result in the html sub-directory:

xmlto html -o html hello.docbook

Checks for having user friendly documents

Accessibility checks

There are tools that evaluate the accessibility of a web page. They allow easy browsing from one page of a web site to another. There are also plug-ins for web browsers such as firefox, so a web page can be analyzed with a click.

Errors and alerts are shown together with their reasons. If the web site was created with style sheets, the stylesheets can easily be improved and all web pages are then cleaned up at once.
