Docbook

Docbook lets you write a book that cannot be read. OK, that is a joke: docbook holds the semantic content of the book in XML (or SGML), but it says nothing about how this content is to be displayed.

To view a docbook document without an XML or text editor, it has to be converted into a well-known format such as HTML or PDF. This conversion uses a stylesheet that defines how the document should appear. See http://docbook.org/, http://www.oreilly.com/openbook/docbook/book/docbook.html or http://www.sagehill.net/docbookxsl/index.html

You might ask yourself what this is all about and why not simply use something like OpenOffice Writer to write the documents. If you ask yourself this question, then just use OpenOffice Writer; that is fine.

However, docbook is the better choice when the book becomes a project: when it has to be published in different formats such as PDF and HTML, when there are amendments, corrigenda and versions, or when several people write the same book. Since docbook is plain text, the files can be kept in a version management system such as CVS and the differences can be compared using diff.
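Because the files are plain text, a line-based diff shows exactly what changed between two revisions. The following is only a sketch using Python's standard library; the two revision strings and the labels rev1 and rev2 are made up for illustration.

```python
import difflib

# Two hypothetical revisions of the same docbook fragment.
old = """<para>Docbook holds the content of a book.</para>
<para>It is written in XML.</para>
"""
new = """<para>Docbook holds the semantic content of a book.</para>
<para>It is written in XML.</para>
"""


def docbook_diff(old_text, new_text):
    """Return a unified diff of two docbook sources as a list of lines."""
    return list(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile="rev1", tofile="rev2", lineterm=""))


for line in docbook_diff(old, new):
    print(line)
```

A version management system does the same comparison between the stored revisions of a file.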

Unfortunately docbook has gone through a major evolution that is still ongoing, and this might confuse people who want to start with it. In the past docbook was SGML, used DSSSL stylesheets and TeX as engine (jade and jadetex). This still works, but nowadays docbook has moved to XML and uses XSL (libxslt, libxml2) to convert to HTML and an XSL-FO engine (fop) to convert to PDF. This is the advantage of docbook being pure XML: it can make use of the many common XML tools.

Docbook has different document types (sets of tags defined in the DTD):

  1. Article is something smaller than a book.

  2. Book, I guess you know what it is. Use this as default.

  3. Chapter is a file that will be used in a book or article.

In general docbook files are validated against their DTD, parsed and then translated into the desired format. This is not done by a single tool but by a toolchain, a chain of different tools, and of course these are command line tools.

Docbook files often have the extension *.docbook. This sometimes causes problems with generic tools, so a better option is to call them just *.xml. Good tools and operating systems look at the file's internal data anyway and will find out that it is docbook using XML, and even see the docbook version number.
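How a tool can find the docbook version inside the file can be sketched in a few lines of Python. This is only an illustration and not how any particular tool does it: it assumes a DTD-based DocBook XML file whose DOCTYPE carries the OASIS public identifier, and the function name docbook_version is made up.

```python
import re

# Matches the public identifier of the DocBook XML DTDs, for example
# "-//OASIS//DTD DocBook XML V4.5//EN".
DOCBOOK_FPI = re.compile(r'"-//OASIS//DTD DocBook XML V([0-9.]+)//EN"')


def docbook_version(text):
    """Return the DocBook version named in the DOCTYPE, or None."""
    match = DOCBOOK_FPI.search(text)
    return match.group(1) if match else None


sample = '''<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "docbookV4.5/docbookx.dtd">
<book><title>Hello</title></book>
'''
print(docbook_version(sample))
```

The file name extension plays no role here, which is why *.xml works just as well as *.docbook.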

Docbook syntax

Pictures

Adjust the pictures before using them. This gives good picture support and avoids a mess when they are converted from docbook to another format, especially PDF, where they may otherwise come out too big or with a bad resolution. Use jpg since this format is widely supported and has tags for picture information that can be added and read.

  1. Scale the size to 320*240 pixels

  2. Set the resolution to 72 pixels/inch

  3. And the most important thing, set the print size to 4.44 * 3.33 inch

In gimp all of this can be done in the scale image dialog.
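The three numbers belong together: the print size is simply the pixel size divided by the resolution. A small sketch of this arithmetic (the function name is made up):

```python
def print_size_inches(width_px, height_px, dpi):
    """Print size in inches for a pixel size at a given resolution."""
    return (width_px / dpi, height_px / dpi)


w, h = print_size_inches(320, 240, 72)
print(f"{w:.2f} x {h:.2f} inch")
```

So when one of the values is changed, the others have to be checked as well.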

Docbook Links

Internal Links

Links point to a file or to a location in a file. To mark a location in a file, an id attribute can be added to a tag. For example, <sect1 id="Docbook"> gives section 1 (the Docbook section in this document) the id Docbook.

To create a link to this id the link tag can be used:

          <link linkend="Docbook">This is a link to Docbook</link>

and this is how it looks: This is a link to Docbook

Don't use empty tags such as:

          <link linkend="Docbook"/>

since they cause an error when the XML is processed.

The id is an attribute and cannot stand on its own without a tag. The anchor tag is basically a dummy tag that allows an id to be placed anywhere. However, anchor tags cause errors when converting XML to PDF using fop, so don't use them.
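A link only works if its linkend matches an id somewhere in the document. The following sketch lists the linkend values that have no matching id. It is an illustration only: the function name and the sample are made up, and it assumes a file without DTD entities, since Python's ElementTree does not resolve those.

```python
import xml.etree.ElementTree as ET


def dangling_linkends(xml_text):
    """Return linkend values that match no id attribute in the document."""
    root = ET.fromstring(xml_text)
    ids = {el.get("id") for el in root.iter() if el.get("id") is not None}
    targets = [el.get("linkend") for el in root.iter()
               if el.get("linkend") is not None]
    return [target for target in targets if target not in ids]


sample = """<chapter id="Intro">
  <title>Intro</title>
  <sect1 id="Docbook">
    <title>Docbook</title>
    <para><link linkend="Docbook">a working link</link></para>
    <para><link linkend="Missing">a broken link</link></para>
  </sect1>
</chapter>"""
print(dangling_linkends(sample))
```

A validating parser reports the same problem, since linkend is declared as a reference to an id in the DTD.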

Links to other sites

To link to a uri, a site on the web:

          <ulink url="http://www.linurs.org">my homepage</ulink>

and this is how it looks: my homepage. The same link as an empty tag:

          <ulink url="http://www.linurs.org"/>

and this is how it looks: http://www.linurs.org. In serna it looks a bit scary, since it prompts for the link text, but this can be ignored, since here the URI is meant to be shown as it is. So what is displayed is consistent with the link. When the document is converted to PDF (that is, printed on paper) both the text and the link are printed. If they are equal this looks silly, so keep links empty unless the text is different from the link.

Or an e-mail address:

          <email>urs@linurs.org</email>

In the HTML output the address becomes a mailto link.

Links between files

A link to another XML file can be made with the <ulink> tag, in the same way it is used to link to a web site or to files. This approach is not very portable and needs a web server to be installed.

It would be easier to use a relative path between the files. This is done with the <olink> tag, which points to the id of the target document (not its file name) and to an id within that document:

          <olink targetdoc="<target file id>" targetptr="<target id>"/>

The files can be written in a straightforward way. Here is the first file:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "docbookV4.5/docbookx.dtd" []>
<book id="MySourceBook">
  <title>My Source Book</title>
  <chapter id="SourceChapter">
    <title>My Source Chapter</title>
    <para>Olink to other document 
      <olink targetdoc="MyDestinationBook"
             targetptr="DestinationChapter">
             link text
      </olink>
    </para>
  </chapter>
</book>

and here the second:

<?xml version='1.0' encoding='UTF-8'?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" "docbookV4.5/docbookx.dtd" []>
<book id="MyDestinationBook">
  <title>My Destination Book</title>
  <chapter id="DestinationChapter">
    <title>My Destination Chapter</title>
    <para>olink to other document
        <olink targetdoc="MySourceBook"  
               targetptr="SourceChapter">
               link text
        </olink>    
    </para>
  </chapter>
</book>

Since the two files link to each other they depend on each other, but it is still desirable to process and edit them independently. The two files might be in different directories. However, each must know where the other one is, since they share links.

In the following example both files are in the same directory. This makes the paths simpler, but the default output filenames need to be changed, otherwise the outputs overwrite each other. To know each other's links, both files need to be processed to produce an XML file that contains the link information. The following commands produce just those files containing the link targets:

          xsltproc --stringparam targets.filename "srctarget.db" \
 --stringparam  collect.xref.targets  "only" \
 /usr/share/sgml/docbook/xsl-stylesheets/html/docbook.xsl \
 source.xml
         
xsltproc --stringparam targets.filename "desttarget.db" \
 --stringparam  collect.xref.targets  "only" \
 /usr/share/sgml/docbook/xsl-stylesheets/html/docbook.xsl \
 destination.xml

The two target files are included in a common file, here called olink.db, that needs to be written manually:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE targetset SYSTEM "file:///usr/share/sgml/docbook/xsl-stylesheets/common/targetdatabase.dtd" [
 <!ENTITY sourcetargets SYSTEM "srctarget.db"> 
 <!ENTITY destinationtargets SYSTEM "desttarget.db">
]>
<targetset> 
  <targetsetinfo> 
    olink example
  </targetsetinfo>
  <sitemap> 
    <dir name=".">
      <document targetdoc="MySourceBook" 
                baseuri="source.html"> 
                &sourcetargets; 
      </document>
       <document targetdoc="MyDestinationBook" 
                baseuri="destination.html"> 
                &destinationtargets; 
      </document>
    </dir>
  </sitemap>
</targetset>
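Writing this file by hand gets tedious when more documents are added. The following is a rough sketch of how it could be generated for documents that all sit in one directory. The function name and the entity names targets1, targets2 and so on are made up, and attribute values are not escaped, so this is a starting point rather than a finished tool.

```python
DTD = ("file:///usr/share/sgml/docbook/xsl-stylesheets/"
       "common/targetdatabase.dtd")


def olink_database(documents):
    """Build the text of a target database for documents in one directory.

    documents is a list of (targetdoc, baseuri, targets_file) tuples.
    """
    entities = []
    entries = []
    for number, (targetdoc, baseuri, targets_file) in enumerate(documents, 1):
        name = f"targets{number}"  # entity name, chosen for this sketch
        entities.append(f' <!ENTITY {name} SYSTEM "{targets_file}">')
        entries.append(
            f'      <document targetdoc="{targetdoc}" baseuri="{baseuri}">\n'
            f'        &{name};\n'
            f'      </document>')
    return "\n".join([
        '<?xml version="1.0" encoding="utf-8"?>',
        f'<!DOCTYPE targetset SYSTEM "{DTD}" [',
        *entities,
        ']>',
        '<targetset>',
        '  <targetsetinfo>olink example</targetsetinfo>',
        '  <sitemap>',
        '    <dir name=".">',
        *entries,
        '    </dir>',
        '  </sitemap>',
        '</targetset>',
        ''])


text = olink_database([
    ("MySourceBook", "source.html", "srctarget.db"),
    ("MyDestinationBook", "destination.html", "desttarget.db"),
])
print(text)
```

The printed text can be saved under the name that is later passed as target.database.document.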

The <dir> elements are nested like a directory tree, one for every directory that lies between the two files. The files are linked relative to each other, so there is no need to add absolute paths. In this example both files are in the same directory, so one <dir> tag is enough. With this file, which includes the two target files, the HTML can be produced:

          xsltproc --output source.html \
 --stringparam target.database.document "olink.db" \
 --stringparam current.docid "MySourceBook" \
 /usr/share/sgml/docbook/xsl-stylesheets/html/docbook.xsl \
 source.xml

xsltproc --output destination.html \
 --stringparam target.database.document "olink.db" \
 --stringparam current.docid "MyDestinationBook" \
 /usr/share/sgml/docbook/xsl-stylesheets/html/docbook.xsl \
 destination.xml     

Index

To get an index, the <index> element needs to be placed where the index should appear. To have entries show up in it, the <indexterm> element is used. This is a bit tricky, since an <indexterm> placed right here shows nothing in the text; what is put there invisibly shows up in the index. A child element of <indexterm> is <primary>, which holds the text that appears in the index. Sub-entries are made with a <secondary> element that is placed next to <primary> inside the <indexterm>; index terms with the same <primary> text are grouped under one index entry.
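To see which entries a document will produce, the index terms can be collected with a few lines of Python. This is a sketch only, with a made-up function name and sample; in the DocBook DTD the <secondary> element sits next to <primary> inside the <indexterm>, and that is what the sample uses.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def index_entries(xml_text):
    """Map each <primary> text to the sorted list of its <secondary> texts."""
    root = ET.fromstring(xml_text)
    entries = defaultdict(set)
    for term in root.iter("indexterm"):
        primary = term.findtext("primary")
        if primary is None:
            continue
        key = primary.strip()
        entries[key]  # make sure the primary entry exists
        secondary = term.findtext("secondary")
        if secondary is not None:
            entries[key].add(secondary.strip())
    return {key: sorted(value) for key, value in sorted(entries.items())}


sample = """<chapter>
  <title>Links</title>
  <para>Internal links<indexterm><primary>link</primary>
    <secondary>internal</secondary></indexterm> use ids.</para>
  <para>External links<indexterm><primary>link</primary>
    <secondary>external</secondary></indexterm> use a url.</para>
  <para>An index<indexterm><primary>index</primary></indexterm>.</para>
</chapter>"""
print(index_entries(sample))
```

The two index terms with the primary text link end up as two sub-entries of one entry, which is also how the stylesheets group them.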

Language

The language of a document can be set as an attribute on the root element of a docbook file:

<book lang="de">

This does not seem to do a lot, but it gives the chance to communicate the language to the stylesheets and allows them to select the proper language for automatically added text, typically used for formatting, such as Chapter for English or Kapitel for German.

Splitting a docbook file

If the docbook file becomes too big it can be split into different docbook files. Every file contains a chapter. The parent file holds a list of all chapter files to be included:

Example 11.4. Split docbook

<!DOCTYPE book PUBLIC 
"-//OASIS//DTD DocBook XML V4.5//EN" "docbookV4.5/docbookx.dtd" [
<!ENTITY <first chapter> SYSTEM "<filename first chapter>.xml">
<!ENTITY <second chapter> SYSTEM "<filename second chapter>.xml">

<!ENTITY <variable name> "<productname><Variable data></productname>">

]>

&<first chapter>;

&<second chapter>;

&<variable name>;

Entities can also be used like variables. Text that is repeated should be defined there.

Since <!ENTITY just includes XML data into XML data, the included files have to follow certain constraints:

  1. <!DOCTYPE is not allowed, since it would appear twice in the master document.

  2. Tags such as <article> are not allowed either, for the same reason.
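The two constraints can be checked before a file is added to the parent document. The following sketch works on the raw text, so it also runs on files that are not yet well-formed; the function name and the samples are made up, and only <book> and <article> are treated as document roots here.

```python
import re


def include_file_problems(text):
    """Return a list of reasons why text cannot be included via an entity."""
    problems = []
    if "<!DOCTYPE" in text:
        problems.append("contains <!DOCTYPE")
    # First element of the file; "<!..." and "<?..." do not match here.
    first_tag = re.search(r"<([A-Za-z][\w.-]*)", text)
    if first_tag and first_tag.group(1) in ("book", "article"):
        problems.append(f"starts with <{first_tag.group(1)}>")
    return problems


good = '<chapter id="c1"><title>One</title><para>Text</para></chapter>'
bad = ('<!DOCTYPE article PUBLIC "x" "y">\n'
       '<article><title>T</title></article>')
print(include_file_problems(good))
print(include_file_problems(bad))
```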

Serna-free allows such docbook data to be edited as if everything were inside the same file. Additionally, single include files can be edited.

Meta data

Metadata is data that is not visible in the document. If docbook is exported to HTML, metadata should be created, since Internet search engines look at meta tags when ranking your web page.

There are two types of meta tags commonly of interest for this purpose:

The docbook <abstract> tag should hold a description in the form of a single sentence. This tag will be exported as description metadata in HTML (set the stylesheet parameter 'generate.meta.abstract' to 1). It should be between 150 and 160 characters long.

Note

The description meta tag is quite important, since search engines show its content to the users in the search results.

Keywords for the metadata can be inserted inside the <bookinfo>, <chapterinfo> or <sect*info> elements. There a <keywordset> has to be added, and the individual keywords are put into <keyword> elements. The keywords will be converted to the keywords metadata in HTML. It is also possible to separate the keywords with commas and have just one <keyword> element.

Note

The keywords meta tag is no longer important for search engines.

<chapterinfo>
  <abstract>
    <para>Introduction to XML and Docbook</para>
  </abstract>
  <keywordset>
    <keyword>XML</keyword>
    <keyword>Docbook</keyword>
    <keyword>Serna</keyword>
  </keywordset>
</chapterinfo>
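Whether an abstract fits the recommended 150 to 160 characters is easy to check automatically. A minimal sketch with a made-up function name, run on the example above:

```python
import xml.etree.ElementTree as ET


def meta_report(info_xml, low=150, high=160):
    """Return (abstract_length, length_ok, keywords) for an *info element."""
    root = ET.fromstring(info_xml)
    abstract = root.find("abstract")
    if abstract is None:
        text = ""
    else:
        # Join all text below <abstract> and collapse the white space.
        text = " ".join("".join(abstract.itertext()).split())
    keywords = [k.text.strip() for k in root.iter("keyword") if k.text]
    return len(text), low <= len(text) <= high, keywords


sample = """<chapterinfo>
  <abstract>
    <para>Introduction to XML and Docbook</para>
  </abstract>
  <keywordset>
    <keyword>XML</keyword>
    <keyword>Docbook</keyword>
    <keyword>Serna</keyword>
  </keywordset>
</chapterinfo>"""
print(meta_report(sample))
```

For this example the check reports that the abstract is too short to be a good description.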

The keywords are inherited from the parent pages. As an example, a docbook book creates keywords for the book and a chapter creates keywords for the chapter. So the chapter page(s) will have a first keyword element with the chapter keywords and then a second keyword element with the keywords from the book. This can be avoided by using:

        <xsl:param name="inherit.keywords" select="0"></xsl:param>

Additionally, for the description tag, xsltproc has to be called with the parameter: --stringparam generate.meta.abstract 1

Docbook catalog file

When splitting files, the path to the included files has to be put into the parent file.

Each time you move a child file from one directory to another, the parent file needs to be edited to know about it.

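Change your docbook file as follows, so that the chapter is referenced by a formal public identifier (FPI). XML still requires a system identifier after the FPI as a fallback:

<!ENTITY <my chapter> PUBLIC "my FPI" "<filename my chapter>.xml">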

An FPI looks as follows: "-//OASIS//DTD DocBook XML V4.5//EN". It is made of these parts:

  1. -// means it is not registered

  2. OASIS is the owner: a company, organization or person

  3. DTD is the keyword indicating the type of information (DTD, ELEMENT, TEXT)

  4. DocBook XML V4.5 is the description

  5. //EN is the language of the markup text, not of its contents
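The structure of an FPI can be made visible by splitting it at the double slashes. The following is a small sketch with a made-up function name; it assumes the common four-part form shown above and does not handle FPIs that carry an extra version field.

```python
def parse_fpi(fpi):
    """Split a formal public identifier into its parts."""
    registration, owner, text, language = fpi.split("//")
    text_class, _, description = text.partition(" ")
    return {
        "registered": registration == "+",  # "-" means not registered
        "owner": owner,
        "class": text_class,
        "description": description,
        "language": language,
    }


print(parse_fpi("-//OASIS//DTD DocBook XML V4.5//EN"))
```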

Then create or modify a catalog that maps the FPI to the file. The system-wide XML catalog is /etc/xml/catalog.

Docbook editing

It is rather difficult to find a good GUI GPL Linux docbook editor. Possibilities are:

vex

Vex http://wiki.eclipse.org/Vex is GPL and based on the eclipse platform. With eclipse installed, vex can be installed through the eclipse marketplace.

As usual in eclipse, a project must be created. Select just a generic project; this puts a subdirectory under the workspace. Then you can put your docbook files there. To create a file, select XML Authoring > File. VEX is the visual XML editor, but you can open the regular XML editor at the same time (it lets you view the elements, but also the raw characters).

Make sure you select the XML Authoring Perspective to get all the right windows open.

Figure 11.1. Vex


Vex can also be used to edit custom XML files in WYSIWYG mode.

  1. Create a new project via XML Authoring > Visual XML plug-in project type. This creates two files: the hidden .project file and the file vex-plugin.xml.

  2. Now a DTD defining the structure and rules and a CSS defining the appearance must be put into this directory. The CSS must contain quite a bit to give a satisfying result. Basically every element should have its definition.

  3. Those two files need to be registered in the vex-plugin.xml file, which is best done using the GUI of VEX.

  4. First register the DTD file. This is done by selecting the DTD file and going to Properties > Visual XML Document Type. Fill in the fields: name and system ID get <filename>.dtd and the public ID something like -//DTD <name>//EN. Then press apply to see the list of elements and select the root element there.

  5. Now register the CSS by selecting it and going to Properties > Visual XML Document Type. The name is <filename>.css; then select the corresponding CSS.

DEP4E

DEP4E is an eclipse docbook editing plug-in.

xmlmind

XMLmind has a personal version and is a pure java application. Not being GPL is a disadvantage, but it brings the advantage that it is actively maintained. Download it from http://www.xmlmind.com/xmleditor/ and extract it. Since it is a java application, it runs directly without having to be installed. Go to the bin directory and run:

./xxe &

It is available as a free version http://www.xmlmind.com/xmleditor/what_is_xxe.html but also as a full-featured commercial version.

Conglomerate

Conglomerate is available in portage, but unfortunately there has been no news since 2005, so it is probably dead. Creating a docbook file using conglomerate is easy:

Figure 11.2. Conglomerate


Serna-free

Serna is a WYSIWYG XML editor. Serna existed as a free version up to 2011, but the free version is no longer available. A gentoo ebuild can be found in my overlay. Since the files can no longer be fetched from the internet, they have to be copied to /usr/portage/distfiles manually.

Figure 11.3. Serna-free


To start it type /opt/bin/serna or create an icon on the desktop.

There are different levels of checks available. By default you cannot make violations when you edit, but this can be a pain when you import and edit some existing file. In the menu item Edit => Validation move the level from strict to on to get fewer errors reported.

To use custom XML, you can work in text mode, as with any regular XML editor.

Alternative Docbook editors

  1. Docbook can be read in XML capable editors such as Bluefish or Screem.

  2. Emacs can also be used, but it has a horrible user interface and is mega complicated.

  3. Lyx aims to be usable as well, but it has no XML support and is not nicely integrated; old lyx versions are recommended. The latex front end lyx supports export in docbook format and gentoo even has a docbook useflag to enable this support; however, they consider docbook to be slightly bloated.

  4. Quanta could be an option to edit docbook files, but it is tied too tightly to kde and therefore makes sense just for kde users.

  5. Xerlin and pollo are other applications, but they are also outdated and run badly.

  6. Create Wikitext and then convert it to docbook with a tool such as the eclipse plugin mylyn WikiText.

  7. For OpenOffice there is a docbook template http://www.openoffice.org/xml/xmerge/downloads/DocBookTemplate.stw. Open it and edit the file with OpenOffice; when done, it can be saved as docbook.

Converting Docbook via Sgml utils

This is the sample file. The id attributes have been added to create nice file names when the output is split into several pages:

Example 11.5. Docbook

<?xml version="1.0"?> 
<!DOCTYPE book PUBLIC 
"-//OASIS//DTD DocBook XML V4.1.2//EN" "http://www.oasis-open.org/docbook/xml/4.1.2/docbookx.dtd">
<book id="HelloWorld" lang="en"> 
  <title>Hello World</title>
  <chapter id="Introduction">
    <title>Hello World</title>
    <para>This is my docbook Hello World document</para>
    <sect1 id="AboutThisBbook">
      <title>About this book</title>
      <para>This is my first DocBook file.</para>
    </sect1>
    <sect1 id="WorkInProgress">
      <title>Warning</title>
      <para>This is still under construction.</para>
    </sect1> 
  </chapter>
</book>

After emerge docbook-sgml-utils a lot of docbook2* commands are available.

Then run docbook2html hello.docbook to get a full-featured set of html pages, where you can open index.html. Or choose another format:

docbook2pdf hello.docbook

docbook2txt hello.docbook

docbook2rtf hello.docbook

Many editors such as OpenOffice can directly open the docbook XML files. However, it can happen that not all features are supported and that no pictures appear.

To put the results in a directory

docbook2html -o html hello.docbook

Create a link from where the pics are to a pics directory in the html directory, otherwise the links to the pics will fail.

To have meaningful names for the sub pages, use the following to take the ids as filenames:

docbook2html -o html -V "%use-id-as-filename%" hello.docbook

Since filenames become lower case, don't use uppercase or CamelCase in the ids.
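Ids that would change when they become file names can be found before the conversion is run. A minimal sketch with a made-up function name; it uses a shortened form of the sample above without the DOCTYPE line:

```python
import xml.etree.ElementTree as ET


def ids_with_uppercase(xml_text):
    """Return the ids that differ from their lower case form."""
    root = ET.fromstring(xml_text)
    return [el.get("id") for el in root.iter()
            if el.get("id") and el.get("id") != el.get("id").lower()]


sample = """<book id="HelloWorld" lang="en">
  <title>Hello World</title>
  <chapter id="introduction">
    <title>Hello World</title>
    <sect1 id="WorkInProgress">
      <title>Warning</title>
      <para>This is still under construction.</para>
    </sect1>
  </chapter>
</book>"""
print(ids_with_uppercase(sample))
```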

To put everything in a single file

docbook2html -o html -u hello.docbook

To use a stylesheet

cp /usr/share/sgml/docbook/utils*/docbook-utils.dsl .

docbook2html -d docbook-utils.dsl#html myfile.docbook

According to the docbook2* man page, all of them are just jade wrapper scripts for docbook. The wrapper can also be called directly as jw to get HTML:

jw <my file>.docbook

A set of HTML pages is created that are linked with each other. The index.html page is the start.

Converting Docbook via XSLT

Since docbook is XML, XML tools can be used to convert docbook files. See also Publishing XML Documents.

The big advantage is that docbook stylesheets are available (installed under /usr/share/sgml/docbook/xsl-stylesheets/, see http://www.sagehill.net/docbookxsl/index.html and http://nwalsh.com/docs/articles/dbdesign/) that allow efficient conversion to different formats. Those stylesheets can also be customized using parameters passed on the command line. How this is done depends on the XSLT processor, since it has to pass them to the stylesheets.

With xsltproc the option --stringparam indicates that such a parameter follows; the parameter itself consists of a name and a value. All parameters are described in http://docbook.sourceforge.net/release/xsl/current/doc/index.html

xsltproc ... --stringparam <parameter name> <parameter value> ...

The following makes the HTML more human readable by adding line breaks in the right places: --stringparam chunker.output.indent yes

Another way to change the appearance is using CSS. Such a stylesheet can be applied on the command line: xsltproc --stringparam html.stylesheet style.css chunk.xsl myfile.xml

Every HTML page then gets a link to the stylesheet:

      <link rel="stylesheet" href="style.css" type="text/css"> 

The stylesheet must be found in the location of the HTML pages, or another parameter must be used to tell where it is.
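When several parameters are used, the command line gets long, and a script can put it together. The following sketch only builds the argument list from a dictionary of parameters and does not run anything; the function name is made up, while --output and --stringparam are the xsltproc options used in this chapter.

```python
def xsltproc_command(stylesheet, source, params=None, output=None):
    """Build an xsltproc argument list; the command is not run here."""
    command = ["xsltproc"]
    if output is not None:
        command += ["--output", output]
    for name, value in (params or {}).items():
        command += ["--stringparam", name, str(value)]
    command += [stylesheet, source]
    return command


command = xsltproc_command(
    "chunk.xsl", "myfile.xml",
    params={"html.stylesheet": "style.css", "chunker.output.indent": "yes"})
print(" ".join(command))
```

The list can be handed to subprocess.run(command, check=True) on a machine where xsltproc is installed.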

Customization Layer

Instead of listing all the parameters on the command line, they can also be put into a stylesheet:

<?xml version='1.0'?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:import href=
"/usr/share/sgml/docbook/xsl-stylesheets/xhtml/chunk.xsl"/>
  <xsl:param name="base.dir" select="'../html/'"/>    
  <xsl:param name="chunk.first.sections" select="1"/>
  <xsl:param name="html.stylesheet" select="'style.css'"/> 
  <xsl:param name="chunker.output.encoding" select="'UTF-8'"/> 
  <xsl:param name="use.id.as.filename" select="1"/> 
  <xsl:param name="navig.graphics" select="1"/> 
  <xsl:param name="generate.meta.abstract" select="1"/> 
  <xsl:param name="chunker.output.indent" select="'yes'"/> 
</xsl:stylesheet> 

This is a simple form of adding a customization layer to the standard XSL stylesheets, since the parameters overwrite the parameter settings of the original stylesheets. With this stylesheet the command gets simple and short: xsltproc style.xsl index.xml

A web site should have a favicon; this is the favorite icon that pops up in the browser's corner and in the bookmarks. It can be inserted into the HTML in the same way, by adding a template to the customization layer:

        <xsl:template name="user.head.content">
  <link xmlns="http://www.w3.org/1999/xhtml" 
  rel="shortcut icon" href="/favicon.ico"    
  type="image/x-ico" />
</xsl:template>

Custom HTML can be put around the navigation headers and footers

        <xsl:template name="user.footer.navigation">
  <a href="http://www.linurs.org/"><img src="/favicon.ico" alt="Linurs" border="0" /></a>
</xsl:template> 

user.<header or footer>.navigation goes on the outside and user.<header or footer>.content goes on the inside of the page.

Epub with Docbook

There is also a stylesheet to convert docbook to epub, the format for ebook readers. The command xsltproc /usr/share/sgml/docbook/xsl-stylesheets/epub3/chunk.xsl <my docbook>.xml does it. It creates the two directories META-INF and OEBPS that contain the data, plus the file mimetype. The pictures have to be copied to the correct location. When everything is prepared, go into the directory and run zip -r <my epub>.epub *
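The epub container format expects the mimetype file to be the first entry in the archive and to be stored without compression, which a plain zip -r does not guarantee. The following sketch packs the three items in that order with Python's zipfile module. The function name and the small demo tree are made up; a real tree comes from the xsltproc run above.

```python
import os
import tempfile
import zipfile


def pack_epub(source_dir, epub_path):
    """Zip an unpacked epub tree; mimetype goes first and uncompressed."""
    with zipfile.ZipFile(epub_path, "w") as epub:
        epub.write(os.path.join(source_dir, "mimetype"), "mimetype",
                   compress_type=zipfile.ZIP_STORED)
        for folder in ("META-INF", "OEBPS"):
            top = os.path.join(source_dir, folder)
            for dirpath, _dirs, files in os.walk(top):
                for name in sorted(files):
                    path = os.path.join(dirpath, name)
                    arcname = os.path.relpath(path, source_dir)
                    epub.write(path, arcname.replace(os.sep, "/"),
                               compress_type=zipfile.ZIP_DEFLATED)


# Demo on a tiny stand-in tree inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tree = os.path.join(tmp, "book")
    os.makedirs(os.path.join(tree, "META-INF"))
    os.makedirs(os.path.join(tree, "OEBPS"))
    with open(os.path.join(tree, "mimetype"), "w") as f:
        f.write("application/epub+zip")
    with open(os.path.join(tree, "META-INF", "container.xml"), "w") as f:
        f.write("<container/>")
    with open(os.path.join(tree, "OEBPS", "index.xhtml"), "w") as f:
        f.write("<html/>")
    target = os.path.join(tmp, "book.epub")
    pack_epub(tree, target)
    with zipfile.ZipFile(target) as epub:
        names = epub.namelist()
        first = epub.infolist()[0]
print(names)
```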

Note

Firefox has two add-ons: one is an epub reader, the other an epub writer.

Note

Epub3 can also embed movies.

Man Pages with Docbook

Man pages can also be written using docbook. This gives a uniform look and there is no need to learn the man page syntax. The other way around, man pages can be converted to docbook using doclifter http://www.catb.org/esr/doclifter/, which also helps to get familiar with how man pages look in docbook. Doclifter is a python script that can be run without bothering about package installation.

To start writing man pages, the <reference> root element is used. The <reference> element includes one or many <refentry> elements. A <refentry> serves as a man page for a particular section, so the docbook file can hold a collection of all sections in a single file. However, doclifter does not use <reference> as root element when converting a single man page but uses <refentry> as root element instead.

The docbook file can then be converted to a man page using xsltproc and the man page stylesheet: xsltproc /usr/share/sgml/docbook/xsl-stylesheets/manpages/docbook.xsl <myman>.xml

Note, there is no need to specify the output file name, since this is chosen automatically.

Microsoft Word with Docbook

WordML is an XML format that Microsoft Word understands. See http://www.explain.com.au/oss/docbook/. It makes use of the stylesheets found in /usr/share/sgml/docbook/xsl-stylesheets/roundtrip. Those stylesheets allow both converting WordML to docbook and converting docbook to WordML. Adding a WordML template is also supported. For the list of supported docbook elements see http://www.explain.com.au/oss/docbook/supported.html. Currently <graphic> is not supported, so the conversion loses the pictures, but inserts a comment where a picture is missing.

To start it:

xsltproc -o <my-word>.xml --stringparam wordml.template /usr/share/sgml/docbook/xsl-stylesheets/roundtrip/template.xml /usr/share/sgml/docbook/xsl-stylesheets/roundtrip/dbk2wordml.xsl <my-doc>.xml

Alternatively docbook can be converted to FO and then, using the FO converter from http://www.xmlmind.com/foconverter/what_is_xfc.html, to rtf, docx (or odt for libreoffice).

Webpages with Docbook stylesheets

The stylesheets coming with docbook allow creating web pages that are linked together. They use the <webpage> element, which is not a valid docbook element; additionally, not all docbook elements are supported. http://docbook.sourceforge.net/release/website/example/index.html contains additional documentation and shows what such web sites look like.

Slides with Docbook stylesheets

The stylesheets coming with docbook allow making a slide show with navigation icons, similar to PowerPoint, but there is no need to have a program installed, since it runs entirely in the web browser (except if the pdf version is selected). In the web browser, switch to full screen mode. It uses special elements that are not valid docbook elements.

The documentation: http://docbook.sourceforge.net/release/slides/current/doc/

Introduction with example: http://www.miwie.org/presentations/html/dbslides.html

Stylesheet parameters: http://docbook.sourceforge.net/release/xsl/current/doc/slides/index.html

