Documentation Generation

Organisation: Copyright (C) 2022-2022 Olivier Boudeville
Contact: about (dash) howtos (at) esperide (dot) com
Creation date: Wednesday, January 12, 2022
Lastly updated: Friday, April 8, 2022

Objective

We want to be able to generate, from a single source, at least two documentation formats: HTML (for the web output) and PDF.

The document source shall be expressed in a simple, non-limiting, high-level syntax; in practice a rather standard, lightweight markup language.

All standard documentation elements shall be available (ex: titles, tables, images, links, references, tables of contents, etc.) and be customisable.

The resulting documents shall be quickly and easily generated, with proper error reporting, and be beautiful and user-friendly (ex: with well-configured LaTeX, with appropriate CSS, icons and features like banners, with proper rendering of equations).

Per-format overriding shall be possible (ex: to define different image sizes for the web output and for the PDF one).

The whole documentation process shall be powered only by free software solutions, easily automated (ex: with Make) and suitable for version control (ex: with Git).

Technical Details

Rendering Mathematical Elements

With the RST toolchain, the PDF output, thanks to LaTeX, offers built-in high-quality rendering of mathematical elements such as equations, matrices, etc.

By default, the HTML output does not benefit from LaTeX, and remains significantly less pleasing to the eye, and less readable.

So we complement it with MathJax, a neat open-source "JavaScript display engine for mathematics that works in all browsers".

It shall thus be installed first, once and for all. For example, on Arch Linux, as root, it is sufficient to execute:

$ pacman -Sy mathjax

Then, to enable the use of MathJax for a given website based on Ceylan-Myriad, run from its root (often a doc directory):

make create-mathjax-symlink

(this target is defined in GNUmakerules-docutils.inc; it boils down to symlinking /usr/share/mathjax; see also the corresponding makefile of the HOWTOs to properly manage this dependency afterwards, notably when deploying web content)
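
The symlinking step can be sketched as follows. This is only an illustration, not the actual content of GNUmakerules-docutils.inc: the create_mathjax_symlink function and the name of the created link (mathjax) are assumptions.

```shell
# Hypothetical helper (not the actual make target): creates, in a
# documentation root, a symbolic link pointing to a MathJax installation.
#
# $1: MathJax installation directory (e.g. /usr/share/mathjax)
# $2: documentation root in which the link shall be created
#
# The name of the link, "mathjax", is an assumption.
create_mathjax_symlink()
{
    mathjax_dir="$1"
    doc_root="$2"

    if [ ! -d "${mathjax_dir}" ]; then
        echo "Error: no MathJax directory found in '${mathjax_dir}'." 1>&2
        return 1
    fi

    # -f replaces any prior link, -n avoids descending into an existing one:
    ln -sfn "${mathjax_dir}" "${doc_root}/mathjax"
}
```

It could then be called, from the root of the website, as: create_mathjax_symlink /usr/share/mathjax .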

The list of TeX/LaTeX commands supported by MathJax may be of use.

A few examples of the resulting math-related output can be seen in this section.

Title Hierarchy

It must be consistent: a given type of subtitle must always be placed at the same level in the title hierarchy.

We rely on the markup conventions set out in this demonstration file (created by David Goodger), whose source is here.

From the top-level title to the most nested ones:

  • =, on top and below the title (document title)
  • -, on top and below the title (document subtitle)
  • =, below the title (H1)
  • -, below the title (H2)
  • ., below the title (H3)
  • _, below the title (H4)
  • *, below the title (H5)
  • :, below the title (H6)
  • +, below the title (H7)
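
As an illustration of the H1 to H7 conventions listed above, here is a minimal sketch that prints a title underlined with the character of the requested level. The rst_section_title function is hypothetical and does not belong to the toolchain described here; the document title and subtitle, which are also overlined, are not covered.

```shell
# Hypothetical helper: prints an RST section title at the given level
# (1 to 7, i.e. H1 to H7), underlined with the adornment character listed
# above for that level.
rst_section_title()
{
    level="$1"
    title="$2"

    case "${level}" in
        1) adornment='=' ;;
        2) adornment='-' ;;
        3) adornment='.' ;;
        4) adornment='_' ;;
        5) adornment='*' ;;
        6) adornment=':' ;;
        7) adornment='+' ;;
        *) echo "Error: unsupported title level '${level}'." 1>&2
           return 1 ;;
    esac

    # The underline must be at least as long as the title text; here each
    # character of the title is replaced by the adornment character:
    underline="$(printf '%s' "${title}" | sed "s/./${adornment}/g")"

    printf '%s\n%s\n' "${title}" "${underline}"
}
```

For example, rst_section_title 2 'Image Sizes' prints the title followed by a line of dashes of the same length.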

Image Sizes

Responsive images, i.e. images that automatically adjust to fit the size of the screen, can be used. They are then defined for example thanks to:

<img src="foobar.png" id="responsive-image-large"></img>

Various standard sizes have been defined, all prefixed with responsive-image-; from the biggest (95%) to the smallest (10%), as defined for example in myriad.css, they are: full, large, intermediate, medium, reduced, small, tiny, xsmall.
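
To avoid typos in size names, the element may be generated rather than written by hand. The following is a minimal sketch; the responsive_image function is hypothetical, and only the size names listed above are taken from the source.

```shell
# Hypothetical helper: prints the HTML element for a responsive image of
# the given standard size, or fails if the size is not a known one.
#
# $1: path of the image (e.g. foobar.png)
# $2: size name (one of the standard sizes listed above)
responsive_image()
{
    src="$1"
    size="$2"

    case "${size}" in
        full|large|intermediate|medium|reduced|small|tiny|xsmall)
            printf '<img src="%s" id="responsive-image-%s"></img>\n' "${src}" "${size}" ;;
        *)
            echo "Error: unknown image size '${size}'." 1>&2
            return 1 ;;
    esac
}
```

For example, responsive_image foobar.png large prints the element shown above.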

Multi-File Documents

Targeting a Standalone Document

Although they tend to be less convenient to edit, longer documents may be split into a set of RST source files (the Myriad documentation is an example of this; the WOOPER documentation is an example of the opposite approach, based on a single source file).

Targeting Interlinked Modular Documents

In some cases, at least for the HTML output, the need is not to produce a single, large, monolithic document, but a set of interlinked ones (the present HOWTO is an example thereof) that can be browsed as separate pages.

Then a convenient approach is to define different entry points for different output formats, like, for these HOWTOs, this one for the HTML output and this one for the PDF output.

Validating / Checking

In addition to checking the messages reported when the document is built, some tools make it possible to run checks on a generated document.

Notably an online HTML page, or set of pages, can be verified by third-party tools like this one, to detect dead links.

Pointing to a Specific Moment in a Linked Video

It is as simple as designating, in an HTML link, the targeted second by suffixing the URL of the video file with #t=DURATION_IN_SECONDS, like in some-video.mp4#t=1473 [1].

[1] With mplayer, use the o hotkey to display elapsed durations.
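
As elapsed durations are usually displayed as hours, minutes and seconds, a small conversion may help. The following is a minimal sketch; the video_link_at function and its H:MM:SS input format are assumptions.

```shell
# Hypothetical helper: converts an elapsed time expressed as H:MM:SS into a
# number of seconds, and prints the corresponding URL with its #t= suffix.
#
# $1: URL of the video (e.g. some-video.mp4)
# $2: targeted moment, as H:MM:SS
video_link_at()
{
    url="$1"
    moment="$2"

    # awk reads each field as a decimal number, so leading zeros are harmless:
    seconds="$(echo "${moment}" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }')"

    printf '%s#t=%s\n' "${url}" "${seconds}"
}
```

For example, video_link_at some-video.mp4 0:24:33 prints some-video.mp4#t=1473.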

Miscellaneous

Conversion between Markup Formats

Pandoc is the tool of choice for such operations, as it often yields good results.

For example, suppose that a page written in Mediawiki syntax, whose source content has been pasted in an old-content-in-mediawiki.txt file, is to be converted into a page for a GitLab wiki (hence in GFM markup, i.e. GitHub Flavored Markdown, which is what pandoc's gfm format designates and which GitLab wikis largely accept), the result being written in a converted-content.gfm file. One may then use:

$ pandoc old-content-in-mediawiki.txt --from=mediawiki --to=gfm --standalone -o converted-content.gfm

# Or, for older versions of pandoc not supporting a gfm writer:
$ pandoc old-content-in-mediawiki.txt --from=mediawiki --to=markdown_github --standalone -o converted-content.gfm

Then the content of the converted-content.gfm file can be pasted in the target GitLab wiki page.

Another example is the conversion of a GitLab wiki page into an RST document (ex: to then generate a PDF):

$ pandoc my-gitlab-wiki-extract.gfm --from=gfm --to=rst --standalone -o my-converted-content.rst

# Or, for older versions of pandoc not supporting a gfm reader:
$ pandoc my-gitlab-wiki-extract.gfm --from=markdown_github --to=rst --standalone -o my-converted-content.rst

Finally, if a Word document really has to be generated, an example may be:

$ pandoc my-document.rst --from=rst --to=docx -o my-converted-document.docx

The lists of the input and output formats supported by Pandoc, and of their corresponding command-line options, are specified here.

These options are also returned by: pandoc --list-input-formats and pandoc --list-output-formats (or, for older versions of pandoc, thanks to pandoc --help).
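
The choice between the gfm format and its older markdown_github counterpart can be automated from such a list. The following is a minimal sketch; the pick_gfm_format function is hypothetical, and it supposes a pandoc recent enough to support the listing options above.

```shell
# Hypothetical helper: reads a list of pandoc formats (one per line) on its
# standard input, and prints the name to use for GitHub Flavored Markdown:
# gfm if it is listed, markdown_github otherwise.
pick_gfm_format()
{
    if grep -qx 'gfm'; then
        echo 'gfm'
    else
        echo 'markdown_github'
    fi
}

# Possible use (requires pandoc, hence left commented out here):
#
# writer="$(pandoc --list-output-formats | pick_gfm_format)"
# pandoc old-content-in-mediawiki.txt --from=mediawiki --to="${writer}" \
#     --standalone -o converted-content.gfm
```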

An input file may not be encoded in UTF-8, which can result in:

pandoc: Cannot decode byte '\xe9': Data.Text.Internal.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream

In this case, the actual encoding shall be determined, for example with:

$ file input.html
input.html: HTML document, ISO-8859 text

Then the encoding may be changed before calling pandoc, for example like:

$ iconv -f ISO-8859-1 -t utf-8 input.html | pandoc --from=html --to=markdown_github --standalone -o output.gfm
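
When processing many files, the check and the conversion may be combined. The following is a minimal sketch relying only on iconv; the to_utf8 function is hypothetical, and the fallback encoding still has to be determined beforehand (ex: with file, as above).

```shell
# Hypothetical helper: writes to standard output the content of a file as
# UTF-8, converting it from the given fallback encoding only when needed.
#
# $1: input file
# $2: encoding assumed if the file is not valid UTF-8 (e.g. ISO-8859-1)
to_utf8()
{
    input_file="$1"
    fallback_encoding="$2"

    # iconv fails if the input is not valid in the declared source encoding,
    # so converting from UTF-8 to UTF-8 acts as a validity check:
    if iconv -f utf-8 -t utf-8 "${input_file}" > /dev/null 2>&1; then
        cat "${input_file}"
    else
        iconv -f "${fallback_encoding}" -t utf-8 "${input_file}"
    fi
}
```

Its output can then be piped to pandoc like in the previous command, for example: to_utf8 input.html ISO-8859-1 | pandoc --from=html --to=markdown_github --standalone -o output.gfm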

Transformation of PDF Files

For that, one may use the pdftk tool:

  • to concatenate PDFs: pdftk 1.pdf 2.pdf 3.pdf cat output 123.pdf
  • to split all pages of a PDF into as many individual files (named pg_0001.pdf, pg_0002.pdf, etc.): pdftk document.pdf burst

Image Transformations

One may rely on:

  • GIMP (GNU Image Manipulation Program; corresponding, on Arch, to the gimp package)
  • or on command-line ImageMagick (on Arch, the imagemagick package, which provides notably the convert and display executables)

To invert/negate an image (swap colors with their complementary ones):

$ convert source.png -negate target.png

UML Diagrams

Although SysML can also be of interest, we focus here on UML2 class diagrams (one of the 14 types of diagrams provided by UML2).

Quick UML Cheat Sheet

Multiplicities

A multiplicity is a definition of cardinality (i.e. number of elements) of some collection of elements.

It can be set for attributes, operations, and associations in a class diagram, and for associations in a use case diagram. The multiplicity is an indication of how many objects may participate in the given relationship.

It is defined as an inclusive interval based on non-negative integers, with * denoting an unlimited upper bound (not, for example, n).

The most common multiplicities are:

  • no instance or one instance: 0..1
  • any number of instances, including zero: * (shorthand for 0..*)
  • exactly k instances: k (so, if k=5, 5)
  • at least M instances: M..* (e.g. 2..*)
  • at least M instances, but no more than N (hence bounds included): M..N (e.g. 3..5)

For associations, the default multiplicity is 0..1, while new attributes and operations have a default multiplicity of 1.
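
The interval-based definition above can be made concrete by checking whether a given number of instances satisfies a multiplicity. The following is a minimal sketch; the satisfies_multiplicity function is hypothetical and performs no validation of its arguments.

```shell
# Hypothetical helper: succeeds if the given count of instances satisfies
# the given multiplicity (e.g. 0..1, *, 5, 2..*, 3..5).
#
# $1: number of instances (non-negative integer)
# $2: multiplicity
satisfies_multiplicity()
{
    count="$1"
    multiplicity="$2"

    case "${multiplicity}" in
        '*')
            # Shorthand for 0..*:
            lower=0
            upper='*' ;;
        *..*)
            lower="${multiplicity%%..*}"
            upper="${multiplicity##*..}" ;;
        *)
            # A single value k stands for the k..k interval:
            lower="${multiplicity}"
            upper="${multiplicity}" ;;
    esac

    # Bounds are inclusive:
    [ "${count}" -ge "${lower}" ] || return 1

    # An upper bound of * is unlimited:
    [ "${upper}" = '*' ] || [ "${count}" -le "${upper}" ]
}
```

For example, satisfies_multiplicity 4 '3..5' succeeds, whereas satisfies_multiplicity 1 '2..*' fails.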

Association

An association is a relation between two classes (binary association) or more (N-ary association) that describes structural relationships between their instances.

For example a polygon may be defined from at least 3 vertices that it would reference, whereas a point may take part in any number of polygons (including none):

(see the sources of this diagram)

The multiplicity of an endpoint denotes the number of instances of the corresponding class that may take part in this association. For example, at least 3 points are needed to form a polygon, whereas any number of polygons can include a given point.

In UML, the direction of an association can easily be ambiguous (here we have to rely on external knowledge to determine whether a polygon is composed of points, or if a point is composed of polygons). Adding a chevron (like > or <, e.g. "references >"; ideally this should be a small solid triangle) to the text is not a good solution either, as the layout may place the respective endpoints in any relative position. Adding an arrow to the end of the line segment cannot be done either, as it would denote the navigability of the association instead.

Aggregation

An aggregation is a specific association that denotes that an instance of a class (ex: Library) is to loosely contain instances of another class (ex: Book), in the sense that the lifecycle of the contained instances is not strongly dependent on that of the container (ex: books will still exist even if the library is dismantled).

Here a library may contain any number of books (possibly none), and a given book belongs to at most one library.

(see the sources of this diagram)

Composition

A composition is a specific association that denotes that an instance of a class (ex: HumanBeing) is to own instances of another class (ex: Leg), in the sense that the lifecycle of the contained instances fully depends on that of the container (here, if a human being dies, his/her legs will not exist anymore either).

Here a human being has exactly 2 legs, and any given leg belongs to exactly one human being (therefore this model does not account for one-legged persons).

(see the sources of this diagram)

Inheritance

An inheritance relationship is a specific relationship (strictly speaking, in UML, a generalization rather than an association) that denotes that a class (ex: HumanBeing) is a specific case of a more general one (ex: Animal), and thus that an instance of the first one is also an instance of the second one ("is-a" relationship).

Here a human being is a specific animal.

(see the sources of this diagram)

Tooling

In a design phase, one may prefer lightweight tools like Graphviz, PlantUML or even Dia.

As long as the architecture of a framework is not stabilised, having one's tool determine by itself the layout of the rendering (rather than having to place manually one's graphical components) is surely preferable.

For that we use Graphviz, with our own build conventions.

For example, supposing this diagram example, i.e. a source file named uml_class_diagram_example.graph:

$ make uml_class_diagram_example.png
# or, to force a regeneration and display the result:
$ make clean uml_class_diagram_example.png VIEW_GRAPH=true

This example results in the following diagram:

(see the sources of this diagram)
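
Independently of these build conventions, the four relationships of the cheat sheet above can be sketched directly in the Graphviz DOT language. This is a minimal, assumed example, not one of the sources of the diagrams of this page: the uml_sketch.dot file name and all of the graph content are illustrative.

```shell
# Writes a minimal Graphviz (DOT) class diagram; the file name and all the
# graph content are illustrative assumptions.
cat > uml_sketch.dot << 'EOF'
digraph uml_sketch {

    // Each class is rendered as a box:
    node [shape=box]

    // Association: no arrowhead, multiplicities at both endpoints.
    Polygon -> Point [dir=none, label="references", taillabel="*", headlabel="3..*"]

    // Aggregation: hollow diamond on the container side.
    Library -> Book [dir=back, arrowtail=odiamond, taillabel="0..1", headlabel="*"]

    // Composition: filled diamond on the container side.
    HumanBeing -> Leg [dir=back, arrowtail=diamond, taillabel="1", headlabel="2"]

    // Inheritance: hollow triangle pointing to the more general class.
    HumanBeing -> Animal [arrowhead=empty]
}
EOF

# The rendering is attempted only if the dot executable is available:
if command -v dot > /dev/null 2>&1; then
    dot -Tpng uml_sketch.dot -o uml_sketch.png
fi
```

The layout is determined by Graphviz itself, in line with the preference stated above for automatic placement.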