Organisation: | Copyright (C) 2022-2024 Olivier Boudeville |
---|---|
Contact: | about (dash) howtos (at) esperide (dot) com |
Creation date: | Wednesday, January 12, 2022 |
Lastly updated: | Sunday, June 23, 2024 |
The goal is to generate good-looking documentation of any kind (not necessarily technical), as static content - as opposed to wikis or content management systems (CMS).
We want to be able to generate, from a single source, at least two documentation formats: HTML (for web pages) and PDF.
The document source shall be expressed in a simple, non-limiting, high-level syntax; in practice a rather standard, lightweight markup language.
All standard documentation elements shall be available (e.g. title, tables, images, links, references, tables of content, etc.) and be customisable.
The resulting documents shall be quickly and easily generated, with proper error reporting, and be beautiful and user-friendly (e.g. with well-configured LaTeX, with appropriate CSS, icons and features like banners, with proper rendering of equations).
Per-format overriding shall be possible (e.g. to define different image sizes depending on web output or PDF one).
The whole documentation process shall be powered only by free software solutions, easily automated (e.g. with Make) and suitable for version control (e.g. with Git).
For that we rely on two possible approaches, a lighter one and a more involved one, depending on the project at hand.
We found this approach convenient for lighter projects, i.e. ones comprising a limited number of pages. This is the case of this page and more generally of the whole Ceylan-HOWTO website.
We chose to rely on the reStructuredText syntax and tools, also known as RST, a part of the Docutils project. Here we do not specifically rely on elements related to Python or the Sphinx toolchain, as our more heavy-duty approach does.
We augmented reStructuredText with:
Of course this website, and many others that we created, rely on this approach; as an example, one may look at the sources of the current document.
With the RST toolchain, the PDF output, thanks to LaTeX, offers built-in high-quality rendering of mathematical elements such as equations, matrices, etc.
By default, the HTML output does not benefit from LaTeX, and remains significantly less pleasing to the eye, and less readable.
So we complement it by MathJax, a neat open-source "JavaScript display engine for mathematics that works in all browsers".
MathJax shall thus be installed first, once and for all.
For example, on Arch Linux, as root, it is sufficient to execute:
$ pacman -Sy mathjax
If not having root permissions, it can be installed directly in one's user account, for example:
$ cd /tmp && git clone https://github.com/mathjax/MathJax.git
$ mv MathJax/es5 ~/Software/MathJax
Then, to enable the use of MathJax for a given website based on Ceylan-Myriad, run from its root (often a doc directory):
$ make create-mathjax-symlink
(this target is defined in GNUmakerules-docutils.inc; it boils down to symlinking /usr/share/mathjax; see also the HOWTOs corresponding makefile to properly manage this dependency afterwards, notably when deploying web content)
Yet, depending on settings and conventions, updating MathJax in a web root may lead to permission errors; in that case the next approach shall be favored.
Just install MathJax directly in your user account (e.g. in ~/Software/mathjax), follow the guidelines in the Fixing Permissions in Third-Party Content to Integrate in a Web Root section, and add symlinks to the result in all documentation trees requiring MathJax.
For that, rather than installing MathJax by oneself (as we found its website rather unclear about how to install it when not in a Node.JS context) or possibly taking inspiration from this PKGBUILD, the simplest way is to install it from one's package manager (e.g. pacman -Sy mathjax) and to copy the result in one's account: cp -r /usr/share/mathjax ~/Software/, before fixing permissions there. This local copy shall just be regularly updated.
This typically consists in generating an image file from a LaTeX formula, in order to include it in a presentation, an e-mail, any kind of post, etc.
tex2svg is the tool of choice here.
On Arch Linux, the texlive-bin (for pdflatex) and pdf2svg packages may have to be installed.
We do not think this is the best approach as the resulting bitmap file is likely to have issues in terms of rendering/aliasing.
This can be done thanks to tex2png, a simple yet effective Bash script.
At least the texlive-fontsrecommended Arch package shall be installed beforehand, so that the lmodern.sty file is available.
Example of use: ./tex2png -c "$\phi_n(\kappa) = \frac{1}{4\pi^2\kappa^2}$" -T -D 500 -o my-example.png.
The list of TeX/LaTeX commands supported by MathJax may be of use.
Each LaTeX command may either be specified directly inline, in the text (with :math:`LATEX_CMD`) or in a block indented after a .. math:: directive.
This allows defining inline mathematical elements, like \(P = \begin{pmatrix} 10 \\ 45\end{pmatrix}\) (obtained with P = \\begin{pmatrix} 10 \\\\ 45\\end{pmatrix}) or standalone ones, like:
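\[M = \begin{bmatrix} a11 & a12 & ... & a1n \\ a21 & a22 & ... & a2n \\ ... & ... & ... & ... \\ am1 & am2 & ... & amn \\ \end{bmatrix}\]
obtained thanks to:
M = \begin{bmatrix}
a11 & a12 & ... & a1n \\
a21 & a22 & ... & a2n \\
... & ... & ... & ... \\
am1 & am2 & ... & amn \\
\end{bmatrix}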
For \(\phi: \mathbb{R} \rightarrow ]0,1[\) (i.e. \phi: \mathbb{R} \rightarrow ]0,1[), we may have \(P_e = \phi(m+\phi^{-1}(P_n))\) (i.e. P_e = \phi(m+\phi^{-1}(P_n))).
If \(\phi(x) = e^{x}/(1+e^{x})\) (translating to \phi(x) = e^{x}/(1+e^{x})), then:
\[P_e = \frac{P_n.e^{m}}{1 + P_n.(e^{m}-1)}\]
(translating to P_e = \frac{P_n.e^{m}}{1 + P_n.(e^{m}-1)})
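As a quick sanity check of this closed form, the following Python sketch compares it numerically with the definition P_e = \phi(m+\phi^{-1}(P_n)), assuming that \phi is the logistic function given above. The function names (phi, phi_inv, p_e_direct, p_e_closed) and the sample values are ours, chosen only for illustration.

```python
import math

def phi(x):
    """Logistic function, mapping R to ]0,1[."""
    return math.exp(x) / (1.0 + math.exp(x))

def phi_inv(p):
    """Inverse of phi (the logit function), defined for p in ]0,1[."""
    return math.log(p / (1.0 - p))

def p_e_direct(p_n, m):
    """P_e computed straight from its definition."""
    return phi(m + phi_inv(p_n))

def p_e_closed(p_n, m):
    """P_e computed from the closed form (the dot denoting a product)."""
    return p_n * math.exp(m) / (1.0 + p_n * (math.exp(m) - 1.0))

# Arbitrary sample values; both computations should agree:
for p_n, m in [(0.1, 0.5), (0.5, -2.0), (0.9, 3.0)]:
    assert math.isclose(p_e_direct(p_n, m), p_e_closed(p_n, m))
```

Both computations agree because \phi^{-1}(p) = ln(p/(1-p)), so that e^{m+\phi^{-1}(P_n)} = e^{m}.P_n/(1-P_n).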
A few other examples of resulting math-related outputs can be seen in this section.
See the next section for a proper use of MathJax in webservers.
The title hierarchy must be consistent: a given type of subtitle must always be placed at the same level in that hierarchy.
We rely on the markup conventions exposed in this demonstration file (created by David Goodger), whose source is here.
From the top-level title to the most nested ones:
Responsive images, i.e. images that automatically adjust to fit the size of the screen, can be used. They are then defined for example thanks to:
<img src="foobar.png" id="responsive-image-large"></img>
Various standard sizes have been defined, all prefixed with responsive-image-; from the biggest (95%) to the smallest (10%), as defined for example in myriad.css, they are: full, large, intermediate, medium, reduced, small, tiny, xsmall.
Although they tend to be less convenient to edit, longer documents may be split in a set of RST source files (the Myriad documentation is an example of it; the WOOPER documentation is an example of the opposite approach, based on a single source file).
In some cases, at least for the HTML output, the need is not to produce a single, large, monolithic document, but a set of interlinked ones (the present HOWTO is an example thereof) that can be browsed as separate pages.
Then a convenient approach is to define different entry points for different output formats, like, for these HOWTOs, this one for the HTML output and this one for the PDF output.
Defining any title (e.g. the "Rendering Mathematical Elements" one above) automatically introduces in turn a corresponding anchor, which, for the HTML output, can then be referenced from any page, for example as raw HTML (like MyPage.html#rendering-mathematical-elements, or directly from the current page as #rendering-mathematical-elements) or directly through RST in the document (e.g. specified as `Rendering Mathematical Elements`_, resulting in: Rendering Mathematical Elements).
Note the light transformation (lowercasing, spaces becoming dashes) of the specified name once it is translated into a legit HTML anchor.
Extra local anchors (e.g. that could be named "how to render equations") can also be specified anywhere in the document (e.g. just before the previously mentioned title, so that it can be designated with other words), thanks to:
.. _`how to render equations`:
It can then be referenced from the same page as #how-to-render-equations or from another one as MyPage.html#how-to-render-equations.
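The transformation from a title or a local target name to its HTML anchor can be approximated as follows. This is only a simplified sketch: the title_to_anchor name is ours, and Docutils applies its own, more complete rule (to our knowledge in docutils.nodes.make_id), which may differ for non-ASCII or unusual names.

```python
import re

def title_to_anchor(title):
    """Approximate the HTML anchor derived from a title or target name."""
    anchor = title.lower()
    # Any run of characters other than ASCII letters and digits
    # becomes a single dash:
    anchor = re.sub(r"[^a-z0-9]+", "-", anchor)
    # No leading or trailing dash is kept:
    return anchor.strip("-")

def page_link(page, title):
    """Build a raw HTML link target to a title of a given page."""
    return "%s#%s" % (page, title_to_anchor(title))
```

For example, page_link("MyPage.html", "Rendering Mathematical Elements") yields the MyPage.html#rendering-mathematical-elements target mentioned above.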
Note that titles and hypertext links introduce local links as well, so one's inner links may clash with them (resulting in (ERROR/3) Duplicate target name [...]); the best option is generally to phrase these inner links differently.
Engineering generally relies on either the IEEE or the APA citation style.
We are not very fond of IEEE, as its references are just numbers (e.g. [1] in Lindberg and Lee [1]), instead of more informative elements (like [CIT2002]), so we chose APA, which seems more in line with RST conventions.
Within APA, we prefer parenthetical citations (e.g. (Salas & D’Agostino, 2020)) to narrative ones (e.g. Salas and D’Agostino (2020)).
Here are extra examples thereof:
Authors' Last name, First Initial. (Year). Book title: Subtitle. (Edition) [if other than the 1st]. Publisher
Unfortunately neither type of APA citation is supported: possibly because of the parentheses and/or the space, the text is not interpreted as a citation. So for example [(Taylor, 2005)]_ / .. [(Taylor, 2005)] will not work.
So we finally retained conventions that are a bit different: APA with no parentheses or space.
Finally an actual example just follows: as explained in [Taylor2005], the conservation of momentum can help solve some problems elegantly.
[Taylor2005] | Taylor, J. (2005). Classical Mechanics. University Science Books, 98-100. |
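The retained convention (APA with no parentheses or space) can be sketched as follows in Python. The helper names are hypothetical, and dropping every non-alphanumeric character from the author name is our own assumption about how composite names should be handled.

```python
def citation_label(last_name, year):
    """Build a label such as 'Taylor2005' from an author and a year."""
    # Keep only letters and digits, so that no space, quote or
    # parenthesis ends up in the label:
    cleaned_name = "".join(ch for ch in last_name if ch.isalnum())
    return "%s%d" % (cleaned_name, year)

def citation_reference(last_name, year):
    """RST text to insert where the work is cited."""
    return "[%s]_" % citation_label(last_name, year)

def citation_definition(last_name, year, description):
    """RST line defining the citation itself."""
    return ".. [%s] %s" % (citation_label(last_name, year), description)
```

For example, citation_reference("Taylor", 2005) returns the [Taylor2005]_ form, whose rendering is used in the example above.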
To comment-out a block of text, just add .. at the beginning of a line, then, from the next line, put that block, indented by at least one space; this must be a legit block (see Defining Blocks).
All lines of a block shall start with the same whitespace. So, whenever a given block is not left-justified (meaning that at least one of its lines starts with a different offset), prefer having all lines of such a block be indented by (at least) 4 spaces (i.e. a tabulation).
Otherwise, if using a single space to indent, as soon as a line of the block is to start with 3 spaces, whitespace-cleanup operations will combine them with the first one to form a tabulation (4 spaces in a row), and all lines of the commented block will not start with the same whitespace, which could result, from the point of view of RST tools, in an invalid block.
To indent on Emacs, one may select the region of interest and then hit C-x TAB TAB TAB [...] (or even just TAB once the block is selected).
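Outside of an editor, the same commenting-out operation can be sketched in Python. The comment_out name is ours, and the 4-space default simply follows the recommendation above.

```python
def comment_out(block, indent="    "):
    """Return 'block' as an RST comment: a '..' line followed by the
    block, all of whose lines are shifted by the same indentation."""
    lines = block.splitlines()
    # Blank lines are left empty, so that no trailing whitespace is added:
    body = [indent + line if line.strip() else "" for line in lines]
    return "\n".join([".."] + body)

print(comment_out("A first line.\n  A line that was already indented."))
```

As each non-blank line receives the same prefix, the relative offsets within the block are preserved.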
Either a standard or a code indented block may be used.
A standard block is introduced by a non-indented text finishing with two colons (::), like in: Here is what she said::.
Note
Unfortunately this does not allow proper French syntax, for which a space is needed before the colons (e.g. typed as Elle a dit ensuite ::): adding such a space will result in no colon being displayed.
So we stick to writing an improper Elle a dit ensuite::, which is rendered as Elle a dit ensuite:, better than Elle a dit ensuite but worse than Elle a dit ensuite :.
Before a code block (e.g. introduced with .. code:: erlang), a single colon should be used, not two of them. For example:
This algorithm can be:

.. code:: erlang

 [...]
This typically happens whenever defining a link with a given label more than once.
A solution is to use anonymous references instead, i.e. double underscores (so: __) to define references, like in:
Here is `some link <http://example.org/xxx>`__.
Using double underscores for links could be considered the norm.
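For an existing document, switching embedded-URL links from named to anonymous references could be automated along these lines. This is a rough sketch based on a regular expression of ours: it only targets the `label <URL>`_ form, and deliberately leaves plain references (like `Some Title`_) untouched.

```python
import re

# A link of the form `label <URL>`_ that is not already anonymous,
# i.e. whose closing underscore is not followed by a second one:
_NAMED_LINK = re.compile(r"(`[^`<]+<[^`>]+>`)_(?!_)")

def make_links_anonymous(rst_text):
    """Turn each `label <URL>`_ link into its `label <URL>`__ form."""
    return _NAMED_LINK.sub(r"\1__", rst_text)

print(make_links_anonymous("Here is `some link <http://example.org/xxx>`_."))
```

Applying the function twice gives the same result as applying it once, since already anonymous links are not matched.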
It applies to more ambitious projects, involving larger content with potentially many interlinked pages.
It is based on the Sphinx toolchain, and therefore shares many elements with our lightweight approach above, starting from the RST syntax.
It offers out of the box many useful mechanisms beyond the generation of a single-page website, from a smooth navigation between a set of pages based on foldable menus to a generated index and a local search engine. It moreover can be easily customised.
More types of outputs are readily supported: HTML, PDF, EPUB, man pages.
One may follow these guidelines. On Arch Linux, we install the python-sphinx package, thanks to: pacman -Sy python-sphinx gnu-free-fonts texlive-binextra (last package being needed now for latexmk, to generate PDFs).
Running sphinx-quickstart is one's best route; in terms of choices:
Then make html should generate a base website (including makefiles) that can be browsed in the build/html directory (use make clean to force its erasure; run just make to list all targets of interest, including the linkcheck one, to check all external links for integrity).
The project settings can be edited in source/conf.py.
The default (HTML) theme is the Alabaster one. Other themes may be preferred, whether they are Sphinx built-ins or external ones. Many can be customised.
As for us, we prefer mobile-compliant themes with a left column to navigate. This includes the popular Read the Docs theme, which we will use here, but also the classic theme.
The documentation of the Read the Docs theme details everything needed. For an installation thereof on Arch, the simplest is to run pacman -Sy python-sphinx_rtd_theme.
Then it is just a matter of editing source/conf.py so that 'sphinx_rtd_theme' is listed in the extensions list, and that the html_theme is now set to 'sphinx_rtd_theme'.
Running make html again should be sufficient to take this new theme into account. The generated result is quite satisfying.
Here is one customisation thereof (theme options) that we like:
#html_theme = 'alabaster'
html_theme = 'sphinx_rtd_theme'

html_theme_options = {
    #'analytics_id': 'G-XXXXXXXXXX', # Provided by Google in your dashboard
    #'analytics_anonymize_ip': True,
    'logo_only': False,
    'display_version': True,
    'prev_next_buttons_location': 'both',
    'style_external_links': True,
    'vcs_pageview_mode': '',
    #'style_nav_header_background': 'white',
    #'style_nav_header_background': '#2980B9',
    'style_nav_header_background': 'black',
    # Toc options
    'collapse_navigation': False,
    'sticky_navigation': True,
    'navigation_depth': 4,
    'includehidden': True,
    'titles_only': False
}

html_logo = '../foobar-title.png'
html_favicon = '../foobar-icon.png'
html_last_updated_fmt = ''
html_copy_source = False
html_show_sourcelink = False
html_show_sphinx = False
html_static_path = ['_static']

# So that inner cross-RST file references can be found:
default_role = 'any'
This is just a matter of adding *.rst files (each defining at least one title) in the source directory, and of referencing them in at least one table of contents (e.g. in source/index.rst), like in:
.. toctree::
   :maxdepth: 2
   :caption: Contents:

   ./foobar.rst
   ./buz.rst
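When the list of pages is itself generated, such a table of contents may be produced by a small script. Here is a minimal Python sketch; the make_toctree name and its defaults are ours, mirroring the options shown above.

```python
def make_toctree(entries, maxdepth=2, caption="Contents:"):
    """Return the RST text of a toctree directive listing 'entries'."""
    lines = [
        ".. toctree::",
        "   :maxdepth: %d" % maxdepth,
        "   :caption: %s" % caption,
        "",  # A blank line must separate the options from the entries.
    ]
    # Entries share the indentation of the options:
    lines += ["   " + entry for entry in entries]
    return "\n".join(lines) + "\n"

print(make_toctree(["./foobar.rst", "./buz.rst"]))
```

The returned text can then be written in source/index.rst or in any other RST file listing sub-pages.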
The result is a static website that can safely be transferred and served by any webserver of choice.
In our preferred approach, described here and there, such inner links - which are possibly defined and referenced in different RST files - are to be managed in a slightly different way than for Docutils's inner links:
We would have liked to organise a Sphinx document according to a filesystem tree, so that there is in each directory an RST file that comprises the content for that level and that just lists its direct local children as the RST files in its subdirectories, as relative files, like, in a a/b/c/c.rst file: .. include:: d/d.rst.
Strangely enough, it worked for 3-level nesting (a, a/b and a/b/c), but not for the next level: even though the RST files in c were included as d/d.rst, they were never found:
a/b/c/c.rst:4: CRITICAL: Problems with "include" directive path: InputError: [Errno 2] No such file or directory: 'a/b/d/d.rst'.
(we can see that c is lacking; no way of adding it once; and trying there to specify .. include:: c/d/d.rst results in a/b/c/c/d/d.rst not being found...)
The only (unsatisfactory) solution we found is to specify paths that are "absolute" (i.e. relative to the root of the tree), for example as .. include:: /a/b/c/d/d.rst.
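Rewriting relative includes into such root-relative ones can be scripted. In the following Python sketch, the root_relative_include name is ours, and all paths are assumed to be expressed relative to the root of the source tree, with forward slashes.

```python
import posixpath

def root_relative_include(including_file, included_file):
    """Return an include directive whose path is relative to the root
    of the tree, from a path relative to the including file."""
    base_dir = posixpath.dirname(including_file)
    target = posixpath.normpath(posixpath.join(base_dir, included_file))
    # The leading slash designates the root of the source tree:
    return ".. include:: /" + target

print(root_relative_include("a/b/c/c.rst", "d/d.rst"))
```

With the example above, this prints .. include:: /a/b/c/d/d.rst.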
These hints apply more generically than only with a RST toolchain.
In addition to the verification of the messages reported when the document is built, some tools allow checks to be performed on a generated document.
Notably an online HTML page, or set of pages, can be verified by third-party tools like this one, to detect dead links.
The objective is to ensure that a filesystem tree can be transferred as a whole without permission errors to a given server more than once, whereas the destination user and group (typically specialised and restricted on a server) differ from the source ones.
It is indeed often necessary to fix permissions in a third-party tree before it is transferred to a server (e.g. MathJax being copied from a client host through scp in a web root on a given server); otherwise the next transfers will stumble on the initial, inadequate group rights (like 700 instead of 770), which typically prevent the corresponding entries from being overwritten by a process belonging to a different user yet being in the same group, resulting in Permission denied errors.
For that we recommend executing our fix-www-metadata.sh script in the source root prior to transferring it to a webroot of choice. This typically applies to MathJax (see the Rendering Mathematical Elements section), which therefore should not be installed in the system tree thanks to a package manager (as our script will alter its permissions) but in the user account (e.g. as ~/Software/mathjax).
So typically this script shall be symlinked in each third-party root of interest, and be executed there at each update thereof.
However doing so does not solve all issues: the file entries created/updated by scp on the server will be owned by the user on the server implied by the scp command, for example stallone:users - not the desired web-srv-user:web-srv-group. Of course the fix-www-metadata.sh script can be run (by root) on the server to correct that. Yet the next update of this webroot will fail again with Permission denied errors, as the groups are not expected to match anymore (we cannot overwrite with our client-side users group a remote directory whose permission is 770 that is owned by group web-srv-group).
A solution is to ensure that the source content already bears the target group. As scp relies on user/group IDs, not on names (e.g. on a numerical GID like 1001, not a name like users or web-srv-group), the simplest solution is to determine the actual GID of the target group on the server (e.g. running, as root, grep web-srv-group /etc/group may tell us that the GID of web-srv-group is 1002 there) and to create on the client a group with the same GID (if ever possible, that is if there is not already another group that happens to be set to that GID) and to apply it to the source content to transfer, like in:
# We are on the "client", source host, as root:
# (web-srv-group-of-target-server clearer than web-srv-group)
$ groupadd --gid 1002 web-srv-group-of-target-server
$ usermod -a -G web-srv-group-of-target-server stallone
$ chgrp -R web-srv-group-of-target-server /home/stallone/mathjax
Then, afterwards, when the stallone user performs his scp repeatedly to transfer updated versions of his mathjax directory from the client to the server, he should be able to perform a flawless update of its files and directories.
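Determining the GID of the target group, done above with grep, can also be sketched in Python. The find_gid name and the sample content are ours; the standard name:password:GID:members layout of /etc/group is assumed.

```python
def find_gid(group_file_content, group_name):
    """Return the numerical GID of 'group_name' from the content of an
    /etc/group-like file, or None if no such group is listed."""
    for line in group_file_content.splitlines():
        fields = line.strip().split(":")
        # Expected layout: name:password:GID:member1,member2
        if len(fields) >= 3 and fields[0] == group_name:
            return int(fields[2])
    return None

# Hypothetical content, as could be read from /etc/group on the server:
sample = "users:x:984:stallone\nweb-srv-group:x:1002:\n"
print(find_gid(sample, "web-srv-group"))
```

Unlike a plain grep, the group name is matched exactly here, so a group whose name merely contains web-srv-group would not be selected.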
It is as simple as designating, in an HTML link, the targeted second by suffixing the URL video filename with #t=DURATION_IN_SECONDS, like in some-video.mp4#t=1473 [1].
[1] | With mplayer, use the o hotkey to display elapsed durations. |
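As players usually display elapsed durations as hours, minutes and seconds, a small conversion helps to build such links. Here is a minimal Python sketch, whose function names are ours:

```python
def to_seconds(timestamp):
    """Convert a 'SS', 'MM:SS' or 'HH:MM:SS' timestamp into seconds."""
    seconds = 0
    for part in timestamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds

def video_link(url, timestamp):
    """Suffix the URL of a video so that playback starts at 'timestamp'."""
    return "%s#t=%d" % (url, to_seconds(timestamp))

print(video_link("some-video.mp4", "24:33"))
```

An elapsed duration of 24 minutes and 33 seconds corresponds to 24*60+33 = 1473 seconds, hence to the some-video.mp4#t=1473 link above.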
Pandoc is the tool of choice for such operations, as it often yields good results.
For example, in order to convert a page written in Mediawiki syntax, whose source content has been pasted in an old-content-in-mediawiki.txt file, into one that can be used in a GitLab wiki (hence in GFM markup, for GitLab Flavored Markdown), with the converted content being written in a converted-content.gfm file, one may use:
$ pandoc old-content-in-mediawiki.txt --from=mediawiki --to=gfm --standalone -o converted-content.gfm

# Or, for older versions of pandoc not supporting a gfm writer:
$ pandoc old-content-in-mediawiki.txt --from=mediawiki --to=markdown_github --standalone -o converted-content.gfm
Then the content of the converted-content.gfm file can be pasted in the target GitLab wiki page.
Another example is the conversion of a GitLab wiki page into a RST document (e.g. then for a PDF generation):
$ pandoc my-gitlab-wiki-extract.gfm --from=gfm --to=rst --standalone -o my-converted-content.rst

# Or, for older versions of pandoc not supporting a gfm reader:
$ pandoc my-gitlab-wiki-extract.gfm --from=markdown_github --to=rst --standalone -o my-converted-content.rst
Finally, if really needing to generate a Word document, an example may be:
$ pandoc my-document.rst --from=rst --to=docx -o my-converted-document.docx
The lists of the input and output formats supported by Pandoc and of their corresponding command-line options are specified here.
These options are also returned by: pandoc --list-input-formats and pandoc --list-output-formats (or, for older versions of pandoc, thanks to pandoc --help).
An input file may not be encoded in UTF-8, which can result in:
pandoc: Cannot decode byte '\xe9': Data.Text.Internal.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream
In this case, the actual encoding shall be determined, for example with:
$ file input.html
input.html: HTML document, ISO-8859 text
Then the encoding may be changed before calling pandoc, for example like:
$ iconv -f ISO-8859-1 -t utf-8 input.html | pandoc --from=html --to=markdown_github --standalone -o output.gfm
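The same re-encoding step may be performed from a script instead of iconv. The following Python sketch (with a to_utf8 name of ours) assumes, as above, that the input is actually encoded in ISO-8859-1:

```python
def to_utf8(raw_bytes, source_encoding="iso-8859-1"):
    """Re-encode bytes from 'source_encoding' into UTF-8."""
    return raw_bytes.decode(source_encoding).encode("utf-8")

# The 0xe9 byte is an 'e' with an acute accent in ISO-8859-1:
latin1_bytes = b"caf\xe9"

try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError:
    # Same root cause as the pandoc error reported above.
    print("Not valid UTF-8, re-encoding.")

print(to_utf8(latin1_bytes).decode("utf-8"))
```

The resulting bytes can then be written to a file or piped to pandoc, like the output of iconv.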
For that, one may use the pdftk tool, possibly with the convert one, which comes from ImageMagick (typically available thanks to a imagemagick package):
PDF documents may contain images/scans (possibly of texts) and/or actual, raw texts. If a PDF is a scan, OCR (Optical character recognition) can be used in order to convert the embedded scans into their actual text. Such a transformation can be done online, and we found PDF24 very useful for that. From such services, usually a PDF (thus including text instead of images) is generated. To obtain a text version thereof respecting its layout (typically to preserve the indentation of a scanned program), one may use: pdftotext -layout my-OCRed-document.pdf in order to enjoy a proper my-OCRed-document.txt.
One may rely on:
To invert/negate an image (swap colors with their complementary ones, while preserving alpha coordinates):
$ convert source.png -channel RGB -negate target.png
See also the Myriad's automatic rules, which generate X-negated.png from X.png thanks to: make X-negated.png.
Let's suppose that a SVG file is available (for example obtained thanks to our latex-to-image.sh script).
Going for a PNG of a width of 1000 pixels while selecting a high-enough DPI, preserving the aspect ratio and keeping the background transparent:
$ convert -density 1200 -size 1000 -background none latex-formula.svg target.png
Setting a background to a solid color (e.g. white) may allow, when adding a border, to have its color applied only on that border (rather than on the full background).
For example to add a 10-pixel wide / 5-pixel tall red border to an image:
$ magick source.png -bordercolor red -border 10x5 target.png
A more classical 2-pixel thick black border:
$ magick source.png -bordercolor black -border 2 target.png
Let's suppose we have an overall, larger image (e.g. my-overall-plot.png) onto which we want to composite / blit a smaller one (e.g. my-formula.png) at position (100,150) - in pixels, relatively to the top-left corner - with no specific scaling:
$ magick composite my-formula.png my-overall-plot.png -geometry +100+150 target.png
Positioning the inner image based on "gravity" (preset areas, based on various possible origins; see magick -list gravity) is often convenient; for example relatively to the bottom-right corner of the final image (knowing that positive axes are then, for "SouthEast", the opposite of the default ones - positive coordinate offsets have therefore to be specified in order to go towards the center):
$ magick composite my-formula.png my-overall-plot.png -gravity SouthEast -geometry +50+200 target.png
Or simply to have the inner image centered into the overall one:
$ magick composite my-formula.png my-overall-plot.png -gravity Center target.png
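To make the effect of gravity on offsets more concrete, the following Python sketch computes the top-left position at which the inner image ends up. It reflects our understanding of the conventions described above, for three gravities only; the function name and the image sizes are hypothetical, and the rounding that ImageMagick applies when centering may differ.

```python
def top_left_position(outer_size, inner_size, gravity, offset=(0, 0)):
    """Return the (x, y) position, relative to the top-left corner of
    the outer image, of the top-left corner of the inner image."""
    outer_w, outer_h = outer_size
    inner_w, inner_h = inner_size
    dx, dy = offset
    if gravity == "NorthWest":
        # Default origin: offsets go rightwards and downwards.
        return (dx, dy)
    if gravity == "SouthEast":
        # Positive offsets go leftwards and upwards, towards the center.
        return (outer_w - inner_w - dx, outer_h - inner_h - dy)
    if gravity == "Center":
        return ((outer_w - inner_w) // 2 + dx, (outer_h - inner_h) // 2 + dy)
    raise ValueError("Gravity not supported by this sketch: %s" % gravity)

# A 200x100 formula composited onto a 1000x800 plot, as with
# '-gravity SouthEast -geometry +50+200':
print(top_left_position((1000, 800), (200, 100), "SouthEast", (50, 200)))
```

With these sizes, the bottom-right corner of the formula lies 50 pixels left of and 200 pixels above the bottom-right corner of the plot.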
See also our affix-images.sh script.
Refer to our data display section.
Although SysML can also be of interest, we focus here on UML2 class diagrams (one of the 14 types of diagrams provided by UML2).
A multiplicity is a definition of cardinality (i.e. number of elements) of some collection of elements.
It can be set for attributes, operations, and associations in a class diagram, and for associations in a use case diagram. The multiplicity is an indication of how many objects may participate in the given relationship.
It is defined as an inclusive interval based on non-negative integers, with * denoting an unlimited upper bound (not, for example, n).
Most common multiplicities are:
For associations, the default multiplicity is 0..1, while new attributes and operations have a default multiplicity of 1.
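Seeing a multiplicity as an inclusive interval can be sketched as follows in Python. The function names are ours; the sketch relies on the usual UML shorthands, where a single value N stands for N..N and a lone * for 0..*.

```python
def parse_multiplicity(text):
    """Return a (lower, upper) pair, 'upper' being None when unlimited."""
    if ".." in text:
        lower_text, upper_text = text.split("..")
    else:
        # A single value N stands for N..N, and a lone '*' for 0..*:
        lower_text = "0" if text == "*" else text
        upper_text = text
    lower = int(lower_text)
    upper = None if upper_text == "*" else int(upper_text)
    return (lower, upper)

def allows(multiplicity, count):
    """Tell whether 'count' elements comply with a multiplicity."""
    lower, upper = parse_multiplicity(multiplicity)
    return count >= lower and (upper is None or count <= upper)

print(allows("0..1", 2), allows("3..*", 5))
```

For instance a 3..* multiplicity accepts 5 elements but rejects 2 of them, whereas 0..1 accepts only zero or one element.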
An association is a relation between two classes (binary association) or more (N-ary association) that describes structural relationships between their instances.
For example a polygon may be defined from at least 3 vertices that it would reference, whereas a point may take part in any number of polygons (including none):
(see the sources of this diagram)
The multiplicity of an endpoint denotes the number of instances of the corresponding class that may take part in this association. For example, at least 3 points are needed to form a polygon, whereas any number of polygons can include a given point.
In UML the direction of the association is easily ambiguous (here we have to rely on external knowledge to determine whether a polygon is composed of points, or if a point is composed of polygons). Adding a chevron (like > or <, e.g. "references >" ; ideally this should be a small solid triangle) to the text is not a good solution either, as the layout may place the respective endpoints in any relative position. Adding an arrow to the end of the line segment cannot be done either, as it would denote the navigability of the association instead.
An aggregation is a specific association that denotes that an instance of a class (e.g. Library) is to loosely contain instances of another class (e.g. Book), in the sense that the lifecycle of the contained classes is not strongly dependent on the one of the container (e.g. books will still exist even if the library is dismantled).
Here a library may contain any number of books (possibly none), and a given book belongs to at most one library.
(see the sources of this diagram)
A composition is a specific association that denotes that an instance of a class (e.g. HumanBeing) is to own instances of another class (e.g. Leg), in the sense that the lifecycle of the contained classes fully depends on the one of the container (here, if a human being dies, his/her legs will not exist anymore either).
Here a human being has exactly 2 legs, and any given leg belongs to exactly one human being (therefore this model does not account for one-legged persons).
(see the sources of this diagram)
An inheritance relationship is a specific association that denotes that a class (e.g. HumanBeing) is a specific case of a more general one (e.g. Animal), and thus that an instance of the first one is also an instance of the second one ("is-a" relationship).
Here a human being is a specific animal.
(see the sources of this diagram)
In a design phase, one may prefer lightweight tools like Graphviz, PlantUML or even Dia.
As long as the architecture of a framework is not stabilised, having one's tool determine by itself the layout of the rendering (rather than having to place manually one's graphical components) is surely preferable.
For that we use Graphviz, with our own build conventions.
For example, supposing that this diagram example is available as a source file named uml_class_diagram_example.graph:
$ make uml_class_diagram_example.png

# or, to force a regeneration and a displaying of the result:
$ make clean uml_class_diagram_example.png VIEW_GRAPH=true
This example results in the following diagram:
(see the sources of this diagram)
Assets can be found thanks to Creative Commons, which references, among others, OpenClipart, otherwise possibly publicdomainvectors.org.
General-purpose elements:
The simplest approach is to rely on SVG files, to edit them with Inkscape and possibly to export a selection of them as PNG files of the desired size.
One may use websites like Dafont in order to select, based on appearance and licence, a given TTF font.
At least on Arch, it is sufficient to copy the corresponding downloaded TTF file (as root) in /usr/share/fonts/ so that tools like The Gimp support it right afterwards (e.g. no need to run fc-cache beforehand).