Markup Languages in Software Documentation

The German version of this article has been published in IT Spektrum 4/2023 and can be downloaded and read as a PDF file.

Abstract

Software documentation is of great importance, and today it can be served by more modern environments than Word or Confluence. Markup languages play a central role in this. This article shows what exactly lightweight markup languages are, what advantages they offer, and how companies can find the most universally applicable one among the multitude of languages available today. This opens up possibilities that will outshine any company wiki.

Introduction

Documents in lightweight markup languages can be created and edited with any text editor. Their simple syntax is easy to read for both humans and machines. The raw documents can be put under version control like program code, and a build pipeline can then generate, link, export, and deploy the formatted documents. Such an approach is commonly referred to as docs as code: documentation “code” is treated exactly the same as program code.
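
The "generate" step of such a pipeline can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not a recommended setup: it assumes that the Asciidoctor command-line tool is installed and on the PATH, and the function names `build_command` and `build_all` are hypothetical and chosen for illustration only.

```python
import subprocess
from pathlib import Path


def build_command(source: Path, out_dir: Path) -> list[str]:
    """Return the Asciidoctor call that renders one source file to HTML."""
    # -D (--destination-dir) tells Asciidoctor where to write the output.
    return ["asciidoctor", "-D", str(out_dir), str(source)]


def build_all(docs_dir: Path, out_dir: Path, dry_run: bool = True) -> list[list[str]]:
    """Collect one command per *.adoc file; run them unless dry_run is set."""
    commands = [build_command(src, out_dir) for src in sorted(docs_dir.rglob("*.adoc"))]
    if not dry_run:
        for command in commands:
            # check=True makes the pipeline fail as soon as one document fails.
            subprocess.run(command, check=True)
    return commands
```

In a real setup, a CI job would call something like `build_all(Path("docs"), Path("build"), dry_run=False)` after every commit; linking, exporting, and deploying would be further steps of the same kind.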

This article will highlight the fundamentals, especially for the first step: the selection of a suitable lightweight markup language. The language we are looking for should be suitable for most forms of software documentation and also be usable by anyone involved in IT—not just software developers themselves, but also software architects, project managers, product managers, and all other decision-makers.

According to [LudLic13], software documentation initially covers all types of system documentation. This includes, e.g., specifications, architecture descriptions, manuals, but also APIs and—as the last fallback level—the actual program code. We want to ignore the last two points for now (especially in the Java area), because program code remains program code, and APIs in the Java world are usually documented using Javadoc. In Python, on the other hand, an API can theoretically be documented in any markup language, but how far it makes sense to deviate from well-established quasi-standards is something everyone has to decide for themselves.

The area of software documentation also includes project documentation, i.e., everything related to project planning, as well as quality documentation, e.g., test reports. The latter can be created manually or automatically.

To summarize, in this article we want to consider software documentation to be everything that in companies is typically documented in separate documents (often Word files) or on company wiki pages (often Confluence, but also BlueSpice or other wikis)—or at least should be documented …

Reasons for Software Documentation

Theoretically, the reasons for documentation are known to everyone working in software development. However, poor or even missing software documentation is not only a perceived problem, but also confirmed as such in the literature (representative of many: [LudLic13]). It is often seen as unnecessary or—if at all—only created at the end of a development phase. This is then often accompanied by a feeling of tedious obligation, “just to get this formality done”.

A little comfort in advance: one, two, or three decades ago, the knowledge and possibilities regarding effective software documentation were very different from today. Overly long Word documents, stored somewhere on SharePoint with file suffixes such as -new.doc, -current.doc, -final.docx, -but-now-really-final-final.docx, have killed the last flicker of fun in this work, even for the most motivated documenter.

Today, ubiquitous networking, web browsers with excellent functionality and display, wikis, integrated development environments (IDEs) and version management systems (usually Git) have become commonplace. Part of the IT industry has recognized the importance of the topic of documentation and published corresponding specialist books (for example [Zoe22] as a good introduction) and templates for recurring issues (e.g., the C4 model or arc42). Furthermore, the use of lightweight and functional markup languages—which of course have not yet been around for ages—is of great benefit. This is exactly what this article is about.

In my article [Hei23] about Javadoc, I discussed the necessity of (source-code-oriented) documentation in great detail. I will not repeat this argumentation here, but instead just briefly state my personal top 3:

Reverse Engineering

Source code is much more often read than written. Mentally “decompiling” uncommented classes is already bad, but for entire software architectures it is downright irresponsible. Not only is such Sisyphean work extremely tedious, but it is also very time-consuming, error-prone, and therefore expensive. Anyone who has their developers searching for peas like Cinderella in times of a shortage of skilled workers should not be surprised at the lack of employees and efficiency.

Agility

“Agile” does not mean “no documentation”. Nowhere is it stated, and it never has been, that agile methods such as Scrum no longer require the creation of software documentation. The Agile Manifesto merely states that (in case of doubt) “working software over comprehensive documentation” should be given preference (note the word “comprehensive”). However, it also states: “[…] while there is value in the items on the right, we value the items on the left more.” [Manifesto]

It is also useful to look at this from a historical perspective. In the past (and this was the real motivation for the Agile Manifesto), excessive specifications were written before even a single line was coded or the customer was brought on board for an initial preview. It was not uncommon for such classic waterfall projects to end in a rude awakening later on. The Agile Manifesto therefore aims to bring software managers back to their senses: Don’t get bogged down in excessive specification prematurely, but rather make sure that a prototype is running first.

There is also nothing wrong with this. But neither does it replace the need for documentation (ideally created during the development phase), nor does it justify turning these agile principles into a religion.

Company Value

Also with regard to the monetary value of software (and therefore its owning company), “sufficient documentation that is comprehensible to an expert third party” (“ausreichende und für einen sachverständigen Dritten nachvollziehbare Dokumentation”) is indispensable. At the latest when a software company (or its product) is to be sold one day, an expert will look at the whole thing as part of a due diligence process. “Only if appropriate documentation allows the knowledge to be passed on without the involvement of employees can it be assumed that the company has the technology at its disposal” (“Nur wenn entsprechende Unterlagen die Weitergabe der Kenntnisse auch ohne Mitwirkung der Mitarbeiter ermöglichen, ist davon auszugehen, dass das Unternehmen über die Technologie verfügen kann”).

These two (German) quotes do not come from just anywhere, but from the Principles for the Valuation of Intangible Assets (Grundsätze zur Bewertung immaterieller Vermögenswerte) of the Institute of Public Auditors in Germany (Institut der Wirtschaftsprüfer, IDW) [IDWS5]. Inadequate and incomprehensible documentation is therefore a typical defect not only in business terms, but also in legal terms (more on this: [Dem24]).

Does all this sound very dramatic? Right, because it is.

What Characterizes (Lightweight) Markup Languages?

A markup language is a machine-readable and in most cases also human-readable language for structuring and formatting texts and other data. Its probably best-known representative is the Hypertext Markup Language (HTML) [WikiAuszSpr]. Documents in markup languages can be written manually by humans or generated automatically.

Typical (normal) markup languages for creating formatted documents are, e.g., HTML (in combination with CSS) or LaTeX. The fact that HTML and CSS can be used for excellent design needs no further explanation. LaTeX is a language for typesetting texts and also fulfills the highest typographical requirements. Some readers may remember it from their university studies, where it is widely used, especially in mathematical and scientific papers.

As brilliant as the functionality of such languages is, manually writing their commands takes a lot of time and is therefore not “suitable for the masses” for our purposes mentioned at the beginning. I wrote the raw version of my master’s thesis in HTML back then, and over 12 years ago I started to work systematically with LaTeX. This allowed me to design the most beautiful papers during my didactic studies, the most beautiful worksheets in class, and—since founding my company—the most beautiful letterheads, contracts, and presentation slides. But when I look back today, I can’t even count how much time it has all cost me, because the language can be quite stubborn and inconsistent and there was always something to adjust—even with the simplest things like tables. Therefore, I can understand everyone who does not want to bother with it (anymore). Anyway, at least I have some nice stationery now …

In the figure below, I have listed (for once completely unscientifically and without references) typical representatives of markup languages. As a basic principle, I would claim that the more (design) flexibility a language offers, the more complicated it is.

[Figure: Flexibility vs. complexity]

A lightweight markup language addresses this issue and provides a syntax that is easy to learn, write, and read without any significant disruption to the reading flow. For example, a == at the beginning of a line can stand for a heading, and bulleted lists are simply created with a preceding * per bullet point. A YouTube video can be embedded with a single line:

video::dQw4w9WgXcQ[youtube]
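
How easily such a syntax can be processed by a machine can be illustrated with a toy converter. This is a minimal sketch, not a real AsciiDoc parser: the function `render_line` is hypothetical and only knows the two constructs mentioned above (a leading `==` for a heading and a leading `*` for a bullet point); everything else is treated as a plain paragraph.

```python
import html


def render_line(line: str) -> str:
    """Translate one line of lightweight markup into HTML (toy subset only)."""
    if line.startswith("== "):
        # A level-1 section heading; rendered as <h2> here.
        return f"<h2>{html.escape(line[3:])}</h2>"
    if line.startswith("* "):
        # One bullet point; a real converter would also emit the enclosing <ul>.
        return f"<li>{html.escape(line[2:])}</li>"
    return f"<p>{html.escape(line)}</p>"


source = ["== Installation", "* Download the archive", "* Unpack it"]
rendered = [render_line(line) for line in source]
print("\n".join(rendered))
```

The raw lines in `source` stay perfectly readable for humans, while a few string comparisons are enough for a program to recognize their structure.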

Anyone who works with Jira or a company wiki will probably already have become familiar with such lightweight markup languages. Confluence is a major exception here, as it accepts wiki markup and Markdown syntax as input, but then saves it in its internal rich text format, whose source code the user then no longer has access to [ConfWikiMarkup].

Confluence thus joins the ranks of pure WYSIWYG word processing programs such as Word and its open source competitors. However, such proprietary (binary) formats are out of the question (in this case I also call a format “proprietary” if it is not human-readable, even though the Office formats would theoretically be open formats). On the one hand, of course, this is due to vendor lock-in, which is currently affecting a large number of companies after Atlassian (the manufacturer of Confluence) announced that in the future it would only sell on-site licenses for horrendous five-figure sums per year and move all other customers to its cloud (which is also subject to US law) [Los21]. On the other hand, I consider any file format to be inadequate if its content or settings are not clearly visible to humans. Anyone who has ever had to deal with broken tables, bullet lists or inconsistent formatting in Word knows what I’m talking about. With this in mind, it will be even more of a challenge for all Confluence customers who want to move away from Confluence to properly extract their documents from the platform.

Generated texts in a (lightweight) markup language can still be easily exported to a company wiki via script. Editing inside the wiki, however, is not intended. The master must always be the markup language document itself.

Popular Lightweight Markup Languages and Selection Criteria

Now that we know what lightweight markup languages and their advantages are, we will take a look at some of their typical representatives. On Wikipedia you can find a list and comparison of currently 22 lightweight markup languages [WikiLightMarkLang]. Many of these only exist for a specific product or a specific area of application, such as BBCode (for (BB) internet forums), Jira Formatting Notation, Org-mode (very Emacs-oriented), Slack, WhatsApp, and the wiki dialects PmWiki and TiddlyWiki. Others are outdated or (still) lead a shadowy existence, such as Creole, Gemtext, POD, setext, Texy and txt2tags.

The prevalence and popularity of a markup language should not be the main argument, but it should be at least one argument in the decision-making process. It does not help if a language, however technically compelling, has not found its way out of a nerdy one-page website or an individual’s GitHub repository in the last 15 years. Or as it would be put in open source circles: if it does not have a community. As we will see in a moment, sorting out the first two thirds of the languages is not a problem; we will still find what we are looking for.

The list also contains five Markdown dialects: Markdown, Markdown Extra, GitHub Flavored Markdown, MultiMarkdown, and Djot.

Markdown

Markdown is a double-edged sword. It is by far the most popular lightweight markup language, not only by perception but also according to my research of various IT magazines over the last 5 to 15 years (representative of many: [Tre22]). It is appreciated for its very simple, efficient syntax and for its support by many programs and platforms, above all GitHub. One problem, however, is that Markdown has not yet been standardized, resulting in a large number of flavors. In addition, many language features are missing and can only be added via plug-ins or embedded HTML code. This ranges from elementary things such as tables, cross-references and footnotes to the embedding of YouTube videos. A lightweight markup language that is not portable and still has to fall back on HTML is therefore ruled out for our purposes as a universal software documentation language. This opinion is also shared by other voices [Hol16].

The following list names the four remaining lightweight markup languages in alphabetical order:

  • AsciiDoc
  • MediaWiki
  • reStructuredText
  • Textile

Textile

Textile has a manageable range of functions, which is concisely documented on the official website [Textile4]. However, I have never come across Textile either in practice or in theory (literature). There is a risk that for larger fields of application, functions will ultimately be missing that may—or may not—be covered by plug-ins. And that risk is too great. It is already clear that the standard version lacks cross-file includes and mathematical formulas.

MediaWiki

MediaWiki is used in the world-famous Wikipedia as well as in the corporate wiki software BlueSpice. However, even after an extensive search, I have not found a binding specification; apparently a list of examples has to suffice [WikitextExamples]. Like so many things in the wiki world, what you are looking for is scattered around behind sometimes cryptic-looking links. The attempt to formally specify MediaWiki markup was aborted in 2010 [MediaWikiMarkSpec]. I also see the fixation on the wiki platform itself as problematic. A truly offline-compatible and platform-independent file format would be desirable.

reStructuredText (RST)

RST is widely used in the Python world. What Javadoc is for Java programmers, RST is for Pythonistas—but with one crucial difference: While Javadoc is specifically tailored to the programming language (which can above all be seen in the many specific tags such as @param, @return, or @throws), Python source code is documented in so-called docstrings. Actually, a docstring is nothing more than a multi-line string that can even be read out at runtime and in whose “design” (formally correct: in whose semantic markup) the author is basically free [PEP257].

This means that there are no fixed documentation tags as in Javadoc. This has the advantage that the documents generated from it (usually using the Sphinx generator) do not look as monotonous and strict as Javadoc APIs, but often appear more “creative”, “lively”, and “colorful” (for taste samples see [pandReadPickle] or [TensFlowModel]). As a disadvantage, however, this requires significantly more responsibility and discipline from the author in order to maintain a consistent style and to not forget anything. You can find out more about the correct use of Javadoc in great detail in my article “Javadoc with Style” [Hei23].
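
The docstring idea described above can be made concrete with a short example. This is a minimal sketch: the function `rectangle_area` is hypothetical, and the `:param:` and `:returns:` field lists are merely a reStructuredText convention that Sphinx understands; Python itself does not prescribe or check them.

```python
def rectangle_area(width: float, height: float) -> float:
    """Return the area of a rectangle.

    The markup below uses reStructuredText field lists, a convention
    that Sphinx understands; Python itself does not enforce it.

    :param width: horizontal extent, in the caller's unit of length
    :param height: vertical extent, in the same unit
    :returns: the product ``width * height``
    """
    return width * height


# The docstring is an ordinary string attribute, available at runtime.
print(rectangle_area.__doc__)
```

Because the docstring is just a string, nothing stops an author from omitting a parameter or using a different style, which is exactly the freedom and the responsibility mentioned above.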

Although RST is mostly at home in the Python world, its syntax is completely independent of Python. In principle, RST is fully suitable for software documentation of all kinds. Both the language itself and the quasi-standard generator Sphinx leave nothing to be desired in terms of functionality. However, RST is not particularly inviting for an audience that is not so much into programming. This may be partly due to the somewhat bare, yet very formal-looking websites [RST, Sphinx], and partly due to the somewhat clumsy syntax, which is particularly noticeable with tables. RST also does not win any friends with the online editor, which has been offline for months, if not years [ninjs].

AsciiDoc

AsciiDoc, finally, is the jack-of-all-trades of documentation languages. With includes, mathematical formulas in TeX notation and native-feeling embedding of (also text-described) diagrams of various kinds, the range of functions also fulfills the last of our criteria. The syntax is pleasant, the documentation is appealing and ranges from a Quick Reference [AscDocSyntaxQuickRef] to a complete, almost lavish Language Documentation [AscDocLangDoc]. The standard generator for AsciiDoc is called Asciidoctor, and its website basically contains all the documentation that is relevant for us as users (authors). The official website of the “language” AsciiDoc [AscDoc], on the other hand, is of no interest to us.
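
What an include does can be illustrated with a toy resolver. This is only a sketch of the concept, not how Asciidoctor is implemented: the function `resolve_includes` and the in-memory `files` table are hypothetical, only the bare form `include::path[]` is recognized, attributes inside the brackets are not supported, and there is no protection against circular includes.

```python
import re

# Matches a line that consists only of an include directive with empty brackets.
INCLUDE = re.compile(r"^include::(?P<target>[^\[]+)\[\]$")


def resolve_includes(name: str, files: dict[str, str]) -> str:
    """Expand include directives recursively, using an in-memory file table."""
    lines = []
    for line in files[name].splitlines():
        match = INCLUDE.match(line)
        if match:
            # Replace the directive with the (recursively expanded) target.
            lines.append(resolve_includes(match.group("target"), files))
        else:
            lines.append(line)
    return "\n".join(lines)


files = {
    "manual.adoc": "= Manual\n\ninclude::chapters/setup.adoc[]",
    "chapters/setup.adoc": "== Setup\n\nInstall the tool first.",
}
print(resolve_includes("manual.adoc", files))
```

The point of the mechanism is that a chapter such as `chapters/setup.adoc` exists exactly once and can be pulled into several documents without copying.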

If you want to try out the markup language right away, you can find a live editor at [ascdocLIVE]. With the Chrome plugin Asciidoctor.js Live Preview, AsciiDoc raw files can be displayed fully rendered in the web browser. The IntelliJ IDEA plugin AsciiDoc allows easy viewing and editing of AsciiDoc documents. Other browsers and IDEs offer comparable plug-ins. I am also making this article available to the readership in AsciiDoc format under [art], as the raw version of this article was—how else could it be—also written in AsciiDoc.

Organizational and Political Matters

It must be admitted that an (AsciiDoc) markup language document neatly embedded in the company structure cannot initially be created as quickly and easily as a Word file or a Confluence page. The language elements must first be learned and an infrastructure with a documentation generator, version management, and CI/CD pipeline must be configured.

For the language elements, employee training by a specialist is recommended. This specialist can be recruited in-house as an expert role or brought in from outside. In half a day to a full day of training (depending on how in-depth you want to go, especially with diagrams), it should be possible to convey the key basic concepts. In my opinion, it is not advisable to let every employee “play around on their own”. It is well known that the individual expectations of each person vary greatly. Training, on the other hand, can work out the essentials, emphasize what is important, and leave out what is not. Not only will the required amount of work and time be significantly lower, but it is also ensured that at the end all employees have the same level of knowledge.

Setting up and configuring the technical infrastructure should be entrusted to software developers. They are most familiar with the version control system and the build pipeline. For all non-developers who (understandably) do not want to bother with Git on the command line, there are easy-to-use GUI clients with which they can store and retrieve documents—no more complicated than an FTP client. However, this also means that the software developers responsible for the documentation infrastructure must be able to put themselves in the shoes of non-developers and thus keep all unnecessary complexity away from them.

In his book [Zoe22], Stefan Zörner even suggests a role called “Doctator”. This person provides and maintains an overview of the documentation in the company, takes care of both the organizational and technical aspects of the documentation process, supports the teams with documentation (but does not necessarily write the documentation himself), and monitors the up-to-dateness and consistency of the content.

I can only support this idea. I personally believe that the work of documenting should be concentrated on a few people who a) enjoy doing it and b) do it well. Not just anyone should be allowed to create and edit documents—and definitely not have to. I have seen too many cases in my career in which this (essentially well-intentioned) “wiki philosophy” ultimately fell victim to a mixture of diffusion of responsibility and conflicting quality requirements. But I also know that opinions on this topic differ widely and are ultimately political. A small but very good tabular comparison of the two knowledge management philosophies “knowledge sharing” vs. “classic knowledge management” can be found in [Heigl21].

I am always astonished at how little the topic of “documentation” is systematized in companies. If the documentation process is specified at all, it is improvised at best. The rules for versioning schemes, expense reports and e-mail signatures are worked out down to the smallest detail, but companies often shrug their shoulders when it comes to documentation, which is the crucial basis for new employees, further development of the software architecture, and a resilient company value.

The GitLab Documentation Style Guide (whose specific content I neither want to propagate nor evaluate as good or bad here) shows examples of what should in principle be regulated in terms of documentation [GitLabDocStyleGuide]. This ranges from broad topics such as the “single source of truth” through the language (German or English? And if the latter, which English?) to details such as the capitalization of headings.

Conclusion: The Journey Goes On

This technical article could easily go further. It has presented the advantages of lightweight markup languages in a (hopefully) objective and comprehensible way and has also transparently narrowed down the field of possible languages. From my own experience, I can report that the longer one deals with the topic, the more it boils down to a docs as code approach with (depending on technology and preference) AsciiDoc or reStructuredText as markup language.

Where does the journey go from here? With the technical and organizational basis for treating, versioning, and deploying documentation texts like source code while maintaining their easy readability—as described in this article—the first major stage has already been reached. The ecosystem with Asciidoctor as the standard generator for AsciiDoc and Sphinx as the quasi-standard generator for reStructuredText can be easily managed.

At a next stage, not only can diagrams be classically integrated into (AsciiDoc) documents, but they can even be described and generated directly in text form. Analogous to docs as code, this is referred to as diagrams as code. The “box shifting” from graphics programs has finally come to an end. As we all know, a picture is worth a thousand words, which is why I refer the reader directly to [Kro] to get an overview of the available diagram types and their often simple description syntax.

The figure below shows an example of a sequence diagram that was generated from the short code that follows it (both from [Kro]). Moreover, the plot in the first third of this article was also generated directly from text. As you already know, its source code can be viewed at [art].

[Figure: Sequence diagram]

seqdiag {
  browser -> webserver [label = "GET /index.html"];
  browser <-- webserver;
  browser -> webserver [label = "POST /blog/comment"];
  webserver -> database [label = "INSERT comment"];
  webserver <-- database;
  browser <-- webserver;
}
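
How such a text description can be turned into an image reference is sketched below. The sketch assumes the public kroki.io instance and the GET encoding described in the Kroki documentation (compress the diagram source with zlib, then encode it as URL-safe Base64); the function name `kroki_url` is hypothetical, and no network request is made here.

```python
import base64
import zlib

# A shortened version of the sequence diagram source shown above.
SEQDIAG_SOURCE = """seqdiag {
  browser -> webserver [label = "GET /index.html"];
  browser <-- webserver;
}"""


def kroki_url(diagram_type: str, source: str, output_format: str = "svg") -> str:
    """Build a Kroki GET URL: deflate the source, then base64url-encode it."""
    compressed = zlib.compress(source.encode("utf-8"), 9)
    encoded = base64.urlsafe_b64encode(compressed).decode("ascii")
    return f"https://kroki.io/{diagram_type}/{output_format}/{encoded}"


print(kroki_url("seqdiag", SEQDIAG_SOURCE))
```

Opening the printed URL in a browser should return the rendered diagram, provided the service is reachable. In an AsciiDoc document, this step is normally hidden behind a diagram extension, so authors only ever see the text description.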

To structure entire architecture documentation (and thus as a further evolutionary step), I want to particularly point out the C4 model and the arc42 template mentioned briefly at the beginning. An excellent concise overview of the latter can be found in the JavaSPEKTRUM article [Sta22] by Gernot Starke. Dr. Starke also describes in JavaSPEKTRUM how AsciiDoc can be used for internationalization [Sta19]. Together with [Sch17], it becomes clear why includes are so important for a markup language and what can be realized with them.

If you are aiming to go really high, you can also devote yourself to executable documentation. Depending on the tool used, architecture specifications are described either directly in program code, in AsciiDoc documents, or separately as XML files. These can then be tested automatically and also serve as a data basis for the generated architecture documentation. This ensures that the requirements for the architecture are always synchronized with the associated documents. The best-known representatives of such tools are jQAssistant and ArchUnit.

Now I would like to let you, dear readers, go on a discovery tour into the world of lightweight markup languages and docs as code concepts.

Literature and Links

[art]
this article in AsciiDoc format, link.simplexacode.ch/pz7g
[AscDoc]
AsciiDoc, asciidoc.org
[AscDocLangDoc]
AsciiDoc Language Documentation, docs.asciidoctor.org/asciidoc/latest/
[AscDocSyntaxQuickRef]
AsciiDoc Syntax Quick Reference, docs.asciidoctor.org/asciidoc/latest/syntax-quick-reference/
[ascdocLIVE]
asciidocLIVE, asciidoclive.com
[ConfWikiMarkup]
Confluence Wiki Markup, confluence.atlassian.com/doc/confluence-wiki-markup-251003035.html
[Dem24]
C. Demant, Software Due Diligence, 2. Auflage, BoD – Books on Demand, 2024
[GitLabDocStyleGuide]
GitLab, Documentation Style Guide, docs.gitlab.com/ee/development/documentation/styleguide/
[Hei23]
C. Heitzmann, Javadoc with Style, link.simplexacode.ch/gh32
[Heigl21]
R. Heigl, Unternehmenswissen, in: iX 2/2021, Heise Medien, 2021
[Hol16]
E. Holscher, Why You Shouldn’t Use “Markdown” for Documentation, ericholscher.com/blog/2016/mar/15/dont-use-markdown-for-technical-docs/
[IDWS5]
IDW Standard: Grundsätze zur Bewertung immaterieller Vermögenswerte (IDW S 5), 2015
[Kro]
Kroki, Examples, kroki.io/examples.html
[Los21]
M. G. Loschwitz, Wissen ist Macht, in: iX 2/2021, Heise Medien, 2021
[LudLic13]
J. Ludewig, H. Lichter, Software Engineering, dpunkt.verlag, 2013
[Manifesto]
Manifesto for Agile Software Development, agilemanifesto.org
[MediaWikiMarkSpec]
MediaWiki, Markup spec, www.mediawiki.org/wiki/Markup_spec
[ninjs]
rst.ninjs.org (offline)
[pandReadPickle]
pandas API reference, pandas.read_pickle, pandas.pydata.org/docs/reference/api/pandas.read_pickle.html
[PEP257]
PEP 257 – Docstring Conventions, peps.python.org/pep-0257/
[RST]
reStructuredText, docutils.sourceforge.io/rst.html
[Sch17]
M. Schlichting, Lebendige Dokumentation mit AsciiDoctor, in: Java aktuell 3/2017, DOAG, 2017
[Sphinx]
Sphinx, reStructuredText, www.sphinx-doc.org/en/master/usage/restructuredtext/index.html
[Sta19]
G. Starke, Internationalisierung von Dokumenten – i18n-light mit AsciiDoc & Co., in: JavaSPEKTRUM, 5/2019, SIGS DATACOM, 2019
[Sta22]
G. Starke, arc42, die Achte, in: JavaSPEKTRUM, 1/2022, SIGS DATACOM, 2022
[TensFlowModel]
TensorFlow Core v2.11.0 API Documentation, tf.keras.Model.compile, www.tensorflow.org/api_docs/python/tf/keras/Model#compile
[Textile4]
Textile 4.0.0, textile-lang.com
[Tre22]
S. Tremmel, # Überschrift – Mit Markdown schnell und einfach Texte auszeichnen, in: c’t, 18/2022, Heise Medien, 2022
[WikiAuszSpr]
Wikipedia, Auszeichnungssprache, de.wikipedia.org/wiki/Auszeichnungssprache
[WikiLightMarkLang]
Wikipedia, Lightweight markup language, en.wikipedia.org/wiki/Lightweight_markup_language
[WikitextExamples]
Wikimedia Meta-Wiki, Help:Wikitext examples, meta.wikimedia.org/wiki/Help:Wikitext_examples
[Zoe22]
S. Zörner, Softwarearchitekturen dokumentieren und kommunizieren, 3. Auflage, Hanser, 2022

Shortlink to this blog post: link.simplexacode.ch/4gwr (2024.03)
