Markup languages for the newsletter

Feature set

  • Table of contents
  • Automatic section numbering
  • Output in HTML, plain text, PDF
  • Images included in the HTML and PDF versions
  • Easy generation of a plain text version without markup, unresolved image links, etc.
  • Lists and simple tables
  • Customizable styles/templates for PDF and HTML
  • Free Software (Free as in freedom – GPL, BSD, PD etc.)
  • Available on Debian, Ubuntu, Fedora, SuSE, *BSD (with at least some chance for Windows users to contribute something)

Languages / Toolkits

LaTeX

Pros (for this application): LaTeX is a combined content structure, layout and formatting language, whereas the lightweight markup languages describe content structure only and need a separate backend configuration language for layout and formatting. It is also already set up.

Cons: fewer contributions may arrive in LaTeX, so the production team may need to spend more effort on conversion.

ReST

ReST is the markup language currently in use. It seems to support all the points listed above, either built in or with a little manual work afterwards.

The only unsolved issue is generating plain text with a table of contents and section numbering. This could be done from the HTML/PDF output, but needs further investigation.
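One possible route from the HTML output to plain text is sketched below. It assumes the Docutils rst2html front end (installed as rst2html.py on some systems) and w3m are available, and that the source file already contains the `.. contents::` and `.. sectnum::` directives. The names run, DRY_RUN and rest_to_text are hypothetical helpers for illustration, and whether the resulting text is good enough still needs the investigation mentioned above.

```shell
#!/bin/sh
# Sketch only: ReST -> HTML -> plain text.
# run, DRY_RUN and rest_to_text are hypothetical names, not part of Docutils.

run() {
    # Show the command on stderr; execute it unless DRY_RUN=1.
    printf '+ %s\n' "$*" >&2
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

rest_to_text() {
    html=${1%.rst}.html                 # newsletter.rst -> newsletter.html
    run rst2html "$1" "$html" &&
    run w3m -dump -T text/html "$html"  # rendered text goes to stdout
}
```

Usage: `rest_to_text newsletter.rst > newsletter.txt`, or `DRY_RUN=1 rest_to_text newsletter.rst` to see the commands without running them.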

Sphinx

AsciiDoc

AsciiDoc provides a table of contents, automatic section numbering, output in all three formats, and images in PDF and HTML. As the example at https://github.com/elextr/geany_stuff/commit/d094fb77d37bd00c4172041ea95be08ba294460c shows, the HTML can be styled with CSS to match Newsletter 2, so that part is done.

The lightweight markup language is similar to most such languages, e.g. ReST and Markdown, and like them you could almost use the source as the text version. It supports images, lists, tables, literal blocks, code blocks with syntax highlighting via external filters, HTML5 audio and video, a table of contents generated by included JavaScript, and section numbering.

Generating PDF needs a DocBook toolchain, such as the free open source dblatex (which uses your installed LaTeX) or the Apache project's FOP. For text output it uses w3m (which includes the JavaScript-generated TOC).

AsciiDoc is configured by 1) setting attribute values on the command line or in the source file, or 2) for complex changes, cascading (CSS-like) configuration files that control the input format and the generated output (HTML and DocBook).

dblatex is configured by 1) attributes set on the command line or in a config file, or 2) LaTeX stylesheets for complex changes.

FOP is configured by 1) attributes set on the command line, or 2) XSL stylesheets.

Pros: this is what AsciiDoc is designed for, writing documents for humans. It isn't a code docstring extractor being forced into another role. It also provides a Python script to assist in running the backend toolchains (the PDF command is “a2x sourcefile”), and it supports more than one toolchain in case one proves to be a problem. xhtml11 and xhtml5 are produced directly without any toolchain. AsciiDoc is also mature: it is currently at version 8.6.5, with version 1 released in 2002. License: GPLv2.

Cons: it isn't used in Geany, so it's an extra tool.

However, it is just Python and can be installed in a user directory, or even run from a clone of the Mercurial repo. It doesn't need a system install, although one is possible from a package (available for most distros and all the platforms listed above, including Windows) or from the makefile.
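The AsciiDoc workflow described in this section might look roughly like the sketch below. It assumes asciidoc, a2x, dblatex and w3m are installed and that the source file is called newsletter.txt. The toc and numbered attributes and the a2x -f option come from the AsciiDoc documentation, while run, DRY_RUN and build_asciidoc are hypothetical helper names used only for illustration.

```shell
#!/bin/sh
# Sketch only: HTML, PDF and text output from one AsciiDoc source.
# run, DRY_RUN and build_asciidoc are hypothetical names, not part of AsciiDoc.

run() {
    # Show the command on stderr; execute it unless DRY_RUN=1.
    printf '+ %s\n' "$*" >&2
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

build_asciidoc() {
    src=$1
    # HTML directly from asciidoc, with TOC and section numbering attributes.
    run asciidoc -a toc -a numbered -o "${src%.txt}.html" "$src" &&
    # PDF through the DocBook toolchain (dblatex unless --fop is given).
    run a2x -f pdf "$src" &&
    # Plain text version.
    run a2x -f text "$src"
}
```

Usage: `build_asciidoc newsletter.txt`, or `DRY_RUN=1 build_asciidoc newsletter.txt` to list the commands without running them. Adding `--fop` to the PDF line would switch the backend from dblatex to FOP.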

Doxygen

Not sure about automatic section numbering, but Doxygen does all the other stuff and is already being used in Geany (see the Plugin Howto). It can be used like this to generate documents, even though it's normally used with source code API docs (I guess it's the same case as Sphinx and GTK-DOC?). Pros are it's easy, matches existing Geany documents, and supports lots of output formats. Cons are that it's not the primary use for the tool and it might not do plain text output, and the markup seems not to be great for plain-text use.

Pandoc

You can format your file in Pandoc-enhanced Markdown. Markdown is a very easy-to-read and easy-to-write plain text format. The pandoc tool can convert your plain text source file into html, LaTeX, and other output formats (it can, in fact, convert between various formats, as described on its webpage).

Once your newsletter source file is written, create an html version like so:

  pandoc -s --toc -N --css=style.css -o newsletter_n.html newsletter_n.txt

To create a pdf version:

  pandoc -s --toc -N -o newsletter_n.tex newsletter_n.txt
  pdflatex newsletter_n.tex
  pdflatex newsletter_n.tex

where you run that pdflatex command twice in order to correctly generate the table of contents.
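The HTML and PDF steps above can be collected into one small script, sketched here under the assumption that pandoc and pdflatex are on the PATH and that style.css sits next to the source file. The names run, DRY_RUN and build_newsletter are hypothetical helpers, not part of Pandoc.

```shell
#!/bin/sh
# Sketch only: HTML and PDF from one Pandoc Markdown source.
# run, DRY_RUN and build_newsletter are hypothetical names, not part of Pandoc.

run() {
    # Show the command on stderr; execute it unless DRY_RUN=1.
    printf '+ %s\n' "$*" >&2
    [ "${DRY_RUN:-0}" = 1 ] || "$@"
}

build_newsletter() {
    base=${1%.txt}    # newsletter_n.txt -> newsletter_n
    run pandoc -s --toc -N --css=style.css -o "$base.html" "$base.txt" &&
    run pandoc -s --toc -N -o "$base.tex" "$base.txt" &&
    run pdflatex "$base.tex" &&
    run pdflatex "$base.tex"   # second pass fills in the table of contents
}
```

Usage: `build_newsletter newsletter_n.txt`, or `DRY_RUN=1 build_newsletter newsletter_n.txt` to print the four commands without running them.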

Installation

On Debian-based GNU/Linux systems, you can install Pandoc in the usual way: aptitude install pandoc. Otherwise, see the install instructions.

For generating the pdf, you'll also need a LaTeX distribution installed, such as texlive. On Debian-based systems: aptitude install texlive
