This repo contains a collection of Dockerfiles to build various pandoc container images.
Docker images hosted here have a "core" version and a "latex" version:
- core: `pandoc` and `pandoc-citeproc`, along with the backend needed for full Lua filter support (Lua filters can call external modules).
- latex: builds on top of the core image, and provides an as-minimal-as-possible latex installation in addition. This includes all packages that `pandoc` might use, and any libraries needed by these packages (such as image libraries needed by the latex graphics packages).
From there, the tagging scheme is either `X.Y`, `X.Y.Z`, `latest`, or `edge`.
- `X.Y` or `X.Y.Z`: an official `pandoc` release (e.g., `2.6`). Once an `X.Y` tag is pushed, it will not be re-built (unless there is a problem). Pandoc releases versions such as `2.7` or `2.7.1` (there is no `2.7.0`), which is where the optional `.Z` comes from.
- `latest`: the `latest` tag points to the most recent `X.Y` release. For example, if tags `2.5` and `2.6` were available online, `latest` would be the same image as `2.6`.
- `edge`: the "bleeding edge" tag clones the `master` branch of `pandoc` and `pandoc-citeproc`. This tag is a moving target, and will be re-built at least once a month. The CI scripts have a cron job to build each image stack on the first of the month. However, changes to the `master` branch of this repository may also result in the `edge` tag being updated sooner.
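The practical difference between the tags above is whether an image stays fixed or keeps changing. A minimal sketch of that distinction follows; `tag_kind` is a hypothetical helper, not something shipped in this repository, and its version pattern is an assumption that accepts `X.Y` plus any further numeric components.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): report whether a tag is
# a pinned release or a moving target, following the scheme described above.
tag_kind() {
  case "$1" in
    latest|edge)
      # `latest` follows the newest release; `edge` is rebuilt regularly.
      echo "moving"
      ;;
    *)
      # Assumed pattern: X.Y, optionally followed by more .N components.
      if printf '%s\n' "$1" | grep -Eq '^[0-9]+\.[0-9]+(\.[0-9]+)*$'; then
        echo "pinned"
      else
        echo "unknown"
      fi
      ;;
  esac
}

tag_kind 2.6      # prints: pinned
tag_kind 2.7.1    # prints: pinned
tag_kind edge     # prints: moving
```

A check like this could guard a build script against accidentally depending on a moving tag.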
The current latest tag for all images points to pandoc version 2.9.2.1.
- Core image: `pandoc/core`
  - To build locally: `make alpine`
- Latex image: `pandoc/latex`
  - To build locally: `make alpine-latex`
Note: this section describes how to use the docker images. Please refer to the `pandoc` manual for usage information about `pandoc`.
Docker images are pre-provisioned computing environments, similar to virtual machines, but smaller and cleverer. You can use these images to convert documents wherever you can run docker images, without having to worry about pandoc or its dependencies. The images bring along everything they need to get the job done.
- Install Docker if you don't have it already.
- Start up Docker. Usually you will have an application called "Docker" on your computer with a rudimentary graphical user interface (GUI). On macOS, you can also start it from the command-line interface (CLI):

      open -a Docker

- Open a shell and navigate to wherever the files are that you want to convert.

      cd path/to/source/dir

  You can always run `pwd` to check whether you're in the right place.
- Run docker by entering the below commands in your favorite shell. Let's say you have a `README.md` in your working directory that you'd like to convert to HTML.

      docker run --rm --volume "`pwd`:/data" --user `id -u`:`id -g` pandoc/latex:2.6 README.md

  The `--volume` flag maps some directory on your machine (lefthand side of the colon) to some directory in the container (righthand side), so that you have your source files available for pandoc to convert. `pwd` is quoted to protect against spaces in filenames.

  Ownership of the output file is determined by the user executing pandoc in the container. This will generally be a user different from the local user. It is hence a good idea to specify for docker the user and group IDs to use via the `--user` flag.

  `pandoc/latex:2.6` declares the image that you're going to run. It's always a good idea to hardcode the version, lest future releases break your code.

  It may look weird to you that you can just add `README.md` at the end of this line, but that's just because the `pandoc/latex:2.6` image will simply prepend `pandoc` in front of anything you write after `pandoc/latex:2.6` (this is known as the `ENTRYPOINT` field of the Dockerfile). So what you're really running here is `pandoc README.md`, which is a valid pandoc command.

  If you don't have the current docker image on your computer yet, the downloading and unpacking is going to take a while. It'll be (much) faster the next time. You don't have to worry about where/how Docker keeps these images.
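If you run this command often, it may help to wrap it in a shell function. The following is only a sketch: `pandoc_docker` and the `PANDOC_IMAGE` variable are hypothetical names, not something this repository provides. It uses the same flags as the command above and passes every argument through to pandoc.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of this repository).
# Pin the image version, as recommended above; override via the environment.
PANDOC_IMAGE="${PANDOC_IMAGE:-pandoc/latex:2.6}"

pandoc_docker() {
  # "$PWD" is the directory that `pwd` reports; it is mounted at /data.
  # `id -u` and `id -g` make output files belong to the local user.
  # "$@" is handed to pandoc, because the image's ENTRYPOINT prepends `pandoc`.
  docker run --rm \
    --volume "$PWD:/data" \
    --user "$(id -u):$(id -g)" \
    "$PANDOC_IMAGE" "$@"
}

# Example call (commented out so that loading this file has no side effects):
# pandoc_docker README.md --output README.html
```

With `--output`, the resulting file lands in the mounted directory, which is where the `--user` flag starts to matter.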
Pandoc commands have a way of getting pretty long, and so typing them into the
command line can get a little unwieldy. To get a better handle on long pandoc
commands, you can store them in a script file, a simple text file with an `*.sh`
extension such as `script.sh`:

    #!/bin/sh
    pandoc README.md

The first line, known as the shebang, tells the container that the following
commands are to be executed as shell commands. In our case, we really don't use
a lot of shell magic, we just call pandoc in the second line (though you can get
fancier, if you like). Notice that the `#!/bin/sh` will not get you a full bash
shell, but only the more basic ash shell that comes with Alpine linux on which
the pandoc containers are based. This won't matter for most uses, but if you
want to write more complicated scripts you may want to refer to the ash manual.
Once you have stored this script, you must make it executable by running the following command on it (this may apply only to UNIX-type systems):

    chmod +x script.sh

You only have to do this once for each script file.
You can then run the completed script file in a pandoc docker container like so:

    docker run --rm --volume "`pwd`:/data" --entrypoint "/data/script.sh" pandoc/latex:2.6

The entrypoint is looked up inside the container, so the script is addressed
through the mounted directory as `/data/script.sh`. Notice that the above
`script.sh` did specify `pandoc`, and you can't just omit it as in the simpler
command above. This is because the `--entrypoint` flag overrides the
`ENTRYPOINT` field in the docker file (`pandoc`, in our case), so you must
include the command.
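As an illustration of a slightly fancier script, the sketch below converts every Markdown file in the working directory to HTML. It is hypothetical: the `convert_all` name and the file naming are assumptions, and it relies on the container's working directory being the mounted `/data`, as the `README.md` example implies. It sticks to plain POSIX `sh` so that it runs under ash.

```shell
#!/bin/sh
# Hypothetical variant of script.sh; names and layout are illustrative only.
convert_all() {
  for src in *.md; do
    # If nothing matches, the pattern stays literal; skip that case.
    [ -e "$src" ] || continue
    # Strip the .md suffix and write an .html file next to the source.
    pandoc "$src" --output "${src%.md}.html"
  done
}

# Convert only when executed as script.sh, so the function can also be
# loaded into another script without triggering a conversion.
case "$0" in
  */script.sh|script.sh) convert_all ;;
esac
```

Because the script replaces the image's entrypoint, each `pandoc` call is spelled out explicitly.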
GitHub Actions is an Infrastructure as a Service (IaaS) from GitHub that allows you to automatically run code on GitHub's servers on every push (or a bunch of other GitHub events).
Such continuous integration and delivery (CI/CD) may be useful for many pandoc users. Perhaps you're using pandoc to convert some markdown source document into HTML and deploy the results to a webserver. If the source document is under version control (such as git), you might want pandoc to convert and deploy on every commit. That is what CI/CD does.
To use pandoc on GitHub Actions, you can leverage the docker images of this project.
To learn more about how to use the docker pandoc images in your GitHub Actions workflow, see these examples.
Suppose users desire a new image stack using a different base image. To make
the requirements clearer, assume the desire is to have a new image stack based
off ubuntu.
- Create a top-level directory named `ubuntu`. The name of this directory should be exactly the same as whatever the `FROM` clause will be, for consistency and clarity.
- Create `ubuntu/Dockerfile`. This `Dockerfile` will be the "core" `ubuntu` image; it should only contain `pandoc` and `pandoc-citeproc`. Refer to the `alpine/Dockerfile` for assistance in how to create multiple layers. The idea is to create a base image, then install all build dependencies and `pandoc` / `pandoc-citeproc`. Then create a new layer from the original base image and copy from the intermediate build layer. This way `pandoc` / `pandoc-citeproc` are effectively the only additional items on top of the original base image.
- Add an `ubuntu` target to the `Makefile`.
- Create `ubuntu/latex/Dockerfile` and install the latex dependencies. Use the `alpine/latex/Dockerfile` as a reference for what dependencies should be installed in addition to latex.
- Add an `ubuntu-latex` target to the `Makefile`.
- Add testing targets `test-ubuntu` and `test-ubuntu-latex`. You should be able to copy-paste the existing `test-alpine` and `test-alpine-latex` targets and rename the target-specific variable value for `IMAGE`:

      # update default ---> |-----------------------------|
      test-ubuntu: IMAGE ?= pandoc/ubuntu:$(PANDOC_VERSION)
      test-ubuntu:
      	# vvv invocation line is the same as alpine tests
      	IMAGE=$(IMAGE) make -C test test-core

  This means that `make test-ubuntu` will invoke the `test-core` target in the `test/Makefile`, using the image `pandoc/ubuntu:edge`. The target-specific value is helpful for developers to be able to run the tests against an alternative image, e.g., `IMAGE=test/ubuntu:edge make test-ubuntu`. Note that the testing targets must be the `core` and `latex` targets with `test-` prepended. The CI tests run `make test-<< parameters.core_target >>` and `make test-<< parameters.latex_target >>` (see next item).
- Now that your image stack has been defined (and tested!), update the CircleCI `.circleci/config.yml` file to add a new build stack. Specifically, search for `alpine_stack: &alpine_stack`. An example `diff` for this `ubuntu` stack could look like this:

      @@ -58,6 +58,9 @@ jobs:
       alpine_stack: &alpine_stack
         core_target: alpine
         latex_target: alpine-latex
      +ubuntu_stack: &ubuntu_stack
      +  core_target: ubuntu
      +  latex_target: ubuntu-latex

       # Setup builds for each commit, as well as monthly cron job.
       workflows:
      @@ -66,12 +69,17 @@ workflows:
             - lint
             - build_stack:
                 <<: *alpine_stack
      +      - build_stack:
      +          <<: *ubuntu_stack
         monthly:
           # NOTE: make sure all `build_stack` calls here *also* set `cron_job: true`!
           jobs:
             - build_stack:
                 <<: *alpine_stack
                 cron_job: true
      +      - build_stack:
      +          <<: *ubuntu_stack
      +          cron_job: true

  You should not need to edit anything else in this file!
- Update this file (README.md) to include a listing of this new image stack. Create a new h2 heading (`Ubuntu Linux` in this example) underneath the `All Image Stacks` heading. Please keep this alphabetical. Please also make sure to create a hyperlink under the **Contents** listing at the top of this file for browsing convenience.
- Open a Pull Request for review!
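The file layout that these steps produce can be sketched with a small scaffolding helper. This is a hypothetical convenience, not part of the repository: `scaffold_stack` only creates empty placeholder files and prints the names of the `Makefile` targets that still have to be written by hand.

```shell
#!/bin/sh
# Hypothetical helper (not part of this repository): create the empty
# directory layout for a new image stack.
scaffold_stack() {
  stack="$1"        # e.g. ubuntu; should match the FROM clause
  root="${2:-.}"    # path to the repository checkout, defaults to "."

  # Core image: <stack>/Dockerfile; latex image: <stack>/latex/Dockerfile.
  mkdir -p "$root/$stack/latex"
  # touch leaves existing files untouched apart from their timestamps.
  touch "$root/$stack/Dockerfile" "$root/$stack/latex/Dockerfile"

  # Test targets must be the core and latex targets with test- prepended.
  echo "Makefile targets still to add: $stack $stack-latex test-$stack test-$stack-latex"
}

# Example call (commented out so that loading this file creates nothing):
# scaffold_stack ubuntu /path/to/checkout
```

The Dockerfile contents, the `Makefile` targets, and the CircleCI changes still have to be filled in as described in the list above.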
When pandoc has a new official release, the following steps must be performed
in this exact order:
- Create a pull request from a branch. Edit the ``Current `latest` Tag`` section to include the new `pandoc` release number. Suppose we are releasing image stacks for `pandoc` version 9.8:

      $ git checkout -b release/9.8
      # ... edit current :latest ...
      $ git add README.md
      $ git commit -m 'release=9.8'
      $ git push -u origin release/9.8

  The important part is the commit message. The `.circleci/version_for_commit_message.sh` script will check the commit message for `release=X.Y` / `release=X.Y.Z`, and if found performs the additional tagging to `:latest`. So the diff does not really matter, just the message.

  Create a pull request first to make sure all image stacks build as expected.
- Assuming the pull request build succeeds, merge to the `master` branch. The only time that `docker push` is performed is when a commit hits the `master` branch of this repository.
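To make the commit-message convention concrete, here is a rough sketch of how a `release=X.Y` marker could be extracted from a message. It is not the repository's `.circleci/version_for_commit_message.sh`, whose contents are not shown here; `release_from_message` is a hypothetical name, and the pattern (which also accepts further numeric components after `X.Y`) is an assumption.

```shell
#!/bin/sh
# Illustrative only: NOT the repository's version_for_commit_message.sh.
release_from_message() {
  # Print the version following "release=" (X.Y plus any further .N parts).
  # Prints nothing when the message carries no such marker.
  printf '%s\n' "$1" |
    sed -n 's/.*release=\([0-9][0-9]*\.[0-9][0-9]*\(\.[0-9][0-9]*\)*\).*/\1/p'
}

version=$(release_from_message "release=9.8")
if [ -n "$version" ]; then
  echo "would also tag :latest for pandoc $version"
fi
```

An ordinary commit message yields an empty result, so no extra tagging would be triggered for it.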
Code in this repository is licensed under the GNU General Public License Version 2.