It is 2016. A good continuous integration system is an essential tool for software development. Most programmers wouldn't develop software without using a version control system or writing unit tests, and CI is now another of the basic tools in the programmer's toolbox.

I've been using CI since about 2011 for my own projects. Initially, I used Jenkins, but after a few years it irritated me enough that I wrote my own, which is called Ick. What irritated me about Jenkins was particularly that it broke in one way or another several times a year, and I'd need to fix it, sometimes by deploying it from scratch. I was also annoyed that it took much wrangling to get Jenkins set up so that it could cleanly build Debian packages for me, for each of my many personal projects. I worked around that by writing a tool that set up Jenkins jobs, and triggered them, via the Jenkins API. Towards the end I had about 500 jobs and was reasonably happy. Until Jenkins broke again.

In the interest of fairness, I'll admit that Jenkins works well enough for a lot of people and projects. I'm not saying Jenkins is uselessly bad for everyone, but it and I have differences of opinion about a few things, which means I'm more likely to be happy with another tool.

Ick is what I currently use. It started as a quick hack, and it's aimed at me. It isn't a service that runs continuously. It's a command line tool I run when I think it's time. It's also designed around my personal resource constraints at the time I was building it. Primarily this means it only runs one build at a time, rather than trying to run as many things in parallel as possible, since I only had my laptop back then, and the laptop didn't have enough RAM and CPU to run many builds in parallel.

While Ick works for me, and I'm mostly OK with it, it's clumsy and slow, and doesn't work very well for anyone else. It's also a bit fragile, and changing things like the build steps to run requires changing its source code. I would like to have something better, and if I'm going to put in the effort of writing that, I'd like it to be as close to my dream system as possible.

This document is the start of that work.

Other CI systems

I've not tried all the CI systems in the world. In fact, I've only really tried Jenkins in anger. I've looked a little into a few others, and since I'm looking for a system to be happy with, I've managed to find fatal flaws in all the ones I've looked at.

In order to generate a lot of angry feedback, here are a few notes:

  • Buildbot seems to be configured using Python scripts. I want configuration to be declarative. If I have to run arbitrary code to know what will happen, I'm unhappy.

  • Go seems interesting in many ways. However, it's written in Java (which I don't like and don't want to debug), and while it's open source, the project requires patch submitters to sign a CLA, which I won't. If I chose Go, I'd not be able to improve the software.

What I think I want from a CI system

In this chapter I'm trying to draw a picture of what kind of CI system I think I would be happy with.

What a CI system is

The purpose of a CI system is to take some source code and make sure it works. As a side effect it should produce some artifacts, which would include installation packages, formatted manuals, and possibly other things, depending on the project in question. It might also include things such as deploying the software to production servers, or maintaining computing infrastructure.

One can think of a CI system as a thing that waits for something to trigger it, and once triggered, runs a sequence of steps to produce the desired results. In its most simple form, a CI system just runs a Unix script if anything changes in a version control repository.
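
That simplest form can be sketched in a few lines of shell. This is only an illustration, not part of any real system; the repository path, state file, and build command are all placeholders:

```shell
#!/bin/sh
# Sketch: poll a git repository and run a build script when the head
# commit changes. Paths and the build command are placeholders.
set -eu

needs_build() {
    # $1 = repository directory, $2 = file remembering the last built commit
    repo="$1"
    state="$2"
    head=$(git -C "$repo" rev-parse HEAD)
    last=""
    if [ -e "$state" ]; then
        last=$(cat "$state")
    fi
    if [ "$head" != "$last" ]; then
        echo "$head" > "$state"
        return 0
    fi
    return 1
}

# The whole "CI system" is then just a loop (not run here):
# while true; do
#     if needs_build /srv/git/myproject /var/lib/ci/myproject.rev; then
#         sh /srv/ci/build-myproject.sh
#     fi
#     sleep 60
# done
```

Everything else a real CI system does is elaboration on this loop: more triggers, more steps, more machines.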

What about CD?

An extension of a continuous integration system is a continuous delivery (or deployment) system, or CD for short. This would take the installation packages produced by the CI part and either make them available for anyone to use, or even install them onto production servers.

From a CI implementation point of view, I think there's not a whole lot of difference between CI and CD. CD means adding more steps to the "pipeline" that is run on every change, in order to do the delivery or deployment.
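
To illustrate, using hypothetical step names and a pipeline format like the configuration sketches later in this document, delivery might be nothing more than extra steps at the end of a pipeline:

```yaml
# Hypothetical sketch: a pipeline that builds a Debian package and then
# delivers it; the CD part is just the last two steps.
- name: build_and_deliver
  steps:
    - name: create_upstream_release_tarball_from_git
    - name: build_debian_binary_package   # hypothetical step
    - name: upload_to_apt_repository      # delivery
    - name: update_production_servers     # deployment
```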

What a CI/CD system doesn't need to be

A CI/CD system doesn't need to manage its various component systems, or other systems, or do other system administration tasks. It might be configured to do those, but the knowledge should be in the configuration and doesn't need to be baked into the code.

Core concepts

In this chapter I'm defining various core concepts for a CI system. This is highly biased by my vision for a CI system and might not be shared by CI experts. I'm open to feedback.

  • CI configuration specifies how a CI instance should work: what projects it should build, how, when, etc. The configuration is stored in static files in version control.

  • A project represents a thing that is being developed. For example, it might be a program, a website, or a document. A project consists of some source data, one or more pipelines to build artifacts from the source, with some triggers that cause the pipelines to run. The CI configuration specifies the projects and their various aspects.

  • A version control repository is where project source code is stored. It is one of the ways in which source data can be provided to the CI system. Git is an example; personally, git is the only one I find interesting at this time, but I'd prefer not to bake in an assumption that it is git. It should be fairly easy to support any reasonable version control system.

    A project's source data may reside in multiple version control repositories, which get stacked. For example, Debian packaging might be in a separate repository from upstream source code.

  • A pipeline template specifies the steps to take to turn source data into output artifacts. A template typically has parameters, such as where the source data is retrieved from, or what to do with the built artifacts.

  • A pipeline instance is a pipeline template with parameters to nail down any variations for a specific project.

  • A pipeline step is an atomic unit in a pipeline template. For example, a step might specify how to unpack the source code, build a release tarball, or build a Debian binary package from a Debian source package.

    Pipeline steps are specified in the CI configuration.

    Pipeline steps may specify that they need a specific worker, or a worker with specific attributes, on which to run.

    Pipeline steps may be templated with parameters, just like pipelines.

  • A trigger alerts the CI system that a pipeline may need to be run. For example, a pipeline might be triggered by a change in the version control repository, a timer elapsing, or another pipeline finishing successfully. Another trigger might be that a Debian APT repository has changed (prompting a pipeline that updates production servers for security updates).

  • The CI controller decides when to run each step in each pipeline and on which worker and what to do if something fails. The controller is probably a daemon that can be queried and controlled via an HTTP API, using suitable authentication and transport encryption.

  • A CI worker runs each pipeline step. Each worker gets commands from the controller, and returns results to the controller. Workers have names, and attributes (key/value pairs), and pipelines or their steps may be restricted to specific workers or types of workers based on the attributes.

    Communication with workers is probably purely over ssh. Workers should not be required to have anything much installed, apart from an ssh server.

  • An artifact repository is where outputs from the build processes get stored. A CI instance might have any number of artifact repositories: for example, if it builds both Debian and RPM packages, they would be stored in different repositories.

  • Pipeline output is everything produced by running a pipeline that isn't explicit artifacts. For example, the build log (output from all the commands run by the pipeline), measurements of resource usage, etc.

  • A project dependency is another project that is needed to build a given project, or whose completed build triggers the given project.

  • A pipeline parameter is used to create a pipeline instance from a pipeline template, and fixes a generic value for the template in a manner suitable for the specific project.

  • A workspace is the directory where pipeline steps are run. It is initially populated with the source code, and nothing else. Each pipeline step may, however, modify the workspace, e.g., by building source into binary code. The workspace is initially set up by the controller, and copied to the worker when it begins running a step, and back when it's finished running the step.

    The workspace is effectively created from scratch for each run, and deleted afterwards, though for efficiency this may be implemented with something like "git clean -fdxq".

    A workspace may be a checkout (git clone) or an exported copy (git archive) of a version control repository.

  • The user interface of the CI system is mostly detached from the actual working logic, and may be implemented in various ways, including command line and a web application, using the controller API.

  • The CI system may produce reports, whenever a pipeline has finished running. A report is produced by code that is specified by the user, and is triggered by a pipeline finishing, whether successfully or not. The reporting code gets access to all runs, artifacts, and related metadata gathered by the CI system.

    Notifications (via email, IRC, or whatever) might be implemented as reports, or notifications may become their own concept.
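
To make the worker and workspace ideas concrete, here is a sketch of how a controller might run one step on a worker, assuming only an ssh server (plus rsync) on the worker. The worker address, paths, and the ICK_DRY_RUN switch are all made up for this sketch:

```shell
#!/bin/sh
# Sketch: run one pipeline step on a worker over ssh, copying the
# workspace there before the step and back afterwards.

run() {
    # Run a command, or just print it when ICK_DRY_RUN is set.
    if [ -n "${ICK_DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

run_step() {
    # $1 = worker ssh address, $2 = local workspace dir, $3 = step shell snippet
    worker="$1"
    workspace="$2"
    snippet="$3"
    run rsync -a --delete "$workspace/" "$worker:workspace/"
    run ssh "$worker" "cd workspace && $snippet"
    run rsync -a "$worker:workspace/" "$workspace/"
}

# For example (printed only, nothing contacts a real worker):
# ICK_DRY_RUN=1 run_step ick@ick-debian8-amd64 /var/lib/ick/obnam "make -C manual"
```

This keeps the "nothing much installed on workers" requirement: the controller owns all the logic, and the worker only needs to accept ssh connections and run commands.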

Configuration sketch

In this section I show some examples of how configuration files might work. Configuration files are in YAML, for human readability and editability.

A pipeline step:

- name: create_upstream_release_tarball_from_git
  parameters:
    # These are automatic.
    - project_name
    - project_version
    - artifacts
    # This is a custom variable.
    - compression_level
  shell: |
    pv="{{ project_name }}-{{ project_version }}"
    git archive --prefix="$pv/" HEAD |
      xz "-{{ compression_level }}" > "{{ artifacts }}/$pv.tar.xz"
- name: run_make_in_directory
  parameters:
    - build_subdir
  shell: |
    make -C "{{ build_subdir }}"
- name: copy_out_artifacts
  parameters:
    - artifacts
    - build_subdir
  shell: |
    cd "{{ build_subdir }}"
    cp *.html *.pdf "{{ artifacts }}/."

A couple of pipeline templates:

- name: release_source_code
  steps:
    - name: create_upstream_release_tarball_from_git
      parameters:
        compression_level: 9
- name: build_docs
  steps:
    - name: run_make_in_directory
    - name: copy_out_artifacts

A trigger:

- name: version_control_repository_has_changed

FIXME: No idea yet how to implement triggers, especially if they're supposed to be triggered by HTTP requests. One idea is that each trigger becomes an endpoint in the controller HTTP API: https://controller/trigger/version_control_has_changed/obnam. Triggering would then happen by calling the API endpoint.
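
If the endpoint-per-trigger idea works out, firing a trigger is just an HTTP request, which even a git hook can make. A sketch, assuming the URL shape above; the helper function and the hook command are hypothetical, not an existing API:

```shell
#!/bin/sh
# Hypothetical helper: build the controller API URL for a trigger,
# following the endpoint shape sketched above.
trigger_url() {
    # $1 = controller base URL, $2 = trigger name, $3 = project name
    echo "$1/trigger/$2/$3"
}

# A git post-receive hook could then fire the trigger (not run here):
# curl -X POST "$(trigger_url https://controller version_control_repository_has_changed obnam)"
```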

A project:

- name: obnam
  sources:
    - type: git
      url: git://
    - type: git
      url: git://
      dirname: debian
  worker_requirements:
    - operating_system: linux
  pipelines:
    - name: release_source_code
      triggers:
        - version_control_repository_has_changed
      parameters:
        compression_level: 1
    - name: build_docs
      triggers:
        - version_control_repository_has_changed
      parameters:
        compression_level: 9
        build_subdir: manual

A worker:

- address: ick@ick-debian8-amd64
  attributes:
    # Note: most of these will be determined automatically
    operating_system: linux
    linux_distribution: debian
    arch: amd64
    cores: 1
    memory: 4GiB
    disk: 1TB

Self-contained configuration example

See tiny.txt for a sketch of a self-contained configuration example.