Introduction

It is 2016. A good continuous integration (CI) system is an essential tool for software development. Most programmers wouldn't develop software without using a version control system or writing unit tests, and CI is now another of the basic tools in a programmer's toolbox.

I've been using CI since about 2011 for my own projects. Initially, I used Jenkins, but after a few years it irritated me enough that I wrote my own, which is called Ick. What irritated me about Jenkins in particular was that it broke in one way or another several times a year, and I'd need to fix it, sometimes by starting from scratch. I was also annoyed that it took quite some wrangling to get Jenkins set up so that it could cleanly build Debian packages for me, for each of my many personal projects. I worked around that by writing a tool to set up Jenkins jobs, and trigger them, via the Jenkins API. Towards the end I had about 500 jobs and was reasonably happy. Until Jenkins broke one time too many.

In the interest of fairness, I'll admit that Jenkins works well enough for a lot of people and projects. I'm not saying Jenkins is uselessly bad for everyone, but it and I have differences of opinion about a few things, which means I'm more likely to be happy with another tool.

Ick is what I currently use. It started as a quick hack, and it's aimed at me. It isn't a service that runs continuously; it's a command line tool I run when I think it's time. It's also designed around my personal resource constraints at the time I was building it. Primarily this means it only runs one build at a time, rather than trying to run as many things in parallel as possible, since all I had back then was my laptop.

While Ick works for me, and I'm mostly OK with it, it's clumsy and slow, and doesn't work very well for anyone else. It's also a bit fragile, and changing things like which build steps to run requires changing its source code. I would like to have something better, and if I go to the effort of writing it, I'd like it to be as close to my dream system as possible.

This document is the start of that work.

Other CI systems

I've not tried all the CI systems in the world. In fact, I've only really tried Jenkins in anger. I've looked a little into a few others, and since I'm looking for a system to be happy with, I've managed to find fatal flaws in all the ones I've looked at.

In order to generate a lot of angry commentary, here are a few notes:

  • Buildbot seems to be configured using Python scripts. I want configuration to be statically analysable. If I have to run arbitrary code to know what will happen, I'm unhappy.

  • Go seems interesting in many ways. However, it's written in Java (which I don't like and don't want to debug), and while it's open source, the project requires patch submitters to sign a CLA, which I won't do. If I chose Go, I'd not be able to improve the software.

  • ... I'm sure I've looked at other systems. FIXME.

What I think I want from a CI system

In this chapter I'm trying to draw a picture of what kind of CI system I think I would be happy with.

What a CI system is

The purpose of a CI system is to take some source code and make sure it works. As a side effect it should produce some artifacts, which would include installation packages, formatted manuals, and possibly other things, depending on the project in question.

One can think of a CI system as a thing that waits for something to trigger it, and once triggered, runs a sequence of steps (known as the pipeline) to produce the desired results. In its simplest form, a CI system just runs a sequence of Unix shell commands whenever anything changes in a version control repository.
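
To make that concrete, here is a minimal sketch in the YAML configuration style used in the examples later in this document; all names and URLs here are invented:

pipeline_steps:
- name: build_and_check
  shell: |
    ./configure
    make
    make check

pipeline_templates:
- name: build_and_test
  steps:
  - build_and_check

projects:
- name: hello
  version_control_repositories:
  - type: git
    url: git://git.example.com/hello
  pipelines:
  - name: build_and_test
    triggers:
    - version_control_repository_has_changed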

What about CD?

An extension of a continuous integration system is a continuous delivery (or deployment) system, or CD for short. This would take the installation packages produced by the CI part and either make them available for anyone to use, or even install them onto production servers. Using a CD system requires a development workflow where minimizing the time to production is a goal.

From a CI implementation point of view, I think there's not a whole lot of difference between CI and CD. CD means adding more steps to the "pipeline" that is run on every change, in order to do the delivery or deployment.
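
As an illustration, delivery could be one more step appended to a pipeline template. Here is a hedged sketch; the template, step, and repository names are invented, and dput is the usual tool for uploading Debian packages:

pipeline_templates:
- name: build_and_deliver
  steps:
  - build_debian_packages
  # The CD part: one more step that uploads what was just built.
  - upload_debian_packages

pipeline_steps:
- name: upload_debian_packages
  needs_vars:
    - artifacts
  shell: |
    # Upload every .changes file (and the files it references)
    # to the "my-apt-repo" host configured in ~/.dput.cf.
    dput my-apt-repo "{{ artifacts }}"/*.changes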

What a CI/CD system doesn't need to be

A CI/CD system doesn't need to manage its various component systems, or other systems, or do other system administration tasks. It might be configured to do those, but that knowledge should be in the configuration, not baked into the code.

FIXME. There's more that can be placed out of scope.

Concepts

In this chapter I'm defining various core concepts for a CI system. This is highly biased by my vision for a CI system and might not be shared by CI enthusiasts. I'm open to feedback.

  • CI configuration specifies how a CI instance should work: what projects it should build, how, when, etc. The configuration is stored in static files in version control.

  • A project represents a thing that is being developed. For example, it might be a program, a website, or a document. A project consists of some source data, one or more pipelines to build artifacts from the source, and some triggers that cause the pipelines to run. The CI configuration specifies the projects and their various aspects.

  • A version control repository is where project source code is stored. It is one of the ways in which source data can be provided to the CI system; git is an example. For me, git is the only interesting one, but I'd prefer not to code in an assumption that it is git. It should be fairly easy to support any reasonable version control system.

    A project's source data may reside in multiple version control repositories, which get stacked. For example, Debian packaging might be in a separate repository from the upstream source code.

  • A pipeline instance is a pipeline template with parameters to nail down any variations for a specific project.

  • A pipeline template specifies the steps to take to turn source data into output artifacts. A template typically has parameters, such as where the source data is retrieved from, or what to do with the built artifacts.

  • A pipeline step is an atomic unit in a pipeline template. For example, a step might specify how to unpack the source code, build a release tarball, or build a Debian binary package from a Debian source package.

    Pipeline steps are specified in the CI configuration.

    Pipeline steps may specify that they need a specific worker, or a worker with specific attributes, on which to run (see the sketch after this list).

    Pipeline steps may be templated with parameters, just like pipelines.

  • A trigger alerts the CI system that a pipeline may need to be run. For example, a pipeline might be triggered by a change in the version control repository, a timer elapsing, or another pipeline finishing successfully.

  • The CI controller decides when to run each step in each pipeline, on which worker, and what to do if something fails. The controller is probably a daemon that can be queried and controlled via an HTTP API.

  • A CI worker runs each pipeline step. Each worker gets commands from the controller, and returns results to the controller. Workers have names, and attributes (key/value pairs), and pipelines or their steps may be restricted to specific workers or types of workers.

    Communication with workers is probably purely over ssh. Workers should not be required to have anything much installed, apart from an ssh server.

  • An artifact repository is where outputs from the build processes get stored. A CI system might have any number of artifact repositories: for example, if it builds both Debian and RPM packages, they would be stored in different repositories.

  • Pipeline output is everything produced by running a pipeline that isn't an explicit artifact. For example, the build log (output from all the commands run by the pipeline), measurements of resource usage, etc.

  • A project dependency is another project that is needed in order to build a given project.

  • A pipeline parameter is used to create a pipeline instance from a pipeline template, and fixes a generic value for the template in a manner suitable for the specific project.

  • A workspace is the directory where pipeline steps are run. It is initially populated with the source data, and nothing else. Each pipeline step may, however, modify the workspace, e.g., by building source into binary code.

    The workspace is effectively created from scratch for each run, and deleted afterwards, though this may be implemented via things like "git clean -fdxq".

    A workspace may be a checkout (git clone) or an exported copy (git archive) of a version control repository.

  • The user interface of the CI system is mostly detached from the actual working logic, and may be implemented in various ways, including command line and a web application.

  • The CI system may produce reports, whenever a pipeline has finished running. A report is produced by code that is specified by the user, and is triggered by a pipeline finishing, whether successfully or not. The reporting code gets access to all runs, artifacts, and related metadata gathered by the CI system.
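
To make the worker-attribute idea above concrete, here is a hedged sketch of a pipeline step that restricts itself to certain workers, in the YAML style of the next chapter. The requires_worker_attributes key is invented for this sketch; this document doesn't settle the actual syntax.

pipeline_steps:
- name: build_debian_binary_packages
  # Invented key: only run this step on a worker whose attributes
  # match these key/value pairs.
  requires_worker_attributes:
    operating_system: linux
    linux_distribution: debian
  shell: |
    # Build binary packages from an unpacked Debian source tree.
    dpkg-buildpackage -us -uc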

Example configuration files

In this section I show some examples of how configuration files might work. Configuration files are in YAML, for human readability and editability.

Some pipeline steps:

pipeline_steps:
- name: create_upstream_release_tarball_from_git
  needs_vars:
    # These are automatic.
    - project_name
    - project_version
    - artifacts
    # This is a custom variable.
    - compression_level
  shell: |
    pv="{{ project_name }}-{{ project_version }}"
    git archive --prefix="$pv/" HEAD |
      xz "-{{ compression_level }}" > "{{ artifacts }}/$pv.tar.xz"
- name: run_make_in_directory
  needs_vars:
    - build_subdir
  shell: |
    make -C "{{ build_subdir }}"
- name: copy_out_artifacts
  needs_vars:
    - artifacts
    - build_subdir
  shell: |
    cd "{{ build_subdir }}"
    cp *.html *.pdf "{{ artifacts }}/."

A couple of pipeline templates:

pipeline_templates:
- name: release_source_code
  steps:
  - create_upstream_release_tarball_from_git
  defaults:
    compression_level: 9
- name: build_docs
  steps:
  - run_make_in_directory
  - copy_out_artifacts

A trigger:

trigger:
  name: version_control_repository_has_changed

FIXME: No idea yet how to implement triggers, especially if they're supposed to be triggered by HTTP requests. One idea is that each trigger becomes an endpoint in the controller HTTP API: https://controller/trigger/version_control_repository_has_changed/obnam. Triggering would then happen by calling the API endpoint.
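
If the endpoint-per-trigger idea pans out, the version control server's post-receive hook could be as small as this hypothetical sketch:

#!/bin/sh
# Hypothetical git post-receive hook: poke the controller's trigger
# endpoint for the obnam project.
curl -X POST \
    "https://controller/trigger/version_control_repository_has_changed/obnam"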

A project:

projects:
- name: obnam
  version_control_repositories:
  - type: git
    url: git://git.liw.fi/obnam
  - type: git
    url: git://git.liw.fi/obnam.debian
    dirname: debian
  workers:
  - operating_system: linux
  pipelines:
  - name: release_source_code
    triggers:
    - version_control_repository_has_changed
    vars:
      compression_level: 1
  - name: build_docs
    triggers:
    - version_control_repository_has_changed
    vars:
      compression_level: 9
      build_subdir: manual

A worker:

workers:
- address: ick@ick-debian8-amd64
  attributes:
    # Note: most of these will be determined automatically
    operating_system: linux
    linux_distribution: debian
    arch: amd64
    cores: 1
    memory: 4GiB
    disk: 1TB
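
Since workers need nothing much beyond an ssh server, running one pipeline step on the worker above might amount to something like this hypothetical sketch (the workspace path and command are made up):

# A hedged sketch of the controller executing one step on the worker,
# purely over ssh: change to the workspace, run the step's commands.
ssh ick@ick-debian8-amd64 'cd workspace && make -C manual'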

FIXME

These are notes from a discussion I had with a friend years ago, when I was tired of Jenkins and before I wrote Ick. Saved here for inspiration.

  • Traceable.
    • All configurations stored in git.
    • All build logs, artifacts, etc, stored in some sensible manner. (Too bad git isn't it. git-annex maybe.)
  • Practically programmable
    • All configuration changes, job running, queries, etc, can be done using a simple API.
  • Extensible.
    • Clean, stable APIs for plugins and external automation to use.
    • All features done by plugins, including standard features.
  • Scalable.
    • Handles large numbers of projects, jobs, workers, etc.
    • Speed.
    • UI.
  • Hacker UI.
    • Command line driven.
    • Service/daemon as well.
    • Notifications via e-mail, desktop, rss feeds, etc.
  • Stable.
    • Doesn't change underneath me all the fucking time.
  • Supports CI vs release builds.
  • Interoperable.
    • Does not try to do all things, but interoperates with other services when sensible. Builds on top of existing good tools.
  • Web UI.
    • At least read-only.
    • Might allow job creation, launching, configuration, as well, but not necessarily.
  • Good monitoring, tracking, reporting.
  • Good querying.
  • Reliable, robust.
  • Easy to set up and keep running.
    • Does not require frequent tweaking, even when new versions are released.
  • Unixy.