This is the text of the talk I gave at DebConf10, 2010-08-07. See also:
- http://wiki.debian.org/UpstreamGuide -- Debian wiki page for this
- http://files.liw.fi/swimming-upstream.pdf -- slides (5.5 MiB)
I was a Debian developer for many years. Last year I decided to leave Debian, and see what else there is in the world. This is the story of what happened.
In short, I have become an upstream developer instead. For three months earlier this year I worked on a project called Koha. This talk is about what I've learned about Debian from an upstream point of view, and about upstream projects from a Debian point of view.
I don't need to tell you about Debian.
Koha is an integrated library management system. That's the kind of library that has books in it. Koha is used by many public and private libraries around the world. It was originally developed in New Zealand, and has been free software since the beginning. Koha is now about ten years old, consists of thousands of lines of Perl code, and is quite a mature product.
I am going to use Koha as an example of an upstream project. My talk is not just about Koha, but about all kinds of upstream projects.
The thing about building a Linux distribution like Debian is that it's not just taking a bunch of upstream code and compiling that. There's oh so much more to do. All the software has to be configured and tweaked and patched to work together.
Packaging an old, mature, somewhat crufty program like Koha for Debian can be a bit of an undertaking. Debian has decided to do many things in a particular way, but so has Koha. It's like two middle-aged people who've been single for decades falling in love and moving in together. They both want it, but both will need to adapt, and both will need to give up some of their little quirks.
In this case, the Koha upstream project wanted to have Debian packages made for Koha. Many of the Koha developers prefer Debian as the platform on which to run Koha, but installation and upgrading from source was a sore point. Not just of Koha itself, but also of its dependencies.
Koha is a web application, and that means it will need to integrate with at least a web server and a database engine. It is written in Perl, and relies on dozens of CPAN modules.
The CPAN modules were the first stumbling block. Not all of them were packaged for Debian. In fact, dozens of them were missing from the Debian stable release. Even when a module was there, it was usually too old for Koha.
Luckily, most of the missing modules were in squeeze. Five were missing. What should an upstream do in this situation? In the Koha case, it was happily easy to make Debian packages for the missing CPAN modules, so that's what I did.
In general, it is probably not feasible for upstream to package missing dependencies themselves. It is also not sensible for upstreams to restrict themselves too much. There are a lot of things outside of Debian that will be very helpful for upstream development. If upstream can make use of yet another CPAN module, that can save them days, weeks, even years of coding. This is code re-use at its best.
- upstream should avoid unpackaged dependencies, unless that's hard
- Debian should package everything so that upstream can make use of the best tools for them
I also joined the Debian pkg-perl team, who were very helpful in getting the packages in good shape, and kindly uploaded them into Debian. This was important so that Koha would not have to provide and maintain the packages outside of Debian. Doing that would have been entirely possible, but would have been more effort, and would certainly have led to duplicated effort, when other people did the same thing. As proof of that, for all five CPAN modules I packaged, the popcon count became nonzero within the first day, even though nothing in Debian depended on them. People use CPAN a lot.
- CPAN makes it very easy to make Debian packages
- pkg-perl in Debian rocks (gregoa especially helped a lot)
With those CPAN modules packaged, I was able to run the Koha test suite. This gave me much confidence that Koha would actually work once installed.
The next step was to make sure it could talk to the web server and the database engine. Specifically, Apache and MySQL.
At this point, it turned out that most of the hard questions had already been discussed and solved within Debian. Equally importantly, they had been documented in the web-apps policy and the database policy for Debian. Before my Koha work, I had no experience with these Debian policies, and I did not fully appreciate how helpful they are.
The Debian policies, and the tools written to support the policies, made it very easy to make a rudimentary Koha package that would integrate with Apache and MySQL.
- upstream should have a test suite
- Debian Policy and sub-policies rock
- dbconfig-common: wow
- dh: wow
Well, when I said it was really easy, I was simplifying things a bit. There's a feature in Koha that made it a little bit more work than should have been required.
The Koha web application is divided into two parts: a public interface for library customers, and a private interface for library staff. For no good reason at all, Koha decided that these should be on different URLs. In order to have a Koha package that worked out of the box I had to use two different ports. That's ugly, and not without problems.
I expect Koha to fix that in the future. It would be better if they did not require different URLs for the two sites, and instead had a single site.
This is an example of the kind of thing upstreams do when they don't know what it means to build a distribution. They make design decisions in the dark. And the decisions make sense to them, and probably work fine from their perspective. The trick is to get them to see why Debian's point of view is right.
- upstreams will make design decisions that make sense for them, but seem stupid for distros
- distros should work with upstreams to fix these things
The next step was to make it possible to run multiple Koha instances on the same host. The Koha packaging work I was doing was funded by a New Zealand company, Catalyst IT, who will provide a hosted Koha solution with full freedom. This is much easier if they don't need to set up a new server for each customer.
Here I ran into some upstream problems. Koha comes with configuration file templates for Apache. I needed to butcher, er, tweak these a lot, to allow them to be easily adapted to multiple hosts.
There were several other configuration files as well that needed tweaking. The files were in a bunch of different syntaxes, since they were for tools written by different projects.
The result was a set of configuration file templates that are divorced from what upstream actually provides. When Koha makes changes to its own templates, the templates in the Debian package will have to be adapted. This is obviously an area that is likely to attract bugs. The fix is to port my changes to the upstream files, and I expect that to happen, but did not want to make those changes while Koha was preparing a release.
- configuration is tricky
- Debian should collaborate with upstream to make configuration handling easier
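To make the multi-instance idea concrete, here is a minimal sketch of generating one Apache virtual host per instance from a single template. It is not what the Koha packages actually do. The Apache directives are standard ones, but the template text, the `example-app` paths, the `APP_CONF` variable and the instance names are all assumptions made up for illustration.

```python
from string import Template

# Hypothetical per-instance virtual host template. The directives are
# standard Apache ones; the paths and the APP_CONF variable are invented.
VHOST_TEMPLATE = Template("""\
<VirtualHost *:$port>
    ServerName $instance.$domain
    DocumentRoot /usr/share/example-app/htdocs
    SetEnv APP_CONF /etc/example-app/sites/$instance/app.conf
    ErrorLog /var/log/example-app/$instance/error.log
</VirtualHost>
""")


def render_vhost(instance, domain, port=80):
    """Fill in the template for one instance and return the config text."""
    return VHOST_TEMPLATE.substitute(instance=instance, domain=domain, port=port)


# Render a virtual host for each of two made-up instances.
for name in ("northside", "harbour"):
    print(render_vhost(name, "library.example"))
```

The point of the sketch is that everything instance-specific lives in a few named placeholders. If upstream ships its templates in that shape, a package can add instances without forking the files.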
The rest of Koha configuration is in the MySQL database, and that made things easier for me. Koha is already equipped to manage upgrades, and knows how to update the configuration in the database when it gets upgraded. However, it may be that Koha is unusually enlightened about this. I am not familiar enough with web apps in general to know if they all do that. I do know that many apps provide no tools for upgrading config files upon software upgrades.
It would be really great if there was only one or a small handful of syntaxes for configuration files. I am not so fussy about what the syntax is. No, that's a lie. I am very fussy, but I am not going to go into that. I will just say that it would be good if everyone did things in roughly the same way.
One of the things Debian has learned over the years is that so-called dot-dee directories are very practical. If an application only reads a single configuration file, and other packages want to add to the configuration, that single file needs to be modified. That modification is quite risky and quite error-prone.
If, instead, the app can read all files in a directory and treat them as if they were a single file, it becomes very easy for another package to add to the configuration.
This is useful not just for other packages in Debian, but also useful for system administrators.
- too little best practice for configuration
- too many syntaxes used
- "stacked" configuration files and config.d directories for everything would rock
- /etc/cron.d, /etc/logrotate.d, /etc/apache2/sites-available, ...
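As a rough illustration of the dot-dee idea, the sketch below reads a main configuration file and then every `*.conf` fragment in a directory, in sorted order, so that later fragments override earlier ones. The `key = value` syntax, the `.conf` suffix and the function names are assumptions chosen for the example, not a description of any particular program.

```python
from pathlib import Path


def parse_lines(text):
    """Parse simple 'key = value' lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key = value line; ignore it in this sketch
        settings[key.strip()] = value.strip()
    return settings


def load_stacked_config(main_file, conf_dir):
    """Read main_file, then each *.conf in conf_dir, as if one long file."""
    settings = {}
    main_path = Path(main_file)
    if main_path.is_file():
        settings.update(parse_lines(main_path.read_text()))
    dir_path = Path(conf_dir)
    if dir_path.is_dir():
        # Sorted order makes the result predictable: 10-foo.conf is read
        # before 20-bar.conf, so the later file wins on conflicts.
        for fragment in sorted(dir_path.glob("*.conf")):
            settings.update(parse_lines(fragment.read_text()))
    return settings
```

With something like this in the application, another package or a system administrator only has to drop a new file into the directory. Nobody needs to edit a file that someone else owns.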
At this point I published my Koha packages, and they received quite an enthusiastic response. Apart from the work to support multiple sites per host, my packages were really quite simple. They did not even use debconf to provide a smoother installation experience, and required people to do some manual configuration after the install to finish Apache configuration, etc. Even so, Koha users were very happy about them.
Afterwards, I gave a tutorial on making Debian packages. There was quite a bit of interest in that, and one participant later told me he'd started looking at everything as a potential new Debian package.
- even simple packages make life better for users
- package everything (part 2)
- teach people to package so they can package everything
Koha is an old project. Perhaps unusually, it has a clear history, preserved in git. What it does not have, is clear copyright statements in each file. Indeed, many files have copyright statements that are clearly wrong. All the correct data is available, but it needs to be extracted from version control commits.
This kind of thing is a problem for Debian, which likes to be precise and correct about legal issues.
- upstreams need to be careful with copyright info
- there are few tools for helping with that
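For what it's worth, a first pass at extracting that data from version control might look like the sketch below. It assumes `git` is installed and the script is run inside the repository. The function names are made up for the example. Note that commit authorship is not the same thing as copyright ownership (an employer may hold the copyright, for instance), so the output is a starting point for a human to review and not a finished copyright statement.

```python
import subprocess
from collections import defaultdict


def years_by_author(log_text):
    """Map author name to a sorted list of years, from 'YEAR<TAB>NAME' lines."""
    years = defaultdict(set)
    for line in log_text.splitlines():
        year, sep, author = line.partition("\t")
        if sep and year.isdigit():
            years[author.strip()].add(int(year))
    return {author: sorted(ys) for author, ys in years.items()}


def git_log_for(path):
    """Return 'YEAR<TAB>AUTHOR' lines for every commit touching path."""
    result = subprocess.run(
        ["git", "log", "--follow", "--date=format:%Y",
         "--format=%ad%x09%aN", "--", path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def copyright_lines(authors_years):
    """Turn the per-author years into candidate copyright lines."""
    lines = []
    for author in sorted(authors_years):
        ys = authors_years[author]
        span = str(ys[0]) if ys[0] == ys[-1] else f"{ys[0]}-{ys[-1]}"
        lines.append(f"Copyright {span} {author}")
    return lines
```

Calling `copyright_lines(years_by_author(git_log_for("some/file.pl")))` would give one candidate line per author of that file, where `some/file.pl` stands in for a real path.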
There are, of course, things Debian could improve. The biggest improvement, I think, would be a document or checklist for upstreams who want to be easy to package. Some such things already exist, but there should be a canonical one prominently available somewhere. I hope this talk and the following points can be the start of such a checklist. I will not discuss each of these points, since such discussion should happen on debian-devel. Instead, I open the floor for feedback.
- Checklist for upstreams to be easy to package:
- Be clear about dependencies (including versions).
- Avoid versions not available in the latest release of major distros, unless that takes a lot of effort.
- Have an automatic test suite to run during build time.
- If possible, have a test suite to run against the installed software.
- Follow common patterns for configuration and building software. (Debian should provide more specific advice about build systems?)
- Accept improvements to build systems, so distros can avoid using workarounds for a long time.
- Support foo.d for configuration.
- Keep copyright information up to date.
- Be careful about security. Respond promptly to security problems.
- Be mindful about portability.
- Do not embed code of other projects in yours.
- Avoid mixing multiple licenses in your code base.
- Do not invent new copyright licenses. Use a well-known one.
- Have and use an open bug tracker.
- Have and use an open version control system.
- Commit to long-term support of versions of your software in stable releases of major distros. If this is a problem, discuss it with the distros before they make a release.
- If distros tell you something in your code makes life hard for them, seriously consider fixing that.
- Keep version numbering simple.
Another thing Debian could do, related to the upstream checklist, is to become more approachable to upstreams. In the case of Koha, they happened to be able to make use of a retired Debian Developer to get things done. In the general case, it would be good if Debian had a well-known address where upstreams could ask for packaging help. The RFP process is not all that easy for an outsider.