Packaging in your packaging: dh-virtualenv

tl;dr

Code: https://github.com/spotify/dh-virtualenv
Docs: http://dh-virtualenv.readthedocs.org/en/latest/

Preamble

At Spotify, we have deployed Python software as Debian packages for a fairly long time. Our continuous integration platform uses sbuild to build and upload them automatically from each commit. Augmented with Puppet, this system works fairly well for deploying software. There are, however, some drawbacks on the developer side.

The first drawback is the state of the Debian Python packages. Because we run Debian stable, the packages are often outdated and missing features, even at the time the stable version is released. This forces us to backport newer packages into our internal Debian repositories.

The second, bigger problem is that when you hire a proficient Python developer, you can safely assume the developer knows how to use Python packaging. However, that won’t fly with our system. To push out the software she has been working on, the developer first needs to know how to package Python inside Debian packages. And Debian packaging has a somewhat steep learning curve.

How does the rest of the world do it?

If you look at practically any existing Python project in the outside world, it follows the same basic pattern: setup.py or requirements.txt defines all the installation requirements, and all of them are available for installation from the Python Package Index, PyPI. All of this is then installed into a dedicated virtualenv, which makes sure system libraries won’t affect your code.

This is also what developers are accustomed to, and the slap in the face is pretty harsh when they see our systems for the first time.

Combining the two views: dh-virtualenv

Debian packaging has one significant advantage over plain Python packaging: it can define dependencies on system libraries. With it you can say that installing lxml requires libxml2 on the target machine. So, in order to have our cake and eat it too, we decided to combine the two great packaging systems, and dh-virtualenv was born!

The inspiration behind dh-virtualenv lies in the awesome blog post (and a series of conference talks) by Hynek Schlawack. While Hynek’s solution uses fpm for packaging, we decided that it would be great to be able to keep using our current sbuild build infrastructure.

How does dh-virtualenv work?

dh-virtualenv works by registering itself in the debhelper build sequence, in much the same way as the current Debian Python packaging helper (dh_python2) does. Using dh-virtualenv for your package is therefore as easy as adding a build dependency on dh-virtualenv and writing a debian/rules file containing:

%:
	dh $@ --with python-virtualenv

This will bundle all the requirements of your software (defined in a requirements.txt) into a virtualenv, do some shebang manipulation, and instruct the Debian package to drop the virtualenv into a suitable location on the target machine. By default this is /usr/share/python/<package-name>, but the location can of course be customized.
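
To make the "shebang manipulation" step more concrete, here is a minimal Python sketch of the idea: scripts created inside the virtualenv at build time point at the build directory's interpreter, so their first line has to be rewritten to the final install path. This is only an illustration under assumed paths. The function name rewrite_shebang and both directory constants are hypothetical, and this is not dh-virtualenv's actual implementation.

```python
# Illustrative sketch only: not dh-virtualenv's actual implementation.


def rewrite_shebang(script_text, build_bin, final_bin):
    """Point a script's shebang at the installed virtualenv instead of the build-time one."""
    first_line, newline, rest = script_text.partition("\n")
    # Only touch real shebang lines that reference the build-time bin directory.
    if first_line.startswith("#!") and build_bin in first_line:
        first_line = first_line.replace(build_bin, final_bin, 1)
    return first_line + newline + rest


# Hypothetical paths: where the virtualenv lived during the build,
# and where the package installs it on the target machine.
BUILD_BIN = "/build/mypkg/debian/mypkg/usr/share/python/mypkg/bin"
FINAL_BIN = "/usr/share/python/mypkg/bin"

script = "#!" + BUILD_BIN + "/python\nprint('hello')\n"
fixed = rewrite_shebang(script, BUILD_BIN, FINAL_BIN)
print(fixed.splitlines()[0])  # #!/usr/share/python/mypkg/bin/python
```

Scripts whose first line does not reference the build directory (for example a plain #!/bin/sh script) are left untouched by this sketch.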

Simplifying deployment with dh-virtualenv: Sentry

At Spotify, we have recently started testing the Sentry (http://getsentry.com) event logging platform for integrating our system logs into our development workflow. In Python, Java, and PHP services, we can add a few lines of code and all of our log messages (and any uncaught errors that we might have missed) are sent to the Sentry service, where they are aggregated. Because the messages are sent from within the programming language, rather than scraped from syslog output, Sentry is able to intelligently collapse messages that have the same formatting string, making it possible for us to better track unexpected behaviour or incidents.

Sentry is great and super easy to install using pip, which is handy for getting a server instance up and running quickly. However, we’re not in the habit of using pip to deploy packages in our production environment: having one mechanism for deployment makes it much easier to manage knowledge, build tools, and build processes.

Before dh-virtualenv, we likely would have backported each of the Python dependencies (https://github.com/getsentry/sentry/blob/master/setup.py#L64) and built a debian/control file to reference them. This had an upside: we had to create each package, check it into version control, build it, ship it to our Debian repository, and so on, and the sheer effort kept the number of external dependencies low. But it also made it difficult to deploy a package like Sentry.

Now, we’re able to split the native dependencies from the Python dependencies, and use our local PyPI installation to mirror the package versions that Sentry depends on. This gives us review and control over the code that is going into our production environment without having to convert and twist each of the Python packages into Debian-land.

The main package consists of a single Python file that serves as middleware for Django, making it possible to authenticate to Sentry using our SSO solution. Like any other module, this has a setup.py that contains all of the relevant information about the module. Alongside it, we include a requirements.txt that adds Sentry and a few other modules as Python dependencies:

sentry[postgres]==6.2.0
eventlet==0.13.0
hiredis==0.1.1
django-auth-ldap==1.1.4
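
Every line in the requirements.txt above pins an exact version, which is what lets a local mirror serve a known, reviewed set of packages. As a rough sketch of how such pinning could be checked before a build, assuming a hypothetical helper called unpinned_requirements that is not part of dh-virtualenv or pip:

```python
import re

# Matches "name==version", with optional extras such as "sentry[postgres]".
PINNED = re.compile(r"^[A-Za-z0-9._-]+(\[[A-Za-z0-9._, -]+\])?==[A-Za-z0-9._!+*-]+$")


def unpinned_requirements(text):
    """Return the requirement lines that are not pinned with '=='."""
    problems = []
    for raw in text.splitlines():
        # Drop comments and surrounding whitespace. This simple split does not
        # handle URL requirements containing '#', which is fine for a sketch.
        line = raw.split("#", 1)[0].strip()
        if line and not PINNED.match(line):
            problems.append(line)
    return problems


REQUIREMENTS = """\
sentry[postgres]==6.2.0
eventlet==0.13.0
hiredis==0.1.1
django-auth-ldap==1.1.4
"""

print(unpinned_requirements(REQUIREMENTS))               # []
print(unpinned_requirements("eventlet>=0.13\nhiredis"))  # ['eventlet>=0.13', 'hiredis']
```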

The debian/control file lets us add any of the native dependencies that are needed for building/testing/executing the Sentry server. We just use this to make sure the dh-virtualenv package is available for the build, along with the other required packages.

Build-Depends:
    python (>= 2.6.6-3~),
    debhelper (>= 8),
    dh-virtualenv,
    python-dev,
    libpq-dev,
    libldap2-dev,
    libsasl2-dev,
Standards-Version: 3.9.3
X-Python-Version: >= 2.6

When we build the package, the correct Debian dependencies are put in place, and dh_virtualenv begins a controlled pip install of the Sentry package, pulling in each of its dependencies. You can either use the global PyPI server or override dh_virtualenv in the debian/rules file to use a locally hosted instance:

%:
	dh $@ --with python-virtualenv

override_dh_virtualenv:
	dh_virtualenv --index-url='http://localhost/simple'

Once the magic is done, we’re left with a Debian package that can be used to install Sentry (with all of its relevant dependencies) on a Debian stable system. Packaging this way saved us an incredible amount of time and gave us an easy way to control the code that is deployed in our environment. If we need to drop into the virtual environment on the live system to check something, we can just find the virtualenv and run the python executable inside it (/usr/share/python/sentry/bin/python); all of the dependent packages are available without any additional magic.
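
As a small illustration of "dropping into" the packaged virtualenv, the following Python sketch builds the interpreter path from the default install location mentioned earlier and asks an interpreter whether it can import a given module. The helper names venv_python and can_import are hypothetical, and the example assumes Python 3 on the machine running the check.

```python
import subprocess
import sys

# Default install root described in this post; it can be customized per package.
DEFAULT_ROOT = "/usr/share/python"


def venv_python(package, root=DEFAULT_ROOT):
    """Path of the interpreter inside the packaged virtualenv."""
    return "{}/{}/bin/python".format(root, package)


def can_import(python, module):
    """Ask the given interpreter whether it can import a module."""
    result = subprocess.run(
        [python, "-c", "import " + module],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


print(venv_python("sentry"))  # /usr/share/python/sentry/bin/python

# On a machine with the package installed you would call, for example:
#   can_import(venv_python("sentry"), "sentry")
# Here we only demonstrate the helper against the current interpreter.
print(can_import(sys.executable, "json"))
```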

Sentry is a great piece of software, but it can be a bit difficult to deploy without using pip as the deploy mechanism. Using dh-virtualenv, we’re able to build a simple, focused Debian package that ensures everything is available in a nicely packaged and sandboxed way.

Get it while it is hot

The source code is available on GitHub and the documentation on Read the Docs. Pull requests are welcome :)

Huge thanks to Jim Whitehead, who wrote the Sentry part of this post, and to the numerous colleagues who helped review the blog post and the code!