How we use Python at Spotify

March 20, 2013 Published by Geoff van der Meer

The most frequent question we heard at PyCon this weekend was how we use Python at Spotify. Hopefully this post answers it!

At Spotify, the two main places we use Python are backend services and data analysis. Python also has a habit of turning up in other random places, as most of our developers are happy programming in it.

Spotify’s backend consists of many interdependent services, connected by our own messaging protocol over ZeroMQ. Around 80% of these services are written in Python.

The non-Python services are typically written in Java, although we do have a few using C or C++.

Speed is a big focus for Spotify. Python fits well into this mindset, as it gets us big wins in speed of development. We also make heavy use of Python async frameworks to help services that are IO-bound. Earlier services were written using Twisted, and in the last few years we’ve preferred gevent.
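
To illustrate why async frameworks help IO-bound services, here is a minimal sketch. It is not Spotify code, and it uses the standard library's asyncio as a stand-in rather than Twisted or gevent, so that it runs without extra dependencies. The function names and delays are hypothetical.

```python
import asyncio

async def fake_io_call(name, delay):
    # asyncio.sleep stands in for a network or disk wait (assumption:
    # a real service would be waiting on a socket here instead).
    await asyncio.sleep(delay)
    return name

async def handle_requests():
    # The three simulated waits overlap instead of running back to back.
    # gather returns results in the order the awaitables were passed in.
    return await asyncio.gather(
        fake_io_call("a", 0.1),
        fake_io_call("b", 0.1),
        fake_io_call("c", 0.1),
    )

results = asyncio.run(handle_requests())
print(results)
```

Because the waits overlap, the total wall time should be close to one delay rather than the sum of all three. That is the property that makes this style attractive when a service spends most of its time waiting on IO.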

Some services are compute-bound, and we’ve tried a range of strategies for handling this in Python. These have included performance testing, profiling, Cython, and native libraries.
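
As a rough idea of what the profiling step can look like, here is a small sketch using the standard library's cProfile and pstats modules. The function being profiled is a made-up example, not one of our services.

```python
import cProfile
import io
import pstats

def slow_sum_of_squares(n):
    # Hypothetical compute-bound function used only for illustration.
    total = 0
    for i in range(n):
        total += i * i
    return total

# Record timing data only while the function of interest runs.
profiler = cProfile.Profile()
profiler.enable()
result = slow_sum_of_squares(100_000)
profiler.disable()

# Write the five most expensive entries (by cumulative time) to a string.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

A report like this shows where the time goes, which helps decide whether a hot spot is worth moving to Cython or a native library.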

Data analysis

Spotify teams make heavy use of analytics, both in decision making and within the product itself. To simplify interactions with Hadoop, we use our Luigi package.

Luigi allows you to quickly build complex pipelines of batch jobs from your own machine. It handles the bundling of required libraries, and brings back any error logs to your local machine. This means you can quickly prototype complex data jobs.

We use Luigi, along with a range of machine learning algorithms, to power our Radio and Discover features, as well as recommendations for people you may want to follow. Simpler jobs power things like our top lists.

Around 90% of our MapReduce jobs are written in Python. At peak load we have seen over 6000 Python processes running across the hundreds of nodes in our Hadoop cluster.
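
For readers new to the model, here is a minimal pure-Python sketch of the map and reduce steps behind something like a top list. It runs in a single process on made-up play records. The record format and function names are assumptions for illustration, and it does not use Luigi or Hadoop.

```python
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    # Emit one (track, 1) pair per play record.
    # Assumed record format: "user<TAB>track" (hypothetical).
    for line in lines:
        user, track = line.split("\t")
        yield track, 1

def reducer(pairs):
    # Sort by key so equal tracks are adjacent, then sum each group.
    # On a cluster, the framework does this sort/shuffle between phases.
    ordered = sorted(pairs, key=itemgetter(0))
    for track, group in groupby(ordered, key=itemgetter(0)):
        yield track, sum(count for _, count in group)

plays = ["alice\tsong_a", "bob\tsong_b", "carol\tsong_a"]
play_counts = dict(reducer(mapper(plays)))
print(play_counts)
```

The same two functions scale out naturally, because each mapper only needs its own slice of the input and each reducer only needs the pairs for its own keys.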

Other uses

Spotify squads often use GraphWalker to do model-based testing of both user-facing clients and some APIs. To simplify the integration with our Python services, we ported the GraphWalker runner to Python.

Python is also used for prototyping services, quick scripts, build processes and more. There is even a Django app or two!

Community

Part of what makes Python so special is the community around it. Spotify is involved in the community in a number of ways.

We sponsor conferences such as PyCon and EuroPython, provide support to local groups such as the Stockholm Python User Group and NYC PyLadies, host hackathons, and contribute back to open source projects.

We are always interested in doing more for the community, so please get in touch if there is something we can help with.

Our team had a heap of fun at the recent PyCon and PyData. It was my first, and I had an amazing time meeting so many people and learning from both the talks and the hallway track!

If you’d like to work with Python at Spotify, we’re hiring in New York, Stockholm, San Francisco and Gothenburg. Or just drop by one of our offices and say hi!

