RAMLfications – Python package to parse RAML

A few of us at Spotify are infatuated with RAML – a RESTful API Modeling Language described as “a simple and succinct way of describing practically-RESTful APIs”, extremely similar goal of Swagger.

I’m pleased to announce the initial release of RAMLfications, a Python package that parses RAML and validates it based on the specification into Python objects. Continue reading

Understanding the Spotify Web API

Six months ago, when we launched our Web API, we provided twelve endpoints through which developers could retrieve Spotify catalog data. Today the API has 40 distinct endpoints and more are being added all the time. In this post, I’d like to take you on a brief tour of the API and show you some of the programs that have already been developed with it. Continue reading

Diversify – Creating a Hackathon with 50/50 Female and Male Participants

There have been many efforts during the past few years to raise awareness of gender equality in the IT industry. But it’s been slow progress – and we don’t like slow! So we decided to do things differently. Instead of just hoping to achieve gender equality, we made it a requirement for our hackathon “Diversify”.
1 Continue reading

Personalization at Spotify using Cassandra

 

By Matt Brown and Kinshuk Mishra

At Spotify we have have over 60 million active users who have access to a vast music catalog of over 30 million songs. Our users have a choice to follow thousands of artists and hundreds of their friends and create their own music graph. On our service they also discover new and existing content by experiencing a variety of music promotions (album releases, artist promos), which get served over our ad platform. These options have empowered our users and made them really engaged. Over time they have created over 1.5 billion playlists and just last year they streamed over 7 billion hours worth of music. Continue reading

How Spotify Scales Apache Storm

Spotify has built several real-time pipelines using Apache Storm for use cases like ad targeting, music recommendation, and data visualization. Each of these real-time pipelines have Apache Storm wired to different systems like Kafka, Cassandra, Zookeeper, and other sources and sinks. Building applications for over 50 million active users globally requires perpetual thinking about scalability to ensure high availability and good system performance. Continue reading

Solving MapReduce Performance Problems With Sharded Joins

Sometimes the answer to a sluggish data pipeline isn’t more power in the Hadoop cluster, but a shift in technique. We hit one of these moments recently at Spotify. One of our critical ad analysis pipelines had issues. First it was slow. Then a few days later it was dead, unrunnable at less than 20GB memory/reducer.

We traced the problem back to a single bottleneck: one expensive join and a handful of overloaded reducers. We solved things by switching up our join strategy, in the process cutting memory usage by over 75%. Here’s how. Continue reading