Six months ago, when we launched our Web API, we provided twelve endpoints through which developers could retrieve Spotify catalog data. Today the API has 40 distinct endpoints and more are being added all the time. In this post, I’d like to take you on a brief tour of the API and show you some of the programs that have already been developed with it. Continue reading
There have been many efforts during the past few years to raise awareness of gender equality in the IT industry. But it’s been slow progress – and we don’t like slow! So we decided to do things differently. Instead of just hoping to achieve gender equality, we made it a requirement for our hackathon “Diversify”.
At Spotify we have have over 60 million active users who have access to a vast music catalog of over 30 million songs. Our users have a choice to follow thousands of artists and hundreds of their friends and create their own music graph. On our service they also discover new and existing content by experiencing a variety of music promotions (album releases, artist promos), which get served over our ad platform. These options have empowered our users and made them really engaged. Over time they have created over 1.5 billion playlists and just last year they streamed over 7 billion hours worth of music. Continue reading
Spotify has built several real-time pipelines using Apache Storm for use cases like ad targeting, music recommendation, and data visualization. Each of these real-time pipelines have Apache Storm wired to different systems like Kafka, Cassandra, Zookeeper, and other sources and sinks. Building applications for over 50 million active users globally requires perpetual thinking about scalability to ensure high availability and good system performance. Continue reading
Sometimes the answer to a sluggish data pipeline isn’t more power in the Hadoop cluster, but a shift in technique. We hit one of these moments recently at Spotify. One of our critical ad analysis pipelines had issues. First it was slow. Then a few days later it was dead, unrunnable at less than 20GB memory/reducer.
We traced the problem back to a single bottleneck: one expensive join and a handful of overloaded reducers. We solved things by switching up our join strategy, in the process cutting memory usage by over 75%. Here’s how. Continue reading