Personalization at Spotify using Cassandra

By Matt Brown and Kinshuk Mishra

At Spotify we have over 60 million active users who have access to a vast music catalog of over 30 million songs. Our users can follow thousands of artists and hundreds of their friends and create their own music graph. On our service they also […]


Solving MapReduce Performance Problems With Sharded Joins

Sometimes the answer to a sluggish data pipeline isn’t more power in the Hadoop cluster, but a shift in technique. We hit one of these moments recently at Spotify. One of our critical ad analysis pipelines had issues. First it was slow. Then a few days later it was dead, unrunnable at less than 20GB […]


Data Processing with Apache Crunch at Spotify

All of our lovely Spotify users generate many terabytes of data every day. All the songs that are listened to, all the playlists you make, all the people you follow, and all the music you share. Somehow we need to organise, process and aggregate all of this into meaningful information out the other side. Here […]