Big Data Processing at Spotify: The Road to Scio (Part 2)

In this part we’ll take a closer look at Scio, including basic concepts, its unique features, and concrete use cases here at Spotify. Basic Concepts Scio is a Scala API for Apache Beam and Google Cloud Dataflow. It was designed as a thin wrapper on top of Beam’s Java SDK, while offering an easy way […]


Big Data Processing at Spotify: The Road to Scio (Part 1)

This is the first part of a 2 part blog series. In this series we will talk about Scio, a Scala API for Apache Beam and Google Cloud Dataflow, and how we built the majority of our new data pipelines on Google Cloud with Scio. Scio > Ecclesiastical Latin IPA: /ˈʃi.o/, [ˈʃiː.o], [ˈʃi.i̯o] > Verb: […]


Stepping Up the Cloud Security Game

TL;DR: securing our Cloud infrastructure is incredibly important. We are now taking another step forward by leveraging open source tools we developed in partnership with Google. Spotify engineering teams are fully embracing the devops culture: to increase development speed every dev team is responsible for their operational pipelines. From a security perspective we are continuously […]


Google Cloud Security Toolbox

At Spotify, we actively manage more than 800 Google Cloud Platform projects. As such, maintaining a proper security posture at scale has proven to be a challenging task. In an effort to seamlessly audit and strengthen the security stance of our massive cloud infrastructure, we are investing various resources into building our own tools and […]


Monitoring at Spotify: Introducing Heroic

This is the second part in a series about Monitoring at Spotify. In the previous post I discussed our history of operational monitoring. In this part I’ll be presenting Heroic, our scalable time series database which is now free software. Heroic is our in-house time series database. We built it to address the challenges we […]


RAMLfications – Python package to parse RAML

A few of us at Spotify are infatuated with RAML – a RESTful API Modeling Language described as “a simple and succinct way of describing practically-RESTful APIs”, extremely similar goal of Swagger. I’m pleased to announce the initial release of RAMLfications, a Python package that parses RAML and validates it based on the specification into Python objects.