Why we built a new open source Python data visualization library. Have you ever been frustrated with the complicated experience of making charts in Python? We have, so we created Chartify, an open-source Python library that wraps Bokeh to make it easier for data scientists to create charts. Give Chartify a try, and you […]
Today, we announce that we are open sourcing cstar, our Cassandra orchestration tool. Operating Cassandra is not always an easy task. It has a myriad of knobs you can tune that affect performance, security, data consistency, etc. Very often you need to run a specific set of shell commands on each node of a cluster, […]
In this part we’ll take a closer look at Scio, including basic concepts, its unique features, and concrete use cases here at Spotify. Basic Concepts Scio is a Scala API for Apache Beam and Google Cloud Dataflow. It was designed as a thin wrapper on top of Beam’s Java SDK, while offering an easy way […]
This is the first part of a two-part blog series. In this series we will talk about Scio, a Scala API for Apache Beam and Google Cloud Dataflow, and how we built the majority of our new data pipelines on Google Cloud with Scio. Scio > Ecclesiastical Latin IPA: /ˈʃi.o/, [ˈʃiː.o], [ˈʃi.i̯o] > Verb: […]
TL;DR: securing our Cloud infrastructure is incredibly important. We are now taking another step forward by leveraging open source tools we developed in partnership with Google. Spotify engineering teams are fully embracing the devops culture: to increase development speed every dev team is responsible for their operational pipelines. From a security perspective we are continuously […]
At Spotify, we actively manage more than 800 Google Cloud Platform projects. As such, maintaining a proper security posture at scale has proven to be a challenging task. In an effort to seamlessly audit and strengthen the security stance of our massive cloud infrastructure, we are investing various resources into building our own tools and […]
This is the second part in a series about Monitoring at Spotify. In the previous post I discussed our history of operational monitoring. In this part I’ll be presenting Heroic, our scalable time series database which is now free software. Heroic is our in-house time series database. We built it to address the challenges we […]
A few of us at Spotify are infatuated with RAML – a RESTful API Modeling Language described as “a simple and succinct way of describing practically-RESTful APIs”, with a goal very similar to that of Swagger. I’m pleased to announce the initial release of RAMLfications, a Python package that parses RAML into Python objects and validates it against the specification.