
How Spotify went down after an outage?


On March 8, 2022, Spotify had an outage and was down for 2 hours. Let us see what happened and what concepts we can learn from this outage.

  1. Spotify’s architecture is built on many microservices, each with its own purpose: for example, one microservice for artists and another for songs.

  2. Each of these microservices can be deployed on a different machine, or two microservices can share the same machine. Now, how do these microservices interact with each other?

  3. One way is to use the DNS address of the service, exactly as we do with websites when we put an address like “www.spotify.com” in the address bar. DNS resolution returns an IP address that we can use to call the required API.

  4. But this takes a lot of time if a single operation requires multiple internal API calls to different microservices. For example, to display the details of a song, Spotify has to call the image service to get the song image, the artist service to get the artist details, and the song service to get the song details.

  5. If every one of these calls goes through DNS, the lookups add up. To avoid this, there is a technique called service mesh, which uses the sidecar pattern to let microservices connect to each other directly without any extra code in the services themselves.

  6. A sidecar instance runs beside each microservice. It holds the mapping of the addresses of the other microservices and lets the current microservice connect to them. The sidecar container is reusable, so no new code is required when onboarding a new microservice.

  7. All these sidecar containers are connected to the control plane, which manages the mappings and updates every sidecar whenever something changes.

  8. In the case of Spotify, this control plane went down, which prevented the microservices from connecting to each other.
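The relationship between the control plane and the sidecars described in the list above can be sketched roughly as follows. This is only a toy model to illustrate the idea, not how Spotify or any real service mesh is implemented: the class names `Sidecar` and `ControlPlane`, their methods, and the example address are all assumptions made up for this illustration.

```python
from typing import Dict, List


class Sidecar:
    """Hypothetical sidecar: keeps a local copy of the service-to-address map."""

    def __init__(self, owner: str) -> None:
        self.owner = owner                  # the microservice this sidecar runs beside
        self.routes: Dict[str, str] = {}    # service name -> address

    def update_routes(self, routes: Dict[str, str]) -> None:
        # Replace the local copy with the latest mapping from the control plane.
        self.routes = dict(routes)

    def resolve(self, service: str) -> str:
        # Purely local lookup; raises KeyError if the service is unknown.
        return self.routes[service]


class ControlPlane:
    """Hypothetical control plane: owns the mapping and pushes it to sidecars."""

    def __init__(self) -> None:
        self.routes: Dict[str, str] = {}
        self.sidecars: List[Sidecar] = []

    def attach(self, sidecar: Sidecar) -> None:
        # A newly onboarded sidecar immediately receives the current mapping.
        self.sidecars.append(sidecar)
        sidecar.update_routes(self.routes)

    def register(self, service: str, address: str) -> None:
        # Record the change, then push the updated mapping to every sidecar.
        self.routes[service] = address
        for sidecar in self.sidecars:
            sidecar.update_routes(self.routes)
```

In this sketch, a microservice only ever talks to its own sidecar, so every mapping update depends on the control plane. That is why the control plane is such a critical component.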

As a solution, Spotify switched microservice communication to the DNS-based approach.
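The idea of falling back to DNS can be illustrated with a small sketch. It is a simplified assumption of what such a fallback might look like, not Spotify's actual fix: the function name `resolve_address` and its parameters are hypothetical, and only `socket.gethostbyname` is a real standard-library call.

```python
import socket
from typing import Callable, Mapping, Optional


def resolve_address(
    service: str,
    mesh_routes: Optional[Mapping[str, str]],
    dns_lookup: Callable[[str], str] = socket.gethostbyname,
) -> str:
    """Return an address for `service`, preferring the mesh mapping (hypothetical helper)."""
    # Use the mapping pushed by the control plane when it is available.
    if mesh_routes is not None and service in mesh_routes:
        return mesh_routes[service]
    # Mapping unavailable (e.g. the control plane is down): fall back to DNS.
    return dns_lookup(service)
```

Passing `mesh_routes=None` models the situation where the control plane is unreachable. The `dns_lookup` parameter is injectable so the fallback path can be exercised without a real network lookup.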
