System Design of Snapchat
This article gives an overview of the architecture, infrastructure, and System Design of Snapchat, and provides some insight into the history of the company behind it, Snap Inc.
Table of Contents
- Introduction
- Design Terms
- Growth to Problems
- Modernization
- The Various Apps
- Monolith to Microservices
- A Final General Overview
Introduction
Snapchat is an American-developed social media app released in September 2011 that supports instant messaging, picture sharing, and more. The developer of the app, Snap Inc., is a camera company focused on reinventing the camera to improve the way people live and communicate.
Notable Features:
- Messages and pictures (or snaps) disappear after they are viewed by default, and stories are only available for 24 hours
- Supports story-sharing within groups
- Snap Map - Users are able to see the location of their friends
- Memories - Users are reminded of the images they saved or uploaded through stories a year later
Snapchat is most popular among younger generations, especially teens. The app has approximately 319 million daily users and sees about 5.4 billion snaps per day.
Design Terms
Some terms to acquaint yourself with:
Monolith (Monolithic Architecture): A single-tiered application that runs independently of other programs. A monolith is built to handle every operation needed to complete a task, so a single application executes all functionality from end to end.
Pros:
- Easy to deploy
- Easy to scale at first, by running more copies of the whole application
- Easy to develop
Cons:
- As the code base grows, it will inevitably become harder to understand.
- The larger the monolith, the longer it takes to load
- To update one component, the entire monolith has to be redeployed. Continuous deployment becomes more difficult.
- Scalability - We mentioned that it's simple to scale, but a monolith can only scale in one dimension. We cannot scale an individual component (ex: UI/UX) on its own.
- Hard to update the technology stack
Microservices: The direct opposite of the monolith. Microservices are an architectural style that structures an application as a collection of distributed services, each responsible for a different component of the application. An analogy: a customer places an order, a waiter takes and delivers the order, and a chef cooks it. Each participant works independently of the others. No one knows precisely what the others are doing, and none of them has access to exactly the same information.
Pros:
- Continuous delivery is easier. For example, a UI/UX component can be updated and redeployed independently, at its own pace, without affecting the rest of the services.
- Maintainability - Because each service is separate, it's easier to understand and change.
- Testability - Each service can be tested independently of the others
- For development, engineering can be divided more easily into specialized teams, each assigned to a specific component.
- An error in one service won't necessarily bring down the entire application, in contrast to an error in a monolith.
- Easier to update the technology stack.
Cons:
- The complexity of a distributed system and its additional components and mechanisms
- Interaction between services becomes more important, but is more difficult to test
- Memory consumption may be significantly higher, as each service runs independently.
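To make the restaurant analogy above concrete, here is a minimal sketch in which each role is a separate function that only sees the message handed to it. The function names and message fields are invented for illustration; real microservices would run as separate processes and communicate over a network.

```python
# Each "service" only knows the shape of the message it receives,
# not how the other services work internally.

def take_order(customer_request: str) -> dict:
    """Waiter: turns a spoken request into a structured order."""
    return {"dish": customer_request.strip().lower()}

def cook(order: dict) -> dict:
    """Chef: only sees the order, never the customer."""
    return {"plate": f"cooked {order['dish']}"}

def deliver(plate: dict) -> str:
    """Waiter again: only sees the finished plate."""
    return f"Here is your {plate['plate']}"

result = deliver(cook(take_order("  Pasta ")))
print(result)
```

Because each step depends only on its input message, any one of them could be rewritten or redeployed without touching the others, as long as the message format stays the same.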
JSON (JavaScript Object Notation): JSON is a text-based data format, derived from JavaScript's object syntax, that represents objects, arrays, and literal values. The format is meant to be easy for people to read and write and easy for software to parse. JSON is heavily used to exchange data between servers and web applications.
Orchestration: In this context, orchestration means automating multiple tasks so that they run together in a coordinated way. These tasks may include the configuration, coordination, and management of computer systems and software.
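As a small illustration of JSON as an exchange format, the sketch below converts a Python dictionary to JSON text and back using the standard `json` module. The `snap` dictionary and its field names are made up for this example and do not reflect Snapchat's actual data model.

```python
import json

# Hypothetical snap metadata; the field names are illustrative only.
snap = {"sender": "alice", "recipients": ["bob", "carol"], "duration_seconds": 10}

encoded = json.dumps(snap)     # Python dict -> JSON text, ready to send
decoded = json.loads(encoded)  # JSON text -> Python dict, on the receiving side

print(encoded)
```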
Proxy: A proxy acts as an intermediary between a client requesting a resource and a server providing it.
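The idea can be sketched in a few lines. In this hypothetical example the client calls the proxy exactly as it would call the server, and the proxy records the request before forwarding it. The class names are assumptions made for illustration, and everything runs in one process rather than over a network.

```python
from typing import List

class OriginServer:
    """Stand-in for the server that actually owns the resource."""

    def fetch(self, path: str) -> str:
        return f"resource at {path}"

class LoggingProxy:
    """Intermediary: the client talks to the proxy, the proxy talks to the server."""

    def __init__(self, server: OriginServer) -> None:
        self._server = server
        self.seen_paths: List[str] = []

    def fetch(self, path: str) -> str:
        self.seen_paths.append(path)     # the proxy can observe the traffic
        return self._server.fetch(path)  # then forward the request unchanged

proxy = LoggingProxy(OriginServer())
response = proxy.fetch("/stories/today")
```

Because every request passes through the proxy, it is a natural place to add logging, routing, or access control without changing the client or the server.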
Service Mesh: A software design pattern in which a dedicated layer sits on top of the infrastructure layer to make communication between services manageable, observable, and secure, using proxies.
Growth to Problems
Snapchat began as a monolith hosted in the cloud, built on Google App Engine. But as the app grew, gaining users and data, scalability started to emerge as an issue. System-wide disruptions also became more likely, because a failure inside the monolith had a large blast radius. Snapchat's blog posts described one of the issues as a "tragedy of the commons" in which features competed with each other for resources: features were loaded at app startup, which let some features open faster but made the rest slower.
Additionally, from a development perspective, engineers wanted clear visibility, separation, and ownership of their components, to provide flexibility and efficiency to the service.
Modernization
Eventually, as Snapchat grew, the company recognized the need to break its monolithic architecture into smaller and more efficient parts. To that end, the company looked to implement a microservices-based architecture to support lower latency. As part of this effort, Snapchat decided to rewrite its app around Amazon DynamoDB, a NoSQL database service built for scalability. By the end of the effort, the company had reduced median latency by 20%.
To start, the company rewrote its app as several smaller apps. Snapchat had always combined several apps in one: a camera, chat, memories, photo-editing, content consumption, and a map. Though combining these apps into a single monolith was pleasant for users, it also made keeping performance high a difficult engineering challenge.
The company set several ground rules for the rewrite: don't preload, each feature is its own app, and make it fast. To support the rewrite, Snapchat froze changes in many areas, making the rewrite a purely engineering challenge.
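One way to picture the "don't preload" rule is lazy initialization: a feature is only constructed the first time it is opened, so startup does not pay for features the user never touches. The sketch below is an assumption about how such a rule could be expressed, not Snapchat's actual code. `FeatureRegistry` and the feature names are hypothetical.

```python
from typing import Callable, Dict, List

class FeatureRegistry:
    """Holds factories for features and builds each one only on first use."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {}
        self._instances: Dict[str, object] = {}

    def register(self, name: str, factory: Callable[[], object]) -> None:
        # Registering is cheap: nothing is constructed yet.
        self._factories[name] = factory

    def open(self, name: str) -> object:
        # Build the feature on first access, then reuse the same instance.
        if name not in self._instances:
            self._instances[name] = self._factories[name]()
        return self._instances[name]

    def loaded(self) -> List[str]:
        return sorted(self._instances)

registry = FeatureRegistry()
registry.register("camera", lambda: "camera ready")
registry.register("map", lambda: "map ready")

loaded_at_startup = registry.loaded()  # nothing has been built yet
camera = registry.open("camera")       # only the camera is built here
loaded_after_open = registry.loaded()
```

With this shape, adding a new feature does not slow down startup for the existing ones, which addresses the resource competition described earlier.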
The Various Apps
Snapchat's camera app contains features such as lenses, filters, Bitmoji, and options to add augmented reality animations. Snapchat's chat app lets users save pictures, save chats, add emojis, and more. Snapchat's map shows the location of friends who have chosen to share it, among other things.
Snapchat's additional apps (memories, photo-editing, and content consumption) also have their own features. Memories allows photos or videos to be saved and edited, then posted or sent at a later date. Photo-editing allows users to trim videos, add text, add stickers, and more. Content consumption refers to Snapchat's external content, which it displays to users based on a variety of factors.
Monolith to Microservices
At the time, the app relied heavily on JSON to make network requests. However, JSON was expensive and inefficient to parse. To solve this issue, Snapchat turned to a centralized network manager API that hides the use of JSON as an implementation detail.
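The sketch below shows what hiding the wire format behind one API can look like. It is an illustration under assumptions, not Snapchat's network manager: `NetworkManager`, `fake_transport`, and the endpoint are hypothetical. Callers pass and receive plain dictionaries, so the encoding could later be swapped for a cheaper format in a single place.

```python
import json
from typing import Any, Callable, Dict

class NetworkManager:
    """Single entry point for requests; features never touch JSON directly."""

    def __init__(self, transport: Callable[[str, bytes], bytes]) -> None:
        self._transport = transport

    def request(self, endpoint: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        # Encoding and decoding are implementation details of this method.
        body = json.dumps(payload).encode("utf-8")
        raw = self._transport(endpoint, body)
        return json.loads(raw.decode("utf-8"))

def fake_transport(endpoint: str, body: bytes) -> bytes:
    """Hypothetical stand-in for the network: echoes the request back."""
    received = json.loads(body)
    return json.dumps({"endpoint": endpoint, "echo": received}).encode("utf-8")

manager = NetworkManager(fake_transport)
reply = manager.request("/chat/send", {"text": "hi"})
```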
Microservices bring the problems of maintaining application state, communicating between services, and managing failures. To develop a robust and reliable system, Snapchat used open-source technologies such as Temporal to handle orchestration. The company also looked to implement a service mesh design pattern. To achieve this, Snapchat turned to Envoy, an open-source proxy. Envoy handled how service traffic flowed through the infrastructure, giving developers visibility when it came to identifying issues.
Inside the service mesh, Snapchat implemented an internal app called Switchboard. Switchboard operated as a control panel for Snap's services, used to shift traffic, manage service dependencies (a feature that manages a service based on the state of others), and drain regions. Switchboard was used in lieu of exposing the entirety of Envoy's API, to reduce the complexity of possible configurations within services.
With the service mesh, Snap had a shared internal, regional network for its microservices. Within the same region, services could communicate with each other without going over the public Internet, and no external network traffic could reach the elements of the internal network. For security, only one kind of system was allowed to be exposed to the Internet: the API gateways. The gateways functioned as front doors, processing requests from clients/users and passing them on into the network.
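To illustrate what draining a region means in terms of traffic shares, the sketch below sets one region's share to zero and spreads it proportionally over the remaining regions. This is a simplified model and an assumption about the arithmetic involved. It is not Switchboard's or Envoy's API, and the region names are hypothetical.

```python
from typing import Dict

def drain_region(weights: Dict[str, float], region: str) -> Dict[str, float]:
    """Return new traffic shares with `region` at 0 and the rest renormalized."""
    remaining = {r: w for r, w in weights.items() if r != region}
    total = sum(remaining.values())
    if total == 0:
        raise ValueError("cannot drain the only region serving traffic")
    drained = {r: w / total for r, w in remaining.items()}
    drained[region] = 0.0
    return drained

# Hypothetical starting split of user traffic across three regions.
weights = {"us-east": 0.5, "us-west": 0.25, "europe": 0.25}
after = drain_region(weights, "us-east")
print(after)
```

A control panel that exposes only a few high-level operations like this one is easier to reason about than the full set of proxy configuration options.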
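The front-door role of a gateway can be sketched as a small router: only paths that have been explicitly exposed reach an internal service, and everything else is rejected. `ApiGateway`, the handlers, and the paths below are hypothetical and only illustrate the idea.

```python
from typing import Callable, Dict

Handler = Callable[[str], str]

class ApiGateway:
    """The only public entry point; forwards requests to internal services."""

    def __init__(self) -> None:
        self._routes: Dict[str, Handler] = {}

    def expose(self, prefix: str, handler: Handler) -> None:
        self._routes[prefix] = handler

    def handle(self, path: str) -> str:
        # Forward to the first internal service whose prefix matches.
        for prefix, handler in self._routes.items():
            if path.startswith(prefix):
                return handler(path)
        # Anything not explicitly exposed is unreachable from outside.
        return "404 not found"

def chat_service(path: str) -> str:
    return f"chat handled {path}"

def map_service(path: str) -> str:
    return f"map handled {path}"

gateway = ApiGateway()
gateway.expose("/chat", chat_service)
gateway.expose("/map", map_service)

print(gateway.handle("/chat/send"))
print(gateway.handle("/internal/config"))
```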
Below is a simplified depiction of the service mesh. The red lines represent user traffic and the black lines represent how configuration updates travel across the system.
A Final General Overview
External traffic from Snapchat users runs through the API gateway to the multiple features present in the app. Servers manage these requests and changes to configuration state, sending data and information back to the various services within the app. Overall, the current architecture of Snapchat can be likened to multiple apps running on a unified operating system, which is the Snapchat app itself.
With this article at OpenGenus, you should now have a good overview of the System Design of Snapchat.