System Design of Instagram

In this article, we explain the system design of Instagram, one of the most widely used Internet services today with over a billion users. We also mention the technologies used in Instagram.

Table of contents:

  1. What is Instagram?
  2. Requirements of Instagram
  3. Scale of System
  4. System Component Design
  5. News Feed Generation
  6. Choice of technologies

What is Instagram?

Instagram is a social networking service that allows users to upload and share photos and videos with other users.

Requirements of Instagram

Functional requirements

  1. Users can upload photos and videos.
  2. Users can follow other users.
  3. Users can like or comment on posts.
  4. Users can view the News Feed consisting of posts from the users they follow.
  5. Users can send and receive messages from other users.
  6. Users can search for other users.

Non-functional requirements

  1. The system should be highly available.
  2. The system should be highly reliable: uploaded photos and videos should never be lost.
  3. The system should be highly scalable.
  4. The system should keep the latency of News Feed generation at or below 200 ms.

Scale of System

  1. Number of users
    Instagram currently has approximately 1 billion users. The system handles a high level of traffic, most of which is reads, as more users consume content than upload it. The estimated read-to-write ratio is 80/20.

  2. Possible actions include:

    • Like and comment on posts
    • Save posts to a collection
    • Search for users
    • View user stories
    • Post images and videos
    • Chat (direct messages)
    • Add tags to photos
    • Search photos on tags
  3. Peak times:
    Instagram currently has approximately 500 million daily active users (DAU). Assuming a daily peak at around 11 AM, traffic at that time rises above the average volume, so the system must be optimized to handle the increase. One adjustment is to use a load balancing algorithm that can cope with a sudden heavy load, which helps keep the system reliable.
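
The figures above can be turned into a rough traffic estimate. The sketch below is only a back-of-envelope calculation: the DAU count and the 80/20 split come from the text, while the number of requests per user per day and the peak factor are hypothetical values chosen for illustration.

```python
# Back-of-envelope traffic estimate.
# From the text: 500 million DAU and an 80/20 read-to-write ratio.
# Assumptions (not from the text): REQUESTS_PER_USER_PER_DAY and PEAK_FACTOR.
DAILY_ACTIVE_USERS = 500_000_000
READ_FRACTION = 0.8
REQUESTS_PER_USER_PER_DAY = 20  # hypothetical
PEAK_FACTOR = 2                 # hypothetical: peak traffic is 2x the average
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_traffic(dau, requests_per_user, read_fraction, peak_factor):
    """Return average and peak requests per second, split into reads and writes."""
    avg_rps = dau * requests_per_user / SECONDS_PER_DAY
    return {
        "avg_rps": avg_rps,
        "avg_read_rps": avg_rps * read_fraction,
        "avg_write_rps": avg_rps * (1 - read_fraction),
        "peak_rps": avg_rps * peak_factor,
    }


estimate = estimate_traffic(
    DAILY_ACTIVE_USERS, REQUESTS_PER_USER_PER_DAY, READ_FRACTION, PEAK_FACTOR
)
for name, value in estimate.items():
    print(f"{name}: {value:,.0f}")
```

Under these assumptions the average load is roughly 116,000 requests per second. Changing the two assumed constants shows how sensitive the capacity plan is to per-user activity and to the size of the peak.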

System Component Design

The two main features that the system supports are photo/video upload and viewing photos/videos.
Since most modern web servers have a connection limit, using one service for both photo reads and writes (uploads) can cause a bottleneck, because photo uploads are slow compared to photo reads and hold connections open for longer.
To optimize the system, we can split reads and writes into two separate services, as shown in the diagram below.
[Diagram: separate services for photo reads and photo writes (uploads)]

This allows for the two services to be independently scaled and optimized.
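
One way to picture the split is a thin routing layer that sends requests to a read pool or a write pool. This is a minimal sketch, assuming that requests can be classified by HTTP method; the class names, server names, and pool sizes are all hypothetical.

```python
from itertools import cycle

# Assumption: GET and HEAD requests are reads; everything else is a write.
READ_METHODS = {"GET", "HEAD"}


class ServicePool:
    """A named group of servers that is scaled independently of other pools."""

    def __init__(self, name, servers):
        self.name = name
        self._servers = cycle(servers)

    def pick(self):
        return next(self._servers)


class Router:
    """Sends each request to the read pool or the write pool."""

    def __init__(self, read_pool, write_pool):
        self.read_pool = read_pool
        self.write_pool = write_pool

    def route(self, method, path):
        is_read = method.upper() in READ_METHODS
        pool = self.read_pool if is_read else self.write_pool
        return pool.name, pool.pick()


# Hypothetical deployment: more read servers than write servers.
router = Router(
    ServicePool("read", ["read-1", "read-2", "read-3"]),
    ServicePool("write", ["write-1"]),
)
print(router.route("GET", "/photos/42"))
print(router.route("POST", "/photos"))
```

Because each pool has its own server list, slow uploads only occupy connections in the write pool and do not delay reads.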

News Feed Generation

The News Feed section is the first screen visible to the user once they open Instagram. It contains the latest, most popular, and the most relevant posts and stories from the people the user follows.

Every user's news feed is unique, as users generally follow different people. To generate the news feed of a particular user, the server needs to fetch the list of people the user follows, then retrieve the metadata (upload_time, likes, comments) for each of their photos. These photos are passed to the ranking algorithm that determines which photos will be in the news feed.
The downside of this approach is that generating the news feed takes a long time, as it requires multiple queries followed by the ranking process.
To reduce the latency, the news feed is pre-generated. Dedicated servers generate the news feeds ahead of time and store them in a NewsFeed table. The user can then fetch their latest news feed directly from this table.
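
The generation steps described above can be sketched as follows. This is an illustration only: the in-memory dictionaries stand in for database tables, and the scoring formula is a hypothetical placeholder, not Instagram's actual ranking algorithm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Photo:
    photo_id: int
    owner_id: int
    upload_time: float  # seconds since the epoch
    likes: int
    comments: int


def score(photo, now):
    """Hypothetical ranking: engagement discounted by the photo's age in hours."""
    age_hours = max((now - photo.upload_time) / 3600, 0.0)
    return (photo.likes + 2 * photo.comments + 1) / (1 + age_hours)


def generate_feed(user_id, follows, photos_by_owner, now, limit=20):
    """Fetch followees, gather their photos, rank them, and keep the top ones."""
    candidates = [
        photo
        for followee in follows.get(user_id, ())
        for photo in photos_by_owner.get(followee, ())
    ]
    candidates.sort(key=lambda photo: score(photo, now), reverse=True)
    return [photo.photo_id for photo in candidates[:limit]]


def pregenerate_feeds(user_ids, follows, photos_by_owner, now):
    """Offline job: build the NewsFeed table (user_id -> ranked photo ids)."""
    return {
        user_id: generate_feed(user_id, follows, photos_by_owner, now)
        for user_id in user_ids
    }
```

At read time the application only looks up the user's row in the pre-generated table, so the multiple queries and the ranking step stay off the request path.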

Methods of serving the News Feed

  1. Pull: in this method, users refresh their news feed by making a pull request to the server, either manually or at regular intervals. New posts are not visible until the next pull request is made.
  2. Push: in this approach, the servers push new data to the users as soon as it is available. This method can use HTTP long polling, a technique in which the server holds a client's request open until new data is available or a timeout is reached. This ensures that users receive updates as soon as they are available on the server.
  3. Hybrid: in this approach, the pull-based approach is used for posts from users who have a high number of followers, since pushing every such post to all followers would be expensive, while the push-based approach is applied to posts from users who don't have many followers.
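
The hybrid method can be sketched as below. This is a simplified in-memory model, assuming a follower-count threshold decides between push and pull; the threshold value and all class and attribute names are hypothetical.

```python
from collections import defaultdict

# Hypothetical cut-off: authors with at least this many followers are pulled.
HIGH_FOLLOWER_THRESHOLD = 10_000


class HybridFeed:
    def __init__(self, followers, threshold=HIGH_FOLLOWER_THRESHOLD):
        self.followers = followers              # author_id -> set of follower ids
        self.threshold = threshold
        self.inbox = defaultdict(list)          # follower_id -> pushed (time, post_id)
        self.author_posts = defaultdict(list)   # author_id -> (time, post_id), pulled

    def _is_high_follower(self, author_id):
        return len(self.followers.get(author_id, ())) >= self.threshold

    def publish(self, author_id, post_id, timestamp):
        if self._is_high_follower(author_id):
            # Pull: store once; followers fetch it when they read their feed.
            self.author_posts[author_id].append((timestamp, post_id))
        else:
            # Push: copy the post into every follower's inbox right away.
            for follower_id in self.followers.get(author_id, ()):
                self.inbox[follower_id].append((timestamp, post_id))

    def read_feed(self, user_id, following, limit=20):
        entries = list(self.inbox.get(user_id, []))
        for author_id in following:
            if self._is_high_follower(author_id):
                entries.extend(self.author_posts.get(author_id, []))
        entries.sort(reverse=True)  # newest first
        return [post_id for _, post_id in entries[:limit]]
```

A post from a high-follower account costs one write at publish time and one extra lookup per reader, while a post from an ordinary account costs one write per follower and no extra work at read time.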

Choice of technologies

Databases

  1. Photo and video storage: There are two popular technologies that can be used.
  • Amazon S3 (Amazon Simple Storage Service) is a cloud object storage service. It is an excellent option due to its scalability, data availability, security, and affordability.
  • Hadoop Distributed File System (HDFS) is a distributed file system that runs on commodity hardware. It is highly fault-tolerant and offers high throughput, since data is stored locally on the cluster nodes, which allows fast reads and writes.
  2. Database Schema
    For the database schema, there are two options: relational databases (e.g. MySQL) and NoSQL databases. However, since relational databases are harder to scale horizontally, a NoSQL database would be better suited to the system. Apache Cassandra is a distributed NoSQL database. It is horizontally scalable and highly available, has no single point of failure, and can handle massive volumes of data. Cassandra offers high-throughput writes without sacrificing read efficiency.

User and Photos tables

The user table stores user information such as email, username, etc. once they sign up for the service. Since the photos are too large to be stored in the database, they are first uploaded to object storage, and the URL path to each photo is then stored in the photos table. The photos table will also store the user id, upload date, etc.

The users will have a one-to-many relationship with the photos as users are allowed to upload multiple photos.
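
The two tables and the upload flow can be sketched as follows. The fields email, username, user id, upload date, and URL path come from the description above; `photo_id`, the key layout, and the in-memory dictionary and list standing in for object storage and the photos table are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class User:
    user_id: int
    username: str
    email: str


@dataclass
class Photo:
    photo_id: int          # assumed primary key, not named in the text
    user_id: int           # refers to User.user_id (one user, many photos)
    photo_url: str         # path of the object in object storage
    upload_date: datetime


def upload_photo(user, photo_id, data, object_store, photos_table):
    """Store the bytes in object storage, then record only the path in the table."""
    path = f"photos/{user.user_id}/{photo_id}.jpg"  # hypothetical key layout
    object_store[path] = data
    record = Photo(photo_id, user.user_id, path, datetime.now(timezone.utc))
    photos_table.append(record)
    return record


def photos_of(user, photos_table):
    """One-to-many lookup: all photos uploaded by the given user."""
    return [photo for photo in photos_table if photo.user_id == user.user_id]
```

Keeping only the path in the photos table keeps rows small, while the large binary data lives in storage built for it.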

Load Balancers

A load balancer is typically placed between the clients and the servers to distribute the clients' HTTP requests across the available backend servers.
This prevents any single backend server from becoming a single point of failure, increasing the availability and throughput of the system.
There are several load balancing algorithms available, such as Round Robin, Weighted Round Robin, Least Connection, and Least Bandwidth.
The round-robin algorithm is widely used because it is easy to implement and adds little overhead.
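
Two of the algorithms named above can be sketched in a few lines. This is a minimal in-process illustration with hypothetical class names; a real load balancer would also handle health checks and concurrent access.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands out servers in a fixed rotation, ignoring their current load."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = cycle(servers)

    def next_server(self):
        return next(self._servers)


class LeastConnectionBalancer:
    """Picks the server that currently has the fewest active connections."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self.active = {server: 0 for server in servers}

    def acquire(self):
        # On a tie, min() returns the first server in insertion order.
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server

    def release(self, server):
        self.active[server] -= 1
```

Round robin needs no state beyond its position in the rotation, which is why it adds so little overhead. Least Connection tracks per-server counts, which helps when some requests, such as uploads, stay open much longer than others.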

Cache

Caching frequently used data improves performance, as the server queries the cache instead of the database. This enables faster reads, which are more common than writes in Instagram.
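
A common way to apply this is the cache-aside pattern: check the cache first and fall back to the database only on a miss. The sketch below assumes a plain dictionary as the cache and a caller-supplied `load_from_db` function; both are stand-ins for illustration, not a specific cache client API.

```python
class CacheAsideReader:
    """Read path that consults the cache before the database."""

    def __init__(self, cache, load_from_db):
        self.cache = cache                # any dict-like store
        self.load_from_db = load_from_db  # hypothetical slow lookup function
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        # Cache miss: read from the database and remember the result.
        self.misses += 1
        value = self.load_from_db(key)
        self.cache[key] = value
        return value


# Usage with a dictionary standing in for the database.
database = {"photo:1": {"likes": 10}, "photo:2": {"likes": 3}}
reader = CacheAsideReader(cache={}, load_from_db=database.__getitem__)
reader.get("photo:1")  # miss, loaded from the database
reader.get("photo:1")  # hit, served from the cache
print(reader.hits, reader.misses)
```

In a read-heavy system most requests for popular items follow the hit path, so the database only sees the first request for each key until the entry is evicted or invalidated.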

There are two common caching solutions: Redis and Memcached.

Redis is an in-memory data structure store that can be used as a database, cache, and message broker.
Redis is widely used because it offers a wider variety of features than Memcached, as discussed below.

  • Redis supports different data structures such as strings, hashes, lists, sets, sorted sets, and bitmaps, while Memcached only supports plain strings.
  • Redis allows both keys and values to be as large as 512 MB, while Memcached limits keys to 250 bytes and values to 1 MB by default.
  • For cache eviction, Memcached only supports the LRU (Least Recently Used) eviction method, while Redis offers several eviction policies such as noeviction, allkeys-lru, and volatile-lru.
  • Redis supports on-disk data persistence through a mechanism called snapshotting, while Memcached does not support on-disk persistence.
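
To make the LRU eviction method mentioned above concrete, here is a minimal sketch of an exact LRU cache built on `collections.OrderedDict`. It illustrates the policy only and is not how Redis or Memcached implement it internally; the class name and capacity are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Evicts the least recently used entry once capacity is exceeded."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest entry last

    def get(self, key, default=None):
        if key not in self._items:
            return default
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        if key in self._items:
            self._items.move_to_end(key)
        self._items[key] = value
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used

    def __contains__(self, key):
        return key in self._items

    def __len__(self):
        return len(self._items)
```

Reading a key counts as a use, so frequently viewed items stay in the cache while items nobody has requested recently are the first to go.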

With this article at OpenGenus, you should have a good understanding of the system design of Instagram.