In this article, we look at how a large-scale chat messaging application like WhatsApp could be designed conceptually. We also look at the specific technologies WhatsApp employs in its architecture.
Table of Contents
- Key features the system will support
- Defining the constraints and requirements of the system
- Diving into the design
- System Architecture of WhatsApp
WhatsApp is a massive multiplatform messaging application that allows users to share text messages and multimedia. It also has a myriad of other features, such as voice and video calls and sharing statuses among users, though we shall stick to its core functionality here.
The most basic feature of WhatsApp is one-to-one and group chat among users, so the system will support both. We also need to be able to see when a person was last online and the delivery status of our messages.
WhatsApp permanently stores only a few details on the server side, such as usernames, phone numbers, and other authentication and accessibility information. The only messages it stores are those waiting to be delivered to their recipients, and even then only for 30 days, after which they are deleted. All other messages and chat histories are stored locally on the client.
Key features the system will support
- One to one chats
- Group chats (up to 256 members)
- Online/Last seen
- Sent/delivered/read ticks
Defining the constraints and requirements of our system
WhatsApp has about 2 billion active users worldwide (more than the population of all of Europe and North America combined), and over 100 billion messages are sent daily on the platform. Our system has to be highly scalable to accommodate that much activity.
- Real-time chatting with minimum latency, i.e. messages must be sent and received in real time and in the order in which they were sent.
- The messaging service should be highly available, but when the two conflict, consistency takes priority over availability.
- There should be no loss of data due to the failure of database servers, i.e. the system must be reliable.
- The system has to be resilient: it should recover quickly from problems such as server failure.
Diving into the Design
Sending a message
Let's paint a typical scenario of how a message sent by Alice could get to Bob.
When Alice sends a message, it is first stored locally in a database on the client before it goes to the server. The client then establishes a connection with a message server and sends the message over it. The server receives the message along with some metadata, such as the sender (Alice) and recipient (Bob) ids, and then has to look for Bob. If Bob is currently connected, the server sends him the message immediately. If Bob is unavailable, the message is pushed to a queue; once Bob reconnects, the server retrieves the message from the queue and sends it to him.
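The deliver-or-queue behaviour described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not WhatsApp's implementation: the `Message` and `MessageServer` names are hypothetical, and a plain list stands in for a live client connection.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Message:
    message_id: int
    sender: str
    recipient: str
    text: str


class MessageServer:
    """Toy in-memory message server (hypothetical, for illustration only)."""

    def __init__(self):
        self.inboxes = {}                  # user id -> list standing in for a live connection
        self.pending = defaultdict(deque)  # user id -> messages awaiting delivery

    def connect(self, user):
        """Register a connection, flush anything queued, and return the inbox."""
        inbox = self.inboxes.setdefault(user, [])
        while self.pending[user]:
            inbox.append(self.pending[user].popleft())
        return inbox

    def disconnect(self, user):
        self.inboxes.pop(user, None)

    def send(self, message):
        """Deliver immediately if the recipient is connected, else queue."""
        inbox = self.inboxes.get(message.recipient)
        if inbox is not None:
            inbox.append(message)
            return "delivered"
        self.pending[message.recipient].append(message)
        return "queued"
```

A message sent while Bob is offline returns `"queued"` and appears in his inbox, in order, as soon as `connect("bob")` is called.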
So far we have mentioned that both Alice and Bob have to establish a connection with a message server, but what sort of connection do we use? In this system, the server also sends messages to the client on its own initiative, without being prompted by the client (for example, when the server sends a message to Bob). We cannot rely on the plain HTTPS request/response model for these connections, because there the client must always send a request before the server can respond. That is not viable for this use case, because we want the server to be able to send data to the client without the client having to ask. So we use another protocol, WebSocket, to establish the connection between a client and a server.
WebSockets allow for full-duplex connections between a client and a server, which means that data can flow in either direction at any time, with no request/response paradigm. So Alice and Bob both establish persistent connections with the message server via the WebSocket protocol. For each connection, we spin up an individual thread or process that handles that client. So Alice has her own thread on the message server with her own queue, and likewise Bob.
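To illustrate the "one handler per connection, each with its own queue" idea, here is a small sketch. It is an assumption-laden stand-in: it uses `asyncio` tasks rather than OS threads or processes, and a list append stands in for writing to a real WebSocket. The names `connection_handler` and `demo` are hypothetical.

```python
import asyncio


async def connection_handler(user, outbound, received):
    """One task per connected client: push whatever lands on its queue."""
    while True:
        message = await outbound.get()
        if message is None:                # sentinel: the connection was closed
            break
        received.append((user, message))   # stands in for sending over the socket


async def demo():
    received = []
    queues = {"alice": asyncio.Queue(), "bob": asyncio.Queue()}
    tasks = [
        asyncio.create_task(connection_handler(user, queue, received))
        for user, queue in queues.items()
    ]
    # The server pushes to Bob without Bob having requested anything.
    await queues["bob"].put("hello from alice")
    # Close both connections and wait for the handlers to finish.
    for queue in queues.values():
        await queue.put(None)
    await asyncio.gather(*tasks)
    return received
```

The point of the sketch is the direction of flow: the server side decides when to put a message on Bob's queue, and Bob's handler pushes it out without any request from Bob.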
We have been speaking of a single messaging server, but at the scale of WhatsApp one server will not be enough to handle all that traffic. If we have at least 100 billion messages a day, that translates to an average of roughly 1.16 million messages per second (100 billion divided by 86,400 seconds in a day). So in practice, we will need many more servers and have to distribute connections and requests fairly among all our running servers.
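As a back-of-the-envelope check, the average rate implied by 100 billion messages per day can be computed directly. The helper name `average_rate` is just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60        # 86,400 seconds
MESSAGES_PER_DAY = 100_000_000_000    # daily message volume quoted in the article


def average_rate(per_day):
    """Average events per second for a given daily volume."""
    return per_day / SECONDS_PER_DAY
```

`average_rate(MESSAGES_PER_DAY)` comes to roughly 1.16 million messages per second. This is only an average; real traffic is not spread evenly across the day.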
Introducing multiple servers adds more complexity, because Alice and Bob will most likely not be connected to the same messaging server. We now introduce a new service known as an API gateway. An API gateway acts as a single entry point for external requests. It encapsulates the internal system architecture and acts as a load balancer and request shaper. We also need a session service that keeps track of which messaging server each client is connected to.
So, with all the components defined, the final flow for sending a message is as follows. Alice sends a message to the API gateway with Bob as the recipient. The API gateway checks with the session service to find the server that Bob has a connection with and sends the message there. If Bob is online, the message is delivered; otherwise it is added to Bob's message queue to be sent when Bob is back online.
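The routing step can be sketched as follows. This is a simplified model under stated assumptions: `SessionService` and `ApiGateway` are hypothetical names, each message server is represented by a plain list, and the offline queue lives in the gateway only to keep the example short.

```python
from collections import defaultdict


class SessionService:
    """Tracks which message server each connected user is attached to."""

    def __init__(self):
        self._server_for_user = {}

    def register(self, user, server_id):
        self._server_for_user[user] = server_id

    def unregister(self, user):
        self._server_for_user.pop(user, None)

    def lookup(self, user):
        """Return the server id for a connected user, or None if offline."""
        return self._server_for_user.get(user)


class ApiGateway:
    def __init__(self, sessions, servers):
        self.sessions = sessions
        self.servers = servers             # server id -> list standing in for that server
        self.offline = defaultdict(list)   # recipient -> messages waiting for reconnect

    def route(self, sender, recipient, text):
        """Forward to the recipient's server, or queue if they are offline."""
        server_id = self.sessions.lookup(recipient)
        if server_id is None:
            self.offline[recipient].append((sender, text))
            return None
        self.servers[server_id].append((sender, recipient, text))
        return server_id
```

With Bob registered on server `"s2"`, `route("alice", "bob", "hi")` returns `"s2"`; once Bob is unregistered, the same call returns `None` and the message waits in the offline queue.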
Group Chat Messages
We can create a separate group service whose primary responsibility is to maintain information about groups and group members. Each group id maps to the individual user ids of its members.
If Alice sends a group chat message, it is sent with the group id and her user id. The message is routed to the session service, which gets all the members from the group service and forwards the message to every message server that a member of the group has a connection with. The messaging servers then handle sending the message to the clients.
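The fan-out step might look like the sketch below. The function `fan_out` and its dictionary arguments are hypothetical stand-ins for the group service and the session service; grouping recipients by server is one reasonable design, not something the source confirms.

```python
from collections import defaultdict


def fan_out(group_id, sender, group_members, user_to_server):
    """Group the recipients of one group message by message server.

    Returns {server_id: [recipients]}. Members with no active session are
    collected under the key None so they can be queued for later delivery.
    """
    plan = defaultdict(list)
    for member in group_members[group_id]:
        if member == sender:
            continue                       # do not echo the message to its sender
        plan[user_to_server.get(member)].append(member)
    return dict(plan)
```

Grouping by server means each message server receives the message once, together with the list of its local recipients, rather than once per member.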
Online/Last Seen
We need a way to know whether a user is online or when they were last seen. We can make the client send a pulse at a fixed interval, say every 5 seconds. We can create another service that stores the latest pulse timestamp for each active user in an in-memory table.
So if Bob's client wants to know Alice's status, the API gateway gets the request and queries the pulse service, which checks when Alice's last pulse was recorded. If it was within the last 5 seconds, Alice is likely online. The server simply returns the last pulse timestamp to Bob's client.
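A minimal sketch of such a pulse service is shown below. `PulseService` is a hypothetical name, the 5-second window follows the text, and the injectable `clock` parameter is an assumption added so the behaviour can be checked without waiting in real time.

```python
import time


class PulseService:
    """In-memory table of the last pulse timestamp per user."""

    def __init__(self, interval_seconds=5.0, clock=time.time):
        self.interval = interval_seconds
        self.clock = clock
        self.last_pulse = {}               # user id -> timestamp of latest pulse

    def record_pulse(self, user):
        self.last_pulse[user] = self.clock()

    def status(self, user):
        """Return whether the user looks online and when they were last seen."""
        last = self.last_pulse.get(user)
        if last is None:
            return {"online": False, "last_seen": None}
        online = (self.clock() - last) <= self.interval
        return {"online": online, "last_seen": last}
```

In practice the online window would probably be somewhat longer than the pulse interval, so that one delayed pulse does not flip a user to offline; that tuning is left out here.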
Message Status (Sent/Delivered/Read ticks)
Each message needs a unique id so that we can keep track of acknowledgements. When Alice sends a message to Bob and it reaches the server, the server sends an acknowledgement for that message id back to Alice's client over the same WebSocket connection. (WebSockets are built on TCP, which delivers the data reliably and in order, but the tick itself is driven by this message-level acknowledgement.) This acknowledgement corresponds to the sent tick.
When Bob receives the message, his client also sends an acknowledgement back to the server. The server then sends a notification to Alice indicating that the message has been delivered to Bob, which corresponds to the delivered tick.
Finally, when Bob opens the chat message, a final acknowledgement is sent to the server indicating that the message has been read. The server forwards this to Alice as a final notification, which corresponds to the blue read tick.
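The sent, delivered and read progression can be modelled as a small state tracker keyed by message id. This is an illustrative sketch: the `Status` and `StatusTracker` names are hypothetical, and ignoring out-of-order acknowledgements is a design assumption rather than documented WhatsApp behaviour.

```python
from enum import IntEnum


class Status(IntEnum):
    PENDING = 0      # stored on the sender's client, not yet acknowledged
    SENT = 1         # server acknowledged receipt (single tick)
    DELIVERED = 2    # recipient's client acknowledged receipt (double tick)
    READ = 3         # recipient opened the chat (blue ticks)


class StatusTracker:
    """Tracks the highest acknowledged status for each message id."""

    def __init__(self):
        self.status = {}

    def register(self, message_id):
        self.status[message_id] = Status.PENDING

    def acknowledge(self, message_id, new_status):
        """Advance the status; stale or duplicate acknowledgements are ignored."""
        current = self.status[message_id]
        if new_status > current:
            self.status[message_id] = new_status
        return self.status[message_id]
```

Because statuses only move forward, a late "delivered" acknowledgement arriving after "read" does not downgrade the ticks shown to Alice.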
System Architecture of WhatsApp
WhatsApp has a native app for each popular platform. It supports mobile, the web (in the browser) and desktop. Below is a list of its supported platforms and the frontend technology used for each:
- Android: Java
- iOS: Swift
- Windows Phone: C#
- Mac Desktop app: Swift/Objective-C
- PC Desktop app: C/C#/Java
WhatsApp stores messages locally on each client, as it would be too unwieldy to download so much information from the server each time. The technology it uses for this is SQLite, which, as the name implies, is a lightweight relational database that is self-contained and embedded in the client.
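To give a feel for client-side storage with SQLite, here is a small sketch using Python's built-in `sqlite3` module. The table layout and the function names (`open_store`, `save_message`, `load_chat`) are assumptions for illustration; WhatsApp's actual schema is not described in this article.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a local message store; ':memory:' keeps it in RAM."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id INTEGER PRIMARY KEY,
               chat_id TEXT NOT NULL,
               sender TEXT NOT NULL,
               body TEXT NOT NULL,
               sent_at REAL NOT NULL
           )"""
    )
    return conn


def save_message(conn, chat_id, sender, body, sent_at):
    """Insert one message and return its local row id."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO messages (chat_id, sender, body, sent_at) VALUES (?, ?, ?, ?)",
            (chat_id, sender, body, sent_at),
        )
    return cur.lastrowid


def load_chat(conn, chat_id):
    """Return (sender, body) pairs for one chat, oldest first."""
    rows = conn.execute(
        "SELECT sender, body FROM messages WHERE chat_id = ? ORDER BY sent_at, id",
        (chat_id,),
    )
    return rows.fetchall()
```

Because the whole history lives in a local file, opening a chat is a local query and needs no round trip to the server.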
For its backend, WhatsApp deploys a host of technologies. The main programming language is Erlang, a functional programming language, and it is part of what makes WhatsApp able to operate at such a large scale. Erlang was designed to support concurrency natively: it employs a process-based model in which small, isolated processes communicate with each other through messages. These processes can also communicate with processes running on other cores or other machines. These properties let the system scale easily and allow for self-healing designs, where if a process crashes, a different process can restart it.
The OS that WhatsApp servers run on is FreeBSD, a free and open-source UNIX-based operating system. It can be argued that FreeBSD is faster and more performant than Linux, which is the more common choice for a server OS.
Bogdan’s Erlang Abstract Machine (BEAM), which WhatsApp employs, is the virtual machine that executes Erlang code once it has been compiled to BEAM bytecode. BEAM is specially designed for highly concurrent applications. It is highly scalable and resistant to failures that might be caused by high traffic loads and system updates.
WhatsApp uses a modified version of Ejabberd, an open-source XMPP server written in Erlang, although the protocol WhatsApp uses is itself a modified version of XMPP. The Ejabberd messaging server WhatsApp uses is heavily optimised for server performance.
Mnesia is a distributed database management system written in Erlang. It is used to store relevant data and temporary messages. It provides some special benefits, such as real-time key/value lookups, high fault tolerance and dynamic reconfiguration.
Yet Another Web Server (YAWS) is an Erlang-based web server that supports dynamic content. WhatsApp uses it for storing multimedia data.
With this article at OpenGenus, we have looked into how a behemoth system like WhatsApp operates by trying our hand at designing some core features of the application. We then pulled back the curtain a little and looked at the specific technologies WhatsApp employs to serve its billions of users.