Table of Contents:
- System design of Amazon Fresh
Note: Amazon and the Amazon logo are trademarks of Amazon.com, Inc. or its affiliates.
The topic of this article at OpenGenus is system design, namely the system design of a grocery system such as Amazon Fresh or BigBasket or Flipkart Grocery or JioMart or DMart.
The Amazon Fresh application is an online grocery store in which users can view products, add them to a cart, place an order, and have it delivered almost anywhere. They can either be present to receive the order (attended delivery) or provide a 2-hour delivery window during which the order is placed in a safe space to be picked up later, without their presence being required (unattended delivery).
Before diving right into the intricacies of such a task, let's talk a bit about system design.
Why would we design a system?
Now, if we think about most applications that we develop, we clearly wouldn't need to design a system for them. If the application operates at a relatively small scale, in other words, it receives few requests and doesn't get much traffic, designing a system for it isn't necessary at all.
However, as the application grows, things that can go wrong start affecting a larger number of users and begin causing major financial losses.
Those simple problems that you paid no attention to in your small-scale application can have effects hardly imaginable when applied to larger applications.
Here are some benefits of system design:
- more reliable
- cost effective
- greater performance, lower latency
So, when exactly should we start designing a system? What is the threshold that needs to be reached in order for the application to require such a solution?
While there is no exact limit that we can reference, we can confidently say that such a solution is recommended when building a large scale distributed system.
What is a large scale distributed system?
Let's break this down.
Large scale means that the system deals with a lot of data, is used by a lot of people, receives a lot of requests, needs to be extremely performant, is updated frequently, etc.
Distributed means that it runs on multiple servers that need to be abstracted and hidden away from the client (that is, regardless of how many components go into building the system, the client only sees an intuitive, performant application).
Let's get back to our goal. Designing a grocery store such as Amazon Fresh.
One can probably imagine that Amazon Fresh can easily fit into the category of a large scale distributed system. Let's do a bit of math, though. We're going to need to get used to it anyway.
According to a report on the internet, Amazon gets about 197 million users per month. If we were to divide that by 8, the number of store subsidiaries Amazon has, we would get roughly 24 million users per month. It's definitely extensively scaled.
System design of Amazon Fresh / BigBasket/ JioMart
Now that we've established that Amazon Fresh is system design material, let's go ahead and design a system for such a product.
Here are the main concepts we need to focus on in order to successfully create a system design:
- Capacity estimates
- High-level design
The first notion we will turn our attention to is the requirements of the app. Simply put, what functionalities will this product have?
When establishing this point, we need to outline the core functionalities, those without which the application couldn't function properly. There are going to be secondary cases we won't cater for while doing this.
Here is a list of the core functionalities for a grocery system such as Amazon Fresh:
- View all the products, by category
- Search for products
- View product page
- Check stock of product
- Add a product to cart
- Comment on product
- Upload a product
- Assign the delivery
- Track the delivery
Here we will talk about two concepts, depending on the way data is accessed.
Since there are no reports regarding the average number of products a person views or the number of products that get added daily, we're just going to assume an average based on the number of monthly active users.
24,000,000 MAU / 30 = 800,000 DAU
800,000 * 25 = 20,000,000 products viewed per day
5,000 * 3 = 15,000 products added per day
We presumed that the number of monthly active users is roughly 24 million, which would mean 800,000 daily active users if we used the average number of days a month can have (30).
Of those 800k users, let's assume that the average user who opens the platform views about 25 products. That would mean that 20 million products are viewed every day.
Of the 800,000 daily users, let's assume only 5,000 upload products on any given day. Let's say that each of them uploads an average of 3 products daily. This gives us 15,000 products added daily.
Great! Now we can go ahead and talk about storage capacity. This refers to the number of items that are being uploaded or the "writes" of the application.
15,000 products added per day * 5MB = ~75GB per day
We already know from before that roughly 15,000 products get added every day. Assuming uploading a product costs us about 5MB, that would mean the capacity of the storage should be about 75GB per day.
75GB per day
This refers to the number of items that are being accessed or the "reads" of the application.
20,000,000 products viewed per day * 5MB = 100TB per day
We already know from before that roughly 20 million products get viewed every day, so assuming viewing a product costs us 5MB, that means the capacity of the bandwidth should be roughly 100TB per day.
100TB per day
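The traffic, storage and bandwidth arithmetic above can be reproduced with a few lines of Python. This is only a sketch of the estimate: the function name `estimate_daily_load` and its parameter names are made up for illustration, the default values are the assumptions stated in the text, and decimal units (1GB = 1,000MB) are assumed.

```python
MB_PER_GB = 1_000        # decimal units, an assumption of this sketch
MB_PER_TB = 1_000_000


def estimate_daily_load(monthly_active_users=24_000_000, days_per_month=30,
                        views_per_user=25, uploaders_per_day=5_000,
                        uploads_per_uploader=3, mb_per_product=5):
    """Back-of-the-envelope daily load, using the article's assumed inputs."""
    daily_active_users = monthly_active_users // days_per_month
    products_viewed = daily_active_users * views_per_user
    products_added = uploaders_per_day * uploads_per_uploader
    return {
        "daily_active_users": daily_active_users,
        "products_viewed_per_day": products_viewed,
        "products_added_per_day": products_added,
        # "writes": data stored per day
        "storage_gb_per_day": products_added * mb_per_product / MB_PER_GB,
        # "reads": data served per day
        "bandwidth_tb_per_day": products_viewed * mb_per_product / MB_PER_TB,
    }


estimate = estimate_daily_load()
print(estimate["storage_gb_per_day"])    # 75.0
print(estimate["bandwidth_tb_per_day"])  # 100.0
```

Changing a single input, such as the number of views per user, shows how sensitive the bandwidth figure is to each assumption.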
Number of servers needed
Let's have a bit more fun by trying to estimate exactly how many servers a product such as Amazon Fresh would need.
When talking about the servers, we are going to be talking in terms of CPU cores, RAM needed and hard drive space (storage) needed.
We will assume each server is higher-end, having 4 CPU cores, 8GB RAM and 120GB SSD storage.
- CPU cores
800,000 DAU / 250 users per core = 3200 cores
3200 cores / 4 cores per server = 800 servers
We know from previous calculations that the website would get about 800,000 daily active users. Each CPU core can typically handle about 250 users, so that means we would need at least 3200 CPU cores in order for the website to function. If we translate it to servers with the properties we set before, that would amount to about 800 servers.
- RAM memory
100TB memory per day / 0.12TB per server = 834 servers
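We know from previous calculations that about 100TB of data is read daily, so we can estimate the necessary number of servers by dividing that total by the capacity of each server. Note that the 0.12TB (120GB) figure used here matches the per-server SSD size assumed above rather than the 8GB of RAM, so this should be treated as a rough estimate. This way, we need at least 834 servers for the website to function properly.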
- SSD (Storage)
75GB per day / 120GB per server =~ 1
Since we worked out that about 75GB of data is getting stored every day, we can divide that number by the capacity of a server. Here is where we realize that the application really doesn't save that much information, and is, therefore, read-driven. 1 server would be sufficient if we were only relating to storage.
So, after all those computations, we can work out how many servers we really need. The results we got from all the computations were 800, 834, and 1. To be sensible, we should choose the largest one and then some.
A sensible number here would be about 900 servers, to account for any failures that may occur.
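The three server estimates above can be sketched as follows. The function name `servers_needed` and its parameters are hypothetical; the defaults are the article's assumed figures, and the read-driven estimate keeps the 0.12TB (120GB) per-server figure used in the calculation above.

```python
import math


def servers_needed(daily_active_users=800_000, users_per_core=250,
                   cores_per_server=4, daily_read_gb=100_000,
                   daily_write_gb=75, capacity_gb_per_server=120):
    """Return the server count implied by each constraint, rounded up."""
    cores = math.ceil(daily_active_users / users_per_core)
    return {
        "cpu": math.ceil(cores / cores_per_server),
        "reads": math.ceil(daily_read_gb / capacity_gb_per_server),
        "storage": math.ceil(daily_write_gb / capacity_gb_per_server),
    }


estimates = servers_needed()
print(estimates)                 # {'cpu': 800, 'reads': 834, 'storage': 1}
print(max(estimates.values()))   # 834, before adding headroom for failures
```

The largest of the three values is the binding constraint; the roughly 900 servers suggested above is that value plus a margin.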
Within this category, we will deal with two concepts:
- Database design
- Server APIs
First of all, let's design the database.
Most important of all, we shall need a User class, with a user id by which it can be identified and other relevant information that can be stored in order to make the process of ordering straightforward (such as address, email address, name etc).
Then, a Product class will be required, storing a unique id, the user id of the person that posted it, and other information that can be displayed on the product page to help customers understand what it is that they're buying (title, description, images, price etc).
One functionality we mentioned in the requirements was commenting on a product. To be able to do that, we are going to need a Comment class, which will store a comment id, by which it will be identified, a product id, which identifies the product that was commented on, and a user id, which will point to the user who composed the comment. Moreover, we are going to need to store the text of the comment and the time at which it was added.
Another concept we will need to consider is the shopping cart. We are probably going to need a ShoppingCart class, which will store a unique id and the user's id. Then, we're also going to need a ShoppingCartItem, which is going to have its own id, its cart's id, the product's id and a quantity.
Moreover, we will probably need to have a table dedicated specifically to stores. A Store class should contain a unique id for the shop, a geo-hash (more on this later, but, in short, it encodes the store's location) and any other optional properties we might feel are necessary (such as zip code).
One last class we will need is the Driver class. The main purpose of this will be to track the location of the drivers. This class should include the following properties: a unique id for the driver, their geo-hash, and any other relevant information such as first and last name.
Here is a diagram of what we have discussed so far:
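As a complement to the diagram, the classes described above could be sketched as Python dataclasses. The field names and types are assumptions chosen for illustration (for example, storing the price in cents as an integer); the text only specifies which pieces of information each class holds.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class User:
    user_id: int
    name: str
    email: str
    address: str


@dataclass
class Product:
    product_id: int
    user_id: int                 # the user who posted the product
    title: str
    description: str
    price_cents: int             # assumed representation of the price
    images: list[str] = field(default_factory=list)


@dataclass
class Comment:
    comment_id: int
    product_id: int              # the product that was commented on
    user_id: int                 # the author of the comment
    body: str
    created_at: datetime


@dataclass
class ShoppingCart:
    cart_id: int
    user_id: int


@dataclass
class ShoppingCartItem:
    item_id: int
    cart_id: int
    product_id: int
    quantity: int


@dataclass
class Store:
    store_id: int
    geo_hash: str
    zip_code: str = ""           # optional extra property


@dataclass
class Driver:
    driver_id: int
    geo_hash: str
    first_name: str = ""
    last_name: str = ""
```

In a relational database, each class would map to a table, with the `*_id` fields that point at other classes acting as foreign keys.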
Another important part, without which our application wouldn't be able to function, is the set of APIs through which information is stored and retrieved.
Let's look at our requirements and devise server APIs for those that need it.
- Viewing all the products by category => Here, we'll need an API that retrieves the products based on the category they are in. Let's assume we also want to only load a few at a time, so the application doesn't lag (if there are a lot of products and we request them all at the same time, there is a high risk of that happening). A getProductByCategory(category, limit) API could be reasonable.
- Search for products => Although it still has to return a number of products that match a criterion, it is not based on category, so we shall need a new API for it. getProductByTitle(query, limit) could be good. Of course, we would construct the function so that titles that match the query parameter would pop up, not only exact replicas of the query.
- View product page => While a product page is standard, the information presented in it is specific to the product and, therefore, has to be obtained from a specific API. getProduct(product_id) is the go-to here.
- Comment on product => This is the first API with a method other than GET. Here, we are not asking for anything, but rather we are adding something to our database. This method is called POST. postComment(user_id, product_id, body, time) should be sufficient. We are specifying the user who posted the comment, the product on which the comment was posted, the actual text of the comment and the time at which it was posted.
- Upload a product => This is another API with a POST method. Let's call it postProduct(user_id, info). The user_id is the ID of the user who posted it, and the info is an object that has all the relevant information of the product, since it would be tedious to have to write all properties down every time we call the API.
- Add to cart => This API also uses a POST method. We could call it addToCart(user_id, product_id, quantity). This would create a new shoppingCartItem and append it to the user's existing shoppingCart.
Take a look at the summary of the APIs of this product:
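To make the behaviour of these APIs concrete, here is a minimal in-memory sketch in Python. The class `GroceryApi`, the snake_case method names and the dictionary-based storage are assumptions made for illustration; a real service would expose these operations as HTTP GET and POST routes backed by the database.

```python
from datetime import datetime, timezone
from itertools import count


class GroceryApi:
    """In-memory stand-in for the server APIs listed above."""

    def __init__(self):
        self.products = {}    # product_id -> product dict
        self.comments = []    # list of comment dicts
        self.carts = {}       # user_id -> {product_id: quantity}
        self._ids = count(1)  # simple id generator

    # POST postProduct(user_id, info)
    def post_product(self, user_id, info):
        product_id = next(self._ids)
        self.products[product_id] = {"product_id": product_id,
                                     "user_id": user_id, **info}
        return product_id

    # GET getProductByCategory(category, limit)
    def get_product_by_category(self, category, limit):
        matches = [p for p in self.products.values()
                   if p.get("category") == category]
        return matches[:limit]

    # GET getProductByTitle(query, limit): case-insensitive substring match
    def get_product_by_title(self, query, limit):
        needle = query.lower()
        matches = [p for p in self.products.values()
                   if needle in p.get("title", "").lower()]
        return matches[:limit]

    # GET getProduct(product_id)
    def get_product(self, product_id):
        return self.products.get(product_id)

    # POST postComment(user_id, product_id, body, time)
    def post_comment(self, user_id, product_id, body, time=None):
        if product_id not in self.products:
            raise KeyError(f"unknown product {product_id}")
        comment = {"user_id": user_id, "product_id": product_id, "body": body,
                   "time": time or datetime.now(timezone.utc)}
        self.comments.append(comment)
        return comment

    # POST addToCart(user_id, product_id, quantity)
    def add_to_cart(self, user_id, product_id, quantity):
        if quantity <= 0:
            raise ValueError("quantity must be positive")
        cart = self.carts.setdefault(user_id, {})
        cart[product_id] = cart.get(product_id, 0) + quantity
        return cart
```

The `limit` parameter is what keeps the listing and search calls from returning every matching product at once.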
Achieving other requirements
Here, we'll talk about how to achieve the requirements that don't necessarily need a custom-built API, but are better off using external services.
- Check stock of product => In order to be able to add a product to cart, we first have to check whether that product is available in stores near our location. What this should do is take the current location, decide which facility is nearest, access the inventory database, and check the availability of the product in said establishment. This is usually achieved with an inventory management system, rather than making calls directly to the database. If we receive a positive response, we can go ahead and allow adding it to cart. Otherwise, we should disable the 'add to cart' function and indicate that the product is not available in an intuitive way.
- Order => This is the most dangerous task we have to deal with here, because it has real-world implications, namely money. Instead of risking anything, we could use already established products that deal with this in an efficient way, called payment processors. Such tools would ensure payments on our application are done safely. Some of the most famous payment processors are Stripe, PayPal and Square.
When adding products to cart or ordering said products, we also have to consider the inventory. So, for example, if one user adds a few products to cart, the selected quantity of those products has to be marked unavailable for a standard amount of time displayed on the website. When the order is placed, that quantity has to be removed altogether from the database. This is done through the inventory management system.
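The hold-then-remove behaviour described above can be sketched as a small in-memory class. The names `Inventory`, `reserve` and `checkout`, as well as the 10-minute default hold, are assumptions for illustration; a real inventory management system would persist this state and handle concurrent requests.

```python
class Inventory:
    """Tracks stock and temporary holds created when items are added to a cart."""

    def __init__(self, stock, hold_seconds=600):
        self.stock = dict(stock)          # product_id -> units in the store
        self.hold_seconds = hold_seconds  # how long a cart hold lasts (assumed)
        self.holds = []                   # (expires_at, user_id, product_id, qty)

    def _drop_expired(self, now):
        self.holds = [h for h in self.holds if h[0] > now]

    def available(self, product_id, now):
        """Units that are in stock and not held in anyone's cart."""
        self._drop_expired(now)
        held = sum(qty for _, _, pid, qty in self.holds if pid == product_id)
        return self.stock.get(product_id, 0) - held

    def reserve(self, user_id, product_id, quantity, now):
        """Called on 'add to cart'; returns False if not enough is available."""
        if quantity > self.available(product_id, now):
            return False
        self.holds.append((now + self.hold_seconds, user_id, product_id, quantity))
        return True

    def checkout(self, user_id, now):
        """Called on 'order'; permanently removes the user's held units."""
        self._drop_expired(now)
        purchased = {}
        for _, holder, pid, qty in self.holds:
            if holder == user_id:
                self.stock[pid] -= qty
                purchased[pid] = purchased.get(pid, 0) + qty
        self.holds = [h for h in self.holds if h[1] != user_id]
        return purchased
```

Time is passed in explicitly as `now` (in seconds) so the behaviour is easy to follow: a hold created at time 0 with a 600-second window no longer counts against availability from time 600 onwards.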
- Assign the delivery => Here, we are going to focus on how drivers are assigned to the delivery of goods.
Keep in mind we are working on making an efficient app. Thus, in order for someone to be assigned to deliver the order, we will need to find the quickest and most performant way to do so. In this case, the most efficient way to assign a delivery is to find the closest driver.
We already know the location of the store that's being ordered from (check the geo-hashing algorithm explained at the additional resources for a bit more information on this). What we need now is to know the location of the drivers. Sure, we could request their latitude and longitude once and then hash it as we did for the stores, but it's likely not going to work, since, unlike stores, drivers move around.
The most efficient solution here would be to keep track of the driver's area in a driver table, which we will need to add to our database solution.
The way this would work is by using a driver update service. Each driver would have to run software on their phone that reports their location (latitude and longitude) periodically, such as every 10 seconds. The driver update service would be called with each report and would either convert the latitude and longitude to a geo-hash itself or call another service that does so. Then, based on their geo-hash, the drivers could be assigned orders.
- Track the delivery => In order to track the delivery, we need to establish a connection between the client and the driver. This would likely be done by using a load balancer, so that they can be connected to the same server, and then proceeding to send the latitude and longitude of the driver to the client. There could be a debate over the protocol being used to send the data, but something like HTTP could be just fine.
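The driver update service and the closest-driver assignment described above could look roughly like the sketch below. Everything here is hypothetical: `DriverUpdateService`, `assign_driver` and `to_cell` are illustrative names, and `to_cell` is a coarse grid label used as a simple stand-in for a real geo-hash.

```python
import math


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance between two points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def to_cell(lat, lng, size_deg=0.05):
    """Stand-in for a geo-hash: label of the grid cell containing the point."""
    return f"{math.floor(lat / size_deg)}:{math.floor(lng / size_deg)}"


class DriverUpdateService:
    def __init__(self):
        self.drivers = {}  # driver_id -> (lat, lng, cell)

    def update_location(self, driver_id, lat, lng):
        """Called by the driver's phone, e.g. every 10 seconds."""
        self.drivers[driver_id] = (lat, lng, to_cell(lat, lng))

    def assign_driver(self, store_lat, store_lng):
        """Pick the nearest driver, preferring drivers in the store's cell."""
        if not self.drivers:
            return None
        store_cell = to_cell(store_lat, store_lng)
        same_cell = {d: v for d, v in self.drivers.items() if v[2] == store_cell}
        candidates = same_cell or self.drivers  # fall back to all drivers
        return min(candidates,
                   key=lambda d: haversine_km(store_lat, store_lng,
                                              candidates[d][0], candidates[d][1]))
```

Filtering by cell first keeps the distance computation small, at the cost that a driver just across a cell boundary may be skipped in favour of a slightly farther one inside the cell.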
Now let's look at the actual components in play in more depth.
We may want to add other components, such as load balancers, cache and others in order to improve the performance of our app.
Here is an extremely abstracted solution:
Keep in mind that this is very broad. There could be a lot more done for optimization. For example, a large platform like Amazon would probably have a lot more servers and many load balancers.
My goal here is to emphasize that the client, server and database are clearly not the only instances in this system. A large scale application would probably require a CDN and caching for improving their performance and saving money, considering that reading information from memory is about 100 times faster than from the database. Also, caching would not only be done on the database, but rather also on additional layers, such as the DNS, the CDN, and the application server.
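As one illustration of the caching idea, here is a minimal cache-aside sketch: the application checks an in-memory cache first and only falls back to the database on a miss. The names `LruCache` and `make_cached_reader` are made up for this example, and the least-recently-used eviction policy is an assumption; production systems typically use a dedicated cache service instead.

```python
from collections import OrderedDict


class LruCache:
    """Small in-memory cache that evicts the least recently used entry."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the oldest entry


def make_cached_reader(load_from_db, cache):
    """Wrap a database lookup so that repeated reads are served from memory."""
    def read(product_id):
        cached = cache.get(product_id)
        if cached is not None:
            return cached
        value = load_from_db(product_id)  # slow path: go to the database
        if value is not None:
            cache.put(product_id, value)
        return value
    return read
```

For a read-heavy product catalogue, most requests for popular products would then never reach the database.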
Moreover, we added the load balancer to prevent a single point of failure. Imagine what could happen if the whole product depended on one single server. Any problem with that server and the app is down! This is why scaling horizontally is so valuable.
Recommended technologies and algorithms
Let's discuss one final point, that is, the technologies and algorithms that can be used in order to create all that we talked about today.
For the database, a relational (SQL) database would be recommended. It is a good fit because transactions keep related records consistent with each other, which is very useful when dealing with sensitive actions, such as payment.
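To illustrate why transactions matter here, the sketch below uses Python's built-in `sqlite3` module to decrement stock and record an order as one atomic step. The table layout and the `place_order` function are assumptions made for this example, and SQLite is used only because it needs no setup.

```python
import sqlite3


def place_order(conn, user_id, product_id, quantity):
    """Decrement stock and record the order together; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            cur = conn.execute(
                "UPDATE products SET stock = stock - ? "
                "WHERE product_id = ? AND stock >= ?",
                (quantity, product_id, quantity),
            )
            if cur.rowcount != 1:
                raise ValueError("insufficient stock")
            conn.execute(
                "INSERT INTO orders (user_id, product_id, quantity) "
                "VALUES (?, ?, ?)",
                (user_id, product_id, quantity),
            )
        return True
    except ValueError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products ("
             "product_id INTEGER PRIMARY KEY, title TEXT, stock INTEGER NOT NULL)")
conn.execute("CREATE TABLE orders ("
             "order_id INTEGER PRIMARY KEY, user_id INTEGER, "
             "product_id INTEGER, quantity INTEGER)")
conn.execute("INSERT INTO products VALUES (1, 'Milk', 3)")
conn.commit()

print(place_order(conn, user_id=7, product_id=1, quantity=2))  # True
print(place_order(conn, user_id=8, product_id=1, quantity=2))  # False
```

Either both statements take effect or neither does, so the stock count and the orders table cannot drift apart.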
Some examples of databases that could be used:
There are a lot of technologies that can be considered for this task. Here, the most important criterion would probably be the programming language. Some technologies are easier to use than others, so researching this is also important.
For this application, I would probably go with Django, since it is easy to use, authentication can be done easily, and it is designed precisely for large scale application development. Django uses Python.
When choosing a network protocol, we are going to want to focus on security most of all. We can choose HTTPS if we'd prefer a stateless request/response protocol, or something like WebSockets over TLS if we'd rather keep a stateful connection (SSL/TLS itself is the encryption layer underneath HTTPS rather than an alternative to it). This, of course, also depends on the request that is being made and every detail should be taken into account, since, in a large application, any delay that may seem minor can compound.
Load balancer routing method
Let's also decide on which load balancer routing algorithm is the best.
The best routing method for such a website would arguably be the IP hashing algorithm. Described simply, this hashes the user's IP address when they connect to the website, and then routes said user to the same server every time they access the application.
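A minimal sketch of IP hashing, assuming a fixed list of server names: the client's IP address is hashed and the result is reduced modulo the number of servers. The function name `pick_server` and the server names are hypothetical.

```python
import hashlib


def pick_server(client_ip, servers):
    """Map a client IP to one server; the same IP always maps to the same one."""
    digest = hashlib.sha256(client_ip.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["app-1", "app-2", "app-3"]
first = pick_server("203.0.113.7", servers)
second = pick_server("203.0.113.7", servers)
print(first == second)  # True
```

One known limitation of plain modulo hashing is that adding or removing a server changes the mapping for most clients; consistent hashing is a common way to reduce that effect.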
One additional point that is worth making before we end this article is geo-hashing.
This algorithm enables us to do something we spoke about at several earlier points, but didn't go too deep into. All throughout the article, we discussed being able to locate the nearest store to the client, and then to the driver when assigning the delivery. But how exactly can we tell what each store's location is?
One great way would be using a geosharded database for the stores. What this means is that we would take each area as a 'box', give it a hash, and then start dividing the area by a constant number (for example, 3) into smaller 'boxes'. Each of these smaller boxes needs to start with the same hash as that of the parent 'box', and then have one unique character added at the end. As it goes deeper and deeper, it accounts for each location on the map.
Then, when we get the latitude and longitude of a user or driver, we will know which store they are closest to.
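The nested-box idea can be sketched with a simplified hash that splits each box into four quadrants (rather than the three used in the example above) and appends one character per level. This is an illustrative scheme, not the standard geohash encoding, and the function names are made up for this example.

```python
def geo_cell_hash(lat, lng, depth=8):
    """Encode a point as a string; each character selects one of 4 sub-boxes."""
    if not (-90 <= lat <= 90 and -180 <= lng <= 180):
        raise ValueError("coordinates out of range")
    south, north = -90.0, 90.0
    west, east = -180.0, 180.0
    chars = []
    for _ in range(depth):
        mid_lat = (south + north) / 2
        mid_lng = (west + east) / 2
        quadrant = 0
        if lat >= mid_lat:      # upper half of the current box
            quadrant += 2
            south = mid_lat
        else:
            north = mid_lat
        if lng >= mid_lng:      # right half of the current box
            quadrant += 1
            west = mid_lng
        else:
            east = mid_lng
        chars.append("abcd"[quadrant])
    return "".join(chars)


def shared_prefix_length(a, b):
    """Number of leading characters two hashes have in common."""
    length = 0
    for x, y in zip(a, b):
        if x != y:
            break
        length += 1
    return length


def closest_store_by_hash(point_hash, store_hashes):
    """store_hashes maps store_id -> hash; pick the longest shared prefix."""
    return max(store_hashes,
               key=lambda s: shared_prefix_length(point_hash, store_hashes[s]))
```

Because a child box always extends its parent's hash, a longer shared prefix means two points fall in the same smaller box. Two nearby points on opposite sides of a box boundary can still share only a short prefix, so real systems usually also check neighbouring boxes.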
The hashing API should be taken from external sources, especially since applications that have their focus on geolocation have worked tremendously to improve theirs. One such great example would be Uber's.
In conclusion of this OpenGenus article, a system such as Amazon Fresh/ BigBasket/ JioMart is not at all easy to design or build.
Our goal here was to design a framework, a frame of reference of how such a system can be achieved. Intermediate solutions can also be added, that is, tools by various companies that can enhance the users' experience or make the developers' work easier.
I hope that this article at OpenGenus leads you to consider what goes into designing and building large scale distributed systems and provides you with a frame of points to follow when doing so.