
System Design of Amazon


In this article, we explore the system design of Amazon, the largest e-commerce platform, in depth.

Note: Amazon and the Amazon logo are trademarks of Amazon.com, Inc. or its affiliates.

Table of contents:

  1. Introduction to Amazon
  2. System Requirements
  3. The system's capacity
  4. Estimates
  5. OpenCart Shopping Cart Solution
  6. Components
  7. Load Balancer
  8. Summary


Introduction to Amazon

Electronic commerce, or ecommerce, refers to the buying and selling of goods and services over the internet by individuals and businesses.

Online marketplaces are platforms that support ecommerce transactions between buyers and sellers, allowing sellers to display their products and reach a wider audience. Customers like these platforms because they offer a large array of merchandise and services from vendors and providers all around the world.
Amazon is one of the best-known online marketplaces in the world. Amazon.com is a global ecommerce corporation that specializes in online retail, computing services, consumer electronics, digital content, and local services including daily specials and groceries. With net sales of close to 386 billion dollars in 2020, Amazon is the largest e-retailer in the United States. The majority of the company's revenue comes from ecommerce sales of electronics and other products, followed by revenue from third-party sellers, subscriptions, and AWS cloud services. Amazon is regarded as one of the most valuable brands in the world because of its global breadth and reach.

Do you want to build an Amazon? Let’s check out the design.

System Requirements


Functional

Functional requirements define what an entire program, or one of its components, should do. A function has three phases: data input, system action, and data output. It may perform calculations, data manipulation, business operations, user interaction, and a variety of other tasks.
To put it another way, a functional requirement specifies what an application must or must not do after receiving data.
Functional requirements matter because they tell software developers how the system should behave. If a system fails to meet its functional requirements, it is not functioning properly. These are the functional requirements for this system:

User Profile
The system has two kinds of accounts: an Admin, who is in charge of introducing new product categories and blocking/unblocking users, and a Member, who can buy and sell things.
Guests may search for items, view them, and add them to their shopping carts. They must register before they can place an order.

Product On Board
Sellers onboard their products. Individual sellers list items one at a time, while Professional sellers can list them in large batches through bulk uploading or inventory management with third-party platforms.
Once it has been listed successfully, the product is available to both B2C and B2B clients.

Select Product
Users can select a product from the catalogue or search for it. Each product may have options to choose from.

Search Product
The system allows users to search for items by name or category. Each Product is assigned a Product Category.
Users can type the name of the product they are looking for into the search bar.

Cart Facility
A shopping cart is a piece of software that makes purchasing a product or service easier. It receives the customer's payment and arranges for the information to be distributed to the merchant, payment processor, and other stakeholders.
Users may use the shopping cart to check out and purchase products.

Proceed To Buy
Credit cards or electronic bank transfers should be accepted as payment methods. Users may give a product a rating and leave a review. The user should have the option of specifying a shipping address for their order. If an order has not yet shipped, users can cancel it. When there is a change in the order or delivery status, users should be notified. Users should be able to monitor their orders to know where they are in the process.
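The cancellation and notification rules above can be sketched as a small state machine. This is only an illustration under stated assumptions: the `Order` class, the status names, and the in-memory `notifications` list are hypothetical and stand in for whatever order service and notification channel a real system would use.

```python
from enum import Enum


class OrderStatus(Enum):
    PLACED = "placed"
    SHIPPED = "shipped"
    DELIVERED = "delivered"
    CANCELLED = "cancelled"


class Order:
    """Hypothetical order model; names are assumptions for illustration."""

    def __init__(self, order_id):
        self.order_id = order_id
        self.status = OrderStatus.PLACED
        # Stand-in for an email/SMS/push channel.
        self.notifications = []

    def _set_status(self, new_status):
        self.status = new_status
        # Every status change produces a notification for the user.
        self.notifications.append(
            f"Order {self.order_id} is now {new_status.value}"
        )

    def ship(self):
        if self.status is not OrderStatus.PLACED:
            raise ValueError("only a placed order can be shipped")
        self._set_status(OrderStatus.SHIPPED)

    def cancel(self):
        # Cancellation is only allowed before the order has shipped.
        if self.status is not OrderStatus.PLACED:
            raise ValueError("order can no longer be cancelled")
        self._set_status(OrderStatus.CANCELLED)
```

Keeping every status change behind `_set_status` means the notification cannot be forgotten when a new transition is added.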

Non-Functional

Non-functional requirements define the software's performance criteria and quality attributes. They are crucial because they help software developers define the system's capabilities and limits, which is necessary for producing high-quality software. These are the non-functional requirements for this system:

Low Latency
The system should be designed to handle a million or more queries per second on a big scale with low latency. The system must be quick; else, the user would have a negative experience.

High Availability
The application must remain available, with no large-scale outages.
At both the network and application levels, everything from load balancers, firewalls, and routers to reverse proxies and monitoring systems is fully redundant, ensuring the highest level of service availability.

High Consistency
Immediate consistency ensures that the client always sees the most recent data and that the data is safely stored as soon as it is written.

The system's capacity

In 2020, about 65% of the US population visited at least one of Amazon's websites each month. This statistic shows how dominant the platform is in US ecommerce.
The latest available data puts the number of Amazon customers at over 300 million.


Let us assume that 65% of these customers visit the site every day.
300 million * 0.65 = 195,000,000 visits / day
195 million * 30 days = 5,850,000,000 visits / month
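The estimate above can be reproduced with a few lines of Python. This is a back-of-the-envelope sketch: the function name `estimate_traffic` is hypothetical, and the per-second figure is an even average over the day, which is an added assumption (real traffic has peaks well above the average).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_traffic(customers, daily_visit_percent, days_per_month=30):
    """Return (visits per day, visits per month, average visits per second)."""
    # Integer arithmetic keeps the totals exact.
    per_day = customers * daily_visit_percent // 100
    per_month = per_day * days_per_month
    # Assumes visits are spread evenly across the day.
    per_second = per_day / SECONDS_PER_DAY
    return per_day, per_month, per_second


per_day, per_month, per_second = estimate_traffic(300_000_000, 65)
print(f"{per_day:,} visits/day")
print(f"{per_month:,} visits/month")
print(f"{per_second:,.0f} visits/second on average")
```

Each visit typically triggers many backend requests, so the request rate the system has to serve is considerably higher than the visit rate.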

OpenCart Shopping Cart Solution

EC2 requirements:
CPU: at least 1 core. Memory: at least 1 GB.

Applicable Scene:
ecommerce, Online store, B2C, B2B, Product information management, international trading

Software Pricing Details
OpenCart Shopping Cart Solution
$0.055 /hr or $482.00 /yr
running on t2.medium

Infrastructure Pricing Details
Estimated Infrastructure Cost
$0.046 EC2/hr

Total/hr: $0.101
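As a quick check of the pricing above, the hourly total can be projected over a year. The prices are taken from the listing; running a single instance around the clock for 365 days is an assumption made only for this sketch.

```python
from decimal import Decimal

# Assumes one instance running 24 hours a day for a full year.
HOURS_PER_YEAR = 24 * 365

software_per_hour = Decimal("0.055")
ec2_per_hour = Decimal("0.046")
software_per_year_flat = Decimal("482.00")

# Hourly billing for both software and infrastructure.
total_per_hour = software_per_hour + ec2_per_hour
hourly_billing_per_year = total_per_hour * HOURS_PER_YEAR

# Annual software price plus hourly infrastructure.
annual_plan_per_year = software_per_year_flat + ec2_per_hour * HOURS_PER_YEAR

print(f"Total per hour:       ${total_per_hour}")
print(f"Hourly billing, year: ${hourly_billing_per_year:.2f}")
print(f"Annual plan, year:    ${annual_plan_per_year:.2f}")
```

`Decimal` is used instead of `float` so that the sums of prices stay exact.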



AWS is Amazon's cloud computing platform. The "cloud" is a collection of data centers owned and operated by Amazon around the world, which together provide a redundant service. Amazon built in-house systems to manage all of these servers, and AWS offers them as services such as Amazon Elastic Compute Cloud (EC2) and Amazon Simple Storage Service (S3).
Map of AWS Infrastructure Around the World

At the time of writing, the AWS Cloud spans 84 Availability Zones across 26 geographic regions, with plans announced for 24 more Availability Zones and 8 more AWS Regions in Australia, Canada, India, Israel, New Zealand, Spain, Switzerland, and the United Arab Emirates (UAE).


MongoDB
MongoDB is an open-source, cross-platform, document-oriented database for high-volume storage. It is a NoSQL database that works with JSON-like documents and optional schemas.
A MongoDB deployment can be clustered in two ways: with replica sets and with sharding.
A MongoDB replica set ensures data redundancy and high availability by keeping copies of the data on several MongoDB servers.

Without a replica set, all data is stored on a single server, and it is lost if that server fails. With a replica set, the data survives the loss of the primary server. This is why a replica set is so important for a MongoDB deployment.
Sharding is the process of storing data records across multiple machines. It is MongoDB's approach to scaling data horizontally; in other words, it makes enormous amounts of data easier to manage.
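The idea behind sharding can be illustrated with a minimal hash-based router. This is a generic sketch, not MongoDB's implementation: MongoDB uses its own shard keys, chunk ranges, and balancer. The function name `shard_for`, the use of MD5, and the in-memory lists standing in for servers are all assumptions made for illustration.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard index with a stable hash."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Three in-memory lists stand in for three database servers.
NUM_SHARDS = 3
shards = [[] for _ in range(NUM_SHARDS)]

for product_id in ["sku-1001", "sku-1002", "sku-1003", "sku-1004"]:
    shards[shard_for(product_id, NUM_SHARDS)].append(product_id)

for index, records in enumerate(shards):
    print(f"shard {index}: {records}")
```

A stable hash is used rather than Python's built-in `hash()`, which is randomized per process for strings and would send the same key to different shards after a restart.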

Redis Cluster
If a master goes down and a suitable replacement is located among the replicas, the cluster will trigger a failover, and the chosen replica will take over as the master. The apparent benefit of this solution is that you can spread your data over multiple servers for increased availability and scalability as needed.
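The failover step can be sketched as follows. This is a toy model rather than the Redis Cluster protocol, which also involves failure detection and voting between nodes. The `Node` class, the `elect_new_master` function, and the rule of promoting the healthy replica with the highest replication offset are assumptions made for illustration.

```python
class Node:
    """Hypothetical cluster node; fields are assumptions for illustration."""

    def __init__(self, name, replication_offset=0, alive=True):
        self.name = name
        # How much of the master's data this node has received.
        self.replication_offset = replication_offset
        self.alive = alive


def elect_new_master(master, replicas):
    """Return the node that should serve as master."""
    if master.alive:
        return master
    candidates = [replica for replica in replicas if replica.alive]
    if not candidates:
        raise RuntimeError("no healthy replica available for failover")
    # Prefer the replica that is most up to date, to minimize data loss.
    return max(candidates, key=lambda replica: replica.replication_offset)


master = Node("master", replication_offset=100, alive=False)
replicas = [Node("replica-1", 90), Node("replica-2", 95)]
print(elect_new_master(master, replicas).name)
```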

Elasticsearch Cluster
Elasticsearch is designed to be available at all times and to scale with your requirements. It achieves this by being distributed by nature. When you add servers (nodes) to a cluster to increase capacity, Elasticsearch automatically distributes your data and query load across all available nodes. There is no need to rewrite your application, because Elasticsearch knows how to balance multi-node clusters for scalability and high availability.

Cassandra Cluster
One of the most important features of a distributed data store is how it handles data and its replication throughout the cluster. If each partition is kept on a single node, every node is a single point of failure, and every node failure results in data loss. Such systems must be able to replicate data across multiple nodes, reducing the likelihood that a node failure causes data loss.
Cassandra's replication is well designed, with rack and data center awareness. As a result, it can be configured to place replicas so that availability is preserved even during catastrophic events such as switch failures, network partitions, or data center outages. Cassandra also has a planned approach for maintaining the replication factor when a node fails.
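Rack-aware replica placement can be sketched with a small token ring. This is a simplified illustration and not Cassandra's actual replication strategy: the `Ring` class, the MD5-based `token` function, and the one-token-per-node layout are assumptions made to keep the example short.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Map a string to a position on the ring (hash choice is an assumption)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class Ring:
    def __init__(self, nodes):
        # nodes maps a node name to the rack it lives on.
        self.racks = dict(nodes)
        self.ring = sorted((token(name), name) for name in self.racks)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key, replication_factor):
        """Pick replica nodes for a key, spreading them across racks first."""
        size = len(self.ring)
        start = bisect.bisect_left(self.tokens, token(partition_key)) % size
        # Nodes in ring order, starting at the owner of the key.
        ordered = [self.ring[(start + i) % size][1] for i in range(size)]

        chosen, used_racks = [], set()
        # First pass: only take nodes on racks that hold no replica yet.
        for name in ordered:
            if len(chosen) == replication_factor:
                break
            if self.racks[name] not in used_racks:
                chosen.append(name)
                used_racks.add(self.racks[name])
        # Second pass: fill the remaining slots with any unused node.
        for name in ordered:
            if len(chosen) == replication_factor:
                break
            if name not in chosen:
                chosen.append(name)
        return chosen


ring = Ring({"node-a": "rack-1", "node-b": "rack-1",
             "node-c": "rack-2", "node-d": "rack-2"})
print(ring.replicas("order:42", 3))
```

With a replication factor of 2 and two racks, the two copies always land on different racks, so losing one rack (for example through a switch failure) still leaves one copy reachable.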

Kafka
Kafka allows vast volumes of data to be ingested quickly into data lakes or warehouses. Businesses can use Kafka to gain real-time insight into their operations, allowing them to react to changing business situations as they happen.

Spark Job Server
Spark Job Server provides a RESTful API for managing Spark jobs, jars, and contexts, transforming Spark into a user-friendly service with a consistent API for all tasks.
So, what exactly is Spark?
Spark is a multi-purpose distributed data processing engine that may be used in a variety of situations. There are libraries for SQL, machine learning, graph computation, and stream processing that can be used in conjunction with the Spark core data processing engine.

Here is an example of a product table:

CREATE TABLE `shopping_cart`.`product` (
    -- `id` and `SKU` are referenced by the keys below; their types are assumed
    `id` INT(10) NOT NULL AUTO_INCREMENT,
    `name` VARCHAR(100) NOT NULL,
    `desc` TEXT NOT NULL,
    `SKU` VARCHAR(50) NOT NULL,
    `category` VARCHAR(50) NOT NULL,
    `price` DECIMAL(8,2) NOT NULL,  -- two decimal places added for cents
    `discount_id` INT(5) DEFAULT NULL,  -- NULL means no discount
    `created_at` TIMESTAMP NOT NULL,
    `modified_at` TIMESTAMP,
    UNIQUE KEY `prod_index` (`id`) USING BTREE,
    UNIQUE KEY `sku_index` (`id`,`SKU`) USING BTREE,
    PRIMARY KEY (`id`),
    CONSTRAINT `fk_prod_discount`
        FOREIGN KEY (`discount_id`)
        REFERENCES `shopping_cart`.`discount` (`id`)
);


Here is an example of a shopping_session table:

CREATE TABLE `shopping_cart`.`shopping_session` (
    -- `id` is referenced by the keys below; its type is assumed
    `id` INT(10) NOT NULL AUTO_INCREMENT,
    `user_id` INT(10) DEFAULT NULL,
    `total` DECIMAL(12,2) NOT NULL DEFAULT '0.00',  -- two decimal places added for cents
    `created_at` TIMESTAMP NOT NULL,
    `modified_at` TIMESTAMP,
    UNIQUE KEY `session_index` (`id`,`user_id`) USING BTREE,
    PRIMARY KEY (`id`),
    CONSTRAINT `fk_shopping_user`
        FOREIGN KEY (`user_id`)
        REFERENCES `shopping_cart`.`user` (`id`)
);


Here is an example of an order_details table:

CREATE TABLE `shopping_cart`.`order_details` (
    -- `id` is referenced by the keys below; its type is assumed
    `id` INT(10) NOT NULL AUTO_INCREMENT,
    `user_id` INT(10),
    `total` DECIMAL(12,2) NOT NULL,  -- two decimal places added for cents
    `payment_id` INT(20) NOT NULL,
    `created_at` TIMESTAMP NOT NULL,
    `modified_at` TIMESTAMP,
    UNIQUE KEY `order_index` (`id`) USING BTREE,
    UNIQUE KEY `customer_order_index` (`id`,`user_id`) USING BTREE,
    PRIMARY KEY (`id`),
    CONSTRAINT `fk_shopping_user_order`
        FOREIGN KEY (`user_id`)
        REFERENCES `shopping_cart`.`user` (`id`),
    CONSTRAINT `fk_order_payment`
        FOREIGN KEY (`payment_id`)
        REFERENCES `shopping_cart`.`payment_details` (`id`)
);

Load Balancer

During special discounts, holiday shopping, or the launch of a much-anticipated product, ecommerce websites frequently see a considerable increase in traffic. A website whose normal traffic is 500-600 visitors is unlikely to handle a sudden increase to 2000-5000 visitors on a single server.
Server load balancing divides the load between servers, allowing your website to scale without crumbling under the weight of additional traffic.
It also provides uninterrupted uptime: when a server fails, the load balancer automatically routes its traffic to the servers that are still up and running, so users do not experience downtime.

By spreading the load, the load balancer works together with auto scaling and reduces website latency.

In the age of 4G internet, every delay means lost income, so a load balancer directly helps the bottom line.
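A minimal round-robin load balancer that skips servers marked as down might look like the following. This is a sketch under stated assumptions: the `RoundRobinBalancer` class and its method names are hypothetical, health status is set by hand rather than by real health checks, and production load balancers offer more strategies (least connections, weighted routing, and so on).

```python
class RoundRobinBalancer:
    """Hands out servers in turn, skipping any that are marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once, starting where the last call stopped.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
balancer.mark_down("app-2")
print([balancer.pick() for _ in range(4)])
```

When auto scaling adds a server, it only needs to be appended to the list and marked healthy for it to start receiving its share of traffic.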


Summary

After two years of unpredictability and odd growth patterns, global retail and retail ecommerce expenditure is likely to normalize in 2022. Even in a slower-growth scenario, the overall amount of additional expenditure will be massive.
Amazon's design succeeds because it incorporates four important concepts that are present in all excellent shopping experiences, whether digital or analog, high-end or low-cost:

Great shopping experiences make prices and the purchasing procedure transparent and simple.

When individuals have an option between many items or versions of a product, exceptional shopping experiences make those product choices concrete and immediate, allowing them to make confident, wise decisions.

People want to know that the store they're dealing with is open and honest.
As a storefront that handles both first-party and third-party sales (Amazon's "Marketplace" accounts for about half of its total sales), Amazon faces a significant challenge in creating a consistent experience that lives up to its claims.
Amazon is betting that the potential for user confusion and the additional load created by mixing Marketplace items with first-party offerings are worth it in exchange for a consistent experience in the areas where customer issues are far more likely: shipping and returns. When you buy something on Amazon, whether directly or through the Marketplace, you still feel like you are buying it from Amazon. This allows Amazon to extend Prime's two-day shipping to third parties and streamline the returns process, both of which aim to build fundamental trust with users around whatever they buy on Amazon. This would be more difficult to do if Amazon were a platform that gave third-party sellers more control over the design of the experience.

People aren't always sure what they want or how to obtain it. Great shopping experiences anticipate problems and respond to issues before they arise.

With this article at OpenGenus, you should now have a good idea of the system design of Amazon.
