
Design of Rate Limiting System



"Wisdom is not a product of schooling but of the lifelong attempt to acquire it." – Albert Einstein. That is the thought for today. If you want to make sure services and resources are used reasonably, then you are at the right place! Let's begin!!

The first question that comes to mind is: what is rate limiting? Basically, rate limiting is used to control the amount of incoming and outgoing traffic within a network. Consider a website of your own that handles multiple requests. Someone may request a huge amount of data at a time, which puts a heavy load on the other users of the server. With a rate limiting system in place, these kinds of problems do not arise.

When you want to protect your services from excessive use, rate limiting systems become your best friend! Additional benefits include avoiding excess costs, controlling traffic flow and preventing resource starvation.

Why do we need Rate Limiting System?

In simple words, rate is the number of occurrences of an event over a specific period of time. Rate limiting systems are important on both the client side and the server side as a defensive mechanism for services and resources.
Some of the reasons are:

1. Avoiding Excessive Costs
Cost-motivated limits are necessary to meet each user's demand while also controlling the costs and investments of the organization/company.
When limits are in place and a user makes more requests than allowed, either the server returns an error message or the bill for that application increases accordingly.

2. Managing Policies
Services and resources are shared widely across applications and platforms, and keeping a limit on them ensures their reasonable use.

3. Preventing Starvation and Improving User Experience
Starvation in this case refers to hunger for resources and services. If a single user consumes all the services offered by the application, other users waiting to avail the services will starve. This concern is especially common in API-based services.

Every user should be able to experience the beauty of the application, and no user should starve.

4. Security
Some malicious users or hackers try to steal data, say login credentials, by making repeated requests with trial-and-error methods (technically, the brute-force method). Keeping a limit per day, or per any specific time period, offers more security.

Real Life Scenarios of Rate Limits

Scenario 1:
You have an application deployed on a server that handles requests and responds accordingly. One day you find there are 5 times more requests than usual, which puts a heavy load on the server.

Scenario 2:
You have a product that provides APIs for applications like ML, weather, etc. You let the user try out your services before purchasing them.

Rate limiting systems are everywhere. All well-known public services use rate limits. Some of them, to mention a few: YouTube, Facebook, Google, Instagram, cloud services and what not!

Standard Techniques For Enforcing Rate Limits

Token Bucket

A token is analogous to the number of times a service can be used. Whenever there is a request for a particular service, a token is taken out of the bucket in order to fulfill the request. If the bucket is left with no tokens, it implies that the service quota is used up completely.

For example, the token bucket is used to check whether a data packet conforms to the bucket, i.e. whether the packet fits within the allotted bandwidth. Once the packet is conformed, the contents of the bucket change. If the bucket has fewer tokens than required, the packet is non-conformant.

Applications:

  1. Packet-switched networks (communication happens in terms of packets of data).
  2. Telecommunication networks.

Advantages:

  1. Better than the leaky bucket algorithm.
  2. Tokens are discarded, not packets, so no data is lost.

Disadvantages:

  1. When the bucket is full, tokens are discarded.

Algorithm:

step 1: Tokens are generated and added to the bucket at the rate of one token every second.
step 2: The bucket has a fixed capacity, and if a token arrives when the bucket is full, it is discarded.
step 3: If the packet to be conformed is of n bytes, it requires n tokens. If the bucket has enough tokens, i.e. n or more, the packet is conformant; otherwise it is considered non-conformant.
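The steps above can be sketched in Python. This is a minimal sketch, not a production implementation; the class and method names (TokenBucket, allow) are illustrative, not from the original text:

```python
import time

class TokenBucket:
    """A bucket that refills at `rate` tokens per second, up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)      # start full
        self.last_refill = time.monotonic()

    def allow(self, n_tokens: int = 1) -> bool:
        """Consume n_tokens if available; the request is conformant iff True."""
        now = time.monotonic()
        # Refill for the elapsed time; tokens above capacity overflow and are discarded.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= n_tokens:
            self.tokens -= n_tokens
            return True
        return False
```

With `TokenBucket(rate=1, capacity=5)`, a burst of five requests is conformant immediately; a sixth request in the same instant is rejected until the bucket refills.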

Leaky Bucket

The leaky bucket is analogous to the amount that can leak out of a bucket. Basically, a system can serve at most N requests, where N is the queue size. The system holds and handles requests as long as it has the capacity to do so. Any extra request spills out of the bucket, i.e. it is simply discarded.

Applications:

  1. Traffic policing in telecommunication and packet-switched computer networks.

Advantages:

  1. The queue used is of constant size in most cases, which reduces memory usage.
  2. As requests are processed at a constant rate, bursts of requests do not affect the system. The queue is FIFO based.

Disadvantages:

  1. The time taken for a request to completely avail the service is not fixed.
  2. When the bucket is full, packets are discarded, which is a loss of data.

Algorithm:

step 1: Initialize the counter to N with respect to the clock.
step 2: Compare N with the size of the packet at the front of the queue. If N is greater than the packet size, send the packet into the network and decrement the counter.
step 3: Repeat step 2 until N is less than the size of the packet.
step 4: Reset the counter and go to step 1.


// Initialize no_of_queries, storage, output_pkt_size, input_pkt_size, bucket_size, size_left
int no_of_queries = 4, storage = 0, output_pkt_size = 1;
int input_pkt_size = 4, bucket_size = 10, size_left;

for (int i = 0; i < no_of_queries; i++) {
    size_left = bucket_size - storage;     // space left in the bucket
    if (input_pkt_size <= size_left) {
        storage += input_pkt_size;         // packet fits, add it to the bucket
    } else {
        System.out.println("Packet loss = " + (input_pkt_size - size_left));
        storage = bucket_size;             // bucket is full
    }
    System.out.println("Buffer size= " + storage +
                       " out of bucket size= " + bucket_size);
    storage -= output_pkt_size;            // bucket leaks at a constant rate
}

Fixed Window

Although this is the simplest technique, it is limited in principle because it can give an incorrect count. For each unit time window there is a bucket, which maintains the count of the requests made.

Advantages:

1.The algorithm works on simple logic.
2.Requires less memory.

Disadvantages:

1.Fails to handle bursts of requests, especially at the window boundaries.

Algorithm:

step 1: Assign a counter to each window, which counts the number of requests made in that window.
step 2: If the value of the counter exceeds the limit, drop the remaining requests.
step 3: Reset the counter after the window elapses.
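These steps can be sketched in Python. This is a minimal sketch; the class name FixedWindowLimiter and the per-(key, window) counter dictionary are illustrative assumptions, not from the original text:

```python
import time
from collections import defaultdict

class FixedWindowLimiter:
    """At most `limit` requests per key per fixed window of `window_sec` seconds."""

    def __init__(self, limit: int, window_sec: int):
        self.limit = limit
        self.window_sec = window_sec
        self.counters = defaultdict(int)   # (key, window index) -> request count

    def allow(self, key: str, now: float = None) -> bool:
        if now is None:
            now = time.time()
        window = int(now // self.window_sec)      # index of the current window
        if self.counters[(key, window)] >= self.limit:
            return False                          # limit reached: drop the request
        self.counters[(key, window)] += 1
        return True
```

Note that the counter effectively resets when the window index changes, which is exactly the boundary behaviour the disadvantage above refers to: a burst at the end of one window plus a burst at the start of the next can pass through back to back.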

Sliding Window

A queue of timestamps is maintained for each user, which gives the count of the requests made. If the number of requests made lies within the allotted count, the request is serviced; otherwise an exception is raised.

Instead of completely resetting the counter after every window, we use the information from the previous window's counter to estimate the size of the current one.

Advantages:

1.Easy to implement.
2.Solves the boundary-traffic problems that occur in the fixed window approach.
3.Has more real-life applications and is used in many companies.

Algorithm:

step 1: Assign a counter for each window.
step 2: Check the previous window's request rate, weighted by the current timestamp, to smooth out bursts of traffic.


def is_allowed(key: str) -> bool:
    # Returns True if the request goes through and False otherwise.
    current_time = int(time.time())

    # Fetch the configuration for the given key
    # the configuration holds the number of requests allowed in a time window.
    config = get_ratelimit_config(key)

    # Fetch the current window for the key
    # The window returned, holds the number of requests served since the start_time
    # provided as the argument.
    start_time = current_time - config.time_window_sec
    window = get_current_window(key, start_time)

    if window.number_of_requests >= config.capacity:
        return False
    # Since the request goes through, register it.
    register_request(key, current_time)
    return True
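The pseudocode above relies on helpers (get_ratelimit_config, get_current_window, register_request) that are left undefined. A self-contained version of the same timestamp-log idea can be sketched as follows; the class and method names are illustrative assumptions, not part of the original pseudocode:

```python
import time
from collections import deque

class SlidingWindowLog:
    """Keeps a queue of request timestamps per key, as described above."""

    def __init__(self, capacity: int, window_sec: float):
        self.capacity = capacity        # requests allowed per window
        self.window_sec = window_sec    # window length in seconds
        self.logs = {}                  # key -> deque of request timestamps

    def is_allowed(self, key: str, now: float = None) -> bool:
        if now is None:
            now = time.time()
        log = self.logs.setdefault(key, deque())
        # Drop timestamps that have slid out of the current window.
        while log and log[0] <= now - self.window_sec:
            log.popleft()
        if len(log) >= self.capacity:
            return False
        log.append(now)                 # the request goes through: register it
        return True
```

Because old timestamps are evicted continuously rather than at window boundaries, this approach avoids the double-burst problem of the fixed window, at the cost of storing one timestamp per served request.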

Hierarchical token bucket

The hierarchical token bucket algorithm is used to limit clients' upload/download speed so that a single user does not saturate the bandwidth. Hierarchical token bucket is called HTB for short.

As the name suggests, in a hierarchical token bucket an arbitrary number of token buckets are arranged in a hierarchy.

Applications:

1.Queuing mechanism for the Linux traffic control system.
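A minimal sketch of the hierarchy idea, assuming a child bucket may borrow unused tokens from its parent when its own bucket is empty (the class name Bucket and the borrowing rule are illustrative assumptions; real HTB in Linux is considerably more involved):

```python
class Bucket:
    """A token bucket that may borrow tokens from its parent bucket."""

    def __init__(self, tokens: int, parent: "Bucket" = None):
        self.tokens = tokens
        self.parent = parent

    def try_consume(self, n: int = 1) -> bool:
        if self.tokens >= n:
            self.tokens -= n
            return True
        # Own bucket is empty: try to borrow unused capacity up the hierarchy.
        return self.parent is not None and self.parent.try_consume(n)
```

For example, with a root bucket shared by several children, each child is guaranteed its own tokens, and spare root capacity is shared among whichever children need it.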

Different levels at which Rate Limits can be applied

  1. User-based rate limits - a constraint on the number of requests per user.
  2. Concurrency - how many parallel sessions are allowed?
  3. Location-based / IP-address-based limits.
  4. Server-based - used when a particular server offers and controls a service.

Suitable Database Type

In the case of distributed systems, we come across problems with rate limiting such as inconsistency and race conditions. Relaxing the rate limit is not the better way to avoid these; rather, we can change how the database stores the count and how the rate is updated.
Local memory with cloud sync - Sometimes the count recorded in the cloud tends to be higher than the actual number of requests made. Updating the count both in the user's local memory and in the cloud can give us the actual count.

Without rate limiting, users can make as many requests as they want, which would lead to server crashes or starvation, thereby not rendering a smooth experience for users and being less cost effective.

Hope you had a good time going through the article at OPENGENUS!
Best Wishes for all the endeavors!!
