Backpressure and Exponential Back-off to handle overloaded systems


In this article, we explore how overloaded networks are handled using techniques such as backpressure and exponential back-off. Let us first look at the reasons for overload and congestion, after which we will learn how to mitigate them.

Table of contents:

  1. How do systems get overloaded?
  2. Mitigation of Overload: Congestion Control
  3. Backpressure
  4. Exponential Back-off

Let us get started with Backpressure and Exponential Back-off to handle overloaded systems.

How do systems get overloaded?

Some causes of overload in systems are:

  1. Congestion depends on the capacity of the devices that make up the network. If routers, switches and other devices are loaded beyond their handling capacity, congestion is bound to occur, slowing down traffic.

  2. Outdated hardware can also contribute to congestion, since older devices cannot keep up with the growth of the network.

  3. If the design of the network is not tailored to the operations of the organisation, the chances of congestion rise.

  4. Low bandwidth availability in the network.

Mitigation of Overload: Congestion Control

In networks, data travels in packets, and routers maintain input and output queues to keep track of them. Every packet takes some time to process at a router, during which the router determines the path the packet must take using its routing table. The input queue stores incoming packets in the order they arrive while the router is busy processing a packet. The output queue stores outgoing packets before they are sent on their way.

The router may experience congestion due to the following technicalities:

  1. There are too many incoming packets.

  2. Packets arrive faster than the router can run its routing algorithms and forward them, causing the input queue to grow longer and longer.

  3. If packets are processed faster than the outgoing line can carry them, the output queue will grow long.

In TCP networks, when data is sent across the network by the source, an acknowledgment is sent by the receiver after the data is received. For every segment of data, the sender maintains a retransmission timer which keeps track of the time since the segment was sent. If the acknowledgment is not received by the sender regarding a transmitted segment within a certain time limit, termed as the retransmission timeout, the segments are retransmitted. This lack of acknowledgment can be due to many reasons such as loss of packet, corruption of packet or delay in propagation.

It gets tricky when the queues at a node are at maximum capacity, since all forthcoming packets cannot be stored and important data will be lost.
So, if the network is congested at some point and data transmission is hindered, acknowledgments may not reach the sender even though the data has been received. Once the timeout expires, the sender may flood the channel with data once again, adding even more traffic and congestion. This overload may result in network crashes.
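The sender's retransmission loop described above can be sketched as a small simulation. Note that `transmit`, the loss rate and the retry cap are hypothetical stand-ins for illustration, not real TCP internals:

```python
import random

def transmit(segment, loss_rate=0.5):
    """Simulated unreliable channel: returns True if an ACK comes
    back before the retransmission timeout, False otherwise."""
    return random.random() > loss_rate

def send_segment(segment, max_tries=8, loss_rate=0.5):
    """Sender's retransmission loop: resend the segment each time the
    retransmission timer expires without an ACK, up to max_tries."""
    for attempt in range(1, max_tries + 1):
        if transmit(segment, loss_rate):
            return attempt       # ACK received on this attempt
        # timeout expired with no ACK: loop around and retransmit
    return None                  # retransmission limit reached; give up
```

Notice that the loop retransmits blindly on every timeout: it cannot tell a lost segment from a lost acknowledgment, which is exactly how retransmissions can add traffic to an already congested network.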

When the network is congested, we will see a drop in the speed at which data travels from source to destination.

Hence, congestion control is a definite must for a smoothly-functioning network.

We will now explore two methods which take place at different points of the network to aid in congestion control.

Backpressure

Backpressure in networks is analogous to backpressure in physics. It is a closed-loop congestion control measure, i.e., it is applied after congestion occurs, to mitigate its effects.

When a node (i.e., a router) has too many incoming packets and becomes overloaded, the congested node stops receiving data from the node immediately before it. This eventually fills up the previous node's outgoing queue, congesting it in turn. The congestion propagates backwards as nodes stop accepting packets from their predecessors, and may eventually reach the source. Once the source is made aware that the network is experiencing congestion, it stops sending data, reducing the load. Eventually, the packets resume their flow as the congestion clears up.

A very important aspect of backpressure is limiting the size of queues at every node. If the queues are long or possibly limitless, the packets will keep lining up without any indication of congestion or overload. In such cases, the point of congestion cannot be determined and corrective action cannot be undertaken.
So we see that backpressure is a node-to-node mechanism in which congested nodes apply a backward pressure, i.e., pressure against the normal flow of data, to reduce the load on the node and alleviate congestion. Backpressure is applied starting from the point of congestion.
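A minimal sketch of this idea in Python: a bounded queue sits between a fast producer and a slow consumer, and because `put()` blocks when the queue is full, the slowdown propagates back to the producer instead of packets piling up without limit. The queue size and sleep time are arbitrary illustration values:

```python
import queue
import threading
import time

# Bounded queue: put() blocks when the queue is full, pushing the
# slowdown back onto the producer instead of buffering endlessly.
buffer = queue.Queue(maxsize=4)

def producer(n):
    for i in range(n):
        buffer.put(i)            # blocks whenever the consumer falls behind

def consumer(n, out):
    for _ in range(n):
        out.append(buffer.get())
        time.sleep(0.01)         # simulate slow packet processing

received = []
t1 = threading.Thread(target=producer, args=(20,))
t2 = threading.Thread(target=consumer, args=(20, received))
t1.start(); t2.start()
t1.join(); t2.join()
print(received)                  # all packets arrive, in order
```

The key design point is the `maxsize` bound: with an unbounded queue, the producer would never feel the consumer's slowness, which mirrors the remark above about limitless queues hiding congestion.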

Exponential Back-off

The concept of exponential back-off was first seen in the pure ALOHA network, and it describes how retries should be scheduled.

If you are in a congested network, or the data you need is stored in a server suffering from overload, your requests or queries may not be serviced, since they would only add to the already excessive traffic. So, the sender is asked to back off and try again later. But if the system is overloaded, an immediate retry will have no effect, because the system is still under heavy traffic. So, all requests must back off for a certain time period before retrying.
But even then, if the overload is resolved sooner than the back-off time expires, essential computing time will be lost.

So, retries are scheduled using back-off algorithms. Let us look at the most basic implementation, termed binary exponential back-off.

The formula for this looks like

Back-off time = random(0, 2^K − 1) × Maximum propagation time

  • Here, K is the number of unsuccessful retries. After every failed attempt, the value of K increases by 1, so the back-off time increases exponentially. Since the retry attempts happen later and later, the system gets enough time to clear pending queries from the overload and tackle congestion.

  • random(0, 2^K − 1) picks a random integer between 0 and 2^K − 1, so that all users requesting data do not come back at the same moment after their back-off time expires. The randomness of the back-off reduces the probability of clashes between multiple users, giving the incoming requests, queries and packets a more even and consistent flow.

  • Maximum propagation time refers to the time taken in the network for a data packet/segment to make a round trip from the sender to the receiver and back.
    It must also be noted that this exponential increase does not continue endlessly while the system suffers from overload.

  • The number of retries is capped at a value K_max, after which the system is asked to retry later and the connection is severed by a circuit breaker. All further requests made by the user will, for the time being, be rejected directly by the circuit breaker instead of being conveyed to the system and experiencing a time-out. This also turns away new users in overloaded networks immediately, preventing them from adding to the congestion. They can retry at a later time, when the server or network has the capacity to accommodate their queries.
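The formula and the cap above can be sketched in a few lines of Python. The maximum propagation time and the cap K_MAX here are assumed values chosen for illustration:

```python
import random

MAX_PROPAGATION_TIME = 0.5   # seconds; assumed round-trip time for this sketch
K_MAX = 10                   # assumed cap on the back-off exponent

def backoff_time(k):
    """Binary exponential back-off: wait a random multiple of the
    maximum propagation time, chosen from 0 .. 2^K - 1 slots."""
    k = min(k, K_MAX)                     # cap the exponent at K_MAX
    slots = random.randint(0, 2**k - 1)   # random integer in [0, 2^K - 1]
    return slots * MAX_PROPAGATION_TIME
```

After each failed attempt the caller increments k, so the expected wait roughly doubles each time; capping k at K_MAX keeps the maximum wait bounded even when the overload persists.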

Exponential Back-off happens at the user's end or the sender's end.

Thus, in this article at OpenGenus, we have seen how overloaded networks and systems are handled using the concepts of backpressure and exponential back-off. Keep learning!