CAP theorem (Brewer's theorem)

Reading time: 15 minutes

The CAP theorem is the idea that a distributed computing system is not able to provide partition tolerance, consistency and availability at the same time. For specifically, a distributed computing system must chose two of the following three:

  • partition tolerance
  • consistency
  • availability

CAP theorem was developed in 2000 by Eric Brewer.

Explanation

To understand this theorem, you need to understand three fundamental concepts:

  • partition tolerance
  • consistency
  • availability

Partition Tolerance

Partition Tolerance signifies that the system continues to run, despite the number of messages being delayed by the network between nodes.

A system that is partition-tolerant can sustain any amount of network failure that doesn’t result in a failure of the entire network. Data records are sufficiently replicated across combinations of nodes and networks to keep the system up through intermittent outages.

When dealing with modern distributed systems, Partition Tolerance is not an option. It’s a necessity. Hence, we have to trade between Consistency and Availability.

High Consistency

High Consistency states that all nodes see the same data at the same time.

In simple terms, performing a read operation will return the value of the most recent write operation causing all nodes to return the same data.

A system has consistency if a transaction starts with the system in a consistent state, and ends with the system in a consistent state. In this model, a system can shift into an inconsistent state during a transaction but the entire transaction gets rolled back if there is an error during any stage in the process.

High Availability

High Availability states that every request gets a response on success or failure.

Achieving availability in a distributed system requires that the system remains operational 100% of the time. Every client gets a response, regardless of the state of any individual node in the system. This metric is trivial to measure: either you can submit read/write commands, or you cannot. Hence, the databases are time independent as the nodes need to be available online at all times.

The theory proposes that when a network has been partitioned to ensure that a network failure will not prevent communication between servers, the distributed system must choose between consistency or availability.

This is captured by the following image:

cap

Conclusion

The CAP theorem has primarily proven useful for establishing priorities in database server infrastructure and configuration. In such a scenario, it is still possible to achieve both consistency and availability within acceptable parameters.

For example, data may be allowed to be inconsistent for short periods of time while new writes propagate throughout the system. Critical servers that handle client read/writes may be partitioned in such a way that failures in other sections do not noticeably affect performance for end users.