CAP Theorem in Modern Distributed Systems

Sanket Patil
3 min readOct 17, 2021

In modern systems, vertical scalability has it’s limitations due to upper bound it can hit and have certain point of diminishing returns (ratio of value to cost) beyond which cost increases exponentially for every unit of value. Horizontal scalability has become go to method of scaling huge parallel workloads or databases. Scaling horizontally means adding more machines and have them co-ordinate over the network, which induces some pain points in system architecture along with trade offs between consistency and availability.

The CAP theorem states that a distributed database system can only guarantee two out of these three characteristics: Consistency, Availability, and Partition Tolerance.

Consistency: A system is said to be consistent if all nodes see the same data at the same time.

Availability: Availability means the system remains operational all the time with no downtime.

Partition Tolerance: It basically means that you’re communicating over an asynchronous network that may delay or drop messages.

In horizontally scaled systems, partition tolerance is a necessity because there are multiple nodes or datacenters communication over networks. So decision makers of system architecture have no choice in this matter, as it is essential. That leaves with a decision to be made between consistency and availability. A system can either be fully consistent or fully available, because there is no way to replicate changes from node A to node B at same instant it is written. You can either stop serving client requests (unavailability) until changes get replicated from A to B, to make it fully consistent. Or you can keep serving the requests (accounts for availability), but reads from node B will return stale values (inconsistency) for a while.

During designing a system, a decision must be taken on trade off between consistency and availability based on nature of application and use case. For example in financial applications/systems, consistency has utmost important. Reason is obvious, no one would like to see wrong balance in their bank account. So availability can be traded off for enhanced consistency.

In social media or video sharing applications/systems, availability is more important than consistency due to lesser critical nature. For example no one will care if ‘like count’ of their post is 500 but in reality 520 people have liked the post. So consistency can take a hit, but the application must have high availability.

Digging Deeper:

In real world, there are multiple forms of consistency. They are broadly defined between stronger models and weaker models. Strong consistency means changes are immediately visible to all subsequent requests. Weak means they aren’t.

Consistency in CAP is actually linearizability which is different from ‘C’ of ACID which is a property associated with transactional relational databases. Linearizability can be interpreted as client side consistency. Formally it’s defined as “If operation B started after operation A successfully completed, then operation B must see the the system in the same state as it was on completion of operation A, or a newer state”. In Leyman’s terms it could be understood using simple scenario: Consider person A and B are following a game of cricket. If person A sees that game is over means person B who checks score after A, he must see that game is over.

As for availability in CAP, it is formally defined as “every request received by a non-failing [database] node in the system must result in a [non-error] response”. Focus is on ‘non-failing’ and ‘non-error’ part. In typical notion, availability is defined by some uptime metric (Like number of nines. i.e. It is 99% available means it’s down only for ~3.6 days). The difference between these notions is that your node can be up for 99% of time and not serve request in that period because of some reason (server overload, bug in application etc.). To be CAP available, the node needs to give valid non-error responses to client requests in the period it is available (~364 days).

Sources:

Official CAP research paper:

https://www.comp.nus.edu.sg/~gilbert/pubs/BrewersConjecture-SigAct.pdf)

Martin Kleppmann’s Blog:

https://martin.kleppmann.com/2015/05/11/please-stop-calling-databases-cp-or-ap.html)

--

--