As we think about large-scale web applications, we need storage backends that scale and support concurrency. By scalability, we aim for increasable data capacity and growing read/write throughput of a high degree. The application servers in our model handle huge numbers of requests in parallel. As they, in turn, rely on the backend storage systems, we need to cope with highly concurrent access at this level as well.
Throughput and capacity increases can only be sustainably achieved by employing a horizontal scale mechanism. A single database server would only be able to scale up to a certain load, even with specialized hardware equipment. As a consequence, we need a concurrent and distributed system for scalable backend storage.
The backend storage must persistently hold the application state, thus we are also expecting some kind of data consistency when reading/writing data in our application. We are dealing with a distributed system, so we must expect failures in advance. In case of a failure of a single node, we still want the overall storage system to operate seamlessly and maintain its availability. Likewise, we are executing storage operations as part of web requests, thus we demand low latency of operations.
These general requirements lead us to an important barrier of distributed systems, Brewer's theorem on the correlation of consistency, availability and partition tolerance.
Consistency
The consistency property describes a consistent view of data on all nodes of the distributed system. That is, the system assures that operations have an atomic characteristic and changes are disseminated simultaneously to all nodes, yielding the same results. Availability
This property demands the system to eventually answer every request, even in case of failures. This must be true for both read and write operations. In practice, this property is often narrowed down to bounded responses in reasonable time. However, Gilbert and Lynch have confirmed the theorem even for unbounded, eventual responses. Partition tolerance
This property describes the fact that the system is resilient to message losses between nodes. A partition is an arbitrary split between nodes of a system, resulting in complete message loss in between. This affects the prior properties. Mechanisms for maintaining consistency must cope with messages losses. And according to the availability property, every node of any potential partition must be able to respond to a request.
The core statement of Brewer's theorem now goes as follows:
"You can have at most two of these properties for any shared-data system."
We have seen that all properties are desirable. But any real system must trade off the properties and dismiss at least one of them, as shown in figure So we have three distinct combinations with significantly different characteristics.
Consistency & Availability (CA)
The group of distributed systems following this model provides a consistent and available service, but does not tolerate partitions. In case of a partition, these systems may become inconsistent, as we will see soon. The combination is also known as high-availability consistency.
Contenders following this approach include most of the traditional relational database management systems. Replication is an important mechanism for achieving highly available consistency and transaction protocols such as the two-phase commit protocol are applied to ensure consistency. The separation into partitions may lead to so-called "split brain" scenarios, in which different partitions create conflicting replicas as a result of isolation.
Recovering from such scenarios would require some kind of consensus protocol. This in turn would disallow nodes to service requests unless a consensus is available. We would thus convert our CA approach into a CP approach at the sacrifice of availability. The shortcomings of coping with network errors renders the CA approach less suitable for larger distributed database systems.
Consistency & Partition Tolerance (CP)
Distributed systems at the intersection of consistency and partition tolerance provide a strongly consistent service. Consistency is guaranteed even in the presence of a partition. However, nodes of a partition may not be able to respond to requests as long as they have not yet agreed with other nodes that may be temporarily unreachable. As a result, availability cannot be always provided. This combination of properties is also known as enforced consistency.
Throughput and capacity increases can only be sustainably achieved by employing a horizontal scale mechanism. A single database server would only be able to scale up to a certain load, even with specialized hardware equipment. As a consequence, we need a concurrent and distributed system for scalable backend storage.
The backend storage must persistently hold the application state, thus we are also expecting some kind of data consistency when reading/writing data in our application. We are dealing with a distributed system, so we must expect failures in advance. In case of a failure of a single node, we still want the overall storage system to operate seamlessly and maintain its availability. Likewise, we are executing storage operations as part of web requests, thus we demand low latency of operations.
These general requirements lead us to an important barrier of distributed systems, Brewer's theorem on the correlation of consistency, availability and partition tolerance.
Consistency
The consistency property describes a consistent view of data on all nodes of the distributed system. That is, the system assures that operations have an atomic characteristic and changes are disseminated simultaneously to all nodes, yielding the same results. Availability
This property demands the system to eventually answer every request, even in case of failures. This must be true for both read and write operations. In practice, this property is often narrowed down to bounded responses in reasonable time. However, Gilbert and Lynch have confirmed the theorem even for unbounded, eventual responses. Partition tolerance
This property describes the fact that the system is resilient to message losses between nodes. A partition is an arbitrary split between nodes of a system, resulting in complete message loss in between. This affects the prior properties. Mechanisms for maintaining consistency must cope with messages losses. And according to the availability property, every node of any potential partition must be able to respond to a request.
The core statement of Brewer's theorem now goes as follows:
"You can have at most two of these properties for any shared-data system."
We have seen that all properties are desirable. But any real system must trade off the properties and dismiss at least one of them, as shown in figure So we have three distinct combinations with significantly different characteristics.
Consistency & Availability (CA)
The group of distributed systems following this model provides a consistent and available service, but does not tolerate partitions. In case of a partition, these systems may become inconsistent, as we will see soon. The combination is also known as high-availability consistency.
Contenders following this approach include most of the traditional relational database management systems. Replication is an important mechanism for achieving highly available consistency and transaction protocols such as the two-phase commit protocol are applied to ensure consistency. The separation into partitions may lead to so-called "split brain" scenarios, in which different partitions create conflicting replicas as a result of isolation.
Recovering from such scenarios would require some kind of consensus protocol. This in turn would disallow nodes to service requests unless a consensus is available. We would thus convert our CA approach into a CP approach at the sacrifice of availability. The shortcomings of coping with network errors renders the CA approach less suitable for larger distributed database systems.
Consistency & Partition Tolerance (CP)
Distributed systems at the intersection of consistency and partition tolerance provide a strongly consistent service. Consistency is guaranteed even in the presence of a partition. However, nodes of a partition may not be able to respond to requests as long as they have not yet agreed with other nodes that may be temporarily unreachable. As a result, availability cannot be always provided. This combination of properties is also known as enforced consistency.
In practice, this approach is important when consistency must be enforced at any costs, and the system is still inherently distributed by design. That is for instance a banking application, where the balance of all accounts is a primary constraint. Database systems implementing this model are also often based on relational database systems. Supporting consistent states even in case of network errors requires the usage of sophisticated algorithms for quorum and majority decisions. Such a protocol for solving consensus is the Paxos protocol [Lam98].
Availability & Partition Tolerance (AP)
Distributed systems that are always available and tolerate partitions maintain their service even when partitions occur. However, a reachable node may then hold (temporarily) inconsistent state. Thus, this intersection of properties is also known as eventual consistency.
In terms of data and state, the sacrifice of strong consistency guarantees appears to be questionable at first glance. However, many applications can indeed tolerate deferred consistency properties when favoring availability at all costs. In this case, it is important to keep in mind potential issues due to eventually consistent data on application level during development. Well known examples following this approach are the DNS or web caches. Stale data (e.g. host mappings resp. cached responses) are acceptable for a while, but eventually the latest version of the data disseminates and purges older entries. As eventual consistency entails the possibility of data conflicts, appropriate resolution mechanisms must be employed. Often, conflict resolution is also part of the application logic, and not only provided by the database system. In general, conflicts are resolved either on read or write operations, or asynchronously in the background, decoupled from client operations.
Distributed systems that are always available and tolerate partitions maintain their service even when partitions occur. However, a reachable node may then hold (temporarily) inconsistent state. Thus, this intersection of properties is also known as eventual consistency.
In terms of data and state, the sacrifice of strong consistency guarantees appears to be questionable at first glance. However, many applications can indeed tolerate deferred consistency properties when favoring availability at all costs. In this case, it is important to keep in mind potential issues due to eventually consistent data on application level during development. Well known examples following this approach are the DNS or web caches. Stale data (e.g. host mappings resp. cached responses) are acceptable for a while, but eventually the latest version of the data disseminates and purges older entries. As eventual consistency entails the possibility of data conflicts, appropriate resolution mechanisms must be employed. Often, conflict resolution is also part of the application logic, and not only provided by the database system. In general, conflicts are resolved either on read or write operations, or asynchronously in the background, decoupled from client operations.