This site hosts historical documentation. Visit www.terracotta.org for recent product information.

Bidirectional WAN Replication

This page covers the features of Bidirectional WAN replication. For a comparison of Unidirectional and Bidirectional modes, refer to the Replication Modes section.

Configuration

The default mode is Unidirectional, and Bidirectional mode must be specified in the wan-config.xml. For configuration information, refer to the Orchestrator Configuration Parameters section.

Operation

With Bidirectional mode, Replica caches notify their Master cache of any modification they receive, and the Master cache either applies or rejects the modification based upon its tracking of each cache element.

In the figure below, Master caches in Region 1 are distributed between the application servers and the Terracotta servers of the TSA. Cache modifications in either region are communicated through the Orchestrators, and the Master caches determine which modifications become part of the authoritative data set.

Master-Replica

Figure 1. Bidirectional replication with Master caches in Region 1 and Replica caches in Region 2

Data Consistency and Conflict Resolution

With Bidirectional mode, the WAN replication service balances high performance with eventual data consistency through synchronization among Orchestrators. Two additional strategies are employed for data consistency: element versioning and optimistic local modifications.

Please note when a key is getting updated simultaneously at high frequency from multiple regions, it is possible that even with conflict resolution, the key values will diverge. Therefore we recommend in bi-directional configurations to avoid conflicts when possible, in particular to use distinct caches (only updated in specific region), then replicated to other region. This will allow data to be replicated and available in multiple regions, but avoid any data conflict.

Element versioning

Element versioning is the core component of data consistency and conflict resolution. All modifications (puts/updates/deletes) to an element cause an internal version to be incremented once successfully applied. Versioning ensures there are no out-of-order updates applied to the Master cache (no duplicates, no backward counting, and no gaps), and versioning is used to resolve simultaneous conflicting updates.

Conflict resolution is invoked for two updates arriving for the same key with the same version, to determine which update is applied and which is rejected. The decision is made based upon the order in which the Master cache processes the updates. The first processed update wins.

The diagram below shows concurrent updates from Regions 2 and 3 on the same key (Key1) being resolved by the Master cache in Region 1. Region 2 (Key1=aaa) wins because its update arrives at the Master cache first. This results in a repair of Region 3, where Key1=bbb is repaired to Key1=aaa.

Data Conflict Resolution 2

Figure 2. For the same version of the element, the Master cache repairs the second-to-arrive value to match the first-to-arrive value

Optimistic local modifications with Master repair

To improve performance, local modifications (put/remove) are optimistically applied to the cache. Subsequent reads of the cache by the local application will return the expected modified value.

If there is a race to modify an element in the Master cache, the Master cache can force a repair of the Replica's value. A Master repair means that one of the racing modifications was rejected, and the Master cache has sent a repair message that causes the local element to now have the winning state. This can mean that a put may be overwritten with the winning value, or a losing remove may cause the local remove to be reverted.

In the following diagram, Region 1 performs a put (xy=1 at version 1), while Region 2 simultaneously performs put (xy=2 also at version 1). Because Region 1 contains the Master cache, it replicates xy=1:v1 to Region 2. And when the Master cache rejects the put from the Replica cache, it automatically resolves and sends a "repair" of the value back to Region 2.

Data Conflict Resolution 1

Figure 3. Process of a Master cache repairing a Replica cache

After a repair, if there are any rejected values in any local caches in the region, they will be cleared with the existing process for non-replicated caches. For more information, refer to the Expiration and Eviction page.