This document shows you how to architect a Terracotta server array to add cluster reliability, availability, and scalability.
Terracotta server arrays can vary from a basic two-node tandem to a multi-node array providing configurable scale, high performance, and deep failover coverage.
The main features of a Terracotta server array include:
The major components of a Terracotta server array are:
Figure 1 illustrates a Terracotta cluster with three mirror groups. Each mirror group has an active server and a standby, and manages one third of the shared data in the cluster.
A Terracotta cluster has the following functional characteristics:
There can never be more than one active server instance per mirror group, but there can be any number of standbys. In Figure 1: A Server Array With 3 Mirror Groups, Mirror Group 1 could have two standbys, while Mirror Group 3 could have four standbys.
The number of partitions equals the number of mirror groups. In Figure 1: A Server Array With 3 Mirror Groups, each mirror group has one third of the shared data in the cluster.
Failover is provided within each mirror group, not across mirror groups. This is because mirror groups provide scale by managing discrete portions of the shared data in the cluster -- they do not replicate each other. In Figure 1: A Server Array With 3 Mirror Groups, if Mirror Group 1 goes down, the cluster must pause (stop work) until Mirror Group 1 is back up with its portion of the shared data intact.
No additional configuration is required to coordinate active server instances.
In Figure 1: A Server Array With 3 Mirror Groups, the L2 PASSIVE servers can be shut down, replaced, or joined by new L2 PASSIVE servers with no effect on cluster functions. However, to add or remove an entire mirror group, the cluster must be brought down.
To successfully configure a Terracotta server array using the Terracotta configuration file, note the following:
"permanent-store" means that application state, or shared in-memory data, is backed up to disk. In case of failure, it is automatically restored. Shared data is removed from disk once it no longer exists in any client's memory.
For more information on Terracotta configuration files, see:
Certain versions of Terracotta provide tools to create backups of the Terracotta server array's disk store. See the Terracotta Operations Center and the Database Backup Utility (backup-data) for more information.
In a development environment, persisting shared data is often unnecessary and even inconvenient. It puts more load on the server, while accumulated data can fill up disks or prevent automatic restarts of servers, requiring manual intervention. Running a single-server Terracotta cluster without persistence (see Figure 2: Single Server – No Persistence of Shared Data) is a good solution for creating a more efficient development environment.
By default, a single Terracotta server runs in "temporary-swap-only" mode, which means it lacks persistence. Its configuration could look like the following:
<?xml version="1.0" encoding="UTF-8" ?>
<tc:tc-config xmlns:tc="http://www.terracotta.org/config"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.terracotta.org/schema/terracotta-5.xsd">
<servers>
<server name="Server1">
<data>/opt/terracotta/server1-data</data>
<l2-group-port>9530</l2-group-port>
</server>
</servers>
...
</tc:tc-config>
If this server goes down, all application state placed in the clustered shared heap is lost.
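If you would rather state the non-persistent default in the file than rely on it, the mode can be set explicitly. The fragment below is a sketch that reuses the Server1 settings from the example above and sets the <persistence> element's <mode> to "temporary-swap-only", the default named in this document; it is expected to behave the same as omitting the <dso> block entirely.

```xml
<servers>
  <server name="Server1">
    <data>/opt/terracotta/server1-data</data>
    <l2-group-port>9530</l2-group-port>
    <dso>
      <!-- Explicitly states the default, non-persistent mode:
           shared data is not restored after this server goes down. -->
      <persistence>
        <mode>temporary-swap-only</mode>
      </persistence>
    </dso>
  </server>
</servers>
```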
The "unreliable" configuration above may be advantageous in development, but if shared in-memory data must be persisted, the server's configuration must be expanded:
<?xml version="1.0" encoding="UTF-8" ?>
<tc:tc-config xmlns:tc="http://www.terracotta.org/config"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.terracotta.org/schema/terracotta-5.xsd">
<servers>
<server name="Server1">
<data>/opt/terracotta/server1-data</data>
<l2-group-port>9530</l2-group-port>
<dso>
<!-- The persistence mode is "temporary-swap-only" by default, so it must be changed explicitly. -->
<persistence>
<mode>permanent-store</mode>
</persistence>
</dso>
</server>
</servers>
...
</tc:tc-config>
By changing the value of the <persistence> element's <mode> element to "permanent-store" (it's "temporary-swap-only" by default), the server now backs up all shared in-memory data to disk (see Figure 3: Reliable Single Server – Shared Data Persisted to Disk).
The example illustrated in Figure 3: Reliable Single Server – Shared Data Persisted to Disk presents a reliable but not highly available cluster. If the server fails, the cluster fails. There is no redundancy to provide failover. Adding a standby server adds availability because the standby is ready to take over immediately (see Figure 4: Available and Reliable – Failover, Persistence).
In this array, if the active Terracotta server instance fails, the standby instantly takes over and the cluster continues functioning. No data is lost.
The following Terracotta configuration file demonstrates how to configure this two-server array:
<?xml version="1.0" encoding="UTF-8" ?>
<tc:tc-config xmlns:tc="http://www.terracotta.org/config"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.terracotta.org/schema/terracotta-5.xsd">
<servers>
<server name="Server1">
<data>/opt/terracotta/server1-data</data>
<l2-group-port>9530</l2-group-port>
<dso>
<persistence>
<mode>permanent-store</mode>
</persistence>
</dso>
</server>
<server name="Server2">
<data>/opt/terracotta/server2-data</data>
<l2-group-port>9530</l2-group-port>
<dso>
<persistence>
<mode>permanent-store</mode>
</persistence>
</dso>
</server>
<ha>
<mode>networked-active-passive</mode>
<networked-active-passive>
<election-time>5</election-time>
</networked-active-passive>
</ha>
</servers>
...
</tc:tc-config>
Terracotta automatically syncs the two servers. The active and passive servers must always be synced, allowing the passive to mirror the state of the active. The recommended <mode> in the <ha> section is "networked-active-passive" because it allows the active and passive servers to synchronize directly over the network, without relying on a shared disk.
You can add more standby servers to this configuration by adding more <server> sections.
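For example, a second standby could be added to the two-server array by appending a third <server> section before the <ha> section. In this sketch the name Server3 and its data path are hypothetical, and the elided parts ("...") stand for the Server1, Server2, and <ha> sections shown above. Each server needs a unique name and its own data directory. Because all three servers use l2-group-port 9530, the sketch assumes Server3 runs on a separate machine; two servers on one host cannot bind the same port.

```xml
<servers>
  <server name="Server1"> ... </server>
  <server name="Server2"> ... </server>
  <!-- Hypothetical second standby; mirrors the persistence settings of the other two. -->
  <server name="Server3">
    <data>/opt/terracotta/server3-data</data>
    <l2-group-port>9530</l2-group-port>
    <dso>
      <persistence>
        <mode>permanent-store</mode>
      </persistence>
    </dso>
  </server>
  <ha> ... </ha>
</servers>
```

At startup one of the three is elected active and the other two become standbys, so losing the active still leaves a standby to take over.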
If the server instances are started at the same time, one is elected as active server while the others serve as standbys. If a server instance is started while an active server instance is present, it syncs up state from (or replicates shared data of) the active server instance before becoming a standby. A standby cannot become an active server instance during a failure until its state is fully synced up. If the active fails and more than one standby is present, an election determines the new active.
The active server instance carries the load of sending state to the standby during the sync process. The time taken to sync up depends on the amount of data to be synced and on the current load on the cluster. For better throughput, run the active server instance and standbys on similarly configured machines, and start them together to avoid unnecessary syncs.
Terracotta server instances acting as standbys can run either in persistent mode or non-persistent mode. If an active server instance running in persistent mode goes down, and a standby takes over, the data directory must be cleared before bringing back the crashed server. Removing the data is necessary because the cluster state could have changed since the crash. The new state is taken from the new active server instance when the crashed server comes back up. A crashed standby running in persistent mode, however, automatically recovers.
For capacity requirements that exceed the capabilities of a two-server active-passive setup, expand the Terracotta server array using a mirror-groups configuration. Using mirror groups with multiple coordinated active Terracotta server instances adds scalability to the server array.
Scalable server arrays are available in enterprise versions of Terracotta.
Mirror groups are specified in the <servers> section of the Terracotta configuration file. Mirror groups work by assigning group memberships to Terracotta server instances. The following snippet from a Terracotta configuration file shows a mirror-group configuration with four servers:
...
<servers>
<server name="server1">
...
</server>
<server name="server2">
...
</server>
<server name="server3">
...
</server>
<server name="server4">
...
</server>
<ha>
<mode>networked-active-passive</mode>
<networked-active-passive>
<election-time>5</election-time>
</networked-active-passive>
</ha>
<mirror-groups>
<mirror-group group-name="group1">
<members>
<member>server1</member>
<member>server2</member>
</members>
</mirror-group>
<mirror-group group-name="group2">
<members>
<member>server3</member>
<member>server4</member>
</members>
</mirror-group>
</mirror-groups>
</servers>
...
In this example, the cluster is configured to have two active servers, each with its own standby. If server1 is elected active in group1, server2 becomes its standby. If server3 is elected active in group2, server4 becomes its standby. The two active servers (server1 and server3 in this case) automatically coordinate their work managing Terracotta clients and shared data across the cluster.
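As a sketch of the topology in Figure 1: A Server Array With 3 Mirror Groups, the configuration below defines three mirror groups, with group1 given two standbys to match the example in which Mirror Group 1 has more standbys than the others. Server names, group names, and member counts are hypothetical, and the per-server settings are elided in the same way as the snippet above.

```xml
<servers>
  <!-- Hypothetical server names; per-server settings (data, l2-group-port, dso) elided. -->
  <server name="server1"> ... </server>
  <server name="server2"> ... </server>
  <server name="server3"> ... </server>
  <server name="server4"> ... </server>
  <server name="server5"> ... </server>
  <server name="server6"> ... </server>
  <server name="server7"> ... </server>
  <ha>
    <mode>networked-active-passive</mode>
    <networked-active-passive>
      <election-time>5</election-time>
    </networked-active-passive>
  </ha>
  <mirror-groups>
    <!-- One active plus two standbys: the election winner becomes active. -->
    <mirror-group group-name="group1">
      <members>
        <member>server1</member>
        <member>server2</member>
        <member>server3</member>
      </members>
    </mirror-group>
    <!-- One active plus one standby. -->
    <mirror-group group-name="group2">
      <members>
        <member>server4</member>
        <member>server5</member>
      </members>
    </mirror-group>
    <!-- One active plus one standby. -->
    <mirror-group group-name="group3">
      <members>
        <member>server6</member>
        <member>server7</member>
      </members>
    </mirror-group>
  </mirror-groups>
</servers>
```

With three mirror groups there are three active servers and three partitions of the shared data. Failover stays inside each group: if group1's active fails, only server1, server2, and server3 take part in electing its replacement.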
See the mirror-groups section in the Configuration Guide and Reference for more information on mirror-groups configuration elements.
In a Terracotta server array designed for multiple active Terracotta server instances, the server instances in each mirror group participate in an election to choose the active. Once every mirror group has elected an active server instance, all the active server instances in the cluster begin cooperatively managing the cluster. The rest of the server instances become standbys for the active server instance in their mirror group. If the active in a mirror group fails, a new election takes place to determine that mirror group's new active. Clients continue work without regard to the failure.
When a server instance is started in a mirror group while an active server instance is present, it syncs up state from the active server instance before becoming a standby. A standby cannot become an active server instance during a failure until its state is fully synced up.
The active server instance in a mirror group carries the load of sending state to the standby during the sync process. The time taken to sync up depends on the amount of data to be synced and on the current load on the cluster. For better throughput, run the active server instance and standbys on similarly configured machines, and start them together to avoid unnecessary syncs.
Terracotta server instances acting as standbys can run either in persistent mode or non-persistent mode. If an active server instance running in persistent mode goes down, and a standby takes over, the data directory must be cleared before bringing back the crashed server. Removing the data is necessary because the cluster state could have changed since the crash. The new state is taken from the new active server instance when the crashed server comes back up. A crashed standby running in persistent mode, however, automatically recovers.
High-availability configuration can be set per mirror group. See the mirror-groups section in the Configuration Guide and Reference for more information.
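The fragment below sketches what a per-group setting might look like. It assumes that an <ha> element placed inside a <mirror-group> applies only to that group's members, while groups without one fall back to the <ha> section at the <servers> level; confirm this against the mirror-groups section of the Configuration Guide and Reference before relying on it. The election-time value of 10 is arbitrary and chosen only for illustration.

```xml
<mirror-groups>
  <mirror-group group-name="group1">
    <members>
      <member>server1</member>
      <member>server2</member>
    </members>
    <!-- Assumed: overrides the servers-level <ha> for group1 only. -->
    <ha>
      <mode>networked-active-passive</mode>
      <networked-active-passive>
        <election-time>10</election-time>
      </networked-active-passive>
    </ha>
  </mirror-group>
  <!-- No <ha> child: group2 is assumed to use the servers-level <ha> settings. -->
  <mirror-group group-name="group2">
    <members>
      <member>server3</member>
      <member>server4</member>
    </members>
  </mirror-group>
</mirror-groups>
```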