Terracotta Server Arrays

THIS IS ARCHIVE DOCUMENTATION FOR TERRACOTTA v. 3.0.

For the current release, see the current DSO documentation.

Release: 3.0.1
Publish Date: May, 2009

Introduction

This document shows you how to architect a Terracotta server array to add cluster reliability, availability, and scalability.

Terracotta server arrays can vary from a basic two-node tandem to a multi-node array providing configurable scale, high performance, and deep failover coverage.

The main features of a Terracotta server array include:

  • Scalability Without Complexity – Simple configuration to add server instances to meet growing demand and facilitate capacity planning
  • High Availability – Instant failover for continuous uptime and services
  • Configurable Health Monitoring – Terracotta HealthChecker for internode monitoring
  • Persistent Application State – Automatic permanent storage of all current shared (in-memory) data
  • Automatic Node Reconnection – Temporarily disconnected server instances and clients rejoin the cluster without operator intervention

This document may refer to a Terracotta server instance as L2, and a Terracotta client (the node running your application) as L1. These are the shorthand references used in Terracotta configuration files.

Definitions and Functional Characteristics

The major components of a Terracotta server array are:

  • Cluster – All of the Terracotta server instances and clients that work together to share application state or a data set.
  • Server array – All the Terracotta server instances in a single cluster. Clustered data, also called in-memory data or shared data, is partitioned equally among active Terracotta server instances for management and persistence purposes.
  • Terracotta mirror group – A unit in the Terracotta server array. Sometimes also called a "stripe," a mirror group is composed of exactly one active Terracotta server instance and at least one "hot standby" Terracotta server instance (simply called a "standby"). The active server instance manages and persists the fraction of shared data allotted to its mirror group, while each standby server in the mirror group replicates (or mirrors) the shared data managed by the active server. Mirror groups add capacity to the cluster.
  • Terracotta server instance – A single Terracotta server. An active server instance manages Terracotta clients, coordinates shared objects, and persists data. Server instances have no awareness of the clustered applications running on Terracotta clients. A standby is a live backup server instance that continuously replicates the shared data of an active server instance and instantaneously replaces the active if the active fails. Standby servers add failover coverage within each mirror group.
  • Terracotta client – Terracotta clients run on application servers along with the applications being clustered by Terracotta. Clients manage live shared-object graphs.

Figure 1 illustrates a Terracotta cluster with three mirror groups. Each mirror group has an active server and a standby, and manages one third of the shared data in the cluster.


A Terracotta cluster has the following functional characteristics:

  • Each mirror group automatically elects one active Terracotta server instance.

    There can never be more than one active server instance per mirror group, but there can be any number of standbys. In Figure 1, Mirror Group 1 could have two standbys, while Mirror Group 3 could have four standbys.

  • Every mirror group in the cluster must have a Terracotta server instance in active mode before the cluster is ready to do work.
  • The shared data in the cluster is partitioned and distributed to the mirror groups.

    The number of partitions equals the number of mirror groups. In Figure 1, each mirror group has one third of the shared data in the cluster.

  • Mirror groups CANNOT provide failover for each other.

    Failover is provided within each mirror group, not across mirror groups. This is because mirror groups provide scale by managing discrete portions of the shared data in the cluster — they do not replicate each other. In Figure 1, if Mirror Group 1 goes down, the cluster must pause (stop work) until Mirror Group 1 is back up with its portion of the shared data intact.

  • Active servers are self-coordinating among themselves.

    No additional configuration is required to coordinate active server instances.

  • Only standby server instances can be hot-swapped in an array.

    In Figure 1, the L2 PASSIVE servers can be shut down, replaced, or joined by new L2 PASSIVE servers with no effect on cluster functions. However, to add or remove an entire mirror group, the cluster must be brought down.

Server Array Configuration Tips

To successfully configure a Terracotta server array using the Terracotta configuration file, note the following:

  • Two or more servers should be defined in the <servers> section of the Terracotta configuration file.
  • <l2-group-port> is the port used by the Terracotta server to communicate with other Terracotta servers.
  • The <ha> section should indicate the mode as networked-active-passive.
  • The <networked-active-passive> subsection has a configurable parameter called <election-time>, whose value is given in seconds. <election-time> sets the time allowed for electing an ACTIVE server; the appropriate value often depends on network latency and server load. The default value is 5 seconds.
  • A reconnection mechanism can be enabled to restore lost connections between active and passive Terracotta server instances. See Automatic Server Instance Reconnect for more information.
  • A reconnection mechanism can be enabled to restore lost connections between Terracotta clients and server instances. See Automatic Client Reconnect for more information.
  • For data safety, persistence should be set to "permanent-store" for server arrays.
    "permanent-store" means that application state, or shared in-memory data, is backed up to disk. In case of failure, it is automatically restored. Shared data is removed from disk once it no longer exists in any client's memory.

All servers and clients should be running the same version of Terracotta and Java.
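
As a rough illustration of the two reconnection tips in the list above, the fragment below switches on both mechanisms through a <tc-properties> block in the Terracotta configuration file. The property names and the use of <tc-properties> are assumptions drawn from general Terracotta usage rather than from this page. Confirm them in the Automatic Server Instance Reconnect and Automatic Client Reconnect sections for your release before relying on them.

```xml
<!-- Sketch only: the property names below are assumptions to verify for your release. -->
<tc:tc-config xmlns:tc="http://www.terracotta.org/config">
  <tc-properties>
    <!-- Assumed switch for reconnection between server instances (L2 to L2). -->
    <property name="l2.nha.tcgroupcomm.reconnect.enabled" value="true"/>
    <!-- Assumed switch for reconnection between clients and server instances (L1 to L2). -->
    <property name="l2.l1reconnect.enabled" value="true"/>
  </tc-properties>
  <servers>
    ...
  </servers>
</tc:tc-config>
```

Both switches are shown together because a server array typically wants either both reconnection paths or neither. Any timeout tuning is left out here because this page does not describe it.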

For more information on Terracotta configuration files, see the Configuration Guide and Reference.

Backing Up Persisted Shared Data

Certain versions of Terracotta provide tools to create backups of the Terracotta server array's disk store. See the Terracotta Operations Center and the Tools Catalog for more information.

Terracotta Cluster in Development

Persistence: None | Failover: None | Scale: None

In a development environment, persisting shared data is often unnecessary and even inconvenient. It puts more load on the server, while accumulated data can fill up disks or prevent automatic restarts of servers, requiring manual intervention. Running a single-server Terracotta cluster without persistence (see Fig. 2) is a good solution for creating a more efficient development environment.


By default, a single Terracotta server is in "temporary-swap-only" mode, which means it lacks persistence. Its configuration could look like the following:

<?xml version="1.0" encoding="UTF-8" ?>
<tc:tc-config xmlns:tc="http://www.terracotta.org/config"
                         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                         xsi:schemaLocation="http://www.terracotta.org/schema/terracotta-4.xsd">
  <servers>
    <server name="Server 1">
      <data>/opt/terracotta/server1-data</data>
      <l2-group-port>9530</l2-group-port>
    </server>
  </servers>
  ...
</tc:tc-config>

If this server goes down, all application state placed in the clustered shared heap is lost.
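
For readers who prefer to state the development mode explicitly instead of relying on the default, a minimal sketch of the same <servers> section might look like this. It assumes the <dso> and <persistence> elements are placed as in the other examples in this document. The behavior should be identical to omitting the block.

```xml
<servers>
  <server name="Server 1">
    <data>/opt/terracotta/server1-data</data>
    <l2-group-port>9530</l2-group-port>
    <dso>
      <persistence>
        <!-- Same as the default: shared data is not kept across a server restart. -->
        <mode>temporary-swap-only</mode>
      </persistence>
    </dso>
  </server>
</servers>
```

Spelling out the mode makes the lack of persistence visible to anyone reading the file, which can help keep a development configuration from being promoted to production by mistake.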

Terracotta Cluster With Reliability

The "unreliable" configuration above may be advantageous in development, but if shared in-memory data must be persisted, the server's configuration must be expanded:

<?xml version="1.0" encoding="UTF-8" ?>
<tc:tc-config xmlns:tc="http://www.terracotta.org/config"
                         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                         xsi:schemaLocation="http://www.terracotta.org/schema/terracotta-4.xsd">
  <servers>
    <server name="Server 1">
      <data>/opt/terracotta/server1-data</data>
      <l2-group-port>9530</l2-group-port>
      <dso>
        <!-- The persistence mode is "temporary-swap-only" by default, so it must be changed explicitly. -->
        <persistence>
          <mode>permanent-store</mode>
        </persistence>
      </dso>
    </server>
  </servers>

  ...
</tc:tc-config>

With the <mode> element inside <persistence> set to "permanent-store" (the default is "temporary-swap-only"), the server backs up all shared in-memory data to disk (see Fig. 3).


Terracotta Server Array with High Availability

The example illustrated in Fig. 3 presents a reliable but not highly available cluster. If the server fails, the cluster fails. There is no redundancy to provide failover. Adding a standby server adds availability because the standby is ready to take over immediately (see Fig. 4).


In this array, if the active Terracotta server instance fails, the standby instantly takes over and the cluster continues functioning. No data is lost.

The following Terracotta configuration file demonstrates how to configure this two-server array:

<?xml version="1.0" encoding="UTF-8" ?>
<tc:tc-config xmlns:tc="http://www.terracotta.org/config"
                         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
                         xsi:schemaLocation="http://www.terracotta.org/schema/terracotta-4.xsd">
  <servers>
    <server name="Server 1">
      <data>/opt/terracotta/server1-data</data>
      <l2-group-port>9530</l2-group-port>
      <dso>
        <persistence>
          <mode>permanent-store</mode>
        </persistence>
      </dso>
    </server>
    <server name="Server 2">
      <data>/opt/terracotta/server2-data</data>
      <l2-group-port>9530</l2-group-port>
      <dso>
        <persistence>
          <mode>permanent-store</mode>
        </persistence>
      </dso>
    </server>
    <ha>
      <mode>networked-active-passive</mode>
      <networked-active-passive>
        <election-time>5</election-time>
      </networked-active-passive>
    </ha>
  </servers>

  ...
</tc:tc-config>

Terracotta automatically syncs the two servers. The active and passive servers must always be synced, allowing the passive to mirror the state of the active. The recommended <mode> in the <ha> section is "networked-active-passive" because it allows the active and passive servers to synchronize directly, without relying on a disk.

You can add more standby servers to this configuration by adding more <server> sections.
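
As a sketch of what adding a standby could look like, the fragment below appends a third <server> section that follows the same pattern as the two-server example. The name "Server 3" and its data path are hypothetical and chosen only for illustration. The existing server definitions are elided.

```xml
<servers>
  <!-- "Server 1" and "Server 2" are defined as in the two-server configuration. -->
  ...
  <!-- Hypothetical additional standby; the name and data path are illustrative. -->
  <server name="Server 3">
    <data>/opt/terracotta/server3-data</data>
    <l2-group-port>9530</l2-group-port>
    <dso>
      <persistence>
        <mode>permanent-store</mode>
      </persistence>
    </dso>
  </server>
  <ha>
    <mode>networked-active-passive</mode>
    <networked-active-passive>
      <election-time>5</election-time>
    </networked-active-passive>
  </ha>
</servers>
```

The <ha> section is unchanged: the same election settings apply no matter how many <server> sections are listed.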

If the server instances are started at the same time, one is elected as active server while the others serve as standbys. If a server instance is started while an active server instance is present, it syncs up state from (or replicates shared data of) the active server instance before becoming a standby. A standby cannot become an active server instance during a failure until its state is fully synced up. If the active fails and more than one standby is present, an election determines the new active.

Standbys can be hot swapped.

The active server instance carries the load of sending state to the standby during the sync process. The time taken to sync up is dependent on the amount of data to be synced up and on the current load on the cluster. The active server instance and standbys should be run on similarly configured machines for better throughput, and should be started together to avoid unnecessary sync ups.

Terracotta server instances acting as standbys can run either in persistent mode or non-persistent mode. If an active server instance running in persistent mode goes down, and a standby takes over, the data directory must be cleared before bringing back the crashed server. Removing the data is necessary because the cluster state could have changed since the crash. The new state is taken from the new active server instance when the crashed server comes back up. A crashed standby running in persistent mode, however, automatically recovers.

Under certain circumstances, standbys may fail to automatically clear their data directory and fail to restart, generating errors. In this case, the data directory must be manually cleared.

Scaling the Terracotta Server Array

For capacity requirements that exceed the capabilities of a two-server active-passive setup, expand the Terracotta server array using a mirror-groups configuration. Using mirror groups with multiple coordinated active Terracotta server instances adds scalability to the server array.

Scalable server arrays are available in enterprise versions of Terracotta.


Mirror groups are specified in the <servers> section of the Terracotta configuration file. Mirror groups work by assigning group memberships to Terracotta server instances. The following Terracotta configuration file shows a mirror-group configuration with four servers:

<servers>
  <server name="server1">
  ...
  </server>
  <server name="server2">
  ...
  </server>
  <server name="server3">
  ...
  </server>
  <server name="server4">
  ...
  </server>
  <ha>
    <mode>networked-active-passive</mode>
    <networked-active-passive>
      <election-time>5</election-time>
    </networked-active-passive>
  </ha>
  <mirror-groups>
    <mirror-group group-name="group1">
      <members>
        <member>server1</member>
        <member>server2</member>
      </members>
    </mirror-group>
    <mirror-group group-name="group2">
      <members>
        <member>server3</member>
        <member>server4</member>
      </members>
    </mirror-group>
  </mirror-groups>
</servers>

In this example, the cluster is configured to have two active servers, each with its own standby. If server1 is elected active in group1, server2 becomes its standby. If server3 is elected active in group2, server4 becomes its standby. The two active servers (server1 and server3 in this case) automatically coordinate their work managing Terracotta clients and shared data across the cluster.

See the mirror-groups section in the Configuration Guide and Reference for more information on mirror-groups configuration elements.
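
Because a mirror group may contain any number of standbys, one group can list more members than another. The sketch below gives group1 a second standby by adding a hypothetical server5, which is not part of the four-server example and would need its own <server> section. The contents of group2 are elided.

```xml
<mirror-groups>
  <mirror-group group-name="group1">
    <members>
      <member>server1</member>
      <member>server2</member>
      <!-- Hypothetical fifth server, defined in its own <server> section. -->
      <member>server5</member>
    </members>
  </mirror-group>
  <mirror-group group-name="group2">
    ...
  </mirror-group>
</mirror-groups>
```

Adding a member this way increases failover coverage for group1 only. It does not change how shared data is partitioned, since the number of partitions follows the number of mirror groups.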

In a Terracotta server array designed for multiple active Terracotta server instances, the server instances in each mirror group participate in an election to choose the active. Once every mirror group has elected an active server instance, all the active server instances in the cluster begin cooperatively managing the cluster. The rest of the server instances become standbys for the active server instance in their mirror group. If the active in a mirror group fails, a new election takes place to determine that mirror group's new active. Clients continue work without regard to the failure.

When a server instance is started in a mirror group while an active server instance is present, it syncs up state from the active server instance before becoming a standby. A standby cannot become an active server instance during a failure until its state is fully synced up.

Hot standbys can be hot swapped.

The active server instance in a mirror group carries the load of sending state to the standby during the sync process. The time taken to sync up is dependent on the amount of data to be synced up and on the current load on the cluster. The active server instance and standbys should be run on similarly configured machines for better throughput, and should be started together to avoid unnecessary sync ups.

Terracotta server instances acting as standbys can run either in persistent mode or non-persistent mode. If an active server instance running in persistent mode goes down, and a standby takes over, the data directory must be cleared before bringing back the crashed server. Removing the data is necessary because the cluster state could have changed since the crash. The new state is taken from the new active server instance when the crashed server comes back up. A crashed standby running in persistent mode, however, automatically recovers.

Under certain circumstances, standbys may fail to automatically clear their data directory and fail to restart, generating errors. In this case, the data directory must be manually cleared.

High-availability configuration can be set per mirror group. See the mirror-groups section in the Configuration Guide and Reference for more information.
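
A per-group setting might be sketched as follows, assuming an <ha> block can be nested inside a <mirror-group> after its <members>. That placement and the election time of 10 seconds are illustrative assumptions, so check the Configuration Guide and Reference for the exact schema. In this sketch group2 has no <ha> block of its own and would presumably fall back to the <ha> section defined for the whole <servers> section.

```xml
<mirror-groups>
  <mirror-group group-name="group1">
    <members>
      <member>server1</member>
      <member>server2</member>
    </members>
    <!-- Assumed placement: an <ha> block scoped to this group only. -->
    <ha>
      <mode>networked-active-passive</mode>
      <networked-active-passive>
        <!-- Illustrative value, longer than the 5-second default. -->
        <election-time>10</election-time>
      </networked-active-passive>
    </ha>
  </mirror-group>
  <mirror-group group-name="group2">
    <members>
      <member>server3</member>
      <member>server4</member>
    </members>
  </mirror-group>
</mirror-groups>
```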
