Architecture and How Terracotta Works

Architecture

The Terracotta architecture is designed to minimize network chatter, guarantee that split-brain cannot occur, and give operators the maximum degree of control, all while providing 100% high availability and no single points of failure.

The architecture has the following components:

  • The Client Nodes - Each client node is an application JVM participating in the cluster. The client nodes run on a standard JVM such as Sun's HotSpot JVM. Terracotta is installed into the JVM as a set of Terracotta libraries that are loaded at JVM startup.
  • The Terracotta Server Cluster - The Terracotta Server Cluster provides the clustering intelligence. Each server in the cluster is a 100% pure Java process. The current Terracotta implementation operates in an Active/Passive mode, with one Active Server and one or many Passive Servers.
  • Storage - The Terracotta server uses disk storage in several ways.
    • Virtual Heap storage - As objects are paged out of the client nodes and into the server, they are paged onto disk if the server heap fills up.
    • Lock Arbiter - To ensure that there is no possibility of the classic "split-brain" problem, Terracotta relies on the disk infrastructure to provide a lock.
    • Shared Storage - To transmit object state from the active server to the passive server(s), objects are persisted to disk, which then shares that state with the passive server(s).

How Network Attached Memory Plugs In to the JVM

Terracotta uses ASM to manipulate application classes as those classes load into the JVM. Developers can pick Sun HotSpot or IBM's runtime, and any of several supported application servers including Geronimo, GlassFish, Jetty, Tomcat, WebLogic, or WebSphere. Terracotta comes with scripts and tools that help integrate it with the JVM. In the simplest terms, Terracotta is made up of a jar file and some command line additions to the Java invocation command (e.g., /usr/bin/java stays /usr/bin/java with Terracotta). Those command line options make sure that Terracotta's libraries are pulled into the JVM and that it can see all other classes load.
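To give a feel for what "seeing classes load" means, here is a minimal sketch built on the JDK's standard java.lang.instrument API. It is not Terracotta's own integration mechanism and it does not use ASM; the class name LoadObserver is made up for the example. The transformer only records the names of classes passing through it and returns null, which tells the JVM to leave the bytes unchanged. A bytecode library such as ASM would do its rewriting at the point marked in the comment.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.ArrayList;
import java.util.List;

// Hypothetical example class; not part of Terracotta.
class LoadObserver implements ClassFileTransformer {
    // Names of the classes this observer has seen (not thread-safe; fine for a sketch).
    final List<String> seen = new ArrayList<>();

    // Entry point used when the JVM is started with -javaagent:<jar>,
    // where the jar manifest names this class as Premain-Class.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new LoadObserver());
    }

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        if (className != null && isClassFile(classfileBuffer)) {
            // The JVM passes internal names such as "demo/Customer".
            seen.add(className.replace('/', '.'));
            // A real instrumenter would rewrite classfileBuffer here.
        }
        return null; // null means "no change to the class bytes"
    }

    // Every class file starts with the magic number 0xCAFEBABE.
    static boolean isClassFile(byte[] b) {
        return b != null && b.length >= 4
                && (b[0] & 0xFF) == 0xCA && (b[1] & 0xFF) == 0xFE
                && (b[2] & 0xFF) == 0xBA && (b[3] & 0xFF) == 0xBE;
    }

    public static void main(String[] args) {
        // Call the hook by hand, outside an agent, just to show its contract.
        LoadObserver observer = new LoadObserver();
        byte[] fake = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
        observer.transform(null, "demo/Customer", null, null, fake);
        System.out.println(observer.seen);
    }
}
```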

The Terracotta configuration file dictates which classes become clustered and which do not. Terracotta then examines those classes for fields it needs to cluster and for threading semantics that need to be shared. For example, to share customer objects throughout an application cluster, the developer need only tell Terracotta to cluster customers and to synchronize customers cluster-wide.

Terracotta looks for bytecode instructions like the following (not an exhaustive list):

  • GETFIELD
  • PUTFIELD
  • AASTORE
  • AALOAD
  • MONITORENTER
  • MONITOREXIT
  • INVOKE0
  • INVOKESPECIAL

On each of those, Terracotta does the work of Network Attached Memory. Specifically:

BYTECODE Injected Behavior
GETFIELD Read from the Network for certain objects. Terracotta also has a heap-level cache that contains pure Java objects. So GETFIELD reads from RAM if the object is present and faults it in from NAM if a cache miss occurs.
PUTFIELD Write to the Network for certain objects. When writing field data through the assignment operator "=" or through similar mechanisms, Terracotta writes the changed bytes to NAM as well as allowing those to flow to the JVM's heap.
AASTORE Same as PUTFIELD but for arrays
AALOAD Same as GETFIELD but for arrays
MONITORENTER Get a lock inside the JVM on the specified object AND get a lock in NAM in case a thread on another JVM is trying to edit this object at the same time
MONITOREXIT Flush changes to the JVM's heap cache back to NAM in case another JVM is using the same objects as this JVM
INVOKE0 Stop a constructor from firing if it has fired elsewhere in the application cluster already. Instead page in the object graph from NAM that was flushed down to NAM by the previous JVM
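
As a rough illustration of where these instructions come from, the following plain Java sketch annotates each statement with the opcode javac typically emits for it. The class and field names are made up for the example and no Terracotta API is involved. The JVM specification spells the lock opcodes monitorenter and monitorexit, and constructors are invoked through invokespecial.

```java
// Hypothetical example; the comments name the opcode behind each statement.
class OpcodeTour {
    static class Account {
        int balance;                    // instance field
        String[] tags = new String[2];  // array of references
    }

    static int run() {
        Account account = new Account();      // NEW, then INVOKESPECIAL Account.<init>
        synchronized (account) {              // MONITORENTER
            account.balance = 40;             // PUTFIELD
            account.tags[0] = "gold";         // GETFIELD (tags), then AASTORE
            String tag = account.tags[0];     // GETFIELD (tags), then AALOAD
            return account.balance + tag.length();  // GETFIELD (balance)
        }                                     // MONITOREXIT (emitted before the return)
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Running javap -c on the compiled class is one way to check the mapping for a given compiler version.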

With one JVM running an application, NAM looks like this...


If the application now needs to write to (or dirty) a single object somewhere in the object graph it can go ahead and do so. Note that with Terracotta the graph need not be serializable, nor are bean-style get() and set() calls needed to edit the object fields in NAM.
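
A small sketch of what such a write looks like in application code, assuming made-up Directory, Customer, and Address classes. None of them implements java.io.Serializable or exposes getters and setters; the change is an ordinary field assignment made while holding a lock. Run on its own, this is just local Java. Whether and how the graph is clustered would be decided in the Terracotta configuration, which is not shown here.

```java
// Hypothetical domain classes: no Serializable, no get()/set() methods.
class PlainMutationDemo {
    static class Address {
        String city;
        Address(String city) { this.city = city; }
    }

    static class Customer {
        String name;
        Address address;
        Customer(String name, Address address) {
            this.name = name;
            this.address = address;
        }
    }

    static class Directory {
        Customer[] customers;
        Directory(Customer... customers) { this.customers = customers; }
    }

    static Directory build() {
        return new Directory(
                new Customer("Ada", new Address("London")),
                new Customer("Linus", new Address("Helsinki")));
    }

    // Dirties exactly one object deep in the graph with a plain assignment.
    static void relocate(Directory directory, int index, String city) {
        synchronized (directory) {
            directory.customers[index].address.city = city;
        }
    }

    public static void main(String[] args) {
        Directory directory = build();
        relocate(directory, 1, "Portland");
        System.out.println(directory.customers[1].address.city);
    }
}
```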


Last, the application leaves it up to the NAM server cluster to decide what changes to push and to which JVMs. In this image, notice that virtual heaps allow one of the JVMs to not even hold a reference to the changing object, and thus that JVM doesn't get told about the changes the application just made, since it doesn't need to know.
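
The idea can be modelled in a few lines of ordinary Java. The sketch below is a toy, single-process simulation, not Terracotta code: ToyServer and ToyClient are hypothetical names, objects are reduced to maps of field values, and there is no networking or locking. It shows a client faulting an object in on first read, and the server forwarding a change only to clients that already hold that object.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class SelectivePushDemo {
    // Hypothetical stand-in for the server: authoritative field values per object id.
    static class ToyServer {
        private final Map<String, Map<String, Object>> store = new HashMap<>();
        private final Map<String, Set<ToyClient>> holders = new HashMap<>();

        // Hands a copy of the object to the client and remembers that it holds one.
        Map<String, Object> faultIn(ToyClient client, String objectId) {
            holders.computeIfAbsent(objectId, k -> new LinkedHashSet<>()).add(client);
            return new HashMap<>(store.computeIfAbsent(objectId, k -> new HashMap<>()));
        }

        // Records the change and pushes it only to other clients holding the object.
        void commit(ToyClient writer, String objectId, String field, Object value) {
            store.computeIfAbsent(objectId, k -> new HashMap<>()).put(field, value);
            Set<ToyClient> interested =
                    holders.getOrDefault(objectId, Collections.<ToyClient>emptySet());
            for (ToyClient client : interested) {
                if (client != writer) {
                    client.receive(objectId, field, value);
                }
            }
        }
    }

    // Hypothetical stand-in for one application JVM with a local cache of objects.
    static class ToyClient {
        private final ToyServer server;
        private final Map<String, Map<String, Object>> localHeap = new HashMap<>();
        int updatesReceived = 0;

        ToyClient(ToyServer server) { this.server = server; }

        Object read(String objectId, String field) {
            Map<String, Object> object = localHeap.get(objectId);
            if (object == null) {                       // cache miss: fault in
                object = server.faultIn(this, objectId);
                localHeap.put(objectId, object);
            }
            return object.get(field);
        }

        void write(String objectId, String field, Object value) {
            read(objectId, field);                      // make sure the object is resident
            localHeap.get(objectId).put(field, value);
            server.commit(this, objectId, field, value);
        }

        void receive(String objectId, String field, Object value) {
            localHeap.get(objectId).put(field, value);
            updatesReceived++;
        }
    }

    public static void main(String[] args) {
        ToyServer server = new ToyServer();
        ToyClient a = new ToyClient(server);
        ToyClient b = new ToyClient(server);
        ToyClient c = new ToyClient(server);   // never touches customer-1

        a.write("customer-1", "city", "Oslo");
        b.read("customer-1", "city");          // b now holds the object
        a.write("customer-1", "city", "Rome");

        System.out.println(b.read("customer-1", "city"));
        System.out.println(b.updatesReceived + " " + c.updatesReceived);
    }
}
```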


How can developers specify Terracotta NAM behavior?

The Terracotta config file directs and controls how clustering gets applied to the application. There are several concepts developers should know when configuring Terracotta. Those include roots, locks, DMI, include / exclude class, and the Terracotta bootjar.

Roots are field references that point to the top of an object graph that needs to be clustered. They are analogous to NFS mount points, but for RAM. With NFS, you mount a file server at a directory location; once people "cd" into that directory they are attached to the file server, and all files created from this directory onward actually reside on the file server. Likewise, as soon as the application traverses the object reference held in a root field, it is in shared heap space on NAM, as opposed to other objects, which exist only in local RAM. Terracotta's Eclipse plug-in lets developers designate roots from within the popular Eclipse IDE. Terracotta's config modules can also help developers skip this step altogether, since Terracotta has pre-determined the roots inside the frameworks in use.
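
In source code a root is just an ordinary field. The sketch below assumes a made-up CustomerDirectory class whose customers field would be the one named as a root in the Terracotta config file; nothing in the Java itself marks it. The comments describe the intended clustered behavior. Run without Terracotta, this is ordinary single-JVM code.

```java
import java.util.HashMap;
import java.util.Map;

class RootDemo {
    static class Customer {
        final String name;
        Customer(String name) { this.name = name; }
    }

    static class CustomerDirectory {
        // Hypothetical root: the config would reference this field by its
        // fully qualified name. It is the "mount point" of the shared graph.
        private final Map<String, Customer> customers = new HashMap<>();

        void register(Customer customer) {
            synchronized (customers) {
                // After this put, the customer is reachable from the root.
                customers.put(customer.name, customer);
            }
        }

        Customer lookup(String name) {
            synchronized (customers) {
                return customers.get(name);
            }
        }
    }

    public static void main(String[] args) {
        CustomerDirectory directory = new CustomerDirectory();
        Customer ada = new Customer("Ada");       // not yet reachable from the root
        directory.register(ada);                  // now part of the root's graph
        Customer scratch = new Customer("temp");  // never attached to the root
        System.out.println(directory.lookup("Ada").name);
        System.out.println(directory.lookup(scratch.name) == null);
    }
}
```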

Locks act as a cluster-wide synchronization mechanism that allows the application to use pure Java constructs for coordinating work and heap access across threads, yet have those constructs work across JVM boundaries when the application is deployed to a cluster. Developers have several options, including auto-locks, auto-method-level locks, and named-locks, none of which has to be strictly pessimistic. You can use the Java synchronized keyword yet configure read-locks, write-locks, and even concurrent-locks into an application without custom APIs.
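
The Java side of this is plain synchronized code. The sketch below uses a made-up Inventory class and several local threads; it contains no Terracotta API and, run as-is, coordinates threads only within one JVM. The point is that the source would look the same if the lock were configured to apply cluster-wide.

```java
class LockDemo {
    // Hypothetical shared object guarded by its own monitor.
    static class Inventory {
        private int stock = 0;

        void add(int amount) {
            synchronized (this) {   // lock acquired here, released at the closing brace
                stock += amount;
            }
        }

        int stock() {
            synchronized (this) {
                return stock;
            }
        }
    }

    // Starts the given number of threads, each adding 1 to the stock perThread times.
    static int runWorkers(int threads, int perThread) {
        Inventory inventory = new Inventory();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    inventory.add(1);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return inventory.stock();
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(4, 1000));
    }
}
```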

DMI is short for "distributed method invocation." When an object attempts to call a particular method on itself or on another object, and the target object is clustered, you can configure that method to be entered simultaneously in all JVMs. Terracotta helps enable the classic "listener pattern" with this feature. Imagine registering a local list of listener objects and knowing that all the listeners, on all JVMs, will get fired as soon as any one of the JVMs calls fireChangedEvent() or some such method in the application code. As an example, this feature is used by Jetty's Cometd project.
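
A minimal sketch of the listener pattern in question, using made-up EventSource and ChangeListener types. fireChangedEvent() is the kind of method one might configure for DMI. In this standalone form it only notifies listeners registered in the local JVM, and the config entry that would distribute the call is not shown.

```java
import java.util.ArrayList;
import java.util.List;

class ListenerDemo {
    // Hypothetical callback interface.
    interface ChangeListener {
        void changed(String what);
    }

    static class EventSource {
        // Listeners registered in this JVM.
        private final List<ChangeListener> listeners = new ArrayList<>();

        void addListener(ChangeListener listener) {
            listeners.add(listener);
        }

        // Candidate for distributed method invocation: one call, every listener fires.
        void fireChangedEvent(String what) {
            for (ChangeListener listener : listeners) {
                listener.changed(what);
            }
        }
    }

    public static void main(String[] args) {
        EventSource source = new EventSource();
        List<String> log = new ArrayList<>();
        source.addListener(what -> log.add("first saw " + what));
        source.addListener(what -> log.add("second saw " + what));
        source.fireChangedEvent("price");
        System.out.println(log);
    }
}
```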

Include and exclude classes and the bootjar control which parts of the app Terracotta prepares for NAM. You can direct Terracotta to cluster any List object that joins one of the application's roots. You can also direct Terracotta to cluster workflow objects but to exclude customer objects, using simple <include-class> and <exclude-class> tags in the Terracotta config file. This capability helps make Terracotta easier to use, allowing developers to keep the Terracotta nature away from classes that it either doesn't support or shouldn't otherwise touch. In short, Terracotta can be used to cluster many parts of an application's heap, but it does not cluster everything, ever. This also helps applications start up fast and run with the least amount of clustering overhead necessary to function correctly.
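
The effect of include and exclude rules can be pictured as a filter applied to each class name as it loads. The sketch below is only a model of that decision: InstrumentationFilter is a hypothetical class, matching by package prefix is a simplification, and letting excludes win over includes is an assumption made for the example rather than a statement about Terracotta's actual rule syntax or precedence.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of include/exclude decisions; not Terracotta's matcher.
class InstrumentationFilter {
    private final List<String> includes = new ArrayList<>();
    private final List<String> excludes = new ArrayList<>();

    InstrumentationFilter include(String packagePrefix) {
        includes.add(packagePrefix);
        return this;
    }

    InstrumentationFilter exclude(String packagePrefix) {
        excludes.add(packagePrefix);
        return this;
    }

    // Assumption: an exclude always wins, and unlisted classes are left alone.
    boolean shouldInstrument(String className) {
        for (String prefix : excludes) {
            if (className.startsWith(prefix)) {
                return false;
            }
        }
        for (String prefix : includes) {
            if (className.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        InstrumentationFilter filter = new InstrumentationFilter()
                .include("app.workflow.")
                .exclude("app.customer.");
        System.out.println(filter.shouldInstrument("app.workflow.Task"));
        System.out.println(filter.shouldInstrument("app.customer.Customer"));
        System.out.println(filter.shouldInstrument("java.lang.String"));
    }
}
```

Defaulting to "not instrumented" mirrors the point above that only the classes that need clustering should pay its overhead.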
