  Architecture
Added by Steve Harris, last edited by Gary Keim on Dec 02, 2006

Terracotta DSO Architecture

Overview

Terracotta DSO is software that layers the sharing and coordination of data into an application. Our core belief is that, much like GC, for many applications it is simpler and more efficient to let the runtime system take care of these services than to go through the pain of coding them yourself. We believe that, just like GC, as the DSO runtime services mature they will be not just the simplest way to go but fundamentally the fastest. This document goes over our architecture, covering as much of the what, how, and why, as well as where we are going, as we can squeeze in.

What does DSO do?

Good question! I'm not going to focus here on why one should use it or where it fits into the huge world of software. I'll leave that to the user docs and marketing. In this doc I'm just going to cover what we do and what we want to be doing moving forward.

Terracotta DSO extends the Java Memory Model from one JVM to multiple JVMs. It layers in at the bytecode level and requires no Java API. It maintains object identity, wait, notify and synchronize, and creates a virtual heap for the parts of your object graph that you mark as in need of sharing. You mark a part of your application as in need of sharing by specifying Roots.

Roots

A Root is a Class.variable combination, and all objects reachable from that root will auto-magically become shared. You can mark fields as transient if you don't want them shared, or you can instruct DSO to honor the transients that are already there. We also allow for operations to be performed when an object is loaded. This is accomplished by adding a bit of BeanShell or a method call using our on-load facility.

Once you have picked your Roots, you can tell us to extend synchronization so that wherever code synchronizes on a "Shared Object", that fine-grained synchronization is extended across the cluster.
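As a minimal sketch of what this looks like from the application side, assume a hypothetical class Inventory whose static field items has been declared as a Root in the configuration. The class and field names are made up for illustration. The point is that the code is plain Java and refers to nothing from DSO:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical application class; the names are illustrative, not part of DSO.
class Inventory {
    // Assumption: the config declares Inventory.items as a Root, so everything
    // reachable from this map would become shared.
    static final Map<String, Integer> items = new HashMap<>();

    // Ordinary Java synchronization on the object that would be shared.
    // With synchronization extended by DSO, the same monitor would apply
    // across the cluster; in a single JVM this is just a normal monitor.
    static int add(String name, int count) {
        synchronized (items) {
            int total = items.getOrDefault(name, 0) + count;
            items.put(name, total);
            return total;
        }
    }
}
```

Run without DSO, this behaves like any single-JVM program, which is the intent: the sharing is layered in from outside rather than coded in.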

Locks

The other major component of the Java Memory Model is its locks. We allow one to extend the synchronized keyword across a cluster by marking it as auto-locked. This translates into a fine-grained lock on the identity of the object being locked on. In cases where Java synchronization is not used, we also allow for named locks to be defined. Both named and auto locks can be marked as:

  • Write locks - same semantics as Java synchronized
  • Read locks - allow either one writer or multiple readers
  • Concurrent - ends up acting as a transaction boundary but doesn't do any locking
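For readers who want a single-JVM analogy for the write and read levels, the following sketch uses java.util.concurrent's ReentrantReadWriteLock. This is only an analogy under the assumption that the two levels behave like an exclusive and a shared lock; it is not how DSO locks are declared, and the class name is hypothetical. The concurrent level has no counterpart here since it does no locking.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical single-JVM analogy; DSO lock levels are set in config, not code.
class LockLevelAnalogy {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private int value;

    // Read level: several readers may hold the lock at the same time.
    int read() {
        lock.readLock().lock();
        try {
            return value;
        } finally {
            lock.readLock().unlock();
        }
    }

    // Write level: exclusive, like a synchronized block.
    void write(int newValue) {
        lock.writeLock().lock();
        try {
            value = newValue;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```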

How things work

The User Perspective

From a user perspective, one creates a tc-config.xml file that picks one or more Roots, tells us where to lock (generally with pretty broad strokes that say auto-lock some large subset of the code), and covers anything else we need to know (see the reference config to learn about all the possible settings). While the amount of XML for most apps tends to be pretty small and simple, you can avoid it altogether by using the Terracotta DSO Eclipse plugin. Once you have specified your config, you start your Java application with a script called dso-java instead of starting it with regular java. This script calls java with a few extra startup parameters that allow DSO to hook in at the proper level. You could just as easily skip that script and pass in the parameters on your own; to find out what they are, use the dso-env script. Once your application is started, just start the Terracotta Server and you're done.

If you want to see what's going on, you can either start the tc-admin console and poke around at your live objects, or turn on the debug options in your client config and have it spit out a bunch of text about what is being locked, shared, etc. A nice little debug feature is that you can turn those options on with stack traces, so you can figure out in a bit more detail where things are being locked and shared while you're testing.

Once you are up and running, DSO dynamically and transparently watches the shared objects for changes and sends those changes only to the places they are needed (NOTE: This is not peer to peer, so your changes aren't broadcast. This is a smart replication to the server and the places that need the info). DSO maintains a virtual heap, so only the parts of the object graph that are needed in each JVM are actually there. As your code reaches into your object graph, DSO will dynamically start pulling in the needed data. This allows one to have heaps that are much bigger than your memory space. Of course, if one is cycling through all the data all the time, one will end up with thrashing, but if what you are really doing is having various JVMs acting on some part of the data for some part of the time, this is a really useful feature.
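The pull-on-demand behavior of the virtual heap can be pictured with a small sketch. FaultingRef is a hypothetical class invented for this illustration, and the Supplier stands in for a round trip to the server; DSO does this transparently at the bytecode level rather than through a wrapper like this.

```java
import java.util.function.Supplier;

// Hypothetical illustration of faulting data in on first access; not a DSO class.
class FaultingRef<T> {
    private final Supplier<T> fetch; // stands in for fetching from the server
    private T local;                 // null until the data has been pulled in
    int fetches = 0;                 // counts simulated server round trips

    FaultingRef(Supplier<T> fetch) {
        this.fetch = fetch;
    }

    // Pull the data in only when the code actually reaches for it.
    synchronized T get() {
        if (local == null) {
            local = fetch.get();
            fetches++;
        }
        return local;
    }

    // Drop the local copy so the JVM's GC can reclaim it;
    // a later get() faults it back in.
    synchronized void evict() {
        local = null;
    }
}
```

A workload that calls evict() and get() on everything in a tight loop is the thrashing case: every access becomes a fetch.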

Developer Perspective

Warning: This is a big topic. Here is a high level diagram of the parts of DSO. I'll start you off with this and then break down the subsystems further starting with the transparency layer.

Byte Code

The transparency layer is, for many, the most interesting part of DSO. It is really at least two things: a byte code manipulation layer and an internal API (represented by ManagerUtil). The byte code layer is written mostly in ASM from ObjectWeb, with a little bit of AspectWerkz here and there. It is used to do a few things:

  • Monitor changes to fields - we call this PhysicalInstrumentation.
  • Allow us to get and set values without the cost of reflection.
  • Monitor lock acquires and releases
  • Insert a field for caching shadow objects
  • Insert our LogicalInstrumentation for logically instrumented classes
  • Other assorted fun instrumentation for containers, distributed method calls, etc.
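To make the first bullet concrete, here is a source-level sketch of the effect of monitoring a field write. The real work happens in bytecode and goes through the internal API; since that API's methods are not described here, the sketch uses a hypothetical ChangeLog class in its place, and Account is an invented example class.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the internal API that instrumented code calls.
class ChangeLog {
    static final List<String> changes = new ArrayList<>();

    static void fieldChanged(Object target, String field, Object newValue) {
        changes.add(target.getClass().getName() + "." + field + "=" + newValue);
    }
}

// Invented example class showing the effect of instrumentation in source form.
class Account {
    private int balance;

    // What the programmer wrote is just: this.balance = amount;
    // After instrumentation the method behaves roughly as if the
    // notification below had been written by hand.
    void setBalance(int amount) {
        this.balance = amount;
        ChangeLog.fieldChanged(this, "balance", amount);
    }

    int getBalance() {
        return balance;
    }
}
```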

Client Transaction Manager

The client transaction manager maintains a thread local that contains the current transaction. As changes to shared objects occur those changes are passed to the active client transaction. On commit those changes are packaged up and handed off to a remote transaction manager. The remote transaction manager packages up the transaction and does any windowing and hangs onto the transaction until it receives a server side ack. Commit is currently a non-blocking call as any blocking that is needed is handled by the lock manager.
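The flow described above can be sketched as follows. All names here (TxnSketch, Txn, begin, record, commit, ack) are hypothetical and chosen for this illustration; windowing and the actual hand-off to the network are left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a thread-local transaction with ack tracking.
class TxnSketch {
    // A transaction is just an ordered list of recorded changes.
    static final class Txn {
        final long id;
        final List<String> changes = new ArrayList<>();

        Txn(long id) {
            this.id = id;
        }
    }

    private final ThreadLocal<Txn> current = new ThreadLocal<>();
    // Committed transactions are held here until the server acks them.
    private final Map<Long, Txn> unacked = new LinkedHashMap<>();
    private long nextId = 1;

    synchronized void begin() {
        current.set(new Txn(nextId++));
    }

    // Changes to shared objects are passed to the calling thread's transaction.
    void record(String change) {
        current.get().changes.add(change);
    }

    // Non-blocking: hand the transaction off and return immediately.
    synchronized long commit() {
        Txn txn = current.get();
        current.remove();
        unacked.put(txn.id, txn);
        return txn.id;
    }

    synchronized void ack(long id) {
        unacked.remove(id);
    }

    synchronized int pending() {
        return unacked.size();
    }
}
```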

Client Object Manager

Maintains a WeakIdentityMap of all shared objects. Responsible for looking these objects up when they aren't local. Makes sure we never end up with two copies of the same object. Interacts closely with the RemoteObjectManager, which deals with SharedObjects at the DNA (think squished) level.

The RemoteObjectManager does all the direct comms with the server.
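A minimal sketch of the "never two copies" guarantee, assuming each shared object has a numeric ID. Note the simplification: this sketch keys weak references by ID, whereas the WeakIdentityMap mentioned above is keyed by object identity. ObjectManagerSketch is a hypothetical name, and the LongFunction stands in for the RemoteObjectManager.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Hypothetical sketch: at most one local copy per object ID, weakly held.
class ObjectManagerSketch {
    private final Map<Long, WeakReference<Object>> byId = new HashMap<>();
    private final LongFunction<Object> remote; // stands in for RemoteObjectManager

    ObjectManagerSketch(LongFunction<Object> remote) {
        this.remote = remote;
    }

    // Return the local copy if it is still alive; otherwise fetch it remotely.
    synchronized Object lookup(long id) {
        WeakReference<Object> ref = byId.get(id);
        Object obj = (ref == null) ? null : ref.get();
        if (obj == null) {
            obj = remote.apply(id);
            byId.put(id, new WeakReference<>(obj));
        }
        return obj;
    }
}
```

Because the references are weak, an object that the application no longer holds can be collected, and a later lookup simply fetches it again.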

Client Lock Manager

Responsible for handing out locks on the client side. We have the ability to keep locks locally (greedy locks) until they are needed somewhere else, and the ClientLockManager deals with the local aspects of that process. Also takes care of the local wait and notify responsibilities.

Client Memory Manager

Monitors heap usage and starts breaking connections between objects so that when the JVM's GC runs it can collect objects that haven't been recently changed.

Communications layer

Asynchronously sends messages between client and server.

Server Transaction Manager

Server Object Manager

Owns the check in and check out of objects.

Server Persistence layer

Client object State

The server needs to know the list of live shared objects in each client. We developed a sparse set that keeps all object IDs that might be in a given client. The GC uses this to rescue objects that might not otherwise be rescued.

Server Memory Manager

Server Garbage Collector

SEDA style internals

Both our client and server use SEDA style internals for moving state and behavior through the processes. This makes it easier to avoid pipeline stalls and to nicely segment things into chunks of work. It takes some getting used to, but it is really the only way to fly when developing scalable servers. We don't actually use Sandstorm; we just use the ideas.
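For readers new to the idea, a SEDA style stage is an event queue plus a thread (or pool) that drains it, with stages chained by enqueueing into the next one. The sketch below is a generic illustration under that assumption; Stage is a hypothetical class and is not taken from the DSO code base.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

// Hypothetical stage: an event queue plus one worker thread that drains it.
class Stage<E> {
    private final BlockingQueue<E> queue = new LinkedBlockingQueue<>();
    private final Thread worker;

    Stage(String name, Consumer<E> handler) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    handler.accept(queue.take()); // blocks until an event arrives
                }
            } catch (InterruptedException e) {
                // Interrupted: the stage has been asked to stop.
            }
        }, name);
        worker.setDaemon(true);
        worker.start();
    }

    // Callers never run the handler themselves; they only enqueue.
    void enqueue(E event) {
        queue.add(event);
    }

    void stop() {
        worker.interrupt();
    }
}
```

Two stages are chained by having the first stage's handler call enqueue on the second, so a slow stage backs up its own queue instead of stalling the thread that produced the event.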

Approach to tuning

