About Terracotta Documentation

This documentation is about Terracotta DSO, an advanced distributed-computing technology aimed at meeting special clustering requirements. Terracotta products without the overhead of DSO meet the needs of most use cases and clustering requirements. To learn how to migrate from Terracotta DSO to standard Terracotta products, see Migrating From Terracotta DSO. To find documentation on non-DSO (standard) Terracotta products, see the Terracotta Product Documentation. Terracotta release information, such as release notes and platform compatibility, is found in Product Information.
Terracotta DSO Tuning Guide

Introduction

This tuning guide introduces the important concepts you need to tune an application that uses Terracotta Distributed Shared Objects (DSO). It also discusses practical tuning techniques that will help you get the best performance out of DSO.

Tuning Best Practices

Don't Jump To Conclusions

The first rule of performance testing and tuning is: Don't optimize prematurely. The second rule is: Don't jump to conclusions (a variation of the first rule). Until performance tests are run, it's almost impossible to know where the bottlenecks are. Once you start running your performance tests, don't jump to conclusions about the cause of these bottlenecks or about the solution. Proper bottleneck analysis is almost certainly cheaper than optimizing code that isn't the cause of your worst bottleneck.

Turn Off Profiling

Disable profiling hooks and tools while running performance tests to prevent them from skewing your results. However, keep verbose GC output turned on, because you will need it for your analysis. Quantify the overhead of your monitoring tools so that you can factor it out of your results.

Set Min And Max Heap Settings The Same

Make sure you set your minimum and maximum heap sizes to the same value. This avoids the cost of resizing the heap as your application starts up.

Take Good Notes

While you are in the performance testing and tuning cycle, take good notes. You may think you don't have time to take good notes, but the truth is you don't have time not to. When performance tuning, or tracking down a problem that requires multiple runs or configurations of your software, always, always, always take notes on each run. Write down all the details you can think of. Some examples include:
You should also archive your test output and other artifacts, and always put a date/time stamp on your test results. This prevents the inevitable rerunning of tests because you forgot the results, and keeps you from mixing up what you have and have not tried. A single mistake can use up more time than all the note taking would have required.

Understand Memory Usage and Garbage Collection

One of the most important factors in keeping a cluster running and performing well is the efficient use of memory. A node should not leave too much memory unused, but too little free memory can lead to node failures and can even stop an entire cluster. In a Terracotta cluster, the main tools for managing memory are the following:

- Terracotta Virtual Memory Manager (VMM)
- Java Garbage Collection (GC)
- Distributed Garbage Collector (DGC)
- JVM Heap Settings

These tools are interrelated, and each should be observed and tuned while keeping the others in mind. For example, the DGC cannot collect a shared object from a Terracotta server, even when no Terracotta client references it any longer, until Java GC has first collected that object from every client heap that held it. An efficiently tuned Java GC therefore helps the DGC do its job more efficiently.

A Simple Tuning Process

Make sure you can answer the following questions:
If you are CPU bound, but not GC bound, you can start looking for hotspots in your code. One of the best ways to do this is to take a series of thread dumps and examine them to find out where the application spends most of its time. Start by optimizing the worst offender, then rerun the test. Once you are no longer CPU bound, it's time to start searching for the bottleneck. At least at first, it is best to think of a performance problem that is not CPU bound as thread starvation rather than as a network or disk problem. Continue to take thread dumps and try to understand where threads are blocked and why.

To get a clear picture of where your bottlenecks are, wrap large swaths of your slow code like so:

    // Running totals for this timed section (fields of the enclosing class)
    private long count = 0;
    private long total = 0;

    // ... inside the method being measured:
    long start = System.currentTimeMillis();
    // ... your code here ...
    long t = System.currentTimeMillis() - start;
    count++;
    total += t;
    if (count % 1000 == 0) {
      // Report every 1000 calls; choose the interval based on how often the code is called
      System.out.println("T1 Average: " + (total / count) + " ms, count: " + count);
    }

Do this in multiple parts of your code, and narrow the timed regions more and more tightly until you have found the part or parts of your code that take the longest.

Lock Tuning

The cluster-wide locking you do in your application can have a big impact on cluster performance. The following are some rules of thumb to consider when trying to improve the performance of your locking.

Cross-Node Contention

Avoid lock contention across cluster nodes as much as possible, for example by partitioning your work so that nodes rarely compete for the same locks. As in a single JVM, you don't want highly contended locks if you can avoid them.

Lock Hopping

If you do have a highly contended lock, try making your lock acquisition more coarse-grained so that operations are batched within the scope of a single lock acquisition and release. This reduces lock hopping, that is, the lock passing back and forth between threads and JVMs.
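To make the batching idea concrete, here is a minimal sketch in plain Java. It uses an ordinary object as a stand-in for a clustered one. Under Terracotta, a synchronized block like these becomes a cluster-wide lock only if the enclosing method is configured for autolocking. The class, field, and method names are hypothetical, and the acquisition counter exists only to make the difference visible.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a clustered object whose updates are guarded by one lock. */
public class BatchedUpdates {
    private final Object lock = new Object(); // in a cluster, this would be a shared object's monitor
    private long total = 0;
    private long lockAcquisitions = 0;        // illustration only: how often the lock was taken

    /** Fine-grained: one lock acquisition (one transaction, if autolocked) per update. */
    public void addEachSeparately(List<Long> deltas) {
        for (long delta : deltas) {
            synchronized (lock) {
                lockAcquisitions++;
                total += delta;
            }
        }
    }

    /** Coarser-grained: the whole batch is applied under a single lock acquisition. */
    public void addAsBatch(List<Long> deltas) {
        synchronized (lock) {
            lockAcquisitions++;
            for (long delta : deltas) {
                total += delta;
            }
        }
    }

    public long total() {
        synchronized (lock) {
            return total;
        }
    }

    public long lockAcquisitions() {
        synchronized (lock) {
            return lockAcquisitions;
        }
    }

    public static void main(String[] args) {
        List<Long> deltas = Arrays.asList(1L, 2L, 3L, 4L);

        BatchedUpdates fine = new BatchedUpdates();
        fine.addEachSeparately(deltas);
        System.out.println(fine.total() + " " + fine.lockAcquisitions()); // prints "10 4"

        BatchedUpdates coarse = new BatchedUpdates();
        coarse.addAsBatch(deltas);
        System.out.println(coarse.total() + " " + coarse.lockAcquisitions()); // prints "10 1"
    }
}
```

Both approaches produce the same total. The batched version takes the lock once instead of once per update, and that is what reduces lock hopping when the lock is contended across JVMs. Batching can also be taken too far, as the discussion of very coarse locking explains.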
For example, if an operation performs multiple gets and puts on a synchronized Map, you are better off synchronizing on the higher-level operation than autolocking each individual Map operation.

Very Coarse Locking

Avoid locks around very large operations. Terracotta doesn't currently fragment transactions, so you may see poor performance or even exceed available memory if you try to do huge operations all within the scope of a single lock boundary. Example:

    synchronized (myClusteredObject) {
      for (long i = 0; i < 99999999999L; i++) {
        myClusteredObject.makeAVeryLargeNumberOfChanges();
      }
    }

This code might blow your heap.

Very Fine Locking

You might need to avoid locking within long, tight loops, because the overhead of creating lots of very small transactions may hurt performance. Example:

    for (long i = 0; i < 99999999999L; i++) {
      synchronized (myClusteredObject) {
        myClusteredObject.makeASmallChange();
      }
    }

This code might be slow.

This advice may sound like the opposite of the previous point about not locking around very large operations. The point is that you need to strike a balance between locking that is too coarse and locking that is too fine, appropriate to your application and to what you do within the scope of those locks. (As Terracotta implements more sophisticated batching algorithms, this will become less of a user-facing issue.) Remember that locks serve as transaction boundaries for Terracotta, so the amount of locking you do, and the amount of change that occurs within the scope of those locks, will affect the throughput and the memory consumption of your application. When you've identified a locking problem, try different lock granularities systematically to find the optimal granularity for what you are doing in your application.

Unnecessary Locking

Avoid unnecessary cluster-wide synchronization.
Choose carefully, and declare Terracotta autolocks only around code that you want to be locked cluster-wide.

Always Acquire Nested Locks In The Same Order

As a general concurrent programming practice, if you acquire nested locks, make sure you always acquire them in the same order, or you risk deadlock.

Bad Example:

    public void doStuff() {
      synchronized (objectA) {
        // ... do stuff
        synchronized (objectB) {
          // ... do stuff
        }
        // ... do stuff
      }
    }

    public void doOtherStuff() {
      synchronized (objectB) {
        // do other stuff
        synchronized (objectA) {
          // Yikes! I've grabbed locks in a different order here than in
          // doStuff(). I've just coded a deadlock!
          // ... attempt to do other stuff, but maybe deadlock
        }
        // ...
      }
    }

This code can deadlock if doStuff() and doOtherStuff() run concurrently.

Good Example:

    public void doStuff() {
      synchronized (objectA) {
        // ... do stuff
        synchronized (objectB) {
          // ... do stuff
        }
        // ... do stuff
      }
    }

    public void doOtherStuff() {
      synchronized (objectA) {
        // do other stuff
        synchronized (objectB) {
          // This is much better. I'm always acquiring locks in the
          // same order.
          // ... do other stuff
        }
        // ...
      }
    }

This code doesn't have the out-of-order lock acquisition problem.

Use Lock Logging And The Lock Profiler

It's a good idea to turn on lock logging to see exactly what is happening with your locks, so you can judge whether they behave the way you expect. See the "lock-debug" section of the Terracotta Configuration Guide and Reference for instructions on turning on lock debugging. You should also use the lock profiler available in the Terracotta Developer Console, which is a very useful tool for tuning your locking.

Performance Tuning Tools
Terracotta Developer Console

The Terracotta Developer Console provides a wealth of information and insight into the runtime characteristics of your cluster, including:
In addition, the Developer Console provides detailed information on all shared objects by class, including how many instances of each class have been created, and can present the same information in a package view or map view. The latest Developer Console provides:
See the Terracotta Developer Console documentation for more information on the console's latest features.

JVM Heap and Garbage Collection

Because Terracotta effectively makes clustering a service of the runtime, tuning the runtime (the JVM) is a natural way to increase performance. While JVM tuning is an extensive topic in its own right, our focus here is memory and garbage collection. We will identify a core set of tuning parameters that you can experiment with to arrive at optimal settings.

The first step in tuning the JVM is to see how your application uses memory in real time. Several open source and commercial tools provide that functionality. We'll use jstat, a free command-line tool included in the Sun JDK. Sun's Visual GC, essentially a graphical version of that tool, can also be downloaded from the Sun website. To use jstat, you need a full JDK (not just the JRE). jstat is located in the bin directory of the JDK installation. For our purposes, we need two jstat parameters: the "-gcutil" option followed by the process ID of the target Java process (that is, jstat -gcutil <pid>).
With just those parameters, jstat prints a snapshot of the memory regions associated with the JVM heap: the two survivor spaces, Eden, the old generation, and the permanent generation. You can see these in the following screen capture:

[Screen capture: jstat -gcutil snapshot]

Additional numeric parameters set the sampling interval (in milliseconds) and, optionally, the number of samples to take. If you give an interval but no count, jstat keeps sampling until you stop it. In this snapshot, we see where Eden had filled up, resulting in a brief minor (young-generation) garbage collection.

[Screen capture: jstat output showing Eden filling up and a minor collection]

As you watch your application run in real time, you can get a sense of where the heap size or garbage collection settings may need to be changed.

Heap settings

In general, the first consideration when tuning Terracotta is heap memory. Terracotta adds overhead to shared data structures on the order of 10-15%. As a result, an application whose heap settings were adequate may no longer have adequate heap settings once it is clustered. Consider increasing your heap size accordingly. Normally, you should set the initial and maximum heap sizes to the same value. This can save the JVM from having to perform multiple memory allocations, particularly at startup. It is also recommended that you increase memory as you add processors to a given system, which allows for greater parallelization.

Garbage Collection (GC)

The following discussion focuses on GC tuning in the presence of Terracotta. The Eden space is collected more frequently than the tenured space, and collections in Eden are minor collections rather than full collections. Settings that allow all throwaway garbage to die in Eden therefore lead to more efficient GC. If your application generates a lot of throwaway garbage, increasing the size of the Eden space lets that garbage be collected in Eden instead of being promoted to the tenured space. If Eden is too small, throwaway garbage spills into the tenured space, which is collected only when a full GC runs.
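To check from inside the application whether throwaway garbage is dying in Eden or being promoted to the tenured space, you can read similar numbers through the standard java.lang.management API. The following is a minimal sketch, not a replacement for jstat. It prints each heap pool's utilization as a percentage of the pool's current capacity, as jstat -gcutil does, followed by each collector's count and accumulated time. The class name is hypothetical. The pool and collector names it prints depend on which collector the JVM is using (for example, "PS Eden Space" and "PS Scavenge" with the parallel collector).

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

/** Hypothetical helper that prints a jstat -gcutil-style snapshot from inside the JVM. */
public class HeapPoolSnapshot {

    /** Used space as a percentage of the pool's current (committed) capacity. */
    public static double percentUsed(MemoryUsage usage) {
        long capacity = usage.getCommitted();
        return capacity > 0 ? 100.0 * usage.getUsed() / capacity : 0.0;
    }

    /** One line: utilization of each heap pool, then count and time for each collector. */
    public static String snapshot() {
        StringBuilder line = new StringBuilder();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (pool.getType() == MemoryType.HEAP && usage != null) {
                line.append(String.format("%s=%.1f%%  ", pool.getName(), percentUsed(usage)));
            }
        }
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Roughly comparable to jstat's collection-count and collection-time columns
            line.append(String.format("[%s: %d collections, %d ms]  ",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return line.toString().trim();
    }

    public static void main(String[] args) throws InterruptedException {
        int samples = args.length > 0 ? Integer.parseInt(args[0]) : 5;
        for (int i = 0; i < samples; i++) {
            System.out.println(snapshot());
            Thread.sleep(1000); // one sample per second
        }
    }
}
```

Run with an argument such as 10, the sketch prints ten one-second samples. In the output, the Eden percentage may climb and reset while the old-generation percentage keeps creeping up. That pattern suggests short-lived objects may be getting promoted too early.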
If an application generates a lot of long-lived objects but little throwaway garbage, a smaller Eden space can reduce overall garbage-collection time.

On a Terracotta server instance, you can enable parallel GC, which spreads collection work across multiple threads. Parallel collections still pause the application, but the pauses are shorter. Long stop-the-world pauses weigh heavily on average throughput measurements, so the overall effect is usually higher average throughput.

Example GC settings

Here are some GC settings you can use:

-verbose:gc
This standard Java command-line switch prints information about each garbage collection to standard output.

-XX:SurvivorRatio=8 (large survivor spaces for short-lived objects)
This example sets the ratio of Eden to each survivor space to 8:1. Smaller numbers mean larger survivor spaces. Large survivor spaces give short-lived objects more time to die in the young generation (Eden plus the survivor spaces) before they are promoted.

-XX:+UseParallelGC
This setting specifies that the parallel garbage collector should be used for the young generation. This is the default on server-class machines (see the -server switch below).

-XX:ParallelGCThreads=20
This setting specifies the number of threads to use for parallel GC operations.

-XX:+UseParallelOldGC
This setting ensures that portions of an old-generation collection are performed in parallel, which speeds up the collection overall.

-Xmn1g
In this example, the young generation is sized at 1 GB. The young generation can be collected in parallel, and a large young generation helps keep short-lived objects out of the old generation, which is more expensive to garbage collect.

-server switch

As of Java 5, the collector strategy used is based on the class of the machine on which the application runs. Windows boxes are considered "client" machines (meaning they use the client VM rather than the server VM).
It's generally advisable to force the JVM on which a Terracotta client runs to treat the box as a "server" by adding the -server switch to the Java command line.

Tuning the Terracotta Distributed Garbage Collector

The Terracotta Distributed Garbage Collector (DGC) finds objects in a Terracotta server instance that are eligible for collection, then removes them from the server instance and from persistent storage. The DGC requires tuning both for optimization reasons and to remedy failures. Sometimes, for example, the DGC process is unable to reclaim objects fast enough, resulting in poor performance, a full disk, or both. For more information on what the DGC is and how it operates, see the Concept and Architecture Guide.

Important DGC Tuning Properties
Symptoms of Badly Tuned DGC

There are three main symptoms of a badly tuned DGC:
Identifying the Cause of the Problem and Determining a Solution

The following are common causes of cluster performance problems.

Too Much Garbage

One of the most common problems with Java applications is code that creates too much garbage. Everything reachable from a clustered object automatically becomes clustered unless it is explicitly declared transient. Managing clustered data within the limited resources of a JVM carries overhead, so only objects that actually need to be part of the cluster should be added to distributed object graphs. In addition, while Terracotta can handle very large numbers of shared objects, burdening a cluster with too much garbage taxes its performance. Reducing the volume of shared objects reduces the volume of garbage, easing the burden on cluster resources. The following are some of the ways you can decrease the volume of shared objects:
One way to discover candidate classes is to analyze the class creation counts reported in the Developer Console. Begin by investigating the most-instantiated classes listed in the Terracotta Developer Console classes browser.

DGC Interval Too Long

Sometimes the default DGC interval is too long to keep up with the rate of garbage creation. Try making the DGC interval shorter (see the Configuration Guide and Reference for details).

You can also reduce the duration of full DGC collections by enabling DGC young-generation collection. This type of collection is effective when there are many short-lived objects that were quickly garbage-collected from all client heaps and were never flushed from the Terracotta server's cache. With DGC young-generation collection enabled, these objects are removed before the full DGC collection runs.

Garbage Objects Not Removed From Client JVMs Fast Enough

The DGC algorithm can't safely declare an object garbage if any client JVM still has it in its heap. The sooner Java GC collects a clustered object on the connected client JVMs, the sooner the Terracotta DGC can collect it, which makes the DGC more effective. Tuning the Terracotta Virtual Memory Manager can also improve the performance of garbage collection.

The Underlying Data Store Can't Keep Up

Once you have tuned the DGC interval and the client JVM GC settings so that the managed object count is no longer ever-increasing, you may still see the disk attached to a Terracotta server instance fill up. The DGC algorithm may be identifying garbage objects in a timely manner, but the underlying data store may not be cleaning up its records on disk fast enough. The relevant data-store tuning properties are as follows:
The relevant Terracotta properties are as follows:
Tuning the Terracotta VMM

The Terracotta Virtual Memory Manager (VMM) releases references to clustered objects so that they can be collected by the local JVM garbage collector. Tuning the VMM involves setting properties that control its behavior so that it more closely matches your application's needs. The most important VMM properties are defined below. To learn how to set VMM properties using the Terracotta configuration file, see the Configuration Guide and Reference.

VMM on Terracotta Server Instances

The following are the most important VMM properties to tune on a Terracotta server instance:
VMM on Terracotta Clients

The following are the most important VMM properties to tune on a Terracotta client:
Symptoms of Badly Tuned Terracotta VMM

In an application with heavy usage of heap memory, a well-tuned VMM can keep a JVM's heap from becoming full and prevent a Terracotta client or server instance from failing. Problems with the VMM manifest themselves as OutOfMemoryErrors. An OutOfMemoryError (OOME) can appear either in the client JVMs or in a Terracotta server instance when certain very large objects grow too large to fit in the heap of the client JVM or the server instance.

Identifying the Cause of the Problem and Determining a Solution

An OOME in a Terracotta server instance may indicate that a collection is too large to fit in the server instance's heap. A server instance can also throw an OOME when a client sends a very large transaction. Whatever the cause, the problem can often be fixed by increasing the heap size of the server instance's JVM.

An OOME in a Terracotta client JVM may also indicate that a collection is too large to fit in the client JVM's heap. Other possible causes of OOMEs in the client are a memory leak in your application unrelated to Terracotta, or a very large transaction. To determine if there are very large logically managed objects that won't fit in memory, try the following:
The Terracotta VMM Evictor Is Not Keeping Up

As memory pressure increases in both the client JVMs and the Terracotta server array, the VMM subsystem attempts to reclaim memory by evicting "less frequently used" objects from the heap. In some cases, this evictor thread may fall behind, leaving more and more clustered objects in the heap that should have been evicted. If the evictor thread falls far enough behind, you will see an OOME in clients or server instances.

To determine if this is the problem, turn on VMM logging in the Terracotta clients and server instances. VMM log messages show the number of evictable objects and the number of objects evicted. If the number of evictable objects is large and growing, the evictor is not keeping up.
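If you extract the evictable-object counts from successive VMM log messages, you can automate the "large and growing" check. The sketch below assumes the counts have already been parsed into an array, oldest first. This guide does not show the log format, so the sketch does no parsing. The class and method names, the window size, and the "large" cutoff are hypothetical choices you would adjust for your application.

```java
/** Hypothetical helper: flags a VMM evictor that appears to be falling behind. */
public class EvictorTrend {

    /**
     * Returns true when the most recent {@code window} samples of the evictable-object
     * count never decrease, end higher than they started, and the latest sample is at
     * or above {@code largeCount}.
     */
    public static boolean evictorFallingBehind(long[] evictableCounts, int window, long largeCount) {
        if (window < 2 || evictableCounts.length < window) {
            return false; // not enough samples to judge a trend
        }
        int start = evictableCounts.length - window;
        for (int i = start + 1; i < evictableCounts.length; i++) {
            if (evictableCounts[i] < evictableCounts[i - 1]) {
                return false; // the count dropped at least once, so it is not steadily growing
            }
        }
        long latest = evictableCounts[evictableCounts.length - 1];
        boolean growing = latest > evictableCounts[start];
        return growing && latest >= largeCount;
    }

    public static void main(String[] args) {
        long[] behind = {10000, 12500, 15000, 19000};
        long[] coping = {10000, 4000, 9000, 3000};
        System.out.println(evictorFallingBehind(behind, 4, 15000)); // prints "true"
        System.out.println(evictorFallingBehind(coping, 4, 15000)); // prints "false"
    }
}
```

A true result is only a hint. Before adjusting the evictor settings, confirm it against the evicted-object counts in the same log messages.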
To tune the evictor in VMM, try making the threshold and the criticalThreshold more aggressive by lowering them (for example, down to 30) so that eviction starts earlier. Also try increasing the percentage of objects to evict.

Non-Partially Faulted Object Is Too Large To Fit In Memory

If a non-partially faulted object is too large to fit in memory, there are two options:
A Partially Faulted Collection Still Won't Fit In Memory

Under certain conditions, even partially faulted objects may not fit in memory:
See the Clustered Data Structures Guide for more information on how Terracotta handles partitioning with specific data structures.

Resources

The following are helpful resources for tuning Java:
Next Steps

When you feel you understand how to tune your clustered application, you're ready to prepare for deployment.



