Dashboard > Terracotta Public Wiki > Home > lsb > TechnicalFAQ
  Terracotta Public Wiki Log In   View a printable version of the current page.  
  TechnicalFAQ
Added by Sreenivasan Iyer, last edited by Himanshu Gupta on Jul 01, 2008  (view change)
Labels: 
(None)

TechnicalFAQ

June 20, 2008—The current state of this document is a brain dump from our field engineering team. We are currently working on integrating this information into a more polished documentation format but wanted to get this information out in public sooner rather than later.

Contents

The page Header does not exist.

LIVING WITH TERRACOTTA (POST DEPLOYMENT)

How do I backup the state persisted on Terracotta Server's Disk?

Look at the attachment entitled BerkelyDbBackup.zip - it contains code, a script file and a readMe file which contains the instructions to run the script. Here is a brief overview:

  • This utility opens the Environment in the Read-Only mode, while the TC Server is running with opened Environment in Read-Write mode.
  • This utility gets the snap shot of all the log files the moment it attaches to the Environment and copies those files into the backup location.
  • As these files are frozen by this Read-Only process, the Cleaner process cant delete any files, that way there wont be any inconsistency in the data. It is designed for taking a full backup.
  • Note that this does not backup the tc-config.xml

Can I, and if so, how can I upgrade my cluster with a new Terracotta release but with no downtime?

Yes - you can minimize cluster downtime to almost 0.

  • Subdivide the cluster into 2 (or more).
  • Take 1 Half of L1s out of rotation.
  • Kill the primary Terracotta Server (L2)
  • Remainder of L1s, that are still taking live traffic fail over to the standby Terracotta Server.
  • Upgrade the "out-of-rotation" L1s and L2 to the new version of Terracotta.
  • Restart the Terracotta Server and the L1s.
  • Take the 2nd Half of L1s and the standby L2 (which is now the Primary) out of rotation.
  • Simultaneously, introduce the upgraded L1s + L2 back into rotation.
  • Then upgrade the 2nd Half of L1s and the standby server and introduce them into the cluster.
  • NOTE: THIS ASSUMES THAT THE BerkeleyDB DATA FILES STAYED COMPATIBLE BETWEEN THE 2 RELEASES. In the odd-case that it dosen't - you will have to blow away the BDB data-files before restarting the upgraded TC Server (or point it to an empty directory location in the tc-config.xml).

Can I, and if so, how can I accomplish new Application release roll-out with no downtime?

Troubleshooting: See the Troubleshooting Guide. In addition, see below for some common occurrences (relatively speaking) and associated recovery

My Application appears deadlocked - how do I recover and determine what is going on?
  • Are you sure it is deadlocked or painfully slow?
  • To determine, take a series of thread dumps 5s apart for 1-2 minutes. If the relevant threads' stack-trace appear stationary, then there is a high likelihood that you have a deadlock (either in application code or in Terracotta).
  • To ANALYZE: Based on the thread-dump and the Lock Profiler (Admin Console), determine where, if in application-code, this might be occurring. Also code-review to ensure that locks are being obtained in the same sequence to eliminate possibility of application deadlock. Also consider tools to help deadlock detection.
  • TO RECOVER: Kill 1 L1 at a time - in the luckiest case, the very first L1 restart relieves the cluster of the deadlock and in the worst case the last L1 restart relieves it of the deadlock.
Client JVM threw an Out of Memory Exception - can you enumerate the usual suspects?
  • Heap allocated is too low - i.e. is Xms, Xmx too low. EXAMINE JAVA-STARTUP OPTIONS.
  • A non-clustered portion of your application increased its memory consumption significantly.
  • The rate of inserts into the clustered data-structures, exceeded the rate at which the Virtual Memory Manager was able to evict. TUNE VIRTUAL MEMORY MANAGER, in this case.
  • The keys in the partial associative array (Hashmap, Hashtable, ConcurrentHashMap) are all faulted in when the partial data-structure is referenced. If the cumulative size of keys exceeds VM Heap allocations, then an OOME can result. In such a case, (rare e.g. 1/2 Billion Keys in the Map), PARTITION DATA-STRUCTURE or make a composite data-structure (e.g. Map of Maps) - so that VMM can page out sub-maps.
  • You are using a non-partial data-structure with literals hanging off the very first level of the value-object - in which case VMM cannot evict and reclaim space associated with the Literals. (e.g.
    myLinkedBlockingQueue.put(Foo); // and Foo has int x, String y

    Both Foo.x and Foo.y are faulted in when myLinkedBlockingQueue is referenced. You are fine though if Foo is wrapped in a FooWrapper and

    myLinkedBlockingQueue.put(FooWrapper));
  • Terracotta does not support "partial" Terracotta Transactions - so you could OOME the L1 with a huge "Terracotta Transaction" - batch into suitable chunks (e.g. Batch into groups of 10,000 the code below, instead of trying to insert all Integer.MAX_VALUE objects in 1 shot.
    synchronized (myHashMap) { 
       for (int i=0; i < Integer.MAX_VALUE; i++) 
          myHashMap.put(key[i], value[i]);
       }
    }
Server JVM threw an Out Of Memory Exception - can you enumerate the usual suspects?
  • See above - similar arguments apply.
I believe I have run into a Terracotta Bug - How do I report an issue?
  • See if TroubleShooting Guide and this FAQ help you recover.
  • Artifacts Typically Needed to investigate complex issues (Deadlocks, Performance Problems etc.):
    • Problem Summary
    • Problem Description
    • Terracotta Client Logs
    • Stdout Logs of client JVM.
    • Terracotta Server Logs
    • Thread Dumps on L2 and atleast the offending L1 (if not all L1s) - 1 set every 15s, at least 4 times.
    • Output of SVT Run (i.e. record for 2 minutes and stop recording)
    • Output of Lock Profiling Session.
    • Admin Console Tabs (Classes, DGC, Roots)
    • Data file (in some cases, assuming it is of manageable size)
  • If OSS Customer:
    • Post on Forums on Mailing List with above Artifacts.
  • If Enterprise-Subscription Customer:
    • Supply artifacts via Web or email for P3, P4.
    • Call Support Hotline for P1, P2 (and email artifacts needed).
When do I have to restart client JVMs (outside of planned upgrades to Terracotta and Application, that is)?
  • When the client JVM gets disconnected off the cluster. i.e. it took longer than l2.l1reconnect.timeout.millis (see tc.properties) to reestablish its connectivity to the Terracotta server or attempted to connect with the same connection_id after after client-reconnect-window
  • You can grep the log for specfic exceptions or BETTER STILL, trap the JMX Node Disconnected event and take appropriate action, PROGRAMATICALLY (see http://www.terracotta.org/confluence/display/docs1/JMX+Guide)

<br>

DEPLOYING TERRACOTTA

How do I size my Production Terracotta Servers?

Consider the following factors:

  • Wrt Memory Sizing:
    • What is the size of the clustered object graph?
      • Does it fit in Heap or does it far exceed size of available RAM.
    • Do Application Latency SLAs allow for occasional faulting from Disk?
      • If No, then, in such a case, there is a requirement that the entire Data structure fit in HEAP (or in RAM). When HEAP Xmx gets too large, a pauseful Full GC can wreak havoc in terms of cluster TPS. We "typically" do not recommend a heap larger than 6-8G, although one still needs to provision a box with a large amount of RAM so BDB (BerkeleyDB, the database that runs in-process with Terracotta) blocks are cached by the OS.
  • Wrt CPU Sizing:
    • Is the machine dedicated to the Terracotta Server Process or does it run other processes.
    • Does the application create a fair amount of garbage - If yes, then one can expect a fair amount of GC and DGC activity (and hence higher CPU needs).
    • How many Writes/Reads against the Terracotta Server.
    • If there is a fair bit of clustered I/O or extensive GC/DGC, then at least, a 4 core box would be useful in such a case.
  • Other Considerations:
    • Dual NIC'ed boxes for L1 and L2 are required to avoid SpoFs (Single Points of Failure).
    • Disk - see next issue below.

Please enumerate disk storage choices with Terracotta.

  • Write to local Disk (fast, cheap)
  • Write to shared Storage (BLOCK STORAGE) - (Separate LUNs in case of Network-Active/Passive, Same LUN in case of Disk-Based-Active/Passive) e.g.
    • Block Storage via HBA talking Fibre-Channel (fast, expensive, centralized storage implies greater manageability).
    • Block storage via iSCSI (slower, cheaper)
  • Write to shared storage (NETWORK FILE SYSTEM) - (Not recommended, anymore).
    • NFS Mount
    • NAS Device.

What should I monitor when I deploy a Terracotta Integration into Production?

    • See the attached document entitled : "Terracotta_ClusterMonitoring_Recommendations.doc"

How can I deploy to multiple active Terracotta servers - for scale and HA?

    • Typically Terracotta supports only a Single Active Server and multiple Standbys.
    • However we do support a TIM (Terracotta Integrtion Module) - named Partitioned EHCache (only available to Enterprise Subscription Customers), which transparently stripes an EHCache Instance across multiple Active Terracotta Servers. One can thus scale out the Terracotta Server Tier as well and support a much larger TPS requirement, until other factors (such as partial key limitations constrain). See details attached as partitioned-EHCache-readme.txt
    • Note that using this mechanism, no matter what associative array represents the application interface (e.g. Hashmap, Hashtable, ConcurrentHashmap etc.) - one can, with an extra level of indirection, point to a Cache-Entry in an EHCache instance, which can then be striped across multiple Active Terracotta Servers.
    • Some restrictions apply: Assumed that the EHCache key, values in each Terracotta Server are completely contained in a given silo. i.e. there is no overflow of object graphs allowed from one TC server to another (if so, a runtime exception is thrown on the client during object-graph traversal).

How do I make sure that there is no Single Point of Failure in my architecture?

    • You need to deploy one of Terracotta certified configurations.
    • See http://www.terracotta.org/confluence/display/docs1/Network+Active-Passive+Deployment+Guide for Networked Active/Passive HA. Setting up your network as specified in the document, would imply that there is no Single Point of Failure in your architecture.
    • Other option is Shared-Disk-Based Active/Passive (you still need to ensure that there is no SPoF in your network architecture):
      • Block Storage (accessed via Fibre-Channel or iSCSI)
        • Use Vertias Cluster Services (VCS) to facilitate failover - See Document Terracotta_SharedDisk_HA_withVCS.pdf
        • Use Redhat Cluster Services to facilitate failover.
        • Use Vertias VxCFS (Veritas Cluster File System) ==> not certified yet
        • Use RedHat GFS (not fully tested or certified yet) ==> not certified yet.
      • Network File System (not certified on 2.6.x and beyond, given that there is a known failure not covered in this case) - See Document Terracotta_SharedDisk_HA_withNFS.pdf
        • NFS mount point as shared disk
        • NAS appliance as shared disk

What are the advantages/disadvantages of Networked-HA over Shared-disk-based-HA?

  • So many advantages, that Networked HA is the RECOMMENDED HA solution (unless there is a very compelling reason in your use-case or IT environment to use disk-based HA).
  • Hot-Hot Standby (i.e. no dip in TPS on failover) whereas a Shared Disk based solution is a Hot-Cold Standby (i.e. expect a dip in TPS on failover and eventual recovery).
  • HA in a box (no need for any special VCS or RedHat Clustering setup or a NAS device etc.)
  • Can use local Disk on each Terracotta server and cheaper in terms of disk setup (no virtualized storage needed, no HBA cards, no FibreChannel etc.)
  • Fail-over can be tuned to be sub-10seconds via modifying tc.properties.
  • 2 physical copies of the Data being persisted to disk.
  • etc.

Understand that Network-Active-Passive is recommended. Even so, can you describe how I achieve Terracotta Server Failover via Shared-Disk-based Active/Passive setup?

  • Block storage (accessed via Fibre-Channel or iSCSI)
    • Use Vertias Cluster Services (VCS) to facilitate failover - See Document Terracotta_SharedDisk_HA_withVCS.pdf
      • Define IP, Volume, Application (Terracotta Server) in a group [IP is not strictly needed, since tc-config allows you to specify multiple IPs - for the active TC Server and standby TC Servers.]
      • VCS will monitor these resources and in case of any failure, will call shutdown services on the node and failover IP, remount volume and start TCServer on the standby node. You need to supply shutdown/startup/monitoring Scripts.
    • Use Redhat Cluster Services to facilitate failover.
    • Use Vertias VxCFS (Veritas Cluster File System) ==> not certified yet
    • Use RedHat GFS (not fully tested or certified yet) ==> not certified yet.
  • Network File Storage (not recommended anymore)
    • NFS Mount
    • NAS Device (which basically offers you a turn-key HA NFS mount)

TESTING WITH TERRACOTTA

For some general do's/don't see
http://dsoguy.blogspot.com/2007/06/why-your-distributed-tests-are-lying-to.html
http://dsoguy.blogspot.com/2007/07/distributed-performance-testing-anti.html
http://dsoguy.blogspot.com/2007/07/more-lies-distributed-performance.html and
http://dsoguy.blogspot.com/2007/07/what-were-those-results-again.html

How does my functional testing change in the presence of Terracotta?

It dosen't in the sense that BlackBox and WhiteBox testing should proceed as normal. We however require that you ensure maximal code-coverage, so that there are no situations untested where an object reference may join the clustered object graph and then throw a runtime UnlockedSharedException or a NonPortableExcepetion.

How should I performance/stress test with Terracotta. What metrics am I looking for?

  • Measure overall Throughput and Latency of at least a basket of transactions
  • Note to run the test in the presence of extraneous usage such as Admin Console, Occasional SVT recording session and Monitoring infrastrucutre (e.g. JMX Ping etc.)
  • You probably need to monitor class creation, locks, DGC activity and CPU/Memory/Disk/Network on L1s and L2s. Most of these are available via the Terracotta Admin Console. You can augment via nmon and similar tools - see http://www.terracotta.org/confluence/display/wiki/Testing+Distributed+Software

How do I execute on Availability Testing in QA/ Stage?

  • You may or may not want to repeat the infrastructure failure tests, especially if you are mirroring the certified config.

Have clients on the WAN and the LAN - can they both connect to the same terracotta servers?

  • Yes they can
  • However, a WAN client may not relinquish Locks in a timely fashion as compared to a LAN client (depending on what kind of WAN latencies you experience) - therefore LAN clients get awarded more latencies and/or contention.
  • Simiarly, a WAN client may not have such a reliable connection to the Terracotta server. The Terracotta server allows a L1 (client) to reconnect within a certain window (You specify that in tc.properties: Defaults ==> l1.reconnect.enabled=false; l1.reconnect.timeout.millis=5000) after the persistent TCP/IP connection between itself and the L1 is interrupted. Therefore if the WAN connectivity is interrupted for 4s - the remote WAN client may still connect back to the TCServer without issue, but then resources held by the WAN client are blocked across the cluster for 4s.
  • Therefore, allowing WAN clients connectivity ought to be treated with caution.
  • For ReadOnly cases, that allow a bit of asynchronicity - you might be able to clone the state and have WAN clients pick up a copy of state on a periodic basis off a shared datastructure (e.g. a Queue).

How can I test WAN clients?

  • Terracotta distribution bundles a primitive proxy+LoadBalancer in the product.
  • Look at com.net.tc.proxy.TCPProxy.proxy in tc.jar (or on SVN if you want to look at the source).
  • Usage is something like this:
    • cd to Terracotta-install/lib
    • host-siyer$ java -classpath tc.jar com.tc.net.proxy.TCPProxy
    • usage: TCPProxy <listen port> <endpoint[,endpoint...]> [delay]
      • <listen port> - The port the proxy should listen on
      • <endpoint> - Comma separated list of 1 or more <host>:<port> pairs to round robin requests to
      • [delay] - Millisecond delay between network data (optional,default: 0)
    • host-siyer$ java -classpath tc.jar com.tc.net.proxy.TCPProxy 9000 <tc-server-host>:9510 500
      • Thu Sep 13 11:04:53 PDT 2007: Starting listener on port 9000, proxying to [server:9510] with 500ms delay
    • So now you point the tc-config server element to the host and port where TCPProxy is running.
    • proxy> help
      • h - this help message
      • s - print proxy status
      • d <num> - adjust the delay time to <num> milliseconds
      • c - close all active connections
      • l - toggle debug logging
      • q - quit (shutdown proxy)
    • You can change the WAN delay at will (e.g. to 7s) after the proxy is running, with something like below.
      • proxy> d 7000
      • proxy> quit

Do you have an automated framework for Testing?

  • Droid is a scripted distributed testing framework.

Droid is a minimal framework for starting up, configuring, synchronizing, and collecting statistics about a test run on multiple machines. There are two concepts - the agent, and the worker (both just Java programs running in a JVM). The activities of both are scripted using Groovy, which is a scripting language that runs on the JVM and has many syntax similarities to Java.

An agent must be started on every machine that participates in the test. The agent runs a script that can coordinate with other agents (via Terracotta of course) and can choose how and when to start workers. The worker also starts with a Groovy script and performs the actual work of the test. The agent/worker split is necessary to allow things like a test that starts up a cluster, then later adds or removes nodes of the cluster. In that case, the agents run throughout the test but workers may be started or stopped according to the agent's script.

You can get the droid from the following location :

http://svn.terracotta.org/svn/forge/projects/droid

  • Cachetest are the set of tests which use droid as a framework to run across JVMs.

To run the cache tests, you will need to grab and build both projects. At runtime, you will be running the droid framework, using wrapper scripts from cache test to properly invoke it using the cache testing code. You can download the tests from the following location

http://svn.terracotta.org/svn/forge/projects/cachetest

You will need to modify agent.sh according to your environment. The cache tests are designed for distributed testing so you should be testing on multiple machines, however you can run everything on a single machine while in development.

The agents use a clustered CyclicBarrier to wait for each other, so the tests don't have to be started at exactly the right time or anything. They will start when the number of agents as specified in the AGENTS variable in agent.sh have arrived at the barrier.

TUNING TERRACOTTA

What are the main components of tuning Terracotta.

For Scale and Latency, one needs to typically look at:

  • Instrumentation Scope
  • Distributed Locking
    • Lock Type
    • Lock Striping
    • Lock Scoping/Batching
    • Injection of Synchronization within your application.
    • Be aware of Terracotta Limitations.
  • Memory Management
    • Garbage Collection
      • on Client JVM
      • on Terracotta Server JVM
    • Virtual Memory Manager
      • on client JVM
      • on Terracotta Server JVM
    • Distributed Garbage Collection on Terracotta Server JVM
  • Application Code
    • Choice of Data Structures that offer more concurrency and play well with the Terracotta Virtual Memory Manager.
    • Synchronizing on the right objects
  • Faulting Behavior
    • Fault Count in tc-config.xml
    • Fault Threads on the L2.
  • Transaction Batching
    • l1.transactionmanager properties in tc.properties.
    • Commit Threads on the L2.

For HA, one needs to look at:

  • Choice of HA strategy
  • Infrastructure Design
  • L1 <-> L2 HeartBeating
  • L2 <-> L2 HeartBeating

How do I tune Instrumentation Scope?

So how do I tune distributed Locks.

  • Lock Type.
    • In order of most pessimistic to optimistic, Lock Types supported by Terracotta are:
      • Synchronous Write
      • Write
      • Read
      • Concurrent
    • So make sure that we are using the right lock-type. e.g.
      • if a method is getXXX () and is only reading shared state, make sure it is not marked in tc-config.xml as lock-type of write.
      • Synchronous-write gives you greater guarantees, but comes at a high cost in terms of scale/latency - so use with caution.
      • Concurrent locks only demarcate transaction boundaries - so again be sure that your application is fine with broken semantic correctness
  • Lock Striping:
    • If you notice too much contention on a lock, you may be able to work around it via striping the lock.
    • The most popular example is when executing Hashmap.put and Hashmap.remove , you presumably are locking on the whole Hashmap. So by replacing a Hashmap with a ConcurrentHashMap (which by default support 16 buckets - and can be overridden via passing the concurrency-level as a constructor parameter) - you stripe the single Hashmap lock 16 ways. SEE http://unserializableone.blogspot.com/2007/04/performance-comparision-between.html
  • Lock Scoping/Batching:
    • Batching implies executing a bunch of state mutations in the context of a single distributed lock as against multiple-distributed locks. It usually is a trade-ff between concurrency (the more fine grained the locking, the more concurrent the application) and latency (the more coarse grained the lock, the more mutatations can be "batched" as a transaction reducing overheads around lock acquisistion and "Terracotta Transaction" committing).
    • As an example consider for (i=0; i< 1000; i++) { synchronized(foo) { hashmap.put(i, new Object()); }}. Executing the whole for loop in a single transaction (i.e. synchronizing outside the for-loop) is better than synchronizing within the for-loop.
  • Terracotta limitations:
    • Of course, one needs to be careful of not creating too huge of a "Terracotta Transaction" since as of now, Terracotta does not support transparent partial- transaction-commits or transaction-streaming, hence you could OOME the L1 with a giant "Terracotta-transaction".
    • Also be careful of what data-structures, since Terracotta is a little inconsistent today - e.g. ConcurrentHashMaps are implicitly locked, but Vectors/Hashtables and other thread-safe Collection implementations are not explicitly locked.
    • Volatiles are not supported.
  • Injection of Synchronization into the Application:
    • Use ReentrantReadWriteLocks.readLocks and ReentrantReadWriteLocks.writeLocks instead of synchronization.
    • Use a ThreadSafe DataStructure that Terracotta implicitly locks (e.g. ConcurrentHashMap) or a data structure that Terracotta provides a TIM for (e.g. Hashtable) - if you want locking at the datastructure level (as against higher in the application)
    • <auto-synchronized=true> in the lock-definition section of the tc-config.xml - implies that Terracotta will inject a ReentrantReadWriteLock.readLock and .writeLock based on your lock-type definition.
  • Others:
    • You will essentially need to analyze your code to identify what level of locking coarse-ness works best.
    • Please use the LOCK-PROFILER (admin Console) to gain visibility into the distributed locking behavior of your integrated application. See the Lock Profiler Guide at for details.

Can I get details of Garbage Collection Tuning with Terracotta.

  • It is no different than GC tuning on any standard JVM. On both client-JVM and TerracottaServer-JVM, the methodology is the same.
    • Profile GC characteristics - several tools available.
      • jstat [(e.g. $JAVA_HOME/bin/jstat -gcutil -h 10 -t <pid> 1s) will print GC stats every 1s]
      • Visual-GC/ JConsole / Visual-VM etc.
      • Parameters passed to java startup will result in more GC output in logs.
        • -verbose:gc
        • -XX:+PrintGCDetails
        • -XX:+PrintGCTimeStamps
        • -XX:+HeapDumpOnOutOfMemory (to get a heap-dump when JVM encounters Out Of Memory Exception)
    • Characterise the observations.
      • Are there frequent full GCs?
      • Are some of the full GC pauses unacceptably long (> double-digit seconds for example)?
      • Are Eden and Old appropriately sized - or is Old predominantly empty, while Eden is constantly getting filled up?
    • Improvements to GC characteristics typcially entail:
      • Heap Sizing (e.g. bump -Xms, -Xmx from 256m to 2048m)
      • Ratio of New/Young/Eden to Old/Tenured - [e.g. Modify via -XX:NewRatio (e.g. NewRatio=2, means Young is 1/3rd the Heap and Old is 2/3rd the Heap) or specify New Sizes directly - -XX:NewSize=512m -XX:MaxNewSize=512m implies size of Eden is 512m).
      • Choice of Collector (e.g. use -XX:+UseConcMarkSweepGC for the ConcurrentCollector. You could use parallel for Young via -XX:+UseParNewGC -XX:ParallelThreads=2, assuming you have atleast 2 cores on the box).
      • PermSize calibration (e.g. -XX:PermSize=256m -XX:MaxPermSize=256m)
    • For a more detailed article, that discusses GC tuning, see http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html

How do I tune Virtual Memory Manager on the Client JVM and Terracotta Server JVM

  • What is Virtual Memory - see http://www.terracotta.org/confluence/display/docs1/Concept+and+Architecture+Guide#ConceptandArchitectureGuide-VirtualHeap and http://javamuse.blogspot.com/2007/10/why-is-rat-better-mouse-trap.html
  • It is also important to understand that you have to choose data-structures that play well with the Virtual Memory Manager implementation. e.g. know the difference between Partial Collections and Non-Partial Collections. Partial Collections are currently only implemented for HashMap, Hashtable, ConcurrentHashMap, LinkedBlockingQueue, Arrays).
  • Typical parameters to tune on the client JVM are (following makes the VMM run more aggressively).
    • l1.cachemanager.threshold (Decrease it - e.g. to 50 from default)
    • l2.cachemanager.criticalThreshold (Decrease it - e.g. to 50 from default)
    • l1.cachemanager.percentageToEvict (Increase it - e.g. to 20 from default)
    • l1.cachemanager.enabled, l1.cachemanager.logging.enabled (i.e. is Virtual Memory manager enabled at all and can I log its activity)
  • Typical parameters to tune on the Terracotta server are
    • l2.cachemanager.threshold
    • l2.cachemanager.criticalThreshold
    • l2.cachemanager.percentageToEvict
    • l2.cachemanager.enabled, l2.cachemanager.logging.enabled. (i.e. is Virtual Memory manager enabled at all and can I log its activity).

What is DGC and why should I tune it. And if I need to, how should I tune DGC.

  • How do you know that you have to tune DGC.
    • If left untuned, cluster wide garbage will remain in the distributed system for longer than needed. This will manifest itself as increases in the Live Managed Object Count (on the DGC Tab in the Admin Console) which will result in performance/scale degradation. It will also manifest itself as increased disk consumption on the Terracotta Server. Hence it is an important step in the Terracotta integration SDLC.
  • To tune DGC, you need to tune:
    • DGC Interval
      • In the tc-config.xml, you could tune DGC to run more frequently than the default 60 minutes.
      • We have seen applications (those with high churn - e.g. caches with low TTLs, VOIP Sessions) that need DGC to run as frequently as every 3 minutes
    • GC on the L1
      • For the Terracotta Server to consider a reference as garbage, no L1 in the cluster can have a reference to the object. To be sure of it, the reference has to get GC'ed on the L1.
      • To do so in a timely fashion, you have to tune OccupanyFraction. i.e. set XX:CMSInitiatingOccupancyFraction in the Java Options. (e.g. If you set -XX:UseCMSInitiatingOccupancyOnly=true and -XX:CMSInitialtingOccupanyFraction=50, then GCs will happen when Old gets to 50%. Otherwise, the threshold at which is collected is calculated by the formula ==> intiatingOccupancy = 100-MinHeapFreeRatio + MinHeapFreeRatio * (CMSTriggerRatio/100)). We have tuned it to as low as 10% - i.e. CMSIntitatingOccupancyFraction=10, to ensure that when a reference becomes garbage, the L2 is aware of it very soon.
    • BerkeleyDB
      • Sometimes, DGC is fine in terms of keeping up with Garbage churn, but BerkeleyDB Cleaner can't keep up. Hence, there may be a need to tune je.properties or tc.properties.
      • je.properties
        • l2.berkeleydb.je.cleaner.bytesInterval=100000000 (decrease - e.g. 20Million) - more aggressive cleaning.
        • l2.berkeleydb.je.checkpointer.bytesInterval=100000000 (decrease it e.g. ~ 20MB) - this forces more frequent checkpoints.
        • l2.berkeleydb.je.cleaner.lookAheadCacheSize=32768 (increase it - e.g. 65536) - lookahead cachesize for cleaning. This will reduce #Btree lookups.
        • l2.berkeleydb.je.cleaner.minAge=5 (decrease it - e.g. 1) - files get considered for cleaning sooner.
        • l2.berkeleydb.je.cleaner.maxBatchFiles=100 (e.g. set to 100) - Upper bounds the cleaner's backlog
        • l2.berkeleydb.je.cleaner.rmwFix=true disable this e.g. false) -
        • l2.berkeleydb.je.cleaner.threads=4 (increase this e.g. 8) - more threads cleaning.
      • tc.properties
        • l2.objectmanager.deleteBatchSize = 5000 (increase it - e.g. 40000) - more batched deletes
        • l2.objectmanager.loadObjectID.checkpoint.maxlimit = 1000 (increase it - e.g. 4Million) - product will default to this value in the next release.

How do I tune Faulting Behavior

I see fault and flush rate as almost identical on the Terracotta Admin Console. How do I fix it?

  • Flush means that pre-fetched objects that were pulled in as raw-DNA from the Terracotta server are not being requested (which is when they get converted from raw-DNA to Object references) - in such a case, we "flush" those DNAs, after a certain time-out, to reclaim memory on the L1.
  • In such a case you probably need to tune the fault-count (reduce it) and l1.objectmanager.remote.maxDNALRUSize, in tc.properties - (increase it e.g. to 120 from default of 60)

The bottleneck is in committing "Terracotta transactions" to the Terracotta server. What can I do?

  • Terracotta Transactions are written to a commit-buffer on the L1 from where a Terracotta thread writes to the Terracotta server and waits for an ACK from the server.
  • So one could increase the size of this commit buffer and "batch" Transaction writes to the Terracotta Server. One can do so via the following tc.properties:
    • l1.transactionmanager.logging.enabled=false (enable it e.g. true - to get more details)
    • l1.transactionmanager.maxOutstandingBatchSize=4 (increase it e.g. 8)
    • l1.transactionmanager.maxBatchSizeInKiloBytes=128 (increase it e.g. 256)
    • l1.transactionmanager.maxPendingBatches=88 (increase it e.g. 176)
  • If the bottleneck is on the L2 committing transactions, consider:
    • l2.seda.commitstage.threads=4 (increase it e.g. 8)

How do I tune Network Active/Passive for faster failover from Active to Passive.

By default with Network-Active-Passive, failover will occur within 45s. You will have to tune the following properties to get it lower (we have seen sub 5s in certain cases)

  • l2.healthcheck.l2.ping.enabled = true
  • l2.healthcheck.l2.ping.idletime = 5000
  • l2.healthcheck.l2.ping.interval = 1000
  • l2.healthcheck.l2.ping.probes = 3
  • l2.healthcheck.l2.socketConnect = true
  • l2.healthcheck.l2.socketConnectTimeout = 2
  • l2.healthcheck.l2.socketConnectCount = 10

TERRACOTTA INTEGRATION MODULES (TIMs) Tuning:

EHCache - Am using the EHCache TIM - but performance is poor.
  • Tune following tc.properties
    • Read and Write Concurrency: ehcache.concurrency = 1 (Increase it - e.g. to a prime number like 47)
    • Eviction properties: ehcache.global.eviction* properties
    • Lock Levels: ehcache.lock.readLevel = READ & ehcache.lock.writeLevel = WRITE to something more permissive if the application allows it.

<br>

DEVELOPING WITH TERRACOTTA

Distributed Cache

What are my choices with regards to Terracotta-clustered distributed caches ?

Terracotta provides:

Do you have a replacement for JBossCache?

Yes,

How do I implement a reaper for my cache?
  • TTL Based Mechanism:
    • You have to set a TTL and update _ExpirationTime for your cache-entry when it is created.
    • Periodically run a reaper thread that compares _ExpirationTime to CurrentTime and removes cache-entries if needed (or annotates them as expired)
I understand that for semantic correctness, I need to read with a READ Lock. But what if I am ok with Read-committed - when will references on any JVM become eventually correct.

Just as in the JVM Memory Model, you get no guarantees as to when Thread Memory gets flushed to Main Memory, if the 2nd thread dosen't request the same monitor - similarly, there are no guarantees (although it could range from sub-seconds to few seconds). Some applications could allow you to read without a Read-Lock (some cost savings in that no lock need be acquired), but you may read state committed prior or partially - given that ReentrantReadWriteLocks.readLocks can be doled out multiple concurrent readers, it seems to us that it is almost better to ALWAYS READ with a READ LOCK since the cost of acquiring one is significantly lower.

Classloaders

Does Terracotta support OSGI?

Not out of the box. But one can get around it by naming/registering classloaders. We have examples of Eclipse RCP applications (based on the Equinox implementation of the OSGI framework) being clustered via this technique with Terracotta. See http://forums.terracotta.org/forums/posts/list/15/131.page

When and why do I need to explicitly name and register ClassLoaders?

Object identity in Terracotta is the name of the ClassLoader + the fully qualified field reference. Hence the need to name/register classloaders, as of now to ensure that identity gets preserved when necessary in multiple-classloader environments.

Miscellaneous

What are semantics associated with the Concurrent Lock ?

A Concurrent Lock in Terracotta defines a transaction boundary but gives no visibility or correctness guarantees.
There is no conversation with the Terracotta server - but the "Terracotta transaction" begins when the concurrent lock is acquired and ends when the concurrent lock is released and all mutations within are batched and committed to the Terracotta server. So with a concurrent lock, 2 threads could read or write to a shared object graph concurrently (i.e no correctness guarantees). Defining no lock implies GETS/READS might work but there is no guarantee of correctness (in that one might be able to get stale or partially committed data) - but then you would not be able to write to it. With Concurrent Lock in place, GET/READS/WRITES can all work with no guarantees around correctness. (USAGE NOT RECOMMENDED)...

Does Terracotta support annotations?

Yes - see http://www.terracotta.org/confluence/display/labs/Annotations

Powered by Atlassian Confluence, the Enterprise Wiki. (Version: 2.5.5 Build:#811 Jul 25, 2007) - Bug/feature request - Contact Administrators