DEPLOYING TERRACOTTA

Added by Tushar Khairnar, last edited by Sreenivasan Iyer on Dec 09, 2009  (view change)

HIGH AVAILABILITY:

How do I make sure that there is no Single Point of Failure in my architecture?

    • You need to deploy one of the Terracotta-certified configurations.
    • See http://www.terracotta.org/confluence/display/docs1/Network+Active-Passive+Deployment+Guide for Networked Active/Passive HA. Setting up your network as specified in that document means that there is no Single Point of Failure in your architecture.
    • The other option is Shared-Disk-Based Active/Passive (you still need to ensure that there is no SPoF in your network architecture):
      • Block Storage (accessed via Fibre-Channel or iSCSI)
        • Use Veritas Cluster Services (VCS) to facilitate failover - see document Terracotta_SharedDisk_HA_withVCS.pdf
        • Use Red Hat Cluster Services to facilitate failover.
        • Use Veritas VxCFS (Veritas Cluster File System) ==> not certified yet.
        • Use Red Hat GFS ==> not fully tested or certified yet.
      • Network File System (not certified on 2.6.x and beyond, because of a known failure case that is not covered) - see document Terracotta_SharedDisk_HA_withNFS.pdf
        • NFS mount point as shared disk
        • NAS appliance as shared disk

What are the advantages/disadvantages of Networked-HA over Shared-disk-based-HA?

  • There are so many advantages that Networked HA is the RECOMMENDED HA solution (unless there is a very compelling reason in your use case or IT environment to use disk-based HA).
  • Hot-Hot Standby (i.e. no dip in TPS on failover), whereas a Shared-Disk-based solution is a Hot-Cold Standby (i.e. expect a dip in TPS on failover, followed by eventual recovery).
  • HA in a box (no need for any special VCS or Red Hat Clustering setup, a NAS device etc.)
  • Can use the local disk on each Terracotta server, which is cheaper in terms of disk setup (no virtualized storage, no HBA cards, no Fibre Channel etc.)
  • Failover can be tuned to take under 10 seconds by modifying tc.properties.
  • 2 physical copies of the data are persisted to disk.
  • etc.

I understand that Networked Active/Passive is recommended. Even so, can you describe how I achieve Terracotta Server failover via a Shared-Disk-based Active/Passive setup?

  • Block storage (accessed via Fibre-Channel or iSCSI)
    • Use Veritas Cluster Services (VCS) to facilitate failover - see document Terracotta_SharedDisk_HA_withVCS.pdf
      • Define IP, Volume and Application (Terracotta Server) in a group. [IP is not strictly needed, since tc-config allows you to specify multiple IPs: one for the active TC Server and others for the standby TC Servers.]
      • VCS will monitor these resources and, in case of any failure, will call shutdown services on the node, fail over the IP, remount the volume and start the TC Server on the standby node. You need to supply the shutdown/startup/monitoring scripts.
    • Use Red Hat Cluster Services to facilitate failover.
    • Use Veritas VxCFS (Veritas Cluster File System) ==> not certified yet.
    • Use Red Hat GFS ==> not fully tested or certified yet.
  • Network File System (not recommended anymore)
    • NFS Mount
    • NAS Device (which basically offers you a turn-key HA NFS mount)

I have a DR site as well - How do I replicate Terracotta State to the DR Site?

  • See the attached document (DROptions.pdf) for an overview of the available options:
    • Using Terracotta Backup from the Terracotta Operations Center Console (Paid Versions only)
    • Setting up another Mirror in the DR Site (at marginal cost to every live transaction received by the Active Terracotta Server)
    • Implementing this at the application tier via messaging to a receiver JVM in the DR site.

INFRASTRUCTURE-PROVISIONING:

How do I size my Production Terracotta Servers?

Consider the following factors:

  • Wrt Memory Sizing:
    • What is the size of the clustered object graph?
      • Does it fit in heap, or does it far exceed the size of available RAM?
    • Do Application Latency SLAs allow for occasional faulting from Disk?
      • If not, then the entire data structure must fit in the Java heap (or in machine RAM). When the heap Xmx gets too large, a pauseful Full GC can wreak havoc on cluster TPS. We "typically" do not recommend a heap larger than 6-8G, although one still needs to provision a box with a large amount of RAM so that BDB (BerkeleyDB, the database that runs in-process with Terracotta) blocks are cached by the OS.
  • Wrt CPU Sizing:
    • Is the machine dedicated to the Terracotta Server process, or does it run other processes?
    • Does the application create a fair amount of garbage? If yes, then one can expect a fair amount of GC and DGC activity (and hence higher CPU needs).
    • How many writes/reads are issued against the Terracotta Server?
    • If there is a fair bit of clustered I/O or extensive GC/DGC, then a 4-core box, at the very least, would be a requirement.
  • Other Considerations:
    • Dual-NIC boxes for L1 and L2 are required to avoid SPoFs (Single Points of Failure).
    • Disk/Network - see next set of issues below.
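
The sizing questions above can be turned into a rough checklist function. This is an illustrative sketch only: the function name, parameters and messages are hypothetical, and the 8G cap is simply the upper end of the 6-8G guidance given above.

```python
MAX_HEAP_GB = 8.0  # upper end of the "6-8G" heap guidance (assumption: use 8)


def sizing_notes(graph_gb, disk_faulting_ok, heavy_io_or_gc, dedicated_host):
    """Return a list of sizing remarks for one Terracotta Server.

    graph_gb          -- estimated size of the clustered object graph, in GB
    disk_faulting_ok  -- True if latency SLAs tolerate occasional disk faulting
    heavy_io_or_gc    -- True for heavy clustered I/O or extensive GC/DGC
    dedicated_host    -- True if the box runs only the Terracotta Server
    """
    notes = []
    if graph_gb <= MAX_HEAP_GB:
        notes.append("graph fits within the heap cap")
    elif disk_faulting_ok:
        notes.append("graph exceeds heap cap: expect faulting from disk")
    else:
        notes.append("graph exceeds heap cap: provision RAM so the OS caches BDB blocks")
    if heavy_io_or_gc:
        notes.append("use at least 4 cores")
    if not dedicated_host:
        notes.append("budget extra CPU and RAM for co-located processes")
    return notes
```

For example, sizing_notes(20, False, True, True) flags both the RAM requirement and the 4-core minimum.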

Can I deploy multiple active Terracotta servers - for scale and HA?

    • Typically Terracotta supports only a single Active Server and multiple Standbys (in ES/EX). FX (Enterprise customers) supports multiple Active TC Servers (each TC Server and its hot standby are referred to as a Mirror).
    • In Version 2, you can stripe any arbitrary POJO graph across any number of Mirror-Groups.
    • Deprecated ==> In Version 1, we supported a TIM (Terracotta Integration Module) named Partitioned-EHCache (only available to Enterprise Subscription customers), which transparently stripes an EHCache instance across multiple Active Terracotta Servers.
      • One can thus scale out the Terracotta Server tier as well and support a much larger TPS requirement, until other factors (such as partial key limitations) become the constraint. See details attached as partitioned-EHCache-readme.txt.
      • With this mechanism, no matter what associative array represents the application interface (e.g. HashMap, Hashtable, ConcurrentHashMap etc.), one can, with an extra level of indirection, point to a cache entry in an EHCache instance, which can then be striped across multiple Active Terracotta Servers.
      • Some restrictions apply: the EHCache keys and values in each Terracotta Server are assumed to be completely contained in a given silo, i.e. object graphs may not overflow from one TC Server to another (if they do, a runtime exception is thrown on the client during object-graph traversal).
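
To make the idea of striping concrete, the sketch below maps cache keys to stripes with a stable hash. It illustrates key-based partitioning in general and is not Terracotta's actual striping algorithm; the function names and the choice of CRC32 are assumptions made for the example.

```python
import zlib


def stripe_for_key(key: str, num_stripes: int) -> int:
    """Map a cache key to a stripe index in [0, num_stripes).

    CRC32 is used because it is stable across processes, unlike Python's
    built-in hash() for strings.
    """
    if num_stripes < 1:
        raise ValueError("num_stripes must be at least 1")
    return zlib.crc32(key.encode("utf-8")) % num_stripes


def partition(keys, num_stripes: int) -> dict:
    """Group keys by the stripe that would own them."""
    stripes = {i: [] for i in range(num_stripes)}
    for key in keys:
        stripes[stripe_for_key(key, num_stripes)].append(key)
    return stripes
```

Because each key always lands on the same stripe, an entry and everything reachable from it stay within one silo, which mirrors the containment restriction described above.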

Please enumerate disk storage choices with Terracotta.

  • Disk Options:
    • Write to local Disk (fast, cheap)
    • Write to shared Storage (BLOCK STORAGE) - (Separate LUNs in case of Network-Active/Passive, Same LUN in case of Disk-Based-Active/Passive) e.g.
      • Block Storage via HBA talking Fibre-Channel (fast, expensive, centralized storage implies greater manageability).
      • Block storage via iSCSI (slower, cheaper)
    • Write to shared storage (NETWORK FILE SYSTEM) - (not recommended anymore).
      • NFS Mount
      • NAS Device.
    • Disk RAID Levels - at least Mirrored, although Striped and Mirrored is ideal.
    • RECOMMENDATION: Write to Local Disk (fewer failure possibilities and typically faster). In some cases, however, we have seen each TC Server write to its own LUN on Shared Storage.
  • File-System Choices:
    • EXT2: Not recommended.
    • EXT3: Recommended, since the version of BDB embedded within the TC Server requires this file-system. (This BDB version is known to be unstable on ext2.) If mount options need to be fine-tuned, then you could try noatime, commit=1 (default is 5), data=ordered (the default), and /proc/sys/vm/dirty_ratio=10 and /proc/sys/vm/dirty_background_ratio=5 (on Linux, at least).
    • ReiserFS: SUSE installations default to ReiserFS. ReiserFS seems about 20% faster than ext3 in many cases.
    • Others: ZFS/UFS - seen on occasion
    • RECOMMENDATION: We have seen best numbers on Ext3 and ReiserFS - so those are the recommended file-systems.
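
The two /proc/sys/vm values suggested above can be checked with a few lines of code. This is a minimal sketch for Linux; the helper names are hypothetical, and the target values are simply the ones quoted in the EXT3 bullet.

```python
# Suggested values from the EXT3 tuning note above.
RECOMMENDED = {
    "/proc/sys/vm/dirty_ratio": 10,
    "/proc/sys/vm/dirty_background_ratio": 5,
}


def read_int(path: str) -> int:
    """Read a single integer from a /proc or /sys style file."""
    with open(path) as f:
        return int(f.read().strip())


def find_deviations(current: dict, recommended: dict = RECOMMENDED) -> dict:
    """Return {path: (actual, wanted)} for every setting that differs.

    A setting missing from `current` is reported with an actual value of None.
    """
    return {
        path: (current.get(path), wanted)
        for path, wanted in recommended.items()
        if current.get(path) != wanted
    }
```

On a Linux host, find_deviations({p: read_int(p) for p in RECOMMENDED}) returns an empty dict when both values already match.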

Please enumerate Network Provisioning Considerations.

  • Gigabit full duplex between L1s and L2 (for any sizeable use case). There might be occasional use cases where 10/100 suffices.
  • The same applies between L2s if the traffic goes through the same switches. If it is a direct trunk (i.e. the route does not pass through a switch), then even 10/100 should be OK. Latency, not bandwidth, is of utmost importance between L2s, as we do not transfer a lot of data for election, heartbeating etc. In case of failures, however, there is a DB-Sync operation that is bandwidth-intensive.
  • Make sure that the network interfaces are configured properly in /etc/host.conf (or similar) for utilizing full capacity and full duplex. In many deployments, we have seen incorrect setups (i.e. setup for 10/100).
  • Redundant NICs, redundant Switches with automatic failover (via VRRP, HSRP). See NAP diagram.
  • Configure the NIC failover time at the OS level and the VRRP failover time appropriately, so that your overall failover-time SLA is met.
  • With dual NICs - make sure they are in failover mode.
  • With dual redundant NICs, make sure that each mirrored pair is set up for failover.
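
Incorrect link setups (for example a NIC negotiated at 10/100 or half duplex) can be caught with a small check. The sketch below assumes a Linux host, where the kernel exposes the negotiated speed in Mb/s and the duplex mode under /sys/class/net/<interface>/; the function names and the interface name in the usage note are hypothetical.

```python
from pathlib import Path


def link_problems(speed_mbps: int, duplex: str, min_speed_mbps: int = 1000) -> list:
    """Return human-readable problems for one network link (empty if fine)."""
    problems = []
    if speed_mbps < min_speed_mbps:
        problems.append(f"speed {speed_mbps} Mb/s is below {min_speed_mbps} Mb/s")
    if duplex.strip().lower() != "full":
        problems.append(f"duplex is {duplex.strip()!r}, expected 'full'")
    return problems


def read_link(iface: str):
    """Read (speed_mbps, duplex) for an interface from Linux sysfs.

    Reading these files can raise OSError when the link is down.
    """
    base = Path("/sys/class/net") / iface
    speed = int((base / "speed").read_text().strip())
    duplex = (base / "duplex").read_text().strip()
    return speed, duplex
```

For example, link_problems(*read_link("eth0")) returns an empty list for a healthy gigabit full-duplex link.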

MONITORING:

What should I monitor when I deploy a Terracotta Integration into Production?

    • See the attached document entitled "Terracotta_Operations_Runbook.doc".

SECURITY:

Security Best Practices for Terracotta.

  • Data put into Terracotta shared data structures can be viewed directly in the Terracotta Developer Console. If that is a concern, encrypt the data before putting it into Terracotta, e.g. passwords or sensitive data such as payroll.
  • Use JMX security for authentication in the Admin Console. See here:
    http://www.terracotta.org/web/display/docs/Configuration+Guide+and+Reference#ConfigurationGuideandReference-serverauthentication
  • Use an IPSec tunnel over SSH for encrypted L1-L2 communication.
  • Build firewall rules to allow connections to the L2 on the DSO port ONLY from a known list of L1s.
  • Additionally, one can get the cluster topology via the JMX API and add monitoring to ensure that the connected L1s match a known allowed list - no more, no less.
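
The "no more, no less" check in the last bullet boils down to a set comparison. The sketch below assumes the list of connected L1s has already been fetched (for example via JMX) and is passed in as plain strings; the function name and the address format are hypothetical.

```python
def audit_clients(connected, allowed) -> dict:
    """Compare connected L1 clients against the allowed list.

    Returns the clients that are connected but not allowed ("unexpected")
    and the allowed clients that are not connected ("missing").
    """
    connected_set, allowed_set = set(connected), set(allowed)
    return {
        "unexpected": sorted(connected_set - allowed_set),
        "missing": sorted(allowed_set - connected_set),
    }
```

A monitor would raise an alert whenever either list in the result is non-empty.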