Release: 2.6.2
Publish Date: July 2008
Cluster Statistics Recorder Guide
What is the Cluster Statistics Recorder?
The Terracotta Cluster Statistics Recorder allows you to configure and manage the recording of statistics for your whole cluster.
It is designed to be armed and ready whenever you want detailed information about all the nodes of a running system and want to correlate the results. You don't need to restart an application before you can start capturing statistics. Note that the statistics recorder may reduce the performance of your system while it is in use, so you are advised not to use it on a critical production system.
Architecture
To make the terminology used in the rest of this document easier to understand, we briefly explain the architecture of the statistics recorder here. Feel free to look back at the schematic below whenever you want to see how each component fits into the overall picture.

Agents
Each node in a Terracotta cluster is able to record statistics about its running system. To make this possible, each node contains a statistics capturing agent. Agents run in parallel with the rest of your application's functionality. They retrieve and emit the data of the statistics that have been enabled in a capture session. Since capturing and emitting the data are done independently in the agents, a disk buffer is used to spool the statistics data before it is actually sent out.
Agents are identified by two means: their IP address and a differentiator. The differentiator is generated by the environment in which the agent runs, making it possible to tell apart several agents that might be running in different JVMs on the same machine (for example, 'L2/9510' for the server and 'L1/0' for a client in the CSV sample further below).
Gateway
Since nodes in the cluster can come and go, the actual topology is dynamic. It would be quite tedious to keep track of each node individually before being able to record statistics. This task is therefore performed by the gateway, which sits in front of all the nodes and groups their functionality into a single API. When you interact with the gateway, every action is performed on all the nodes that are part of the cluster. Also, when a capture session is ongoing and a new node joins, the gateway automatically configures that node and starts recording statistics on it too. Basically, the gateway makes the whole cluster look like a single machine.
Capture sessions
Before you can start capturing, you need to create a capture session with a unique identifier. This session allows you to set up which statistics are enabled and to configure options that are taken into account when the data is captured. The unique identifier makes it possible to later group the data from all the individual nodes into the same capture session.
Gatherer
The Terracotta server embeds a statistics gatherer that connects to the gateway. This gatherer receives all the statistics data sent by the gateway and stores it locally in an embedded database. The data of the capture sessions thus ends up in the gatherer, ready to be used by any tool that might need it. Note that this allows the statistics gathering process to run without any client application being connected. You can, for example, connect with the administration console, set up a capture session, start capturing, disconnect the administration console, wait for a certain amount of time, connect again, and finally retrieve the captured statistics data.
Statistics store
The gatherer stores the statistics data in an embedded database, which is the statistics store. The same statistics store is used by the Snapshot Visualization Tool to browse the statistics data and to run meaningful queries on it. Advanced users can also connect to statistics stores through JDBC and use SQL to run queries of their own.
Statistics
Statistics have names that identify them, and each agent is aware of the statistics that it supports. When statistics data is captured, the entries written to the store don't necessarily have the same name as the statistic. For example, each capture of the 'memory' statistic stores three entries: 'memory free', 'memory max' and 'memory used'. Also, each name might have several data elements. For instance, the 'cpu' statistic creates a separate entry for each CPU in the machine, for example 'cpu 0' and 'cpu 1'. The names and elements of a single statistic capture can thus produce many entries that are stored at the same time.
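The following Java sketch illustrates how one capture can fan out into several named entries, some of them carrying an element. The `StatisticEntry` record and the `StatisticExpansion` class are hypothetical and are not Terracotta's agent API. The memory figures come from the JVM's own `Runtime`, with 'memory free' computed as max minus used (in the sample CSV rows later in this guide, 'memory free' and 'memory used' add up to 'memory max'). Using 'cpu' as the name with 'cpu 0', 'cpu 1' as its elements is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stored entry: a name, an optional element and a value.
record StatisticEntry(String name, String element, long value) {}

// Illustration only; this is not Terracotta's agent code.
class StatisticExpansion {
    // One 'memory' capture expands into three separately named entries.
    static List<StatisticEntry> captureMemory() {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();
        long used = rt.totalMemory() - rt.freeMemory();
        List<StatisticEntry> entries = new ArrayList<>();
        // 'free' is taken as max - used, so free + used == max.
        entries.add(new StatisticEntry("memory free", null, max - used));
        entries.add(new StatisticEntry("memory max", null, max));
        entries.add(new StatisticEntry("memory used", null, used));
        return entries;
    }

    // One capture of a per-CPU statistic: same name, one element per CPU
    // (assumed naming). The per-CPU values are supplied by the caller.
    static List<StatisticEntry> capturePerCpu(long[] perCpuValues) {
        List<StatisticEntry> entries = new ArrayList<>();
        for (int i = 0; i < perCpuValues.length; i++) {
            entries.add(new StatisticEntry("cpu", "cpu " + i, perCpuValues[i]));
        }
        return entries;
    }
}
```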
Supported statistics
The statistics supported by the agents depend on the system the application is running on, and on whether the node is a Terracotta client or server. For example, the operating system, the JDK version and possibly, in the future, the frameworks used by an application all help determine the set of statistics supported by individual agents. Here again, the gateway takes away the burden of interacting with individual agents to handle their sets of statistics. The gateway dynamically queries the agents in the cluster to provide an aggregated, comprehensive list of all the statistics supported by the cluster as a whole. When statistics are enabled in a capture session, the gateway enables them on the appropriate agents.
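As an illustration of this aggregation, the sketch below models a gateway that reports the union of the statistics its agents support, enables each requested statistic only on the agents that support it, and configures an agent that joins mid-session with the current selection. `Agent`, `SimpleAgent` and `Gateway` are hypothetical names used only for this sketch; it is not the actual gateway implementation.

```java
import java.util.*;

// Hypothetical agent: each node knows which statistics it supports.
interface Agent {
    Set<String> supportedStatistics();
    void enable(Collection<String> names);
}

// Minimal in-memory agent that remembers what it was asked to enable.
class SimpleAgent implements Agent {
    private final Set<String> supported;
    final List<String> enabledHere = new ArrayList<>();

    SimpleAgent(String... supported) { this.supported = new HashSet<>(Arrays.asList(supported)); }

    public Set<String> supportedStatistics() { return supported; }

    public void enable(Collection<String> names) {
        enabledHere.clear();
        enabledHere.addAll(names);
    }
}

class Gateway {
    private final List<Agent> agents = new ArrayList<>();
    private final Set<String> enabled = new LinkedHashSet<>();

    // Aggregated view: the union of what every agent supports.
    Set<String> supportedStatistics() {
        Set<String> all = new TreeSet<>();
        for (Agent a : agents) all.addAll(a.supportedStatistics());
        return all;
    }

    // Remember the selection and push it to every known agent.
    void enableStatistics(Collection<String> names) {
        enabled.clear();
        enabled.addAll(names);
        for (Agent a : agents) configure(a);
    }

    // A node joining mid-session receives the current selection immediately.
    void onAgentJoined(Agent a) {
        agents.add(a);
        configure(a);
    }

    // Enable only the selected statistics that this agent actually supports.
    private void configure(Agent a) {
        List<String> applicable = new ArrayList<>();
        for (String n : enabled) {
            if (a.supportedStatistics().contains(n)) applicable.add(n);
        }
        a.enable(applicable);
    }
}
```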
Statistic types
The statistics recorder supports three types of statistics. Each supported statistic has one particular type, appropriate to the data being collected:
- startup: statistics of the startup type will be automatically captured at the beginning of a capturing session.
- snapshot: statistics of the snapshot type will be automatically captured during the capturing session, based on the frequency that has been set up in the configuration.
- triggered: statistics of the triggered type are not captured automatically; they are captured either through API calls or when certain conditions occur in the running system.
Generally you shouldn't worry about these types at all since each statistic is aware of its own type and the statistics recorder behaves appropriately during the capture process.
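For readers who want a mental model anyway, the sketch below encodes the three types as a Java enum and selects which enabled statistics would be captured automatically for a given event. The `StatisticType`, `CaptureEvent` and `CaptureSchedule` names and the two-event model are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Events that can cause an automatic capture in this simplified model.
enum CaptureEvent { SESSION_START, SNAPSHOT_TICK }

enum StatisticType {
    STARTUP,   // captured automatically when a capture session begins
    SNAPSHOT,  // captured automatically at the configured frequency
    TRIGGERED; // captured only through an API call or a system condition

    boolean capturedAutomaticallyOn(CaptureEvent event) {
        switch (this) {
            case STARTUP:
                return event == CaptureEvent.SESSION_START;
            case SNAPSHOT:
                return event == CaptureEvent.SNAPSHOT_TICK;
            default:
                return false; // TRIGGERED is never captured automatically
        }
    }
}

class CaptureSchedule {
    // Names of the enabled statistics that would be captured on this event.
    static List<String> toCapture(Map<String, StatisticType> enabled, CaptureEvent event) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, StatisticType> e : enabled.entrySet()) {
            if (e.getValue().capturedAutomaticallyOn(event)) result.add(e.getKey());
        }
        return result;
    }
}
```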
Configuration
The default configuration should suffice unless you are running several JVMs on the same machine, need specific locations for buffering and storing statistics data, or want to protect access to the RESTful interface.
Paths in tc-config.xml
The statistics recorder needs to store data on disk, both for the agent buffers and for the gatherer store.
Terracotta server agent
By default, the directory that will be used both for the buffer and the store will be 'terracotta' in the current directory.
If you want to change the directory, you can add a <statistics> element inside a <server> element of the tc-config.xml file.
In the example below, the terracotta/server-stats directory inside the user's home directory is used instead:
<server>
  <statistics>%(user.home)/terracotta/server-stats</statistics>
</server>
Terracotta client agent
By default, the directory used for the client's buffer is 'terracotta-10.0.2.1' in the current directory, where 10.0.2.1 is the IP address of the client machine.
If you want to change the directory, you can add a <statistics> element inside the <clients> element of the tc-config.xml file.
In the example below, the terracotta/client-stats directory inside the user's home directory is used instead:
<clients>
  <statistics>%(user.home)/terracotta/client-stats</statistics>
</clients>
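The %(user.home) placeholder in these examples stands for the user's home directory. Assuming the placeholder resolves to the Java system property of the same name, the hypothetical `ConfigPaths` helper below shows that expansion, together with the default client directory name 'terracotta-<client IP>' described above. It illustrates the convention only and is not Terracotta's configuration parser.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConfigPaths {
    // Matches %(name) and captures the property name.
    private static final Pattern PLACEHOLDER = Pattern.compile("%\\(([^)]+)\\)");

    // Replace each %(name) with System.getProperty(name); unknown
    // properties are left untouched.
    static String expand(String path) {
        Matcher m = PLACEHOLDER.matcher(path);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String value = System.getProperty(m.group(1));
            m.appendReplacement(sb, Matcher.quoteReplacement(value != null ? value : m.group()));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    // Default client buffer directory name, relative to the current directory.
    static String defaultClientDir(String clientIp) {
        return "terracotta-" + clientIp;
    }
}
```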
Authentication
As described below, one of the ways to work with the statistics recorder is through a RESTful interface. This interface exposes functionality over regular HTTP, which you might want to restrict to certain people. To protect it, you can set up a file with usernames and passwords. The path to this file is configured through a <http-authentication> element inside a <server> element of the tc-config.xml file.
In the example below, the credentials for HTTP authentication are read from the terracotta/realm.properties file inside the user's home directory:
<server>
  <http-authentication>
    <user-realm-file>%(user.home)/terracotta/realm.properties</user-realm-file>
  </http-authentication>
</server>
The format of this file is as follows:
username1: password1, statistics
username2: password2, statistics
...
You should fill in the appropriate values for username1, password1, ... and each user must be listed on its own line. Make sure to end each line with 'statistics' since that indicates that the user has access to the statistics RESTful interface.
Passwords may be in clear text, obfuscated or checksummed. Use the class org.mortbay.jetty.security.Password from the Jetty web server to generate obfuscated passwords or password checksums, and refer to the Jetty documentation for more information about generating these password forms.
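To check a realm file before deploying it, you could parse it along the lines of the sketch below. It assumes the `username: password, role, ...` layout shown above, treats blank lines and lines starting with # as ignorable (an assumption borrowed from properties-file conventions), and splits at the first colon so that obfuscated values such as Jetty's `OBF:`-prefixed passwords stay intact. Commas inside passwords are not supported. `RealmUser` and `RealmFile` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// One parsed line of the realm file.
record RealmUser(String username, String password, List<String> roles) {
    // A user needs the 'statistics' role to use the RESTful interface.
    boolean canUseStatistics() { return roles.contains("statistics"); }
}

class RealmFile {
    // Parse lines of the form "username: password, role1, role2".
    static List<RealmUser> parse(List<String> lines) {
        List<RealmUser> users = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;
            int colon = line.indexOf(':'); // first colon ends the username
            if (colon < 0) throw new IllegalArgumentException("Missing ':' in: " + raw);
            String user = line.substring(0, colon).trim();
            String[] parts = line.substring(colon + 1).split(",");
            String password = parts[0].trim();
            List<String> roles = new ArrayList<>();
            for (int i = 1; i < parts.length; i++) roles.add(parts[i].trim());
            users.add(new RealmUser(user, password, roles));
        }
        return users;
    }
}
```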
Non-dynamic statistics
Even though we strive to make the whole statistics recorder infrastructure fully dynamic, certain statistics have such a high performance impact that they need to be explicitly enabled before your application starts. This is done by setting the appropriate properties in tc.properties.
Below is the list of non-dynamic statistics and the properties that need to be set to true before they can be used:
| Statistic name | Property name |
| l2 faults from disk | l2.objectmanager.fault.logging.enabled |
| message monitor | tcm.monitor.enabled |
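Because these properties must be set before the application starts, it can be useful to check a tc.properties file up front. The sketch below loads a properties file with `java.util.Properties` and reports which of the two non-dynamic statistics in the table would be usable. The `NonDynamicStatistics` class is hypothetical; only the statistic and property names come from this guide.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.*;

class NonDynamicStatistics {
    // Statistic name -> tc.properties flag that must be true at startup.
    static final Map<String, String> REQUIRED_PROPERTY = Map.of(
        "l2 faults from disk", "l2.objectmanager.fault.logging.enabled",
        "message monitor", "tcm.monitor.enabled");

    // Names of the non-dynamic statistics whose flag is set to true.
    static Set<String> usable(Properties tcProperties) {
        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, String> e : REQUIRED_PROPERTY.entrySet()) {
            String value = tcProperties.getProperty(e.getValue(), "false").trim();
            if (Boolean.parseBoolean(value)) result.add(e.getKey());
        }
        return result;
    }

    static Properties load(Reader reader) throws IOException {
        Properties p = new Properties();
        p.load(reader);
        return p;
    }
}
```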
Running the Cluster Statistics Recorder
The Cluster Statistics Recorder has been designed in such a way that multiple interfaces can be used to achieve the same functionalities.
Graphical User Interface (GUI)
The easiest way to get started is to use the visual interface that is part of our administration console, which will provide you with a friendly GUI that allows you to perform most operations without having to learn any commands. Please refer to the Admin Console Guide for further information.
Command-line interface (CLI)
The CLI allows you to perform all the operations that are available through the GUI from a text console. This makes it possible to integrate the statistics recorder into scripts, to easily launch capture sessions on headless machines, and more.
Running the CLI
Windows
Linux
CLI usage and commands
The commands passed to tc-stats are separated by spaces, and you can use as many as you want in the same command-line entry. They are executed as a series, in the same order as they are listed.
Instead of listing all the commands on the command line, you can put them in a dedicated file and use the --file option of tc-stats to read the commands from that file instead.
For instance, the example below starts up the local gatherer and the gateway, retrieves the list of statistics supported by the cluster, and shuts the gatherer and the gateway down again.
bin\tc-stats.bat startup getSupportedStatistics shutdown
The output could be:
Starting a capture session from the CLI
To start a capture session from the CLI, you have to use the following commands in order:
- startup
- createSession
- enableStatistics
- startCapturing
For example:
bin\tc-stats.bat startup createSession "test session" enableStatistics "cpu,l2 l1 fault,memory" startCapturing
The output could be:
Stopping a capture session and retrieving the data
To stop a capture session from the CLI, you have to use the stopCapturing command:
bin\tc-stats.bat stopCapturing
Afterwards, to retrieve the statistics data, you can do:
bin\tc-stats.bat retrieveStatistics statistics.zip
This zip file contains a single entry with all the statistics that are stored in the gatherer in the CSV format. See below for more information about the structure of the CSV file.
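To process the downloaded archive programmatically, the sketch below reads the single CSV entry of the zip file with `java.util.zip`. `StatisticsZip` is a hypothetical helper, and decoding the content as UTF-8 is an assumption.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

class StatisticsZip {
    // Return the text of the first (and only) entry in the archive.
    static String readCsv(InputStream zipBytes) throws IOException {
        try (ZipInputStream zin = new ZipInputStream(zipBytes)) {
            ZipEntry entry = zin.getNextEntry();
            if (entry == null) throw new IOException("Empty archive");
            // readAllBytes stops at the end of the current entry.
            return new String(zin.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
```

For example, `StatisticsZip.readCsv(new java.io.FileInputStream("statistics.zip"))` would return the CSV text of the file produced by the command above.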
Capturing triggered statistics
Some statistics are never captured automatically because they are of the 'triggered' type. The 'thread dump' statistic is one of those. To capture data for such statistics, use the 'captureStatistic' command.
For example:
bin\tc-stats.bat captureStatistic "thread dump"
You will receive the data of the captured statistic as text in the standard output, but the same data has also been recorded in the gatherer. When you retrieve the statistics afterwards, the data of the triggered statistics will be present together with the data that is captured automatically.
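If you drive tc-stats from a Java program rather than a shell script, the sketch below builds the capture-session invocation shown earlier as an argument list for `ProcessBuilder`. Passing each command and argument as a separate list element keeps values such as "test session" together without shell quoting. `TcStatsCommand` is a hypothetical helper; the script path is assumed to be relative to the Terracotta installation directory, and on Linux you would substitute the corresponding shell script. Windows applies its own quoting rules when launching .bat files, so test arguments containing spaces on your platform.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TcStatsCommand {
    // Script path followed by commands and their arguments, one element each.
    static List<String> build(String script, String... commandsAndArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add(script);
        cmd.addAll(Arrays.asList(commandsAndArgs));
        return cmd;
    }

    // Run the command, forwarding its output to this process, and return the exit code.
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }
}
```

For example, `TcStatsCommand.run(TcStatsCommand.build("bin\\tc-stats.bat", "captureStatistic", "thread dump"))` would capture a thread dump.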
RESTful interface
All the commands that are available through the CLI and the admin GUI are also available through a RESTful interface. This means that you can use any HTTP client to perform these operations, for example curl or a web browser.
Accessing the list of supported RESTful commands (methods)
The RESTful interface is available on your Terracotta server and runs on the same port as the DSO server (see the config reference guide for more information). By default this will be port 9510. The RESTful interface is located at the statistics-gatherer path.
So, if you're working on the same machine as the one running the server, you'd open the URL below in a web browser to get a list of all the methods supported by the RESTful interface. If the server is running on another machine or on another port, you'd change those parts of the URL accordingly.
The commands you see will look very familiar since most of them are the same as what you've seen above for the CLI.
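To call these methods from a Java program rather than a browser, the sketch below uses the JDK's `java.net.http.HttpClient` (Java 11 or later). It assumes that each method is reached as a path below statistics-gatherer and that the authentication configured with <http-authentication> accepts HTTP Basic credentials. The URL in the usage line is a hypothetical example built from the default port and path described above, and `StatsClient` is a hypothetical helper, not part of Terracotta.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class StatsClient {
    // Build a GET request; adds an HTTP Basic Authorization header when a user is given.
    static HttpRequest request(String url, String user, String password) {
        HttpRequest.Builder b = HttpRequest.newBuilder(URI.create(url)).GET();
        if (user != null) {
            String token = Base64.getEncoder()
                    .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
            b.header("Authorization", "Basic " + token);
        }
        return b.build();
    }

    // Send the request and return the body, failing on any non-200 status.
    static String fetch(HttpRequest req) throws IOException, InterruptedException {
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        if (resp.statusCode() != 200) throw new IOException("HTTP " + resp.statusCode());
        return resp.body();
    }
}
```

For example: `String methods = StatsClient.fetch(StatsClient.request("http://localhost:9510/statistics-gatherer/getSupportedStatistics", "username1", "password1"));`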
Starting a capture session from the RESTful interface
To start a capture session from the RESTful interface, use the same commands as for the CLI, visiting each one separately with an HTTP client.
Note that the arguments are separated from the command by a question mark (?). Each argument has the format 'key=value', and arguments are separated by ampersands (&). If an argument has multiple values, like the 'names' argument above, you have to repeat the key/value pair for each value. If an argument value contains spaces, replace them with '%20'; otherwise the URL will not be valid.
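These encoding rules can be captured in a small URL builder. The sketch below assumes that a method is addressed as `<base URL>/<method>` under the statistics-gatherer path, with `http://localhost:9510/statistics-gatherer` used as a hypothetical base; verify the exact layout against the method list your server returns. `URLEncoder` produces form encoding, which turns spaces into '+', so the sketch converts those to %20 as required above. `RestUrl` is a hypothetical helper.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;

class RestUrl {
    // Build <base>/<method>?key=value&key=value..., repeating a key once per value.
    static String build(String base, String method, List<Map.Entry<String, String>> args) {
        StringBuilder sb = new StringBuilder(base).append('/').append(method);
        char sep = '?';
        for (Map.Entry<String, String> a : args) {
            sb.append(sep).append(encode(a.getKey())).append('=').append(encode(a.getValue()));
            sep = '&';
        }
        return sb.toString();
    }

    // Percent-encode a key or value; a literal '+' is already encoded as %2B,
    // so every remaining '+' stands for a space and becomes %20.
    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8).replace("+", "%20");
    }
}
```

Note that `URLEncoder` also escapes characters such as '/' (for 'L2/9510' it produces 'L2%2F9510'); servers normally decode this back to the original value in query strings.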
Retrieving the capturing data live as text
The RESTful interface allows you to use its 'retrieveStatistics' command to see the captured statistics data at any moment in time. By providing it with the 'format=txt' argument, you can see it directly in the browser.
This is the full command:
RESTful methods and their arguments
Below is a list of all the supported RESTful methods and the arguments they support. Note that an asterisk (*) in front of argument names indicates that these arguments are mandatory.
| Method | Description | Argument(s) |
| startup | starts up the gatherer | none |
| shutdown | shuts down the gatherer | none |
| reinitialize | reinitializes the entire statistics recorder cluster | none |
| createSession | creates a new capture session and activates it | *sessionId: the unique session ID |
| closeSession | closes the currently active session | none |
| getActiveSessionId | returns the ID of the active session | none |
| getAvailableSessionIds | returns the list of session IDs in the statistics store | none |
| getAvailableAgentDifferentiators | returns the list of agent differentiators | none |
| getSupportedStatistics | returns the list of all the supported statistics in the cluster | none |
| enableStatistics | sets the enabled statistics for the active session | *names: the names of the statistics |
| captureStatistic | captures and returns the data for a triggered statistic | *name: the name of the statistic |
| startCapturing | starts capturing for the active session | none |
| stopCapturing | stops capturing for the active session | none |
| setGlobalParam | sets the value of a global configuration parameter | *key: the config parameter key; *value: the config parameter value |
| getGlobalParam | retrieves the value of a global configuration parameter | *key: the config parameter key |
| setSessionParam | sets the value of a configuration parameter for the active session | *key: the config parameter key; *value: the config parameter value |
| getSessionParam | retrieves the value of a configuration parameter for the active session | *key: the config parameter key |
| clearStatistics | clears the stored statistics for a particular session | *sessionId: the unique session ID |
| clearAllStatistics | clears all the stored statistics | none |
| retrieveStatistics | retrieves the data from the statistics store | format: txt or zip, to indicate the output format; sessionId: the session ID to retrieve data for; agentIp: the agent IP to retrieve data for; agentDifferentiator: the agent differentiator to retrieve data for; names: the statistic names to retrieve data for; elements: the statistic data elements to retrieve data for |
| aggregateStatisticsData | aggregates statistics data for use in time series | interval: the interval in milliseconds, which can be used to create a fixed-size sliding window that only shows the capture data from now back to a fixed point in the past; format: csv to generate CSV data or xml to generate XML data; *sessionId: see above; *agentIp: see above; *agentDifferentiator: see above; *names: see above; elements: see above |
Data retrieval
After the recording of the statistics, you're able to retrieve the data as a dump of all the recorded entries or in an aggregated fashion.
CSV 'dump' format
When you use the 'retrieveStatistics' command through one of the interfaces, you receive the data from the statistics store in unprocessed form. The format used is CSV, and even if you receive the data as a zip file, the only entry in the archive is that CSV file.
The CSV format is very handy for transferring the recorded data to other tools and applications. You can for instance import this file into Excel and generate graphs yourself if you want.
These are the columns of the CSV file:
- Session ID : the unique identifier of the session
- IP : the IP address of the agent
- Differentiator : the string that can be used to differentiate the agent
- Moment : the moment when the data was captured, expressed as milliseconds since epoch (Unix time)
- Name : the name of the statistic
- Element : the name of the data element of the statistic (can be empty)
- Data Number : the statistic data as an integer number
- Data Text : the statistic data as text
- Data Date : the statistic data as a timestamp, expressed as milliseconds since epoch
- Data Decimal : the statistic data as a decimal number
Note that only one of the four data columns is filled in for any given entry; which one is used depends on the internal implementation of the statistic.
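Given the column layout above, a line of the dump can be parsed with a few lines of Java. The sketch below splits a line while honoring double quotes (and doubled quotes inside values), maps the fields onto the ten documented columns, and keeps whichever of the four data columns is filled in. `StatisticRow` and `StatisticsCsv` are hypothetical names; for large or unusual files, a dedicated CSV library is a safer choice.

```java
import java.util.ArrayList;
import java.util.List;

// One dump entry; 'data' holds whichever of the four data columns was filled in.
record StatisticRow(String sessionId, String ip, String differentiator, long moment,
                    String name, String element, String data) {}

class StatisticsCsv {
    // Split one CSV line, honoring double quotes and "" escapes inside quotes.
    static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean quoted = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (quoted) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    cur.append('"'); // escaped quote
                    i++;
                } else if (c == '"') {
                    quoted = false;  // closing quote
                } else {
                    cur.append(c);
                }
            } else if (c == '"') {
                quoted = true;       // opening quote
            } else if (c == ',') {
                fields.add(cur.toString());
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString()); // last field (empty after a trailing comma)
        return fields;
    }

    // Map a data line onto the documented columns; data columns are indices 6 to 9.
    static StatisticRow parse(String line) {
        List<String> f = split(line);
        if (f.size() != 10) throw new IllegalArgumentException("Expected 10 columns: " + line);
        String data = null;
        for (int i = 6; i <= 9; i++) {
            if (!f.get(i).isEmpty()) data = f.get(i);
        }
        String element = f.get(5).isEmpty() ? null : f.get(5);
        return new StatisticRow(f.get(0), f.get(1), f.get(2), Long.parseLong(f.get(3)),
                f.get(4), element, data);
    }
}
```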
For example, a dump could look like this:
Session ID,IP,Differentiator,Moment,Name,Element,Data Number,Data Text,Data Date,Data Decimal
"mysession","192.168.1.33","L2/9510","1208722084118","startup timestamp",,,,"1208722084118",
"mysession","192.168.1.33","L2/9510","1208722084120","l2 l1 fault",,"0",,,
"mysession","192.168.1.33","L2/9510","1208722084120","memory free",,"244609968",,,
"mysession","192.168.1.33","L2/9510","1208722084120","memory max",,"265486336",,,
"mysession","192.168.1.33","L2/9510","1208722084120","memory used",,"20876368",,,
"mysession","192.168.1.33","L1/0","1208722084122","startup timestamp",,,,"1208722084122",
"mysession","192.168.1.33","L1/0","1208722084123","l1 transaction size",,"0",,,
"mysession","192.168.1.33","L1/0","1208722084123","memory free",,"44666792",,,
"mysession","192.168.1.33","L1/0","1208722084123","memory max",,"66650112",,,
"mysession","192.168.1.33","L1/0","1208722084123","memory used",,"21983320",,,
"mysession","192.168.1.33","L1/1","1208722084127","startup timestamp",,,,"1208722084127",
"mysession","192.168.1.33","L1/1","1208722084128","l1 transaction size",,"0",,,
"mysession","192.168.1.33","L1/1","1208722084128","memory free",,"43802560",,,
"mysession","192.168.1.33","L1/1","1208722084128","memory max",,"66650112",,,
"mysession","192.168.1.33","L1/1","1208722084128","memory used",,"22847552",,,
"mysession","192.168.1.33","L2/9510","1208722089118","l2 l1 fault",,"30",,,
"mysession","192.168.1.33","L2/9510","1208722089118","memory free",,"232671448",,,
"mysession","192.168.1.33","L2/9510","1208722089118","memory max",,"265486336",,,
"mysession","192.168.1.33","L2/9510","1208722089118","memory used",,"32814888",,,
"mysession","192.168.1.33","L2/9510","1208722089119","shutdown timestamp",,,,"1208722089119",
"mysession","192.168.1.33","L1/0","1208722089123","l1 transaction size",,"0",,,
"mysession","192.168.1.33","L1/0","1208722089123","memory free",,"43161296",,,
"mysession","192.168.1.33","L1/0","1208722089123","memory max",,"66650112",,,
"mysession","192.168.1.33","L1/0","1208722089123","memory used",,"23488816",,,
"mysession","192.168.1.33","L1/0","1208722089124","shutdown timestamp",,,,"1208722089124",
"mysession","192.168.1.33","L1/1","1208722089128","l1 transaction size",,"10",,,
"mysession","192.168.1.33","L1/1","1208722089128","memory free",,"40551680",,,
"mysession","192.168.1.33","L1/1","1208722089128","memory max",,"66650112",,,
"mysession","192.168.1.33","L1/1","1208722089128","memory used",,"26098432",,,
"mysession","192.168.1.33","L1/1","1208722089129","shutdown timestamp",,,,"1208722089129",
Aggregated formats
The CSV dump format is handy when you want to work with all the data after the statistics recording has finished. Often, however, you also want a real-time view of the statistics data while it's being captured. For that, you typically only want the values of certain statistics, aggregated according to the moment at which they were captured. This makes it easy to build a timeline with graphs that correspond to the values of the statistics.
You can receive this information through the 'aggregateStatisticsData' command of the RESTful interface.
For example, let's say that you want to get all the data of the memory statistics for the Terracotta server, over the last 10 seconds. This URL will give you exactly that (on one line):
http: &format=xml&agentDifferentiator=L2/9510
&names=memory%20used&names=memory%20free&names=memory%20max
&interval=10000
The result could be:
If you would change the 'format' argument to 'csv', the result could be:
You can see that the first column, or the XML element 'm', always contains the Unix timestamp. The other columns or XML elements are dynamic and correspond to the statistic names you've requested. In this case, the first column after the timestamp is 'memory used', the second one is 'memory free' and the third one is 'memory max'. The names of the XML elements have been deliberately shortened to reduce the content size as much as possible.
With this aggregated output, you can use many tools directly on the cluster statistics recording to create real-time charts of the statistics that interest you (for instance YUI or Flex).
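To turn aggregated CSV output into chart-ready series in Java, the sketch below assumes that each line starts with the timestamp, that the following columns appear in the order of the requested 'names' arguments, and that any line whose first field is not a number (such as a header row) can be skipped; surrounding double quotes are stripped. It also shows a client-side equivalent of the 'interval' sliding window. `AggregatedSeries` is a hypothetical helper; check the assumed layout against the actual output of your server.

```java
import java.util.*;

class AggregatedSeries {
    // One time series per requested name: timestamp -> raw value text.
    static Map<String, SortedMap<Long, String>> parse(List<String> lines, List<String> names) {
        Map<String, SortedMap<Long, String>> series = new LinkedHashMap<>();
        for (String n : names) series.put(n, new TreeMap<>());
        for (String line : lines) {
            String[] cols = line.split(",", -1);
            long moment;
            try {
                moment = Long.parseLong(cols[0].replace("\"", "").trim());
            } catch (NumberFormatException e) {
                continue; // header row or blank line
            }
            // Column i + 1 belongs to the i-th requested name.
            for (int i = 0; i < names.size() && i + 1 < cols.length; i++) {
                series.get(names.get(i)).put(moment, cols[i + 1].replace("\"", "").trim());
            }
        }
        return series;
    }

    // Keep only the points captured at or after now - intervalMillis.
    static SortedMap<Long, String> window(SortedMap<Long, String> s, long now, long intervalMillis) {
        return s.tailMap(now - intervalMillis);
    }
}
```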
Troubleshooting
In the unlikely event that something goes wrong with the Cluster Statistics Recorder and it is not able to repair the agent buffers and stores itself, the recording will not be functional but the rest of Terracotta's features will all be operational. This gives you the opportunity to determine the best approach towards resolving the issue without impacting the rest of your application.
Reinitializing all the nodes
Usually, the recorded statistics data is not very valuable, and the sessions that you need to keep have already been exported for use in the Snapshot Visualization Tool or elsewhere.
The easiest way to reset the Cluster Statistics Recorder to a clean state is to use the 'reinitialize' command through the CLI or the RESTful interface. This stops any ongoing recordings, removes all existing sessions and recreates all the buffers and stores in the cluster. After the reinitialize call, the statistics recorder has a clean slate that is immediately ready to be used, without having to restart your application.
On the CLI you can perform this re-initialization as follows:
bin\tc-stats.bat reinitialize
Through the RESTful interface it would be like this:
Appendix
Contacting Terracotta
Contact Terracotta at the following:
Web site: http://www.terracotta.org
Online forums: http://forums.terracottatech.com/forums/
Information: info@terracottatech.com
Platform Support
See Platform Support for information on which platforms are supported by Terracotta.
See the Integrations space to see the status of integrations with third-party technologies.
Copyright Information
Copyright © 2005-2007
Terracotta, Inc.
All Rights Reserved
This publication (the "Documentation") and the Terracotta software which it describes (the "Software") are protected to the maximum extent permitted under applicable law, including but not limited to, the regulations set forth in Title 17 of the United States Code, and California law. This Documentation, or any parts thereof, may not be reproduced in any form, by any method, for any purpose, without the express written consent of Terracotta. Terracotta makes no warranty, either express or implied, including but not limited to any implied warranties of merchantability or fitness for a particular purpose, with respect to the Software discussed in this Documentation, and the Documentation itself (collectively, "the Materials"). The Materials are made available solely on an "as-is" basis. In no event shall Terracotta be liable to anyone for special, collateral, incidental, indirect, punitive, exemplary, or consequential damages in connection with, or arising from the purchase or use of, the Materials. Under no circumstances and regardless of the cause of action alleged, shall Terracotta's liability exceed the purchase price of the Software described herein. Terracotta reserves the right to revise and improve its Software and Documentation as it deems fit. The Documentation describes the state of the Software at the time of publication.
Trademarks
"Terracotta," the stylized "T" logo, and "Open Terracotta" are trademarks of Terracotta. All other brand names, product names, or trademarks belong to their respective holders. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.
Government Use
Use, duplication, or disclosure by the U.S. Government is subject to restrictions as set forth in FAR 12.212 (Commercial Computer Software-Restricted Rights) and DFAR 267.7202 (Rights in Technical Data and Computer Software), as applicable.