This document shows you how to test and tune Terracotta Distributed Ehcache for Hibernate.
- Set Eviction Parameters
- Turn Off Query Cache
- Prevent Unnecessary Database Connections (see Reducing Unnecessary Database Connections)
- Configure Database Connection Pool (see Connection Pools)
- Turn off Unnecessary Statistics Gathering
The main benefit of a Hibernate second-level cache is raising performance by decreasing the number of times an application accesses the database. To gauge the level of database offloading provided by the Terracotta Distributed Ehcache for Hibernate second-level cache, look for these benefits:
The number of threads that can simultaneously access the distributed second-level cache can be scaled up more easily and efficiently than database connections, which generally are limited by the size of the connection pool.
You should record measurements for all of these factors before enabling the Terracotta Distributed Ehcache for Hibernate second-level cache to create a benchmark against which you can assess the impact of using the cache. You should also record measurements for all of these factors before tuning the cache to gauge the impact of any tuning changes you make.
Another important test in addition to performance testing is verifying that the expected data is being loaded. For example, loading one entity can result in multiple cache entries. One approach to tracking cache operations is to set Hibernate cache logging to "debug" in log4j.properties:
log4j.logger.org.hibernate.cache=debug
This level of logging should not be used during performance testing.
When using a testing framework, ensure that the framework does not cause a performance bottleneck and skew results.
Caches that get too large may become inefficient and suffer from performance degradation. A growing rate of flushing and faulting is an indication of a cache that's become too large and should be pruned.
The most important parameters for tuning cache size and cache performance in general are the following:
TTI and TTL are set in seconds.
You can also control Hibernate region sizes using the following parameters:
You can set eviction parameters in two different ways:
ehcache.xml – Configuration file for Terracotta Distributed Ehcache for Hibernate with properties for controlling eviction on a per-cache basis. See Setting Cache Eviction for more information.
After setting eviction parameters, be sure to test the effect on performance (see Testing the Cache).
The cache miss rate is a measure of requests that the cache could not meet. Each miss can lead to a fault which requires a database query. (However, misses and faults are not one-to-one since a query can return results that satisfy more than one miss.) A high or growing cache miss rate indicates the cache should be optimized.
To lower the miss rate, adjust for regions containing entities with high access rates to evict less frequently. This keeps popular entities in the cache for longer periods of time. You should adjust eviction parameter values incrementally and carefully observe the effect on the cache miss rate. For example, TTI and TTL that are set too high can introduce other drawbacks, such as stale data or overly large caches.
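The miss rate discussed above can be derived from hit and miss counters and compared between samples while you adjust eviction values. The following is a minimal sketch only: the class and method names are hypothetical and are not part of Ehcache or Hibernate, and the counts are assumed to come from whatever statistics source you already use.

```java
/** Hypothetical helper for tracking the cache miss rate; not an Ehcache or Hibernate API. */
final class CacheMissRate {
    private CacheMissRate() {}

    /** Returns misses / (hits + misses), or 0.0 when no requests were recorded. */
    static double missRate(long hits, long misses) {
        if (hits < 0 || misses < 0) {
            throw new IllegalArgumentException("counts must not be negative");
        }
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) misses / total;
    }

    /** Returns true when the rate rose by more than the given tolerance between two samples. */
    static boolean isGrowing(double previousRate, double currentRate, double tolerance) {
        return currentRate - previousRate > tolerance;
    }
}
```

Sampling the rate before and after each incremental change to TTI or TTL makes it easier to attribute a change in the miss rate to a specific setting.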
Examinator, the Terracotta reference application that uses Terracotta Distributed Ehcache for Hibernate to implement the second-level cache, supports thousands of concurrent user sessions. This web-based test-taking application caches exams and must have TTI and TTL properly tuned to prevent unnecessarily large data caches and stale exam pages.
The following sections detail how certain cached Examinator data is configured for second-level caching. Included are snippets from the Terracotta distributed Ehcache for Hibernate configuration file (see Cache Configuration File).
The data defining user roles has the following characteristics:
Therefore, user roles are cached and never evicted (TTI=0, TTL=0). In general, read-only data that is used frequently and never grows stale should be cached continuously.
<cache name="org.terracotta.reference.exam.domain.UserRole"
maxElementsInMemory="1000"
eternal="false"
timeToIdleSeconds="0"
timeToLiveSeconds="0"
overflowToDisk="false">
<terracotta/>
</cache>
User data, which includes the user entity and its role, is useful only while the user is active. This data has the following characteristics:
Therefore, these entities should have a short idle time of two minutes (TTI=120) to allow data associated with inactive user sessions to be evicted. However, there should never be eviction based on a hard lifetime (TTL=0), thus allowing the associated entities to be cached indefinitely as long as TTI is reset by activity.
<cache name="org.terracotta.reference.exam.domain.User"
maxElementsInMemory="1000"
eternal="false"
timeToIdleSeconds="120"
timeToLiveSeconds="0"
overflowToDisk="false">
<terracotta/>
</cache>
<cache name="org.terracotta.reference.exam.domain.User.roles"
maxElementsInMemory="1000"
eternal="false"
timeToIdleSeconds="120"
timeToLiveSeconds="0"
overflowToDisk="false">
<terracotta/>
</cache>
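To make the interaction of the two settings concrete, the sketch below shows one way to reason about when an entry becomes eligible for eviction. It assumes, as in the configurations shown here, that a value of 0 disables the corresponding limit. The class and method names are hypothetical, and this is not the Ehcache implementation.

```java
/** Hypothetical illustration of how TTI and TTL combine; not Ehcache code. */
final class ExpirySketch {
    private ExpirySketch() {}

    /**
     * All arguments are in seconds. A ttiSec or ttlSec of 0 means "no limit".
     * An entry is treated as expired when it has been idle for strictly longer
     * than TTI, or has existed for strictly longer than TTL.
     */
    static boolean isExpired(long createdAtSec, long lastAccessedSec, long nowSec,
                             long ttiSec, long ttlSec) {
        boolean idleTooLong = ttiSec > 0 && (nowSec - lastAccessedSec) > ttiSec;
        boolean livedTooLong = ttlSec > 0 && (nowSec - createdAtSec) > ttlSec;
        return idleTooLong || livedTooLong;
    }
}
```

With the User settings (TTI=120, TTL=0), an entry that keeps being accessed never expires, while one left idle for more than two minutes does. With both values at 0, as for UserRole, the entry never expires.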
Exam data includes the actual exams being taken by users. It has the following characteristics:
Since there can be many different exams, and the potential exists for a cached exam to become stale, cached exams should be periodically evicted based on lack of access (TTI=3600) and to ensure they are up-to-date (TTL=86400).
<cache name="org.terracotta.reference.exam.domain.Exam"
maxElementsInMemory="1000"
eternal="false"
timeToIdleSeconds="3600"
timeToLiveSeconds="86400"
overflowToDisk="false">
<terracotta/>
</cache>
<cache name="org.terracotta.reference.exam.domain.Section"
maxElementsInMemory="1000"
eternal="false"
timeToIdleSeconds="3600"
timeToLiveSeconds="86400"
overflowToDisk="false">
<terracotta/>
</cache>
<cache name="org.terracotta.reference.exam.domain.Section.questions"
maxElementsInMemory="1000"
eternal="false"
timeToIdleSeconds="3600"
timeToLiveSeconds="86400"
overflowToDisk="false">
<terracotta/>
</cache>
<cache name="org.terracotta.reference.exam.domain.Section.sections"
maxElementsInMemory="1000"
eternal="false"
timeToIdleSeconds="3600"
timeToLiveSeconds="86400"
overflowToDisk="false">
<terracotta/>
</cache>
<cache name="org.terracotta.reference.exam.domain.Question"
maxElementsInMemory="1000"
eternal="false"
timeToIdleSeconds="3600"
timeToLiveSeconds="86400"
overflowToDisk="false">
<terracotta/>
</cache>
<cache name="org.terracotta.reference.exam.domain.Question.choices"
maxElementsInMemory="1000"
eternal="false"
timeToIdleSeconds="3600"
timeToLiveSeconds="86400"
overflowToDisk="false">
<terracotta/>
</cache>
<cache name="org.terracotta.reference.exam.domain.Choice"
maxElementsInMemory="1000"
eternal="false"
timeToIdleSeconds="3600"
timeToLiveSeconds="86400"
overflowToDisk="false">
<terracotta/>
</cache>
If your application caches read-only data, the following may improve performance:
The JDBC mode Autocommit automatically writes changes to the database, making it unnecessary for an application to do so explicitly. However, unnecessary database connections can result from Autocommit because of the way JDBC drivers are designed. For example, transactional read-only operations in Hibernate, even those that are resolved in the second-level cache, still generate "empty" database connections. This situation, which can be tracked in database logs, can quickly have a detrimental effect on performance.
Turning off Autocommit should prevent empty database connections, but may not work in all cases. Lazily fetching JDBC connections resolves the issue by preventing JDBC calls until a connection to the database is actually needed.
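The idea behind lazy fetching can be sketched with plain JDK types: the expensive resource is opened only on first real use, so an operation that is fully served from the cache never opens it. The class below is a hypothetical illustration and does not reflect how any particular JDBC proxy is implemented.

```java
import java.util.function.Supplier;

/** Hypothetical holder that defers opening a resource until it is first requested. */
final class LazyResource<T> {
    private final Supplier<T> opener;
    private T resource;
    private boolean opened;

    LazyResource(Supplier<T> opener) {
        this.opener = opener;
    }

    /** Opens the resource on the first call and returns the same instance afterwards. */
    synchronized T get() {
        if (!opened) {
            resource = opener.get();
            opened = true;
        }
        return resource;
    }

    /** Reports whether the underlying resource was ever opened. */
    synchronized boolean isOpened() {
        return opened;
    }
}
```

If a transaction completes without calling get(), the opener never runs, which corresponds to a read that is resolved in the second-level cache without touching the database.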
Two options are provided for implementing lazy fetching of database connections:
If your application is based on the Spring framework, turning off Autocommit may not be enough to reduce unnecessary database connections for transactional read operations. You can prevent these empty database connections from occurring by using the Spring LazyConnectionDataSourceProxy proxy definition. The proxy holds unnecessary JDBC calls until a connection to the database is actually required, at which time the held calls are applied.
To implement the proxy, create a target DataSource definition (or rename your existing target DataSource) and a LazyConnectionDataSourceProxy proxy definition in the Spring application context file:
<!-- Renamed the existing target DataSource to 'dataSourceTarget' which will be used by the proxy. -->
<bean id="dataSourceTarget"
class="org.apache.commons.dbcp.BasicDataSource"
destroy-method="close">
<property name="driverClassName"><value>com.mysql.jdbc.Driver</value></property>
<property name="url"><value>jdbc:mysql://localhost:3306/imagedb</value></property>
<property name="username"><value>admin</value></property>
<property name="password"><value></value></property>
<!-- other datasource configuration properties -->
</bean>
<!-- This is the lazy DataSource proxy that interacts with the target DataSource once a real statement is sent to the database. Users use this DataSource to set up their Hibernate session factory, which in turn forces the Hibernate second-level cache and also everything that interacts with that Hibernate session factory to use it. -->
<bean id="dataSource"
class="org.springframework.jdbc.datasource.LazyConnectionDataSourceProxy">
<property name="targetDataSource"><ref local="dataSourceTarget"/></property>
</bean>
Your application's SessionFactory, transaction manager, and all DAOs should access the proxy. Since the proxy implements the DataSource interface too, it can simply be passed in instead of the target DataSource.
See the Spring documentation for more information.
By implementing a custom Hibernate connection provider, you can use the LazyConnectionDataSourceProxy in a non-Spring based application:
import java.sql.Connection;
import java.sql.SQLException;
import java.util.Properties;

import javax.sql.DataSource;

import org.apache.commons.dbcp.BasicDataSource;
import org.apache.commons.dbcp.BasicDataSourceFactory;
import org.hibernate.HibernateException;
import org.hibernate.connection.ConnectionProvider;
import org.springframework.jdbc.datasource.LazyConnectionDataSourceProxy;

// Hibernate connection provider that wraps a DBCP pool in a lazy proxy.
public class LazyDBCPConnectionProvider implements ConnectionProvider {
    private DataSource ds;
    private BasicDataSource basicDs;

    public void configure(Properties props) throws HibernateException {
        // DBCP properties used to create the BasicDataSource
        Properties dbcpProperties = new Properties();
        // set some DBCP properties or implement logic to get them from the Hibernate config
        try {
            // Let the factory create the pool
            basicDs = (BasicDataSource) BasicDataSourceFactory.createDataSource(dbcpProperties);
            ds = new LazyConnectionDataSourceProxy(basicDs);
            // The BasicDataSource has lazy initialization
            // borrowing a connection will start the DataSource
            // and make sure it is configured correctly.
            Connection conn = ds.getConnection();
            conn.close();
        } catch (Exception e) {
            String message = "Could not create a DBCP pool";
            if (basicDs != null) {
                try {
                    basicDs.close();
                } catch (Exception e2) {
                    // ignore
                }
                ds = null;
                basicDs = null;
            }
            throw new HibernateException(message, e);
        }
    }

    public Connection getConnection() throws SQLException {
        return ds.getConnection();
    }

    public void closeConnection(Connection conn) throws SQLException {
        conn.close();
    }

    public void close() throws HibernateException {
        try {
            if (basicDs != null) {
                basicDs.close();
                ds = null;
                basicDs = null;
            }
        } catch (Exception e) {
            throw new HibernateException("Could not close DBCP pool", e);
        }
    }

    public boolean supportsAggressiveRelease() {
        return false;
    }
}
To use the custom connection provider, update hibernate.cfg.xml with the following property:
<property name="connection.provider_class">LazyDBCPConnectionProvider</property>
If your application must perform a large number of insertions or updates with Hibernate, a potential antipattern can emerge from the fact that all transactional insertions or updates in a session are stored in the first-level cache until flushed. Therefore, waiting to flush until the transaction is committed can result in an OutOfMemoryError (OOME) during large operations of this type.
You can prevent OOMEs in this case by processing the insertions or updates in batches, flushing after each batch. The Hibernate core documentation gives the following example for inserts:
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
for ( int i=0; i<100000; i++ ) {
    Customer customer = new Customer(.....);
    session.save(customer);
    if ( i % 20 == 0 ) { //20, same as the JDBC batch size
        //flush a batch of inserts and release memory:
        session.flush();
        session.clear();
    }
}
tx.commit();
session.close();
Updates can be batched similarly. The JDBC batch size referred to in the comment above is set in the Hibernate configuration property hibernate.jdbc.batch_size. For more information, see "Batch processing" in the Hibernate core documentation.
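The effect of periodic flushing on memory can be illustrated without Hibernate. In the sketch below, a plain list stands in for the session's first-level cache, and clearing it stands in for flush() followed by clear(). The class and method names are hypothetical. Note that this sketch flushes when a batch is full, which differs slightly from the modulo check in the Hibernate example above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical simulation of first-level cache growth with batched flushing. */
final class BatchFlushSketch {
    private BatchFlushSketch() {}

    /** Returns the largest number of items held in memory at any one time. */
    static int peakPending(int totalItems, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<Integer> pending = new ArrayList<>();
        int peak = 0;
        for (int i = 0; i < totalItems; i++) {
            pending.add(i);                      // stands in for session.save(...)
            peak = Math.max(peak, pending.size());
            if (pending.size() == batchSize) {
                pending.clear();                 // stands in for flush() + clear()
            }
        }
        return peak;
    }
}
```

With a batch size of 20, the peak stays at 20 regardless of how many items are processed. Without intermediate flushing, the peak equals the total number of items.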
The following factors could affect the performance of your second-level cache.
This Hibernate feature creates overhead regardless of how many queries are actually cached. For example, it records timestamps for entities even if not caching the related queries. Query cache is on if the following element is set in hibernate.cfg.xml:
<property name="hibernate.cache.use_query_cache">true</property>
If query cache is turned on, two specially-named cache regions appear in the Terracotta Developer Console cache-regions list. The two regions are the query cache and the timestamp cache.
Unless you are certain that the query cache benefits your application, it is recommended that you turn it off (set hibernate.cache.use_query_cache to "false").
If your installation of Hibernate uses JDBC directly, you use a connection pool to create and manage the JDBC connections to a database. Hibernate provides a default connection pool and supports a number of different connection pools. The low-performance default connection pool is inadequate for more than just initial development and testing. Use one of the supported connection pools, such as C3P0 or DBCP, and be sure to set the number of connections to an optimal amount for your application.
Terracotta Distributed Ehcache for Hibernate can cache a "hotset" of keys on clients to add locality-of-reference, a feature suitable for read-only cases. Note that the set of keys must be small enough for available memory.
See Cache Configuration File for more information on configuring a local key cache.
CacheMode is the Hibernate class that controls how a session interacts with second-level and query caches.
If your application explicitly warms the cache (reloads entities), CacheMode should be set to REFRESH to prevent unnecessary reads and null checks.
If your application can tolerate somewhat inconsistent views of data, and the data does not change frequently, consider changing the cache concurrency strategy from READ_WRITE to NONSTRICT_READ_WRITE to boost performance. See Cache Concurrency Strategies for more information on cache concurrency strategies.
You can optimize the Terracotta servers in your cluster to improve cluster performance with a second-level cache. Some server optimization requires editing the Terracotta configuration file. For more information on the Terracotta configuration file, see:
Test the following recommendations to gauge their impact on performance.
By default, Terracotta servers clear a certain amount of heap memory based on the percentage of memory used. You can configure a Terracotta server to be less aggressive in clearing heap memory by raising the threshold that triggers this action. Allowing more data to remain in memory makes larger caches more efficient by reducing the server's swap-to-disk dependence. Be sure to test any changes to the threshold to confirm that the server doesn't suffer an OOME by failing to effectively manage memory at the new threshold level.
The default threshold is 70 (70 percent of heap memory used). Raise the threshold by setting a higher value for the Terracotta property l2.cachemanager.threshold in one of the following ways.
To set the threshold at 90, add the following option to $JAVA_OPTS before starting the Terracotta server:
-Dcom.tc.l2.cachemanager.threshold=90
Be sure to export JAVA_OPTS. If you adjust the threshold value after the server is running, you must restart the Terracotta server for the new value to take effect.
Add the following configuration to the top of the Terracotta configuration file (tc-config.xml by default) before starting the Terracotta server:
<tc-properties>
<property name="l2.cachemanager.threshold" value="90" />
</tc-properties>
You must start the Terracotta server with the configuration file you've updated:
start-tc-server.sh -f <path_to_configuration_file>
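To illustrate what a percentage-of-heap threshold means in practice, the sketch below compares used heap against a threshold value. It is only an illustration with hypothetical names. It does not show how the Terracotta server measures memory internally, and the main method simply reports the heap usage of the JVM it runs in.

```java
/** Hypothetical illustration of a percentage-of-heap threshold check. */
final class HeapThresholdSketch {
    private HeapThresholdSketch() {}

    /** Returns used heap as a whole-number percentage of the maximum heap. */
    static int usedPercent(long usedBytes, long maxBytes) {
        if (usedBytes < 0 || maxBytes <= 0) {
            throw new IllegalArgumentException("invalid heap sizes");
        }
        return (int) (usedBytes * 100 / maxBytes);
    }

    /** Returns true when heap usage has reached the given threshold percentage. */
    static boolean exceeds(long usedBytes, long maxBytes, int thresholdPercent) {
        return usedPercent(usedBytes, maxBytes) >= thresholdPercent;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory();
        System.out.println("heap used: " + usedPercent(used, rt.maxMemory()) + "%");
    }
}
```

A heap that is 80 percent full crosses the default threshold of 70 but not a raised threshold of 90, which is why a higher value lets more data remain in memory.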
If your data is backed by a database, and no critical data exists only in memory, you can run the Terracotta server in non-persistent mode (temporary-swap-only mode). By default, Terracotta servers are set to non-persistent mode. For more information on persistence, see the Terracotta Configuration Guide and Reference.
Terracotta allots a certain percentage of memory to Berkeley DB, the database application used to manage the disk store. The default is 25 percent. Under the following circumstances, this percentage can be reduced:
For example, if Berkeley DB has a fixed requirement of 300–400MB of memory, and the heap size is set to 6GB, Berkeley DB can be allotted eight percent. You can set the percentage using the Terracotta property l2.berkeleydb.je.maxMemoryPercent in one of the following ways.
To set the percentage at 8, add the following option to $JAVA_OPTS (or $JAVA_OPTIONS) before starting the Terracotta server:
-Dcom.tc.l2.berkeleydb.je.maxMemoryPercent=8
Be sure to export JAVA_OPTS (or JAVA_OPTIONS). If you adjust the percentage value after the server is running, you must restart the Terracotta server for the new value to take effect.
Add the following configuration to the top of the Terracotta configuration file (tc-config.xml by default) before starting the Terracotta server:
<tc-properties>
<property name="l2.berkeleydb.je.maxMemoryPercent" value="8" />
</tc-properties>
You must start the Terracotta server with the configuration file you've updated:
start-tc-server.sh -f <path_to_configuration_file>
Use start-tc-server.bat in Microsoft Windows.
If you lower the value of l2.berkeleydb.je.maxMemoryPercent, be sure to test the new value's effectiveness by noting the amount of flushing to disk that occurs in the Terracotta server. If flushing rises to a level that impacts performance, increase the value of l2.berkeleydb.je.maxMemoryPercent incrementally until an optimal level is observed.
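The arithmetic behind the eight-percent example can be checked with a short calculation. The sketch below assumes 1GB = 1024MB and uses hypothetical names. It computes the memory reserved by a given percentage of the heap and the smallest whole percentage that covers a fixed requirement.

```java
/** Hypothetical calculator for the Berkeley DB share of the heap. */
final class BerkeleyDbPercentSketch {
    private BerkeleyDbPercentSketch() {}

    /** Returns the memory in MB reserved by the given percentage of the heap (rounded down). */
    static long allottedMb(long heapMb, int percent) {
        return heapMb * percent / 100;
    }

    /** Returns the smallest whole percentage of the heap that covers requiredMb. */
    static int minimumPercent(long requiredMb, long heapMb) {
        if (requiredMb < 0 || heapMb <= 0) {
            throw new IllegalArgumentException("invalid sizes");
        }
        return (int) ((requiredMb * 100 + heapMb - 1) / heapMb); // integer ceiling
    }
}
```

For a 6GB (6144MB) heap, eight percent reserves about 491MB, which covers a 400MB requirement with some margin, while the default of 25 percent would reserve 1536MB. Treat the computed minimum as a starting point and confirm it by observing flushing as described above.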
While both JDK 1.5 and 1.6 are supported, JDK 1.6 may deliver better performance.
Each time you connect to the Terracotta cluster with the Developer Console and go to the second-level cache node, Hibernate and cache statistics gathering is automatically started. Since this may have a negative impact on performance, consider disabling statistics gathering during performance tests and in production if you continue to use the Developer Console. To disable statistics gathering, navigate to the Overview panel in the Hibernate view, then click Disable Statistics.
Statistics logging has a negative impact on performance when it is enabled. Consider disabling it during performance tests and in production.
To disable statistics logging in the Terracotta Developer Console, navigate to the Configuration panel in the Hibernate view, select the target regions in the list, and clear Logging enabled if it is set.
To disable debug logging for Terracotta Distributed Ehcache, set the logging level for the clustered store to be less granular than FINE.
Garbage Collection (GC) should be aggressive. Consider using the -server Java option on all application servers to force a "server" GC strategy.
A well-tuned database reduces latency and improves performance:
Databases should be indexed to load data quickly, based on the types of queries your application performs (type of key used, for example).
The following are issues that could affect the functioning of Terracotta Distributed Ehcache for Hibernate.
When direct field access is used, Hibernate uses reflection to access fields, triggering unwanted synchronization that can degrade performance across a cluster. See this JIRA issue for more information.
Under certain circumstances, using a cascade="all-delete-orphan" can throw a Hibernate exception. See this Hibernate troubleshooting issue for more information.
Certain data that should be in the second-level cache may not have been configured for caching (or may have not been configured correctly). This oversight may not cause an error, but may impact performance. See Finding Cacheable Entities and Collections for more information.