This implementation uses indexes that are maintained on each Terracotta server. With distributed BigMemory Max, the data is sharded across the number of active nodes in the cluster, and the index for each shard is maintained on the server for that shard. Searches are performed using the Scatter-Gather pattern. The query executes on each node and the results are then aggregated in the BigMemory Max that initiated the search.
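The Scatter-Gather pattern described above can be modeled with JDK classes alone. The following is a minimal sketch, not the Terracotta implementation: the class name `ScatterGatherSketch` and the method `search` are hypothetical, each shard is a plain in-memory list, and each shard is filtered linearly, whereas a real server would consult its index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class ScatterGatherSketch {

    // Scatter: run the same criteria against every shard concurrently.
    // Gather: wait for each shard and concatenate the partial results
    // on the caller, in shard order.
    public static <T> List<T> search(List<List<T>> shards, Predicate<T> criteria) {
        List<CompletableFuture<List<T>>> partials = new ArrayList<>();
        for (List<T> shard : shards) {
            partials.add(CompletableFuture.supplyAsync(
                () -> shard.stream().filter(criteria).collect(Collectors.toList())));
        }
        List<T> merged = new ArrayList<>();
        for (CompletableFuture<List<T>> partial : partials) {
            merged.addAll(partial.join());
        }
        return merged;
    }

    public static void main(String[] args) {
        // Three hypothetical shards standing in for three active servers.
        List<List<Integer>> shards = List.of(
            List.of(1, 8, 15), List.of(4, 22), List.of(9, 30, 2));
        System.out.println(search(shards, n -> n > 8));
    }
}
```

In this model the merged list grows with the number of matches on every shard, which is why limiting the size of the result set matters for the node that initiates the search.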
Search operations perform in O(log n / number of shards) time, so adding more servers to the TSA can improve performance further. Search results are returned over the network, and the data returned might be very large, so techniques to limit return size are recommended. For more information, see Best Practices.
BigMemory uses a Search index that is maintained at the local node. The index is stored under a directory in the DiskStore and is available whether or not persistence is enabled. Any overflow from the on-heap tier of the cache is searched using indexes.
Search operations perform in O(log(n)) time. For tips that can aid performance, see Best Practices.
For caches that are on-heap only, Attributes are extracted during query execution rather than ahead of time, and indexes are not used. Instead, the cache takes advantage of the fast access to do the equivalent of a table scan for each query. Each element in the cache is only visited once.
On-heap search operations perform in O(n) time. For performance results, see the Maven-based performance test, in which representative queries take an average of 4.6 ms for a 10,000-entry cache and 427 ms for a 1,000,000-entry cache.
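The on-heap behavior amounts to a table scan with attribute extraction at query time. The following is a minimal sketch of that idea using a plain `Map` as a stand-in for the cache; `TableScanSketch`, `scan`, and the extractor function are hypothetical names and are not part of the Ehcache API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

public class TableScanSketch {

    // Visits every entry exactly once, extracts the attribute during the
    // query, and keeps the keys whose attribute satisfies the criteria.
    // The cost is O(n) per query because no index is consulted.
    public static <K, V, A> List<K> scan(Map<K, V> cache,
                                         Function<V, A> attributeExtractor,
                                         Predicate<A> criteria) {
        List<K> matchingKeys = new ArrayList<>();
        for (Map.Entry<K, V> entry : cache.entrySet()) {
            A attribute = attributeExtractor.apply(entry.getValue());
            if (criteria.test(attribute)) {
                matchingKeys.add(entry.getKey());
            }
        }
        return matchingKeys;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new LinkedHashMap<>();
        cache.put("k1", "alice");
        cache.put("k2", "bob");
        cache.put("k3", "alexander");
        // Hypothetical attribute: the length of the value.
        System.out.println(scan(cache, String::length, len -> len > 4));
    }
}
```

The quoted timings are roughly in line with linear scaling: 100 times as many entries (10,000 to 1,000,000) took about 93 times as long (4.6 ms to 427 ms).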
Construct searches by including only the data that is actually required.
Only use includeKeys() and/or includeAttribute() if those values are required for your application logic.
If result.getValue() is not called in the search results, do not use includeValues() in the query.
Consider whether, instead of running a query with includeValues() and then calling result.getValue(), you can run the query for keys and include cache.get() for each individual key.
Note: includeKeys() and includeValues() have lazy deserialization, which means that keys and values are deserialized only when result.getKey() or result.getValue() is called. However, calls to includeKeys() and includeValues() do take time, so consider carefully when constructing your queries.
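The advice to query for keys and then call cache.get() for each individual key can be sketched as follows. This is a model under stated assumptions, not Ehcache code: a plain `Map` stands in for the cache, and `KeysThenGetSketch`, `searchKeys`, and `fetchFirst` are hypothetical names. The null check reflects that an entry can disappear between the search and the later lookup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class KeysThenGetSketch {

    // Stand-in for a keys-only search: returns matching keys and no values.
    public static List<String> searchKeys(Map<String, String> cache, String keyPrefix) {
        List<String> keys = new ArrayList<>();
        for (String key : cache.keySet()) {
            if (key.startsWith(keyPrefix)) {
                keys.add(key);
            }
        }
        return keys;
    }

    // Fetches values on demand, and only for as many keys as the caller needs.
    public static List<String> fetchFirst(Map<String, String> cache,
                                          List<String> keys, int limit) {
        List<String> values = new ArrayList<>();
        for (String key : keys) {
            if (values.size() == limit) {
                break;
            }
            String value = cache.get(key);
            if (value != null) { // the entry may have been removed since the search
                values.add(value);
            }
        }
        return values;
    }
}
```

The design choice here is that values are transferred only for the keys that the application logic actually uses, rather than for every match in the result set.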
Searchable keys and values are automatically indexed by default. If you are not including them in your query, turn off automatic indexing with the following:
<cache name="cacheName" ...>
  <searchable keys="false" values="false">
    ...
  </searchable>
</cache>
Limit the size of the result set. Depending on your use case, you might consider maxResults, an Aggregator, or pagination:
If getting a subset of the results is acceptable, use query.maxResults(int number_of_results). maxResults is most useful where the result set is ordered such that the items you want most are included within the maxResults.
If all you need is a summary statistic, use a built-in Aggregator function such as count(). For details, see the net.sf.ehcache.search.aggregator package in the Ehcache Javadoc.
If you want to avoid an OutOfMemoryError while allowing your Terracotta client to receive an extremely large result set, consider using the Pagination feature. Pagination limits how many of the total results appear on the client at a time, so that you can view the results in page-sized batches. Instead of calling the parameterless version of the execute method, query.execute(), pass in an ExecutionHints object that specifies the page size you want:
query.execute(new ExecutionHints().setResultBatchSize(pageSize))
If you call for results after issuing a query with ExecutionHints, all results are returned (the same behavior as a regular query), except that only the number of results specified as the ResultBatchSize appears on the client at a time. For example, if your query has 500 results and you use a ResultBatchSize of 100, you still get all 500 results, but you scroll through them in pages of 100.
You can enable search result pagination for the execution phase of a query whether the query was constructed using the Search API or BigMemory SQL.
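The batching behavior can be modeled with a small iterator that holds only one batch at a time. This is a sketch of the idea, not the Ehcache or Terracotta implementation: `PagedResultsSketch` and `fetchBatch` are hypothetical, and the "server" is simulated by generating integers on request.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class PagedResultsSketch implements Iterator<Integer> {
    private final int totalResults;  // size of the full result set held remotely
    private final int batchSize;     // how many results the client holds at once
    private List<Integer> currentBatch = new ArrayList<>();
    private int positionInBatch = 0;
    private int nextGlobalIndex = 0; // index of the next result to hand out
    private int batchesFetched = 0;

    public PagedResultsSketch(int totalResults, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.totalResults = totalResults;
        this.batchSize = batchSize;
    }

    // Stand-in for a server round trip: returns results [from, from + batchSize).
    private List<Integer> fetchBatch(int from) {
        List<Integer> batch = new ArrayList<>();
        int to = Math.min(from + batchSize, totalResults);
        for (int i = from; i < to; i++) {
            batch.add(i);
        }
        batchesFetched++;
        return batch;
    }

    @Override
    public boolean hasNext() {
        return nextGlobalIndex < totalResults;
    }

    @Override
    public Integer next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        if (positionInBatch == currentBatch.size()) {
            // Current batch is used up: replace it with the next one.
            currentBatch = fetchBatch(nextGlobalIndex);
            positionInBatch = 0;
        }
        nextGlobalIndex++;
        return currentBatch.get(positionInBatch++);
    }

    public int batchesFetched() {
        return batchesFetched;
    }
}
```

With 500 results and a batch size of 100, iterating this model to the end yields all 500 results through five batch fetches, while the client side never holds more than 100 results at once.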
Limitations of search result pagination:
Queries that use grouping (Query.addGroupBy()) cannot be paginated regardless of server topology.
Depending on the server topology, pagination might also be unavailable for queries that use Query.includeAggregator() (with the exception that count() is the one aggregator that works with all topologies), Query.maxResults(), or Query.addOrderBy().
Make your search as specific as possible.
Queries with iLike criteria and fuzzy (wildcard) searches might take longer than more specific queries.
If you use a wildcard, try to make it the trailing part of the string instead of the leading part ("321*" instead of "*123").
If you need leading-wildcard searches, create a <searchAttribute> with the string value reversed in it, so that your query can use the trailing wildcard instead.
When possible, use the query criteria "Between" instead of "LessThan" and "GreaterThan", or "LessThanOrEqual" and "GreaterThanOrEqual". For example, instead of using le(startDate) and ge(endDate), try not(between(startDate,endDate)).
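The reversed-string technique for leading wildcards can be illustrated with plain string handling. This is a minimal sketch with hypothetical helper names (`reversedAttribute`, `toTrailingWildcard`, `matchesTrailingWildcard`); it only shows how the stored value and the pattern relate, and it evaluates the pattern as a simple prefix match rather than through a search index.

```java
public class ReversedAttributeSketch {

    // Value to store in the extra search attribute: the original string reversed.
    public static String reversedAttribute(String value) {
        return new StringBuilder(value).reverse().toString();
    }

    // Rewrites a leading-wildcard pattern such as "*123" into the
    // trailing-wildcard pattern "321*" for use against the reversed attribute.
    public static String toTrailingWildcard(String leadingWildcardPattern) {
        if (!leadingWildcardPattern.startsWith("*")) {
            throw new IllegalArgumentException("expected a leading '*'");
        }
        return reversedAttribute(leadingWildcardPattern.substring(1)) + "*";
    }

    // Evaluates a trailing-wildcard pattern as a prefix match.
    public static boolean matchesTrailingWildcard(String attribute, String pattern) {
        if (!pattern.endsWith("*")) {
            throw new IllegalArgumentException("expected a trailing '*'");
        }
        String prefix = pattern.substring(0, pattern.length() - 1);
        return attribute.startsWith(prefix);
    }
}
```

For example, a value "ABC123" is stored reversed as "321CBA", and the query "*123" becomes "321*", which matches it as a prefix.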
Index dates as integers. This can save time and can also be faster if you have to do a conversion later on.
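One possible integer encoding is yyyyMMdd, which preserves date order under numeric comparison. The sketch below is an assumption for illustration (the source does not prescribe an encoding), the names `DateAsIntSketch`, `toIndexValue`, and `fromIndexValue` are hypothetical, and the encoding is intended for years 1 through 9999.

```java
import java.time.LocalDate;

public class DateAsIntSketch {

    // Encodes a date as yyyyMMdd, for example 2014-03-09 becomes 20140309.
    // For years 1..9999, numeric order is the same as chronological order.
    public static int toIndexValue(LocalDate date) {
        return date.getYear() * 10_000
            + date.getMonthValue() * 100
            + date.getDayOfMonth();
    }

    // Decodes a yyyyMMdd integer back into a date.
    public static LocalDate fromIndexValue(int value) {
        return LocalDate.of(value / 10_000, (value / 100) % 100, value % 100);
    }
}
```

A range criterion on the date then becomes a comparison between two integers, and the original date can be recovered when it is needed for display.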
Searches of eventually consistent BigMemory Max data sets are fast because queries are executed immediately, without waiting for the commit of pending transactions at the local node. Note: This means that if a thread adds an element into an eventually consistent cache and immediately runs a query to fetch the element, it will not be visible in the search results until the update is published to the server.
Unlike cache operations, which have selectable concurrency control or transactions, queries are asynchronous and Search results are eventually consistent with the caches.
Although indexes are updated synchronously, their state lags slightly behind that of the cache. The only exception is when the updating thread performs a search.
For caches with concurrency control, an index does not reflect the new state of the cache until:
For transactional caches, commit has been called.
Unexpected results might occur if:
Aggregators, such as sum(), disagree with the same calculation done by re-accessing the cache for each key and repeating the calculation yourself.
Because the state of the cache can change between search executions, the following is recommended:
BigMemory SQL supports using the presence or absence of null as a search criterion:
select * from searchable where birthDate is null
select * from searchable where birthDate is not null
The Search API supports the same criteria:
myQuery.addCriteria(cache.getAttribute("middle_name").isNull());
For the opposite case, requiring that a value for the attribute be present:
myQuery.addCriteria(cache.getAttribute("middle_name").notNull());
which is equivalent to:
myQuery.addCriteria(cache.getAttribute("middle_name").isNull().not());
Alternatively, you can call constructors to set up equivalent logic:
Criteria isNull = new IsNull("middle_name");
Criteria notNull = new NotNull("middle_name");