c-treeACE V10.0 Update Guide

Partitioned Files Available via c-treeACE SQL

Partitioned files are available directly through c-treeACE SQL. As partitioned files rely on a key value to partition the data, the option is part of the CREATE INDEX statement. By including the STORAGE_ATTRIBUTES clause with the PARTITION option, the file will be rebuilt to enable partitioned support. All other operations on the file will continue as usual.

Note: Partitioned file support requires a custom build of c-treeACE SQL with the partition rule compiled from the ctpart.c module.

c-treeACE SQL CREATE INDEX Syntax

CREATE [ UNIQUE ] INDEX index_name

ON table_name

( {column_name [ASC | DESC]} [, ...])

[ STORAGE_ATTRIBUTES 'attributes' ];

New Argument:

STORAGE_ATTRIBUTES 'attributes'

A quoted string that specifies index attributes that are specific to c-treeACE SQL indexes. c-treeACE SQL interprets the following attribute strings:

'PARTITION' - Creates the index as the partition index to the table enabling multiple table partitions. This option can be specified after a table has been created, and the table will be rebuilt according to the partition rule in effect.

Optimized c-treeACE SQL Partitioned File Queries

A detailed analysis of how partitioned files were opened and queried by various SQL constructs was taken. Many enhancements were identified that could greatly improve performance when multiple physical files are taken into consideration:

Estimation of key values - c-treeACE SQL requires an estimation of key values as part of the query optimization phase. It was discovered that this phase of query execution frequently consumed the largest amount of time when working with large numbers of partitioned data files. It was found that the calling of key estimation routines opened large numbers of files to obtain the key estimate. To better optimize this phase, a sampling technique is now performed on a much smaller subset of partitions to reduce time spent in this critical phase. The partitions sampled are the first and last partitions that ordinarily would have been used, and one or more in the “middle” of the remaining active (or covering) partitions.
c-treeACE SQL defaults to three samplings. The following configuration keywords change this behavior:
- PARTITION_ESTIMATE_LIMIT <limit> increases this limit to a desired value. A negative value resorts to the previous behavior of reading from each active partition (or covering partition).
- PARTITION_ESTIMATE_LIMIT <limit>% increases this limit as a percentage of eligible partitions.
Active number of key values - An enhanced ability to return the active number of key values without having to examine each active partition member. All necessary information is now stored in the host Global Unique Index (GUIx).
Query logic modifications - Query logic was modified to check for empty covered ranges, and also to check for unexpected missing partitions in middle partitions that are sampled.
Range search - When a range search is performed on a unique index and the range criteria specify an equality match on all segments of the key, a direct equal key function is now called rather than a key range function. For a partitioned file global unique index that does not cover the partition key, this greatly improves performance when many active partitions exist as the equal key call can use the global unique partition host index to find the partition that contains the key value directly, avoiding costly searches through multiple partitions.
Improved hashing - An improved hashing mechanism for determining if a given file is already open. For large numbers of open files (such as when partitioned files are in use) this substantially reduces initial open times by reducing search times.

ALTER TABLE Add and Drop Columns Supported for Partitioned Files

The ability to add and drop columns for Partitioned Files via an ALTER TABLE (either via SQL or c-treeDB) has been added. Previously an invalid argument error was returned (CTDBRET_INVARG) when attempting this operation. For very large data sets this could take time, as currently, every record is visited to update based on the new schema. In addition, if indexes require a rebuild, this will require additional time.