FairCom Database Engine Partitioned Files

Advanced Concepts

In This Section

Maximum Partition Number vs File Size

Raw Partition Numbers

Partitioned File Naming

Optimized FairCom DB SQL Partitioned File Queries

ALTER TABLE Add and Drop Columns Supported for Partitioned Files

Improved Rebuilding of Partitioned Files

Updated Partition Admin Modes Reuse and Base

Maximum Partition Number vs File Size

By default, 16 bits of the 64-bit record offset are used to reference the raw partition number, allowing each partitioned file to support up to 65535 member files over its lifetime. This can be adjusted at create time using the callparm parameter of the extended file creation block, where a value of 0 defaults to 16 bits, values less than 4 bits default to 4 bits (maximum 15 member files), and 32 bits is the maximum value (4,294,967,295 member files). The number of bits determines the total number of raw partitions for the entire life of the host file. This is not the number of partitions active at one time. Raw partitions are not reassigned.
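The arithmetic behind these limits can be sketched in a few lines of C. This is an illustration only: the helper names below are hypothetical and not part of the FairCom API, and clamping requests above 32 bits to 32 is an assumption, since the text only states that 32 is the maximum.

```c
#include <stdint.h>

/* Hypothetical helper (not a FairCom function): applies the documented
   defaults to a requested bit count. */
static unsigned effective_partition_bits(unsigned requested)
{
    if (requested == 0) return 16;  /* 0 selects the 16-bit default */
    if (requested < 4)  return 4;   /* values below 4 are raised to 4 */
    if (requested > 32) return 32;  /* assumption: clamp to the 32-bit maximum */
    return requested;
}

/* Number of raw partitions available over the life of the host file:
   2^bits - 1, because raw partition numbers start at 1. */
static uint64_t max_raw_partitions(unsigned requested)
{
    unsigned bits = effective_partition_bits(requested);
    return (UINT64_C(1) << bits) - 1;
}
```

For example, a request of 0 yields 65535 raw partitions, a request of 3 yields 15, and a request of 32 yields 4,294,967,295.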

Raw Partition Numbers

The raw partition numbers must be 1 or greater. When passing a file position that includes a partition number to a routine, the partition number is encoded in the high-order bits of the high-order word. Ordinarily, the application will only get such information from a call to CurrentFileOffset() followed by a call to ctGETHGH().

Partition numbers are stored in the high-order bits of the 64-bit record offset. This allows the ISAM API calls to remain unchanged: simply change the parameters of your file creation call, and your application is ready to use partitioned files. For this reason, functions requiring a record offset must use the ctSETHGH() and ctGETHGH() functions, even if the partitioned files are not HUGE, to ensure the high-order word is included.
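To make the encoding concrete, the following sketch packs and unpacks a raw partition number in the top bits of a 32-bit high-order word. It assumes the partition number occupies exactly the top `bits` bits (between 4 and 32) and that the remaining bits of the high word belong to the offset within the partition. The exact layout is not specified here, and the helper names are hypothetical. In a real application the high word would be obtained with ctGETHGH() rather than computed by hand.

```c
#include <stdint.h>

/* Hypothetical helper: place a raw partition number in the top `bits` bits
   of the high word, keeping the low bits for the in-partition offset.
   Assumes 4 <= bits <= 32, so the shift count is between 0 and 28. */
static uint32_t pack_high_word(uint32_t raw_partition, uint32_t offset_high,
                               unsigned bits)
{
    unsigned shift = 32u - bits;
    uint32_t offset_mask = (UINT32_C(1) << shift) - 1u;
    return (raw_partition << shift) | (offset_high & offset_mask);
}

/* Hypothetical helper: recover the raw partition number from a high word. */
static uint32_t unpack_partition(uint32_t high_word, unsigned bits)
{
    return high_word >> (32u - bits);
}
```

With the default of 16 bits, raw partition 5 and an offset high part of 0x1234 pack to 0x00051234 under this assumed layout.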

Partition Ordering and Range Query

Partitions are assigned in increasing order of the partition key values. That is, if KeyValue2 > KeyValue1, then the partition assigned to KeyValue2 will be the same as or after the partition assigned to KeyValue1.

We allow any user-defined expression that evaluates to a numeric value to be used as a partition rule. However, our partition search logic requires that a partition rule assigns partitions in increasing order of the partition key values. That is, the partition function is required to be a monotonically increasing (non-decreasing) function: for any two partition key values A and B, if A > B then the partition rule must output values p(A) and p(B) such that p(A) >= p(B).

We don’t currently check that a user-defined partition rule meets the monotonically increasing property. If a rule is supplied that doesn't have this property, partition queries will return incorrect results, such as not finding key values that exist in the table. One example of a function that does not meet this requirement is partitionRule = (partitionKeyValue MOD 12). The values of this function increase and then wrap back to zero rather than never decreasing as the partition key value increases.

It is up to the developer to be aware of this requirement and to only use partition rules that meet this requirement.
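Because the server does not verify this property, it can be worth testing a candidate rule before using it. The sketch below checks a rule against an ascending list of sample key values. It is a spot check under stated assumptions, not a proof and not part of the FairCom API: the function names are hypothetical, the rules are modeled as plain C functions on non-negative integers, and the division rule is only an illustrative alternative to the MOD 12 example.

```c
#include <stddef.h>

typedef long (*partition_rule_fn)(long key);

/* The non-monotonic example from the text: wraps back to 0 every 12 keys. */
static long rule_mod12(long key) { return key % 12; }

/* Hypothetical alternative that never decreases for non-negative keys. */
static long rule_div12(long key) { return key / 12; }

/* Sample key values in ascending order. */
static const long sample_keys[] = { 0, 5, 11, 12, 13, 24 };

/* Returns 1 if the rule never decreases across keys[0..n-1], which must be
   sorted ascending; returns 0 at the first decrease. */
static int rule_is_monotonic(partition_rule_fn rule, const long *keys, size_t n)
{
    for (size_t i = 1; i < n; i++) {
        if (rule(keys[i]) < rule(keys[i - 1]))
            return 0;
    }
    return 1;
}
```

On the sample keys, the MOD 12 rule fails at the step from 11 to 12 (the rule value drops from 11 to 0), while the division rule passes.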

Partition Number Base

Use the PartitionAdmin() function to increase or decrease the lowest permitted partition number, called the “base” partition number. The system enforces an absolute lowest value for the base of one (1), but PartitionAdmin() can be used to change the base as long as it is one or greater. However, when changing this base value, PartitionAdmin() ensures no inconsistencies will arise. For example, one cannot increase the base value if it would eliminate any active or archived partitions (however it can eliminate purged partitions).
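The consistency rule can be pictured with a small model. The sketch below is not how PartitionAdmin() is implemented; the types, status names, and function are hypothetical and only restate the rule from the paragraph above: the base must be at least one, and raising it must not exclude any active or archived partition.

```c
#include <stddef.h>

/* Hypothetical model of a partition's state as tracked by the host. */
enum part_status { PART_ACTIVE, PART_ARCHIVED, PART_PURGED };

struct part_info {
    unsigned long raw_number;
    enum part_status status;
};

/* Example host contents: two purged, one archived, one active partition. */
static const struct part_info sample_parts[] = {
    { 1, PART_PURGED }, { 2, PART_PURGED },
    { 3, PART_ARCHIVED }, { 4, PART_ACTIVE }
};

/* Returns 1 if new_base is acceptable: it is at least 1 and every partition
   numbered below it has been purged. */
static int base_change_allowed(unsigned long new_base,
                               const struct part_info *parts, size_t n)
{
    if (new_base < 1)
        return 0;
    for (size_t i = 0; i < n; i++) {
        if (parts[i].raw_number < new_base && parts[i].status != PART_PURGED)
            return 0;
    }
    return 1;
}
```

With the sample data, a base of 3 is allowed because only purged partitions 1 and 2 fall below it, while a base of 4 is rejected because it would eliminate archived partition 3.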

Partitioned File Naming

Partition file names are automatically created as the base file name with the 3-digit raw partition number as the file extension. This can be customized with the FairCom DB Server SDK. The function partnam() in ctpart.c defines the naming algorithm and must be compiled into the server binary at this time.
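As an illustration of the default naming pattern, the following sketch builds a name from a base name and a raw partition number. It is not the partnam() source: the function name and signature are hypothetical, and it assumes the base name is passed without an extension and that the number is zero-padded to three digits.

```c
#include <stdio.h>

/* Hypothetical sketch of the default pattern: <base>.<3-digit raw number>.
   Returns 0 on success, -1 if the name does not fit in the buffer. */
static int default_partition_name(char *out, size_t out_size,
                                  const char *base, unsigned raw_partition)
{
    int n = snprintf(out, out_size, "%s.%03u", base, raw_partition);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

Under these assumptions, a base name of "sales" and raw partition 7 produce "sales.007".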

Optimized FairCom DB SQL Partitioned File Queries

A detailed analysis was undertaken of how partitioned files are opened and queried by various SQL constructs. It identified many enhancements that greatly improve performance when multiple physical files are involved:

Estimation of key values - FairCom DB SQL requires an estimation of key values as part of the query optimization phase. It was discovered that this phase of query execution frequently consumed the largest amount of time when working with large numbers of partitioned data files. It was found that the calling of key estimation routines opened large numbers of files to obtain the key estimate. To better optimize this phase, a sampling technique is now performed on a much smaller subset of partitions to reduce time spent in this critical phase. The partitions sampled are the first and last partitions that ordinarily would have been used, and one or more in the “middle” of the remaining active (or covering) partitions.
FairCom DB SQL defaults to three samplings. The following configuration keywords change this behavior:
- PARTITION_ESTIMATE_LIMIT <limit> sets this limit to the desired value. A negative value reverts to the previous behavior of reading from every active (or covering) partition.
- PARTITION_ESTIMATE_LIMIT <limit>% sets this limit as a percentage of eligible partitions.
Active number of key values - An enhanced ability to return the active number of key values without having to examine each active partition member. All necessary information is now stored in the host Global Unique Index (GUIx).
Query logic modifications - Query logic was modified to check for empty covered ranges, and also to check for unexpected missing partitions in middle partitions that are sampled.
Range search - When a range search is performed on a unique index and the range criteria specify an equality match on all segments of the key, a direct equal key function is now called rather than a key range function. For a partitioned file global unique index that does not cover the partition key, this greatly improves performance when many active partitions exist as the equal key call can use the global unique partition host index to find the partition that contains the key value directly, avoiding costly searches through multiple partitions.
Improved hashing - An improved hashing mechanism for determining if a given file is already open. For large numbers of open files (such as when partitioned files are in use) this substantially reduces initial open times by reducing search times.
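The key estimation sampling described in the first item above can be pictured as choosing a few evenly spaced partitions from the covering set. The sketch below is one plausible way to do that, not FairCom's algorithm: the function is hypothetical, it spaces the middle samples evenly, and it handles only the absolute form of the limit (not the percentage form).

```c
#include <stddef.h>

/* Hypothetical sketch: choose which of n covering partitions to sample.
   Writes indexes (0 .. n-1) into out[] and returns how many were chosen.
   A negative limit, or a limit of n or more, selects every partition. */
static size_t pick_sample_partitions(size_t n, long limit,
                                     size_t *out, size_t out_cap)
{
    size_t count;

    if (n == 0 || out_cap == 0)
        return 0;
    count = (limit < 0 || (size_t)limit >= n) ? n : (size_t)limit;
    if (count > out_cap)
        count = out_cap;
    if (count == 0)
        return 0;
    if (count == 1) {
        out[0] = 0;
        return 1;
    }
    /* First and last are always included; the rest are spread evenly. */
    for (size_t i = 0; i < count; i++)
        out[i] = i * (n - 1) / (count - 1);
    return count;
}
```

With ten covering partitions and the default of three samples, this picks the partitions at indexes 0, 4, and 9.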

ALTER TABLE Add and Drop Columns Supported for Partitioned Files

The ability to add and drop columns for partitioned files via ALTER TABLE (through either SQL or c-treeDB) has been added. Previously, an invalid argument error (CTDBRET_INVARG) was returned when attempting this operation. For very large data sets this operation can take considerable time because every record is currently visited and updated to match the new schema. If indexes require a rebuild, additional time is needed.

Improved Rebuilding of Partitioned Files

Enhanced partition file rebuild is supported via three modes:

Calling RBLIFILX8() for the partition host forces the host and all partitions to be rebuilt. This is the ctrbldif utility default.
Calling RBLIFILX8() for the partition host with tfilno set to badpartIFIL (which cannot be combined with updateIFIL) rebuilds, provided the host is clean, only the specific partitions that are not clean.
Calling PTADMIN() for a specified partition using ptADMINrebuild mode. This is available in the ctpartadmin utility.

For the first two rebuild modes of RBLIFILX8(), the XCREblk argument must be included as we check the x8mode member of XCREblk for partition attributes.

Updated Partition Admin Modes Reuse and Base

After purging a partition member, that partition number is no longer available for use, as the member is marked purged. The partition administration function PartitionAdmin() originally included a stub for a reuse mode that was never implemented. This mode is now available for use. The ptADMINreuse mode only supports reuse of a previously purged partition member.

As part of this change, partition instance numbers are introduced such that the host file can distinguish between different versions of the same partition. By “same” partition we mean partitions that contain the same range of partition key values. Reasons for having different versions of the same partition include purging a partition and then recreating the partition, or rebuilding a file partition (that could result in modified contents). The instance numbers are used in the host’s global unique index (GUIx), if any.

A GUIx contains key values from all partitions as a means of ensuring global uniqueness for a key value across all the partitions. The key value is stored in the GUIx, but instead of storing a record location with the key, we store the number of the partition that holds the record. This way, if we purge a partition, we do not need to find all the entries in the GUIx that correspond to it. If we find a duplicate conflict when trying to add a new key value, we can check whether the existing key value belongs to a purged partition and, if so, replace it with the new key value for an active partition.

The new implementation stores not only the partition number in the GUIx, but also the instance number (which defaults to zero). It is now possible to distinguish not only between purged and active partitions in the GUIx, but also between a purged partition and its recreated version, because they are forced to have different instance numbers.

For non-huge files, the instance numbers are in the range of 0 to 255. Huge files use four bytes for the instance number.
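The duplicate-conflict check described above can be modeled as follows. This is a conceptual sketch, not the server's implementation: the structure, sample data, and function name are hypothetical, and treating an entry for a partition unknown to the host as stale is an assumption.

```c
#include <stddef.h>

/* Hypothetical model of the host's per-partition bookkeeping. */
struct host_part {
    unsigned long partition;  /* raw partition number */
    unsigned long instance;   /* current instance number */
    int purged;               /* nonzero if the partition is purged */
};

/* Example: partition 1 purged; partition 2 recreated as instance 1;
   partition 3 still at its original instance 0. */
static const struct host_part sample_host[] = {
    { 1, 0, 1 }, { 2, 1, 0 }, { 3, 0, 0 }
};

/* Returns 1 if an existing GUIx entry (partition, instance) is stale and may
   be replaced by a new key, 0 if it represents a genuine duplicate. */
static int guix_entry_is_stale(unsigned long partition, unsigned long instance,
                               const struct host_part *parts, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (parts[i].partition == partition) {
            if (parts[i].purged)
                return 1;                         /* partition was purged */
            return parts[i].instance != instance; /* older version of it */
        }
    }
    return 1; /* assumption: entries for unknown partitions are stale */
}
```

In the sample data, an entry recorded as partition 2, instance 0 is stale because the host now holds instance 1 of that partition, while an entry for partition 2, instance 1 is a genuine duplicate.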

The ptADMINbase mode behavior has also been improved such that it is possible to change the base raw partition number (that is, the lowest permitted partition number) to any desired value as long as it does not exceed any active partition. Instance numbers permit purged partitions to fall outside the new base number because we can distinguish between different versions of the partitions.

When a partition is opened, its instance number is checked against the host's list of instance numbers by partition. If they don’t match, the open fails with error PNST_ERR (927).

Note: The addition of instance numbers has caused the partition resource stored in the host data file to be revised, and assigned a version number 2. Prior code will not be able to open a partition file with a version 2 resource, and will fail with error PVRN_ERR (725).