c-treeDB API API for C

Extended Key Segment Structure

Extended key segments are specified by filling the fields of the ctKSEGDEF structure:

#define ctKSEGDLEN 32 /* length of desc string */

typedef struct keysegdef {
    LONG kseg_stat;             /* status (internal use) */
    LONG kseg_vrsn;             /* version info */
    LONG kseg_ssiz;             /* source size */
    LONG kseg_type;             /* segment type */
    LONG kseg_styp;             /* source type */
    LONG kseg_comp;             /* comparison options */
    LONG kseg_rsv1;             /* future use */
    LONG kseg_rsv2;             /* future use */
    TEXT kseg_desc[ctKSEGDLEN]; /* text specification eg, locale string */
} ctKSEGDEF, ctMEM* pctKSEGDEF;

The FairCom DB module ctport.h defines all of the constants used to create an extended key segment definition; each of them begins with ctKSEG. As extended key segments are currently implemented, the kseg_stat and kseg_vrsn members are filled in as needed by the extended key segment implementation itself. The kseg_ssiz member specifies the number of bytes of source data used to derive the actual key segment. Instead of a specific numeric value for the source size, kseg_ssiz may also be assigned either of the two values discussed in the following two sections.

In This Section

ctKSEG_SSIZ_COMPUTED

ctKSEG_SSIZ_PROVIDED

ctKSEG_SSIZ_COMPUTED

The information about the underlying data field is used to compute how much source data is available. For fields without length specifiers (such as CT_STRING or CT_UNICODE), an appropriate version of strlen() is used to determine how much data is available. This can be very inefficient if the field may hold very long strings, since it is likely that only a small portion of the variable-length field will actually contribute to the key segment. An alternative is to specify a fixed source size. If the variable data is shorter than this size, it is still handled correctly.
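The trade-off described above can be illustrated with a minimal sketch. It is not FairCom's implementation: demo_source_bytes() and DEMO_SSIZ_COMPUTED are hypothetical names invented for this illustration (the real ctKSEG_SSIZ_COMPUTED value comes from ctport.h), and the sketch only models single-byte, null-terminated data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in; the real ctKSEG_SSIZ_COMPUTED is defined in ctport.h. */
#define DEMO_SSIZ_COMPUTED (-1L)

/*
 * Return how many bytes of a null-terminated field would be examined
 * as source data for a given source-size setting (illustration only).
 */
size_t demo_source_bytes(const char *field, long ssiz)
{
    size_t n = 0;

    if (ssiz == DEMO_SSIZ_COMPUTED)
        return strlen(field);   /* walks the entire string, however long */

    if (ssiz <= 0)
        return 0;               /* no usable fixed size */

    /* Fixed size: stop at the terminator or after ssiz bytes, whichever comes first. */
    while (n < (size_t)ssiz && field[n] != '\0')
        n++;
    return n;
}
```

With a fixed size of 12, a 2-byte value yields 2 and a very long value yields 12 without the rest of the string being scanned.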

ctKSEG_SSIZ_PROVIDED

The call to create the key segment will provide the particular length of source data available.

For an ICU Unicode definition, the remaining structure members are specified as follows:

kseg_type

Must be set to ctKSEG_TYPE_UNICODE.

kseg_styp

Specify the type of source data as ctKSEG_STYP_PROVIDED.

ctKSEG_STYP_PROVIDED means that the type of source data is determined at run time during key value construction. (Key value construction consists of assembling the key value from its component segments, performing transformations to generate a binary sort key, or both.) In this case, if the data type is one of the conventional c-tree string types (e.g., CT_STRING), the source data type is UTF-8; if a Unicode string type is found (e.g., CT_UNICODE), the source data type is UTF-16. If the underlying data type does not fall into either of these categories, the data is treated as UTF-16 and used as is.
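The run-time rule above can be summarized as a small decision function. This is only a sketch of the rule as stated in this topic: the enum names and demo_resolve_source() are hypothetical and do not correspond to identifiers in the FairCom headers.

```c
/* Hypothetical stand-ins for illustration; the real field types (CT_STRING,
 * CT_UNICODE, ...) and source types are defined by the FairCom headers. */
typedef enum {
    DEMO_FIELD_STRING,   /* a conventional c-tree string type, e.g. CT_STRING */
    DEMO_FIELD_UNICODE,  /* a Unicode string type, e.g. CT_UNICODE */
    DEMO_FIELD_OTHER     /* any other underlying data type */
} demo_field_type;

typedef enum {
    DEMO_SRC_UTF8,
    DEMO_SRC_UTF16
} demo_src_encoding;

/* Map the underlying field type to the source encoding assumed at run time. */
demo_src_encoding demo_resolve_source(demo_field_type type)
{
    switch (type) {
    case DEMO_FIELD_STRING:
        return DEMO_SRC_UTF8;
    case DEMO_FIELD_UNICODE:
        return DEMO_SRC_UTF16;
    default:
        return DEMO_SRC_UTF16;  /* treated as UTF-16 and used as is */
    }
}
```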

kseg_desc

Contains the ICU locale formed as an ordinary, null-terminated ASCII string. The format specified by ICU is "xx", "xx_YY", or "xx_YY_Variant" where "xx" is the language as specified by ISO-639 (e.g., "fr" for French); "YY" is a country as specified by ISO-3166 (e.g., "fr_CA" for French language in Canada); and the "Variant" portion represents system-dependent options.

Note: When ICU uses a locale to access collation rules, it attempts to get rules for the closest match to the locale specified in kseg_desc. By default, there is no restriction on how close the match of locales must be to be acceptable. You can restrict the use of alternative locales by including either ctKSEG_COMPU_FALLBACK_NOTOK or ctKSEG_COMPU_SYSDEFAULT_NOTOK as part of the bit map comprising kseg_comp discussed below. After a successful call to PutXtdKeySegmentDef(), the GetXtdKeySegmentDef() function can be used to determine the actual ICU locale used during collation.
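To make the "xx", "xx_YY", and "xx_YY_Variant" forms concrete, the following sketch splits a locale string into its parts. It is a plain string utility written for this illustration, not an ICU or FairCom function; demo_locale, demo_copy_part(), and demo_split_locale() are hypothetical names, and the buffer sizes are arbitrary assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the three parts of a locale string. */
typedef struct {
    char language[16];
    char country[16];
    char variant[32];
} demo_locale;

/* Copy len bytes of src into dst and terminate; fail if it does not fit. */
int demo_copy_part(char *dst, size_t cap, const char *src, size_t len)
{
    if (len >= cap)
        return -1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return 0;
}

/* Split "xx", "xx_YY", or "xx_YY_Variant". Returns 0 on success, -1 on error. */
int demo_split_locale(const char *s, demo_locale *out)
{
    const char *u1 = strchr(s, '_');                   /* first underscore, if any */
    const char *u2 = u1 ? strchr(u1 + 1, '_') : NULL;  /* second underscore, if any */
    size_t lang_len = u1 ? (size_t)(u1 - s) : strlen(s);
    size_t ctry_len;

    out->country[0] = '\0';
    out->variant[0] = '\0';

    if (lang_len == 0 ||
        demo_copy_part(out->language, sizeof out->language, s, lang_len) != 0)
        return -1;
    if (u1 == NULL)
        return 0;                                      /* "xx" */

    ctry_len = u2 ? (size_t)(u2 - (u1 + 1)) : strlen(u1 + 1);
    if (demo_copy_part(out->country, sizeof out->country, u1 + 1, ctry_len) != 0)
        return -1;
    if (u2 == NULL)
        return 0;                                      /* "xx_YY" */

    /* "xx_YY_Variant": everything after the second underscore */
    return demo_copy_part(out->variant, sizeof out->variant, u2 + 1, strlen(u2 + 1));
}
```

For "fr_CA" this yields language "fr", country "CA", and an empty variant.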

kseg_comp

This member of the structure permits the full range of ICU collation options to be specified through a bit map.

Example of extended key segment structure:

#include <string.h> /* memset, strcpy */

ctKSEGDEF ksgdef;

memset(&ksgdef, 0, sizeof(ksgdef));          /* clear all members, including the reserved ones */
ksgdef.kseg_ssiz = 12;                       /* 12 bytes for the source */
ksgdef.kseg_type = ctKSEG_TYPE_UNICODE;      /* ICU Unicode */
ksgdef.kseg_styp = ctKSEG_STYP_UTF16;        /* UTF16 source data */
ksgdef.kseg_comp = ctKSEG_COMPU_A_LOWER;     /* lower case sorts first */
strcpy(ksgdef.kseg_desc,"fr_CA");            /* French in Canada */
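Because kseg_desc holds only ctKSEGDLEN (32) bytes, a locale string taken from user input or configuration may be worth length-checking before it is copied. The sketch below shows one way to do that. To compile without the FairCom headers it uses a stand-in structure, demo_ksegdef, that mirrors the ctKSEGDEF layout shown in this topic with long and char in place of LONG and TEXT; that structure, DEMO_KSEGDLEN, and demo_init_segdef() are assumptions made for illustration, and real code would use ctKSEGDEF itself.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_KSEGDLEN 32  /* mirrors ctKSEGDLEN */

/* Stand-in mirroring the ctKSEGDEF layout shown in this topic (illustration only). */
typedef struct {
    long kseg_stat;
    long kseg_vrsn;
    long kseg_ssiz;
    long kseg_type;
    long kseg_styp;
    long kseg_comp;
    long kseg_rsv1;
    long kseg_rsv2;
    char kseg_desc[DEMO_KSEGDLEN];
} demo_ksegdef;

/*
 * Zero the definition, set the source size, and copy the locale string.
 * Returns 0 on success, or -1 (leaving *def untouched) if the locale,
 * including its terminator, does not fit in kseg_desc.
 */
int demo_init_segdef(demo_ksegdef *def, long ssiz, const char *locale)
{
    size_t len = strlen(locale);

    if (len >= DEMO_KSEGDLEN)
        return -1;

    memset(def, 0, sizeof *def);              /* clears internal and reserved members */
    def->kseg_ssiz = ssiz;
    memcpy(def->kseg_desc, locale, len + 1);  /* includes the terminating null */
    return 0;
}
```

The caller would still assign kseg_type, kseg_styp, and kseg_comp with the ctKSEG constants from ctport.h, as in the example above.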
