FairCom ISAM for C

How to Specify an ICU Unicode Key Segment

An ordinary FairCom DB key segment is defined by a triplet: offset, length, and mode. In IFIL parlance, this is an ISEG structure. An extended key segment also uses this “standard” specification, but with two adjustments:

  • The length specified in the triplet is the number of bytes that this segment will occupy in the key value stored in the index rather than the number of bytes of source data that will be used to generate the key segment. The source length will be part of the extended key segment definition discussed below.
  • The key segment mode will contain a modifier that indicates the particular type of extended key segment. The only extended key segment at this time is UNCSEG, an ICU Unicode segment. An example of an ICU Unicode ISEG is:

ISEG isegunc = {8,24,REGSEG | UNCSEG};

This ISEG specifies:

  • The source data begins at an offset of 8 bytes from the start of the record.
  • The key segment will be 24 bytes in length.
  • The segment type will be an ICU Unicode segment.

However, this ISEG definition does not specify the underlying data type (UTF-8 or UTF-16 for a Unicode segment), nor does it specify how many bytes of source data to use to construct the segment. The extended key segment definition supplies this additional information. Note that REGSEG implies no standard transformation, while the UNCSEG modifier identifies the particular type of extended segment.
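The triplet and its mode modifier can be illustrated with a minimal, stand-alone sketch. It does not use the FairCom DB headers: DEMO_ISEG, its field names, and the numeric values of DEMO_REGSEG and DEMO_UNCSEG are assumptions chosen so the example compiles on its own. Real code would take ISEG, REGSEG, and UNCSEG from the product headers.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins so the sketch compiles without the FairCom DB headers.
 * Field names and numeric values are illustrative assumptions, not
 * the values defined by the product. */
typedef struct {
    short soffset;  /* where the source data starts in the record */
    short slength;  /* bytes the segment occupies in the stored key */
    short segmode;  /* base mode, optionally ORed with a modifier */
} DEMO_ISEG;

#define DEMO_REGSEG 0x0000  /* placeholder value */
#define DEMO_UNCSEG 0x0400  /* placeholder value */

/* Builds the triplet from the text: {8, 24, REGSEG | UNCSEG}. */
DEMO_ISEG demo_make_unicode_iseg(void)
{
    DEMO_ISEG seg = {8, 24, DEMO_REGSEG | DEMO_UNCSEG};
    return seg;
}

/* Returns 1 when the mode carries the extended-segment modifier,
 * 0 otherwise. */
int demo_is_unicode_segment(const DEMO_ISEG *seg)
{
    assert(seg != NULL);
    return (seg->segmode & DEMO_UNCSEG) != 0;
}
```

Passing the result of demo_make_unicode_iseg() to demo_is_unicode_segment() returns 1, while the same triplet built with DEMO_REGSEG alone returns 0. Nothing in the triplet records the source encoding or the source length, which is why the extended key segment definition is still required.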

In addition to an explicit definition of segment types such as REGSEG or INTSEG, FairCom DB supports VARSEG and SCHSEG. VARSEG and SCHSEG use the same triplet, but the contents are interpreted somewhat differently.

  • A VARSEG is a segment based on a field that falls in the variable-length region of the file and therefore cannot be located by a simple offset value. The offset is interpreted as the number of fields in the variable-length region to skip over. A zero implies the first variable-length field is used.
    To use an extended segment definition with VARSEG, simply modify the key segment mode as before:

VARSEG | UNCSEG

  • A SCHSEG is a segment whose type is based on the data record field definitions stored in the DODA. When the mode is SCHSEG, the offset value is interpreted as a zero-based index into the DODA. A value of zero implies using the first field definition to determine the type of key segment. You must use the SCHSEG | UNCSEG segment mode if the offset value maps to an underlying data field that is one of the UTF-16 Unicode types (CT_UNICODE, CT_2UNICODE, CT_FUNICODE, CT_F2UNICODE).

    If the underlying data is stored in a regular string field (e.g., CT_STRING) and the data is UTF-8 encoded, UNCSEG must also be combined with SCHSEG, because FairCom DB cannot automatically detect the UTF-8 encoding:

SCHSEG | UNCSEG
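The different readings of the offset value described above can be summarized in a short sketch. As before, this is not the FairCom DB API: the DEMO_ names, the placeholder numeric values, and the assumption that the modifier can be masked off to recover the base mode are all hypothetical, chosen only so the example compiles on its own. Only the three base modes discussed here are modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values for illustration only; real code uses the
 * constants defined in the FairCom DB headers. */
#define DEMO_REGSEG 0
#define DEMO_VARSEG 4
#define DEMO_SCHSEG 12
#define DEMO_UNCSEG 0x0400

typedef struct {
    short soffset;
    short slength;
    short segmode;
} DEMO_ISEG;

/* How the first member of the triplet should be read. */
typedef enum {
    DEMO_BYTE_OFFSET,        /* bytes from the start of the record */
    DEMO_VARFIELDS_TO_SKIP,  /* variable-length fields to skip over */
    DEMO_DODA_INDEX          /* zero-based index into the DODA */
} DEMO_OFFSET_KIND;

/* Strips the (assumed) modifier bit and classifies the offset. */
DEMO_OFFSET_KIND demo_offset_kind(const DEMO_ISEG *seg)
{
    int base;

    assert(seg != NULL);
    base = seg->segmode & ~DEMO_UNCSEG;

    switch (base) {
    case DEMO_VARSEG:
        return DEMO_VARFIELDS_TO_SKIP;
    case DEMO_SCHSEG:
        return DEMO_DODA_INDEX;
    default:
        return DEMO_BYTE_OFFSET;  /* other base modes are not modeled */
    }
}
```

Under these assumptions, a segment such as {0, 24, DEMO_VARSEG | DEMO_UNCSEG} refers to the first variable-length field, and {2, 24, DEMO_SCHSEG | DEMO_UNCSEG} refers to the third field definition in the DODA. In both cases the length remains the number of bytes the segment occupies in the stored key.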
