FairCom DB API for C

FairCom DB Unicode UTF-16 Field Types

Storing Unicode data requires a DODA entry for each field. The individual wide characters (16-bit code units) used in UTF-16 are not platform independent with respect to byte ordering. They are treated the same as short integers: on LOW_HIGH platforms, the low-order byte comes before the high-order byte, and on HIGH_LOW platforms the order is reversed. With the DODA entries in place, the Server and clients handle byte-order translation automatically.
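To see why the DODA entries matter, the sketch below lays out a single UTF-16 code unit in both byte orders and swaps a buffer from one order to the other. It is plain standard C, not FairCom code. The function names are hypothetical, and the swap only illustrates the kind of translation the Server and clients perform for you.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store one UTF-16 code unit in LOW_HIGH order
   (low-order byte first), as a little-endian host stores a short integer. */
void put_unit_low_high(uint16_t unit, unsigned char out[2])
{
    out[0] = (unsigned char)(unit & 0xFFu);        /* low-order byte  */
    out[1] = (unsigned char)((unit >> 8) & 0xFFu); /* high-order byte */
}

/* Hypothetical helper: store one UTF-16 code unit in HIGH_LOW order
   (high-order byte first). */
void put_unit_high_low(uint16_t unit, unsigned char out[2])
{
    out[0] = (unsigned char)((unit >> 8) & 0xFFu); /* high-order byte */
    out[1] = (unsigned char)(unit & 0xFFu);        /* low-order byte  */
}

/* Swap the two bytes of every code unit in place: the translation needed
   when UTF-16 data moves between LOW_HIGH and HIGH_LOW hosts. */
void swap_utf16_units(unsigned char *buf, size_t nbytes)
{
    assert(nbytes % 2 == 0); /* UTF-16 data is a whole number of 2-byte units */
    for (size_t i = 0; i < nbytes; i += 2) {
        unsigned char tmp = buf[i];
        buf[i] = buf[i + 1];
        buf[i + 1] = tmp;
    }
}
```

For the letter "A" (U+0041), the LOW_HIGH image is the bytes 0x41 0x00 and the HIGH_LOW image is 0x00 0x41; swapping one yields the other.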

FairCom DB API has four Unicode UTF-16 field types:

UTF-16 Field Type	Description
CT_FUNICODE	A fixed-length field containing a UTF-16 encoded, null-terminated string. This Unicode field type is similar to the CT_FSTRING field type.
CT_F2UNICODE	A fixed-length field that begins with a 2-byte (16-bit) integer specifying the number of bytes in the following UTF-16 encoded string. This Unicode field type is similar to the CT_F2STRING field type.
CT_UNICODE	A variable-length field containing a UTF-16 encoded, null-terminated string. This Unicode field type is similar to the CT_STRING field type.
CT_2UNICODE	A variable-length field that begins with a 2-byte (16-bit) integer specifying the number of bytes in the following UTF-16 encoded string. This Unicode field type is similar to the CT_2STRING field type.
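The length-prefixed types (CT_F2UNICODE and CT_2UNICODE) can be pictured as a byte image: a 2-byte count of string bytes followed by the UTF-16 data. The following is a minimal sketch under stated assumptions: `build_2unicode_image` is a hypothetical helper, both the prefix and the code units are written in the host's native byte order (as short integers would be), and it makes no claim about padding or termination inside a real FairCom record.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: build a length-prefixed UTF-16 image.
   Layout (assumed): [2-byte byte count][UTF-16 code units], native order.
   Returns the total bytes written, or 0 if the string cannot be described
   by a 2-byte prefix or the output buffer is too small. */
size_t build_2unicode_image(const uint16_t *units, size_t nunits,
                            unsigned char *out, size_t outsize)
{
    assert(units != NULL || nunits == 0);
    assert(out != NULL);
    size_t nbytes = nunits * sizeof(uint16_t); /* the prefix counts bytes, not characters */
    if (nbytes > UINT16_MAX || outsize < 2 + nbytes)
        return 0;
    uint16_t prefix = (uint16_t)nbytes;
    memcpy(out, &prefix, sizeof prefix);       /* 2-byte length prefix */
    if (nbytes > 0)
        memcpy(out + 2, units, nbytes);        /* UTF-16 data follows the prefix */
    return 2 + nbytes;
}
```

For the two-character string "Hi", the image is 6 bytes long and its prefix holds 4, not 2.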

The length prefixes at the beginning of the CT_F2UNICODE and CT_2UNICODE field types, and the length in the DODA entry for the CT_FUNICODE and CT_F2UNICODE field types, are specified in bytes. Specifying a field length in bytes is consistent with all other FairCom DB API field types, but is inconsistent with the system-level routines that ordinarily describe the length of a UTF-16 string as a number of characters, not a number of bytes. For example, a string that such a routine reports as 10 characters occupies 20 bytes.
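Because system routines report lengths in characters while these field lengths are in bytes, a conversion step is needed when sizing fields. This hypothetical sketch assumes one "character" is one 16-bit code unit (as with `wcslen` where `wchar_t` is 16 bits), so a surrogate pair counts as two. It also assumes that a null-terminated field such as CT_FUNICODE must have room for the 2-byte terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: number of 16-bit code units before the terminating 0,
   which is what a character-counting routine would typically report. */
size_t utf16_unit_count(const uint16_t *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != 0)
        n++;
    return n;
}

/* Bytes occupied by the string data alone (what a length prefix counts). */
size_t utf16_data_bytes(const uint16_t *s)
{
    return utf16_unit_count(s) * sizeof(uint16_t);
}

/* Bytes needed for the string plus its 2-byte null terminator,
   e.g. when choosing a length for a null-terminated fixed-length field. */
size_t utf16_bytes_with_terminator(const uint16_t *s)
{
    return utf16_data_bytes(s) + sizeof(uint16_t);
}
```

For example, "Hello" is 5 code units, 10 bytes of data, and 12 bytes including the terminator.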

A UTF-16 string longer than 64 KB must be stored in a CT_UNICODE field, because the 2-byte length prefix of the CT_F2UNICODE and CT_2UNICODE field types cannot describe it. To store a UTF-16 string longer than 64 KB with a length prefix, convert the string to UTF-8 and store it in a CT_4STRING field, as discussed below. If this converted field is part of a key segment, "Extended Key Segment" information must also be added to that key segment.
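The 64 KB threshold follows from the 2-byte prefix, which can record at most 65,535 bytes (assuming it is an unsigned 16-bit value). The conversion step can be sketched in standard C as below; the names are hypothetical and no FairCom routine is called, so placing the result in the CT_4STRING field is left to your own record-handling code. On Windows, `WideCharToMultiByte` with `CP_UTF8` is a platform alternative to a hand-written encoder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest byte count an unsigned 2-byte length prefix can record (assumed). */
#define MAX_2BYTE_PREFIX_LEN 65535u

/* Hypothetical check: nonzero if nbytes of UTF-16 data fit behind a 2-byte prefix. */
int fits_2byte_prefix(size_t nbytes)
{
    return nbytes <= MAX_2BYTE_PREFIX_LEN;
}

/* Convert nunits UTF-16 code units to UTF-8 (no terminator written).
   Returns the UTF-8 byte count, or (size_t)-1 on an unpaired surrogate
   or when the output buffer is too small. */
size_t utf16_to_utf8(const uint16_t *in, size_t nunits,
                     unsigned char *out, size_t outsize)
{
    assert(in != NULL || nunits == 0);
    assert(out != NULL || outsize == 0);
    size_t o = 0;
    for (size_t i = 0; i < nunits; i++) {
        uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {        /* high surrogate: needs a low one next */
            if (i + 1 >= nunits || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                return (size_t)-1;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (uint32_t)(in[i + 1] - 0xDC00);
            i++;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {  /* lone low surrogate */
            return (size_t)-1;
        }
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (outsize - o < need)
            return (size_t)-1;
        switch (need) {
        case 1:
            out[o++] = (unsigned char)cp;
            break;
        case 2:
            out[o++] = (unsigned char)(0xC0 | (cp >> 6));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        case 3:
            out[o++] = (unsigned char)(0xE0 | (cp >> 12));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        default:
            out[o++] = (unsigned char)(0xF0 | (cp >> 18));
            out[o++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            out[o++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            out[o++] = (unsigned char)(0x80 | (cp & 0x3F));
            break;
        }
    }
    return o;
}
```

Rejecting unpaired surrogates, rather than silently replacing them, keeps a failed conversion visible before anything is written to the field.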