FairCom ISAM for C

Storing UTF-16 Data

Storing Unicode data requires a DODA entry for each field. The individual wide characters used in UTF-16 are not platform independent with respect to byte ordering. They are treated the same as short integers: on LOW_HIGH platforms, the lower-order byte comes before the higher-order byte. Recall that in a client/server environment with the DODA entries in place, the Server and clients manage byte-order translation.
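
The byte-ordering point can be illustrated with a short, standalone C sketch. It does not call any FairCom DB function; the names is_low_high, code_unit_bytes, and swap_code_unit are hypothetical helpers introduced only for this illustration. It shows how one 16-bit UTF-16 code unit is laid out in memory and what a byte-order translation of that unit amounts to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if this platform stores the low-order
   byte of a 16-bit value first (LOW_HIGH), otherwise 0. */
int is_low_high(void)
{
    const uint16_t probe = 0x0001;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 0x01;
}

/* Hypothetical helper: copies the in-memory bytes of one UTF-16 code
   unit into out[0] and out[1], in the order the platform stores them. */
void code_unit_bytes(uint16_t unit, unsigned char out[2])
{
    assert(out != NULL);
    memcpy(out, &unit, sizeof unit);
}

/* Hypothetical helper: swaps the two bytes of a code unit. This is the
   translation raw UTF-16 data needs when it moves between platforms of
   opposite byte order. */
uint16_t swap_code_unit(uint16_t unit)
{
    return (uint16_t)((unit << 8) | (unit >> 8));
}
```

For the code unit 0x0041 (the letter A), code_unit_bytes yields the bytes 41 00 on a LOW_HIGH platform and 00 41 on a platform with the opposite ordering. This is the same behavior as for any short integer.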

FairCom DB has four Unicode (UTF-16) field types:

CT_FUNICODE - A fixed-length field containing a UTF-16 encoded, null-terminated string.
CT_UNICODE - A variable-length field containing a UTF-16 encoded, null-terminated string.
CT_F2UNICODE - A fixed-length field that begins with a 2-byte integer specifying the number of bytes in the following UTF-16 encoded string.
CT_2UNICODE - A variable-length field that begins with a 2-byte integer specifying the number of bytes in the following UTF-16 encoded string.

Note: The length fields at the beginning of CT_F2UNICODE and CT_2UNICODE are specified in bytes. Specifying a field length in bytes is consistent with all other FairCom DB field types, but it is inconsistent with system-level routines, which ordinarily describe the length of wide-character strings in characters, not bytes.
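
The difference between a character count and a byte count can be sketched in standalone C as follows. This is an illustration of the layout described above, not FairCom DB code: u16_units and pack_len_prefixed are hypothetical names, and the sketch assumes that the 2-byte prefix is unsigned, is written in the platform's native byte order, and counts only the string data without a terminator. None of those three details is stated in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of UTF-16 code units before the
   terminating null. This is the "character" count that system-level
   wide-string routines typically report. */
size_t u16_units(const uint16_t *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != 0)
        n++;
    return n;
}

/* Hypothetical helper: writes a 2-byte BYTE count followed by the
   string data into dst. Returns the total number of bytes written,
   or 0 if the data does not fit in dst or in a 2-byte count.
   Assumptions: unsigned prefix, native byte order, no terminator. */
size_t pack_len_prefixed(unsigned char *dst, size_t dst_size,
                         const uint16_t *s)
{
    size_t nbytes;
    uint16_t prefix;

    assert(dst != NULL && s != NULL);

    /* Convert the code-unit count to bytes: two bytes per unit. */
    nbytes = u16_units(s) * sizeof(uint16_t);
    if (nbytes > UINT16_MAX || dst_size < sizeof prefix + nbytes)
        return 0;

    prefix = (uint16_t)nbytes;
    memcpy(dst, &prefix, sizeof prefix);
    memcpy(dst + sizeof prefix, s, nbytes);
    return sizeof prefix + nbytes;
}
```

For a string of three code units, u16_units reports 3, while the value written to the prefix is 6. Passing the character count where a byte count is expected would describe only half of the string.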

Storing a UTF-16 string longer than 64KB requires a CT_UNICODE field. To store a string greater than 64KB with a length prefix, convert the string to UTF-8 and store it in a CT_4STRING field, as discussed below.
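
As a rough illustration of the conversion step, the following standalone C sketch converts a null-terminated UTF-16 string in native byte order to UTF-8. It is not a FairCom DB routine, and the name u16_to_u8 is hypothetical. It only shows what the conversion involves, including the handling of surrogate pairs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: converts null-terminated UTF-16 (native byte
   order) to null-terminated UTF-8. Returns the number of bytes written,
   excluding the terminator, or (size_t)-1 if the input contains an
   unpaired surrogate or dst is too small. */
size_t u16_to_u8(const uint16_t *src, unsigned char *dst, size_t dst_size)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);

    while (*src != 0) {
        uint32_t cp = *src++;
        size_t need;

        if (cp >= 0xD800 && cp <= 0xDBFF) {
            /* High surrogate: must be followed by a low surrogate. */
            uint32_t lo = *src;
            if (lo < 0xDC00 || lo > 0xDFFF)
                return (size_t)-1;
            src++;
            cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            return (size_t)-1;          /* unpaired low surrogate */
        }

        /* UTF-8 bytes needed for this code point. */
        need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
        if (out + need + 1 > dst_size)  /* keep room for the terminator */
            return (size_t)-1;

        if (need == 1) {
            dst[out++] = (unsigned char)cp;
        } else if (need == 2) {
            dst[out++] = (unsigned char)(0xC0 | (cp >> 6));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else if (need == 3) {
            dst[out++] = (unsigned char)(0xE0 | (cp >> 12));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
        } else {
            dst[out++] = (unsigned char)(0xF0 | (cp >> 18));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            dst[out++] = (unsigned char)(0x80 | (cp & 0x3F));
        }
    }

    if (out >= dst_size)                /* no room for the terminator */
        return (size_t)-1;
    dst[out] = 0;
    return out;
}
```

The returned value is a byte count, which matches the convention that FairCom DB length prefixes are expressed in bytes. Each UTF-16 code unit can expand to as many as three UTF-8 bytes, so the destination buffer should be sized accordingly.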