Storing Unicode data requires DODA entries for each field. The individual wide-characters used in UTF-16 are not platform independent with respect to byte ordering. They are treated the same as short integers: on LOW_HIGH platforms, the lower order byte comes before the higher order byte. With the DODA entries in place, the Server and clients manage byte-order translation automatically.
c-treeDB has four Unicode UTF-16 field types:
UTF-16 Field Type |
Description |
---|---|
CT_FUNICODE |
A fixed-length field containing a UTF-16 encoded, null terminated string. This Unicode field type is similar to CT_FSTRING field type. |
CT_F2UNICODE |
A fixed-length field that begins with a 2-byte (16 bit) integer specifying the number of bytes in the following UTF-16 encoded string. This Unicode field type is similar to CT_F2STRING field type, |
CT_UNICODE |
A variable-length field containing a UTF-16 encoded, null terminated string. This Unicode field type is similar to CT_STRING field type. |
CT_2UNICODE |
A variable-length field that begins with a 2 byte (16 bit) integer specifying the number of bytes in the UTF-16 encoded string. This Unicode field type is similar to CT_2STRING field type. |
The length fields at the beginning of CT_F2UNICODE and CT_2UNICODE field types, and the length in the DODA entry for CT_FSTRING and CT_F2STRING field types, are specified in bytes. Specifying a field length in bytes is consistent with all other c-treeDB field types, but is inconsistent with the system level routines that ordinarily use a number of characters, not a number of bytes, to describe the length of UTF-16 strings.
Storing a UTF-16 string longer than 64Kbytes requires a CT_UNICODE field. To store a UTF-16 string greater than 64Kbytes with a length prefix, convert the string to UTF-8 and store it in a CT_4STRING field, as discussed below. If this UTF-8 converted field is to be part of a key segment, then "Extended Key Segment" information must also be added to this key segment.