c-treeDB handles UTF-8 encoded data as expected in standard string fields. However, there was previously no way to tell SQL, on import, which string fields contain UTF-8 data. Proper international character support requires that each string field have character set and collation attributes associated with it.
c-tree has different field types to distinguish between UTF-16 strings (CT_*UNICODE) and "generic strings". However, it does not know which character set and encoding a generic string uses. At its core, c-tree is not concerned with this issue, because the application normally handles these strings directly as bytes. High-level languages such as Java and SQL, on the other hand, require a defined string encoding to handle the data correctly.
Functions have been added to c-treeDB for setting and retrieving the string character encoding at the field level.
ctdbSetFieldStringEncoding
Set the field string encoding.
Declaration
CTDBRET ctdbDECL ctdbSetFieldStringEncoding( CTHANDLE Handle, pTEXT encoding )
Parameters:
Handle: the field handle.
encoding: the name of the string encoding to set on the field.
Returns:
Returns CTDBRET_OK on success or a c-treeDB error code on failure.
Calling ctdbSetFieldStringEncoding() on a field type that is not a string field fails with error CTDBRET_INVTYPE.
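The calling pattern described above might look like the following minimal sketch. Only the function name, its signature, and the CTDBRET_INVTYPE failure for non-string fields come from this page. So that the sketch compiles without the c-treeDB SDK, the type definitions, the numeric error values, the SetDemoField structure, and the body of ctdbSetFieldStringEncoding() are stand-ins written for illustration; tag_field_as_utf8() is a hypothetical helper, and "UTF-8" is an assumed encoding name. In a real application, the SDK header and a genuine field handle would replace the stand-ins.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* --- Stand-ins (assumptions) so the sketch builds without the SDK --- */
typedef void *CTHANDLE;
typedef char *pTEXT;
typedef int   CTDBRET;
#define ctdbDECL
#define CTDBRET_OK      0   /* placeholder value */
#define CTDBRET_INVTYPE 1   /* placeholder value, not the SDK's number */

/* Hypothetical model of a field: just enough state for the demo. */
typedef struct {
    int  is_string;      /* 1 when the field is a string type */
    char encoding[32];   /* empty string means "no encoding set" */
} SetDemoField;

/* Stand-in that mimics the documented contract of the real function. */
CTDBRET ctdbDECL ctdbSetFieldStringEncoding(CTHANDLE Handle, pTEXT encoding)
{
    SetDemoField *f = (SetDemoField *)Handle;
    assert(f != NULL && encoding != NULL);
    if (!f->is_string)
        return CTDBRET_INVTYPE;          /* documented: non-string field fails */
    strncpy(f->encoding, encoding, sizeof f->encoding - 1);
    f->encoding[sizeof f->encoding - 1] = '\0';
    return CTDBRET_OK;
}

/* Caller-side pattern: returns 1 if the field was tagged, 0 otherwise. */
int tag_field_as_utf8(CTHANDLE hField)
{
    CTDBRET rc = ctdbSetFieldStringEncoding(hField, (pTEXT)"UTF-8");

    if (rc == CTDBRET_INVTYPE) {
        fprintf(stderr, "not a string field; encoding not set\n");
        return 0;
    }
    if (rc != CTDBRET_OK) {
        fprintf(stderr, "setting the encoding failed with error %d\n", (int)rc);
        return 0;
    }
    return 1;
}
```

Checking for CTDBRET_INVTYPE separately lets an import routine skip non-string fields quietly while still reporting other failures.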
ctdbGetFieldStringEncoding
Get the encoding set on the field as a string.
Declaration
pTEXT ctdbDECL ctdbGetFieldStringEncoding( CTHANDLE Handle )
Description
ctdbGetFieldStringEncoding() returns the encoding set on the field as a string, or NULL if no encoding was set or an error occurred. Because NULL covers both cases, check ctdbGetError() to tell them apart.
Returns
The encoding set on the field as a string, or NULL if no encoding is set or an error occurred.
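Because a NULL result from ctdbGetFieldStringEncoding() can mean either "no encoding set" or "an error occurred", callers may want to separate the two cases. The sketch below shows one way to do that. The signature of ctdbGetFieldStringEncoding() and the advice to check ctdbGetError() come from this page. The type definitions, the GetDemoField structure, and both function bodies are stand-ins so the sketch compiles without the SDK. It also assumes that ctdbGetError() takes the same handle and returns CTDBRET_OK when no error is pending. encoding_or_default() is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* --- Stand-ins (assumptions) so the sketch builds without the SDK --- */
typedef void *CTHANDLE;
typedef char *pTEXT;
typedef int   CTDBRET;
#define ctdbDECL
#define CTDBRET_OK 0   /* placeholder value */

/* Hypothetical model of a field: stored encoding plus last error. */
typedef struct {
    char    encoding[32];  /* empty string means "no encoding set" */
    CTDBRET last_error;    /* CTDBRET_OK when nothing went wrong */
} GetDemoField;

/* Stand-in mimicking the documented contract: NULL when nothing is set. */
pTEXT ctdbDECL ctdbGetFieldStringEncoding(CTHANDLE Handle)
{
    GetDemoField *f = (GetDemoField *)Handle;
    assert(f != NULL);
    return f->encoding[0] != '\0' ? f->encoding : NULL;
}

/* Stand-in for the error query; assumed to report the handle's last error. */
CTDBRET ctdbDECL ctdbGetError(CTHANDLE Handle)
{
    return ((GetDemoField *)Handle)->last_error;
}

/*
 * Returns the field's encoding, or `fallback` when none is available.
 * *failed is set to 1 only when NULL was caused by an error.
 */
const char *encoding_or_default(CTHANDLE hField, const char *fallback, int *failed)
{
    pTEXT enc = ctdbGetFieldStringEncoding(hField);

    *failed = 0;
    if (enc != NULL)
        return enc;                       /* an encoding was set */
    if (ctdbGetError(hField) != CTDBRET_OK)
        *failed = 1;                      /* NULL because of an error */
    return fallback;                      /* NULL because nothing was set */
}
```

A caller could, for example, treat an unset encoding as plain bytes while aborting the import when `failed` is set.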
Availability
These methods are available in the following interfaces:
public void SetFieldStringEncoding(String encoding) throws CTException
public String GetFieldStringEncoding() throws CTException