Full-Text Search
Full-Text Search ICU Tokenizer
FairCom Full-Text Search supports Unicode text through an ICU-based tokenizer. An application can configure the ICU tokenizer as follows:
Call ctdbSetFTIOption( pFTI, CTDB_FTI_OPTION_TOKENIZER, NULL, CTDB_FTI_TOKENIZER_ICU ).
Call ctdbSetFTIOption( pFTI, CTDB_FTI_OPTION_ICULANG, XXXX, 0 ), where XXXX is a string specifying the locale.
Call ctdbSetFTIOption( pFTI, CTDB_FTI_OPTION_ICUOPTION, NULL, YYYY ), where YYYY is a ctKSEG_COMPU* combination.

Calls to ctdbAddFTIField or ctdbAddFTIFieldByName can specify the mode parameter using the following values:
CTDB_FTI_MODE_REG: the source string encoding depends on the DODA; CT_*STRING fields are treated as UTF-8 and CT_*UNICODE fields are treated as UTF-16.
CTDB_FTI_MODE_UTF-8: source string in UTF-8 encoding.
CTDB_FTI_MODE_UTF-16: source string in UTF-16 encoding.

The CTDB_FTI_OPTION_ICUSTRENGTH option has been renamed to CTDB_FTI_OPTION_ICUOPTION.
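
The three ctdbSetFTIOption calls described above can be sketched as a small helper. This is an illustrative outline only, not FairCom's reference code. The helper name configure_icu_tokenizer, the "en_US" locale mentioned in the usage note, and every typedef, numeric value, and stub body in the stand-in section are assumptions made so that the sketch compiles without the FairCom SDK. In a real application those declarations come from the FairCom headers and the stand-in section is deleted.

```c
#include <assert.h>
#include <stddef.h>

/* ---- Stand-ins (hypothetical) ------------------------------------------
 * Placeholders so this sketch compiles without the FairCom SDK.
 * The types, numeric values, and stub body are NOT the real definitions;
 * remove this section and include the FairCom headers in real code. */
typedef void *CTHANDLE;
typedef int CTDBRET;
#define CTDBRET_OK 0
enum {
    CTDB_FTI_OPTION_TOKENIZER = 1,   /* placeholder value */
    CTDB_FTI_OPTION_ICULANG,         /* placeholder value */
    CTDB_FTI_OPTION_ICUOPTION        /* placeholder value */
};
#define CTDB_FTI_TOKENIZER_ICU 1     /* placeholder value */

static int g_stub_calls = 0;         /* counts calls made to the stub */

/* Stub: records the call and reports success. The four-argument shape
 * (handle, option, pointer value, integer value) follows the calls
 * shown in the text above. */
static CTDBRET ctdbSetFTIOption(CTHANDLE pFTI, int option,
                                void *pvalue, long lvalue)
{
    (void)pFTI; (void)option; (void)pvalue; (void)lvalue;
    g_stub_calls++;
    return CTDBRET_OK;
}
/* ---- End of stand-ins --------------------------------------------------- */

/* Hypothetical helper: applies the three options in the documented order
 * and stops at the first failure.
 *   locale      - string specifying the locale (the XXXX argument)
 *   icu_options - a ctKSEG_COMPU* combination (the YYYY argument) */
static CTDBRET configure_icu_tokenizer(CTHANDLE pFTI, const char *locale,
                                       long icu_options)
{
    CTDBRET rc;

    assert(pFTI != NULL);
    assert(locale != NULL);

    /* 1. Select the ICU tokenizer. */
    rc = ctdbSetFTIOption(pFTI, CTDB_FTI_OPTION_TOKENIZER, NULL,
                          CTDB_FTI_TOKENIZER_ICU);
    if (rc != CTDBRET_OK)
        return rc;

    /* 2. Set the locale; the string travels in the pointer argument. */
    rc = ctdbSetFTIOption(pFTI, CTDB_FTI_OPTION_ICULANG, (void *)locale, 0);
    if (rc != CTDBRET_OK)
        return rc;

    /* 3. Set the ICU options; the value travels in the integer argument. */
    return ctdbSetFTIOption(pFTI, CTDB_FTI_OPTION_ICUOPTION, NULL,
                            icu_options);
}
```

An application that already holds a full-text index handle pFTI would call, for example, configure_icu_tokenizer( pFTI, "en_US", YYYY ), where YYYY is its chosen ctKSEG_COMPU* combination, and check that the result indicates success before adding fields with ctdbAddFTIField or ctdbAddFTIFieldByName. Returning at the first failing call keeps a partially configured index from being used silently.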