Full-Text Search ICU Tokenizer

FairCom Full-Text Search supports Unicode text through an ICU-based tokenizer. An application configures the ICU tokenizer as follows:

  1. Call ctdbSetFTIOption(pFTI, CTDB_FTI_OPTION_TOKENIZER, NULL, CTDB_FTI_TOKENIZER_ICU) to select the ICU tokenizer.
  2. Call ctdbSetFTIOption(pFTI, CTDB_FTI_OPTION_ICULANG, XXXX, 0), where XXXX is a string specifying the locale.
  3. Call ctdbSetFTIOption(pFTI, CTDB_FTI_OPTION_ICUOPTION, NULL, YYYY), where YYYY is a combination of ctKSEG_COMPU* values.
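
The three configuration calls can be wrapped in one helper, roughly as in the following sketch. It is an illustration under stated assumptions, not FairCom's reference code. The helper name configure_icu_tokenizer is hypothetical. The type definitions, constant values, and the stub body of ctdbSetFTIOption are stand-ins so the example compiles without the SDK; in a real application they come from the FairCom c-treeDB headers. The sketch also assumes that ctdbSetFTIOption returns CTDBRET_OK on success.

```c
#include <stddef.h>

/* ASSUMPTION: stand-in declarations so this sketch compiles on its own.
 * In a real application, remove this section and include the FairCom
 * c-treeDB header instead. All numeric values here are hypothetical. */
typedef void *CTHANDLE;
typedef int CTDBRET;
#define CTDBRET_OK 0
enum {
    CTDB_FTI_OPTION_TOKENIZER = 1,
    CTDB_FTI_OPTION_ICULANG,
    CTDB_FTI_OPTION_ICUOPTION
};
#define CTDB_FTI_TOKENIZER_ICU 1

/* Stub: counts the calls it receives and always reports success. */
static int stub_call_count = 0;
static CTDBRET ctdbSetFTIOption(CTHANDLE handle, int option,
                                void *pvalue, int value)
{
    (void)handle; (void)option; (void)pvalue; (void)value;
    stub_call_count++;
    return CTDBRET_OK;
}
/* End of stand-in section. */

/* Hypothetical helper: applies the three ICU tokenizer settings in order
 * and stops at the first call that does not return CTDBRET_OK.
 * icu_options is the caller's combination of ctKSEG_COMPU* values. */
CTDBRET configure_icu_tokenizer(CTHANDLE pFTI, const char *locale,
                                int icu_options)
{
    CTDBRET rc;

    /* Step 1: select the ICU tokenizer. */
    rc = ctdbSetFTIOption(pFTI, CTDB_FTI_OPTION_TOKENIZER, NULL,
                          CTDB_FTI_TOKENIZER_ICU);
    if (rc != CTDBRET_OK)
        return rc;

    /* Step 2: pass the locale as a string; the integer argument is 0. */
    rc = ctdbSetFTIOption(pFTI, CTDB_FTI_OPTION_ICULANG, (void *)locale, 0);
    if (rc != CTDBRET_OK)
        return rc;

    /* Step 3: pass the ICU options as the integer argument. */
    return ctdbSetFTIOption(pFTI, CTDB_FTI_OPTION_ICUOPTION, NULL,
                            icu_options);
}
```

A caller would pass its full-text index handle, a locale string (for example "en_US", chosen here only for illustration), and its ctKSEG_COMPU* combination. Returning at the first failure avoids leaving the index with a partly applied tokenizer configuration.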

Calls to ctdbAddFTIField or ctdbAddFTIFieldByName can specify the mode parameter using the following values:

  • CTDB_FTI_MODE_REG: the source string encoding depends on the DODA. CT_*STRING fields are treated as UTF-8, and CT_*UNICODE fields are treated as UTF-16.
  • CTDB_FTI_MODE_UTF-8: source string in UTF-8 encoding
  • CTDB_FTI_MODE_UTF-16: source string in UTF-16 encoding

The CTDB_FTI_OPTION_ICUSTRENGTH option has been renamed to CTDB_FTI_OPTION_ICUOPTION.
