In addition to the default tokenizers provided with FairCom DB FTS, FairCom Full-Text Search can call a custom tokenizer contained in a DLL (or a shared library on Unix). Programmers can write their own full-text tokenizer and set it as the tokenizer used by a full-text index.
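To use a custom tokenizer, the programmer must implement the required tokenizer functions, compile them into a DLL or shared library that the server can load, and set that tokenizer as the one used by the full-text index.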
The c-tree source code directory sdk\Xtras\ctree.samples\special\tokenizer contains an example custom tokenizer, easytok.c, and a stub, tokenizer.c, in which programmers can implement their own tokenizer. Neither file depends on any c-tree code, so each can be compiled directly as a DLL on Windows (for example, cl /LD easytok.c) or as a shared library on Unix, and then copied to a location from which the server can load it.
See the complete FairCom Full-Text Search documentation for a list of the functions that must be implemented.
Tokenizer_init initializes the tokenizer.
Returns: a tokenizer context handle that is passed to the other tokenizer functions, or NULL in case of error.
DLLexport void* Tokenizer_init (unsigned long texttype, char* text, size_t textsize, long maxtokensize, char* param, int* errcode)
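As an illustration, the following is a minimal sketch of Tokenizer_init for a hypothetical whitespace tokenizer. It assumes that text holds textsize bytes of input, that maxtokensize is the maximum length of a single token, and that *errcode is set to 0 on success; texttype and param are accepted but ignored. The mytok_ctx context, the private copy of the text, the DLLexport fallback, and the MYTOK_ERR value are illustrative assumptions, not FairCom's implementation.

```c
#include <stdlib.h>
#include <string.h>

/* Assumption: stand-in for the export macro; defined only so this sketch compiles alone. */
#ifndef DLLexport
#  ifdef _WIN32
#    define DLLexport __declspec(dllexport)
#  else
#    define DLLexport
#  endif
#endif

/* Hypothetical error code; a real tokenizer should report a c-tree error code. */
#define MYTOK_ERR 1

/* Hypothetical context, handed back to the caller as an opaque handle. */
typedef struct {
    char  *text;      /* private, '\0'-terminated copy of the input text */
    size_t textsize;  /* number of bytes of text */
    size_t pos;       /* scan position of the next token */
    long   maxtoken;  /* maximum token length in bytes */
    char  *token;     /* buffer for the token returned by Tokenizer_next */
} mytok_ctx;

DLLexport void* Tokenizer_init (unsigned long texttype, char* text, size_t textsize,
                                long maxtokensize, char* param, int* errcode)
{
    mytok_ctx *ctx = NULL;

    (void)texttype;  /* not used by this sketch */
    (void)param;     /* not used by this sketch */

    if (maxtokensize < 1 || (text == NULL && textsize > 0))
        goto fail;
    ctx = malloc(sizeof *ctx);
    if (ctx == NULL)
        goto fail;
    ctx->text  = malloc(textsize + 1);
    ctx->token = malloc((size_t)maxtokensize + 1);
    if (ctx->text == NULL || ctx->token == NULL)
        goto fail;
    if (textsize > 0)
        memcpy(ctx->text, text, textsize);
    ctx->text[textsize] = '\0';
    ctx->textsize = textsize;
    ctx->pos      = 0;
    ctx->maxtoken = maxtokensize;
    if (errcode != NULL)
        *errcode = 0;
    return ctx;

fail:
    if (ctx != NULL) {
        free(ctx->text);   /* free(NULL) is a no-op */
        free(ctx->token);
        free(ctx);
    }
    if (errcode != NULL)
        *errcode = MYTOK_ERR;
    return NULL;
}
```

Copying the text keeps the context independent of the caller's buffer; whether a copy is actually needed depends on how long the server keeps that buffer alive, which this section does not specify.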
Tokenizer_reset resets the text and text size of an already initialized tokenizer. It is used mainly during searches, to tokenize the individual items of a search query.
Returns: CTDBRET_OK on success, or a c-tree error code on failure.
DLLexport int Tokenizer_reset (void *handle, char* text, size_t textsize)
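A minimal sketch of Tokenizer_reset, assuming the same kind of hypothetical context that a whitespace-tokenizer Tokenizer_init might allocate: the new text is copied in, the scan position returns to the start, and the old text is freed. CTDBRET_OK is defined as 0 here only so the sketch compiles without the c-tree headers; MYTOK_ERR is a hypothetical stand-in for a real c-tree error code.

```c
#include <stdlib.h>
#include <string.h>

/* Assumption: stand-in for the export macro; defined only so this sketch compiles alone. */
#ifndef DLLexport
#  ifdef _WIN32
#    define DLLexport __declspec(dllexport)
#  else
#    define DLLexport
#  endif
#endif

/* Assumption: CTDBRET_OK is 0, as in the c-tree headers. */
#ifndef CTDBRET_OK
#define CTDBRET_OK 0
#endif
#define MYTOK_ERR 1  /* hypothetical; a real tokenizer returns a c-tree error code */

/* Hypothetical context, handed to the caller as an opaque handle. */
typedef struct {
    char  *text;      /* private, '\0'-terminated copy of the current text */
    size_t textsize;  /* number of bytes of text */
    size_t pos;       /* scan position of the next token */
    long   maxtoken;  /* maximum token length in bytes */
    char  *token;     /* buffer for the token returned by Tokenizer_next */
} mytok_ctx;

DLLexport int Tokenizer_reset (void *handle, char* text, size_t textsize)
{
    mytok_ctx *ctx = handle;
    char *copy;

    if (ctx == NULL || (text == NULL && textsize > 0))
        return MYTOK_ERR;
    copy = malloc(textsize + 1);
    if (copy == NULL)
        return MYTOK_ERR;      /* the previous text stays in place on failure */
    if (textsize > 0)
        memcpy(copy, text, textsize);
    copy[textsize] = '\0';
    free(ctx->text);           /* release the previous text */
    ctx->text     = copy;
    ctx->textsize = textsize;
    ctx->pos      = 0;         /* the next Tokenizer_next call starts at the beginning */
    return CTDBRET_OK;
}
```

During a search, the server could call Tokenizer_reset once per search item and then read that item's tokens with Tokenizer_next, reusing one context instead of initializing a new tokenizer for each item.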
Tokenizer_next determines and returns the next token in the text.
Returns: a '\0'-terminated string containing the next token. The returned pointer must reference memory that remains valid until the next call to a tokenizer function. Returns NULL on error (*size is nonzero) or at the end of the text (*size is 0).
DLLexport char *Tokenizer_next (void* handle, int *size)
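A minimal sketch of Tokenizer_next for a hypothetical whitespace tokenizer. Each token is copied into a buffer owned by the context, so the returned pointer stays valid until the next tokenizer call, as required. Choices that this section does not specify are assumptions: *size receives the token length on success, -1 signals an error, and tokens longer than maxtoken bytes are truncated.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: stand-in for the export macro; defined only so this sketch compiles alone. */
#ifndef DLLexport
#  ifdef _WIN32
#    define DLLexport __declspec(dllexport)
#  else
#    define DLLexport
#  endif
#endif

/* Hypothetical context, handed to the caller as an opaque handle. */
typedef struct {
    char  *text;      /* text being tokenized */
    size_t textsize;  /* number of bytes of text */
    size_t pos;       /* scan position of the next token */
    long   maxtoken;  /* maximum token length in bytes (assumed >= 1) */
    char  *token;     /* buffer of maxtoken + 1 bytes for the current token */
} mytok_ctx;

DLLexport char *Tokenizer_next (void* handle, int *size)
{
    mytok_ctx *ctx = handle;
    size_t start, len;

    if (size == NULL)
        return NULL;
    if (ctx == NULL) {
        *size = -1;                        /* hypothetical error indicator: nonzero */
        return NULL;
    }
    /* Skip whitespace before the next token. */
    while (ctx->pos < ctx->textsize && isspace((unsigned char)ctx->text[ctx->pos]))
        ctx->pos++;
    if (ctx->pos >= ctx->textsize) {
        *size = 0;                         /* end of text */
        return NULL;
    }
    /* Consume one run of non-whitespace characters. */
    start = ctx->pos;
    while (ctx->pos < ctx->textsize && !isspace((unsigned char)ctx->text[ctx->pos]))
        ctx->pos++;
    len = ctx->pos - start;
    if (len > (size_t)ctx->maxtoken)
        len = (size_t)ctx->maxtoken;       /* assumption: truncate over-long tokens */
    memcpy(ctx->token, ctx->text + start, len);
    ctx->token[len] = '\0';
    *size = (int)len;
    return ctx->token;                     /* valid until the next call overwrites it */
}
```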
Tokenizer_end terminates the use of the tokenizer. The implementation is responsible for releasing any resources the tokenizer has allocated.
DLLexport void Tokenizer_end (void* handle)
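A minimal sketch of Tokenizer_end for a hypothetical context holding a text copy, a token buffer, and the context itself. It accepts NULL so that cleanup after a failed initialization is harmless; that tolerance is a design choice of this sketch, not a documented requirement.

```c
#include <stdlib.h>

/* Assumption: stand-in for the export macro; defined only so this sketch compiles alone. */
#ifndef DLLexport
#  ifdef _WIN32
#    define DLLexport __declspec(dllexport)
#  else
#    define DLLexport
#  endif
#endif

/* Hypothetical context, handed to the caller as an opaque handle. */
typedef struct {
    char  *text;      /* heap copy of the text being tokenized */
    size_t textsize;  /* number of bytes of text */
    size_t pos;       /* scan position of the next token */
    long   maxtoken;  /* maximum token length in bytes */
    char  *token;     /* heap buffer for the current token */
} mytok_ctx;

DLLexport void Tokenizer_end (void* handle)
{
    mytok_ctx *ctx = handle;

    if (ctx == NULL)
        return;           /* nothing to release */
    free(ctx->text);      /* text copy owned by the context */
    free(ctx->token);     /* token buffer returned by Tokenizer_next */
    free(ctx);            /* the context itself */
}
```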