In addition to the default tokenizers provided with FairCom DB FTS, FairCom Full-Text Search can load a custom tokenizer from a DLL. Programmers can write their own full-text tokenizer and set it as the tokenizer used by a full-text index.
To use a custom tokenizer, the programmer must do the following:
The c-tree source code in the sdk\Xtras\ctree.samples\special\tokenizer directory contains an example of a custom tokenizer, easytok.c, and a stub, tokenizer.c, in which programmers can implement their own tokenizer. Neither file depends on any c-tree code, so each can be compiled as a DLL (for instance, cl /LD easytok.c on Windows) or as a shared library on Unix and copied to a location from which the server can load it.
See the complete FairCom Full-Text Search documentation for a list of the functions that must be implemented.
Initializes the tokenizer. This function is called:
Parameters:
Returns:
Tokenizer context handle that will be passed to the other functions.
NULL in case of error.
Usage:
DLLexport void* Tokenizer_init (unsigned long texttype, char* text, size_t textsize, long maxtokensize, char* param, int* errcode)
Resets the text and its size for an already initialized tokenizer.
This function is used mainly during searches to tokenize the various search items of a search query.
Parameters:
Returns:
CTDBRET_OK if successful, or the c-tree error code on failure.
Usage:
DLLexport int Tokenizer_reset (void *handle, char* text, size_t textsize)
Determines and returns the next token in the text.
Parameters:
Returns:
A '\0'-terminated string containing the next token. The returned pointer must refer to memory that stays valid until the next tokenizer function call.
NULL in case of error (with *size set to a nonzero value) or at end of text (with *size set to 0).
Usage:
DLLexport char *Tokenizer_next (void* handle, int *size)
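The return convention above can be exercised with a small test loop. The following is a minimal sketch, not FairCom code: drain_tokens and demo_next are hypothetical names, and demo_next is only a stand-in that returns two fixed words so the loop can be run without a real tokenizer. It shows how a caller might tell an error (NULL with a nonzero *size) apart from end of text (NULL with *size equal to 0).

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Pointer type matching the documented Tokenizer_next signature. */
typedef char *(*next_fn)(void *handle, int *size);

/* Hypothetical helper: pulls tokens until the tokenizer returns NULL.
   Returns the number of tokens seen, or -1 if the tokenizer reported
   an error (NULL with *size != 0). */
static int drain_tokens(next_fn next, void *handle)
{
    int count = 0;
    for (;;) {
        int size = 0;
        char *tok = next(handle, &size);
        if (tok == NULL)
            return size == 0 ? count : -1;
        /* Use the token now: the pointer is only guaranteed to stay
           valid until the next tokenizer function call. */
        printf("token: %s\n", tok);
        count++;
    }
}

/* Hypothetical stand-in for a real tokenizer. The handle is an int
   index into a fixed word list; a negative index simulates an error. */
static char *demo_next(void *handle, int *size)
{
    static char words[2][8] = { "alpha", "beta" };
    int *index = handle;
    if (*index < 0) { *size = -1; return NULL; }  /* error */
    if (*index >= 2) { *size = 0; return NULL; }  /* end of text */
    *size = (int)strlen(words[*index]);           /* assumed: token length */
    return words[(*index)++];
}
```

Calling drain_tokens(demo_next, &index) with index set to 0 walks both words and then stops at end of text; with index set to -1 it reports the simulated error. Setting *size to the token length on success is an assumption of this sketch, since the text above only defines *size for the NULL case.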
Terminates the use of the tokenizer. The implementer is responsible for releasing any resources the tokenizer has allocated.
Parameters:
Usage:
DLLexport void Tokenizer_end (void* handle)
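To show how the four functions fit together, here is a minimal sketch of a whitespace-splitting tokenizer that follows the signatures listed above. It is not the easytok.c sample and makes several assumptions that should be checked against the full documentation: the DLLexport fallback macro, the WsTokenizer structure, and the SKETCH_OK and SKETCH_ERR values are hypothetical (SKETCH_OK is assumed to match CTDBRET_OK, and SKETCH_ERR is a placeholder rather than a real c-tree error code); maxtokensize is assumed to be the longest token in bytes, with longer tokens truncated; texttype and param are ignored.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: fallback export macro so the sketch compiles on its own. */
#ifndef DLLexport
#ifdef _WIN32
#define DLLexport __declspec(dllexport)
#else
#define DLLexport
#endif
#endif

#define SKETCH_OK  0   /* assumed to match CTDBRET_OK */
#define SKETCH_ERR 1   /* hypothetical placeholder, not a real c-tree code */

/* Hypothetical context: what the tokenizer remembers between calls. */
typedef struct {
    const char *text;  /* text being tokenized (owned by the caller) */
    size_t size;       /* number of bytes in text */
    size_t pos;        /* current scan offset */
    size_t maxtoken;   /* longest token returned, in bytes */
    char *token;       /* buffer holding the most recent token */
} WsTokenizer;

DLLexport void *Tokenizer_init(unsigned long texttype, char *text,
                               size_t textsize, long maxtokensize,
                               char *param, int *errcode)
{
    WsTokenizer *t;
    (void)texttype; (void)param;           /* ignored by this sketch */
    if (maxtokensize <= 0) {
        if (errcode) *errcode = SKETCH_ERR;
        return NULL;
    }
    t = malloc(sizeof *t);
    if (t) t->token = malloc((size_t)maxtokensize + 1);
    if (!t || !t->token) {
        free(t);
        if (errcode) *errcode = SKETCH_ERR;
        return NULL;
    }
    t->text = text;
    t->size = text ? textsize : 0;
    t->pos = 0;
    t->maxtoken = (size_t)maxtokensize;
    if (errcode) *errcode = SKETCH_OK;
    return t;
}

DLLexport int Tokenizer_reset(void *handle, char *text, size_t textsize)
{
    WsTokenizer *t = handle;
    if (!t) return SKETCH_ERR;
    t->text = text;
    t->size = text ? textsize : 0;
    t->pos = 0;                            /* restart scanning on new text */
    return SKETCH_OK;
}

DLLexport char *Tokenizer_next(void *handle, int *size)
{
    WsTokenizer *t = handle;
    size_t start, len;
    if (!t || !size) {                     /* error: NULL with *size != 0 */
        if (size) *size = -1;
        return NULL;
    }
    while (t->pos < t->size && isspace((unsigned char)t->text[t->pos]))
        t->pos++;                          /* skip separators */
    if (t->pos >= t->size) {               /* end of text: NULL, *size == 0 */
        *size = 0;
        return NULL;
    }
    start = t->pos;
    while (t->pos < t->size && !isspace((unsigned char)t->text[t->pos]))
        t->pos++;
    len = t->pos - start;
    if (len > t->maxtoken) len = t->maxtoken;   /* truncate long tokens */
    memcpy(t->token, t->text + start, len);
    t->token[len] = '\0';                  /* stays valid until the next call */
    *size = (int)len;
    return t->token;
}

DLLexport void Tokenizer_end(void *handle)
{
    WsTokenizer *t = handle;
    if (t) { free(t->token); free(t); }    /* release everything init allocated */
}
```

Keeping the token buffer inside the context is one way to satisfy the lifetime requirement on the pointer returned by Tokenizer_next: the buffer is overwritten only by the next call and freed only in Tokenizer_end.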