Product Documentation

Full-Text Search

Current Support

The initial release of FairCom Full-Text Search (FTS) supports the features users most commonly expect from full-text search. Some minor features are deferred to later phases of product development. This section lists what the current version of FTS supports and the limitations to keep in mind when using it.

Multiple FTIs Supported per Data File

FTS permits multiple Full-Text Indexes (FTIs) per data file.

Word Dictionary

The Word Dictionary contains a list of tokens (as generated by the tokenizer) for known words appearing at least once in the indexed documents. Each index uses its own Word Dictionary. Because each tokenizer may encode words in its own way, each Word Dictionary uses a single tokenizer (you cannot change tokenizers without rebuilding the Word Dictionary).

Limitations

Several limitations will be addressed in future FTS releases:

Limit of 1 Field per FTI

FTS is currently limited to a single field per FTI. This limitation will be lifted in a future release.

OR Operator - The OR operator is not supported in this release. All search terms are joined by AND operators. An application can emulate OR by splitting the query into multiple searches and presenting the union of their results.
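The union step of that workaround can be sketched as follows. This is a minimal illustration, not part of the FairCom API: the record_id type and the union_results() helper are hypothetical names, and the sketch assumes the application has already run each search and collected the matching record identifiers into arrays sorted in ascending order with no duplicates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record identifier type (an assumption for this sketch);
   a real application would use whatever handle or offset identifies
   the records its searches return. */
typedef unsigned long record_id;

/*
 * Merge two ascending, duplicate-free arrays of record IDs into `out`,
 * writing each ID once even if both searches matched it.
 * `out` must have room for na + nb entries.
 * Returns the number of IDs written.
 */
size_t union_results(const record_id *a, size_t na,
                     const record_id *b, size_t nb,
                     record_id *out)
{
    size_t i = 0, j = 0, n = 0;

    assert(out != NULL);
    assert(a != NULL || na == 0);
    assert(b != NULL || nb == 0);

    while (i < na && j < nb) {
        if (a[i] < b[j]) {
            out[n++] = a[i++];
        } else if (b[j] < a[i]) {
            out[n++] = b[j++];
        } else {
            /* The same record matched both searches: keep one copy. */
            out[n++] = a[i++];
            j++;
        }
    }

    /* Copy whatever remains of the longer list. */
    while (i < na) out[n++] = a[i++];
    while (j < nb) out[n++] = b[j++];

    return n;
}
```

For a query such as "red OR blue", the application would run one search for each term, sort each result list if it is not already ordered, and pass the two lists to union_results(). More than two alternatives can be handled by merging the lists one at a time.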

Wildcard Search - Wildcard searches are not supported in this release of FTS.

However, a limited form of wildcard search, called a "term-prefix search," lets you search for words that begin with the specified characters. See Current & Planned Features.

Parentheses - Parentheses cannot currently be used to indicate the precedence of parts of a complex search. However, because all search terms are ANDed together in this release of FTS, the grouping of terms does not affect the result.

Full-Text Index Stop Word List

The Full-Text Search feature supports adding a list of stop words to a Full-Text Index when the Word Dictionary is created.

To set a stop word list when creating the Word Dictionary for a Full-Text Index, call the ctdbSetFTIOption() function with option set to CTDB_FTI_OPTION_STOP_LIST and pvalue set to a UTF-8 string containing the stop words separated by spaces. Example:

rc = ctdbSetFTIOption(pFTI, CTDB_FTI_OPTION_STOP_LIST, "a an the", 0);
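When the stop words come from a configuration file or an array rather than a literal, the space-separated string has to be assembled first. The following is a minimal sketch of one way to do that. build_stop_list() is a hypothetical helper, not a FairCom function, and it assumes every word is already valid UTF-8 and contains no spaces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Join `count` words into `buf` as one space-separated, NUL-terminated
 * string. Returns 0 on success, or -1 if `buf` (of `bufsize` bytes) is
 * too small; in that case `buf` holds only the words that fit.
 * Hypothetical helper for illustration; not part of the FairCom API.
 */
int build_stop_list(const char *const *words, size_t count,
                    char *buf, size_t bufsize)
{
    size_t used = 0;
    size_t i;

    assert(buf != NULL && bufsize > 0);
    assert(words != NULL || count == 0);
    buf[0] = '\0';

    for (i = 0; i < count; i++) {
        size_t len = strlen(words[i]);
        size_t sep = (i > 0) ? 1 : 0;   /* one space between words */

        /* Need room for the separator, the word, and the final NUL. */
        if (used + sep + len + 1 > bufsize)
            return -1;

        if (sep)
            buf[used++] = ' ';
        memcpy(buf + used, words[i], len);
        used += len;
        buf[used] = '\0';
    }
    return 0;
}
```

With the words "a", "an", and "the", the buffer ends up holding the same string as the literal in the example above, and it would be passed in the pvalue position of the ctdbSetFTIOption() call in the same way.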
