Product Documentation

Full-Text Search

Previous Topic

Next Topic

SQL full-text search examples using SIMPLE tokenizer

The SIMPLE tokenizer identifies each word using whitespace and/or punctuation. Then regardless of case, the tokenizer finds all matching words in the search. tokenizer is case insensitive.

A search string identifies each word using whitespace and/or punctuation and may use double quotes to identify a word or a phrase. Phrases contain two or more words within double quotes and may include whitespace between words. Punctuation, such as .!>, is ignored in the search string when it is not a full-text search operator. Reserved search operators include - " ^ * ( ) NEAR NEAR/n AND OR NOT; However, * ( ) OR NOT are not supported and will return an error when used.

/* =====================================================================

* Full-text search demo using the SIMPLE tokenizer.

* =====================================================================

*/

/* Create the testsearch table. */

CREATE TABLE testsearch

(

name VARCHAR(30)

, description VARCHAR(500)

)

STORAGE_ATTRIBUTES 'hotAlter; huge; rowid_fld; noRecbyt_idx;'

;

/* Create full-text index on the description field of the testsearch table. Use the PORTER English word stem tokenizer to do case insensitive matching of common word forms */

CREATE FULLTEXT INDEX testsearch_description_simple_fts

ON testsearch( description )

UPDATE_MODE ASYNC

;

/* Insert a record into the testsearch table. */

INSERT INTO testsearch

VALUES('Mike','Americans --love-- their __BASEball__ and <FOOTball>!')

;

/* Look at the record inserted into the testsearch table. */

SELECT description FROM testsearch;

/* =====================================================================

* The following queries find the inserted record

* by searching for words and phrases in the description field:

* 'Americans --love-- their __BASEball__ and <FOOTball>!'.

*

* The SIMPLE tokenizer reduces the text to

* 'americans love their baseball and football'.

*

* The tokenizer identifies each word using whitespace and/or punctuation.

* The tokenizer is case insensitive.

* A search string identifies each word using whitespace and/or punctuation.

* A search string may use double quotes to identify a word or a phrase.

* A phrase contains two or more words within double quotes.

* A search string may include any amount of whitespace between words.

* A search string may include

* Punctuation, such as .!>, is ignored in the search string

* when it is not a full-text search operator.

* Reserved search operators include - " ^ * ( ) NEAR NEAR/n AND OR NOT

* The following search operators are unsupported: * ( ) OR NOT

* The server returns an error when you use an unsupported search operator.

* =====================================================================

*/

/* WORD SEARCH: Find records in the testsearch table where the description column contains the word 'baseball' anywhere in the description field. */

SELECT description FROM testsearch WHERE description MATCH 'baseball';

/* CASE INSENSITIVE SEARCH: Find records in the testsearch table where the description column contains case-insensitive forms of the word 'americans', such as 'Americans' and 'AMERIcans' anywhere in the description field. */

SELECT description FROM testsearch WHERE description MATCH 'americans';

/* FIRST WORD SEARCH: Find records in the testsearch table where the description column contains the word 'americans' as the first word in the description field. */

SELECT description FROM testsearch WHERE description MATCH '^americans';

/* FIRST PHRASE SEARCH: Find records in the testsearch table where the description column contains the phrase "americans love" at the begining of the text in the description field. */

SELECT description FROM testsearch WHERE description MATCH ' ^"americans love" ';

/* MULTI-WORD SEARCH: Find records in the testsearch table where the description column contains the words 'baseball' and 'love' in any order anywhere in the description field. */

SELECT description FROM testsearch WHERE description MATCH 'baseball love';

/* MULTI-WORD SEARCH ACROSS WHITESPACE: Find records in the testsearch table where the description column contains the words 'baseball' and 'love' in any order anywhere in the description field. NOTE: Any amount of whitespace may occur between these words in the search text. Punctuation, such as .!>, is also ignored in the search string when it is not a full-text search operator. */

SELECT description FROM testsearch WHERE description MATCH ' <Baseball> >>> .LOVE. !!! ';

/* PHRASE SEARCH: Find records in the testsearch table where the description column contains the case-insensitive phrase "love their" anywhere in the description field. The words 'love" and 'their" must follow each other with no intervening words in the description field. */

SELECT description FROM testsearch WHERE description MATCH ' "love their" ';

/* PHRASE SEARCH ACROSS WHITESPACE: Find records in the testsearch table where the description column contains the case-insensitive phrase "LOVE their" anywhere in the description field. The case-insensitive words 'LOVE" and 'theIR" must follow each other with no intervening words, but any amount of whitespace and unreserved punctuation may occur between these words in the description field. */

SELECT description FROM testsearch WHERE description MATCH ' "LOVE! theIR" ';

/* NEAR SEARCH: Find records in the testsearch table where the description column contains the words 'Americans' and 'baseball' with no more than ten words between each other anywhere in the description field. */

SELECT description FROM testsearch WHERE description MATCH 'Americans NEAR football ';

/* NEAR PHRASE SEARCH: Find records in the testsearch table where the description column contains the words 'their' and 'baseballs' with no words between each other. This is the same as using a phrase search. */

SELECT description FROM testsearch WHERE description MATCH 'their NEAR/0 baseball';

/* NEAR N SEARCH: Find records in the testsearch table where the description column contains the words 'love' and 'baseball' with no more than 1 word between each other. This pattern may occur anywhere in the description field with any amount of whitespace and punctuation between the words. */

SELECT description FROM testsearch WHERE description MATCH ' baseball NEAR/1 football ';

/* NOT WORD SEARCH: Find records in the testsearch table where the description column contains the words 'love' and 'baseball' but not 'karate'. NOTE: The minus - operator is the NOT operator. It must be adjacent to the word with no intervening whitespace. You must also include at least one term in the search phrase before the - operator. */

SELECT description FROM testsearch WHERE description MATCH 'love baseball -karate';

/* NOT PHRASE SEARCH: Find records in the testsearch table where the description column contains the word 'baseball' but not the phrase "hate karate" and not the word 'soccer'. NOTE: The minus operator is the NOT operator. It must be adjacent to the word with no intervening whitespace. You must also include a term in the search phrase before the NOT operator. */

SELECT description FROM testsearch WHERE description MATCH 'baseball -"hate karate" -soccer';

/* AND SEARCH: Find records in the testsearch table where the description column contains the word 'baseball' and 'loved'. NOTE: The AND operator is optional. The server automatically adds it between each term in the search expression. The following two queries are functionally identical. */

SELECT description FROM testsearch WHERE description MATCH ' love AND "their baseball" ';

SELECT description FROM testsearch WHERE description MATCH ' love "their baseball" ';

/* SEARCH for reserved words "and", "or", and "near": Find records in the testsearch table where the description column contains the words "baseball" and "and" but not "or" and "near". NOTE: enclose a reserved word in double quotes to treat it as a word instead of an operator. */

SELECT description FROM testsearch WHERE description MATCH ' love "and" -"or" -"near" ';

/* =====================================================================

* The following queries do NOT find the inserted record

* because they do not match the text:

* 'Americans --love-- their __BASEball__ and <FOOTball>!'

* =====================================================================

*/

/* VERB STEM SEARCH: This query returns no results because the SIMPLE tokenizer does not match common English conjugations of verbs, such as 'loving', 'love', 'loved', and 'loves'. The PORTER tokenizer does. */

SELECT description FROM testsearch WHERE description MATCH 'loving';

/* NOUN STEM SEARCH: This query returns no results because the SIMPLE tokenizer does not match common English singular and plural forms of nouns, such as 'american' and 'americans'. The PORTER tokenizer does. */

SELECT description FROM testsearch WHERE description MATCH 'american';

/* FIRST WORD SEARCH: This query returns no results because 'love' is not the first word in the description. */

SELECT description FROM testsearch WHERE description MATCH '^love';

/* STEMMED MULTI-WORD SEARCH: This query returns no results because the SIMPLE tokenizer does not match word stems like the PORTER tokenizer does. */

SELECT description FROM testsearch WHERE description MATCH 'Baseballs american';

/* STEMMED NEAR SEARCH: This query returns no results because the SIMPLE tokenizer does not match word stems like the PORTER tokenizer does. */

SELECT description FROM testsearch WHERE description MATCH 'American NEAR baseballs ';

/* STEMMED PHRASE SEARCH: This query returns no results because the SIMPLE tokenizer does not match word stems like the PORTER tokenizer does. */

SELECT description FROM testsearch WHERE description MATCH 'their NEAR/0 baseballs ';

/* STEMMED NEAR N SEARCH: This query returns no results because the SIMPLE tokenizer does not match word stems like the PORTER tokenizer does. */

SELECT description FROM testsearch WHERE description MATCH 'loves NEAR/1 baseballs ';

/* Remove the testsearch table. */

DROP TABLE testsearch;

TOCIndex