SIMPLE tokenizer
Example SQL full-text searches using SIMPLE tokenizer
The SIMPLE tokenizer identifies each word using whitespace and/or punctuation. Then regardless of case, the tokenizer finds all matching words in the search. tokenizer is case insensitive.
A search string identifies each word using whitespace and/or punctuation and may use double quotes to identify a word or a phrase. Phrases contain two or more words within double quotes and may include whitespace between words. Punctuation, such as .!>, is ignored in the search string when it is not a full-text search operator. Reserved search operators include - " ^ * ( ) NEAR NEAR/n AND OR NOT; However, * ( ) OR NOT are not supported and will return an error when used.
/* =====================================================================
* Full-text search demo using the SIMPLE tokenizer.
* =====================================================================
*/
/* Create the testsearch table. */
CREATE TABLE testsearch
(
name VARCHAR(30)
, description VARCHAR(500)
) STORAGE_ATTRIBUTES 'hotAlter; huge; rowid_fld; noRecbyt_idx;'
;
/* Create full-text index on the description field of the testsearch table. Use
the PORTER English word stem tokenizer to do case insensitive matching of
common word forms */
CREATE FULLTEXT INDEX testsearch_description_simple_fts
ON testsearch( description )
UPDATE_MODE ASYNC
;
/* Insert a record into the testsearch table. */
INSERT INTO testsearch
VALUES('Mike','Americans --love-- their __BASEball__ and <FOOTball>!')
;
/* Look at the record inserted into the testsearch table. */
SELECT description FROM testsearch;
/* =====================================================================
* The following queries find the inserted record
* by searching for words and phrases in the description field:
* 'Americans --love-- their __BASEball__ and <FOOTball>!'.
*
* The SIMPLE tokenizer reduces the text to
* 'americans love their baseball and football'.
*
* The tokenizer identifies each word using whitespace and/or punctuation.
* The tokenizer is case insensitive.
* A search string identifies each word using whitespace and/or punctuation.
* A search string may use double quotes to identify a word or a phrase.
* A phrase contains two or more words within double quotes.
* A search string may include any amount of whitespace between words.
* A search string may include
* Punctuation, such as .!>, is ignored in the search string
* when it is not a full-text search operator.
* Reserved search operators include - " ^ * ( ) NEAR NEAR/n AND OR NOT
* The following search operators are unsupported: * ( ) OR NOT
* The server returns an error when you use an unsupported search operator.
* =====================================================================
*/
/* WORD SEARCH: Find records in the testsearch table where the description
column contains the word 'baseball' anywhere in the description field. */
SELECT description FROM testsearch WHERE description MATCH 'baseball';
/* CASE INSENSITIVE SEARCH: Find records in the testsearch table where the
description column contains case-insensitive forms of the word 'americans',
such as 'Americans' and 'AMERIcans' anywhere in the description field. */
SELECT description FROM testsearch WHERE description MATCH 'americans';
/* FIRST WORD SEARCH: Find records in the testsearch table where the
description column contains the word 'americans' as the first word in the
description field. */
SELECT description FROM testsearch WHERE description MATCH '^americans';
/* FIRST PHRASE SEARCH: Find records in the testsearch table where the
description column contains the phrase "americans love" at the begining of the
text in the description field. */
SELECT description FROM testsearch WHERE description MATCH ' ^"americans love"
';
/* MULTI-WORD SEARCH: Find records in the testsearch table where the
description column contains the words 'baseball' and 'love' in any order
anywhere in the description field. */
SELECT description FROM testsearch WHERE description MATCH 'baseball love';
/* MULTI-WORD SEARCH ACROSS WHITESPACE: Find records in the testsearch table
where the description column contains the words 'baseball' and 'love' in any
order anywhere in the description field. NOTE: Any amount of whitespace may
occur between these words in the search text. Punctuation, such as .!>, is also
ignored in the search string when it is not a full-text search operator. */
SELECT description FROM testsearch WHERE description MATCH ' <Baseball> >>>
.LOVE. !!! ';
/* PHRASE SEARCH: Find records in the testsearch table where the description
column contains the case-insensitive phrase "love their" anywhere in the
description field. The words 'love" and 'their" must follow each other with no
intervening words in the description field. */
SELECT description FROM testsearch WHERE description MATCH ' "love their" ';
/* PHRASE SEARCH ACROSS WHITESPACE: Find records in the testsearch table where
the description column contains the case-insensitive phrase "LOVE their"
anywhere in the description field. The case-insensitive words 'LOVE" and
'theIR" must follow each other with no intervening words, but any amount of
whitespace and unreserved punctuation may occur between these words in the
description field. */
SELECT description FROM testsearch WHERE description MATCH ' "LOVE! theIR" ';
/* NEAR SEARCH: Find records in the testsearch table where the description
column contains the words 'Americans' and 'baseball' with no more than ten
words between each other anywhere in the description field. */
SELECT description FROM testsearch WHERE description MATCH 'Americans NEAR
football ';
/* NEAR PHRASE SEARCH: Find records in the testsearch table where the
description column contains the words 'their' and 'baseballs' with no words
between each other. This is the same as using a phrase search. */
SELECT description FROM testsearch WHERE description MATCH 'their NEAR/0
baseball';
/* NEAR N SEARCH: Find records in the testsearch table where the description
column contains the words 'love' and 'baseball' with no more than 1 word
between each other. This pattern may occur anywhere in the description field
with any amount of whitespace and punctuation between the words. */
SELECT description FROM testsearch WHERE description MATCH ' baseball NEAR/1
football ';
/* NOT WORD SEARCH: Find records in the testsearch table where the description
column contains the words 'love' and 'baseball' but not 'karate'. NOTE: The
minus - operator is the NOT operator. It must be adjacent to the word with no
intervening whitespace. You must also include at least one term in the search
phrase before the - operator. */
SELECT description FROM testsearch WHERE description MATCH 'love baseball
-karate';
/* NOT PHRASE SEARCH: Find records in the testsearch table where the
description column contains the word 'baseball' but not the phrase "hate
karate" and not the word 'soccer'. NOTE: The minus operator is the NOT
operator. It must be adjacent to the word with no intervening whitespace. You
must also include a term in the search phrase before the NOT operator. */
SELECT description FROM testsearch WHERE description MATCH 'baseball -"hate
karate" -soccer';
/* AND SEARCH: Find records in the testsearch table where the description
column contains the word 'baseball' and 'loved'. NOTE: The AND operator is
optional. The server automatically adds it between each term in the search
expression. The following two queries are functionally identical. */
SELECT description FROM testsearch WHERE description MATCH ' love AND "their
baseball" ';
SELECT description FROM testsearch WHERE description MATCH ' love "their
baseball" ';
/* SEARCH for reserved words "and", "or", and "near": Find records in the
testsearch table where the description column contains the words "baseball" and
"and" but not "or" and "near". NOTE: enclose a reserved word in double quotes
to treat it as a word instead of an operator. */
SELECT description FROM testsearch WHERE description MATCH ' love "and" -"or"
-"near" ';
/* =====================================================================
* The following queries do NOT find the inserted record
* because they do not match the text:
* 'Americans --love-- their __BASEball__ and <FOOTball>!'
* =====================================================================
*/
/* VERB STEM SEARCH: This query returns no results because the SIMPLE tokenizer
does not match common English conjugations of verbs, such as 'loving', 'love',
'loved', and 'loves'. The PORTER tokenizer does. */
SELECT description FROM testsearch WHERE description MATCH 'loving';
/* NOUN STEM SEARCH: This query returns no results because the SIMPLE tokenizer
does not match common English singular and plural forms of nouns, such as
'american' and 'americans'. The PORTER tokenizer does. */
SELECT description FROM testsearch WHERE description MATCH 'american';
/* FIRST WORD SEARCH: This query returns no results because 'love' is not the
first word in the description. */
SELECT description FROM testsearch WHERE description MATCH '^love';
/* STEMMED MULTI-WORD SEARCH: This query returns no results because the SIMPLE
tokenizer does not match word stems like the PORTER tokenizer does. */
SELECT description FROM testsearch WHERE description MATCH 'Baseballs
american';
/* STEMMED NEAR SEARCH: This query returns no results because the SIMPLE
tokenizer does not match word stems like the PORTER tokenizer does. */
SELECT description FROM testsearch WHERE description MATCH 'American NEAR
baseballs ';
/* STEMMED PHRASE SEARCH: This query returns no results because the SIMPLE
tokenizer does not match word stems like the PORTER tokenizer does. */
SELECT description FROM testsearch WHERE description MATCH 'their NEAR/0
baseballs ';
/* STEMMED NEAR N SEARCH: This query returns no results because the SIMPLE
tokenizer does not match word stems like the PORTER tokenizer does. */
SELECT description FROM testsearch WHERE description MATCH 'loves NEAR/1
baseballs ';
/* Remove the testsearch table. */
DROP TABLE testsearch;