Skip to main content

SIMPLE tokenizer

Example SQL full-text searches using SIMPLE tokenizer

Abstract

example SQL full-text searches using SIMPLE tokenizer

The SIMPLE tokenizer identifies each word using whitespace and/or punctuation. Then regardless of case, the tokenizer finds all matching words in the search. tokenizer is case insensitive.

A search string identifies each word using whitespace and/or punctuation and may use double quotes to identify a word or a phrase. Phrases contain two or more words within double quotes and may include whitespace between words. Punctuation, such as .!>, is ignored in the search string when it is not a full-text search operator. Reserved search operators include - " ^ * ( ) NEAR NEAR/n AND OR NOT; However, * ( ) OR NOT are not supported and will return an error when used.

/* =====================================================================
 * Full-text search demo using the SIMPLE tokenizer.
 * =====================================================================
 */

/* Create the testsearch table. */
CREATE TABLE testsearch
( 
 name VARCHAR(30)
, description VARCHAR(500) 
) STORAGE_ATTRIBUTES 'hotAlter; huge; rowid_fld; noRecbyt_idx;'
;

/* Create full-text index on the description field of the testsearch table. Use 
the PORTER English word stem tokenizer to do case insensitive matching of 
common word forms */

CREATE FULLTEXT INDEX testsearch_description_simple_fts   
 ON  testsearch( description )   
 UPDATE_MODE  ASYNC
;

/* Insert a record into the testsearch table. */
INSERT INTO testsearch 
VALUES('Mike','Americans   --love--  their __BASEball__ and <FOOTball>!')
;

/* Look at the record inserted into the testsearch table. */
SELECT  description FROM testsearch;

/* =====================================================================
 * The following queries find the inserted record 
 * by searching for words and phrases in the description field:
 * 'Americans   --love--  their __BASEball__ and <FOOTball>!'. 
 * 
 * The SIMPLE tokenizer reduces the text to 
 * 'americans love their baseball and football'. 
 * 
 * The tokenizer identifies each word using whitespace and/or punctuation.
 * The tokenizer is case insensitive.
 * A search string identifies each word using whitespace and/or punctuation.
 * A search string may use double quotes to identify a word or a phrase.
 * A phrase contains two or more words within double quotes.
 * A search string may include any amount of whitespace between words.
 * A search string may include 
 * Punctuation, such as .!>, is ignored in the search string 
 * when it is not a full-text search operator. 
 * Reserved search operators include - " ^ * ( ) NEAR NEAR/n AND OR NOT
 * The following search operators are unsupported: * ( ) OR NOT
 * The server returns an error when you use an unsupported search operator.
 * =====================================================================
 */

/* WORD SEARCH: Find records in the testsearch table where the description 
column contains the word 'baseball' anywhere in the description field. */
SELECT description FROM testsearch WHERE description MATCH 'baseball';

/* CASE INSENSITIVE SEARCH: Find records in the testsearch table where the 
description column contains case-insensitive forms of the word 'americans', 
such as 'Americans' and 'AMERIcans' anywhere in the description field. */
SELECT description FROM testsearch WHERE description MATCH 'americans';

/* FIRST WORD SEARCH: Find records in the testsearch table where the 
description column contains the word 'americans' as the first word in the 
description field. */
SELECT  description FROM testsearch WHERE description MATCH '^americans';

/* FIRST PHRASE SEARCH: Find records in the testsearch table where the 
description column contains the phrase "americans love" at the begining of the 
text in the description field. */
SELECT  description FROM testsearch WHERE description MATCH ' ^"americans love" 
';

/* MULTI-WORD SEARCH: Find records in the testsearch table where the 
description column contains the words 'baseball' and 'love' in any order 
anywhere in the description field. */
SELECT description FROM testsearch WHERE description MATCH 'baseball love';

/* MULTI-WORD SEARCH ACROSS WHITESPACE: Find records in the testsearch table 
where the description column contains the words 'baseball' and 'love' in any 
order anywhere in the description field. NOTE: Any amount of whitespace may 
occur between these words in the search text. Punctuation, such as .!>, is also 
ignored in the search string when it is not a full-text search operator. */
SELECT description FROM testsearch WHERE description MATCH ' <Baseball>   >>>   
.LOVE. !!! ';

/* PHRASE SEARCH: Find records in the testsearch table where the description 
column contains the case-insensitive phrase "love their" anywhere in the 
description field. The words 'love" and 'their" must follow each other with no 
intervening words in the description field. */
SELECT description FROM testsearch WHERE description MATCH ' "love their" ';

/* PHRASE SEARCH ACROSS WHITESPACE: Find records in the testsearch table where 
the description column contains the case-insensitive phrase "LOVE their"  
anywhere in the description field. The case-insensitive words 'LOVE" and 
'theIR" must follow each other with no intervening words, but any amount of 
whitespace and unreserved punctuation may occur between these words in the 
description field. */
SELECT description FROM testsearch WHERE description MATCH ' "LOVE!   theIR" ';

/* NEAR SEARCH: Find records in the testsearch table where the description 
column contains the words 'Americans' and 'baseball' with no more than ten 
words between each other anywhere in the description field. */
SELECT description FROM testsearch WHERE description MATCH 'Americans NEAR 
football ';

/* NEAR PHRASE SEARCH: Find records in the testsearch table where the 
description column contains the words 'their' and 'baseballs' with no words 
between each other. This is the same as using a phrase search. */
SELECT description FROM testsearch WHERE description MATCH 'their NEAR/0 
baseball';

/* NEAR N SEARCH: Find records in the testsearch table where the description 
column contains the words 'love' and 'baseball' with no more than 1 word 
between each other. This pattern may occur anywhere in the description field 
with any amount of whitespace and punctuation between the words. */
SELECT description FROM testsearch WHERE description MATCH ' baseball NEAR/1 
football ';

/* NOT WORD SEARCH: Find records in the testsearch table where the description 
column contains the words 'love' and 'baseball' but not 'karate'. NOTE: The 
minus - operator is the NOT operator. It must be adjacent to the word with no 
intervening whitespace. You must also include at least one term in the search 
phrase before the - operator. */
SELECT description FROM testsearch WHERE description MATCH 'love baseball 
-karate';

/* NOT PHRASE SEARCH: Find records in the testsearch table where the 
description column contains the word 'baseball' but not the phrase "hate 
karate" and not the word 'soccer'. NOTE: The minus operator is the NOT 
operator. It must be adjacent to the word with no intervening whitespace. You 
must also include a term in the search phrase before the NOT operator. */
SELECT description FROM testsearch WHERE description MATCH 'baseball -"hate 
karate" -soccer';

/* AND SEARCH: Find records in the testsearch table where the description 
column contains the word 'baseball' and 'loved'. NOTE: The AND operator is 
optional. The server automatically adds it between each term in the search 
expression. The following two queries are functionally identical. */
SELECT description FROM testsearch WHERE description MATCH ' love AND "their 
baseball" ';
SELECT description FROM testsearch WHERE description MATCH ' love "their 
baseball" ';

/* SEARCH for reserved words "and", "or", and "near": Find records in the 
testsearch table where the description column contains the words "baseball" and 
"and" but not "or" and "near". NOTE: enclose a reserved word in double quotes 
to treat it as a word instead of an operator. */
SELECT description FROM testsearch WHERE description MATCH ' love "and" -"or" 
-"near" ';

/* =====================================================================
 * The following queries do NOT find the inserted record 
 * because they do not match the text: 
 * 'Americans   --love--  their __BASEball__ and <FOOTball>!' 
 * =====================================================================
 */

/* VERB STEM SEARCH: This query returns no results because the SIMPLE tokenizer 
does not match common English conjugations of verbs, such as 'loving', 'love', 
'loved', and 'loves'. The PORTER tokenizer does. */
SELECT description FROM testsearch WHERE description MATCH 'loving';

/* NOUN STEM SEARCH: This query returns no results because the SIMPLE tokenizer 
does not match common English singular and plural forms of nouns, such as  
'american' and 'americans'. The PORTER tokenizer does. */
SELECT description FROM testsearch WHERE description MATCH 'american';

/* FIRST WORD SEARCH: This query returns no results because 'love' is not the 
first word in the description. */
SELECT  description FROM testsearch WHERE description MATCH '^love';

/* STEMMED MULTI-WORD SEARCH: This query returns no results because the SIMPLE 
tokenizer does not match word stems like the PORTER tokenizer does. */
SELECT description FROM testsearch WHERE description MATCH 'Baseballs 
american';

/* STEMMED NEAR SEARCH: This query returns no results because the SIMPLE 
tokenizer does not match word stems like the PORTER tokenizer does. */
SELECT description FROM testsearch WHERE description MATCH 'American NEAR 
baseballs ';

/* STEMMED PHRASE SEARCH: This query returns no results because the SIMPLE 
tokenizer does not match word stems like the PORTER tokenizer does. */
SELECT description FROM testsearch WHERE description MATCH 'their NEAR/0 
baseballs ';

/* STEMMED NEAR N SEARCH: This query returns no results because the SIMPLE 
tokenizer does not match word stems like the PORTER tokenizer does. */
SELECT description FROM testsearch WHERE description MATCH 'loves NEAR/1 
baseballs ';


/* Remove the testsearch table. */
DROP TABLE testsearch;