Product Documentation

FairCom ISAM for C

ICU Collation Option Overview

The collation options can be grouped as follows: locale default control, collation strength, normalization, and special attributes. Locale default control determines how closely the effective locale must match the requested locale. Collation strength determines how case, accents, and other character modifiers affect the ordering of sort keys. Normalization affects how alternative representations of the “same” character (including its accents and other modifiers) are compared. The special attributes affect particular properties of the collation, further modifying the strength and normalization options. For example, a special attribute can be used to force lowercase characters first or last in the collation.

If no locale default control option is included in kseg_comp, there is no restriction on how close the effective locale must be to the requested locale. For example, if you request collation for the German language (“de”), you are likely to get a locale based on the system default (e.g., “en_US” in the United States). This is not a problem, because the default rules have been determined to work for the German language.

If ctKSEG_COMPU_SYSDEFAULT_NOTOK is used, then a request to use locale “xx_YY_Variant” will succeed as long as collation rules for “xx” are available. If ctKSEG_COMPU_FALLBACK_NOTOK is used, then rules for the particular locale, including its optional country and variant modifiers, must be available; falling back from “xx_YY” to “xx” is not acceptable. In the case of the “de” locale noted above, the segment definition would cause an error in the call to PutXtdKeySegmentDef() if either of the “NOTOK” default restrictions is part of the definition.
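The acceptance rules described above can be sketched in plain C. This is only an illustration of the decision logic, not FairCom's implementation: the names demo_resolution, DEMO_SYSDEFAULT_NOTOK, DEMO_FALLBACK_NOTOK, and demo_locale_acceptable() are hypothetical, and the flag values are placeholders rather than the real ctKSEG_COMPU_* values.

```c
#include <stdbool.h>

/* How the effective locale was obtained (hypothetical enum for this sketch). */
typedef enum {
    DEMO_RESOLVED_EXACT,      /* rules exist for the full "xx_YY_Variant" request */
    DEMO_RESOLVED_FALLBACK,   /* fell back to a parent locale, e.g. "xx_YY" -> "xx" */
    DEMO_RESOLVED_SYSDEFAULT  /* rules came from the system default locale */
} demo_resolution;

/* Placeholder flags standing in for ctKSEG_COMPU_SYSDEFAULT_NOTOK and
 * ctKSEG_COMPU_FALLBACK_NOTOK. The values are assumptions, not the real ones. */
#define DEMO_SYSDEFAULT_NOTOK 0x1u
#define DEMO_FALLBACK_NOTOK   0x2u

/* Returns true when the way the locale was resolved satisfies the
 * locale default control flags requested in the segment definition. */
bool demo_locale_acceptable(unsigned flags, demo_resolution how)
{
    /* FALLBACK_NOTOK: only an exact match is acceptable. */
    if ((flags & DEMO_FALLBACK_NOTOK) && how != DEMO_RESOLVED_EXACT)
        return false;

    /* SYSDEFAULT_NOTOK: a parent-locale fallback is fine, but the
     * system default is not. */
    if ((flags & DEMO_SYSDEFAULT_NOTOK) && how == DEMO_RESOLVED_SYSDEFAULT)
        return false;

    /* No restriction: any effective locale is accepted. */
    return true;
}
```

With these rules, the “de” request that resolves to the system default is accepted when no flag is set and rejected when either flag is set, which corresponds to the error case described for PutXtdKeySegmentDef().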

At most one of the following collation strength options can be included in kseg_comp:

ctKSEG_COMPU_S_PRIMARY

ctKSEG_COMPU_S_SECONDARY

ctKSEG_COMPU_S_TERTIARY

ctKSEG_COMPU_S_QUATERNARY

ctKSEG_COMPU_S_IDENTICAL

ctKSEG_COMPU_S_DEFAULT

At most one of the following normalization options can be included in kseg_comp:

ctKSEG_COMPU_N_NONE

ctKSEG_COMPU_N_CAN_DECMP

ctKSEG_COMPU_N_CMP_DECMP

ctKSEG_COMPU_N_CAN_DECMP_CMP

ctKSEG_COMPU_N_CMP_DECMP_CAN

ctKSEG_COMPU_N_DEFAULT
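The “at most one” rule for the strength and normalization groups can be checked before a segment definition is submitted. The sketch below assumes a hypothetical encoding in which every option occupies its own bit; the DEMO_* names and values are invented for illustration, and the real ctKSEG_COMPU_* constants in the FairCom headers may be encoded differently (for example, as a multi-bit field), in which case this bit test would not apply.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical one-bit-per-option encoding, used only in this sketch. */
enum {
    DEMO_S_PRIMARY    = 0x001, DEMO_S_SECONDARY     = 0x002,
    DEMO_S_TERTIARY   = 0x004, DEMO_S_QUATERNARY    = 0x008,
    DEMO_S_IDENTICAL  = 0x010, DEMO_S_DEFAULT       = 0x020,
    DEMO_N_NONE       = 0x040, DEMO_N_CAN_DECMP     = 0x080,
    DEMO_N_CMP_DECMP  = 0x100, DEMO_N_CAN_DECMP_CMP = 0x200,
    DEMO_N_CMP_DECMP_CAN = 0x400, DEMO_N_DEFAULT    = 0x800
};

#define DEMO_STRENGTH_MASK      0x03Fu  /* all DEMO_S_* bits */
#define DEMO_NORMALIZATION_MASK 0xFC0u  /* all DEMO_N_* bits */

/* True when v has zero or one bit set. */
static bool at_most_one_bit(uint32_t v)
{
    return (v & (v - 1u)) == 0u;
}

/* True when comp names at most one strength option and at most one
 * normalization option. A value of zero passes, matching the rule
 * that both groups are optional. */
bool demo_options_valid(uint32_t comp)
{
    return at_most_one_bit(comp & DEMO_STRENGTH_MASK)
        && at_most_one_bit(comp & DEMO_NORMALIZATION_MASK);
}
```

Under this assumed encoding, demo_options_valid(DEMO_S_TERTIARY | DEMO_N_CAN_DECMP) holds, while combining two strength options such as DEMO_S_PRIMARY | DEMO_S_TERTIARY is rejected.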

One or more of the following special attributes can be included in kseg_comp. Each symbolic constant is followed by the equivalent ICU (attribute, value) pair.

ctKSEG_COMPU_A_FRENCH_ON    (UCOL_FRENCH_COLLATION, UCOL_ON)

ctKSEG_COMPU_A_FRENCH_OFF   (UCOL_FRENCH_COLLATION, UCOL_OFF)

ctKSEG_COMPU_A_CASE_ON      (UCOL_CASE_LEVEL, UCOL_ON)

ctKSEG_COMPU_A_CASE_OFF     (UCOL_CASE_LEVEL, UCOL_OFF)

ctKSEG_COMPU_A_DECOMP_ON    (UCOL_DECOMPOSITION_MODE, UCOL_ON)

ctKSEG_COMPU_A_DECOMP_OFF   (UCOL_DECOMPOSITION_MODE, UCOL_OFF)

ctKSEG_COMPU_A_SHIFTED      (UCOL_ALTERNATE_HANDLING, UCOL_SHIFTED)

ctKSEG_COMPU_A_NONIGNR      (UCOL_ALTERNATE_HANDLING, UCOL_NON_IGNORABLE)

ctKSEG_COMPU_A_LOWER        (UCOL_CASE_FIRST, UCOL_LOWER_FIRST)

ctKSEG_COMPU_A_UPPER        (UCOL_CASE_FIRST, UCOL_UPPER_FIRST)

ctKSEG_COMPU_A_HANGUL       (UCOL_NORMALIZATION_MODE, UCOL_ON_WITHOUT_HANGUL)
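The correspondence between the special attribute constants and ICU (attribute, value) pairs can be captured as a small lookup table. The sketch below stores the names as strings so that it compiles without the FairCom or ICU headers; the type demo_attr_map and the functions demo_find_attr() and demo_attrs_conflict() are hypothetical helpers. The conflict check reflects an assumption: the text above does not say how two options that target the same ICU attribute (for example, the FRENCH_ON and FRENCH_OFF pair) are handled, so the helper only flags such combinations for review.

```c
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *option;     /* FairCom symbolic constant, by name */
    const char *attribute;  /* ICU attribute name */
    const char *value;      /* ICU attribute value name */
} demo_attr_map;

/* Pairs copied from the list of special attributes above. */
static const demo_attr_map demo_attrs[] = {
    { "ctKSEG_COMPU_A_FRENCH_ON",  "UCOL_FRENCH_COLLATION",   "UCOL_ON" },
    { "ctKSEG_COMPU_A_FRENCH_OFF", "UCOL_FRENCH_COLLATION",   "UCOL_OFF" },
    { "ctKSEG_COMPU_A_CASE_ON",    "UCOL_CASE_LEVEL",         "UCOL_ON" },
    { "ctKSEG_COMPU_A_CASE_OFF",   "UCOL_CASE_LEVEL",         "UCOL_OFF" },
    { "ctKSEG_COMPU_A_DECOMP_ON",  "UCOL_DECOMPOSITION_MODE", "UCOL_ON" },
    { "ctKSEG_COMPU_A_DECOMP_OFF", "UCOL_DECOMPOSITION_MODE", "UCOL_OFF" },
    { "ctKSEG_COMPU_A_SHIFTED",    "UCOL_ALTERNATE_HANDLING", "UCOL_SHIFTED" },
    { "ctKSEG_COMPU_A_NONIGNR",    "UCOL_ALTERNATE_HANDLING", "UCOL_NON_IGNORABLE" },
    { "ctKSEG_COMPU_A_LOWER",      "UCOL_CASE_FIRST",         "UCOL_LOWER_FIRST" },
    { "ctKSEG_COMPU_A_UPPER",      "UCOL_CASE_FIRST",         "UCOL_UPPER_FIRST" },
    { "ctKSEG_COMPU_A_HANGUL",     "UCOL_NORMALIZATION_MODE", "UCOL_ON_WITHOUT_HANGUL" }
};

/* Returns the table entry for an option name, or NULL if it is unknown. */
const demo_attr_map *demo_find_attr(const char *option)
{
    size_t n = sizeof demo_attrs / sizeof demo_attrs[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(demo_attrs[i].option, option) == 0)
            return &demo_attrs[i];
    }
    return NULL;
}

/* Returns 1 when both options are known and set the same ICU attribute
 * to different values; attribute names are compared literally. */
int demo_attrs_conflict(const char *a, const char *b)
{
    const demo_attr_map *ma = demo_find_attr(a);
    const demo_attr_map *mb = demo_find_attr(b);
    if (ma == NULL || mb == NULL)
        return 0;
    return strcmp(ma->attribute, mb->attribute) == 0
        && strcmp(ma->value, mb->value) != 0;
}
```

For instance, demo_find_attr("ctKSEG_COMPU_A_LOWER") yields the UCOL_CASE_FIRST and UCOL_LOWER_FIRST pair, and demo_attrs_conflict() reports the LOWER and UPPER options as targeting the same attribute.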

It is permissible to set kseg_comp to zero. A zero kseg_comp implies no restrictions on locale defaults, the default collation strength, the default normalization, and no special attributes.

For a complete treatment of all of these options, please refer to the ICU web site and the Unicode Consortium’s web site and publications.
