Product Documentation

FairCom DB API for C

ICU Collation Option Overview

The collation options can be grouped as follows: locale default control, collation strength, normalization, and special attributes. Locale default control affects the degree to which a default locale must be related to the requested locale. Collation strength determines how case, accents, and other character modifiers affect the ordering of sort keys. Normalization affects how alternative variations of the "same" character (including its accents and other modifiers) are compared. The special attributes affect particular properties of the collation, which further modify the strength and normalization options. For example, a special attribute can be used to force lowercase characters to be first or last in the collation.

If no locale default control option is included in kseg_comp, there is no restriction on how close the effective locale must be to the requested locale. For example, if you request collation for the German language ("de"), you are likely to get a locale based on the system default (e.g., "en_US" in the United States). This is not a problem, since it has been determined that the default rules work for the German language.

If ctKSEG_COMPU_SYSDEFAULT_NOTOK is used, then a request to use locale "xx_YY_Variant" will succeed as long as collation rules for "xx" are available. If ctKSEG_COMPU_FALLBACK_NOTOK is used, then rules for the particular locale, including its optional country and variant modifiers, must be available; falling back from "xx_YY" to "xx" is not acceptable. In the case of the "de" locale noted above, the segment definition would cause an error in the call to PutXtdKeySegmentDef() if either of the "NOTOK" default restrictions is part of the definition.
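The acceptance rules described above can be summarized in a small decision function. The following is only a sketch of that logic, not c-tree code: the enum names, the three-way "resolution" model, and locale_acceptable() are all hypothetical and exist purely for illustration. It assumes, following the "de" example, that ctKSEG_COMPU_FALLBACK_NOTOK also rejects a system-default result.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of how the effective locale was obtained. */
typedef enum {
    RESOLVED_EXACT,      /* rules exist for the full requested locale    */
    RESOLVED_FALLBACK,   /* fell back, e.g. from "xx_YY" to "xx"         */
    RESOLVED_SYSDEFAULT  /* no rules for "xx"; system default rules used */
} locale_resolution;

/* Hypothetical stand-ins for the locale default control options. */
typedef enum {
    POLICY_NONE,             /* no control option in kseg_comp        */
    POLICY_SYSDEFAULT_NOTOK, /* models ctKSEG_COMPU_SYSDEFAULT_NOTOK  */
    POLICY_FALLBACK_NOTOK    /* models ctKSEG_COMPU_FALLBACK_NOTOK    */
} locale_policy;

/* Returns true if a segment definition with this policy would be
   accepted, given how the effective locale was resolved. */
static bool locale_acceptable(locale_policy policy, locale_resolution how)
{
    switch (policy) {
    case POLICY_NONE:
        return true;                       /* anything goes          */
    case POLICY_SYSDEFAULT_NOTOK:
        return how != RESOLVED_SYSDEFAULT; /* "xx" rules are enough  */
    case POLICY_FALLBACK_NOTOK:
        return how == RESOLVED_EXACT;      /* exact locale required  */
    }
    assert(!"unknown policy");
    return false;
}
```

Under this model, the "de" request that resolves to the system default is accepted only with POLICY_NONE, which matches the error described for either "NOTOK" option.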

At most one of the following collation strength options can be included in kseg_comp:

  • ctKSEG_COMPU_S_PRIMARY
  • ctKSEG_COMPU_S_SECONDARY
  • ctKSEG_COMPU_S_TERTIARY
  • ctKSEG_COMPU_S_QUATERNARY
  • ctKSEG_COMPU_S_IDENTICAL
  • ctKSEG_COMPU_S_DEFAULT

At most one of the following normalization options can be included in kseg_comp:

  • ctKSEG_COMPU_N_NONE
  • ctKSEG_COMPU_N_CAN_DECMP
  • ctKSEG_COMPU_N_CMP_DECMP
  • ctKSEG_COMPU_N_CAN_DECMP_CMP
  • ctKSEG_COMPU_N_CMP_DECMP_CAN
  • ctKSEG_COMPU_N_DEFAULT
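The two "at most one" rules above can be checked before a definition is submitted. The sketch below assumes that each option is a distinct single-bit flag OR-ed into kseg_comp; that encoding is an assumption made for illustration. The DEMO_* names and their bit values are placeholders, not the real ctKSEG_COMPU_* values, which come from the c-tree headers.

```c
#include <assert.h>
#include <stdbool.h>

/* PLACEHOLDER flags: hypothetical values, NOT the real c-tree constants. */
enum {
    DEMO_S_PRIMARY      = 0x0001,
    DEMO_S_SECONDARY    = 0x0002,
    DEMO_S_TERTIARY     = 0x0004,
    DEMO_S_QUATERNARY   = 0x0008,
    DEMO_S_IDENTICAL    = 0x0010,
    DEMO_S_DEFAULT      = 0x0020,

    DEMO_N_NONE         = 0x0100,
    DEMO_N_CAN_DECMP    = 0x0200,
    DEMO_N_CMP_DECMP    = 0x0400,
    DEMO_N_CAN_DECMP_CMP = 0x0800,
    DEMO_N_CMP_DECMP_CAN = 0x1000,
    DEMO_N_DEFAULT      = 0x2000
};

#define DEMO_S_MASK 0x003FUL  /* all strength placeholders      */
#define DEMO_N_MASK 0x3F00UL  /* all normalization placeholders */

/* True when zero or one bit is set in v. */
static bool at_most_one_bit(unsigned long v)
{
    return (v & (v - 1UL)) == 0UL;
}

/* True when comp carries at most one strength option and at most one
   normalization option (under the single-bit assumption above). */
static bool comp_groups_valid(unsigned long comp)
{
    assert((DEMO_S_MASK & DEMO_N_MASK) == 0UL); /* groups must not overlap */
    return at_most_one_bit(comp & DEMO_S_MASK)
        && at_most_one_bit(comp & DEMO_N_MASK);
}
```

A value of zero passes both checks, since omitting a group entirely is allowed.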

One or more of the following special attributes can be included in kseg_comp. Each c-tree symbolic constant is listed with its equivalent ICU attribute-value pair.

c-tree Symbolic Constant      ICU Attribute-Value Pair

ctKSEG_COMPU_A_FRENCH_ON      (UCOL_FRENCH_COLLATION, UCOL_ON)
ctKSEG_COMPU_A_FRENCH_OFF     (UCOL_FRENCH_COLLATION, UCOL_OFF)
ctKSEG_COMPU_A_CASE_ON        (UCOL_CASE_LEVEL, UCOL_ON)
ctKSEG_COMPU_A_CASE_OFF       (UCOL_CASE_LEVEL, UCOL_OFF)
ctKSEG_COMPU_A_DECOMP_ON      (UCOL_DECOMPOSITION_MODE, UCOL_ON)
ctKSEG_COMPU_A_DECOMP_OFF     (UCOL_DECOMPOSITION_MODE, UCOL_OFF)
ctKSEG_COMPU_A_SHIFTED        (UCOL_ALTERNATE_HANDLING, UCOL_SHIFTED)
ctKSEG_COMPU_A_NONIGNR        (UCOL_ALTERNATE_HANDLING, UCOL_NON_IGNORABLE)
ctKSEG_COMPU_A_LOWER          (UCOL_CASE_FIRST, UCOL_LOWER_FIRST)
ctKSEG_COMPU_A_UPPER          (UCOL_CASE_FIRST, UCOL_UPPER_FIRST)
ctKSEG_COMPU_A_HANGUL         (UCOL_NORMALIZATION_MODE, UCOL_ON_WITHOUT_HANGUL)
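For reference while reading code that sets these attributes, the mapping above can be held in a small lookup table. The sketch below stores the names as strings only, so it needs neither the c-tree nor the ICU headers. The attr_pair type and the find_attr() and same_icu_attribute() helpers are hypothetical and are not part of either library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One row of the mapping: names only, no numeric values. */
typedef struct {
    const char *ctree_name;     /* c-tree symbolic constant */
    const char *icu_attribute;  /* ICU attribute name       */
    const char *icu_value;      /* ICU attribute value name */
} attr_pair;

static const attr_pair attr_table[] = {
    { "ctKSEG_COMPU_A_FRENCH_ON",  "UCOL_FRENCH_COLLATION",   "UCOL_ON" },
    { "ctKSEG_COMPU_A_FRENCH_OFF", "UCOL_FRENCH_COLLATION",   "UCOL_OFF" },
    { "ctKSEG_COMPU_A_CASE_ON",    "UCOL_CASE_LEVEL",         "UCOL_ON" },
    { "ctKSEG_COMPU_A_CASE_OFF",   "UCOL_CASE_LEVEL",         "UCOL_OFF" },
    { "ctKSEG_COMPU_A_DECOMP_ON",  "UCOL_DECOMPOSITION_MODE", "UCOL_ON" },
    { "ctKSEG_COMPU_A_DECOMP_OFF", "UCOL_DECOMPOSITION_MODE", "UCOL_OFF" },
    { "ctKSEG_COMPU_A_SHIFTED",    "UCOL_ALTERNATE_HANDLING", "UCOL_SHIFTED" },
    { "ctKSEG_COMPU_A_NONIGNR",    "UCOL_ALTERNATE_HANDLING", "UCOL_NON_IGNORABLE" },
    { "ctKSEG_COMPU_A_LOWER",      "UCOL_CASE_FIRST",         "UCOL_LOWER_FIRST" },
    { "ctKSEG_COMPU_A_UPPER",      "UCOL_CASE_FIRST",         "UCOL_UPPER_FIRST" },
    { "ctKSEG_COMPU_A_HANGUL",     "UCOL_NORMALIZATION_MODE", "UCOL_ON_WITHOUT_HANGUL" }
};

/* Returns the row for a c-tree constant name, or NULL if it is not listed. */
static const attr_pair *find_attr(const char *ctree_name)
{
    assert(ctree_name != NULL);
    for (size_t i = 0; i < sizeof attr_table / sizeof attr_table[0]; i++) {
        if (strcmp(attr_table[i].ctree_name, ctree_name) == 0)
            return &attr_table[i];
    }
    return NULL;
}

/* True when both names are listed and map to the same ICU attribute. */
static bool same_icu_attribute(const char *a, const char *b)
{
    const attr_pair *pa = find_attr(a);
    const attr_pair *pb = find_attr(b);
    return pa != NULL && pb != NULL
        && strcmp(pa->icu_attribute, pb->icu_attribute) == 0;
}
```

same_icu_attribute() can flag combinations such as ctKSEG_COMPU_A_SHIFTED with ctKSEG_COMPU_A_NONIGNR, which set the same ICU attribute to different values. This page does not state how such a combination is handled, so treat a positive result as something to review rather than as a documented error.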

It is permissible to set kseg_comp to zero. A zero kseg_comp implies no restrictions on locale defaults, default collation strength, default normalization, and no special attributes.

For a complete treatment of all of these options, please refer to the ICU website and the Unicode Consortium’s website and publications.
