Analyzers

Analyzers enable you to break search inputs into sets of sub-values that search views can use for improved searching and sorting. When you use an analyzer, the search view gathers the attributes of all documents in linked collections and creates the appropriate sub-values and metadata.

You can use the TOKENS() function to split a phrase into token strings that you can use in C8QL search queries.

Built-in Analyzers

Macrometa provides a set of built-in analyzers.

The identity analyzer uses the frequency and norm features. All text analyzers tokenize strings with stemming enabled, no stop-words configured, case conversion set to lower, and accent mark removal enabled. The text analyzers use the frequency, norm, and position features.

Name      Type      Language
identity  identity  none
text_de   text      German
text_en   text      English
text_es   text      Spanish
text_fi   text      Finnish
text_fr   text      French
text_it   text      Italian
text_nl   text      Dutch
text_no   text      Norwegian
text_pt   text      Portuguese
text_ru   text      Russian
text_sv   text      Swedish
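
As a rough illustration of the case conversion and accent mark removal described for the text analyzers, the following Python sketch lower-cases a string, strips accent marks, and splits it into word tokens. It is not Macrometa's implementation: the function name normalize_tokens is hypothetical, the word splitting uses a simple regular expression rather than ICU tokenization, and stemming is left out entirely.

```python
import re
import unicodedata


def normalize_tokens(text: str) -> list[str]:
    """Hypothetical helper: approximate two text-analyzer steps.

    Lower-cases the input, removes accent marks, and splits on
    non-word characters. Stemming is NOT performed here.
    """
    lowered = text.lower()
    # NFD decomposition separates base letters from combining accent marks.
    decomposed = unicodedata.normalize("NFD", lowered)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Treat each run of word characters as one token.
    return re.findall(r"\w+", stripped)


print(normalize_tokens("Crème Brûlée Recipes"))
# ['creme', 'brulee', 'recipes']
```

A real text analyzer would additionally stem each token, so the indexed values can differ from the output shown here.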

Supported Languages

Analyzers rely on ICU for language-dependent tokenization and normalization. GDN ships with a data file, icudtl.dat, which contains information for the supported languages.

C8DB only supports UTF-8 encoding.

Search views do not support alphabetical ordering in different languages. For example, a range query performed against a search view will not follow language rules defined in the analyzer locale.
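
To see why this matters, the Python sketch below compares strings by Unicode code point, which is one common ordering that ignores language rules. Whether search views compare values in exactly this way is an assumption made for illustration. The word list and the in_range helper are hypothetical.

```python
def in_range(words, low, high):
    """Hypothetical helper: keep words with low <= word < high,
    compared by code point rather than by locale rules."""
    return [w for w in words if low <= w < high]


words = ["zebra", "Äpfel", "apfel", "Zug"]

# Code point order puts uppercase before lowercase and "Ä" after "z".
sorted_by_code_point = sorted(words)
print(sorted_by_code_point)
# ['Zug', 'apfel', 'zebra', 'Äpfel']

# A German reader might expect "Äpfel" to fall between "a" and "b",
# but a code point comparison excludes it.
print(in_range(words, "a", "b"))
# ['apfel']
```

If locale-aware ordering is required, one option is to sort the results in application code after the query returns.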

Snowball provides stemming capabilities and supports the following languages:

Code  Language
de    German
en    English
es    Spanish
fi    Finnish
fr    French
it    Italian
nl    Dutch
no    Norwegian
pt    Portuguese
ru    Russian
sv    Swedish

Value Handling

Analyzers are primarily focused on text processing, but you can use search views to index any type of data that is in the form of a text string, which makes it compatible with an analyzer. Primitive values (null, true, false, and numbers) are indexed as-is.

Array elements are unpacked and indexed individually, including the elements of nested arrays, and objects are indexed as sub-attributes. However, arrays and objects themselves are not added to the index, so you cannot search for an array or object as a whole.
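
The following Python sketch models these rules on a sample document. It is only an illustration under stated assumptions: the indexable_values function, the dotted path notation, and the sample document are hypothetical, and the sketch does not apply any analyzer to string values.

```python
def indexable_values(value, path=""):
    """Hypothetical model: yield (attribute_path, primitive) pairs.

    Objects contribute their sub-attributes, arrays contribute their
    elements at any nesting depth, and only primitives are emitted.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            yield from indexable_values(child, child_path)
    elif isinstance(value, list):
        # Elements are unpacked; the array itself is never emitted.
        for element in value:
            yield from indexable_values(element, path)
    else:
        # Strings, numbers, booleans, and None (null) are emitted as-is.
        yield path, value


doc = {
    "title": "Dune",
    "tags": ["sci-fi", ["classic"]],
    "stock": {"count": 3, "available": True},
    "notes": None,
}

for pair in indexable_values(doc):
    print(pair)
# ('title', 'Dune')
# ('tags', 'sci-fi')
# ('tags', 'classic')
# ('stock.count', 3)
# ('stock.available', True)
# ('notes', None)
```

Note that no pair carries the whole tags array or the whole stock object as its value, which mirrors the statement that arrays and objects themselves are not searchable.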

Refer to the following links for more information about value handling:

  • Search Query: How to query indexed values such as numbers and nested values.
  • Search Views: How we index compound data types (arrays, objects).