Version: User Guides (BYOC)

Filter Reference

This section provides a detailed reference for filters in analyzers.

Lowercase

The `lowercase` filter converts terms generated by a tokenizer to lowercase, making searches case-insensitive. For example, it can convert `["High", "Performance", "Vector", "Database"]` to `["high", "performance", "vector", "database"]`.
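The effect can be reproduced with a few lines of plain Python. This is only an illustrative sketch of the behavior described above, not the service's implementation; the function name `lowercase_filter` is hypothetical.

```python
def lowercase_filter(tokens):
    """Return a new list with every token converted to lowercase."""
    return [token.lower() for token in tokens]


tokens = ["High", "Performance", "Vector", "Database"]
print(lowercase_filter(tokens))
# ['high', 'performance', 'vector', 'database']
```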

ASCII folding

The `asciifolding` filter converts characters outside the Basic Latin Unicode block into their ASCII equivalents, where such an equivalent exists. For instance, it transforms `í` to `i`, so that accented and unaccented spellings of a word produce the same term. This is especially useful for multilingual content.
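A rough approximation of this folding can be written with Python's standard `unicodedata` module: decompose each character, then drop the combining marks. The helper name `ascii_fold` is hypothetical, and the sketch covers accented letters only. A real folding table also maps characters that have no decomposition (for example `ß` or `ø`), which this version leaves unchanged.

```python
import unicodedata


def ascii_fold(token):
    """Strip diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(ascii_fold("í"))     # i
print(ascii_fold("café"))  # cafe
```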

Alphanumonly

The `alphanumonly` filter removes tokens that contain any character other than ASCII letters and digits, keeping only alphanumeric terms. This filter is useful for processing text where only basic letters and numbers are relevant, excluding any special characters or symbols.
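As an illustration, the sketch below keeps a token only when every character is an ASCII letter or digit. It models the described behavior under that assumption; the function name `alphanum_only` is hypothetical.

```python
def alphanum_only(tokens):
    """Keep tokens made up solely of ASCII letters and digits."""
    return [t for t in tokens if t.isascii() and t.isalnum()]


tokens = ["Milvus", "2024", "café", "v2.0", "hello!"]
print(alphanum_only(tokens))
# ['Milvus', '2024']
```

Note that the whole token is dropped, not just the offending character: `v2.0` disappears because of the single `.`.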

Cnalphanumonly

The `cnalphanumonly` filter removes tokens that contain any characters other than Chinese characters, English letters, or digits.
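The rule can be modeled with a regular expression. In this sketch, "Chinese characters" is assumed to mean the CJK Unified Ideographs block (U+4E00 to U+9FFF); the actual filter may recognize a different range. The name `cn_alphanum_only` is hypothetical.

```python
import re

# Assumption: Chinese characters are approximated by U+4E00..U+9FFF.
_CN_ALNUM = re.compile(r"[\u4e00-\u9fffA-Za-z0-9]+")


def cn_alphanum_only(tokens):
    """Keep tokens consisting only of Chinese characters, English letters, or digits."""
    return [t for t in tokens if _CN_ALNUM.fullmatch(t)]


tokens = ["向量数据库", "Milvus", "2024", "你好!", "café"]
print(cn_alphanum_only(tokens))
# ['向量数据库', 'Milvus', '2024']
```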

Cncharonly

The `cncharonly` filter removes tokens that contain any non-Chinese characters. This filter is useful when you want to focus solely on Chinese text, filtering out any tokens that contain other scripts, numbers, or symbols.
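A minimal sketch of this stricter rule follows. It again assumes that Chinese characters fall in the range U+4E00 to U+9FFF, which may not match the filter's exact definition; `cn_char_only` is a hypothetical name.

```python
import re

# Assumption: Chinese characters are approximated by U+4E00..U+9FFF.
_CN_ONLY = re.compile(r"[\u4e00-\u9fff]+")


def cn_char_only(tokens):
    """Keep tokens in which every character is a Chinese character."""
    return [t for t in tokens if _CN_ONLY.fullmatch(t)]


tokens = ["向量", "Milvus", "数据库2", "搜索"]
print(cn_char_only(tokens))
# ['向量', '搜索']
```

A mixed token such as `数据库2` is removed entirely, because one digit is enough to disqualify it.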

Length

The `length` filter removes tokens whose length falls outside a specified range, so that only tokens within the limits you set are retained during text processing.
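The idea can be sketched as a simple range check. The parameter names `min_len` and `max_len` are hypothetical, and the sketch measures length in characters, which is an assumption; the filter's own page defines the actual parameters and unit.

```python
def length_filter(tokens, min_len=0, max_len=None):
    """Keep tokens whose character count lies within [min_len, max_len]."""
    return [
        t for t in tokens
        if len(t) >= min_len and (max_len is None or len(t) <= max_len)
    ]


tokens = ["a", "of", "vector", "representation"]
print(length_filter(tokens, min_len=2, max_len=10))
# ['of', 'vector']
```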

Stop

The `stop` filter removes specified stop words from tokenized text, helping to eliminate common, less meaningful words. You can configure the list of stop words using the `stop_words` parameter.
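In plain Python, stop-word removal amounts to a set-membership test. The sketch below is illustrative only; `stop_filter` and its `stop_list` argument are hypothetical names, and the comparison is assumed to be exact and case-sensitive.

```python
def stop_filter(tokens, stop_list):
    """Drop every token that appears in the stop-word list."""
    stop_set = set(stop_list)  # set lookup keeps each check O(1)
    return [t for t in tokens if t not in stop_set]


tokens = ["the", "benefits", "of", "a", "vector", "database"]
print(stop_filter(tokens, ["the", "of", "a"]))
# ['benefits', 'vector', 'database']
```

Under the case-sensitive assumption, `The` would survive a stop list containing only `the`, which is one reason to lowercase tokens before removing stop words.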

Decompounder

The `decompounder` filter splits compound words into individual components based on a specified dictionary, making it easier to search for parts of compound terms. This filter is particularly useful for languages that frequently use compound words, such as German.
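One way to picture dictionary-based splitting is the sketch below. It splits a token only when the whole token can be covered by dictionary words and otherwise leaves it unchanged. That policy, the longest-prefix-first search, and the name `split_compound` are assumptions made for illustration, not a description of the filter's actual algorithm.

```python
def split_compound(token, dictionary):
    """Return the dictionary parts covering the whole token, else [token]."""

    def helper(rest):
        if not rest:
            return []
        # Try longer prefixes first, backtracking if the tail cannot be split.
        for end in range(len(rest), 0, -1):
            prefix = rest[:end]
            if prefix in dictionary:
                tail = helper(rest[end:])
                if tail is not None:
                    return [prefix] + tail
        return None

    parts = helper(token)
    return parts if parts else [token]


german_words = {"dampf", "schiff", "fahrt"}
print(split_compound("dampfschifffahrt", german_words))
# ['dampf', 'schiff', 'fahrt']
print(split_compound("datenbank", german_words))
# ['datenbank']
```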

Stemmer

The `stemmer` filter reduces words to their base or root form (a process known as stemming), so that different inflections of the same word match the same term. The `stemmer` filter supports multiple languages.
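To make the idea concrete, here is a deliberately naive suffix-stripping stemmer for English. It is a toy: real stemmers apply much richer, language-specific rules, and nothing here reflects the algorithm the `stemmer` filter uses. The suffix list and the name `toy_stem` are hypothetical.

```python
# Toy suffix list, checked in order; real stemmers use far more rules.
SUFFIXES = ("ing", "ed", "es", "s")


def toy_stem(token):
    """Strip the first matching suffix, keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


words = ["searching", "searched", "searches", "search"]
print([toy_stem(w) for w in words])
# ['search', 'search', 'search', 'search']
```

Because all four inflections collapse to `search`, a query for any one of them can match documents containing the others.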