Filter Reference (Public Preview)
This section provides a detailed reference for filters in analyzers.
Lowercase
The `lowercase` filter converts terms generated by a tokenizer to lowercase, making searches case-insensitive. For example, it can convert `["High", "Performance", "Vector", "Database"]` to `["high", "performance", "vector", "database"]`.
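The effect can be illustrated with a few lines of plain Python. This is only a sketch of the behavior described above, not the analyzer's implementation; `lowercase_filter` is a hypothetical helper name.

```python
def lowercase_filter(tokens):
    """Return a new list with every token converted to lowercase."""
    return [token.lower() for token in tokens]


tokens = ["High", "Performance", "Vector", "Database"]
print(lowercase_filter(tokens))
# ['high', 'performance', 'vector', 'database']
```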
ASCII folding
The `asciifolding` filter converts characters outside the Basic Latin Unicode block into their ASCII equivalents. For instance, it transforms characters like `í` to `i`, making text processing simpler and more consistent, especially for multilingual content.
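A rough approximation of this folding in plain Python uses Unicode decomposition and then drops the combining accent marks. This is a sketch under that assumption and covers fewer characters than a full folding table (for example, it leaves `ø` and `ß` untouched); `ascii_fold` is a hypothetical helper name.

```python
import unicodedata


def ascii_fold(token):
    """Approximate ASCII folding by stripping combining accent marks."""
    # NFKD splits "í" into the base letter "i" plus a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", token)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(ascii_fold("í"))     # i
print(ascii_fold("café"))  # cafe
```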
Alphanumonly
The `alphanumonly` filter removes tokens that contain any character other than ASCII letters and digits, keeping only alphanumeric terms. This filter is useful for processing text where only basic letters and numbers are relevant, because it excludes tokens that contain special characters, symbols, or non-ASCII characters.
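As an illustration, the rule can be read as "keep a token only if every character is an ASCII letter or digit". The sketch below encodes that reading in plain Python; it is an assumption about the exact rule, and `alphanum_only` is a hypothetical helper name.

```python
def alphanum_only(tokens):
    """Keep tokens made up entirely of ASCII letters and digits."""
    # str.isascii() rejects tokens such as "café";
    # str.isalnum() rejects tokens with punctuation such as "v2.5".
    return [t for t in tokens if t.isascii() and t.isalnum()]


print(alphanum_only(["milvus", "2024", "v2.5", "café", "c++"]))
# ['milvus', '2024']
```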
Cnalphanumonly
The `cnalphanumonly` filter removes tokens that contain any characters other than Chinese characters, English letters, or digits.
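A minimal sketch of this rule in plain Python follows. It assumes that "Chinese characters" means the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) and that "English letters" means ASCII letters; the real filter may recognize a wider range. `cn_alphanum_only` is a hypothetical helper name.

```python
import re

# Assumption: Chinese characters are limited to U+4E00..U+9FFF here.
CN_ALPHANUM = re.compile(r"[\u4e00-\u9fffA-Za-z0-9]+")


def cn_alphanum_only(tokens):
    """Keep tokens made up only of Chinese characters, ASCII letters, or digits."""
    return [t for t in tokens if CN_ALPHANUM.fullmatch(t)]


print(cn_alphanum_only(["向量", "数据库2", "vector", "c++", "café"]))
# ['向量', '数据库2', 'vector']
```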
Cncharonly
The `cncharonly` filter removes tokens that contain any non-Chinese characters. This filter is useful when you want to focus solely on Chinese text, filtering out any tokens that contain other scripts, numbers, or symbols.
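The same idea can be sketched in plain Python. The example assumes that "Chinese characters" means the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), which may be narrower than what the real filter accepts; `cn_char_only` is a hypothetical helper name.

```python
import re

# Assumption: Chinese characters are limited to U+4E00..U+9FFF here.
CN_ONLY = re.compile(r"[\u4e00-\u9fff]+")


def cn_char_only(tokens):
    """Keep tokens that consist solely of Chinese characters."""
    return [t for t in tokens if CN_ONLY.fullmatch(t)]


print(cn_char_only(["向量", "数据库2", "vector"]))
# ['向量']
```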
Length
The `length` filter removes tokens that are shorter or longer than the limits you specify, so only tokens within the configured length range are retained during text processing.
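The sketch below shows one way to picture this in plain Python. The parameter names `min_len` and `max_len` are hypothetical, and the example assumes that length is counted in characters and that both bounds are inclusive; the actual filter may differ on either point.

```python
def length_filter(tokens, min_len=0, max_len=None):
    """Keep tokens whose character count lies within [min_len, max_len]."""
    kept = []
    for token in tokens:
        if len(token) < min_len:
            continue  # too short
        if max_len is not None and len(token) > max_len:
            continue  # too long
        kept.append(token)
    return kept


tokens = ["a", "of", "vector", "supercalifragilistic"]
print(length_filter(tokens, min_len=2, max_len=10))
# ['of', 'vector']
```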
Stop
The `stop` filter removes specified stop words from tokenized text, helping to eliminate common, less meaningful words. You can configure the list of stop words using the `stopwords` parameter.
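Conceptually, stop-word removal is a membership test against the configured list. The following plain-Python sketch assumes exact, case-sensitive matching, which may not reflect how the real filter compares tokens; `stop_filter` and its `stop_words` argument are hypothetical names used only for this illustration.

```python
def stop_filter(tokens, stop_words):
    """Drop every token that appears in the stop-word collection."""
    stop_set = set(stop_words)  # set lookup keeps the check O(1) per token
    return [t for t in tokens if t not in stop_set]


tokens = ["the", "speed", "of", "search"]
print(stop_filter(tokens, ["the", "of"]))
# ['speed', 'search']
```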
Decompounder
The `decompounder` filter splits compound words into individual components based on a specified dictionary, making it easier to search for parts of compound terms. This filter is particularly useful for languages that frequently use compound words, such as German.
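To make the dictionary-based idea concrete, here is a small plain-Python sketch. It uses one possible strategy, chosen for this illustration only: a token is split when it can be covered completely by dictionary words (trying longer prefixes first), and is otherwise kept unchanged. The real filter's matching rules may differ, and `split_compound` and `decompound` are hypothetical helper names.

```python
def split_compound(token, dictionary):
    """Return the dictionary words that exactly cover token, or None."""
    if not token:
        return []  # nothing left to cover
    # Try the longest prefix first, then progressively shorter ones.
    for end in range(len(token), 0, -1):
        head = token[:end]
        if head in dictionary:
            rest = split_compound(token[end:], dictionary)
            if rest is not None:
                return [head] + rest
    return None  # no full cover exists


def decompound(tokens, dictionary):
    """Replace each splittable token with its parts; keep the others as-is."""
    result = []
    for token in tokens:
        parts = split_compound(token, dictionary)
        result.extend(parts if parts else [token])
    return result


dictionary = {"dampf", "schiff", "fahrt"}
print(decompound(["dampfschifffahrt", "brot"], dictionary))
# ['dampf', 'schiff', 'fahrt', 'brot']
```

Keeping unsplittable tokens intact means words that are missing from the dictionary still reach the index.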
Stemmer
The `stemmer` filter reduces words to their base or root form (known as stemming), so that different inflections of the same word match one another. For example, an English stemmer reduces both `running` and `runs` to `run`. The `stemmer` filter supports multiple languages, allowing for effective search and indexing in various linguistic contexts.