
Regex
Public Preview

The regex filter keeps only the tokens that match a regular expression you provide; every other token produced by the tokenizer is discarded.

Configuration

The regex filter is a custom filter in Zilliz Cloud. To use it, specify "type": "regex" in the filter configuration and set the expr parameter to the regular expression that tokens must match.

analyzer_params = {
    "tokenizer": "standard",
    "filter": [{
        "type": "regex",
        "expr": "^(?!test)"  # keep tokens that do NOT start with "test"
    }]
}

The regex filter accepts the following configurable parameters.

| Parameter | Description |
| --- | --- |
| expr | A regular-expression pattern applied to each token. Tokens that match are retained; non-matches are dropped. For details on regex syntax, refer to Syntax. |

The regex filter operates on the terms generated by the tokenizer, so it must be used in combination with a tokenizer.
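
To make the keep-if-match rule concrete, here is a minimal local sketch that imitates it on an already tokenized list. It uses Python's built-in re module, which is not the engine Zilliz Cloud runs, and it assumes that a token counts as a match when the pattern is found anywhere in it (search semantics). The helper name regex_filter is hypothetical and exists only for this illustration.

```python
import re


def regex_filter(tokens, expr):
    """Keep only the tokens in which `expr` finds a match (assumed search semantics)."""
    pattern = re.compile(expr)
    return [tok for tok in tokens if pattern.search(tok)]


# Tokens as the standard tokenizer would split "testItem apple testCase banana"
tokens = ["testItem", "apple", "testCase", "banana"]

# "^(?!test)" matches only at the start of tokens that do not begin with "test"
kept = regex_filter(tokens, r"^(?!test)")
print(kept)  # ['apple', 'banana']
```

A local check like this is handy for trying out a pattern quickly, but regex dialects differ, so the behavior on the cluster should still be confirmed with run_analyzer.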

After defining analyzer_params, you can apply them to a VARCHAR field when defining a collection schema. This allows Zilliz Cloud to process the text in that field using the specified analyzer for efficient tokenization and filtering. For details, refer to Example use.

Examples

Before applying the analyzer configuration to your collection schema, verify its behavior using the run_analyzer method.

Analyzer configuration

analyzer_params = {
    "tokenizer": "standard",
    "filter": [{
        "type": "regex",
        "expr": "^(?!test)"
    }]
}

Verification using run_analyzer

from pymilvus import MilvusClient

client = MilvusClient(uri="YOUR_CLUSTER_ENDPOINT")

# Sample text to analyze
sample_text = "testItem apple testCase banana"

# Run the analyzer defined above (standard tokenizer + regex filter)
result = client.run_analyzer(sample_text, analyzer_params)
print("Analyzer output:", result)

Expected output

['apple', 'banana']