
Jieba

The jieba tokenizer processes Chinese text by breaking it down into its component words.

Configuration

# Simple configuration: only specifying the tokenizer name
analyzer_params = {
    "tokenizer": "jieba",  # Use the default settings: dict=["_default_"], mode="search", hmm=true
}
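
With only the tokenizer name specified, the analyzer can be attached to a VARCHAR field when you define a collection schema. The sketch below is a minimal, illustrative example: the field names, max_length, and connection URI are placeholders, and it assumes a pymilvus MilvusClient whose add_field accepts enable_analyzer and analyzer_params.

from pymilvus import MilvusClient, DataType

client = MilvusClient(uri="http://localhost:19530")  # assumed local deployment

analyzer_params = {
    "tokenizer": "jieba",  # default dictionary, search mode, HMM enabled
}

# Attach the jieba analyzer to a VARCHAR field so its text is tokenized
# with these settings when the field is indexed
schema = client.create_schema(auto_id=True)
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(
    field_name="text",
    datatype=DataType.VARCHAR,
    max_length=1000,
    enable_analyzer=True,
    analyzer_params=analyzer_params,
)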

Examples

Analyzer configuration

analyzer_params = {
    "tokenizer": {
        "type": "jieba",
        "dict": ["结巴分词器"],
        "mode": "exact",
        "hmm": False
    }
}
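
To check how this configuration tokenizes text before using it in a collection, you can run it against a sample string. The sketch below is one way to do this; it assumes a pymilvus MilvusClient that provides run_analyzer, and the sample text "milvus结巴分词器中文测试" is inferred from the token list shown under Expected output.

from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")  # assumed local deployment

analyzer_params = {
    "tokenizer": {
        "type": "jieba",
        "dict": ["结巴分词器"],  # custom dictionary with a single entry
        "mode": "exact",
        "hmm": False
    }
}

# Sample text assumed from the expected output below
sample_text = "milvus结巴分词器中文测试"

# Run the analyzer with the jieba configuration and print the tokens
result = client.run_analyzer(sample_text, analyzer_params)
print(result)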

Expected output

['milvus', '结巴分词器', '中', '文', '测', '试']