Tokenizer Reference (Public Preview)
This section provides a detailed reference for the tokenizers available in Zilliz Cloud.
Standard Tokenizer
The `standard` tokenizer in Zilliz Cloud splits text based on spaces and punctuation marks, making it suitable for most languages.
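As an illustration, here is a minimal pymilvus sketch that enables the `standard` tokenizer on a VARCHAR field. The endpoint, token, and field names are placeholders, not values from this reference:

```python
from pymilvus import MilvusClient, DataType

# Placeholder connection details for a Zilliz Cloud cluster
client = MilvusClient(uri="YOUR_CLUSTER_ENDPOINT", token="YOUR_API_KEY")

schema = client.create_schema()
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)

# enable_analyzer activates text analysis on the field;
# analyzer_params selects the standard tokenizer
schema.add_field(
    field_name="text",
    datatype=DataType.VARCHAR,
    max_length=1000,
    enable_analyzer=True,
    analyzer_params={"tokenizer": "standard"},
)
```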
Whitespace Tokenizer
The `whitespace` tokenizer divides text into terms wherever whitespace appears, leaving punctuation attached to the adjacent words.
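A short sketch of how the whitespace split behaves. It assumes a recent pymilvus whose `MilvusClient` exposes `run_analyzer`; older client versions may not have this method:

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="YOUR_CLUSTER_ENDPOINT", token="YOUR_API_KEY")

# Tokenize a sample string with the whitespace tokenizer;
# punctuation stays attached to the adjacent word
result = client.run_analyzer(
    "The quick brown fox, jumped!",
    analyzer_params={"tokenizer": "whitespace"},
)
print(result)  # expected roughly: ["The", "quick", "brown", "fox,", "jumped!"]
```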
Jieba Tokenizer
The `jieba` tokenizer processes Chinese text by breaking it down into its component words.
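For Chinese text, the same `analyzer_params` pattern applies. A hedged sketch, again assuming `run_analyzer` is available; the sample sentence and its expected segmentation are illustrative only:

```python
from pymilvus import MilvusClient

client = MilvusClient(uri="YOUR_CLUSTER_ENDPOINT", token="YOUR_API_KEY")

# Segment a Chinese sentence into its component words with jieba
result = client.run_analyzer(
    "机器学习是人工智能的分支",
    analyzer_params={"tokenizer": "jieba"},
)
print(result)  # expected roughly: ["机器学习", "是", "人工智能", "的", "分支"]
```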