Version: User Guides (BYOC)

[Note] This page is a machine-translated Japanese version. If you find any errors in the content, we would appreciate it if you could report them.

JSON Indexing
Contact Sales to Enable BYOC

JSON fields provide a flexible way to store structured metadata in Zilliz Cloud. Without indexing, queries on JSON fields require full collection scans, which become slow as your dataset grows. JSON indexing enables fast lookups by creating indexes on specific keys within your JSON data.

JSON indexing is ideal for:

Structured schemas with consistent, known keys
Equality and range queries on specific JSON paths
Scenarios where you need precise control over which keys are indexed
Storage-efficient acceleration of targeted queries

📘Notes

For complex JSON documents with diverse query patterns, consider JSON Shredding as an alternative.

JSON indexing syntax

When you create a JSON index, you specify:

JSON path: The exact location of the data you want to index
Data cast type: How to interpret and store the indexed values
Optional type conversion: Transform data during indexing if needed

Here's the syntax to index a JSON field:

# Prepare index params
index_params = MilvusClient.prepare_index_params()

index_params.add_index(
    field_name="<json_field_name>",  # Name of the JSON field
    index_type="AUTOINDEX",  # Must be AUTOINDEX
    index_name="<unique_index_name>",  # Index name
    params={
        "json_path": "<path_to_json_key>",  # Specific key to be indexed within JSON data
        "json_cast_type": "<data_type>",  # Data type to use when interpreting and indexing the value
        # "json_cast_function": "<cast_function>"  # Optional: convert key values into a target type at index time
    }
)

Parameter	Description	Value / Example
`field_name`	The name of your JSON field in the collection schema.	`"metadata"`
`index_type`	Must be `"AUTOINDEX"` for JSON indexing.	`"AUTOINDEX"`
`index_name`	Unique identifier for this index.	`"category_index"`
`json_path`	The path to the key you want to index within your JSON object.	Top-level key: `'metadata["category"]'`; nested key: `'metadata["supplier"]["contact"]["email"]'`; entire JSON object: `"metadata"`; sub-object: `'metadata["supplier"]'`
`json_cast_type`	The data type to use when interpreting and indexing the value. Must match the actual data type of the key. For a list of available cast types, see Supported cast types below.	`"VARCHAR"`
`json_cast_function`	(Optional) Converts original key values to a target type at index time. This parameter is required only when key values are stored in an incorrect format and you want to convert their data type during indexing. For a list of available cast functions, see Supported cast functions below.	`"STRING_TO_DOUBLE"`
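The json_path values in the table follow a `field["key"]["subkey"]` pattern. As a convenience, a small helper like the hypothetical `build_json_path` below can assemble these strings from a field name and a sequence of keys. It is not part of pymilvus, and it assumes every key is a plain string without double quotes or backslashes.

```python
def build_json_path(field_name, *keys):
    """Build a json_path string such as 'metadata["supplier"]["contact"]'.

    Hypothetical helper, not part of pymilvus. Assumes each key is a plain
    string that contains no double quotes or backslashes.
    """
    for key in keys:
        if '"' in key or "\\" in key:
            raise ValueError(f"unsupported character in key: {key!r}")
    # With no keys, the path is the field itself (index the entire object).
    return field_name + "".join(f'["{key}"]' for key in keys)


print(build_json_path("metadata", "category"))                     # metadata["category"]
print(build_json_path("metadata", "supplier", "contact", "email"))  # metadata["supplier"]["contact"]["email"]
print(build_json_path("metadata"))                                  # metadata
```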

Supported cast types

Zilliz Cloud supports the following data types for casting at index time. These types ensure that your data is interpreted correctly for efficient filtering.

Cast Type	Description	Example JSON Value
`BOOL` / `bool`	Used to index boolean values, enabling queries that filter on true/false conditions.	`true`, `false`
`DOUBLE` / `double`	Used for numeric values, including both integers and floating-point numbers. It enables filtering based on ranges or equality (e.g., `>`, `<`, `==`).	`42`, `99.99`
`VARCHAR` / `varchar`	Used to index string values, which is common for text-based data like names, categories, or IDs.	`"electronics"`, `"BrandA"`
`ARRAY_BOOL` / `array_bool`	Used to index an array of boolean values.	`[true, false, true]`
`ARRAY_DOUBLE` / `array_double`	Used to index an array of numeric values.	`[1.2, 3.14, 42]`
`ARRAY_VARCHAR` / `array_varchar`	Used to index an array of strings, which is ideal for a list of tags or keywords.	`["tag1", "tag2", "tag3"]`
`JSON` / `json`	Entire JSON objects or sub-objects with automatic type inference and flattening. Indexing entire JSON objects increases index size. For many-key scenarios, consider JSON Shredding.	Any JSON object

📘Notes

Arrays should contain elements of the same type for optimal indexing. For more information, refer to Array Field.
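To choose a json_cast_type, it can help to inspect a representative value in Python first. The hypothetical `suggest_cast_type` helper below maps a decoded JSON value to one of the cast types in the table above. It is a client-side sketch, not a Zilliz Cloud API, and it returns `None` for mixed-type arrays, which the note above advises against. The `mixed` key in the usage example is made up for illustration.

```python
import json


def suggest_cast_type(value):
    """Suggest a json_cast_type for a decoded JSON value (hypothetical helper)."""
    # Check bool before numbers: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "BOOL"
    if isinstance(value, (int, float)):
        return "DOUBLE"
    if isinstance(value, str):
        return "VARCHAR"
    if isinstance(value, dict):
        return "JSON"
    if isinstance(value, list):
        element_types = {suggest_cast_type(v) for v in value}
        # Only non-empty, homogeneous arrays of scalars map to an ARRAY_* type.
        if len(element_types) == 1:
            (element_type,) = element_types
            if element_type in ("BOOL", "DOUBLE", "VARCHAR"):
                return "ARRAY_" + element_type
        return None
    return None  # e.g. JSON null


metadata = json.loads('{"in_stock": true, "price": 99.99, "brand": "BrandA", '
                      '"tags": ["clearance", "summer_sale"], "mixed": [1, "a"]}')
for key, value in metadata.items():
    print(key, suggest_cast_type(value))
```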

Supported cast functions

If your JSON field key contains values in an incorrect format (e.g., numbers stored as strings), you can pass a cast function to the json_cast_function argument to convert these values at index time.

Cast functions are case-insensitive. The following functions are supported:

Cast Function	Converts From → To	Use Case
`STRING_TO_DOUBLE` / `string_to_double`	String → Numeric (double)	Convert `"99.99"` to `99.99`

📘Notes

If conversion fails (e.g., non-numeric string), the value is skipped and not indexed.
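The convert-or-skip behavior described above can be approximated locally, for example to check in advance which values would end up in the index. The `string_to_double` function below is a hypothetical sketch, not the server's implementation. Two choices in it are assumptions rather than documented behavior: non-string inputs are treated as skipped, and values that Python's `float()` accepts but that are not finite numbers (such as `"nan"` or `"inf"`) are rejected.

```python
import math


def string_to_double(value):
    """Approximate STRING_TO_DOUBLE: return a float, or None if the value is skipped.

    Local sketch only. Skipping non-strings and rejecting NaN/infinity are
    assumptions, not documented Zilliz Cloud behavior.
    """
    if not isinstance(value, str):
        return None
    try:
        number = float(value)
    except ValueError:
        return None  # e.g. "invalid": not indexed
    return number if math.isfinite(number) else None


for raw in ["99.99", "42", "invalid"]:
    print(repr(raw), "->", string_to_double(raw))
```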

Create JSON indexes

This section demonstrates how to create indexes on different types of JSON data using practical examples. All examples use the sample JSON structure shown below and assume that you have already created a MilvusClient instance connected to your cluster and a collection whose schema includes a JSON field named metadata.

Sample JSON structure

{
  "metadata": { 
    "category": "electronics",
    "brand": "BrandA",
    "in_stock": true,
    "price": 99.99,
    "string_price": "99.99",
    "tags": ["clearance", "summer_sale"],
    "supplier": {
      "name": "SupplierX",
      "country": "USA",
      "contact": {
        "email": "support@supplierx.com",
        "phone": "+1-800-555-0199"
      }
    }
  }
}

Basic setup

Before creating any JSON indexes, prepare your index parameters:

# Import the client class and prepare index params
from pymilvus import MilvusClient

index_params = MilvusClient.prepare_index_params()

Example 1: Index a simple JSON key

Create an index on the category field to enable fast filtering by product category:

index_params.add_index(
    field_name="metadata",
    index_type="AUTOINDEX", # Must be set to AUTOINDEX for JSON path indexing
    index_name="category_index",  # Unique index name
    params={
        "json_path": 'metadata["category"]', # Path to the JSON key
        "json_cast_type": "varchar" # Data cast type
    }
)

Example 2: Index a nested key

Create an index on the deeply nested email field for supplier contact searches:

# Index the nested key
index_params.add_index(
    field_name="metadata",
    index_type="AUTOINDEX", # Must be set to AUTOINDEX for JSON path indexing
    index_name="email_index", # Unique index name
    params={
        "json_path": 'metadata["supplier"]["contact"]["email"]', # Path to the nested JSON key
        "json_cast_type": "varchar" # Data cast type
    }
)
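To double-check which value a path such as `metadata["supplier"]["contact"]["email"]` selects, you can resolve it locally against a sample entity. The `resolve_json_path` function below is a hypothetical helper that only understands the `field["key"]...` form used in these examples (string keys, no array indexes). It is not how Zilliz Cloud parses paths.

```python
import re

# Leading field name, then zero or more ["key"] segments.
_FIELD = re.compile(r'^([A-Za-z_][A-Za-z0-9_]*)')
_SEGMENT = re.compile(r'\["([^"]*)"\]')


def resolve_json_path(entity, path):
    """Return the value that path selects in entity, or None if a key is missing."""
    match = _FIELD.match(path)
    if match is None:
        raise ValueError(f"invalid path: {path!r}")
    rest = path[match.end():]
    keys = _SEGMENT.findall(rest)
    # Reject anything that is not purely a sequence of ["key"] segments.
    if "".join(f'["{k}"]' for k in keys) != rest:
        raise ValueError(f"unsupported path syntax: {path!r}")
    value = entity.get(match.group(1))
    for key in keys:
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


entity = {"metadata": {"category": "electronics",
                       "supplier": {"contact": {"email": "support@supplierx.com"}}}}
print(resolve_json_path(entity, 'metadata["supplier"]["contact"]["email"]'))  # support@supplierx.com
print(resolve_json_path(entity, 'metadata["supplier"]["fax"]'))               # None
```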

Example 3: Convert data type at index time

Sometimes numeric data is mistakenly stored as strings. Use the STRING_TO_DOUBLE cast function to convert and index it properly:

# Convert string numbers to double for indexing
index_params.add_index(
    field_name="metadata",
    index_type="AUTOINDEX", # Must be set to AUTOINDEX for JSON path indexing
    index_name="string_to_double_index", # Unique index name
    params={
        "json_path": 'metadata["string_price"]', # Path to the JSON key to be indexed
        "json_cast_type": "double", # Data cast type
        "json_cast_function": "STRING_TO_DOUBLE" # Cast function; case insensitive
    }
)

Important: If conversion fails for any document (e.g., a non-numeric string like "invalid"), that document's value will be excluded from the index and won't appear in filtered results.

Example 4: Index entire objects

Index the complete JSON object to enable queries on any field within it. When you use json_cast_type="JSON", the system automatically:

Flattens the JSON structure: Nested objects are converted into flat paths for efficient indexing
Infers data types: Each value is automatically categorized as numeric, string, boolean, or date based on its content
Creates comprehensive coverage: All keys and nested paths within the object become searchable
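Conceptually, the flattening and type inference described above turn each leaf of the object into a path paired with an inferred type. The sketch below illustrates that idea on the supplier sub-object of the sample structure. The path format and type labels are illustrative assumptions, and the sketch does not reproduce Zilliz Cloud's internal index layout: it does not detect dates, and it treats arrays as leaves instead of expanding them.

```python
def flatten_json(obj, prefix):
    """Yield (path, inferred_type, value) for every leaf under obj.

    Illustrative sketch only: dates are not detected and arrays are
    treated as leaves rather than being expanded.
    """
    if isinstance(obj, dict):
        for key, child in obj.items():
            yield from flatten_json(child, f'{prefix}["{key}"]')
        return
    if isinstance(obj, bool):  # check bool before numbers
        kind = "boolean"
    elif isinstance(obj, (int, float)):
        kind = "numeric"
    elif isinstance(obj, str):
        kind = "string"
    else:
        kind = type(obj).__name__  # e.g. "list" or "NoneType"
    yield prefix, kind, obj


supplier = {"name": "SupplierX", "country": "USA",
            "contact": {"email": "support@supplierx.com", "phone": "+1-800-555-0199"}}
for path, kind, value in flatten_json(supplier, 'metadata["supplier"]'):
    print(path, kind, value)
```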

For the sample JSON structure above, index the entire metadata object:

# Index the entire JSON object
index_params.add_index(
    field_name="metadata",
    index_type="AUTOINDEX",
    index_name="metadata_full_index",
    params={
        "json_path": "metadata",
        "json_cast_type": "JSON"
    }
)

You can also index only a portion of the JSON structure, such as all supplier information:

# Index a sub-object
index_params.add_index(
    field_name="metadata",
    index_type="AUTOINDEX", 
    index_name="supplier_index",
    params={
        "json_path": 'metadata["supplier"]',
        "json_cast_type": "JSON"
    }
)

Apply index configuration

After defining all your index parameters, apply them to your collection:

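# Apply all index configurations to the collection.
# `client` is a MilvusClient instance connected to your cluster, e.g.
# client = MilvusClient(uri="YOUR_CLUSTER_ENDPOINT", token="YOUR_CLUSTER_TOKEN")
client.create_index(
    collection_name="your_collection_name",
    index_params=index_params
)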

Once indexing completes, queries that filter on the indexed JSON paths automatically use these indexes, as long as the value type in the filter matches the index's json_cast_type.

FAQ

What happens if a query's filter expression uses a different type than the indexed cast type?

If your filter expression uses a different type than the index's json_cast_type, Zilliz Cloud will not use the index and may fall back to a slower brute-force scan if the data allows. For best performance, always align your filter expression with the cast type of the index. For example, if a numeric index is created with json_cast_type="double", only numeric filter conditions will leverage the index.
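As a rough client-side check, you can compare the type of the literal in a simple comparison filter with the index's json_cast_type before sending a query. The `literal_matches_cast_type` helper below is hypothetical and covers only the scalar cast types. It does not parse filter expressions and cannot tell you what Zilliz Cloud's query planner will actually do.

```python
def literal_matches_cast_type(literal, cast_type):
    """Return True if a Python filter literal matches a scalar json_cast_type.

    Hypothetical pre-flight check; cast types are compared case-insensitively.
    """
    cast_type = cast_type.upper()
    if cast_type == "BOOL":
        return isinstance(literal, bool)
    if cast_type == "DOUBLE":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(literal, (int, float)) and not isinstance(literal, bool)
    if cast_type == "VARCHAR":
        return isinstance(literal, str)
    raise ValueError(f"unsupported cast type for this check: {cast_type}")


# A numeric literal (metadata["price"] > 50) matches a double index;
# a string literal (metadata["price"] > "50") does not.
print(literal_matches_cast_type(50, "double"))    # True
print(literal_matches_cast_type("50", "double"))  # False
```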

When creating a JSON index, what if a JSON key has inconsistent data types across different entities?

Inconsistent types can lead to partial indexing. For example, if a metadata["price"] field is stored as both a number (99.99) and a string ("99.99") and you create an index with json_cast_type="double", only the numeric values will be indexed. The string-form entries will be skipped and will not appear in filter results.
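The effect of inconsistent types can be simulated with a toy in-memory index. The sketch below builds a simple value-to-IDs mapping for metadata["price"] as if it were cast to double, and shows that the entity storing "99.99" as a string is left out of the filter results. The entities and the third price are made-up data, and the mapping is a conceptual model, not Zilliz Cloud's index structure.

```python
from collections import defaultdict


def build_double_index(entities, key):
    """Map numeric values of metadata[key] to entity IDs; skip non-numeric values."""
    index = defaultdict(set)
    for entity_id, metadata in entities.items():
        value = metadata.get(key)
        # Only real numbers are indexed; strings such as "99.99" are skipped.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            index[float(value)].add(entity_id)
    return index


entities = {
    1: {"price": 99.99},    # stored as a number
    2: {"price": "99.99"},  # stored as a string
    3: {"price": 149.5},
}
index = build_double_index(entities, "price")
# Simulate answering the filter metadata["price"] < 100 from the index.
matches = sorted(eid for value, ids in index.items() if value < 100 for eid in ids)
print(matches)  # [1]: entity 2 is missing from the results
```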

Can I create multiple indexes on the same JSON key?

No, each JSON key supports only one index. You must choose a single json_cast_type that matches your data. However, you can create an index on the entire JSON object and an index on a nested key within that object.
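Because each JSON key supports only one index, it can be useful to validate your index definitions before calling create_index. The hypothetical `find_duplicate_paths` helper below works on plain dictionaries that mirror the params passed to add_index. It is a local check only: it compares path strings literally, and it treats an entire-object index and a nested-key index as distinct paths, consistent with the answer above.

```python
from collections import Counter


def find_duplicate_paths(index_specs):
    """Return json_path values that appear in more than one index spec."""
    counts = Counter(spec["json_path"] for spec in index_specs)
    return sorted(path for path, count in counts.items() if count > 1)


specs = [
    {"json_path": "metadata", "json_cast_type": "JSON"},                # entire object
    {"json_path": 'metadata["price"]', "json_cast_type": "double"},
    {"json_path": 'metadata["price"]', "json_cast_type": "varchar"},   # second index on the same key
]
print(find_duplicate_paths(specs))  # ['metadata["price"]']
```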
