Version: User Guides (BYOC)

JSON Field

A JSON field is a scalar field that stores additional information along with vector embeddings, in key-value pairs. Here's an example of how data is stored in JSON format:

{
    "metadata": {
        "product_info": {
            "category": "electronics",
            "brand": "BrandA"
        },
        "price": 99.99,
        "in_stock": true,
        "tags": ["summer_sale", "clearance"]
    }
}

Limits

  • Field Size: JSON fields are limited to 65,536 bytes in size.

  • Nested Dictionaries: Any nested dictionaries within JSON field values are treated as plain strings for storage.

  • Default Values: JSON fields do not support default values. However, you can set the nullable attribute to True to allow null values. For details, refer to Nullable & Default.

  • Type Matching: If a JSON field’s key value is an integer or float, it can only be compared (via expression filters) with another numeric key of the same type.

  • Naming: When naming JSON keys, it is recommended to use only letters, numbers, and underscores. Using other characters may cause issues when filtering or searching.

  • String Handling: Milvus stores string values in JSON fields as entered, without semantic conversion. For example:

    • 'a"b', "a'b", 'a\'b', and "a\"b" are stored exactly as they are.

    • 'a'b' and "a"b" are considered invalid.

  • JSON Indexing: When indexing a JSON field, you can specify one or more paths in the JSON field to accelerate filtering. Each additional path increases indexing overhead, so plan your indexing strategy carefully. For more considerations on indexing a JSON field, refer to Considerations on JSON indexing.
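
The size and naming limits above can be checked on the client before inserting. The following is a minimal sketch, not part of the Milvus API: `check_json_value` and `iter_keys` are hypothetical helpers, and the sketch assumes the 65,536-byte limit applies to the UTF-8 encoded, serialized JSON, which may differ from how the server measures it.

```python
import json
import re

MAX_JSON_BYTES = 65536  # field size limit stated in the Limits list
KEY_PATTERN = re.compile(r"^[A-Za-z0-9_]+$")  # letters, numbers, underscores


def iter_keys(value):
    """Yield every key in a nested JSON-like structure."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield key
            yield from iter_keys(child)
    elif isinstance(value, list):
        for item in value:
            yield from iter_keys(item)


def check_json_value(value):
    """Return a list of problems found in a JSON field value (hypothetical helper)."""
    problems = []
    # Assumption: size is measured on the UTF-8 encoded, serialized JSON.
    size = len(json.dumps(value).encode("utf-8"))
    if size > MAX_JSON_BYTES:
        problems.append(f"value is {size} bytes, over the {MAX_JSON_BYTES}-byte limit")
    for key in iter_keys(value):
        if not KEY_PATTERN.match(key):
            problems.append(f"key {key!r} uses characters outside letters, numbers, and underscores")
    return problems


ok_value = {"product_info": {"category": "electronics"}, "price": 99.99}
bad_value = {"product-info": {"category": "x" * 70000}}

print(check_json_value(ok_value))   # no problems expected
print(check_json_value(bad_value))  # reports the oversized value and the key name
```

Running a check like this before `insert` surfaces oversized values and risky key names early, rather than at filter time.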

Add JSON field

To add a JSON field named metadata to your collection schema, set its data type to DataType.JSON. The example below defines a metadata field that allows null values:

# Import necessary libraries
from pymilvus import MilvusClient, DataType

# Define server address
SERVER_ADDR = "YOUR_CLUSTER_ENDPOINT"

# Create a MilvusClient instance
client = MilvusClient(uri=SERVER_ADDR)

# Define the collection schema
schema = client.create_schema(
    auto_id=False,
    enable_dynamic_field=True,
)

# Add a JSON field that supports null values
schema.add_field(field_name="metadata", datatype=DataType.JSON, nullable=True)
# Add the primary key and the vector field
schema.add_field(field_name="pk", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=3)

📘Notes
  • Set enable_dynamic_field=True if you need to insert additional, undefined fields in the future.

  • Use nullable=True to allow missing or null JSON objects.

Set index params

Indexing helps Milvus quickly filter or search across large volumes of data. In Milvus, indexing is mandatory for vector fields, so that similarity searches run efficiently.

Index a vector field

The following example creates an index on the vector field embedding, using the AUTOINDEX index type. With this type, Milvus automatically selects the most suitable index based on the data type.

# Set index params
index_params = client.prepare_index_params()

# Index `embedding` with AUTOINDEX and specify similarity metric type
index_params.add_index(
    field_name="embedding",
    index_name="vector_index",
    index_type="AUTOINDEX",  # Use automatic indexing to simplify complex index settings
    metric_type="COSINE"  # Specify similarity metric type, options include L2, COSINE, or IP
)

Create collection

Once the schema and index are defined, create a collection that includes the JSON field.

client.create_collection(
    collection_name="my_json_collection",
    schema=schema,
    index_params=index_params
)

Insert data

After creating the collection, insert entities that match the schema.

# Sample data
data = [
    {
        "metadata": {
            "product_info": {"category": "electronics", "brand": "BrandA"},
            "price": 99.99,
            "in_stock": True,
            "tags": ["summer_sale"]
        },
        "pk": 1,
        "embedding": [0.12, 0.34, 0.56]
    },
    {
        "metadata": None,  # Entire JSON object is null
        "pk": 2,
        "embedding": [0.56, 0.78, 0.90]
    },
    {
        # JSON field is completely missing
        "pk": 3,
        "embedding": [0.91, 0.18, 0.23]
    },
    {
        # Some sub-keys are null
        "metadata": {
            "product_info": {"category": None, "brand": "BrandB"},
            "price": 59.99,
            "in_stock": None
        },
        "pk": 4,
        "embedding": [0.56, 0.38, 0.21]
    }
]

client.insert(
    collection_name="my_json_collection",
    data=data
)

Query with filter expressions

After inserting entities, use the query method to retrieve entities that match the specified filter expressions.

📘Notes

For JSON fields that allow null values, the field will be treated as null if the entire JSON object is missing or set to None. For more information, refer to JSON Fields with Null Values.

To retrieve entities where metadata is not null:

# Query to filter out records with null metadata
filter = 'metadata is not null'

res = client.query(
    collection_name="my_json_collection",
    filter=filter,
    output_fields=["metadata", "pk"]
)

# Expected result:
# Rows with pk=1 and pk=4 have valid, non-null metadata.
# Rows with pk=2 (metadata=None) and pk=3 (no metadata key) are excluded.

print(res)

# Output:
# data: [
# "{'metadata': {'product_info': {'category': 'electronics', 'brand': 'BrandA'}, 'price': 99.99, 'in_stock': True, 'tags': ['summer_sale']}, 'pk': 1}",
# "{'metadata': {'product_info': {'category': None, 'brand': 'BrandB'}, 'price': 59.99, 'in_stock': None}, 'pk': 4}"
# ]
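
To make the null semantics concrete, the sketch below mimics the `metadata is not null` filter in plain Python on trimmed copies of the sample rows. It only illustrates the rule from the note (a nullable JSON field is null when it is missing or set to None) and is not how Milvus evaluates filters; `is_null_json` is a hypothetical helper.

```python
rows = [
    {"pk": 1, "metadata": {"price": 99.99, "in_stock": True}},
    {"pk": 2, "metadata": None},  # entire JSON object is null
    {"pk": 3},                    # JSON field is missing
    {"pk": 4, "metadata": {"price": 59.99, "in_stock": None}},  # only a sub-key is null
]


def is_null_json(row, field="metadata"):
    """A nullable JSON field counts as null when it is missing or set to None."""
    return row.get(field) is None


not_null_pks = [row["pk"] for row in rows if not is_null_json(row)]
print(not_null_pks)  # [1, 4]
```

Note that pk=4 is kept: a null sub-key does not make the whole JSON object null.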

To retrieve entities where metadata["product_info"]["category"] is "electronics":

filter = 'metadata["product_info"]["category"] == "electronics"'

res = client.query(
    collection_name="my_json_collection",
    filter=filter,
    output_fields=["metadata", "pk"]
)

# Expected result:
# - Only pk=1 has "category": "electronics".
# - pk=4 has "category": None, so it doesn't match.
# - pk=2 and pk=3 have no valid metadata.

print(res)

# Output:
# data: [
# "{'pk': 1, 'metadata': {'product_info': {'category': 'electronics', 'brand': 'BrandA'}, 'price': 99.99, 'in_stock': True, 'tags': ['summer_sale']}}"
# ]

Vector search with filter expressions

In addition to basic scalar field filtering, you can combine vector similarity searches with scalar field filters. For example, the following code shows how to add a scalar field filter to a vector search:

filter = 'metadata["product_info"]["brand"] == "BrandA"'

res = client.search(
    collection_name="my_json_collection",
    data=[[0.3, -0.6, 0.1]],
    limit=5,
    search_params={"params": {"nprobe": 10}},
    output_fields=["metadata"],
    filter=filter
)

# Expected result:
# - Only pk=1 has "brand": "BrandA" in metadata["product_info"].
# - pk=4 has "brand": "BrandB".
# - pk=2 and pk=3 have no valid metadata.
# Hence, only pk=1 matches the filter.

print(res)

# Output:
# data: [
# "[{'id': 1, 'distance': -0.2479381263256073, 'entity': {'metadata': {'product_info': {'category': 'electronics', 'brand': 'BrandA'}, 'price': 99.99, 'in_stock': True, 'tags': ['summer_sale']}}}]"
# ]
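
With the COSINE metric, the distance value in this output matches the cosine similarity between the query vector and the stored embedding of pk=1, so it can be reproduced by hand. The sketch below uses only the Python standard library and the vectors from this page; small differences in the last digits are expected because the server stores vectors as 32-bit floats.

```python
import math

query = [0.3, -0.6, 0.1]
stored = [0.12, 0.34, 0.56]  # embedding of the entity with pk=1

# Cosine similarity = dot product divided by the product of the vector norms
dot = sum(q * s for q, s in zip(query, stored))
norm_q = math.sqrt(sum(q * q for q in query))
norm_s = math.sqrt(sum(s * s for s in stored))
cosine = dot / (norm_q * norm_s)

print(round(cosine, 4))  # -0.2479
```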

Additionally, Zilliz Cloud supports advanced JSON filtering operators such as JSON_CONTAINS, JSON_CONTAINS_ALL, and JSON_CONTAINS_ANY, which can further enhance query capabilities. For more details, refer to JSON Operators.
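
As a brief illustration of these operators, the filter strings below target the tags array from the sample data. This is a sketch rather than a complete reference: the tag values are taken from the examples on this page, and each string is meant to be passed as the filter argument of client.query or client.search against a running cluster. Refer to JSON Operators for the authoritative syntax.

```python
# Match entities whose tags array contains "summer_sale"
contains = 'json_contains(metadata["tags"], "summer_sale")'

# Match entities whose tags array contains every listed value
contains_all = 'json_contains_all(metadata["tags"], ["summer_sale", "clearance"])'

# Match entities whose tags array contains at least one listed value
contains_any = 'json_contains_any(metadata["tags"], ["summer_sale", "clearance"])'

for expr in (contains, contains_all, contains_any):
    print(expr)
    # With a live connection, each expression could be used as:
    # client.query(collection_name="my_json_collection", filter=expr, output_fields=["pk"])
```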