
JSON Field

JSON (JavaScript Object Notation) is a lightweight data-interchange format that provides a flexible way to store and query complex data structures. In Zilliz Cloud clusters, you can use JSON fields to store additional structured information alongside vector data, enabling searches and queries that combine vector similarity with structured filtering.

JSON fields are ideal for applications that require metadata to optimize retrieval results. For example, in e-commerce, product vectors can be enhanced with attributes like category, price, and brand. In recommendation systems, user vectors can be combined with preferences and demographic information. Below is an example of a typical JSON field:

{
"category": "electronics",
"price": 99.99,
"brand": "BrandA"
}

Add JSON field

To use JSON fields in Zilliz Cloud clusters, add a field to the collection schema and set its datatype to DataType.JSON.

Here’s how to define a collection schema that includes a JSON field:

from pymilvus import MilvusClient, DataType

client = MilvusClient(uri="YOUR_CLUSTER_ENDPOINT")

schema = client.create_schema(
    auto_id=False,
    enable_dynamic_field=True,
)

schema.add_field(field_name="metadata", datatype=DataType.JSON)
schema.add_field(field_name="pk", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=3)

In this example, we add a JSON field called metadata to store additional metadata related to vector data, such as product category, price, and brand information.

📘Notes

The primary field and vector field are mandatory when you create a collection. The primary field uniquely identifies each entity, while the vector field is crucial for similarity search. For more details, refer to Primary Field & AutoId, Dense Vector, Binary Vector, or Sparse Vector.

Create collection

When creating a collection, you must create an index for the vector field to ensure retrieval performance. In this example, we use AUTOINDEX to simplify index setup. For more details, refer to AUTOINDEX Explained.


index_params = client.prepare_index_params()

index_params.add_index(
    field_name="embedding",
    index_type="AUTOINDEX",
    metric_type="COSINE"
)

Use the defined schema and index parameters to create a collection:

client.create_collection(
    collection_name="my_json_collection",
    schema=schema,
    index_params=index_params
)

Insert data

After creating the collection, you can insert data that includes JSON fields.

# Data to be inserted
data = [
    {
        "metadata": {"category": "electronics", "price": 99.99, "brand": "BrandA"},
        "pk": 1,
        "embedding": [0.12, 0.34, 0.56]
    },
    {
        "metadata": {"category": "home_appliances", "price": 249.99, "brand": "BrandB"},
        "pk": 2,
        "embedding": [0.56, 0.78, 0.90]
    },
    {
        "metadata": {"category": "furniture", "price": 399.99, "brand": "BrandC"},
        "pk": 3,
        "embedding": [0.91, 0.18, 0.23]
    }
]

# Insert data into the collection
client.insert(
    collection_name="my_json_collection",
    data=data
)

In this example:

  • Each data entry includes a primary field (pk) and a JSON field (metadata) that stores information such as product category, price, and brand.

  • embedding is a 3-dimensional vector field used for vector similarity search.
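
Before inserting, it can help to check on the client side that each row is well formed. The sketch below is not part of the Zilliz Cloud API: validate_row is a hypothetical helper, and it assumes the schema used in this guide (an integer pk, a JSON-serializable metadata dictionary, and a 3-dimensional embedding).

```python
import json

EXPECTED_DIM = 3  # assumption: matches dim=3 in the example schema


def validate_row(row: dict, dim: int = EXPECTED_DIM) -> list[str]:
    """Hypothetical helper: return the problems found in one row (empty if none)."""
    problems = []

    pk = row.get("pk")
    if not isinstance(pk, int) or isinstance(pk, bool):
        problems.append("pk must be an integer")

    embedding = row.get("embedding")
    if not isinstance(embedding, list) or len(embedding) != dim:
        problems.append(f"embedding must be a list of {dim} numbers")

    metadata = row.get("metadata")
    if not isinstance(metadata, dict):
        problems.append("metadata must be a dictionary")
    else:
        try:
            json.dumps(metadata)  # raises TypeError for values JSON cannot represent
        except (TypeError, ValueError) as exc:
            problems.append(f"metadata is not JSON-serializable: {exc}")

    return problems


good = {
    "metadata": {"category": "electronics", "price": 99.99, "brand": "BrandA"},
    "pk": 1,
    "embedding": [0.12, 0.34, 0.56],
}
# A set is not a JSON type, and the vector has only two components
bad = {"metadata": {"tags": {"a", "b"}}, "pk": 4, "embedding": [0.1, 0.2]}

print(validate_row(good))      # []
print(len(validate_row(bad)))  # 2
```

Rows that come back with an empty list can be passed to client.insert as shown above. The server still performs its own validation, so this check only catches obvious mistakes early.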

Search and query

JSON fields allow scalar filtering during searches, enhancing Zilliz Cloud's vector search capabilities. You can query based on JSON properties alongside vector similarity.

Filter queries

You can filter data based on JSON properties, such as matching specific values or checking if a number falls within a certain range.

filter = 'metadata["category"] == "electronics" and metadata["price"] < 150'

res = client.query(
    collection_name="my_json_collection",
    filter=filter,
    output_fields=["metadata"]
)

print(res)

# Output
# data: ["{'metadata': {'category': 'electronics', 'price': 99.99, 'brand': 'BrandA'}, 'pk': 1}"]

In the above query, Zilliz Cloud returns only the entities whose metadata field has a category of "electronics" and a price below 150.
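
When filter values come from application code rather than literals, the expression can be assembled programmatically. The following is a minimal sketch, not a Zilliz Cloud API: json_condition is a hypothetical helper that uses json.dumps to double-quote strings and leave numbers bare, producing the same expression as the one used above.

```python
import json

ALLOWED_OPS = {"==", "!=", "<", "<=", ">", ">="}


def json_condition(field: str, key: str, op: str, value) -> str:
    """Hypothetical helper: build one comparison on a JSON key."""
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported operator: {op}")
    # json.dumps wraps strings in double quotes and leaves numbers unquoted
    return f"{field}[{json.dumps(key)}] {op} {json.dumps(value)}"


filter_expr = " and ".join([
    json_condition("metadata", "category", "==", "electronics"),
    json_condition("metadata", "price", "<", 150),
])

print(filter_expr)
# metadata["category"] == "electronics" and metadata["price"] < 150
```

The resulting string can be passed as the filter argument of client.query in place of the hand-written expression.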

Vector search with JSON filtering

By combining vector similarity with JSON filtering, you can ensure that the retrieved data not only matches semantically but also meets specific business conditions, making the search results more precise and aligned with user needs.

filter = 'metadata["brand"] == "BrandA"'

res = client.search(
    collection_name="my_json_collection",
    data=[[0.3, -0.6, 0.1]],
    limit=5,
    search_params={"params": {"nprobe": 10}},
    output_fields=["metadata"],
    filter=filter
)

print(res)

# Output
# data: ["[{'id': 1, 'distance': -0.2479381263256073, 'entity': {'metadata': {'category': 'electronics', 'price': 99.99, 'brand': 'BrandA'}}}]"]

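In this example, Zilliz Cloud returns up to 5 entities that are most similar to the query vector, restricted to those whose metadata field has a brand of "BrandA". Only one of the three inserted entities matches, so the result contains a single hit.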

Additionally, Zilliz Cloud supports advanced JSON filtering operators such as JSON_CONTAINS, JSON_CONTAINS_ALL, and JSON_CONTAINS_ANY, which can further enhance query capabilities. For more details, refer to JSON Operators.
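
The entities inserted in this guide contain no array values, so the sketch below assumes a hypothetical tags array inside metadata (for example, {"category": "electronics", "tags": ["sale", "new"]}). The operator syntax shown is an illustration; check it against JSON Operators before relying on it.

```python
# Hypothetical: assumes each entity's metadata also holds an array under "tags".

# Entities whose tags array contains "sale"
contains = 'json_contains(metadata["tags"], "sale")'

# Entities whose tags array contains both "sale" and "new"
contains_all = 'json_contains_all(metadata["tags"], ["sale", "new"])'

# Entities whose tags array contains at least one of the listed values
contains_any = 'json_contains_any(metadata["tags"], ["sale", "clearance"])'

filters = {
    "contains": contains,
    "contains_all": contains_all,
    "contains_any": contains_any,
}

for name, expr in filters.items():
    print(f"{name}: {expr}")
```

Each string can be passed as the filter argument of client.query or client.search, in the same way as the expressions above.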

Limits

  • Indexing Limitations: Due to the complexity of data structures, indexing JSON fields is not supported.

  • Data Type Matching: If the value of a JSON key is an integer or a floating-point number, it can only be compared with another integer or float key, or with an INT32/64 or FLOAT32/64 field. If the value is a string (VARCHAR), it can only be compared with another string key.

  • Naming Restrictions: When naming JSON keys, it is recommended to use only letters, numeric characters, and underscores, as other characters may cause issues during filtering or searching.

  • Handling String Values: For string values (VARCHAR), Zilliz Cloud stores JSON field strings as-is without semantic conversion. For example: 'a"b', "a'b", 'a\'b', and "a\"b" are stored as entered; however, 'a'b' and "a"b" are considered invalid.

  • Handling Nested Dictionaries: Any nested dictionaries within JSON field values are treated as strings.
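
The naming recommendation above can be checked before data is inserted. This is a minimal sketch under stated assumptions: unsafe_keys is a hypothetical helper, it inspects top-level keys only, and it interprets "letters" as ASCII letters.

```python
import re

# Assumption: "letters" means ASCII letters; digits and underscores are also allowed
SAFE_KEY = re.compile(r"[A-Za-z0-9_]+")


def unsafe_keys(metadata: dict) -> list[str]:
    """Hypothetical helper: return top-level keys outside the recommended character set."""
    return [key for key in metadata if not SAFE_KEY.fullmatch(str(key))]


sample = {"category": "electronics", "unit-price": 99.99, "brand name": "BrandA"}

print(unsafe_keys(sample))
# ['unit-price', 'brand name']
```

Renaming the reported keys (for example, unit-price to unit_price) before insertion avoids the filtering issues described in the naming restriction.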