Version: User Guides (Cloud)

Use Dynamic Field

This page explains how to use the dynamic field in a collection for flexible data insertion and retrieval.

Overview

Schema design is crucial for Zilliz Cloud cluster data processing. Before inserting entities into a collection, clarify the schema design and ensure that all data entities inserted afterward match the schema. However, this requirement constrains collections, making them similar to tables in relational databases.

Dynamic schema enables users to insert entities with new fields into a collection without modifying the existing schema. This means that users can insert data without knowing the full schema of a collection and can include fields that are not yet defined.

Dynamic schema also provides flexibility in data processing, enabling users to store and retrieve complex data structures in their collections. This includes nested data, arrays, and other complex data types.
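Conceptually, keys that are not defined in the schema are carried in a reserved JSON field named $meta. The following stand-alone sketch (plain Python, no cluster connection needed; the row contents are made up for illustration) shows how a row splits into schema-defined and dynamic fields:

```python
# Fields defined in the collection schema (besides the auto-id primary key).
schema_fields = {"title", "title_vector"}

# A row to insert: "title" and "title_vector" match the schema;
# "claps" and "reading_time" do not.
row = {
    "title": "Example article",
    "title_vector": [0.1, 0.2, 0.3],
    "claps": 83,          # undefined in the schema -> dynamic field
    "reading_time": 8,    # undefined in the schema -> dynamic field
}

# Undefined keys are conceptually folded into the reserved $meta field.
meta = {k: v for k, v in row.items() if k not in schema_fields}
print(meta)
# {'claps': 83, 'reading_time': 8}
```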

Create collection with dynamic field

To create a collection with a dynamic schema, set enable_dynamic_field to True when defining the data model. Afterward, all undefined fields and their values in the inserted data entities will be treated as dynamic fields. We use the term "dynamic fields" to refer to these key-value pairs.

With these dynamic fields, you can ask Zilliz Cloud to output them in search and query results and use them in search and query filter expressions, just as if they were already defined in the collection schema.
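For instance, a filter such as claps > 30 and reading_time < 10 treats dynamic fields exactly like schema-defined ones. The following stand-alone sketch simulates that selection locally (purely illustrative, with made-up entities; in practice Zilliz Cloud evaluates the expression server-side):

```python
# Entities as returned rows: "claps" and "reading_time" are dynamic fields.
entities = [
    {"title": "A", "claps": 83, "reading_time": 8},
    {"title": "B", "claps": 10, "reading_time": 3},
    {"title": "C", "claps": 51, "reading_time": 12},
]

# Local equivalent of the filter expression 'claps > 30 and reading_time < 10'.
matches = [e for e in entities
           if e["claps"] > 30 and e["reading_time"] < 10]
print([e["title"] for e in matches])
# ['A']
```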

import json, os, time
from pymilvus import connections, FieldSchema, CollectionSchema, DataType, Collection, utility

CLUSTER_ENDPOINT="YOUR_CLUSTER_ENDPOINT" # Set your cluster endpoint
TOKEN="YOUR_CLUSTER_TOKEN" # Set your token
COLLECTION_NAME="medium_articles_2020" # Set your collection name
DATASET_PATH="{}/../medium_articles_2020_dpr.json".format(os.path.dirname(__file__)) # Set your dataset path

# 1. Connect to cluster
connections.connect(
    alias='default',
    # Public endpoint obtained from Zilliz Cloud
    uri=CLUSTER_ENDPOINT,
    # API key or a colon-separated cluster username and password
    token=TOKEN,
)

# 2. Define fields
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=512),
    FieldSchema(name="title_vector", dtype=DataType.FLOAT_VECTOR, dim=768)
]

# 3. Create schema with dynamic field enabled
schema = CollectionSchema(
    fields,
    "The schema for a medium news collection",
    enable_dynamic_field=True
)

# 4. Create collection
collection = Collection(COLLECTION_NAME, schema)

# 5. Index collection
index_params = {
    "index_type": "AUTOINDEX",
    "metric_type": "L2",
    "params": {}
}

collection.create_index(
    field_name="title_vector",
    index_params=index_params
)

collection.load()

# Get loading progress
progress = utility.loading_progress(COLLECTION_NAME)

print(progress)

# Output
#
# {
#     "loading_progress": "100%"
# }

Insert dynamic data

Once the collection is created, you can start inserting data, including dynamic fields, into the collection.

Prepare data

Next, prepare applicable data from the Example Dataset.

# 6. Prepare data
with open(DATASET_PATH) as f:
    data = json.load(f)

list_of_rows = data['rows']

data_rows = []
for row in list_of_rows:
    # Remove the id field because the primary key has auto_id enabled.
    del row['id']
    # Other keys except the title and title_vector fields in the row
    # will be treated as dynamic fields.
    data_rows.append(row)

Insert data

Then you can safely insert the data into the collection.

# 7. Insert data
result = collection.insert(data_rows)
collection.flush()

print(f"Data inserted successfully! Inserted counts: {result.insert_count}")

# Output
#
# Data inserted successfully! Inserted counts: 5979

Search with dynamic fields

If you have created the collection with the dynamic field enabled, inserted data containing dynamic fields into it, and indexed and loaded the collection, you can use dynamic fields in the filter expression of a search or a query as follows:

# 8. Search data
result = collection.search(
    data=[data_rows[0]['title_vector']],
    anns_field="title_vector",
    param={"metric_type": "L2", "params": {"nprobe": 10}},
    limit=3,
    # Access dynamic fields in the boolean expression
    expr='claps > 30 and reading_time < 10',
    # Include dynamic fields in the output to return
    output_fields=["title", "reading_time", "claps"],
)

result = [ list(map(lambda y: y.entity.to_dict(), x)) for x in result ]

print(result)

# Output
#
# [
#     [
#         {
#             "id": 443943328732915404,
#             "distance": 0.36103835701942444,
#             "entity": {
#                 "title": "The Hidden Side Effect of the Coronavirus",
#                 "reading_time": 8,
#                 "claps": 83
#             }
#         },
#         {
#             "id": 443943328732915438,
#             "distance": 0.37674015760421753,
#             "entity": {
#                 "title": "Why The Coronavirus Mortality Rate is Misleading",
#                 "reading_time": 9,
#                 "claps": 2900
#             }
#         },
#         {
#             "id": 443943328732913238,
#             "distance": 0.4162980318069458,
#             "entity": {
#                 "title": "Coronavirus shows what ethical Amazon could look like",
#                 "reading_time": 4,
#                 "claps": 51
#             }
#         }
#     ]
# ]

# get collection info
print("Entity counts: ", collection.num_entities)

# Output
#
# Entity counts: 5979

It is worth noting that claps and reading_time were not present when you defined the schema. Nevertheless, you can use them in the filter expression and include them in the output fields, as long as the inserted data entities contain these fields, just as you would with schema-defined fields.

If the key of a dynamic field contains characters other than digits, letters, and underscores (e.g. plus signs, asterisks, or dollar signs), you need to wrap the key in $meta[] as shown in the following code snippet when using it in a boolean expression or including it in the output fields.

... 
expr='$meta["#key"] in ["a", "b", "c"]',
output_fields=['$meta["#key"]']
...
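As a rule of thumb, a key can be referenced directly only if it consists solely of digits, letters, and underscores; anything else must be wrapped in $meta[]. A small illustrative helper (not part of pymilvus; shown only to make the rule concrete) that applies this:

```python
import re

def expr_key(key: str) -> str:
    """Return how a dynamic-field key can be referenced in a filter
    expression: directly if it contains only digits, letters, and
    underscores, otherwise wrapped in $meta[...]."""
    if re.fullmatch(r"[A-Za-z0-9_]+", key):
        return key
    return f'$meta["{key}"]'

print(expr_key("reading_time"))  # reading_time
print(expr_key("#key"))          # $meta["#key"]
```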