Version: User Guides (Cloud)

Primary Field & AutoID

Every collection in Zilliz Cloud must have a primary field to uniquely identify each entity. This field ensures that every entity can be inserted, updated, queried, or deleted without ambiguity.

Depending on your use case, you can either let Zilliz Cloud automatically generate IDs (AutoID) or assign your own IDs manually.

What is a primary field?

A primary field acts as the unique key for each entity in a collection, similar to a primary key in a traditional database. Zilliz Cloud uses the primary field to manage entities during insert, upsert, delete, and query operations.

Key requirements:

  • Each collection must have exactly one primary field.

  • Primary field values cannot be null.

  • The data type must be specified at creation and cannot be changed later.
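The first two requirements can be checked on the client side before any data reaches the cluster. The sketch below is illustrative only and is not part of pymilvus: plain dictionaries stand in for field definitions, and `find_primary_field` and `rows_with_null_key` are hypothetical helper names.

```python
# Hypothetical client-side checks; plain dicts stand in for real field schemas.

def find_primary_field(fields):
    """Return the name of the single primary field, or raise ValueError."""
    primary = [f["name"] for f in fields if f.get("is_primary")]
    if len(primary) != 1:
        raise ValueError(f"expected exactly one primary field, found {len(primary)}")
    return primary[0]


def rows_with_null_key(rows, key):
    """Return the indexes of rows whose primary field is missing or None."""
    return [i for i, row in enumerate(rows) if row.get(key) is None]


fields = [
    {"name": "id", "is_primary": True},
    {"name": "embedding"},
]
rows = [{"id": 1}, {"id": None}, {}]

key = find_primary_field(fields)
print(key)                            # id
print(rows_with_null_key(rows, key))  # [1, 2]
```

A check like this applies only when you supply primary field values yourself; it is a convenience for catching mistakes early, not a replacement for the validation the service performs.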

Supported data types

The primary field must use a supported scalar data type that can uniquely identify entities.

| Data Type | Description |
| --- | --- |
| INT64 | 64-bit integer type, commonly used with AutoID. This is the recommended option for most use cases. |
| VARCHAR | Variable-length string type. Use this when entity identifiers come from external systems (for example, product codes or user IDs). Requires the max_length property to define the maximum number of bytes allowed per value. |
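Because max_length counts bytes rather than characters, an ID containing non-ASCII characters can exceed the limit even when it looks short. The following is a minimal sketch of a pre-insert check. It assumes the byte count is taken over the UTF-8 encoding of the string, and `fits_max_length` is a hypothetical helper, not a pymilvus function.

```python
def fits_max_length(value: str, max_length: int) -> bool:
    """Return True if value occupies at most max_length bytes.

    Assumption: bytes are counted on the UTF-8 encoding of the string.
    """
    return len(value.encode("utf-8")) <= max_length


print(fits_max_length("PROD-001", 8))  # True: 8 ASCII characters are 8 bytes
print(fits_max_length("产品-001", 8))   # False: 6 characters, but 10 bytes in UTF-8
```

If your identifiers may contain multi-byte characters, it is safer to size max_length from the encoded length of the longest expected ID than from its character count.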

Choose between AutoID and Manual IDs

Zilliz Cloud supports two modes for assigning primary key values.

| Mode | Description | Recommended For |
| --- | --- | --- |
| AutoID | Zilliz Cloud automatically generates unique identifiers for inserted or imported entities. | Most scenarios where you don’t need to manage IDs manually. |
| Manual ID | You provide unique IDs yourself when inserting or importing data. | When IDs must align with external systems or pre-existing datasets. |

📘Notes

If you are unsure which mode to choose, start with AutoID for simpler ingestion and guaranteed uniqueness.

Quickstart: Use AutoID

You can let Zilliz Cloud handle ID generation automatically.

Step 1: Create a collection with AutoID

Set auto_id=True in the primary field definition so that Zilliz Cloud assigns an ID to each inserted entity.

from pymilvus import MilvusClient, DataType

client = MilvusClient(
    uri="YOUR_CLUSTER_ENDPOINT",
    token="YOUR_CLUSTER_TOKEN"  # Placeholder credential for your cluster
)

schema = client.create_schema()

# Define primary field with AutoID enabled
schema.add_field(
    field_name="id",  # Primary field name
    is_primary=True,
    auto_id=True,  # Zilliz Cloud generates IDs automatically; defaults to False
    datatype=DataType.INT64
)

# Define the other fields
schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=4) # Vector field
schema.add_field(field_name="category", datatype=DataType.VARCHAR, max_length=1000) # Scalar field of the VARCHAR type

# Create the collection
if client.has_collection("demo_autoid"):
    client.drop_collection("demo_autoid")
client.create_collection(collection_name="demo_autoid", schema=schema)

Step 2: Insert data

Important: Do not include the primary field column in your data. Zilliz Cloud generates IDs automatically.

data = [
    {"embedding": [0.1, 0.2, 0.3, 0.4], "category": "book"},
    {"embedding": [0.2, 0.3, 0.4, 0.5], "category": "toy"},
]

res = client.insert(collection_name="demo_autoid", data=data)
print("Generated IDs:", res.get("ids"))

# Output example:
# Generated IDs: [461526052788333649, 461526052788333650]

📘Notes

Use upsert() instead of insert() when working with existing entities to avoid duplicate ID errors.
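Source data often already carries an ID column, for example when it was exported from another system. Since rows for an AutoID collection must not include the primary field, one option is to strip that key before calling insert(). This is a minimal sketch in plain Python; `drop_primary_field` is a hypothetical helper, and the field name "id" matches the schema from Step 1.

```python
def drop_primary_field(rows, primary_field="id"):
    """Return copies of rows without the primary field; the input is not modified."""
    return [
        {key: value for key, value in row.items() if key != primary_field}
        for row in rows
    ]


rows = [
    {"id": 7, "embedding": [0.1, 0.2, 0.3, 0.4], "category": "book"},
    {"embedding": [0.2, 0.3, 0.4, 0.5], "category": "toy"},
]

cleaned = drop_primary_field(rows)
print(cleaned[0])  # {'embedding': [0.1, 0.2, 0.3, 0.4], 'category': 'book'}
```

The cleaned list can then be passed as the data argument of client.insert(). If the original IDs still matter, consider keeping them in a separate scalar field instead of discarding them.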

Use manual IDs

If you need to control IDs manually, disable AutoID and provide your own values.

Step 1: Create a collection without AutoID

from pymilvus import MilvusClient, DataType

client = MilvusClient(
    uri="YOUR_CLUSTER_ENDPOINT",
    token="YOUR_CLUSTER_TOKEN"  # Placeholder credential for your cluster
)

schema = client.create_schema()

# Define the primary field without AutoID
schema.add_field(
    field_name="product_id",
    is_primary=True,
    auto_id=False,  # You'll provide IDs manually at data ingestion
    datatype=DataType.VARCHAR,
    max_length=100  # Required when datatype is VARCHAR
)

# Define the other fields
schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=4) # Vector field
schema.add_field(field_name="category", datatype=DataType.VARCHAR, max_length=1000) # Scalar field of the VARCHAR type

# Create the collection
if client.has_collection("demo_manual_ids"):
    client.drop_collection("demo_manual_ids")
client.create_collection(collection_name="demo_manual_ids", schema=schema)

Step 2: Insert data with your IDs

You must include the primary field column in every insert operation.

# Each entity must contain the primary field `product_id`
data = [
    {"product_id": "PROD-001", "embedding": [0.1, 0.2, 0.3, 0.4], "category": "book"},
    {"product_id": "PROD-002", "embedding": [0.2, 0.3, 0.4, 0.5], "category": "toy"},
]

res = client.insert(collection_name="demo_manual_ids", data=data)
print("Inserted IDs:", res.get("ids"))

# Output example:
# Inserted IDs: ['PROD-001', 'PROD-002']

Your responsibilities:

  • Ensure all IDs are unique across all entities

  • Include the primary field in every insert/import operation

  • Handle ID conflicts and duplicate detection yourself
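As one way to approach the uniqueness responsibility, a batch can be scanned for repeated IDs before it is inserted. The sketch below is plain Python with a hypothetical helper name, `find_duplicate_ids`. It only detects duplicates within the batch itself; checking against entities already stored in the collection would need a separate query.

```python
from collections import Counter


def find_duplicate_ids(rows, primary_field="product_id"):
    """Return the sorted primary field values that occur more than once in rows."""
    counts = Counter(row[primary_field] for row in rows)
    return sorted(pk for pk, n in counts.items() if n > 1)


batch = [
    {"product_id": "PROD-001", "category": "book"},
    {"product_id": "PROD-002", "category": "toy"},
    {"product_id": "PROD-001", "category": "game"},
]

duplicates = find_duplicate_ids(batch)
print(duplicates)  # ['PROD-001']
```

A non-empty result is a signal to fix the batch before calling insert(), or to decide deliberately which of the conflicting rows should win.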