Version: User Guides (BYOC)

Upsert Entities

The upsert operation provides a convenient way to insert or update entities in a collection.

Overview

You can use upsert to either insert a new entity or update an existing one, depending on whether the primary key provided in the upsert request exists in the collection. If the primary key is not found, an insert operation occurs. Otherwise, an update operation will be performed.

An upsert request combines an insert and a delete. When an upsert request for an existing entity is received, Zilliz Cloud inserts the data carried in the request payload and deletes the existing entity with the original primary key specified in the data at the same time.
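
The insert-plus-delete behavior described above can be pictured with a small in-memory model. This is only a sketch: the `upsert` function and the dict-based `collection` are hypothetical stand-ins for illustration, not Zilliz Cloud APIs.

```python
# Hypothetical in-memory model of upsert semantics; not a Zilliz Cloud API.
def upsert(collection: dict, entity: dict, pk_field: str = "id") -> str:
    """Store the entity, replacing any existing entity with the same primary key."""
    pk = entity[pk_field]
    existed = pk in collection
    # The old entity (if any) is dropped and the payload is stored as a whole.
    collection[pk] = dict(entity)
    return "update" if existed else "insert"


collection = {1: {"id": 1, "title": "Hollow Man", "issue": "vol.19"}}

first = upsert(collection, {"id": 1, "title": "Hollow Man", "issue": "vol.20"})
second = upsert(collection, {"id": 5, "title": "New Entry", "issue": "vol.1"})
print(first, second)
```

In this model the first call replaces the entity with primary key 1, and the second call adds a new entity because primary key 5 was not present.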


If the target collection has autoid enabled on its primary field, Zilliz Cloud will generate a new primary key for the data carried in the request payload before inserting it.

For fields with nullable enabled, you can omit them in the upsert request if they do not require any updates.

Upsert in merge mode

You can also use the partial_update flag to make an upsert request work in merge mode. This allows you to include only the fields that need updating in the request payload.


To perform a merge, set partial_update to True in the upsert request along with the primary key and the fields to update with their new values.

Upon receiving such a request, Zilliz Cloud performs a query with strong consistency to retrieve the entity, updates the field values based on the data in the request, inserts the modified data, and then deletes the existing entity with the original primary key carried in the request.
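
The read, merge, insert, and delete steps above can be sketched with plain dictionaries. The `upsert_merge` function is a hypothetical model, not a Zilliz Cloud API. What happens when the primary key does not yet exist is not covered by the description above, so the model simply stores the payload in that case, which is an assumption.

```python
# Hypothetical in-memory model of merge-mode upsert; not a Zilliz Cloud API.
def upsert_merge(collection: dict, payload: dict, pk_field: str = "id") -> dict:
    pk = payload[pk_field]
    existing = collection.get(pk, {})   # read the current entity (assumed empty if absent)
    merged = {**existing, **payload}    # overlay the fields carried in the request
    collection[pk] = merged             # store the merged entity in place of the old one
    return merged


collection = {1: {"id": 1, "title": "Hollow Man", "issue": "vol.19"}}
merged = upsert_merge(collection, {"id": 1, "issue": "vol.14"})
print(merged)
```

Only `issue` changes here; `title` is carried over from the stored entity because the request did not include it.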

For ARRAY fields, merge mode supports two operators: ARRAY_APPEND and ARRAY_REMOVE. These operators let you append elements to or remove matching elements from an existing ARRAY field, without first querying the entity to retrieve its current value. For details, see Upsert ARRAY fields with partial-update operators.

Update field values

To update the field values of an existing entity, use upsert in merge mode. In this mode, only the fields included in the request are updated — all other fields retain their existing values.

Upsert behaviors: special notes

Keep the following behaviors in mind before using merge mode. The cases below assume a collection with a primary key named id, a vector field named vector, and two scalar fields named title and issue.

  • Upsert fields with nullable enabled.

    Suppose that the issue field can be null. When you upsert these fields, note that:

    • If you omit the issue field in the upsert request and disable partial_update, the issue field will be updated to null instead of retaining its original value.

    • To preserve the original value of the issue field, you need either to enable partial_update and omit the issue field or include the issue field with its original value in the upsert request.

  • Upsert keys in the dynamic field.

    Suppose that you have enabled the dynamic field in the example collection, and the key-value pairs in the dynamic field of an entity are similar to {"author": "John", "year": 2020, "tags": ["fiction"]}.

    When you upsert the entity with keys, such as author, year, or tags, or add other keys, note that:

    • If you upsert with partial_update disabled, the default behavior is to override. It means that the value of the dynamic field will be overridden by all non-schema-defined fields included in the request and their values.

      For example, if the data included in the request is {"author": "Jane", "genre": "fantasy"}, the key-value pairs in the dynamic field of the target entity will be updated to that.

    • If you upsert with partial_update enabled, the default behavior is to merge. It means that the value of the dynamic field will merge with all non-schema-defined fields included in the request and their values.

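      For example, if the dynamic field of the target entity currently holds {"author": "Jane", "genre": "fantasy"} (the result of the override example above) and the data included in the request is {"author": "John", "year": 2020, "tags": ["fiction"]}, the key-value pairs in the dynamic field of the target entity will become {"author": "John", "year": 2020, "tags": ["fiction"], "genre": "fantasy"} after the upsert.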

  • Upsert a JSON field.

    Suppose that the example collection has a schema-defined JSON field named extras, and the key-value pairs in this JSON field of an entity are similar to {"author": "John", "year": 2020, "tags": ["fiction"]}.

    When you upsert the extras field of an entity with modified JSON data, note that the JSON field is treated as a whole, and you cannot update individual keys selectively. In other words, the JSON field DOES NOT support upsert in merge mode.

  • Upsert an ARRAY field.

    By default, an ARRAY field in merge mode follows REPLACE semantics: the value carried in the request overwrites the existing array. For finer-grained updates, Zilliz Cloud also supports two operators:

    • ARRAY_APPEND appends the elements in the request payload to the existing array.

    • ARRAY_REMOVE removes every element from the existing array that matches a value in the request payload.

    For operator syntax, supported element types, and other constraints, see Upsert ARRAY fields with partial-update operators.
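
The nullable-field and dynamic-field notes above can be modeled in a few lines. This is a sketch under stated assumptions: `apply_upsert`, `SCHEMA_FIELDS`, and `NULLABLE_FIELDS` are hypothetical names, and the logic only mirrors the behaviors described in this section rather than the actual server implementation.

```python
# Hypothetical model of the notes above; field names follow the example collection.
SCHEMA_FIELDS = {"id", "vector", "title", "issue"}
NULLABLE_FIELDS = {"issue"}


def apply_upsert(existing: dict, payload: dict, partial_update: bool) -> dict:
    # Split both sides into schema-defined fields and dynamic (non-schema) keys.
    old_schema = {k: v for k, v in existing.items() if k in SCHEMA_FIELDS}
    old_dynamic = {k: v for k, v in existing.items() if k not in SCHEMA_FIELDS}
    new_schema = {k: v for k, v in payload.items() if k in SCHEMA_FIELDS}
    new_dynamic = {k: v for k, v in payload.items() if k not in SCHEMA_FIELDS}

    if partial_update:
        # Merge mode: omitted fields and dynamic keys keep their old values.
        return {**old_schema, **new_schema, **old_dynamic, **new_dynamic}

    # Override mode: omitted nullable fields become null, dynamic keys are replaced.
    result = {name: None for name in NULLABLE_FIELDS}
    result.update(new_schema)
    result.update(new_dynamic)
    return result


existing = {
    "id": 1, "vector": [0.1, 0.2], "title": "Hollow Man", "issue": "vol.19",
    "author": "John", "year": 2020, "tags": ["fiction"],
}
override = apply_upsert(
    existing,
    {"id": 1, "vector": [0.1, 0.2], "title": "Hollow Man", "author": "Jane", "genre": "fantasy"},
    partial_update=False,
)
merged = apply_upsert(existing, {"id": 1, "author": "Jane", "genre": "fantasy"}, partial_update=True)
print(override)
print(merged)
```

In the override result, `issue` is null and the `year` and `tags` keys are gone. In the merged result, `issue`, `year`, and `tags` keep their stored values while `author` and `genre` come from the request.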

Limits & Restrictions

The following limits and restrictions apply to upsert requests:

  • The upsert request must always include the primary keys of the target entities.

  • The target collection must be loaded and available for queries.

  • All fields specified in the request must exist in the schema of the target collection.

  • The values of all fields specified in the request must match the data types defined in the schema.

  • For any field derived from another using functions, Zilliz Cloud will remove the derived field during the upsert to allow recalculation.
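
Some of these restrictions can be checked on the client before a request is sent. The sketch below is one possible pre-check, not part of any SDK: `validate_rows` and the `SCHEMA` mapping are hypothetical, and the check assumes the dynamic field is disabled and that no null values are sent.

```python
# Hypothetical client-side pre-check; the schema mapping is written by hand
# for illustration and is not read from Zilliz Cloud.
SCHEMA = {"id": int, "vector": list, "title": str, "issue": str}
PRIMARY_KEY = "id"


def validate_rows(rows: list) -> list:
    """Return a list of human-readable problems; an empty list means the rows pass."""
    problems = []
    for i, row in enumerate(rows):
        if PRIMARY_KEY not in row:
            problems.append(f"row {i}: missing primary key '{PRIMARY_KEY}'")
        for name, value in row.items():
            if name not in SCHEMA:
                problems.append(f"row {i}: unknown field '{name}'")
            elif not isinstance(value, SCHEMA[name]):
                problems.append(f"row {i}: field '{name}' has the wrong type")
    return problems


good = validate_rows([{"id": 1, "issue": "vol.1"}])
bad = validate_rows([{"title": 3, "genre": "fantasy"}])
print(good)
print(bad)
```

The second call reports three problems: the missing primary key, the non-string `title`, and the `genre` field that the assumed schema does not define.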

Upsert entities in a collection

In this section, we will upsert entities into a collection named my_collection. This collection has four fields, named id, vector, title, and issue. The id field is the primary field, the vector field holds the vector embeddings, and the title and issue fields are scalar fields.

The three entities in the request will override any existing entities in the collection that have the same primary keys.

from pymilvus import MilvusClient

client = MilvusClient(
    uri="YOUR_CLUSTER_ENDPOINT",
    token="YOUR_CLUSTER_TOKEN"
)

data = [
    {
        "id": 0,
        "vector": [-0.619954382375778, 0.4479436794798608, -0.17493894838751745, -0.4248030059917294, -0.8648452746018911],
        "title": "Artificial Intelligence in Real Life",
        "issue": "vol.12"
    },
    {
        "id": 1,
        "vector": [0.4762662251462588, -0.6942502138717026, -0.4490002642657902, -0.628696575798281, 0.9660395877041965],
        "title": "Hollow Man",
        "issue": "vol.19"
    },
    {
        "id": 2,
        "vector": [-0.8864122635045097, 0.9260170474445351, 0.801326976181461, 0.6383943392381306, 0.7563037341572827],
        "title": "Treasure Hunt in Missouri",
        "issue": "vol.12"
    }
]

res = client.upsert(
    collection_name='my_collection',
    data=data
)

print(res)

# Output
# {'upsert_count': 3}
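
If you want to confirm that every entity in a request was accepted, you can compare the reported count against the number of rows sent. The helper below is a minimal sketch: `upsert_and_check` and `_StubClient` are hypothetical names, the stub exists only so the snippet runs without a cluster, and the code assumes the result can be indexed with 'upsert_count' as the printed output above suggests.

```python
# Sketch only: `upsert_and_check` and `_StubClient` are hypothetical names.
def upsert_and_check(client, collection_name: str, rows: list) -> int:
    """Upsert rows and confirm the reported count matches the number sent."""
    res = client.upsert(collection_name=collection_name, data=rows)
    count = res["upsert_count"]
    if count != len(rows):
        raise RuntimeError(f"expected {len(rows)} upserts, server reported {count}")
    return count


class _StubClient:
    """Stand-in for MilvusClient so the sketch runs without a cluster."""

    def upsert(self, collection_name, data):
        return {"upsert_count": len(data)}


rows = [{"id": 0, "title": "A"}, {"id": 1, "title": "B"}, {"id": 2, "title": "C"}]
checked = upsert_and_check(_StubClient(), "my_collection", rows)
print(checked)
```

With a real cluster, you would pass the MilvusClient instance created earlier instead of the stub.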

Upsert entities in a partition

You can also upsert entities into a specified partition. The following code snippets assume that you have a partition named partitionA in your collection.

The three entities in the request will override any existing entities in the partition that have the same primary keys.

data = [
    {
        "id": 10,
        "vector": [0.06998888224297328, 0.8582816610326578, -0.9657938677934292, 0.6527905683627726, -0.8668460657158576],
        "title": "Layout Design Reference",
        "issue": "vol.34"
    },
    {
        "id": 11,
        "vector": [0.6060703043917468, -0.3765080534566074, -0.7710758854987239, 0.36993888322346136, 0.5507513364206531],
        "title": "Doraemon and His Friends",
        "issue": "vol.2"
    },
    {
        "id": 12,
        "vector": [-0.9041813104515337, -0.9610546012461163, 0.20033003106083358, 0.11842506351635174, 0.8327356724591011],
        "title": "Pikachu and Pokemon",
        "issue": "vol.12"
    },
]

res = client.upsert(
    collection_name="my_collection",
    data=data,
    partition_name="partitionA"
)

print(res)

# Output
# {'upsert_count': 3}

Upsert entities in merge mode

The following code example demonstrates how to upsert entities with partial updates. Provide only the fields needing updates and their new values, along with the explicit partial update flag.

In the following example, the issue field of the entities specified in the upsert request will be updated to the values included in the request.

📘Notes

When performing an upsert in merge mode, ensure that all entities in the request have the same set of fields. If the request carries two or more entities, as in the following code snippet, they must include identical fields to prevent errors and maintain data integrity.

data = [
    {
        "id": 1,
        "issue": "vol.14"
    },
    {
        "id": 2,
        "issue": "vol.7"
    }
]

res = client.upsert(
    collection_name="my_collection",
    data=data,
    partial_update=True
)

print(res)

# Output
# {'upsert_count': 2}
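
Because every entity in a merge-mode request must carry the same set of fields, it can help to check the payload before sending it. The following is a minimal sketch of such a check; `check_same_fields` is a hypothetical helper, not part of pymilvus.

```python
# Hypothetical pre-check for merge-mode payloads; not part of pymilvus.
def check_same_fields(rows: list) -> None:
    """Raise ValueError if the rows do not all carry the same set of field names."""
    if not rows:
        return
    expected = set(rows[0])
    for i, row in enumerate(rows[1:], start=1):
        if set(row) != expected:
            diff = sorted(set(row) ^ expected)
            raise ValueError(f"row {i} differs from row 0 in fields: {diff}")


consistent = [{"id": 1, "issue": "vol.14"}, {"id": 2, "issue": "vol.7"}]
mixed = [{"id": 1, "issue": "vol.14"}, {"id": 2, "title": "Hollow Man"}]

check_same_fields(consistent)  # passes silently

try:
    check_same_fields(mixed)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Calling the helper right before client.upsert(..., partial_update=True) surfaces a mismatched payload on the client rather than in the request.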

Upsert ARRAY fields with partial-update operators

Before partial-update operators (ARRAY_APPEND and ARRAY_REMOVE) were available, updating part of an ARRAY field required a client-side read-modify-write flow: query the existing array, change it in application code, and upsert the full replacement value. Partial-update operators let you send only the elements to append or remove, which reduces client-side logic and avoids the extra read before the upsert.

Suppose the entity with primary key 1 already has tags = ["new", "trial"]. Without partial-update operators, adding the element "premium" to the array requires upserting the full replacement array:

client.upsert(
    collection_name="users",
    data=[{"pk": 1, "tags": ["new", "trial", "premium"]}],
    partial_update=True,
)

With ARRAY_APPEND, send only the element to add:

from pymilvus import FieldOp

client.upsert(
    collection_name="users",
    data=[{"pk": 1, "tags": ["premium"]}],
    field_ops={"tags": FieldOp.array_append()},
)

📘Notes

Attaching either operator to a field via field_ops implicitly enables partial-update semantics. Therefore, you do not need to pass partial_update=True alongside field_ops.

Limits

  • The payload values must match the element_type of the target ARRAY field. For example, if the target field is ARRAY<VARCHAR>, the payload must contain string values.

  • For this release, ARRAY_APPEND and ARRAY_REMOVE support ARRAY fields whose element_type is BOOL, INT8, INT16, INT32, INT64, FLOAT, DOUBLE, or VARCHAR.

  • After an ARRAY_APPEND operation, the resulting array length must not exceed the field's max_capacity.

  • Concurrent upserts to the same entity are not atomic across requests. If two requests update the same ARRAY field at the same time, the later write can overwrite the earlier one. Use application-level coordination if you need to preserve all concurrent changes.
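
One way to coordinate writes within a single process is to serialize them per primary key. The sketch below uses only the Python standard library. `with_pk_lock` and the in-memory `store` are hypothetical, and `action` stands in for whatever upsert call you would make. It assumes each call completes before the lock is released, and it does not help across processes or machines, where an external lock or queue would be needed.

```python
import threading
from collections import defaultdict

# Hypothetical per-key locking helper; not part of pymilvus.
_locks_guard = threading.Lock()
_pk_locks = defaultdict(threading.Lock)


def with_pk_lock(pk, action):
    """Run action() while holding a lock dedicated to this primary key."""
    with _locks_guard:
        lock = _pk_locks[pk]  # create or fetch the lock for this key
    with lock:
        return action()


# In-memory stand-in for the stored ARRAY field of the entity with pk 1.
store = {1: ["new"]}


def append_tag(tag):
    def action():
        current = list(store[1])  # read
        current.append(tag)       # modify
        store[1] = current        # write
    with_pk_lock(1, action)


threads = [threading.Thread(target=append_tag, args=(f"t{i}",)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(store[1]))
```

Because all writers for primary key 1 take the same lock, no update is lost in this model, while writers for other keys would not block each other.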

Example

The following example uses a small users collection with a primary key pk, a tags field of type ARRAY<VARCHAR>, and an embedding vector field. It first inserts two entities with initial tags values, then uses ARRAY_APPEND and ARRAY_REMOVE to show how each operator changes the stored array.

from pymilvus import DataType, FieldOp, MilvusClient

client = MilvusClient(
    uri="YOUR_CLUSTER_ENDPOINT",
    token="YOUR_CLUSTER_TOKEN"
)

# 1. Create a collection with an ARRAY<VARCHAR> field
schema = client.create_schema(enable_dynamic_field=False)
schema.add_field("pk", DataType.INT64, is_primary=True)
schema.add_field("embedding", DataType.FLOAT_VECTOR, dim=5)
schema.add_field(
    "tags",
    DataType.ARRAY,
    element_type=DataType.VARCHAR,
    max_capacity=8,
    max_length=32,
)

index_params = client.prepare_index_params()
index_params.add_index(
    field_name="embedding",
    index_type="AUTOINDEX",
    metric_type="L2",
)

client.create_collection(
    collection_name="users",
    schema=schema,
    index_params=index_params
)

# 2. Seed two entities
client.insert(
    collection_name="users",
    data=[
        {"pk": 1, "embedding": [0.1, 0.2, 0.3, 0.4, 0.5], "tags": ["new"]},
        {"pk": 2, "embedding": [0.6, 0.7, 0.8, 0.9, 1.0], "tags": ["new", "trial"]},
    ],
)

# 3. Append tags without reading the existing ARRAY values
client.upsert(
    collection_name="users",
    data=[
        {"pk": 1, "tags": ["premium", "vip"]},
        {"pk": 2, "tags": ["premium"]},
    ],
    field_ops={"tags": FieldOp.array_append()},
)

res = client.query(
    collection_name="users",
    filter="pk in [1, 2]",
    output_fields=["pk", "tags"],
)
print(res)

# Example output:
# data: [
# "{'pk': 1, 'tags': ['new', 'premium', 'vip']}",
# "{'pk': 2, 'tags': ['new', 'trial', 'premium']}"
# ]

# 4. Remove matching tags without replacing the full ARRAY field
client.upsert(
    collection_name="users",
    data=[
        {"pk": 1, "tags": ["new"]},
        {"pk": 2, "tags": ["trial"]},
    ],
    field_ops={"tags": FieldOp.array_remove()},
)

res = client.query(
    collection_name="users",
    filter="pk in [1, 2]",
    output_fields=["pk", "tags"],
)
print(res)

# Example output:
# data: [
# "{'pk': 1, 'tags': ['premium', 'vip']}",
# "{'pk': 2, 'tags': ['new', 'premium']}"
# ]
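
To reason about the two operators without a cluster, the same steps can be replayed against plain Python lists. This is a hypothetical model, not the server implementation: `array_append` and `array_remove` are local functions, the capacity check mirrors the max_capacity limit described earlier, and the model does not deduplicate appended elements because this page does not say whether the server does.

```python
MAX_CAPACITY = 8  # mirrors max_capacity in the schema above


def array_append(existing: list, payload: list, max_capacity: int = MAX_CAPACITY) -> list:
    """Model of ARRAY_APPEND: add payload elements to the end of the array."""
    result = existing + payload
    if len(result) > max_capacity:
        raise ValueError("resulting array would exceed max_capacity")
    return result


def array_remove(existing: list, payload: list) -> list:
    """Model of ARRAY_REMOVE: drop every element that matches a payload value."""
    return [item for item in existing if item not in payload]


tags = {1: ["new"], 2: ["new", "trial"]}

# Step 3 from the example: append.
tags[1] = array_append(tags[1], ["premium", "vip"])
tags[2] = array_append(tags[2], ["premium"])
after_append = {pk: list(values) for pk, values in tags.items()}

# Step 4 from the example: remove.
tags[1] = array_remove(tags[1], ["new"])
tags[2] = array_remove(tags[2], ["trial"])
print(after_append)
print(tags)
```

The intermediate and final lists in this model match the example outputs shown above for both entities.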