Skip to main content
Version: User Guides (BYOC)

Use Partition Key

The Partition Key is a search optimization solution based on partitions. By designating a specific scalar field as the Partition Key and specifying filtering conditions based on the Partition Key during the search, the search scope can be narrowed down to several partitions, thereby improving search efficiency. This article will introduce how to use the Partition Key and related considerations.

Overview

In Zilliz Cloud, you can use partitions to implement data segregation and improve search performance by restricting the search scope to specific partitions. If you choose to manage partitions manually, you can create a maximum of 1,024 partitions in a collection, and insert entities into these partitions based on a specific rule so that you can narrow the search scope by restricting searches within a specific number of partitions.

Zilliz Cloud introduces the Partition Key for you to reuse partitions in data segregation to overcome the limit on the number of partitions you can create in a collection. When creating a collection, you can use a scalar field as the Partition Key. Once the collection is ready, Zilliz Cloud creates the specified number of partitions inside the collection with each partition corresponding to a range of the values in the Partition Key. Upon receiving inserted entities, Zilliz Cloud stores them into different partitions based on their Partition Key values.

IXXIwZdOYhRFXmbTMdwcaN6fnPe

The following figure illustrates how Zilliz Cloud processes the search requests in a collection with or without the Partition Key feature enabled.

  • If the Partition Key is disabled, Zilliz Cloud searches for entities that are the most similar to the query vector within the collection. You can narrow the search scope if you know which partition contains the most relevant results.

  • If the Partition Key is enabled, Zilliz Cloud determines the search scope based on the Partition Key value specified in a search filter and scans only the entities within the partitions that match.

RTaqwdaWXhRWPTb4uJTc9Uknn5c

Use Partition Key

To use the Partition Key, you need to

  • Set the Partition Key,

  • Set the number of partitions to create (Optional), and

  • Create a filtering condition based on the Partition Key.

Set Partition Key

To designate a scalar field as the Partition Key, you need to set its is_partition_key attribute to true when you add the scalar field.

from pymilvus import (
MilvusClient, DataType
)

client = MilvusClient(
uri="YOUR_CLUSTER_ENDPOINT",
token="YOUR_CLUSTER_TOKEN"
)

schema = client.create_schema()

# Add the partition key
schema.add_field(
field_name="my_varchar",
datatype=DataType.VARCHAR,
max_length=512,
is_partition_key=True,
)

Set Partition Numbers

When you designate a scalar field in a collection as the Partition Key, Zilliz Cloud automatically creates 16 partitions in the collection. Upon receiving an entity, Zilliz Cloud chooses a partition based on the Partition Key value of this entity and stores the entity in the partition, resulting in some or all partitions holding entities with different Partition Key values.

You can also determine the number of partitions to create along with the collection. This is valid only if you have a scalar field designated as the Partition Key.

client.create_collection(
collection_name="my_collection",
schema=schema,
num_partitions=1024
)

Create Filtering Condition

When conducting ANN searches in a collection with the Partition Key feature enabled, you need to include a filtering expression involving the Partition Key in the search request. In the filtering expression, you can restrict the Partition Key value within a specific range so that Zilliz Cloud restricts the search scope within the corresponding partitions.

The following examples demonstrate Partition-Key-based filtering based on a specific Partition Key value and a set of Partition Key values.

# Filter based on a single partition key value, or
filter='partition_key == "x" && <other conditions>'

# Filter based on multiple partition key values
filter='partition_key in ["x", "y", "z"] && <other conditions>'

Use Partition Key Isolation

In the multi-tenancy scenario, you can designate the scalar field related to tenant identities as the partition key and create a filter based on a specific value in this scalar field. To further improve search performance in similar scenarios, Zilliz Cloud introduces the Partition Key Isolation feature.

BVotwv5BvhBWXXbvotUccowZnng

As shown in the above figure, Zilliz Cloud groups entities based on the Partition Key value and creates a separate index for each of these groups. Upon receiving a search request, Zilliz Cloud locates the index based on the Partition Key value specified in the filtering condition and restricts the search scope within the entities included in the index, thus avoiding scanning irrelevant entities during the search and greatly enhancing the search performance.

Once you have enabled Partition Key Isolation, you can include only a specific value in the Partition-key-based filter so that Zilliz Cloud can restrict the search scope within the entities included in the index that match.

📘Notes

Currently, the Partition-Key Isolation feature applies only to Performance-optimized clusters.

Enable Partition Key Isolation

The following code examples demonstrate how to enable Partition Key Isolation.

client.create_collection(
collection_name="my_collection",
schema=schema,
properties={"partitionkey.isolation": True}
)

Once you have enabled Partition Key Isolation, you can still set the Partition Key and number of partitions as described in Use Partition Key. Note that the Partition-Key-based filter should include only a specific Partition Key value.