Version: User Guides (Cloud)

Use Partition Key

This guide walks you through using the partition key to accelerate data retrieval from your collection.

Overview

You can set a particular field in a collection as the partition key so that Zilliz Cloud distributes incoming entities into different partitions according to their respective partition values in this field. This allows entities with the same key value to be grouped in a partition, accelerating search performance by avoiding the need to scan irrelevant partitions when filtering by the key field. When compared to traditional filtering methods, the partition key can greatly enhance query performance.

You can also use the partition key to implement multi-tenancy. For details, see Multi-tenancy.
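Conceptually, the assignment works like hashing the partition key value modulo the number of partitions, so that every entity with the same key value lands in the same partition. The sketch below is a simplified illustration of that idea only, not Zilliz Cloud's actual internal hashing algorithm:

```python
import zlib

def assign_partition(key_value: str, num_partitions: int = 64) -> int:
    # Simplified illustration: hash the partition key value and take the
    # result modulo the partition count. Entities sharing a key value
    # always map to the same partition index.
    return zlib.crc32(key_value.encode()) % num_partitions

# Entities with the same color always map to the same partition,
# so a search filtered on that color only needs to scan one partition.
assert assign_partition("green") == assign_partition("green")
```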

Enable partition key

The following snippet demonstrates how to set a field as the partition key.

In the example code below, num_partitions determines the number of partitions that will be created. By default, it is set to 64. We recommend you retain the default value.

import random
from pymilvus import MilvusClient, DataType

CLUSTER_ENDPOINT = "YOUR_CLUSTER_ENDPOINT"
TOKEN = "YOUR_CLUSTER_TOKEN"

# 1. Set up a Milvus client
client = MilvusClient(
    uri=CLUSTER_ENDPOINT,
    token=TOKEN
)

# 2. Create a collection
schema = MilvusClient.create_schema(
    auto_id=False,
    enable_dynamic_field=True,
    partition_key_field="color",
    num_partitions=64 # Number of partitions. Defaults to 64.
)

schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="vector", datatype=DataType.FLOAT_VECTOR, dim=5)
schema.add_field(field_name="color", datatype=DataType.VARCHAR, max_length=512)

After you have defined the fields, set up the index parameters.

index_params = MilvusClient.prepare_index_params()

index_params.add_index(
    field_name="id",
    index_type="STL_SORT"
)

index_params.add_index(
    field_name="color",
    index_type="Trie"
)

index_params.add_index(
    field_name="vector",
    index_type="IVF_FLAT",
    metric_type="L2",
    params={"nlist": 1024}
)

Finally, you can create a collection.

client.create_collection(
    collection_name="test_collection",
    schema=schema,
    index_params=index_params
)

List partitions

Once a field of a collection is used as the partition key, Zilliz Cloud creates the specified number of partitions and manages them on your behalf. Therefore, you cannot manipulate the partitions in this collection anymore.

The following snippet demonstrates that Zilliz Cloud creates 64 partitions in a collection once one of its fields is designated as the partition key.

# 2.1. List all partitions in the collection
partition_names = client.list_partitions(
    collection_name="test_collection"
)

print(partition_names)

# Output
#
# [
#     "_default_0",
#     "_default_1",
#     "_default_2",
#     "_default_3",
#     "_default_4",
#     "_default_5",
#     "_default_6",
#     "_default_7",
#     "_default_8",
#     "_default_9",
#     "(54 more items hidden)"
# ]

Insert data

Once the collection is ready, start inserting data as follows:

Prepare data

# 3. Insert randomly generated vectors 
colors = ["green", "blue", "yellow", "red", "black", "white", "purple", "pink", "orange", "brown", "grey"]
data = []

for i in range(1000):
    current_color = random.choice(colors)
    current_tag = random.randint(1000, 9999)
    data.append({
        "id": i,
        "vector": [ random.uniform(-1, 1) for _ in range(5) ],
        "color": current_color,
        "tag": current_tag,
        "color_tag": f"{current_color}_{current_tag}"
    })

print(data[0])

You can view the structure of the generated data by checking its first entry.

{
    "id": 0,
    "vector": [
        0.1275656405044483,
        0.47417858592773277,
        0.13858264437643286,
        0.2390904907020377,
        0.8447862593689635
    ],
    "color": "blue",
    "tag": 2064,
    "color_tag": "blue_2064"
}

Insert data

res = client.insert(
    collection_name="test_collection",
    data=data
)

print(res)

# Output
#
# {
#     "insert_count": 1000,
#     "ids": [
#         0,
#         1,
#         2,
#         3,
#         4,
#         5,
#         6,
#         7,
#         8,
#         9,
#         "(990 more items hidden)"
#     ]
# }

Use partition key

Once you have indexed and loaded the collection and inserted data, you can conduct a similarity search using the partition key.

📘Notes

To conduct a similarity search using the partition key, you should include either of the following in the boolean expression of the search request:

  • filter='<partition_key>=="xxxx"'

  • filter='<partition_key> in ["xxx", "xxx"]'

Replace <partition_key> with the name of the field designated as the partition key.

# 4. Search with partition key
query_vectors = [[0.3580376395471989, -0.6023495712049978, 0.18414012509913835, -0.26286205330961354, 0.9029438446296592]]

res = client.search(
    collection_name="test_collection",
    data=query_vectors,
    filter="color == 'green'",
    search_params={"metric_type": "L2", "params": {"nprobe": 10}},
    output_fields=["id", "color_tag"],
    limit=3
)

print(res)

# Output
#
# [
#     [
#         {
#             "id": 970,
#             "distance": 0.5770174264907837,
#             "entity": {
#                 "id": 970,
#                 "color_tag": "green_9828"
#             }
#         },
#         {
#             "id": 115,
#             "distance": 0.6898155808448792,
#             "entity": {
#                 "id": 115,
#                 "color_tag": "green_4073"
#             }
#         },
#         {
#             "id": 899,
#             "distance": 0.7028976678848267,
#             "entity": {
#                 "id": 899,
#                 "color_tag": "green_9897"
#             }
#         }
#     ]
# ]
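To match several partition key values in one search, use the `in` form of the filter instead. A minimal sketch of building such an expression from a Python list; the search call itself takes it via the same `filter` parameter as above:

```python
# Build a partition-key filter covering several values at once.
colors_of_interest = ["green", "blue"]
filter_expr = 'color in ["{}"]'.format('", "'.join(colors_of_interest))

print(filter_expr)
# color in ["green", "blue"]

# Pass it to the same search call as above, e.g.:
# res = client.search(
#     collection_name="test_collection",
#     data=query_vectors,
#     filter=filter_expr,
#     ...
# )
```

With this filter, Zilliz Cloud only needs to scan the partitions that hold the listed key values rather than the whole collection.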

Typical use cases

You can utilize the partition key feature to achieve better search performance and enable multi-tenancy. To do so, store a tenant-specific value in the partition key field of each entity. When searching or querying the collection, filter entities by that tenant-specific value by including the partition key field in the boolean expression. This approach ensures data isolation by tenant and avoids scanning unnecessary partitions.
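As a sketch of that pattern, suppose the partition key field is named `tenant_id` (a hypothetical field name for illustration, not from this guide). Every search is then scoped to one tenant by filtering on that field:

```python
def tenant_filter(tenant_id: str) -> str:
    # Restrict a search to a single tenant's data by filtering on the
    # partition key field (hypothetically named "tenant_id" here).
    return f'tenant_id == "{tenant_id}"'

print(tenant_filter("acme-corp"))
# tenant_id == "acme-corp"

# Used with the search API as in the example above:
# client.search(..., filter=tenant_filter("acme-corp"), ...)
```

Because all of one tenant's entities share the same partition key value, they are grouped into the same partition, so the filtered search never touches other tenants' partitions.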