
Use Partition Key

This guide walks you through using the partition key to accelerate data retrieval from your collection.

Overview

The partition key in Zilliz Cloud distributes incoming entities into different partitions based on their partition key values. Entities with the same key value are grouped together in one partition, so a search that filters by the key field can skip irrelevant partitions instead of scanning them. Compared with filtering on an ordinary scalar field, filtering on the partition key can therefore improve query performance considerably.
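The grouping described above can be pictured with a small, purely conceptual sketch. It assumes a hash-based routing scheme; the hash function (`zlib.crc32`), the partition count (`NUM_PARTITIONS`), and the helper names are illustrative assumptions, not the service's actual internals, which this guide does not describe.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical value, kept small for readability


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a key value to a partition index using a stable hash (illustrative only)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(entities, key_field):
    """Group entities by the partition their key value hashes to."""
    partitions = defaultdict(list)
    for entity in entities:
        partitions[partition_for(entity[key_field])].append(entity)
    return partitions


# Hypothetical sample entities
entities = [
    {"title": "A", "publication": "The Startup"},
    {"title": "B", "publication": "Towards Data Science"},
    {"title": "C", "publication": "The Startup"},
]

partitions = route(entities, "publication")

# A filter such as publication == "The Startup" only needs to look at one partition.
target = partition_for("The Startup")
print([e["title"] for e in partitions[target]])
```

Because the hash is deterministic, every entity with the same key value lands in the same partition, which is what lets a key-based filter skip the others.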

You can use the partition key to implement multi-tenancy. For details, see Multi-tenancy.

Before you start

Before creating a collection, ensure that:

  • You have a blueprint of your data model (i.e. schema). For details, see Schema Explained.

  • You have created a dedicated cluster. For details, see Create Cluster.

  • You have downloaded the example dataset. For details, see Example Dataset.

Enable partition key

To demonstrate the use of partition keys, we will continue to use the example dataset that contains over 5,000 articles, and the publication field will serve as the partition key.

import json, os, time
from pymilvus import MilvusClient, DataType

CLUSTER_ENDPOINT="YOUR_CLUSTER_ENDPOINT" # Set your cluster endpoint
TOKEN="YOUR_CLUSTER_TOKEN" # Set your token
COLLECTION_NAME="medium_articles_2020" # Set your collection name
DATASET_PATH="{}/../medium_articles_2020_dpr.json".format(os.path.dirname(__file__)) # Set your dataset path

# 1. Connect to cluster
client = MilvusClient(
    uri=CLUSTER_ENDPOINT,
    token=TOKEN
)

# 2. Define collection schema
schema = MilvusClient.create_schema(
    auto_id=True,
    partition_key_field="publication"
)

schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="title", datatype=DataType.VARCHAR, max_length=512)
schema.add_field(field_name="title_vector", datatype=DataType.FLOAT_VECTOR, dim=768)
schema.add_field(field_name="link", datatype=DataType.VARCHAR, max_length=512)
schema.add_field(field_name="reading_time", datatype=DataType.INT64)
schema.add_field(field_name="publication", datatype=DataType.VARCHAR, max_length=512)
schema.add_field(field_name="claps", datatype=DataType.INT64)
schema.add_field(field_name="responses", datatype=DataType.INT64)

After you have defined the fields, set other necessary parameters.

# 3. Define index parameters
index_params = MilvusClient.prepare_index_params()

index_params.add_index(
    field_name="title_vector",
    index_type="AUTOINDEX",
    metric_type="L2"
)

Finally, you can create a collection.

# 4. Create a collection
client.create_collection(
    collection_name=COLLECTION_NAME,
    schema=schema,
    index_params=index_params
)

Insert data

Once the collection is ready, start inserting data as follows:

Prepare data

with open(DATASET_PATH) as f:
    data = json.load(f)
    list_of_rows = data['rows']

data_rows = []
for row in list_of_rows:
    # Remove the id field because the primary key has auto_id enabled.
    del row['id']
    # The remaining keys are expected to match the fields defined in the schema,
    # including the partition key field (publication).
    data_rows.append(row)
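Before inserting, it can be useful to check how the partition key values are distributed, since a handful of very large key values would concentrate entities in a few partitions. The following is a minimal sketch using only the standard library; `key_distribution` and `sample_rows` are hypothetical names introduced for illustration, and in the script above you would pass `data_rows` instead of the sample.

```python
from collections import Counter


def key_distribution(rows, key_field="publication"):
    """Count how many rows carry each partition key value."""
    return Counter(row[key_field] for row in rows)


# Hypothetical sample rows; replace with data_rows in the real script.
sample_rows = [
    {"title": "A", "publication": "The Startup", "claps": 10},
    {"title": "B", "publication": "Towards Data Science", "claps": 42},
    {"title": "C", "publication": "The Startup", "claps": 7},
]

distribution = key_distribution(sample_rows)
print(distribution.most_common())
# [('The Startup', 2), ('Towards Data Science', 1)]
```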

Insert data

# 5. Insert data
res = client.insert(
    collection_name=COLLECTION_NAME,
    data=data_rows,
)

# Output
#
# {
# "insert_count": 5979
# }

time.sleep(5) # time.sleep() takes seconds; wait briefly so the inserted data can be processed

Use partition key

Once you have indexed and loaded the collection as well as inserted data, you can conduct a similarity search using the partition key.

📘Notes

To conduct a similarity search using the partition key, you should include either of the following in the boolean expression of the search request:

  • expr='<partition_key>=="xxxx"'

  • expr='<partition_key> in ["xxx", "xxx"]'

Replace <partition_key> with the name of the field that is designated as the partition key.
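The two expression forms in the note can be generated from Python values rather than written by hand. This is a small sketch: `key_equals` and `key_in` are hypothetical helpers, and it assumes that JSON-style double-quoted strings are acceptable string literals in the filter expression.

```python
import json


def key_equals(field, value):
    """Build an equality expression on the partition key field."""
    return "{} == {}".format(field, json.dumps(value, ensure_ascii=False))


def key_in(field, values):
    """Build a membership expression on the partition key field."""
    return "{} in {}".format(field, json.dumps(list(values), ensure_ascii=False))


print(key_equals("publication", "The Startup"))
# publication == "The Startup"

print(key_in("publication", ["The Startup", "Towards Data Science"]))
# publication in ["The Startup", "Towards Data Science"]
```

The resulting string can be combined with other conditions using `and`, then passed as the filter of a search or query request.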

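res = client.search(
    collection_name=COLLECTION_NAME,
    data=[data_rows[0]['title_vector']],
    # Include the partition key field (publication) in the filter so that
    # only the relevant partitions are searched.
    filter='publication == "{}" and claps > 30 and reading_time < 10'.format(data_rows[0]['publication']),
    limit=3,
    output_fields=["title", "reading_time", "claps", "publication"],
    search_params={"metric_type": "L2", "params": {}}
)

print(res)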

Use cases

You can use the partition key to improve search performance and to enable multi-tenancy. To do so, assign a tenant-specific value to the partition key field of each entity. When searching or querying the collection, include the partition key field in the boolean expression to filter entities by that tenant-specific value. This approach separates data by tenant and avoids scanning unnecessary partitions.
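As a rough illustration of the multi-tenancy pattern, the sketch below builds search arguments that always carry the tenant's partition key value, so no request can omit it by accident. `tenant_search_kwargs` is a hypothetical helper; the field name `publication` stands in for a tenant identifier field, and the 768-dimensional zero vector is a placeholder query.

```python
import json


def tenant_search_kwargs(collection_name, tenant_id, query_vector,
                         extra_filter="", limit=3, key_field="publication"):
    """Build keyword arguments for a search scoped to a single tenant."""
    tenant_clause = "{} == {}".format(key_field, json.dumps(tenant_id, ensure_ascii=False))
    if extra_filter:
        expr = "{} and ({})".format(tenant_clause, extra_filter)
    else:
        expr = tenant_clause
    return {
        "collection_name": collection_name,
        "data": [query_vector],
        "filter": expr,
        "limit": limit,
    }


kwargs = tenant_search_kwargs(
    "medium_articles_2020",
    tenant_id="The Startup",
    query_vector=[0.0] * 768,  # placeholder query vector
    extra_filter="claps > 30",
)
print(kwargs["filter"])
# publication == "The Startup" and (claps > 30)

# With a connected client, the arguments would be passed on like this:
# res = client.search(**kwargs)
```

Centralizing the tenant clause in one helper keeps the partition key condition out of ad hoc filter strings, which is one simple way to make per-tenant filtering hard to forget.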