Version: User Guides (BYOC)

Manage Collections (SDKs)

This guide walks you through creating and managing collections using the SDK of your choice. SDKs offer more flexibility and customization than the Web UI of the Zilliz Cloud console.

Before you start

  • You have created a cluster. To create a cluster, refer to Create Cluster (Dev).

  • You have installed the SDK of your choice. To install an SDK, refer to Install SDKs.

Overview

On Zilliz Cloud, you store your vector embeddings in collections. All vector embeddings within a collection share the same dimensionality and distance metric for measuring similarity.

Zilliz Cloud collections support dynamic fields (i.e., fields not pre-defined in the schema) and auto-incrementing primary keys.

To accommodate different preferences, Zilliz Cloud offers several ways to create a collection. A quick setup gets you started with minimal configuration, while a customized setup gives you detailed control over the collection schema and index parameters, including collections with multiple vector fields.

Additionally, you can view, load, release, and drop a collection when necessary.

Create Collection

You can create a collection in one of the following ways:

  • Quick setup

    In this manner, you can create a collection by simply giving it a name and specifying the number of dimensions of the vector embeddings to be stored in this collection. For details, refer to Quick setup.

  • Customized setup

    Instead of letting Zilliz Cloud decide almost everything for your collection, you can determine the schema and index parameters of the collection on your own. For details, refer to Customized setup.

  • With multiple vector fields

    Zilliz Cloud supports multiple vector fields, allowing you to add up to 4 vector fields per collection. For details, refer to With multiple vector fields.

Quick setup

Most developers just need a simple yet dynamic collection to start with. Zilliz Cloud lets you set up such a collection quickly with just a few arguments:

  • Name of the collection to create,

  • Dimension of the vector embeddings to insert, and

  • (Optional) Metric type used to measure similarities between vector embeddings. The example below omits it and relies on the SDK default.

from pymilvus import MilvusClient, DataType

CLUSTER_ENDPOINT = "YOUR_CLUSTER_ENDPOINT"
TOKEN = "YOUR_CLUSTER_TOKEN"

# 1. Set up a Milvus client
client = MilvusClient(
    uri=CLUSTER_ENDPOINT,
    token=TOKEN
)

# 2. Create a collection in quick setup mode
client.create_collection(
    collection_name="quick_setup",
    dimension=5  # The dimension value should be an integer greater than 1.
)

res = client.get_load_state(
    collection_name="quick_setup"
)

print(res)

# Output
#
# {
# "state": "<LoadState: Loaded>"
# }

The collection generated in the above code contains only two fields: id (as the primary key) and vector (as the vector field), with auto_id and enable_dynamic_field settings enabled by default.

  • auto_id

    Enabling this setting ensures that the primary key increments automatically, so you do not need to provide primary keys when inserting data.

  • enable_dynamic_field

    When enabled, all fields in the inserted data other than id and vector are treated as dynamic fields. These additional fields are saved as key-value pairs in a special field named $meta, which lets you include extra fields during data insertion without changing the schema.

Because the collection created by the code above is indexed and loaded automatically, it is ready for data insertion immediately.
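To see dynamic fields in action, the sketch below creates a quick-setup collection and inserts rows that carry keys not defined in the schema. It is a minimal illustration, assuming `client` is a connected pymilvus `MilvusClient` and that `create_collection` accepts `metric_type`, `auto_id`, and `enable_dynamic_field` in quick-setup mode, as in recent pymilvus releases. The helpers, the collection name `quick_setup_explicit`, and the extra keys `color` and `likes` are hypothetical. Passing `metric_type` and `auto_id` explicitly avoids depending on SDK defaults, which may differ between versions.

```python
import random


def create_quick_collection(client, name, dim):
    """Quick-setup a collection, stating the optional settings explicitly."""
    client.create_collection(
        collection_name=name,
        dimension=dim,
        metric_type="COSINE",       # COSINE, L2, or IP for float vectors
        auto_id=True,               # let Zilliz Cloud generate primary keys
        enable_dynamic_field=True,  # keep undeclared keys in $meta
    )


def insert_rows_with_extras(client, name, dim, extras):
    """Insert one row per dict in `extras`.

    Each row gets a random vector of length `dim`. Every other key in the
    dict (for example "color") is not part of the schema, so it is stored
    as a dynamic field. No "id" key is sent because auto_id is enabled.
    """
    rows = []
    for extra in extras:
        row = {"vector": [random.random() for _ in range(dim)]}
        row.update(extra)
        rows.append(row)
    return client.insert(collection_name=name, data=rows)


# Usage, assuming `client` is the MilvusClient created earlier:
# create_quick_collection(client, "quick_setup_explicit", 5)
# insert_rows_with_extras(
#     client, "quick_setup_explicit", 5,
#     [{"color": "red", "likes": 3}, {"color": "blue"}],
# )
```

Because auto_id is enabled in this sketch, the rows omit the id key; with auto_id disabled you would supply it yourself.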

Customized setup

Instead of letting Zilliz Cloud decide almost everything for your collection, you can determine the schema and index parameters of the collection on your own.

Step 1: Set up schema

A schema defines the structure of a collection. Within the schema, you can enable or disable dynamic fields (enable_dynamic_field), add pre-defined fields, and set attributes for each field. For a detailed explanation of the concept and available data types, refer to Schema Explained.

# 3. Create a collection in customized setup mode

# 3.1. Create schema
schema = MilvusClient.create_schema(
    auto_id=False,
    enable_dynamic_field=True,
)

# 3.2. Add fields to schema
schema.add_field(field_name="my_id", datatype=DataType.INT64, is_primary=True)
# The dim value should be an integer greater than 1.
schema.add_field(field_name="my_vector", datatype=DataType.FLOAT_VECTOR, dim=5)

In the code snippet above, enable_dynamic_field is set to True and auto_id is set to False, so you must supply a value for the my_id primary key when inserting data. The schema also defines a vector field, my_vector, with a dimensionality of 5. The dimensionality of a valid vector field must be greater than 1.

Step 2: Set up index parameters

Index parameters dictate how Zilliz Cloud organizes your data within a collection. You can tailor the indexing process for specific fields by adjusting their metric_type and index_type. For vector fields, the recommended index type on Zilliz Cloud is always AUTOINDEX. For a float vector field, you can set metric_type to COSINE, L2, or IP. For additional insights into index types, refer to AUTOINDEX Explained.

# 3.3. Prepare index parameters
index_params = client.prepare_index_params()

# 3.4. Add indexes
index_params.add_index(
    field_name="my_id",
    index_type="STL_SORT"
)

index_params.add_index(
    field_name="my_vector",
    index_type="AUTOINDEX",
    metric_type="IP"
)

The code snippet above demonstrates how to set up index parameters for the vector field and a scalar field, respectively. For the vector field, set both the metric type and the index type. For a scalar field, set only the index type. It is recommended to create an index for the vector field and any scalar fields that are frequently used for filtering.
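The recommendation to index frequently filtered scalar fields can be folded into a small helper. This is a sketch, assuming `client` is a pymilvus `MilvusClient` as above and that every field in `filter_fields` is numeric, since STL_SORT (used above for my_id) is a numeric-field index type. `build_index_params` and the field name price in the usage comment are hypothetical.

```python
def build_index_params(client, vector_field, metric_type, filter_fields):
    """Index the vector field with AUTOINDEX and each filter field with STL_SORT."""
    index_params = client.prepare_index_params()

    # Vector field: set both the index type and the metric type.
    index_params.add_index(
        field_name=vector_field,
        index_type="AUTOINDEX",
        metric_type=metric_type,
    )

    # Scalar fields used in filters: set only the index type.
    # Assumption: these fields are numeric, which STL_SORT requires.
    for name in filter_fields:
        index_params.add_index(field_name=name, index_type="STL_SORT")

    return index_params


# Usage, assuming `client` is connected and the schema has a numeric "price" field:
# index_params = build_index_params(client, "my_vector", "IP", ["my_id", "price"])
```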

Step 3: Create the collection

You can either create a collection together with its index, in which case the collection is loaded automatically upon creation, or create the collection and its index separately.

  • Create a collection with its index and have it loaded automatically upon creation.

    import time

    # 3.5. Create a collection with its index; it is loaded automatically
    client.create_collection(
        collection_name="customized_setup_1",
        schema=schema,
        index_params=index_params
    )

    # Give the cluster a few seconds to finish loading before checking.
    time.sleep(5)

    res = client.get_load_state(
        collection_name="customized_setup_1"
    )

    print(res)

    # Output
    #
    # {
    # "state": "<LoadState: Loaded>"
    # }

    The collection created above is loaded automatically. To learn more about loading and releasing a collection, refer to Load & Release Collection.

  • Create a collection and an index file separately.

    # 3.6. Create a collection and index it separately
    client.create_collection(
        collection_name="customized_setup_2",
        schema=schema,
    )

    res = client.get_load_state(
        collection_name="customized_setup_2"
    )

    print(res)

    # Output
    #
    # {
    # "state": "<LoadState: NotLoad>"
    # }

    The collection created above is not loaded automatically. You can create an index for it as follows. Creating an index separately does not load the collection either, so you need to load it explicitly afterward. For details, refer to Load & Release Collection.

    # 3.7. Create index
    client.create_index(
        collection_name="customized_setup_2",
        index_params=index_params
    )

    res = client.get_load_state(
        collection_name="customized_setup_2"
    )

    print(res)

    # Output
    #
    # {
    # "state": "<LoadState: NotLoad>"
    # }
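The first option above waits a fixed time.sleep(5) before checking the load state, and the second leaves the collection unloaded after indexing. The sketch below handles both by polling get_load_state until the collection reports Loaded. It assumes the returned state is an enum whose name is Loaded, as the LoadState output above suggests; wait_until_loaded, index_and_load, the timeout, and the poll interval are hypothetical choices.

```python
import time


def _state_name(state):
    # pymilvus reports a LoadState enum; fall back to str() for other values.
    return getattr(state, "name", str(state))


def wait_until_loaded(client, collection_name, timeout=60.0, interval=1.0):
    """Poll get_load_state until the collection reports Loaded.

    Returns True if it was loaded within `timeout` seconds, else False.
    """
    deadline = time.monotonic() + timeout
    while True:
        res = client.get_load_state(collection_name=collection_name)
        if _state_name(res["state"]) == "Loaded":
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def index_and_load(client, collection_name, index_params, timeout=60.0):
    """Create indexes on an existing collection, then load it and wait."""
    client.create_index(collection_name=collection_name, index_params=index_params)
    # Creating an index does not load the collection, so load it explicitly.
    client.load_collection(collection_name=collection_name)
    return wait_until_loaded(client, collection_name, timeout=timeout)


# Usage, for a hypothetical collection created without index_params:
# index_and_load(client, "my_unindexed_collection", index_params)
```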

With multiple vector fields

The process for creating a collection with multiple vector fields is the same as that for the customized setup. To create a collection with multiple vector fields (up to 4), define the configuration of every vector field you want to store in the collection. Each vector field has its own name and the distance metric type used to measure how similar entities are. For more information on vector data types and metrics, refer to Similarity Metrics Explained and Schema Explained.

The example below defines two vector fields, text_vector and image_vector, in the collection schema.

# Create a collection with multiple vector fields

schema = client.create_schema(
    auto_id=False,
    enable_dynamic_field=True,
)

# Add primary key field to schema
schema.add_field(field_name="my_id", datatype=DataType.INT64, is_primary=True)

# Add vector field 1 to schema
# The dim value should be an integer greater than 1.
# Binary vector dimensions must be a multiple of 8.
schema.add_field(field_name="text_vector", datatype=DataType.BINARY_VECTOR, dim=8)

# Add vector field 2 to schema
# The dim value should be an integer greater than 1.
schema.add_field(field_name="image_vector", datatype=DataType.FLOAT_VECTOR, dim=128)

print(schema)

# Output:
# {'auto_id': False, 'description': '', 'fields': [{'name': 'my_id', 'description': '', 'type': <DataType.INT64: 5>, 'is_primary': True, 'auto_id': False}, {'name': 'text_vector', 'description': '', 'type': <DataType.BINARY_VECTOR: 100>, 'params': {'dim': 8}}, {'name': 'image_vector', 'description': '', 'type': <DataType.FLOAT_VECTOR: 101>, 'params': {'dim': 128}}], 'enable_dynamic_field': True}

In the example code above,

  • The create_schema method is used to create a new schema for the collection, with auto_id set to False and dynamic fields enabled.

  • A primary key field my_id of type INT64 is added to the schema.

  • Two vector fields are added: text_vector (a binary vector with a dimension of 8) and image_vector (a float vector with a dimension of 128).

📘Notes

For vector fields of the BINARY_VECTOR type,

  • The dimension value (dim) must be a multiple of 8.

  • The available metric types are HAMMING and JACCARD.
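These rules, together with the COSINE, L2, and IP metric types listed earlier for float vectors, can be checked on the client before you send a request. The helper below is hypothetical and not part of pymilvus; it only encodes the constraints stated in this guide, and the server remains the final authority.

```python
# Metric types allowed per vector type, as stated in this guide.
ALLOWED_METRICS = {
    "FLOAT_VECTOR": {"COSINE", "L2", "IP"},
    "BINARY_VECTOR": {"HAMMING", "JACCARD"},
}


def check_vector_field(vector_type, dim, metric_type):
    """Raise ValueError if a vector field breaks the rules stated in this guide."""
    if vector_type not in ALLOWED_METRICS:
        raise ValueError(f"unsupported vector type: {vector_type}")
    if not isinstance(dim, int) or dim <= 1:
        raise ValueError("dim must be an integer greater than 1")
    if vector_type == "BINARY_VECTOR" and dim % 8 != 0:
        raise ValueError("BINARY_VECTOR dim must be a multiple of 8")
    if metric_type not in ALLOWED_METRICS[vector_type]:
        allowed = ", ".join(sorted(ALLOWED_METRICS[vector_type]))
        raise ValueError(f"{metric_type} is not valid for {vector_type}; use {allowed}")


check_vector_field("BINARY_VECTOR", 8, "HAMMING")  # passes
check_vector_field("FLOAT_VECTOR", 128, "IP")      # passes
```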

After defining the schema, you can create an index for each vector field separately. The example code below demonstrates how to prepare and add indexes for both vector fields text_vector and image_vector.

# Prepare index parameters

index_params = client.prepare_index_params()

index_params.add_index(
    field_name="text_vector",
    # In Zilliz Cloud, the index type should always be `AUTOINDEX`.
    index_type="AUTOINDEX",
    # For vector fields of the `BINARY_VECTOR` type, use `HAMMING` or `JACCARD` as the metric type.
    metric_type="HAMMING"
)

index_params.add_index(
    field_name="image_vector",
    index_type="AUTOINDEX",
    metric_type="IP"
)

client.create_collection(
    collection_name="demo_multiple_vector_fields",
    schema=schema,
    index_params=index_params
)

In the example code above,

  • The prepare_index_params method prepares the parameters for indexing.

  • Indexes are added to both vector fields: text_vector uses HAMMING as the metric type, and image_vector uses IP (Inner Product).

  • The create_collection method creates the collection with the defined schema and index parameters.
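To insert data into demo_multiple_vector_fields, each row needs a value for both vector fields. The sketch below assumes pymilvus accepts a BINARY_VECTOR value as a bytes object of dim / 8 bytes with the first dimension in the most significant bit (the layout numpy.packbits produces by default), which also shows why the dimension must be a multiple of 8. The helpers pack_bits and make_row and the random data are hypothetical.

```python
import random


def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, 8 dimensions per byte.

    The first of each group of 8 dimensions goes into the most significant
    bit. A bit count that is not a multiple of 8 cannot fill whole bytes.
    """
    if len(bits) % 8 != 0:
        raise ValueError("number of bits must be a multiple of 8")
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | (1 if bit else 0)
        out.append(byte)
    return bytes(out)


def make_row(row_id, text_bits, image_floats):
    """Build one entity for demo_multiple_vector_fields."""
    return {
        "my_id": row_id,                                   # auto_id is False
        "text_vector": pack_bits(text_bits),               # 8 dims -> 1 byte
        "image_vector": [float(x) for x in image_floats],  # 128 floats
    }


rows = [
    make_row(i, [random.randint(0, 1) for _ in range(8)],
             [random.random() for _ in range(128)])
    for i in range(3)
]
# Assuming `client` is connected:
# client.insert(collection_name="demo_multiple_vector_fields", data=rows)
```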


View Collections

You can check the details of an existing collection as follows:

# 5. View Collections
res = client.describe_collection(
    collection_name="customized_setup_2"
)

print(res)

# Output
#
# {
# "collection_name": "customized_setup_2",
# "auto_id": false,
# "num_shards": 1,
# "description": "",
# "fields": [
# {
# "field_id": 100,
# "name": "my_id",
# "description": "",
# "type": 5,
# "params": {},
# "element_type": 0,
# "is_primary": true
# },
# {
# "field_id": 101,
# "name": "my_vector",
# "description": "",
# "type": 101,
# "params": {
# "dim": 5
# },
# "element_type": 0
# }
# ],
# "aliases": [],
# "collection_id": 448143479230158446,
# "consistency_level": 2,
# "properties": {},
# "num_partitions": 1,
# "enable_dynamic_field": true
# }

To list all existing collections, use list_collections():

# 6. List all collection names
res = client.list_collections()

print(res)

# Output
#
# [
# "customized_setup_2",
# "quick_setup",
# "customized_setup_1"
# ]
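Because describe_collection returns a dictionary like the one shown above, its output is easy to post-process. A minimal sketch, assuming each entry in fields has a name key and an optional params dictionary holding dim for vector fields; summarize_collections is a hypothetical helper.

```python
def summarize_collections(client):
    """Map each collection name to {field name: vector dim, or None for scalars}."""
    summary = {}
    for name in client.list_collections():
        info = client.describe_collection(collection_name=name)
        summary[name] = {
            field["name"]: field.get("params", {}).get("dim")
            for field in info["fields"]
        }
    return summary


# Assuming `client` is connected:
# print(summarize_collections(client))
```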

Load & Release Collection

When you load a collection, Zilliz Cloud loads the collection's index files into memory. Conversely, when you release a collection, Zilliz Cloud unloads its index files from memory. Make sure a collection is loaded before you search it.

Load a collection

# 7. Load the collection
client.load_collection(
    collection_name="customized_setup_2"
)

res = client.get_load_state(
    collection_name="customized_setup_2"
)

print(res)

# Output
#
# {
# "state": "<LoadState: Loaded>"
# }

Release a collection

# 8. Release the collection
client.release_collection(
    collection_name="customized_setup_2"
)

res = client.get_load_state(
    collection_name="customized_setup_2"
)

print(res)

# Output
#
# {
# "state": "<LoadState: NotLoad>"
# }
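Since a loaded collection keeps its index files in memory, you may want to release it once you finish searching. The context manager below is a hypothetical sketch that loads a collection on entry and releases it on exit, even if an error occurs. Keep in mind that releasing affects every client using that collection, so this pattern suits collections that only one job searches.

```python
from contextlib import contextmanager


@contextmanager
def loaded_collection(client, collection_name):
    """Keep a collection loaded only for the duration of a with-block."""
    client.load_collection(collection_name=collection_name)
    try:
        yield collection_name
    finally:
        # Free the memory held by the index files, even after an error.
        client.release_collection(collection_name=collection_name)


# Assuming `client` is connected:
# with loaded_collection(client, "customized_setup_2"):
#     ...  # run searches here
```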

Set up aliases

You can assign aliases to collections to make them more meaningful in a specific context. A collection can have multiple aliases, but an alias cannot be shared by multiple collections.

Create aliases

# 9. Manage aliases
# 9.1. Create aliases
client.create_alias(
    collection_name="customized_setup_2",
    alias="bob"
)

client.create_alias(
    collection_name="customized_setup_2",
    alias="alice"
)

List aliases

# 9.2. List aliases
res = client.list_aliases(
    collection_name="customized_setup_2"
)

print(res)

# Output
#
# {
# "aliases": [
# "bob",
# "alice"
# ],
# "collection_name": "customized_setup_2",
# "db_name": "default"
# }

Describe aliases

# 9.3. Describe aliases
res = client.describe_alias(
    alias="bob"
)

print(res)

# Output
#
# {
# "alias": "bob",
# "collection_name": "customized_setup_2",
# "db_name": "default"
# }

Reassign aliases

# 9.4 Reassign aliases to other collections
client.alter_alias(
    collection_name="customized_setup_1",
    alias="alice"
)

res = client.list_aliases(
    collection_name="customized_setup_1"
)

print(res)

# Output
#
# {
# "aliases": [
# "alice"
# ],
# "collection_name": "customized_setup_1",
# "db_name": "default"
# }

res = client.list_aliases(
    collection_name="customized_setup_2"
)

print(res)

# Output
#
# {
# "aliases": [
# "bob"
# ],
# "collection_name": "customized_setup_2",
# "db_name": "default"
# }
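Because an alias points to exactly one collection, reassigning it lets applications switch from one collection to another without changing the name they query. The sketch below, with hypothetical helpers alias_owner and point_alias, creates the alias if no collection uses it yet and reassigns it otherwise. It finds the current owner by scanning list_aliases output in the shape shown above, which is simple but issues one request per collection.

```python
def alias_owner(client, alias):
    """Return the collection an alias points to, or None if it is unused."""
    for name in client.list_collections():
        if alias in client.list_aliases(collection_name=name)["aliases"]:
            return name
    return None


def point_alias(client, alias, collection_name):
    """Make `alias` refer to `collection_name`, creating it if needed."""
    owner = alias_owner(client, alias)
    if owner == collection_name:
        return  # already points at the requested collection
    if owner is None:
        client.create_alias(collection_name=collection_name, alias=alias)
    else:
        client.alter_alias(collection_name=collection_name, alias=alias)


# Assuming `client` is connected:
# point_alias(client, "alice", "customized_setup_1")
```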

Drop aliases

# 9.5 Drop aliases
client.drop_alias(
    alias="bob"
)

client.drop_alias(
    alias="alice"
)

Drop a Collection

If a collection is no longer needed, you can drop the collection.

# 10. Drop the collections
client.drop_collection(
    collection_name="quick_setup"
)

client.drop_collection(
    collection_name="customized_setup_1"
)

client.drop_collection(
    collection_name="customized_setup_2"
)
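If a cleanup script may run more than once, you can check has_collection before dropping, so the script does not depend on how the server handles a missing collection. drop_if_exists is a hypothetical helper built on MilvusClient.has_collection and drop_collection.

```python
def drop_if_exists(client, collection_name):
    """Drop a collection only if it exists; return True if it was dropped."""
    if not client.has_collection(collection_name=collection_name):
        return False
    client.drop_collection(collection_name=collection_name)
    return True


# Assuming `client` is connected:
# for name in ["quick_setup", "customized_setup_1", "customized_setup_2"]:
#     drop_if_exists(client, name)
```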

Collection Limits

| Cluster Type      | Max Number of Collections | Remarks |
|-------------------|---------------------------|---------|
| Dedicated cluster | 64 per CU, and <= 4,096   | You can create up to 64 collections per CU used in a dedicated cluster and no more than 4,096 collections in the cluster. |

In addition to the limits on the number of collections per cluster, Zilliz Cloud also applies limits on consumed capacity, which indicates the physical resources consumed by your clusters. The following table lists the limits on the general capacity of a cluster.

| Number of CUs | General Capacity                 |
|---------------|----------------------------------|
| 1-8 CUs       | <= 4,096                         |
| 12+ CUs       | Min(512 x Number of CUs, 65,536) |

For details on the calculation of general and consumed capacity, refer to Zilliz Cloud Limits.
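Both limits reduce to simple arithmetic. The hypothetical helpers below encode the two tables above; the general capacity table does not list 9 to 11 CUs, so the second helper rejects those values instead of guessing.

```python
def max_collections(cu):
    """Dedicated cluster: 64 collections per CU, capped at 4,096."""
    if cu < 1:
        raise ValueError("a cluster uses at least 1 CU")
    return min(64 * cu, 4096)


def general_capacity_limit(cu):
    """General capacity limit for a cluster of `cu` CUs, per the table above."""
    if 1 <= cu <= 8:
        return 4096
    if cu >= 12:
        return min(512 * cu, 65536)
    raise ValueError("the table does not list a limit for this number of CUs")


print(max_collections(8))           # 512
print(general_capacity_limit(16))   # 8192
print(general_capacity_limit(256))  # 65536
```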