
Manage Collections (Console)

A collection is a two-dimensional table used to store vector embeddings and metadata. All entities in a collection share the same schema. You can create multiple collections for data management or multi-tenancy purposes.

This guide walks you through the collection creation and management operations on the web console. It is intended for users who prefer a visual interface. If you are familiar with SDKs, you can also create and manage collections through them. For details, see Create Collection.

📘Notes

If you need strong data isolation and manage only a small number of tenants, you can create a separate collection for each tenant.

However, a cluster can hold only a limited number of collections: up to 16,384, depending on your cluster plan. For large-scale multi-tenancy, consider alternative strategies such as partition-based or partition-key-based multi-tenancy, depending on your use case. For details, see Implement Multi-tenancy.

Create collection

The Zilliz Cloud console provides 3 ways to create a collection, each designed for different scenarios:

Create your own collection: Customize the schema and index parameters to fit your dataset and use case. Ideal for users who need fine-grained control over the schema.
Create sample collection: Quickly set up a collection with a predefined schema and sample dataset. Recommended for new users exploring Zilliz Cloud.
Clone existing collection: Duplicate an existing collection within the same database. Useful for environment duplication, where you need to copy both the schema and the data from a testing collection to a production collection. You can also use cloning to change the shard settings of an existing collection.

The following demo shows you where to find the features on the web UI.

Below are some of the concepts you will encounter when creating a collection.

Collection basic information

The metadata of a collection contains:

Collection name
(Optional) Collection description
The database to which the collection belongs. A database is a layer between clusters and collections and serves as a logical container to manage and organize collections. You can group relevant collections under the same database.

Collection schema

A schema defines the data structure of your collection and must include:

1 primary key (PK) field
At least 1 vector field. You can include up to 4 vector fields by default. To include up to 10, contact us.
(Optional) Scalar fields for metadata
(Optional) Dynamic field. Enabling dynamic field provides flexibility to the collection schema because it allows you to add fields during data insertion without modifying the existing schema. It is recommended to enable dynamic field when your data structure is not fixed. For fields that are frequently used in filters or queries, define them in advance in the schema instead of using dynamic fields, as this can help maintain optimal query performance.
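The schema rules above can be sketched as a small check in plain Python. This is only an illustration, not the Zilliz Cloud SDK: `Field`, `validate_schema`, and `split_entity` are hypothetical names, the type strings are placeholders, and rejecting unknown keys when dynamic field is disabled is an assumption made for the example.

```python
from dataclasses import dataclass

# Placeholder type names for this sketch; not an exhaustive list.
VECTOR_TYPES = {"FLOAT_VECTOR", "SPARSE_FLOAT_VECTOR"}
MAX_VECTOR_FIELDS = 4  # default limit stated in this guide


@dataclass
class Field:
    name: str
    dtype: str
    is_primary: bool = False


def validate_schema(fields, max_vector_fields=MAX_VECTOR_FIELDS):
    """Return a list of rule violations; an empty list means the schema passes."""
    problems = []
    pk_count = sum(1 for f in fields if f.is_primary)
    vec_count = sum(1 for f in fields if f.dtype in VECTOR_TYPES)
    if pk_count != 1:
        problems.append(f"expected exactly 1 primary key field, found {pk_count}")
    if vec_count < 1:
        problems.append("at least 1 vector field is required")
    if vec_count > max_vector_fields:
        problems.append(f"at most {max_vector_fields} vector fields are allowed")
    return problems


def split_entity(entity, fields, enable_dynamic_field):
    """Separate schema-defined values from extra keys carried by the dynamic field."""
    known = {f.name for f in fields}
    defined = {k: v for k, v in entity.items() if k in known}
    extra = {k: v for k, v in entity.items() if k not in known}
    if extra and not enable_dynamic_field:
        # Assumption for this sketch: unknown keys are rejected without dynamic field.
        raise ValueError(f"fields not in schema: {sorted(extra)}")
    return defined, extra


fields = [
    Field("id", "INT64", is_primary=True),
    Field("embedding", "FLOAT_VECTOR"),
    Field("title", "VARCHAR"),
]
print(validate_schema(fields))
print(split_entity({"id": 1, "embedding": [0.1, 0.2], "color": "red"}, fields, True))
```

In this sketch, `color` is not part of the schema, so it is accepted only because dynamic field is enabled. A field such as `color` that is used often in filters is better declared in the schema up front.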

📘Notes

Most of the schema configurations cannot be modified once the collection is created. Design your schema carefully to ensure it meets current and future business needs. For best practices, see Schema Explained.

Index

An index is a data structure that organizes data to accelerate searches and queries. Zilliz Cloud supports two types of indexes:

Vector index: Automatically created using AUTOINDEX to accelerate vector searches. If the schema has multiple vector fields, you can create a separate index for each of them. You can also edit the metric type used to calculate the distance between vectors.
Scalar index: By default, Zilliz Cloud does not automatically create indexes for scalar fields. However, you can manually create indexes on scalar fields that are commonly used for filtering to accelerate searches and queries.

You can skip creating indexes during collection creation and add indexes later. For details, see Manage Indexes.

Function & analyzer

An analyzer is used in full-text search to tokenize and normalize raw text. It breaks input text into individual, searchable terms and removes irrelevant elements like stop words and punctuation to improve search precision. For details, see Analyzer Overview.

A function is used in full-text search to convert tokenized terms generated by an analyzer into sparse vectors with relevance scores. It applies scoring algorithms like BM25 to generate weighted representations for indexing and document ranking.

To use functions, you need to add both SPARSE_FLOAT_VECTOR and VARCHAR fields in the schema. For details, see Full Text Search.
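To make the division of labor between the analyzer and the function concrete, here is a minimal, self-contained sketch. It is not how Zilliz Cloud implements full-text search internally: the stop-word list, the `analyze` and `bm25_sparse_vectors` names, the parameter defaults `k1=1.2` and `b=0.75`, and the use of term strings as sparse-vector keys are all assumptions chosen for readability.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list; real analyzers ship much larger ones.
STOP_WORDS = {"a", "an", "the", "is", "of", "and", "to", "in"}


def analyze(text):
    """Analyzer step: lowercase, strip punctuation, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def bm25_sparse_vectors(docs, k1=1.2, b=0.75):
    """Function step: turn analyzed terms into {term: BM25 weight} sparse vectors."""
    if not docs:
        return []
    tokenized = [analyze(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vec = {}
        for term, freq in term_freq.items():
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            norm = freq + k1 * (1 - b + b * len(tokens) / avg_len)
            vec[term] = idf * freq * (k1 + 1) / norm
        vectors.append(vec)
    return vectors


docs = ["The quick brown fox.", "A quick search of the index", "Vector search is fast"]
for doc, vec in zip(docs, bm25_sparse_vectors(docs)):
    print(doc, "->", {term: round(weight, 3) for term, weight in vec.items()})
```

Terms that appear in fewer documents receive higher weights, which is why the raw text goes in a VARCHAR field and the resulting weights go in a SPARSE_FLOAT_VECTOR field.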

Partition & partition key

Partition: A partition is a physical subset of a collection. A partition shares the same data schema with its parent collection but contains only a portion of the data in the collection. Each collection comes with one default partition. You can manually add more partitions for multi-tenancy and data management purposes. If no extra partition is created, all data inserted into a collection falls into the default partition. For details, see Manage Partitions.

Partition key: A partition key is a search optimization solution based on partitions. When you specify a non-primary key INT64 or VARCHAR field as the partition key, 16 partitions will be automatically created by Zilliz Cloud and all inserted entities will fall into these 16 auto-generated partitions based on their partition key values. Once partition key is enabled for a collection, you will not be able to manually create partitions in this collection. For details, see Use Partition Key.
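The routing idea behind a partition key can be sketched as follows. The hash function that Zilliz Cloud actually uses is not described in this guide, so `zlib.crc32` is a stand-in, and `partition_for` and `route` are hypothetical names. The point is only that equal key values always land in the same one of the 16 partitions.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 16  # number of auto-generated partitions stated in this guide


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map an INT64 or VARCHAR partition key value to a partition number.

    crc32 is an assumption for illustration, not the real hash function.
    """
    data = str(key).encode("utf-8")
    return zlib.crc32(data) % num_partitions


def route(entities, key_field):
    """Group entities by the partition their key value maps to."""
    buckets = defaultdict(list)
    for entity in entities:
        buckets[partition_for(entity[key_field])].append(entity)
    return dict(buckets)


entities = [
    {"id": 1, "tenant": "acme"},
    {"id": 2, "tenant": "globex"},
    {"id": 3, "tenant": "acme"},
]
for partition, rows in sorted(route(entities, "tenant").items()):
    print(partition, [row["id"] for row in rows])
```

Because several key values can share a partition, this approach scales to far more tenants than there are partitions, but it does not give each tenant a physically separate partition.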

📘Notes

To decide whether you need to create partitions or use partition key, you can consider the following factors:

Multi-tenancy strategies: If you need to support millions of tenants, please use partition key. If you need strong physical data isolation between tenants, please use partitions. For details, refer to Implement Multi-tenancy.
Resource management: If you prefer creating and managing partitions on your own, use partitions. If you need partitions to be created and managed automatically, use partition key.
Hot and cold data management: If you need efficient handling of hot and cold data, please use partition key. To use partition key for hot and cold data management in Dedicated clusters, please contact us.

mmap

Memory mapping (mmap) is a memory usage optimization that enables direct access to large files on disk without loading them into memory. With mmap enabled, you can store more data at the same CU size. As indicated below, mmap is configured with recommended defaults based on your CU type and plan.

Free, Serverless, and Dedicated clusters with the extended-capacity CU type have mmap enabled by default. This setting is fixed and cannot be modified, so you may not see mmap configuration options during collection creation.
Dedicated clusters with the performance-optimized CU type have mmap disabled by default.
Dedicated clusters with the capacity-optimized CU type have mmap enabled by default.

For details about the cluster-level default mmap settings, see Use mmap.

During collection creation, you can optionally configure mmap settings at the collection or field level, depending on your use case. Settings at lower levels take precedence over higher levels: Field > Collection > Cluster.

Collection-level mmap: Enable mmap for raw data across the entire collection. This setting can be modified later, but requires releasing the collection first.
Field-level mmap: Enable mmap for raw data and scalar indexes of selected fields via custom settings. Generally, it is recommended to enable mmap for fields that hold large amounts of data and are not frequently filtered or queried. The setting applies only to the selected fields. It can be modified later, but you need to release the collection first.
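The precedence rule Field > Collection > Cluster can be expressed as a short resolver. This is a sketch only: `effective_mmap` and `CLUSTER_DEFAULTS` are hypothetical names, and `None` is used here to mean "no setting at this level".

```python
from typing import Optional

# Cluster-level defaults for Dedicated clusters, as listed in this guide.
CLUSTER_DEFAULTS = {
    "performance-optimized": False,
    "capacity-optimized": True,
    "extended-capacity": True,  # fixed; not configurable during collection creation
}


def effective_mmap(
    cluster_default: bool,
    collection_setting: Optional[bool] = None,
    field_setting: Optional[bool] = None,
) -> bool:
    """Resolve the mmap setting for one field: Field > Collection > Cluster."""
    if field_setting is not None:
        return field_setting
    if collection_setting is not None:
        return collection_setting
    return cluster_default


default = CLUSTER_DEFAULTS["performance-optimized"]
print(effective_mmap(default))                                 # cluster default applies
print(effective_mmap(default, collection_setting=True))        # collection overrides cluster
print(effective_mmap(default, True, field_setting=False))      # field overrides collection
```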

📘Notes

Please be cautious with mmap settings. Changing the default mmap settings may cause performance degradation or load failures due to out-of-memory (OOM) issues. For best practices, see Use mmap.

The demo below shows the entrance of this feature on the Zilliz Cloud web console.

Shard

A shard is a horizontal slice of a collection that corresponds to a data input channel. Every collection comes with one shard by default. You can add more shards to increase write throughput.

As a general guideline, consider adding 1 shard for every 100 million rows of data. The maximum number of shards allowed depends on the cluster plan and cluster CU size. For details, see Zilliz Cloud Limits.
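As a quick worked example of this guideline, the helper below rounds the expected row count up to the next 100 million and caps the result. `suggested_shards` is a hypothetical name, and `max_shards` is a value you would look up in Zilliz Cloud Limits for your plan and CU size; the figure used in the calls below is made up for illustration.

```python
ROWS_PER_SHARD = 100_000_000  # guideline from this guide: 1 shard per 100 million rows


def suggested_shards(expected_rows: int, max_shards: int) -> int:
    """Estimate a shard count from the expected row count, capped at max_shards."""
    if expected_rows < 0:
        raise ValueError("expected_rows must not be negative")
    if max_shards < 1:
        raise ValueError("max_shards must be at least 1")
    # Integer ceiling division; every collection has at least one shard.
    wanted = max(1, (expected_rows + ROWS_PER_SHARD - 1) // ROWS_PER_SHARD)
    return min(wanted, max_shards)


# max_shards=8 is an illustrative cap, not a documented limit.
print(suggested_shards(250_000_000, max_shards=8))
print(suggested_shards(2_000_000_000, max_shards=8))
```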

After a collection is created, you can change its number of shards only by cloning the collection with new shard settings.

Full text search

The Zilliz Cloud console supports configuring the functions and analyzer to use in a full text search. For details about full text search, see Full Text Search.

The demo below shows the entrance of this feature on the Zilliz Cloud web console.

Text Match

The Zilliz Cloud console also supports configuring the field and analyzer for text match. For details about text match, see Text Match.

The demo below shows the entrance of this feature on the Zilliz Cloud web console.