Collection Explained
On Zilliz Cloud, you can create multiple collections to manage your data, and insert your data as entities into the collections. Collection and entity are similar to tables and records in relational databases. This page helps you to learn about the collection and related concepts.
Collection
A collection is a two-dimensional table with fixed columns and variant rows. Each column represents a field, and each row represents an entity.
The following chart shows a collection with eight columns and six entities.
Schema and Fields
When describing an object, we usually mention its attributes, such as size, weight, and position. You can use these attributes as fields in a collection. Each field has various constraining properties, such as the data type and the dimensionality of a vector field. You can form a collection schema by creating the fields and defining their order. For possible applicable data types, refer to Schema Explained.
You should include all schema-defined fields in the entities to insert. To make some of them optional, consider enabling dynamic field. For details, refer to Dynamic Field.
-
Making them nullable or setting default values
For details on how to make a field nullable or set the default value, refer to Nullable & Default.
-
Enabling dynamic field
For details on how to enable and use the dynamic field, refer to Dynamic Field.
Primary key and AutoId
Similar to the primary field in a relational database, a collection has a primary field to distinguish an entity from others. Each value in the primary field is globally unique and corresponds to one specific entity.
As shown in the above chart, the field named id serves as the primary field, and the first ID 0 corresponds to an entity titled The Mortality Rate of Coronavirus is Not Important. There will not be any other entity that has the primary field of 0.
A primary field accepts only integers or strings. When inserting entities, you should include the primary field values by default. However, if you have enabled AutoId upon collection creation, Zilliz Cloud will generate those values upon data insertion. In such a case, exclude the primary field values from the entities to be inserted.
For more information, please refer to Primary Field & AutoId.
Index
Creating indexes on specific fields improves search efficiency. You are advised to create indexes for all the fields your service relies on, among which indexes on vector fields are mandatory.
Unlike in Milvus, AUTOINDEX is the only applicable index type to the vector fields in collections on Zilliz Cloud. For more details, refer to AUTOINDEX Explained.
Entity
Entities are data records that share the same set of fields in a collection. The values in all fields of the same row comprise an entity.
You can insert as many entities as you need into a collection. However, as the number of entities mounts, the memory size it takes also increases, affecting search performance.
For more information, refer to Schema Explained.
Load and Release
Loading a collection is the prerequisite to conducting similarity searches and queries in collections. When you load a collection, Zilliz Cloud loads all index files and the raw data in each field into memory for fast response to searches and queries.
Searches and queries are memory-intensive operations. To save the cost, you are advised to release the collections that are currently not in use.
For more details, refer to Load & Release.
Search and Query
Once you create indexes and load the collection, you can start a similarity search by feeding one or several query vectors. For example, when receiving the vector representation of your query carried in a search request, Zilliz Cloud uses the specified metric type to measure the similarity between the query vector and those in the target collection before returning those that are semantically similar to the query.
You can also include metadata filtering within searches and queries to improve the relevancy of the results. Note that, metadata filtering conditions are mandatory in queries but optional in searches.
For details on applicable metric types, refer to Metric Types.
For more information about searches and queries, refer to the articles in the Search & Rerank chapter, among which, basic features are:
In addition, Zilliz Cloud also provides enhancements to improve search performance and efficiency. They are disabled by default, and you can enable and use them according to your service requirements. They are
Partition
Partitions are subsets of a collection, which share the same field set with its parent collection, each containing a subset of entities.
By allocating entities into different partitions, you can create entity groups. You can conduct searches and queries in specific partitions to have Zilliz Cloud ignore entities in other partitions, and improve search efficiency.
For details, refer to Manage Partitions.
Shard
Shards are horizontal slices of a collection. Each shard corresponds to a data input channel. Every collection has a shard by default. You can set the appropriate number of shards when creating a collection based on the expected throughput and the volume of the data to insert into the collection.
For details on how to set the shard number, refer to Create Collection.
Alias
You can create aliases for your collections. A collection can have several aliases, but collections cannot share an alias. Upon receiving a request against a collection, Zilliz Cloud locates the collection based on the provided name. If the collection by the provided name does not exist, Zilliz Cloud continues locating the provided name as an alias. You can use collection aliases to adapt your code to different scenarios.
For more details, refer to Manage Aliases.
Function
You can set functions for Zilliz Cloud to derive fields upon collection creation. For example, the full-text search function uses the user-defined function to derive a sparse vector field from a specific varchar field. For more information on full-text search, refer to Full Text Search.
Consistency Level
Distributed database systems usually use the consistency level to define the data sameness across data nodes and replicas. You can set separate consistency levels when you create a collection or conduct similarity searches within the collection. The applicable consistency levels are Strong, Bounded Staleness, Session, and Eventually.
For details on these consistency levels, refer to Consistency Level.
Limits
The following table lists the limits on the maximum number of collections you can create when using different cluster types.
Cluster Type | Max Number | Remarks |
---|---|---|
Free cluster | 5 | You can create up to 5 collections. |
Serverless cluster | 100 | You can create up to 100 collections. |
Dedicated cluster | 64 per CU, and <= 4096 | You can create up to 64 collections per CU used in a dedicated cluster and no more than 4,096 collections in the cluster. |
In addition to the limits on the number of collections per cluster, Zilliz Cloud also applies limits on consumed capacity. The following table lists the limits on the general capacity of a cluster.
Number of CUs | General Capacity |
---|---|
1-8 CUs | <= 4,096 |
12+ CUs | Min(512 x Number of CUs, 65536) |
For details on the calculation of general and consumed capacity, refer to Zilliz Cloud Limits.