Cluster (Public Preview)
A cluster is a set of compute resources that runs your vector database workloads. Zilliz Cloud offers two types: serving clusters, which run continuously for production workloads requiring always-on, low-latency access, and on-demand clusters, which spin up only when requests arrive and scale to zero when idle.
This topic describes how to create an on-demand cluster.
This feature is only available to Enterprise projects.
Currently, you can only create an on-demand cluster in AWS us-west-2. For other regions, contact us.
Limitations
- To manage an on-demand cluster, you need to be a Project Admin.
- You can only create up to 20 on-demand clusters in each project.
- An on-demand cluster can query up to 3 TB of raw data for every 8 CUs. Queries that exceed this limit will return an error.
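The data-volume limit is simple arithmetic, and it can help to work it out before choosing a CU size. The following shell sketch assumes the 3 TB per 8 CUs ratio applies proportionally across the 8 to 256 CU range stated in this topic. The function names `max_raw_data_tb` and `min_cu_for_tb` are hypothetical helpers, not part of any Zilliz Cloud tooling.

```shell
# Hypothetical helpers; assumes the limit scales proportionally (3 TB per 8 CUs).

# Print the maximum raw data volume (in TB) that a cluster of $1 CUs can query.
max_raw_data_tb() {
  echo $(( $1 / 8 * 3 ))
}

# Print the smallest CU size (a multiple of 8, between 8 and 256) that can
# query $1 TB of raw data. $1 must be a whole number of TB.
min_cu_for_tb() {
  # Round up to whole 3 TB blocks, then allocate 8 CUs per block.
  min_cu=$(( ($1 + 2) / 3 * 8 ))
  # Even an empty data set needs the minimum size of 8 CUs.
  if [ "$min_cu" -lt 8 ]; then min_cu=8; fi
  if [ "$min_cu" -gt 256 ]; then
    echo "data volume exceeds what 256 CUs can query" >&2
    return 1
  fi
  echo "$min_cu"
}
```

For example, `max_raw_data_tb 16` prints 6, and `min_cu_for_tb 10` prints 32, because 24 CUs would cover only 9 TB.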
Create an on-demand cluster
- Via RESTful API

export BASE_URL="https://api.cloud.zilliz.com"
export TOKEN="YOUR_API_KEY"

curl --request POST \
  --url "${BASE_URL}/v2/clusters/createOnDemandCluster" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "Accept: application/json" \
  --header "Content-Type: application/json" \
  --data-raw '{
    "projectId": "proj-09ee1f4b1151d5dd1edbc5",
    "regionId": "aws-us-west-2",
    "clusterName": "my-on-demand",
    "cuSize": 8,
    "autoSuspend": 120
  }'

# {
#     "code": 0,
#     "data": {
#         "clusterId": "inxx-xxxxxxxxxxxxxxx",
#         "regionId": "aws-us-west-2",
#         "projectId": "proj-09ee1f4b1151d5dd1edbc5"
#     }
# }

The following table describes the parameters.
| Parameter | Description |
| --- | --- |
| projectId | ID of the project where the on-demand cluster will be created. |
| regionId | Region where the cluster is deployed. Must match the project’s region. |
| cuSize | The number of query CUs to allocate. The cluster automatically scales between zero and this value based on workload: it spins up to the specified CU size when requests arrive and scales back to zero when idle. The minimum is 8 CU, the maximum is 256 CU, and sizes increase in increments of 8 (for example, 8, 16, and 24). Clusters with more than 8 CU require a payment method. Setting this to 8 enables searches across data up to 3 TB. To increase the data volume, increase the CU size. This value is fixed after creation and cannot be changed. |
| clusterName | Name of the cluster to create. |
| autoSuspend | Idle timeout before the cluster auto-suspends. When no requests are received within this period, the cluster suspends to stop incurring compute costs. Value type: Integer. Unit: Seconds. Minimum: 60. Default: 60. |
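Because `cuSize` cannot be changed after creation, it can be worth checking the values locally before sending the create request. The following is a minimal sketch of such a check, based only on the constraints listed in the parameter table above. `validate_on_demand_params` is a hypothetical helper, not part of the Zilliz Cloud API, and the server remains the authority on what it accepts.

```shell
# Hypothetical pre-flight check; mirrors only the constraints documented above.
# Usage: validate_on_demand_params <cuSize> <autoSuspend-seconds>
# Returns 0 when both values look valid, 1 otherwise.
validate_on_demand_params() {
  # Reject empty values, non-digits, and (for cuSize) leading zeros.
  case "$1" in
    ''|0*|*[!0-9]*) echo "cuSize must be a positive integer" >&2; return 1 ;;
  esac
  case "$2" in
    ''|*[!0-9]*) echo "autoSuspend must be a non-negative integer" >&2; return 1 ;;
  esac
  # cuSize: 8 to 256, in increments of 8.
  if [ "$1" -lt 8 ] || [ "$1" -gt 256 ] || [ $(( $1 % 8 )) -ne 0 ]; then
    echo "cuSize must be a multiple of 8 between 8 and 256" >&2
    return 1
  fi
  # autoSuspend: at least 60 seconds.
  if [ "$2" -lt 60 ]; then
    echo "autoSuspend must be at least 60 seconds" >&2
    return 1
  fi
  return 0
}

# Example (not executed here): run the check, then send the create request.
# validate_on_demand_params 8 120 && curl --request POST ...
```

The check deliberately ignores rules it cannot verify locally, such as whether the project has a payment method for sizes above 8 CU.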
- Via web console

The following steps show how to create an on-demand cluster on the web console.

1. Click On-Demand Compute > Clusters.
2. Click + Cluster.
3. Configure cluster settings.

The following table explains the parameters.

| Parameter | Description |
| --- | --- |
| Cluster Name | The name of the cluster to create. |
| Query CU | The number of query CUs to allocate. The cluster automatically scales between zero and this value based on workload: it spins up to the specified CU size when requests arrive and scales back to zero when idle. The minimum is 8 CU, the maximum is 256 CU, and sizes increase in increments of 8 (for example, 8, 16, and 24). Clusters with more than 8 CU require a payment method. This value is fixed after creation and cannot be changed. |
| Auto suspend | The idle time (in seconds) before the cluster auto-suspends. The default is 60 seconds (1 minute). When no requests are received within this period, the cluster suspends to stop incurring compute costs. |

4. Click Create.
View all on-demand clusters
- Via RESTful API

You can list all on-demand clusters as follows:

export BASE_URL="https://api.cloud.zilliz.com"
export TOKEN="YOUR_API_KEY"
export PROJECT_ID="YOUR_PROJECT_ID"

curl --request GET \
  --url "${BASE_URL}/v2/clusters/onDemandClusters?projectId=${PROJECT_ID}&regionId=aws-us-west-2" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "Accept: application/json"

The following is an example output.

{
    "code": 0,
    "data": {
        "count": 2,
        "onDemandClusters": [
            {
                "clusterId": "inxx-xxxxxxxxxxxxxxx",
                "clusterName": "xxx",
                "regionId": "aws-us-west-2",
                "cuSize": 8,
                "status": "SUSPENDED",
                "endpoint": "https://proj-09ee1f4b1151d5dd1edbc5.aws-us-west-2.vectordb-uat3.zillizcloud.com",
                "privateLink": "",
                "createdBy": "john.doe@zilliz.com",
                "createTime": 1745396115000
            }
        ]
    }
}

- Via web console

Check the details of an on-demand cluster
- Via RESTful API

You can describe an on-demand cluster as follows:

export BASE_URL="https://api.cloud.zilliz.com"
export TOKEN="YOUR_API_KEY"

curl --request GET \
  --url "${BASE_URL}/v2/clusters/onDemandClusters/inxx-xxxxxxxxxxxxxxx" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "Accept: application/json"

The following is an example output.

{
    "code": 0,
    "data": {
        "clusterId": "inxx-xxxxxxxxxxxxxxx",
        "clusterName": "xxx",
        "regionId": "aws-us-west-2",
        "cuSize": 8,
        "status": "RUNNING",
        "endpoint": "https://proj-09ee1f4b1151d5dd1edbc5.aws-us-west-2.vectordb-uat3.zillizcloud.com",
        "privateLink": "",
        "createdBy": "john.doe@zilliz.com",
        "createTime": 1745396115000
    }
}

- Via web console

Drop an on-demand cluster
Dropping a cluster removes it immediately and permanently. This action cannot be undone.

- Via RESTful API

You can drop an on-demand cluster as follows:

export BASE_URL="https://api.cloud.zilliz.com"
export TOKEN="YOUR_API_KEY"

curl --request DELETE \
  --url "${BASE_URL}/v2/clusters/onDemandClusters/inxx-xxxxxxxxxxxxxxx" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "Accept: application/json"

The following is an example output.

{
    "code": 0,
    "data": {
        "clusterId": "inxx-xxxxxxxxxxxxxxx",
        "status": "DELETING"
    }
}

- Via web console
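Since a dropped cluster cannot be recovered, scripts that send the DELETE request may benefit from an explicit confirmation step. The following is one possible sketch, not an official safeguard: `confirm_drop` is a hypothetical helper that asks the operator to retype the cluster ID and succeeds only on an exact match.

```shell
# Hypothetical guard; not part of the Zilliz Cloud API or any official CLI.
# Usage: confirm_drop <clusterId>
# Prompts on stderr, reads one line from stdin, and returns 0 only when the
# typed value matches the cluster ID exactly.
confirm_drop() {
  printf 'Type the cluster ID (%s) to confirm the drop: ' "$1" >&2
  IFS= read -r confirm_drop_answer || return 1
  [ "$confirm_drop_answer" = "$1" ]
}

# Example (not executed here): send the DELETE request only after confirmation.
# confirm_drop "inxx-xxxxxxxxxxxxxxx" && curl --request DELETE ...
```

Requiring the full ID rather than a yes/no answer makes it harder to drop the wrong cluster by reflex.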
