Scale Cluster
As your workload grows and more data is written, the cluster may reach its capacity limit. In such cases, read operations will continue to function, but new write operations may fail.
To proactively manage this, you can monitor CU Capacity on the metrics page to determine when scaling is needed. Based on your business needs and patterns, you can increase the number of query CUs to expand cluster capacity or reduce it when demand decreases to save on costs.
This guide explains how to resize a cluster to suit your changing workload.
For clusters with 1 - 8 CUs, you can directly scale query CU. For clusters with more than 8 CUs, please increase replicas.
Considerations
-
Plan Availability: Only supported for Dedicated clusters.
-
Resource Limitations:
-
Scale up
-
Dedicated (Standard) clusters: Up to 32 CUs
Dedicated (Enterprise) clusters: Up to 256 CUs
-
The product of Number of Query CU × Replica count must not exceed 256
For larger query CU, contact sales.
-
-
Scale down
-
Clusters with replicas cannot scale down to less than 8 CUs
-
A scale-down request only succeeds if:
-
Current data volume < 80% of the CU capacity of the new CU size.
-
Current number of collections and partitions < the maximum number of collections and partitions allowed in the new CU size.
-
-
-
-
During Scaling: The cluster status changes to “Modifying,” during which no operations can be performed. If multiple scaling tasks are triggered, they will be processed sequentially based on trigger timestamp. Completion time depends on data volume.
-
Performance Impact: Scaling may cause slight service jitter.
-
Backup Limitations: Dynamic and scheduled scaling settings are not included in backups. After restoring a cluster, reconfigure these settings manually.
Manual scaling
You can manually scale your cluster up or down via the Zilliz Cloud console or RESTful API. Note that scheduled scaling is only available on the web console.
Via web console
The following demo shows how to manually scale up and down a cluster on the Zilliz Cloud web console.
Via RESTful API
The following example scales an existing cluster to 2 CU. For details, see Modify Cluster.
curl --request POST \
--url "${BASE_URL}/v2/clusters/${CLUSTER_ID}/modify" \
--header "Authorization: Bearer ${TOKEN}" \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
-d '{
"cuSize": 2
}'
The following is an example output.
{
"code": 0,
"data": {
"clusterId": "inxx-xxxxxxxxxxxxxxx",
"prompt": "successfully submitted. Cluster is being upgraded, which is expected to take several minutes. You can access data about the creation progress and status of your cluster by DescribeCluster API. Once the cluster status is RUNNING, you may access your vector database using the SDK."
}
}