
Scale Cluster

As data grows, you may face constraints that impact data writing. For example, read operations remain functional, but inserting or upserting new data might fail when the cluster reaches its maximum capacity.

To address such issues, you can adjust the number of CUs to match fluctuations in workload or storage requirements. You can enhance your cluster's performance by scaling up CUs in response to increased CPU or memory usage, and scaling down to reduce costs during periods of low demand.

This guide outlines the procedures for scaling a cluster.

📘Notes

This feature is available only to Dedicated clusters. Serverless clusters do not require manual configuration of CU resources, as they automatically scale based on workload.

Manual scaling​

You can scale a cluster manually by using the Zilliz Cloud web console or by making an API request. This guide focuses on how to manually scale a cluster using the web console. For more information about using the RESTful API, refer to Modify Cluster.
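
For reference, below is a minimal sketch of such a request in Python using the requests library. The base URL, endpoint path, and payload field (cuSize) are assumptions modeled on a typical Modify Cluster call, not a verified contract; check the Modify Cluster API reference for the exact path, parameters, and authentication scheme.

```python
# Minimal sketch of scaling a Dedicated cluster through the RESTful API.
# The base URL, endpoint path, and payload field below are assumptions; confirm
# them against the Modify Cluster API reference before use.
import requests

API_KEY = "YOUR_API_KEY"                               # API key with cluster-management permission
BASE_URL = "https://api.cloud.zilliz.com/v2/clusters"  # assumed base URL

def scale_cluster(cluster_id: str, cu_size: int) -> dict:
    """Request a new CU size for the given cluster (hypothetical payload)."""
    resp = requests.post(
        f"{BASE_URL}/{cluster_id}/modify",             # assumed Modify Cluster path
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"cuSize": cu_size},                      # assumed field name
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# Example: scale the cluster to 16 CUs
# print(scale_cluster("<your-cluster-id>", 16))
```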

🚧Warning

Scaling may cause slight service jitter. Please exercise caution.

Scale up a cluster​


In the Scale Cluster dialog box, you can scale up the CU size allocated to the cluster. The cluster keeps the same type and cloud region as the original.

  • For Dedicated (Standard) clusters, you can scale up to a maximum of 32 CUs.

  • For Dedicated (Enterprise) clusters, you can scale up to a maximum of 256 CUs.

If you require a larger CU size, please create a support ticket.

📘Notes

The cluster CU size × replica count should not exceed 256. Otherwise, cluster scaling may fail.
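
As an illustration of this constraint only, the sketch below checks a target CU size against the replica count before a scale-up request; the helper and its inputs are hypothetical and not part of any Zilliz Cloud SDK.

```python
# Illustrative check of the scaling constraint: CU size x replica count <= 256.
# The helper is hypothetical and not part of any Zilliz Cloud SDK.
def can_scale(target_cu_size: int, replica_count: int, max_total: int = 256) -> bool:
    """Return True if the requested CU size is compatible with the replica count."""
    return target_cu_size * replica_count <= max_total

# Example: with 4 replicas, the cluster can scale up to at most 64 CUs.
assert can_scale(64, 4)
assert not can_scale(96, 4)
```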

Scale down a cluster​


In the Scale Cluster dialog box, select the desired CU size. Once you click Scale, Zilliz Cloud checks the cluster's data volume and number of collections. Scaling down is triggered only when both of the following conditions are met:

  • Current data volume < 80% of the capacity of the new CU size.

  • Current number of collections < the maximum number of collections allowed for the new CU size.

The time required to complete the process depends on the data volume in your cluster.
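
The two conditions can be expressed as a simple check, sketched below for illustration; the capacity and collection limits for a given CU size are placeholders that you would look up for your cluster tier.

```python
# Sketch of the two scale-down preconditions described above. The capacity and
# collection limits for a target CU size are placeholders; look up the actual
# figures for your cluster tier before relying on this check.
def can_scale_down(
    current_data_volume: float,     # current data volume, e.g. in GB
    new_cu_capacity: float,         # data capacity of the target CU size, same unit
    current_collections: int,       # current number of collections
    new_cu_max_collections: int,    # collection limit of the target CU size
) -> bool:
    """Return True only if both scale-down conditions are satisfied."""
    volume_ok = current_data_volume < 0.8 * new_cu_capacity
    collections_ok = current_collections < new_cu_max_collections
    return volume_ok and collections_ok
```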

📘Notes

To scale down the cluster CU size to fewer than 8 CUs, ensure that there are no replicas in the cluster.

Auto-scaling (Private Preview)​

📘Notes
  • Auto-scaling is currently in private preview and is available only to Dedicated (Enterprise) clusters. To use this feature, please create a support ticket.

  • Auto-scaling is disabled for clusters with replicas.

Auto-scaling is designed for businesses with rapidly changing needs. It prevents write restrictions caused by insufficient cluster CU capacity and reduces operational burden, minimizing disruptions to business operations.

After enabling this feature, you can configure auto-scaling options when a cluster is successfully created.


In the dialog box, you can set the following configurations:

  • Maximum CU Size: The maximum CU size to which a cluster can automatically scale up. For CU sizes below 8, the increment is 2 CUs, resulting in the sequence 1, 2, 4, 6, and 8 CUs. For CU sizes of 8 and above, the increment is 4 CUs, resulting in the sequence 8, 12, 16, 20, 24, 28, 32, and so on.

    📘Notes

    Downward auto-scaling is not currently supported.

  • CU Capacity Threshold: Zilliz Cloud checks the cluster's CU capacity usage every minute. If usage has exceeded the specified threshold (set to 70% by default) at every sampling point over the past 2 minutes, a scaling process is automatically initiated.

    📘Notes

    It is not recommended to set the threshold too high (above 90%). When the data insertion rate is high, the cluster may not complete auto-scaling in time, leading to write prohibitions.

There is a cooldown period of 10 minutes between two automatic scaling events. The time it takes to complete the auto-scaling process varies based on the data volume in the cluster.
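
To make the trigger behavior concrete, here is a simplified sketch of the decision logic described above: sample CU capacity usage once per minute, scale up only if every sample in the past 2 minutes exceeds the threshold, step the CU size according to the documented sequence, and respect the 10-minute cooldown. This is an illustration, not Zilliz Cloud's actual implementation, and all names are hypothetical.

```python
# Simplified illustration of the auto-scaling trigger described above.
# Not the actual Zilliz Cloud implementation; timings and names are for clarity only.
from collections import deque

SAMPLE_WINDOW = 2      # consecutive 1-minute samples that must exceed the threshold
COOLDOWN_MINUTES = 10  # minimum gap between two automatic scaling events

def next_cu_size(current: int) -> int:
    """Step up the CU size: 1 -> 2, then +2 CUs below 8 (2, 4, 6, 8), +4 CUs from 8 upward."""
    if current < 2:
        return 2
    return current + 2 if current < 8 else current + 4

def should_scale_up(samples, threshold: float, minutes_since_last_scale: int) -> bool:
    """Trigger only when every recent sample exceeds the threshold and the cooldown has passed."""
    if minutes_since_last_scale < COOLDOWN_MINUTES or len(samples) < SAMPLE_WINDOW:
        return False
    return all(usage > threshold for usage in samples)

# Example: usage sampled at 75% and 82% over the past 2 minutes, threshold 70%
recent = deque([0.75, 0.82], maxlen=SAMPLE_WINDOW)
if should_scale_up(recent, threshold=0.70, minutes_since_last_scale=15):
    print("scale up to", next_cu_size(8), "CUs")  # prints: scale up to 12 CUs
```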

🚧Warning

During the scaling process, slight service jitter may occur, but it does not affect read and write operations. High write rates can sometimes cause CU capacity to hit 100%, resulting in write prohibitions.

Increase QPS​

To boost QPS and query throughput, consider adding replicas. For more information, refer to Manage Replica.