
Scale Cluster

As your workload grows and more data is written, the cluster may reach its capacity limit. In such cases, read operations will continue to function, but new write operations may fail.

To proactively manage this, you can monitor CU Capacity on the metrics page to determine when scaling is needed. Based on your business needs and patterns, you can increase the number of query CUs to expand cluster capacity or reduce it when demand decreases to save on costs.

This guide explains how to resize a cluster to suit your changing workload.

📘Notes

For clusters with 1 - 8 CUs, you can directly scale query CU. For clusters with more than 8 CUs, please increase replicas.

Scaling Methods in Zilliz Cloud

Zilliz Cloud offers several ways to scale your cluster:

  • Manual Scaling: Instantly adjust the number of query CU. Ideal if you have a clear understanding of your workload patterns.

    • When you choose manual scaling, you can further enable scheduled scaling to adjust query CU resources based on a predefined time schedule. Scheduled scaling is perfect for recurring workload patterns, such as peaks during business days and lower demand on weekends, or scenarios where your future workload is stable and predictable.
  • Dynamic Scaling: Automatically adjusts the cluster query CU resources within a user-defined min–max range based on real-time metrics. Best for unpredictable workloads that may spike or dip throughout the day.

Considerations

  • Plan Availability: Only supported for Dedicated clusters.

  • Resource Limitations:

    • Scale up

      • Dedicated (Standard) clusters: Up to 32 CUs

      • Dedicated (Enterprise) clusters: Up to 256 CUs

      • The product of Number of Query CU × Replica count must not exceed 256

      For larger query CU, contact sales.

    • Scale down

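      • Clusters with replicas cannot scale down to fewer than 8 CUs

      • A scale-down request only succeeds if both of the following conditions are met:

        • The current data volume is less than 80% of the new CU size's capacity.

        • The number of collections and partitions is within the limit allowed by the new CU size.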

  • During Scaling: The cluster status changes to “Modifying,” during which no operations can be performed. If multiple scaling tasks are triggered, they will be processed sequentially based on trigger timestamp. Completion time depends on data volume.

  • Performance Impact: Scaling may cause slight service jitter.

  • Backup Limitations: Dynamic and scheduled scaling settings are not included in backups. After restoring a cluster, reconfigure these settings manually.
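The scale-up limits listed under Resource Limitations can be checked before a request is submitted. The following is a minimal sketch, not part of any Zilliz Cloud SDK: the function name `check_scale_up` and the plan keys are hypothetical, and it assumes that the per-plan cap applies to the query CU value while the 256 limit applies to query CU × replica count.

```python
from typing import List

# Limits taken from the Considerations list above.
PLAN_MAX_CU = {"standard": 32, "enterprise": 256}  # Dedicated plans only
MAX_CU_TIMES_REPLICAS = 256


def check_scale_up(plan: str, query_cu: int, replicas: int = 1) -> List[str]:
    """Return a list of limit violations; an empty list means the request fits."""
    if plan not in PLAN_MAX_CU:
        raise ValueError(f"unknown or unsupported plan: {plan!r}")

    problems = []
    # Assumption: the per-plan cap is compared against the query CU value.
    if query_cu > PLAN_MAX_CU[plan]:
        problems.append(
            f"{query_cu} CUs exceeds the {plan} limit of {PLAN_MAX_CU[plan]} CUs"
        )
    if query_cu * replicas > MAX_CU_TIMES_REPLICAS:
        problems.append(
            f"query CU x replicas = {query_cu * replicas} exceeds {MAX_CU_TIMES_REPLICAS}"
        )
    return problems
```

For example, `check_scale_up("enterprise", 128, 2)` returns an empty list because 128 × 2 equals 256, while three replicas of the same size would be rejected.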

Manual scaling

You can manually scale your cluster up or down via the Zilliz Cloud console or RESTful API. Note that scheduled scaling is only available on the web console.

Via web console

The following demo shows how to manually scale up and down a cluster on the Zilliz Cloud web console.

Via RESTful API

The following example scales an existing cluster to 2 CU. For details, see Modify Cluster.

curl --request POST \
--url "${BASE_URL}/v2/clusters/${CLUSTER_ID}/modify" \
--header "Authorization: Bearer ${TOKEN}" \
--header "Accept: application/json" \
--header "Content-Type: application/json" \
-d '{
"cuSize": 2
}'

The following is an example output.

{
"code": 0,
"data": {
"clusterId": "inxx-xxxxxxxxxxxxxxx",
"prompt": "successfully submitted. Cluster is being upgraded, which is expected to take several minutes. You can access data about the creation progress and status of your cluster by DescribeCluster API. Once the cluster status is RUNNING, you may access your vector database using the SDK."
}
}
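The same request can be issued from Python. The sketch below uses only the standard library and mirrors the curl call above (same path, headers, and `cuSize` body). The function names `build_modify_request` and `scale_cluster` are illustrative, and the values for the base URL, cluster ID, and token are assumed to come from your own environment.

```python
import json
import urllib.request


def build_modify_request(base_url: str, cluster_id: str, token: str,
                         cu_size: int) -> urllib.request.Request:
    """Build (but do not send) the POST request that resizes a cluster."""
    body = json.dumps({"cuSize": cu_size}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/v2/clusters/{cluster_id}/modify",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
    )


def scale_cluster(base_url: str, cluster_id: str, token: str, cu_size: int) -> dict:
    """Send the resize request and return the decoded JSON response."""
    request = build_modify_request(base_url, cluster_id, token, cu_size)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Separating request construction from sending makes it possible to inspect the URL and payload before any change is submitted to the cluster.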

Dynamic scaling

Zilliz Cloud supports dynamic scaling to help you maintain performance while eliminating manual intervention. When enabled, the system automatically adjusts the query CU resources based on the real-time CU capacity metric, ensuring your workload is served efficiently without service disruption.

When setting up dynamic scaling, you can configure the following bounds:

  • Minimum Query CU: Defaults to the current size.

  • Maximum Query CU: Defaults to 4× the current CU size.

📘Notes
  • Selecting a minimum or maximum query CU below the current value triggers an immediate scale-down.

  • Selecting a minimum query CU above the current value triggers an immediate scale-up.

Trigger conditions

  • Scale Up: Triggered when CU capacity exceeds 80% for 10 minutes, or immediately when CU capacity reaches 100%.

  • Scale Down: Triggered when CU capacity stays below 50% for 30 minutes.

  • A cooldown period of 10 minutes applies between scale-up events, and 30 minutes between scale-down events. Scaling down will execute on a size-by-size basis until the target metric value has been achieved.
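The trigger conditions above can be expressed as a small decision function. This is an illustrative sketch rather than the service's actual implementation: it assumes one CU capacity sample per minute (newest last), the name `scaling_decision` is hypothetical, and cooldown periods are deliberately left out.

```python
from typing import Optional, Sequence

SCALE_UP_THRESHOLD = 80.0    # percent
SCALE_UP_WINDOW = 10         # minutes above the threshold
SCALE_DOWN_THRESHOLD = 50.0  # percent
SCALE_DOWN_WINDOW = 30       # minutes below the threshold


def scaling_decision(samples: Sequence[float]) -> Optional[str]:
    """Return "up", "down", or None for per-minute CU capacity samples."""
    if not samples:
        return None

    # Reaching 100% triggers a scale-up immediately.
    if samples[-1] >= 100.0:
        return "up"

    # Sustained load above 80% for the last 10 minutes.
    recent_up = samples[-SCALE_UP_WINDOW:]
    if len(recent_up) == SCALE_UP_WINDOW and all(
        s > SCALE_UP_THRESHOLD for s in recent_up
    ):
        return "up"

    # Sustained load below 50% for the last 30 minutes.
    recent_down = samples[-SCALE_DOWN_WINDOW:]
    if len(recent_down) == SCALE_DOWN_WINDOW and all(
        s < SCALE_DOWN_THRESHOLD for s in recent_down
    ):
        return "down"

    return None
```

A single sample that breaks the pattern, such as one reading of 55% within an otherwise quiet 30 minutes, resets the scale-down condition in this sketch.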

Scaling size calculation

The following formula explains how Zilliz Cloud calculates the target number of query CU for a dynamic scaling event.

Target Query CU Number = Current Query CU Number × (Current Metric Value / Target Metric Value) 

| Variable Name | Description |
| --- | --- |
| Target Query CU Number | The new size the system aims to scale the cluster to. |
| Current Query CU Number | The current query CU number of the cluster. |
| Current Metric Value | The current measured value of the CU capacity metric. |
| Target Metric Value | Expected CU capacity value after scaling, which is 70. |

For example, if dynamic scaling is enabled and the following conditions are met:

  • Current Query CU Number: 60 CU

  • Cluster CU Capacity: Above 80% for 10 minutes

A dynamic scaling event will be triggered. The target query CU number is calculated as:

60 × (80 / 70) ≈ 68.57 CU

This value is then rounded up to the next available CU number, resulting in a new size of 72 CU.
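The calculation in this example can be reproduced with a few lines of Python. The sketch below is illustrative only: `target_query_cu` is a hypothetical helper, and the list of available CU sizes passed to it is an assumption chosen so that the example works, not the official list of sizes.

```python
from typing import Sequence, Tuple

TARGET_METRIC_VALUE = 70.0  # expected CU capacity (percent) after scaling


def target_query_cu(current_cu: int, current_metric: float,
                    available_sizes: Sequence[int]) -> Tuple[float, int]:
    """Return the raw formula result and the next available CU size at or above it."""
    raw = current_cu * (current_metric / TARGET_METRIC_VALUE)
    for size in sorted(available_sizes):
        if size >= raw:
            return raw, size
    raise ValueError("no available CU size is large enough for the target")


# Assumed, illustrative set of available sizes.
raw, size = target_query_cu(60, 80.0, [48, 60, 72, 84])
print(f"raw target: {raw:.2f} CU, rounded up to: {size} CU")
```

With a current size of 60 CU and a metric value of 80, the raw result is about 68.57 CU, and the next size in the assumed list is 72 CU, matching the example above.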

Procedures

The following demo shows how to configure dynamic auto-scaling on the Zilliz Cloud web console.

FAQ

  1. Which scaling option should I choose?

    The following is a quick tip to help you choose the right scaling method for your needs:

  • If you have a very clear understanding of your workload patterns, such as consistent daily peaks or planned batch import jobs, manual or scheduled scaling is the right option for you. If you need to adjust the query CU immediately, choose manual scaling. If you want the adjustment to recur at specific times, choose scheduled scaling.

  • If your workload is unpredictable and varies throughout the day or week, dynamic scaling is recommended. It adjusts the cluster size automatically within a range you define, helping to maintain performance while optimizing cost.

  2. When should I scale replicas and when should I scale query CU?

    We recommend that you:

    • Increase replica count when:

      • You need to handle high QPS (queries per second) and high availability.

      • Your workload consists of many concurrent search or query requests and you need to increase throughput.

      Tip: Each replica is an independent copy of the query CU resources and handles a subset of queries.

    • Increase query CU when:

      • You are working with large datasets or require more collections.

      • You are seeing high CPU or memory usage.

      Tip: Increasing CU size gives each query node more computing resources and capacity.

    • Suggestion: For clusters with 1 - 8 CUs, you can directly scale query CU. For clusters with more than 8 CUs, please increase replicas.

  3. What are the limitations when scaling down a cluster?

    Clusters with replicas cannot scale down to fewer than 8 CUs.

    A scale-down request will only succeed if both of the following conditions are met:

    • The current data volume is less than 80% of the new CU size's capacity.

    • The number of collections and partitions is within the limit allowed by the new CU size.
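The scale-down rules in this answer can be combined into a single pre-check. The following sketch is an assumption-laden illustration: the `ScaleDownRequest` fields and the `can_scale_down` function are hypothetical, the capacity and limits of the new CU size must be supplied by the caller, and collections and partitions are treated as having separate limits.

```python
from dataclasses import dataclass


@dataclass
class ScaleDownRequest:
    new_cu: int            # requested CU size after scaling down
    has_replicas: bool     # whether the cluster uses replicas
    data_volume: float     # current data volume (same unit as new_capacity)
    new_capacity: float    # capacity of the new CU size (caller-supplied)
    collections: int
    max_collections: int   # limit for the new CU size (caller-supplied)
    partitions: int
    max_partitions: int    # limit for the new CU size (caller-supplied)


def can_scale_down(req: ScaleDownRequest) -> bool:
    """Apply the scale-down limitations described above."""
    # Clusters with replicas cannot go below 8 CUs.
    if req.has_replicas and req.new_cu < 8:
        return False
    # Data volume must be less than 80% of the new size's capacity.
    if not req.data_volume < 0.8 * req.new_capacity:
        return False
    # Collections and partitions must fit within the new size's limits.
    if req.collections > req.max_collections:
        return False
    if req.partitions > req.max_partitions:
        return False
    return True
```

Running a check like this before calling the API avoids submitting a request that is expected to fail.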