
Scale Query CU

As your workload grows and more data is written, the cluster may reach its capacity limit. In such cases, read operations will continue to function, but new write operations may fail.

To proactively manage this, you can monitor Query CU Capacity on the metrics page to determine when query CU scaling is needed. Based on your business needs and patterns, you can increase the number of query CUs to expand cluster capacity or reduce it when demand decreases to save on costs.

Note that you can scale query CUs directly for clusters with 1 to 8 CUs. For clusters with more than 8 CUs, increase the number of replicas.

This guide explains how to resize a cluster to suit your changing workload.

📘Notes

This feature is available only to Dedicated clusters.

Considerations

  • Resource Limitations:

    • Scale up

      • Dedicated (Standard) clusters: Up to 32 CUs

      • Dedicated (Enterprise) clusters: Up to 256 CUs

      • The number of query CUs multiplied by the replica count must not exceed 256.

      To use more query CUs than these limits allow, contact sales.

    • Scale down

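      • Clusters with replicas cannot scale down to fewer than 8 CUs.

      • A scale-down request only succeeds if both of the following conditions are met:

        • The current data volume is less than 80% of the new CU size's capacity.

        • The number of collections and partitions is within the limit allowed by the new CU size.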

  • During Scaling: The cluster status changes to "Modifying", and no operations can be performed on the cluster until scaling completes. If multiple scaling tasks are triggered, they are processed sequentially in the order they were triggered. Completion time depends on data volume.

  • Performance Impact: Scaling may cause slight service jitter.

  • Backup Limitations: Dynamic and scheduled scaling settings are not included in backups. After restoring a cluster, reconfigure these settings manually.
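Before sending a scale-up request, it can help to check the target size against the limits listed above. The sketch below is a hypothetical helper (`check_scale_up` is not part of any Zilliz Cloud tool); it only encodes the documented per-tier maximums and the rule that query CUs × replicas must not exceed 256, and it assumes you already know your cluster tier and replica count.

```shell
# Hypothetical helper: check a proposed scale-up against the documented limits.
# Usage: check_scale_up <standard|enterprise> <target_query_cu> <replicas>
check_scale_up() {
  local tier="$1" target_cu="$2" replicas="$3" max_cu

  case "$tier" in
    standard)   max_cu=32 ;;   # Dedicated (Standard): up to 32 CUs
    enterprise) max_cu=256 ;;  # Dedicated (Enterprise): up to 256 CUs
    *) echo "unknown tier: $tier"; return 2 ;;
  esac

  if [ "$target_cu" -gt "$max_cu" ]; then
    echo "rejected: $target_cu CUs is above the $tier limit of $max_cu (contact sales)"
    return 1
  fi

  # Query CUs multiplied by the replica count must not exceed 256.
  if [ $((target_cu * replicas)) -gt 256 ]; then
    echo "rejected: $target_cu x $replicas = $((target_cu * replicas)) exceeds 256"
    return 1
  fi

  echo "ok"
}
```

For example, `check_scale_up enterprise 128 2` passes because 128 × 2 equals 256, while `check_scale_up enterprise 64 8` is rejected because 64 × 8 = 512. The console remains the authoritative check.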

Manual scaling

You can manually scale your cluster up or down via the Zilliz Cloud console or RESTful API.

The following demo shows how to manually scale up and down a cluster on the Zilliz Cloud web console.

You can also scale query CUs manually using the RESTful API.

The following example scales an existing cluster to 2 query CUs. For details, see Modify Cluster.

export BASE_URL="https://api.cloud.zilliz.com"
export TOKEN="YOUR_API_KEY"
export CLUSTER_ID="inxx-xxxxxxxxxxxxxxx"

# Set the cluster size to 2 query CUs
curl --request POST \
  --url "${BASE_URL}/v2/clusters/${CLUSTER_ID}/modify" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "Accept: application/json" \
  --header "Content-Type: application/json" \
  -d '{
    "cuSize": 2
  }'

Scheduled scaling

📘Notes

This feature is available only to Dedicated clusters in Enterprise projects.

The interval between schedules should be greater than 30 minutes.

For details about how to use the advanced mode to write cron expressions, see Cron Expression.

You can also enable scheduled scaling using the RESTful API. For details, see Modify Cluster.

export BASE_URL="https://api.cloud.zilliz.com"
export TOKEN="YOUR_API_KEY"
export CLUSTER_ID="inxx-xxxxxxxxxxxxxxx"

# Add a scaling schedule: at the time given by "cron", scale to "target" query CUs
curl --request POST \
  --url "${BASE_URL}/v2/clusters/${CLUSTER_ID}/modify" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "Accept: application/json" \
  --header "Content-Type: application/json" \
  -d '{
    "autoscaling": {
      "cu": {
        "schedules": [
          {
            "cron": "10 0 0 0 0 ?",
            "target": 2
          }
        ]
      }
    }
  }'
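Because schedules must be more than 30 minutes apart, a quick local check of the planned times can catch conflicts before you submit them. The sketch below is hypothetical (`check_schedule_gaps` is not a Zilliz Cloud tool) and assumes the simplest case: schedules that repeat daily, given as 24-hour `HH:MM` times. It does not parse cron expressions, so schedules on other periods need a different check.

```shell
# Hypothetical helper: verify that daily schedule times (HH:MM, 24-hour clock)
# are more than 30 minutes apart, including the gap across midnight.
# Usage: check_schedule_gaps 08:00 20:00
check_schedule_gaps() {
  printf '%s\n' "$@" | awk -F: '
    { t[NR] = $1 * 60 + $2 }           # HH:MM -> minutes after midnight
    END {
      n = NR
      # insertion sort of the schedule times
      for (i = 2; i <= n; i++) {
        v = t[i]; j = i - 1
        while (j >= 1 && t[j] > v) { t[j + 1] = t[j]; j-- }
        t[j + 1] = v
      }
      # gaps between consecutive schedules on the same day
      for (i = 2; i <= n; i++)
        if (t[i] - t[i - 1] <= 30) { print "too close"; exit 1 }
      # gap from the last schedule of the day to the first one of the next day
      if (n > 1 && t[1] + 1440 - t[n] <= 30) { print "too close"; exit 1 }
      print "ok"
    }'
}
```

For example, `check_schedule_gaps 23:50 00:10` reports "too close" because the two runs are only 20 minutes apart across midnight.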

Dynamic scaling

📘Notes

This feature is available only to Dedicated clusters in Enterprise projects.

Zilliz Cloud supports dynamic scaling to help you maintain performance while eliminating manual intervention. When enabled, the system automatically adjusts the query CU resources based on the real-time CU capacity metric, ensuring your workload is served efficiently without service disruption.

When setting up dynamic scaling, you can configure the following bounds:

  • Minimum Query CU: Defaults to the current size.

  • Maximum Query CU: Defaults to 4× the current CU size.

📘Notes
  • Selecting a maximum query CU below the current value triggers an immediate scale-down.

  • Selecting a minimum query CU above the current value triggers an immediate scale-up.
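The default bounds and the two notes above can be summarized in a few lines. The sketch below is illustrative only: `dynamic_scaling_plan` is a hypothetical name, and it reports only whether an immediate resize happens, not the size the cluster lands on, since the target of that resize is not specified here.

```shell
# Hypothetical helper: apply the documented defaults for dynamic scaling bounds
# and report whether the chosen bounds trigger an immediate resize.
# Usage: dynamic_scaling_plan <current_cu> [min_cu] [max_cu]
dynamic_scaling_plan() {
  local current="$1"
  local min="${2:-$current}"            # minimum defaults to the current size
  local max="${3:-$((current * 4))}"    # maximum defaults to 4x the current size
  local action

  if [ "$max" -lt "$current" ]; then
    action="immediate scale-down"       # maximum below current size
  elif [ "$min" -gt "$current" ]; then
    action="immediate scale-up"         # minimum above current size
  else
    action="no immediate change"
  fi
  echo "min=$min max=$max: $action"
}
```

For a 4-CU cluster, `dynamic_scaling_plan 4` prints `min=4 max=16: no immediate change`.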

Trigger conditions

  • Scale Up: Triggered when CU capacity stays above 80% for 10 minutes, or immediately when CU capacity reaches 100%.

  • Scale Down: Triggered when CU capacity stays below 60% for 30 minutes.

  • Cooldown: A cooldown period of 10 minutes applies between scale-up events, and 30 minutes between scale-down events. Scale-downs proceed one CU size at a time until the target metric value is reached.
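The trigger rules above can be restated as a small decision function, which is handy when reasoning about your own metrics. This is an illustration of the documented thresholds, not Zilliz Cloud's internal implementation. `scaling_trigger` is a hypothetical name, it takes integer percentages, and it assumes a single duration argument meaning "minutes the capacity has stayed on its current side of the threshold". Cooldown periods are not modeled.

```shell
# Illustrative only: encodes the documented trigger thresholds.
# Usage: scaling_trigger <capacity_pct> <minutes_sustained>
#   capacity_pct       current CU capacity in percent (integer)
#   minutes_sustained  minutes the capacity has stayed above 80% or below 60%
scaling_trigger() {
  local capacity="$1" minutes="$2"
  if [ "$capacity" -ge 100 ]; then
    echo "scale-up (immediate)"          # 100% triggers a scale-up right away
  elif [ "$capacity" -gt 80 ] && [ "$minutes" -ge 10 ]; then
    echo "scale-up"                      # above 80% for 10 minutes
  elif [ "$capacity" -lt 60 ] && [ "$minutes" -ge 30 ]; then
    echo "scale-down"                    # below 60% for 30 minutes
  else
    echo "none"
  fi
}
```

For example, `scaling_trigger 85 5` returns `none` because capacity has not yet stayed above 80% for 10 minutes.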

Scaling size calculation

The following formula explains how Zilliz Cloud calculates the target number of query CU for a dynamic scaling event. The dynamic scaling formula aims to maintain your CU capacity at a target value of 70%.

Target Query CU Number = Current Query CU Number × (Current Metric Value / Target Metric Value) 

The variables in this formula are:

  • Target Query CU Number: The new size the system aims to scale the cluster to.

  • Current Query CU Number: The current query CU number of the cluster.

  • Current Metric Value: The current measured value of the CU capacity metric, in percent.

  • Target Metric Value: The expected CU capacity after scaling, which is 70%.

For example, if query CU dynamic scaling is enabled and the following conditions are met:

  • Current Query CU Number: 60 CU

  • Cluster CU Capacity: Above 80% for 10 minutes

A dynamic scaling event will be triggered. The target query CU number is calculated as:

60 × (80 / 70) ≈ 68.57 CU

This value is then rounded up to the next available CU size, resulting in a new size of 72 CUs.
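The same calculation can be scripted to estimate where a dynamic scaling event would land. In the sketch below, `target_query_cu` is a hypothetical helper, and the list of available CU sizes is an input you supply yourself: the sizes used in the example (56, 64, 72, 80) are an assumption chosen to reproduce the worked example, not an official list.

```shell
# Sketch of the documented formula:
#   target = current_cu * (current_metric / 70), rounded up to the next available size.
# The available sizes are a hypothetical input; confirm real sizes in the console.
# Usage: target_query_cu <current_cu> <current_metric_pct> <available sizes...>
target_query_cu() {
  local current="$1" metric="$2"
  shift 2
  printf '%s\n' "$@" | sort -n | awk -v cur="$current" -v m="$metric" '
    BEGIN { raw = cur * m / 70 }              # 70 is the target CU capacity (%)
    $1 >= raw { print $1; found = 1; exit }   # smallest size that covers raw
    END { if (!found) print "no size >= " raw }'
}
```

With the numbers from the example above, `target_query_cu 60 80 56 64 72 80` computes 68.57 and prints `72`.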

Procedures

The following demo shows how to configure dynamic auto-scaling on the Zilliz Cloud web console.

In addition, you can configure dynamic scaling using RESTful API. For details, see Modify Cluster.

export BASE_URL="https://api.cloud.zilliz.com"
export TOKEN="YOUR_API_KEY"
export CLUSTER_ID="inxx-xxxxxxxxxxxxxxx"

# Let the cluster scale dynamically between 1 and 2 query CUs
curl --request POST \
  --url "${BASE_URL}/v2/clusters/${CLUSTER_ID}/modify" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "Accept: application/json" \
  --header "Content-Type: application/json" \
  -d '{
    "autoscaling": {
      "cu": {
        "min": 1,
        "max": 2
      }
    }
  }'

View scaling progress

Once a manual scaling request is sent or a scheduled or dynamic scaling event is triggered, a job record is generated. You can check its progress on the Jobs page.

While a scaling job is in progress, your cluster status is "Modifying". Once the scaling job succeeds, the cluster status changes to "Running".
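If you script scaling through the RESTful API, you may want to wait until the cluster returns to "Running" before continuing. The sketch below makes several assumptions you should verify against the API reference: that a Describe Cluster endpoint exists at `GET /v2/clusters/{CLUSTER_ID}`, that it returns the status in `.data.status`, and that `jq` is installed. It reuses `BASE_URL`, `TOKEN`, and `CLUSTER_ID` from the examples above; `get_cluster_status` and `wait_for_running` are hypothetical names.

```shell
# Assumed endpoint and response shape; verify before use.
get_cluster_status() {
  curl --silent --request GET \
    --url "${BASE_URL}/v2/clusters/${CLUSTER_ID}" \
    --header "Authorization: Bearer ${TOKEN}" \
    --header "Accept: application/json" | jq -r '.data.status'
}

# Poll until the status is Running, or give up after max_polls attempts.
# Usage: wait_for_running [poll_interval_seconds] [max_polls]
wait_for_running() {
  local interval="${1:-30}" max_polls="${2:-120}" i=0 status
  while [ "$i" -lt "$max_polls" ]; do
    # Compare case-insensitively ("Running" vs "RUNNING").
    status=$(get_cluster_status | tr '[:lower:]' '[:upper:]')
    if [ "$status" = "RUNNING" ]; then
      echo "cluster is running"
      return 0
    fi
    echo "status: $status, waiting ${interval}s" >&2
    sleep "$interval"
    i=$((i + 1))
  done
  echo "timed out waiting for RUNNING" >&2
  return 1
}
```

Polling every 30 seconds is an arbitrary default; since completion time depends on data volume, adjust the interval and the number of polls to your cluster.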

FAQ

  1. What are the limitations when scaling down a cluster?

    Clusters with replicas cannot scale down to fewer than 8 CUs.

    A scale-down request will only succeed if both of the following conditions are met:

    • The current data volume is less than 80% of the new CU size's capacity.

    • The number of collections and partitions is within the limit allowed by the new CU size.
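These conditions can be checked ahead of time with values you look up yourself. In the hypothetical sketch below (`can_scale_down` is not a Zilliz Cloud tool), the capacity and the collection and partition limit of the new CU size are inputs you must supply, the data volume and capacity must use the same units, and "clusters with replicas" is assumed to mean more than one replica.

```shell
# Hypothetical pre-check mirroring the documented scale-down conditions.
# Usage: can_scale_down <new_cu> <replicas> <data_volume> <new_size_capacity> \
#                       <collections_and_partitions> <new_size_limit>
can_scale_down() {
  local new_cu="$1" replicas="$2" volume="$3" capacity="$4" count="$5" limit="$6"

  # Assumption: "with replicas" means more than one replica.
  if [ "$replicas" -gt 1 ] && [ "$new_cu" -lt 8 ]; then
    echo "no: clusters with replicas cannot go below 8 CUs"; return 1
  fi
  # Data volume must be less than 80% of the new size's capacity.
  if ! awk -v v="$volume" -v c="$capacity" 'BEGIN { exit !(v < 0.8 * c) }'; then
    echo "no: data volume is not below 80% of the new capacity"; return 1
  fi
  # Collections and partitions must fit within the new size's limit.
  if [ "$count" -gt "$limit" ]; then
    echo "no: too many collections and partitions for the new size"; return 1
  fi
  echo "yes"
}
```

For example, `can_scale_down 4 1 50 100 5 10` prints `yes`, while the same request with a data volume of 90 fails the 80% check.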