Scale Global Cluster
Scaling a global cluster works differently from scaling a regular Dedicated cluster. Query CU is controlled centrally from the primary cluster, while replicas are configured independently on each cluster.
This page explains the scaling behavior for global clusters and how to scale each resource type.
This feature is available only to Dedicated clusters in a Business Critical project.
Before you start
- Ensure you are a Project Admin.
Scaling behavior overview
The following table provides an overview of the supported scaling behavior on a global cluster.
| Resource | Primary Cluster | Secondary Cluster |
|---|---|---|
| Query CU | Supported. All scaling methods (manual, dynamic, scheduled) are available. | Automatically follows the primary. Cannot be scaled independently. |
| Replica | Supported. All scaling methods (manual, dynamic, scheduled) are available. | Supported. All scaling methods (manual, dynamic, scheduled) are available. Configured independently on each cluster. |
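The two rules in the overview table can be expressed as a small in-memory model. This is a hypothetical sketch for illustration only: the `GlobalCluster` class, its methods, and the cluster names are assumptions and are not part of any Zilliz Cloud API or SDK, and the default of one replica per cluster is an assumption of the sketch. Query CU changes are accepted only on the primary and every member reads the same value, while replica counts are stored and changed per cluster.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class GlobalCluster:
    """Hypothetical model of the scaling rules in the table (not a Zilliz Cloud API)."""

    primary: str
    secondaries: List[str]
    query_cu: int
    # Replica counts are stored per cluster because each cluster scales them independently.
    replicas: Dict[str, int] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Assumption for this sketch: a cluster without an explicit count starts at 1 replica.
        for name in [self.primary, *self.secondaries]:
            self.replicas.setdefault(name, 1)

    def effective_query_cu(self, cluster: str) -> int:
        # Every member runs with the primary's query CU count.
        if cluster not in self.replicas:
            raise KeyError(cluster)
        return self.query_cu

    def scale_query_cu(self, cluster: str, new_cu: int) -> None:
        # Query CU can be scaled only on the primary; secondaries follow automatically.
        if cluster != self.primary:
            raise PermissionError(f"{cluster} is a secondary; scale query CU on {self.primary}")
        self.query_cu = new_cu

    def scale_replica(self, cluster: str, new_count: int) -> None:
        # Replicas can be scaled on any member without affecting the others.
        if cluster not in self.replicas:
            raise KeyError(cluster)
        self.replicas[cluster] = new_count


cluster = GlobalCluster(primary="primary", secondaries=["secondary_01", "secondary_02"], query_cu=8)
cluster.scale_query_cu("primary", 16)
print(cluster.effective_query_cu("secondary_01"))  # 16: follows the primary
cluster.scale_replica("secondary_01", 4)
print(cluster.replicas)  # {'primary': 1, 'secondary_01': 4, 'secondary_02': 1}
```

In this model, a call such as `cluster.scale_query_cu("secondary_01", 32)` raises `PermissionError`, mirroring the "Cannot be scaled independently" cell of the table.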
Scale query CUs
Query CU scaling is controlled at the primary cluster level. When you change the number of query CUs on the primary, Zilliz Cloud automatically applies the new query CU count to all secondary clusters. You cannot scale a secondary cluster's query CU independently; it always matches the primary.
Scaling the query CU of a primary cluster follows the same procedure as for a regular Dedicated cluster. For details, see Scale Query CU.
Considerations
- The same resource limits apply as for regular Dedicated clusters (e.g., query CU × replica ≤ 10,240).
- During query CU scaling, the cluster status changes to Modifying. Switchover is blocked while scaling is in progress.
- Failover can still be triggered during query CU scaling as an emergency operation, but the scaling task will fail and be retried after failover completes.
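The interaction between scaling, switchover, and failover described in the considerations above can be sketched as a small state model. The `ScalingCoordinator` class, its method names, and the task labels are hypothetical and do not correspond to a Zilliz Cloud API. The sketch also assumes that the status leaves Modifying as soon as a scaling task fails during failover, which this page does not specify.

```python
from enum import Enum
from typing import List, Optional


class Status(Enum):
    RUNNING = "Running"
    MODIFYING = "Modifying"


class ScalingCoordinator:
    """Hypothetical model of scaling vs. switchover/failover (not a Zilliz Cloud API)."""

    def __init__(self) -> None:
        self.status = Status.RUNNING
        self.active_task: Optional[str] = None
        self.retry_queue: List[str] = []
        self.events: List[str] = []

    def start_scaling(self, task: str) -> None:
        # A scaling task puts the cluster into the Modifying status.
        self.active_task = task
        self.status = Status.MODIFYING
        self.events.append(f"start {task}")

    def complete_scaling(self) -> None:
        self.events.append(f"complete {self.active_task}")
        self.active_task = None
        self.status = Status.RUNNING

    def switchover(self) -> None:
        # Planned switchover is refused while any scaling task is in progress.
        if self.active_task is not None:
            raise RuntimeError("switchover blocked: scaling in progress")
        self.events.append("switchover")

    def failover(self) -> None:
        # Failover is an emergency operation, so it is never refused.
        # An in-flight scaling task fails and is queued for a later retry.
        if self.active_task is not None:
            self.events.append(f"fail {self.active_task}")
            self.retry_queue.append(self.active_task)
            self.active_task = None
            self.status = Status.RUNNING  # assumption: status leaves Modifying once the task fails
        self.events.append("failover")

    def failover_completed(self) -> None:
        # Once failover finishes, the failed scaling task is retried.
        if self.retry_queue:
            self.start_scaling(self.retry_queue.pop(0))


coord = ScalingCoordinator()
coord.start_scaling("query CU 8 -> 16")
try:
    coord.switchover()
except RuntimeError as err:
    print(err)  # switchover blocked: scaling in progress
coord.failover()
coord.failover_completed()
print(coord.status.value, coord.active_task)  # Modifying query CU 8 -> 16
```

The asymmetry is deliberate in the model: switchover is a planned operation and can wait for scaling to finish, while failover cannot wait, so the scaling task is the one that gives way.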
Scale replicas
Replica scaling is controlled independently on each cluster. Each member of the global cluster, primary or secondary, can have its own replica count tailored to its regional workload. This lets you allocate more resources to high-traffic regions without over-provisioning the others.
The following table shows an example replica configuration for each cluster.
| Cluster | Region | Replicas | Reason |
|---|---|---|---|
| Primary | us-west-2 | 2 | Moderate read + all write traffic |
| Secondary_01 | eu-west-1 | 4 | High European read traffic |
| Secondary_02 | ap-southeast-1 | 1 | Low traffic, disaster recovery standby only |
Scaling the replicas of a primary or secondary cluster follows the same procedure as for a regular Dedicated cluster. For details, see Scale Replica.
Considerations
- The same replica limits apply as for regular Dedicated clusters:
  - A minimum of 8 CUs is required to enable multiple replicas.
  - A maximum of 10 replicas is allowed.
  - CU × replica ≤ 1,024.
- During replica scaling, switchover is blocked on the global cluster.
- Failover can still be triggered during replica scaling, but the scaling task will fail and be retried after failover completes.
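Because query CU is shared across the global cluster while replica counts vary per cluster, each cluster's replica count has to satisfy the limits above against the same query CU value. The hypothetical `check_replica_plan` helper below is a sketch of that check, not a Zilliz Cloud API. The limit values are copied from the list above, the replica counts come from the example table, and the query CU size of 8 used in the usage lines is an assumption, since the example table does not state one.

```python
from typing import Dict, List

# Limits copied from the list above; check the Scale Replica page for current values.
MIN_CU_FOR_MULTI_REPLICA = 8
MAX_REPLICAS = 10
MAX_CU_TIMES_REPLICA = 1024


def check_replica_plan(query_cu: int, replicas: Dict[str, int]) -> List[str]:
    """Hypothetical helper: list limit violations in a per-cluster replica plan.

    query_cu is a single value because every secondary follows the primary's query CU.
    """
    problems: List[str] = []
    for cluster, count in replicas.items():
        if count > 1 and query_cu < MIN_CU_FOR_MULTI_REPLICA:
            problems.append(f"{cluster}: {count} replicas need at least {MIN_CU_FOR_MULTI_REPLICA} CUs")
        if count > MAX_REPLICAS:
            problems.append(f"{cluster}: {count} replicas exceed the maximum of {MAX_REPLICAS}")
        if query_cu * count > MAX_CU_TIMES_REPLICA:
            problems.append(f"{cluster}: {query_cu} CUs x {count} replicas exceed {MAX_CU_TIMES_REPLICA}")
    return problems


# Replica counts from the example table; the 8-CU query CU size is an assumption.
plan = {"Primary": 2, "Secondary_01": 4, "Secondary_02": 1}
print(check_replica_plan(8, plan))  # []
print(check_replica_plan(4, plan))  # two violations: Primary and Secondary_01 use multiple replicas
```

Passing a single `query_cu` for all clusters reflects the rule that secondaries cannot diverge from the primary's query CU, so scaling query CU on the primary can push an existing secondary's replica count over a limit even if that secondary was never touched.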
FAQs
- Can I set a different number of query CUs on the primary and secondary clusters?

  No. Query CU scaling is always initiated on the primary, and all secondaries follow automatically. This ensures consistent capacity across the global cluster.

- Can I set different replica counts on different clusters?

  Yes. Replica scaling is fully independent per cluster. This is useful when regions have different traffic patterns, for example, more replicas in a high-traffic region and fewer in a standby-only region.

- What happens to scaling settings after a switchover?

  After a switchover, query CU scaling targets the new primary cluster. Replica configurations on each cluster remain unchanged.
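The switchover behavior described in the last answer can be sketched as a pure function over a configuration snapshot. `GlobalClusterConfig`, `switchover`, and `query_cu_scaling_target` are hypothetical names for illustration only and are not a Zilliz Cloud API. The cluster names and replica counts come from the example table earlier on this page, and the query CU value of 8 is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class GlobalClusterConfig:
    """Hypothetical snapshot of a global cluster's scaling settings (not a Zilliz Cloud API)."""

    primary: str               # the cluster that currently accepts query CU changes
    query_cu: int              # shared by every member cluster
    replicas: Dict[str, int]   # configured independently per cluster


def switchover(config: GlobalClusterConfig, new_primary: str) -> GlobalClusterConfig:
    # Only the primary role moves; query CU and per-cluster replica counts are carried over.
    if new_primary not in config.replicas:
        raise KeyError(new_primary)
    return replace(config, primary=new_primary, replicas=dict(config.replicas))


def query_cu_scaling_target(config: GlobalClusterConfig) -> str:
    # Query CU changes are always submitted to whichever cluster is currently primary.
    return config.primary


before = GlobalClusterConfig("Primary", 8, {"Primary": 2, "Secondary_01": 4, "Secondary_02": 1})
after = switchover(before, "Secondary_01")
print(query_cu_scaling_target(after))     # Secondary_01
print(after.replicas == before.replicas)  # True
```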