Version: User Guides (Cloud)

Select the Right CU

Selecting the right Compute Unit (CU) is a crucial step when creating a cluster in Zilliz Cloud. A CU is the basic unit of compute resources used for parallel processing of data, and different CU types comprise varying combinations of CPU, memory, and storage.

Understand CU types

Zilliz Cloud offers three CU types: Performance-optimized, Capacity-optimized, and Extended-capacity.

The following table gives a quick comparison of the three CU types. For a detailed comparison of their capacity and performance, see Select an optimal CU type.

CU Type	Search QPS	Search Latency	Capacity per CU	Cost per Million Vectors
Performance-optimized	500-1,500	under 10 ms	1.5 million 768-dim vectors	from $65/mo.
Capacity-optimized	100-300	tens of ms	5 million 768-dim vectors	from $20/mo.
Extended-capacity	5-20	hundreds of ms	20 million 768-dim vectors	from $10/mo.

Performance-optimized CU

Tailored for scenarios emphasizing low latency and high throughput.
Ideal for real-time applications like generative AI, recommendation systems, chatbots, and more.

Capacity-optimized CU

Designed for large datasets. Per the capacity figures in this guide, one Capacity-optimized CU holds about 3.3 times as much data as one Performance-optimized CU (5 million versus 1.5 million 768-dim vectors), at the cost of lower QPS and higher latency.
Ideal for large-scale unstructured data search, copyright detection, and identity verification.

Extended-capacity CU

Best for scenarios with extensive datasets where cost-efficiency is prioritized over latency.
Ideal for applications that need to store massive volumes of data at a low cost. The capacity of an Extended-capacity CU is four times that of a Capacity-optimized CU.

📘Notes

To select an extended-capacity CU, your cluster must have at least 4 CUs.

Select an optimal CU type

Consider data volume, performance expectations, and budget when selecting the CU type. The size of your vector data, measured by both vector count and vector dimensions, largely determines how many resources the cluster needs.

Assess capacity

The number of entities a cluster can hold depends on its CU type and the number of CUs.

The reference table below shows the maximum number of vectors that a single CU of each type can hold at different vector dimensions. To estimate the CU size needed for your data volume, use our calculator.

Vector Dimensions	Performance-optimized (Max. Vectors per CU)	Capacity-optimized (Max. Vectors per CU)	Extended-capacity (Max. Vectors per CU)
128	7.5 million	25 million	100 million
256	4.5 million	15 million	60 million
512	2.25 million	7.5 million	30 million
768	1.5 million	5 million	20 million
1024	1.125 million	3.75 million	15 million

📘Notes

The above figures are based on tests with only primary keys and vectors. If your dataset has extra scalar fields (e.g., id, label, keywords), the actual capacity may differ. Run your own tests for a precise estimate.
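As a rough illustration of how the capacity table can be used, the following sketch looks up the per-CU capacity and rounds up to a whole number of CUs. The function and dictionary names are hypothetical and are not part of any Zilliz Cloud API. The figures are copied from the table above, and the 4-CU minimum for Extended-capacity comes from the note earlier in this guide. It assumes your dimension matches one of the table rows and that your data contains only primary keys and vectors.

```python
# Max vectors per CU, copied from the capacity table above
# (primary key + vector only; extra scalar fields change these figures).
MAX_VECTORS_PER_CU = {
    "performance": {128: 7_500_000, 256: 4_500_000, 512: 2_250_000,
                    768: 1_500_000, 1024: 1_125_000},
    "capacity": {128: 25_000_000, 256: 15_000_000, 512: 7_500_000,
                 768: 5_000_000, 1024: 3_750_000},
    "extended": {128: 100_000_000, 256: 60_000_000, 512: 30_000_000,
                 768: 20_000_000, 1024: 15_000_000},
}

# Extended-capacity clusters must have at least 4 CUs (see the earlier note).
MIN_CUS = {"extended": 4}


def cus_for_capacity(cu_type: str, dim: int, num_vectors: int) -> int:
    """Return the smallest CU count whose table capacity covers num_vectors."""
    per_cu = MAX_VECTORS_PER_CU[cu_type][dim]  # KeyError if dim is not in the table
    needed = -(-num_vectors // per_cu)         # integer ceiling division
    return max(needed, MIN_CUS.get(cu_type, 1))


if __name__ == "__main__":
    for cu_type in MAX_VECTORS_PER_CU:
        print(cu_type, cus_for_capacity(cu_type, 768, 8_000_000))
```

For 8 million 768-dimensional vectors this prints 6, 2, and 4 for the three types. Treat the result as a starting point and confirm it with the calculator or your own tests.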

Evaluate performance

Latency and queries per second (QPS) are the key performance metrics. The Performance-optimized CU clearly outperforms the Capacity-optimized CU in both latency and throughput, particularly for common top-k values between 10 and 250.

The following table shows the QPS test results for the Performance-optimized and Capacity-optimized CUs.

top_k	QPS for Performance-optimized CU (768-dim 1M vectors)	QPS for Capacity-optimized CU (768-dim 5M vectors)
10	520	100
100	440	80
250	270	60
1000	150	40

The following table shows the latency test results for the Performance-optimized and Capacity-optimized CUs.

top_k	Latency of Performance-optimized CU (768-dim 1M vectors)	Latency of Capacity-optimized CU (768-dim 5M vectors)
10	< 10 ms	< 50 ms
100	< 10 ms	< 50 ms
250	< 10 ms	< 50 ms
1000	10 - 20 ms	50 - 100 ms
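To show how the two tables can be read together, here is a minimal sketch that filters CU types by a latency budget and then estimates how many replicas a target QPS would need. The names are hypothetical. The sketch makes two assumptions that the tables do not state: each latency range is treated as its upper bound, and throughput is assumed to scale linearly with the number of replicas. The benchmarks were also run at fixed data sizes (1M and 5M vectors), so results for other sizes may differ.

```python
import math

# top_k -> (QPS, latency upper bound in ms), copied from the two tables above.
BENCHMARKS = {
    "performance": {10: (520, 10), 100: (440, 10), 250: (270, 10), 1000: (150, 20)},
    "capacity": {10: (100, 50), 100: (80, 50), 250: (60, 50), 1000: (40, 100)},
}


def candidates(top_k: int, max_latency_ms: float, target_qps: float) -> dict:
    """Map each CU type that meets the latency budget to the replicas needed."""
    result = {}
    for cu_type, rows in BENCHMARKS.items():
        qps, latency_upper_ms = rows[top_k]  # KeyError if top_k is not in the table
        if latency_upper_ms <= max_latency_ms:
            # Assumption: total QPS grows linearly with the replica count.
            result[cu_type] = math.ceil(target_qps / qps)
    return result


if __name__ == "__main__":
    print(candidates(top_k=100, max_latency_ms=30, target_qps=1000))
    # {'performance': 3}
```

Using the upper bound is deliberately conservative: a type reported as "< 50 ms" may often respond faster, but the table alone cannot confirm that it meets a tighter budget.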

Scenario breakdown

Suppose you are building an image recommendation application with a library of 8 million images. Each image in your library is represented by a 768-dimensional embedding vector. Your goal is to handle 1,000 recommendation requests per second (QPS) and return the top 100 image recommendations in under 30 milliseconds.

To select the right CU for this requirement, follow these steps:

Evaluate Latency: The Performance-optimized CU is the only type whose measured latency (under 10 ms at a top-k of 100) is guaranteed to meet the 30-millisecond requirement. The Capacity-optimized CU is reported only as under 50 ms.
Assess Capacity: A single Performance-optimized CU accommodates 1.5 million 768-dimensional vectors. Storing all 8 million vectors takes 8 / 1.5 ≈ 5.33 CUs, so you need at least 6 CUs.
Check Throughput: With a top-k setting of 100, the Performance-optimized CU can achieve a QPS of 440. To sustain 1,000 QPS, you need 3 replicas (3 × 440 = 1,320 QPS).

In conclusion, for this scenario, the Performance-optimized CU is the best choice. A configuration of 3 replicas, each consisting of 6 CUs (18 CUs in total), meets the capacity, latency, and throughput requirements.
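The recommended configuration can be sanity-checked with a few lines of arithmetic. This is only a sketch with hypothetical names. It takes the 440 QPS figure as the throughput of one replica, as the steps above do, and assumes that throughput scales linearly with the replica count.

```python
# Scenario inputs and per-CU figures quoted in the steps above.
NUM_VECTORS = 8_000_000
TARGET_QPS = 1_000
VECTORS_PER_CU = 1_500_000  # Performance-optimized, 768-dim
QPS_PER_REPLICA = 440       # Performance-optimized, top_k = 100


def check_config(cus_per_replica: int, replicas: int) -> dict:
    """Report whether a replica layout covers the data volume and target QPS."""
    capacity = cus_per_replica * VECTORS_PER_CU
    qps = replicas * QPS_PER_REPLICA  # assumption: linear scaling with replicas
    return {
        "total_cus": cus_per_replica * replicas,
        "fits_data": capacity >= NUM_VECTORS,
        "meets_qps": qps >= TARGET_QPS,
        "capacity_used": round(NUM_VECTORS / capacity, 2),
        "qps_headroom": qps - TARGET_QPS,
    }


if __name__ == "__main__":
    print(check_config(cus_per_replica=6, replicas=3))
```

With 6 CUs per replica and 3 replicas, the check reports 18 CUs in total, about 89% of capacity used, and 320 QPS of headroom. Both margins are fairly thin, so it is worth re-running the check as the image library or traffic grows.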
