Version: User Guides (Cloud)

Metrics Reference

Zilliz Cloud groups metrics into two levels, organization and project:

  • Organization-level metrics: Reflect account-wide status (e.g., license credits, usage) across all projects.

  • Project-level metrics: Reflect cluster resources, capacity, performance, and data within a single project.

📘Notes

Most metrics support alerts. An alert evaluates a metric against a condition (operator + threshold) over a time window and notifies you when it’s met. For configuration, refer to Manage Organization Alerts and Manage Project Alerts.
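As a rough illustration of the alert model described above (a condition made of an operator and a threshold, evaluated over a time window), here is a minimal Python sketch. The operator names, the `alert_fires` helper, and the choice to average the samples in the window are all assumptions made for illustration; this page does not specify how Zilliz Cloud aggregates a window.

```python
import operator
from statistics import mean

# Hypothetical operator names; the operators Zilliz Cloud offers may differ.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def alert_fires(samples, op, threshold):
    """Return True if the mean of the window's samples meets the condition.

    Averaging is an assumption of this sketch, not documented behavior.
    """
    if not samples:
        return False  # no data in the window, so nothing to evaluate
    return OPERATORS[op](mean(samples), threshold)


# Utilization (%) sampled over one window; illustrative values only.
window = [72.0, 85.5, 91.0, 94.5]
print(alert_fires(window, ">", 80))  # True: the mean is 85.75
```

In practice you configure the condition in the console rather than in code; the sketch only shows how the operator, threshold, and window fit together.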

Organization-level metrics

Organization-level metrics help you track billing-related issues across all projects in an organization.

| Metric | Unit | Description | Recommended action |
| --- | --- | --- | --- |
| Usage Amount | $ | Cumulative usage charges over a period. | Monitor vs. budget; optimize usage or adjust budget as needed. |
| Credit Validity | day | Days left before free credits expire. | Use or extend credits before expiry. |
| Remaining Credits | $ | Balance of free credits. | Top up when low to maintain account functionality. |
| Credit Card Validity | day | Days until the saved card expires. | Update or replace card before expiry to avoid payment failures. |
| Advance Pay Balance | $ | Remaining pre-paid funds. | Add funds when low to prevent service interruption. |

Project-level metrics (cluster metrics)

These metrics describe resource usage and performance within a project’s clusters.

📘Notes

In this section, Availability indicates the cluster plan tiers on which a metric is available. All means the metric is available across all current cluster plan tiers. For details on plan tiers, refer to Detailed Plan Comparison.

Resources

| Metric | Unit | Description | Recommended action | Availability |
| --- | --- | --- | --- | --- |
| Read vCUs | count | vCU consumption by search and query operations. Note: Alerts are not supported for this metric. | Monitor trends to understand read cost and throughput. | Free / Serverless |
| Write vCUs | count | vCU consumption by insert, delete, and upsert operations. Note: Alerts are not supported for this metric. | Monitor trends to understand write cost and throughput. | Free / Serverless |
| Query CU Computation | % | The computational power in use relative to the total computational capacity of the CU. | 70–80%: check service status and prepare to scale up. >90%: scale up to avoid interruption. | Dedicated / BYOC |
| Query CU Capacity | % | The capacity in use relative to the total capacity of the CU. | | Dedicated / BYOC |
| Total Query CU | count | The total number of query CUs in the current cluster, calculated as the number of query CUs multiplied by the number of replicas. For example, a cluster with 2 query CUs and 2 replicas shows a Total Query CU of 4. | Track to identify query-CU scaling events. | Dedicated / BYOC |
| Replica | count | The number of cluster replicas. | Track to identify replica scaling events. | Dedicated / BYOC |
| Storage | GB | The total amount of persistent storage consumed by data and indexes. | Configure alerts to monitor storage usage. | All |
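The Total Query CU arithmetic and the Query CU Computation thresholds from the Resources table can be sketched in a few lines of Python. The function names are hypothetical. The table gives no guidance for values between 80% and 90%, so this sketch assumes that range is treated the same as 70–80%.

```python
def total_query_cu(query_cus: int, replicas: int) -> int:
    """Total Query CU = number of query CUs x number of replicas."""
    return query_cus * replicas


def cu_computation_action(percent: float) -> str:
    """Map a Query CU Computation reading (%) to the recommended action."""
    if percent > 90:
        return "scale up to avoid interruption"
    if percent >= 70:
        # The table covers 70-80%; extending this band up to 90% is an
        # assumption of this sketch.
        return "check service status and prepare to scale up"
    return "no action needed"


print(total_query_cu(2, 2))       # 4, matching the example in the table
print(cu_computation_action(75))  # check service status and prepare to scale up
print(cu_computation_action(93))  # scale up to avoid interruption
```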

Performance

| Metric | Unit | Description | Recommended action | Availability |
| --- | --- | --- | --- | --- |
| QPS (Read) | - | The number of read requests (search and query) per second. | Compare against benchmark results to monitor system performance. | All |
| QPS (Write) | - | The number of write requests (insert, bulk insert, upsert, and delete) per second. | Compare against benchmark results to monitor system performance. | All |
| Search NQ per Second | - | The number of query vectors (NQ) carried by search requests per second. | Compare against benchmark results to monitor system performance. | All |
| Write Throughput (Entities/sec) | - | The number of entities written per second across all write operations (insert, upsert, bulk insert, and delete). | Compare against benchmark results to monitor system performance. | All |
| Latency (Read) | ms | The time elapsed between a client sending a read request (search or query) to the server and the client receiving a response. Both average and P99 latency are reported. | - | All |
| Latency (Write) | ms | The time elapsed between a client sending a write request (insert or upsert) to the server and the client receiving a response. Both average and P99 latency are reported. | - | All |
| Request Failure Rate (Read) | % | The percentage of failed read requests among all requests per second. | Configure alerts to monitor the read request failure rate. | All |
| Request Failure Rate (Write) | % | The percentage of failed write requests among all requests per second. | Configure alerts to monitor the write request failure rate. | All |
| Slow Query Count | counts/min | The number of queries that take an unusually long time to execute. | Identify problematic queries and tune performance by adjusting the cluster configuration as necessary. | Dedicated (Enterprise) / BYOC |
| Cluster Write Performance Capacity | % | The current rate of write operations divided by the write rate limit. | When the value exceeds 80%, reduce the rate of your write operations (insert and upsert). | Dedicated (Enterprise) / BYOC |
| Number of Flush Operations | counts/min | The number of flush operations on a cluster. | Avoid flushing too frequently, as it can degrade the overall performance of the cluster. For more information, refer to Zilliz Cloud Limits. | Dedicated (Enterprise) / BYOC |
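To make the Cluster Write Performance Capacity formula concrete, the following Python sketch computes the ratio and applies the 80% guideline from the table. The function names and the example rates are hypothetical; the actual write rate limit for a cluster depends on its plan and is not given on this page.

```python
def write_capacity_percent(current_rate: float, rate_limit: float) -> float:
    """Current write rate divided by the write rate limit, as a percentage."""
    if rate_limit <= 0:
        raise ValueError("rate_limit must be positive")
    return 100.0 * current_rate / rate_limit


def should_reduce_writes(current_rate: float, rate_limit: float,
                         threshold: float = 80.0) -> bool:
    """Return True when write capacity usage exceeds the threshold (80% here)."""
    return write_capacity_percent(current_rate, rate_limit) > threshold


# Illustrative numbers only; both rates must use the same unit.
print(write_capacity_percent(45, 50))  # 90.0
print(should_reduce_writes(45, 50))    # True
print(should_reduce_writes(30, 50))    # False
```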

Data

| Metric | Unit | Description | Recommended action | Availability |
| --- | --- | --- | --- | --- |
| Collection Count | count | The number of collections created in a cluster. | Monitor growth; enforce per-project limits if needed. | All |
| Entity Count | count | The total number of entities inserted into the cluster, including both single inserts and bulk inserts. | Investigate unexpected growth; plan storage and indexing. | All |
| Loaded Entities (Approx.) | count | The approximate number of entities loaded (actively served). | For a more accurate, real-time value, refer to the 'Loaded Entities' value on the collection overview page or use count(*). | Dedicated / BYOC |
| Number of Unloaded Collections | count | The number of unloaded collections in a cluster. | Load critical collections; review memory headroom. | Dedicated (Enterprise) / BYOC |