Version: User Guides (BYOC)

Metrics & Alerts Reference

This reference describes the monitoring metrics for Zilliz Cloud clusters, as well as the alert targets that you can set up at the organization and project levels.

Cluster metrics

The Metrics tab in the Zilliz Cloud console presents cluster metrics as charts.

The following table describes each metric and the action that you are advised to take when the resource usage of your cluster exceeds a threshold.

📘Notes

To unlock a range of advanced metrics, upgrade your plan tier.

Metric Name	Unit	Description	Recommended Action
Pod Resources
CPU Usage	Core	The number of CPU cores used by pods.	Regularly monitor and log resource usage to identify trends and potential bottlenecks.
CPU Usage Rate for Limit	%	The pod CPU usage as a percentage of the CPU limit.	Monitor the workload and consider optimizing resource usage or increasing the CPU limit if the usage trend continues to rise.
Memory Usage	MB	The memory usage of containers in the pod (with cache excluded).	Regularly monitor and log resource usage to identify trends and potential bottlenecks.
Memory Usage Rate for Limit	%	The pod memory usage as a percentage of the memory limit.	Monitor the memory usage and identify any potential memory leaks or inefficient memory usage in the application.
Network Inbound Flow	Mbps	The inbound network traffic of the pod.	Track and analyze the amount of data received from external sources to monitor network performance and identify potential network congestion or bandwidth issues.
Network Outbound Flow	Mbps	The outbound network traffic of the pod.	Track and analyze the amount of data sent to external sources to monitor network performance and identify potential network congestion or bandwidth issues.
Resources
CU Computation	%	A measure of the utilized computational power relative to the total computational capacity of the CU. This metric is available only for Dedicated or BYOC clusters.	70%-80%: Check service status and prepare for scaling up. > 90%: Scale up immediately to avoid service interruption.
CU Capacity	%	A measure of the used capacity relative to the total capacity of the CU. This metric is available for Free, Dedicated or BYOC clusters.	70%-80%: Check service status and prepare for scaling up. > 90%: Scale up immediately to avoid service interruption. 100%: When CU capacity reaches 100%, you will be unable to write data into the cluster. Please scale up immediately to avoid service interruption.
Storage	GB	The total amount of persistent storage consumed by data and indexes.	Configure alerts for monitoring storage usage.
Performance
QPS/VPS (Read)	QPS/VPS	QPS: The number of read requests (search and query) per second. VPS: The number of read requests (search) on vectors per second. VPS is not available for query requests as query operations do not involve vectors.	Refer to benchmark for system performance monitoring.
QPS/VPS (Write)	QPS/VPS	QPS: The number of write requests (insert, bulk insert, upsert, and delete) per second. VPS: The number of write requests (insert, bulk insert, upsert, and delete) on vectors per second.	Refer to benchmark for system performance monitoring.
Latency (Read)	ms	The time elapsed between a client sending a read request (search and query) to a server and the client receiving a response. Selecting Average or P99 from the expanded dropdown menu on the right displays an average or P99 latency.	-
Latency (Write)	ms	The time elapsed between a client sending a write request (insert, upsert, and delete) to a server and the client receiving a response. Selecting Average or P99 from the expanded dropdown menu on the right displays an average or P99 latency.	-
Request Failure Rate (Read)	%	The percentage of failed read requests (search and query) among all read requests per second.	Configure alerts to monitor read request failure rate.
Request Failure Rate (Write)	%	The percentage of failed write requests (insert, bulk insert, upsert, and delete) among all write requests per second.	Configure alerts to monitor write request failure rate.
Slow Query Count	count/min	The number of slow query operations, including all search and query requests. By default, all requests whose latency exceeds 5 seconds are considered slow queries. This metric type is available only for Dedicated clusters of the Enterprise edition or BYOC clusters.	Identify problematic queries and tune performance by adjusting cluster configuration as necessary.
Cluster Write Performance Capacity	%	The ratio of the current write rate to the write rate limit. This metric type is available only for Dedicated clusters of the Enterprise edition or BYOC clusters.	If the ratio is too high (over 80% is the suggested threshold), lower the write rate.
Number of Flush Operations	count/min	The number of flush operations on a cluster. This metric type is available only for Dedicated clusters of the Enterprise edition or BYOC clusters.	Performing flush operations too frequently can negatively impact the overall performance of the cluster. For more information, refer to Zilliz Cloud Limits.
Data
Collection Count	count	The number of collections created in a cluster.	-
Entity Count	count	The number of entities inserted into a cluster. Selecting a specific collection from the expanded dropdown menu on the right displays the number of entities at the collection level.	-
Loaded Entities	count	The number of entities loaded (actively served) by a cluster. Selecting a specific collection from the expanded dropdown menu on the right displays the number of loaded entities at the collection level. This metric is available only for Dedicated or BYOC clusters.	-
Number of Unloaded Collections	count	The number of unloaded collections in a cluster. This metric type is available only for Dedicated clusters of the Enterprise edition or BYOC clusters.
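
The CU thresholds in the table above can be expressed as a small helper. The following is a minimal sketch and not part of any Zilliz Cloud API: the function name `cu_usage_action` and its return strings are hypothetical. Because the table gives no guidance for the range between 80% and 90%, the sketch assumes that this range also calls for preparing to scale up.

```python
def cu_usage_action(percent: float, metric: str = "capacity") -> str:
    """Map a CU usage percentage to the action recommended in the table.

    `metric` is either "capacity" or "computation" (hypothetical labels).
    """
    if percent < 0:
        raise ValueError("percent must be non-negative")
    # Only CU Capacity has the 100% rule: writes are rejected at that point.
    if metric == "capacity" and percent >= 100:
        return "writes blocked: scale up immediately"
    if percent > 90:
        return "scale up immediately"
    # Assumption: 80%-90% is treated the same as the documented 70%-80% band.
    if percent >= 70:
        return "check service status and prepare to scale up"
    return "no action needed"


if __name__ == "__main__":
    for value in (50, 75, 95, 100):
        print(value, "->", cu_usage_action(value))
```

A helper like this could be fed with values read off the Metrics tab, for example when writing a runbook.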

Organization alerts

Organization alerts keep you informed about license-related issues, such as licensed core usage and the license validity period.

Alert Target	Unit	Description	Recommended Action	Default Trigger Condition
License (Core Usage)	%	Monitor the percentage of used CPU cores against the total licensed cores.	> 70%: Assess future needs and prepare to renew or upgrade the license. > 100%: Renew or upgrade the license immediately to avoid operational disruptions.	WARNING: Trigger alerts when the number of used CPU cores reaches or exceeds 70% of the total. CRITICAL: Trigger alerts when the number of used CPU cores reaches or exceeds 100% of the total.
License (Validity Period)	Day	Track the remaining days of license validity.	< 60 days: Start preparing to renew or upgrade the license. < 0 day (expired): Renew or upgrade the license immediately to avoid restrictions like the inability to create new clusters or scale up.	WARNING: Trigger alerts when the license validity is 60 days or less. CRITICAL: Trigger alerts when the license expires.
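
The default trigger conditions in the table above can be illustrated as follows. This is a minimal sketch under stated assumptions, not Zilliz Cloud code: the function names `core_usage_level` and `validity_level` are hypothetical, and the sketch assumes that a license counts as expired once the remaining days drop below 0, as the Recommended Action column suggests.

```python
from typing import Optional


def core_usage_level(used_cores: int, licensed_cores: int) -> Optional[str]:
    """Return "WARNING", "CRITICAL", or None for license core usage."""
    if licensed_cores <= 0:
        raise ValueError("licensed_cores must be positive")
    # Integer comparison avoids floating-point rounding at the 70% boundary.
    if used_cores * 100 >= 100 * licensed_cores:
        return "CRITICAL"
    if used_cores * 100 >= 70 * licensed_cores:
        return "WARNING"
    return None


def validity_level(remaining_days: int) -> Optional[str]:
    """Return "WARNING", "CRITICAL", or None for license validity."""
    # Assumption: negative remaining days means the license has expired.
    if remaining_days < 0:
        return "CRITICAL"
    if remaining_days <= 60:
        return "WARNING"
    return None


if __name__ == "__main__":
    print(core_usage_level(7, 10), validity_level(45))
```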

Project alerts

Project alerts focus on the operational aspects of your clusters, including notifications on the CU usage, QPS thresholds, latency issues, and request anomalies, ensuring you maintain optimal cluster performance.

For each project alert target, the trigger condition consists of a comparison operator, a threshold value, and a duration. The operator can be one of the following: >, >=, <, <=, =. The threshold is a numeric value for metrics such as query latency, query QPS, search QPS, CU Capacity, and CU Computation. The duration specifies how long the condition must hold before the alert is triggered, and can be set from a minimum of 1 minute to a maximum of 30 minutes.
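
The way an operator, a threshold, and a duration combine into a trigger condition can be sketched in a few lines. This is an illustration only and does not reflect how Zilliz Cloud evaluates alerts internally: the names `TriggerCondition` and `is_triggered` are hypothetical, and the sketch assumes one metric sample per minute and that the condition must hold for every sample in the most recent window.

```python
import operator
from dataclasses import dataclass
from typing import Sequence

# The five operators listed in the text, mapped to Python comparisons.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
}


@dataclass(frozen=True)
class TriggerCondition:
    op: str
    threshold: float
    duration_minutes: int

    def __post_init__(self) -> None:
        if self.op not in OPERATORS:
            raise ValueError(f"unsupported operator: {self.op}")
        # The duration is limited to the range of 1 to 30 minutes.
        if not 1 <= self.duration_minutes <= 30:
            raise ValueError("duration must be between 1 and 30 minutes")


def is_triggered(condition: TriggerCondition, samples: Sequence[float]) -> bool:
    """Check whether the latest `duration_minutes` samples all match.

    Assumption: `samples` holds one value per minute, oldest first.
    """
    window = condition.duration_minutes
    if len(samples) < window:
        return False
    compare = OPERATORS[condition.op]
    return all(compare(v, condition.threshold) for v in samples[-window:])


if __name__ == "__main__":
    cond = TriggerCondition(">", 70, 3)
    print(is_triggered(cond, [50, 71, 80, 90]))
```

With this model, a single sample that falls back under the threshold resets the window, which is why a longer duration reduces alerts caused by short spikes.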

Default alert targets

Zilliz Cloud predefines common alert targets to ensure that critical issues are quickly identified and addressed with the appropriate actions.

For more information about recommended actions, refer to Cluster metrics.

Alert Target	Unit	Default Trigger Condition
CU Computation	%	WARNING: Trigger alerts at >70% utilized computational power for 10+ minutes. CRITICAL: Trigger alerts at >90% utilized computational power for 10+ minutes.
CU Capacity	%	WARNING: Trigger alerts at >70% utilized CU capacity for 10+ minutes. CRITICAL: Trigger alerts at >90% utilized CU capacity for 10+ minutes.
Search (QPS)	QPS	Trigger WARNING alerts at >50 search operations per second for 10+ minutes.
Query (QPS)	QPS	Trigger WARNING alerts at >50 query operations per second for 10+ minutes.
Search Latency (P99)	ms	Trigger WARNING alerts at P99 latency >1,000ms for 10+ minutes.
Query Latency (P99)	ms	Trigger WARNING alerts at P99 latency >1,000ms for 10+ minutes.
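
The default trigger conditions above can also be written down as data, which may be convenient when comparing them against custom settings. The following is a minimal sketch, not an export of any Zilliz Cloud configuration: `DEFAULT_TARGETS` and `default_alert_level` are hypothetical names, and the sketch assumes that "10+ minutes" means the value has stayed above the threshold for at least 10 minutes.

```python
from typing import Optional

# (level, threshold) pairs per alert target, in ascending threshold order.
# Values are copied from the table of default alert targets.
DEFAULT_TARGETS = {
    "CU Computation": [("WARNING", 70), ("CRITICAL", 90)],       # %
    "CU Capacity": [("WARNING", 70), ("CRITICAL", 90)],          # %
    "Search (QPS)": [("WARNING", 50)],                           # QPS
    "Query (QPS)": [("WARNING", 50)],                            # QPS
    "Search Latency (P99)": [("WARNING", 1000)],                 # ms
    "Query Latency (P99)": [("WARNING", 1000)],                  # ms
}
DEFAULT_DURATION_MINUTES = 10


def default_alert_level(
    target: str, value: float, sustained_minutes: int
) -> Optional[str]:
    """Return the highest default alert level that `value` would trigger."""
    if sustained_minutes < DEFAULT_DURATION_MINUTES:
        return None
    level = None
    for name, threshold in DEFAULT_TARGETS[target]:
        # All default conditions use a strict "greater than" comparison.
        if value > threshold:
            level = name
    return level


if __name__ == "__main__":
    print(default_alert_level("CU Capacity", 95, 10))
```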

Custom alert targets

In addition to the predefined default project alerts, you can configure custom alert targets as needed.

Alert Target	Description
Resource
Storage	Monitor storage usage and send notifications if the usage exceeds a threshold for a certain duration.
Performance (read/write)
Bulk Insert (QPS)	Monitor the rate of bulk insert operations and send notifications if the rate exceeds a threshold for a certain duration.
Delete (QPS)	Monitor the rate of delete operations and send notifications if the rate exceeds a threshold for a certain duration.
Insert (QPS)	Monitor the rate of insert operations and send notifications if the rate exceeds a threshold for a certain duration.
Insert (VPS)	Monitor the rate of vector insert operations and send notifications if the rate exceeds a threshold for a certain duration.
Search (VPS)	Monitor the rate of vector search operations and send notifications if the rate exceeds a threshold for a certain duration.
Upsert (QPS)	Monitor the rate of upsert operations and send notifications if the rate exceeds a threshold for a certain duration.
Upsert (VPS)	Monitor the rate of vector upsert operations and send notifications if the rate exceeds a threshold for a certain duration.
Writes to Cluster Are Disabled	Monitor the write operations to the cluster to ensure they are not prohibited. Please scale out immediately if write prohibition has been triggered.
Performance (latency)
Delete Latency (Average)	Monitor the average latency for delete requests and send notifications if the latency exceeds a threshold for a certain duration.
Delete Latency (P99)	Monitor the P99 latency for delete requests and send notifications if the latency exceeds a threshold for a certain duration.
Insert Latency (Average)	Monitor the average latency for insert requests and send notifications if the latency exceeds a threshold for a certain duration.
Insert Latency (P99)	Monitor the P99 latency for insert requests and send notifications if the latency exceeds a threshold for a certain duration.
Query Latency (Average)	Monitor the average latency for query requests and send notifications if the latency exceeds a threshold for a certain duration.
Search Request Latency (Average)	Monitor the average latency for search requests and send notifications if the latency exceeds a threshold for a certain duration.
Upsert Latency (Average)	Monitor the average latency for upsert requests and send notifications if the latency exceeds a threshold for a certain duration.
Upsert Latency (P99)	Monitor the P99 latency for upsert requests and send notifications if the latency exceeds a threshold for a certain duration.
Performance (request failure rate)
Bulk Insert Failure Rate	Monitor the failure rate of bulk insert requests and send notifications if the rate exceeds a threshold for a certain duration.
Delete Failure Rate	Monitor the failure rate of delete requests and send notifications if the rate exceeds a threshold for a certain duration.
Insert Failure Rate	Monitor the failure rate of insert requests and send notifications if the rate exceeds a threshold for a certain duration.
Query Failure Rate	Monitor the failure rate of query requests and send notifications if the rate exceeds a threshold for a certain duration.
Search Failure Rate	Monitor the failure rate of search requests and send notifications if the rate exceeds a threshold for a certain duration.
Slow Query Count	Monitor the number of slow queries and send notifications if the value exceeds a threshold for a certain duration. By default, all requests whose latency exceeds 5 seconds are considered slow queries.
Upsert Failure Rate	Monitor the failure rate of upsert requests and send notifications if the rate exceeds a threshold for a certain duration.
Data
Loaded Entities	Monitor the number of loaded entities and send notifications if the count exceeds a threshold for a certain duration.
Total Collections	Monitor the number of total collections and send notifications if the count exceeds a threshold for a certain duration.
Total Entities	Monitor the number of total entities and send notifications if the count exceeds a threshold for a certain duration.
Others
Cluster Is Abnormal	Monitor the status of a cluster to ensure it is functioning properly. This includes checking the cluster load and usage.
