
Integrate with Prometheus

Prometheus is a monitoring system that collects metrics from configured targets at specified intervals, evaluates rule expressions, displays the results, and can trigger alerts based on specific conditions.

By integrating Zilliz Cloud with Prometheus, you can collect and monitor metrics related to your Zilliz Cloud deployment.

📘Notes

Prometheus integration is supported only for Zilliz Cloud clusters running the Dedicated-Enterprise or BYOC plan.

Configure Prometheus to scrape Zilliz Cloud metrics

To monitor Zilliz Cloud clusters with Prometheus, follow these steps:

Open the prometheus.yml configuration file on your Prometheus server. For more information, refer to Configuration in the Prometheus documentation.

Add the following snippet to the scrape_configs section of the prometheus.yml file. Replace the placeholders with the appropriate values:

{{apiKey}}: Your Zilliz Cloud API key for accessing cluster metrics.
{{clusterId}}: The ID of the Zilliz Cloud cluster you wish to monitor.

scrape_configs:
  - job_name: in01-06b8404b623xxxx
    scheme: https
    metrics_path: /v2/clusters/{{clusterId}}/metrics/export
    authorization:
      type: Bearer
      credentials: {{apiKey}}
    static_configs:
      - targets: ["api.cloud.zilliz.com"]

Parameter	Description
`job_name`	Human-readable label assigned to scraped metrics.
`scheme`	The protocol scheme used to scrape metrics from the Zilliz Cloud endpoints, which is set to `https`.
`metrics_path`	The path on the target service that provides the metric data.
`authorization.type`	The authentication type used to access the Zilliz Cloud metrics. Set the value to `Bearer`.
`authorization.credentials`	The API key used for authorization to access the Zilliz Cloud metrics endpoints.
`static_configs.targets`	The static target that Prometheus will scrape, which should be `api.cloud.zilliz.com`, the host address of the Zilliz Cloud RESTful API.

Save the changes to the prometheus.yml file.

For more details, refer to the official Prometheus documentation.
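Before editing the Prometheus configuration, it can help to confirm that the API key and cluster ID work. The following is a minimal Python sketch, assuming the endpoint accepts a plain GET request with the same Bearer token that the scrape configuration uses. `build_metrics_url` and `fetch_metrics` are hypothetical helper names introduced for illustration; they are not part of any Zilliz SDK.

```python
import urllib.request

# Host taken from static_configs.targets in the scrape configuration.
API_HOST = "api.cloud.zilliz.com"


def build_metrics_url(cluster_id: str) -> str:
    """Return the URL that Prometheus would scrape for the given cluster."""
    return f"https://{API_HOST}/v2/clusters/{cluster_id}/metrics/export"


def fetch_metrics(cluster_id: str, api_key: str, timeout: float = 10.0) -> str:
    """Fetch the raw metrics text using Bearer authorization.

    Assumption: the endpoint answers a plain GET request, as a Prometheus
    scrape does. urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    """
    request = urllib.request.Request(
        build_metrics_url(cluster_id),
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

If the credentials are valid, calling `fetch_metrics` with your own cluster ID and API key should return text in the Prometheus exposition format. An HTTP error at this stage usually points to a wrong key or cluster ID rather than to a Prometheus configuration problem.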

Example scraped metrics

The following are example Prometheus metrics scraped from the Zilliz Cloud /metrics/export endpoint:

# HELP zilliz_cluster_capacity Cluster capacity ratio
# TYPE zilliz_cluster_capacity gauge
zilliz_cluster_capacity 0.88
# HELP zilliz_cluster_computation Cluster computation ratio
# TYPE zilliz_cluster_computation gauge
zilliz_cluster_computation 0.1
# HELP zilliz_cluster_storage_bytes Cluster storage usage
# TYPE zilliz_cluster_storage_bytes gauge
zilliz_cluster_storage_bytes 8.9342782E7
# HELP zilliz_request_vectors_total Total number of vectors in requests
# TYPE zilliz_request_vectors_total counter
zilliz_request_vectors_total{request_type="bulk_insert"} 1.0
zilliz_request_vectors_total{request_type="delete"} 1.0
zilliz_request_vectors_total{request_type="insert"} 1.0
zilliz_request_vectors_total{request_type="search"} 1.0
zilliz_request_vectors_total{request_type="upsert"} 1.0
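To inspect output like the example above outside of Prometheus, the text can be parsed with a few lines of Python. This is a minimal sketch, not a full parser for the exposition format: it assumes one sample per line, ignores comment lines, and does not handle timestamps or escaped quotes in label values. `parse_samples` is a hypothetical helper name.

```python
import re

# Matches "name value" or "name{labels} value".
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# Matches label pairs such as request_type="insert".
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip lines this simplified parser does not understand
        name, raw_labels, raw_value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(raw_value)))
    return samples


# A shortened copy of the example output shown above.
EXAMPLE = """\
# HELP zilliz_cluster_capacity Cluster capacity ratio
# TYPE zilliz_cluster_capacity gauge
zilliz_cluster_capacity 0.88
# TYPE zilliz_cluster_storage_bytes gauge
zilliz_cluster_storage_bytes 8.9342782E7
zilliz_request_vectors_total{request_type="insert"} 1.0
"""

samples = parse_samples(EXAMPLE)
```

Note that values such as `8.9342782E7` use scientific notation, so they should be read as floating-point numbers rather than as integers.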

Zilliz Cloud metric labels

The metrics exposed by Zilliz Cloud are labeled with the following identifiers.

Label Name	Description	Values
`cluster_id`	The ID of the Zilliz Cloud cluster that the metrics are from.	-
`org_id`	The ID of the organization that owns the Zilliz Cloud cluster.	-
`project_id`	The ID of the project within the organization that the cluster belongs to.	-
`collection_name`	The name of the collection being monitored.	-
`request_type`	The type of operation performed on the data.	`insert`, `upsert`, `delete`, `bulk_insert`, `flush`, `search`, `query`
`status`	The outcome of the data operation.	`success`, `fail`

Available metrics

The following table lists the metrics available for Zilliz Cloud, along with their types, descriptions, and associated labels.

Metric Name	Type	Description	Labels
`zilliz_cluster_computation`	Gauge	The current computation capacity utilization.	`cluster_id`, `org_id`, `project_id`
`zilliz_cluster_capacity`	Gauge	The current storage capacity utilization.	`cluster_id`, `org_id`, `project_id`
`zilliz_storage_bytes`	Gauge	The total storage space used.	`cluster_id`, `org_id`, `project_id`
`zilliz_cluster_write_capacity`	Gauge	The current write throughput.	`cluster_id`, `org_id`, `project_id`
`zilliz_requests_total`	Counter	The total number of requests processed.	`cluster_id`, `org_id`, `project_id`, `request_type`, `status`
`zilliz_request_vectors_total`	Counter	The total number of vectors manipulated across all requests.	`cluster_id`, `org_id`, `project_id`, `request_type`
`zilliz_request_duration_seconds_bucket`	Histogram	The latency distribution of requests processed.	`cluster_id`, `org_id`, `project_id`, `request_type`
`zilliz_slow_queries_total`	Counter	The number of queries exceeding the latency threshold.	`cluster_id`, `org_id`, `project_id`
`zilliz_entities`	Gauge	The total number of entities stored.	`cluster_id`, `org_id`, `project_id`, `collection_name`
`zilliz_loaded_entities`	Gauge	The number of entities currently loaded in memory.	`cluster_id`, `org_id`, `project_id`, `collection_name`
`zilliz_indexed_entities`	Gauge	The number of entities that have been indexed.	`cluster_id`, `org_id`, `project_id`, `collection_name`
`zilliz_collections`	Gauge	The total number of collections.	`cluster_id`, `org_id`, `project_id`
`zilliz_unloaded_collections`	Gauge	The number of unloaded collections.	`cluster_id`, `org_id`, `project_id`

Example Prometheus queries

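Here are some example queries you can use to analyze Zilliz Cloud metrics with Prometheus. In each query, replace in01-xxxxx with the ID of your cluster. Note that $__rate_interval is a Grafana variable; when you run a query directly in Prometheus, replace it with a fixed range such as 5m.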

Calculate insert QPS

rate(zilliz_requests_total{cluster_id='in01-xxxxx',request_type='insert'}[$__rate_interval])
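To make the meaning of rate() on a counter concrete, the following Python sketch computes a per-second rate from raw counter samples. It is a simplification under stated assumptions: it handles counter resets but omits the extrapolation to the window boundaries that Prometheus applies, so its results can differ slightly from rate(). `simple_rate` is a hypothetical name.

```python
def simple_rate(samples):
    """Approximate the per-second increase of a counter.

    `samples` is a list of (timestamp_seconds, counter_value) pairs in time
    order, as they would fall inside one range window.
    """
    if len(samples) < 2:
        return None  # at least two samples are needed to compute a rate
    increase = 0.0
    previous = samples[0][1]
    for _, value in samples[1:]:
        if value < previous:
            # Counter reset: the counter restarted from zero, so the whole
            # new value counts as increase.
            increase += value
        else:
            increase += value - previous
        previous = value
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return increase / elapsed
```

For example, a request counter that goes from 100 to 160 over 30 seconds yields 2.0 requests per second with this function.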

Calculate insert VPS

rate(zilliz_request_vectors_total{cluster_id='in01-xxxxx',request_type='insert'}[$__rate_interval])

Calculate 70th percentile insert latency

histogram_quantile(
    0.70, 
    sum(
        rate(zilliz_request_duration_seconds_bucket{cluster_id='in01-xxxxx',request_type='insert'}[$__rate_interval])
    ) by (le) 
)
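The quantile in the query above is an estimate interpolated from cumulative bucket counts, not an exact value. The Python sketch below illustrates the idea with linear interpolation inside the bucket that contains the target rank. It is a simplified model of how histogram_quantile behaves for classic histograms, assuming positive bucket bounds and a +Inf bucket; `bucket_quantile` is a hypothetical name.

```python
import math


def bucket_quantile(q, buckets):
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    Simplified model: requires 0 <= q <= 1, a +Inf bucket, and positive
    finite bounds (the first bucket is assumed to start at 0).
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        return math.nan
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    index = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if index == len(buckets) - 1:
        # The rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[index]
    lower, below = (0.0, 0.0) if index == 0 else buckets[index - 1]
    in_bucket = count - below
    if in_bucket == 0:
        return lower
    # Linear interpolation between the bucket's lower and upper bounds.
    return lower + (upper - lower) * (rank - below) / in_bucket
```

With buckets `[(0.1, 10), (0.5, 60), (1.0, 90), (math.inf, 100)]` and `q = 0.70`, the target rank is 70, which falls in the bucket between 0.5 and 1.0, so the estimate lies one third of the way into that bucket. The accuracy of the result therefore depends on how finely the bucket boundaries are spaced.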

Calculate insert request fail rate

sum(rate(zilliz_requests_total{cluster_id='in01-xxxxx',request_type='insert',status!='success'}[$__rate_interval]))
/
sum(rate(zilliz_requests_total{cluster_id='in01-xxxxx',request_type='insert'}[$__rate_interval]))

Calculate the number of slow queries per minute

sum(increase(zilliz_slow_queries_total{cluster_id='in01-xxxxx'}[1m]))

Calculate the number of slow queries per 5 minutes

sum(increase(zilliz_slow_queries_total{cluster_id='in01-xxxxx'}[5m]))
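Queries like these can also be run from a script through the Prometheus HTTP API, which exposes instant queries at /api/v1/query. The following Python sketch assumes a Prometheus server reachable at http://localhost:9090 without authentication; the URL, the helper names `build_query_url` and `run_instant_query`, and the placeholder cluster ID are assumptions for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local Prometheus server on the default port.
PROMETHEUS_URL = "http://localhost:9090"

# Slow queries over the last 5 minutes; replace the placeholder cluster ID.
SLOW_QUERIES_5M = (
    "sum(increase(zilliz_slow_queries_total{cluster_id='in01-xxxxx'}[5m]))"
)


def build_query_url(expr, base_url=PROMETHEUS_URL):
    """Build the instant-query URL with the expression URL-encoded."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def run_instant_query(expr, base_url=PROMETHEUS_URL, timeout=10.0):
    """Run an instant query and return the list of result series."""
    url = build_query_url(expr, base_url)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        payload = json.load(response)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    return payload["data"]["result"]
```

Calling `run_instant_query(SLOW_QUERIES_5M)` returns the result series from the API response. Because `$__rate_interval` exists only in Grafana, expressions sent through this API need a fixed range such as `[5m]`.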
