Integrate with Prometheus
Prometheus is a monitoring system that collects metrics from configured targets at specified intervals, evaluates rule expressions, displays the results, and can trigger alerts based on specific conditions.
By integrating Zilliz Cloud with Prometheus, you can collect and monitor metrics related to your Zilliz Cloud deployment.
Prometheus integration is supported only for Zilliz Cloud clusters running the Dedicated-Enterprise or BYOC plan.
Configure Prometheus to scrape Zilliz Cloud metrics
To monitor Zilliz Cloud clusters with Prometheus, follow these steps:
- Access the Prometheus.yml configuration file on your Prometheus server. For more information, refer to Configuration.
- Add the following snippet to the scrape_configs section of the Prometheus.yml file. Replace the placeholders with the appropriate values:
  - {{apiKey}}: Your Zilliz Cloud API key for accessing cluster metrics.
  - {{clusterId}}: The ID of the Zilliz Cloud cluster you wish to monitor.
scrape_configs:
  - job_name: in01-06b8404b623xxxx
    scheme: https
    metrics_path: /v2/clusters/{{clusterId}}/metrics/export
    authorization:
      type: Bearer
      credentials: {{apiKey}}
    static_configs:
      - targets: ["api.cloud.zilliz.com"]

Parameter | Description |
---|---|
job_name | Human-readable label assigned to scraped metrics. |
scheme | The protocol scheme used to scrape metrics from the Zilliz Cloud endpoints, which is set to https. |
metrics_path | The path on the target service that provides the metric data. |
authorization.type | The authentication type used to access the Zilliz Cloud metrics. Set the value to Bearer. |
authorization.credentials | The API key used for authorization to access the Zilliz Cloud metrics endpoints. |
static_configs.targets | The static target that Prometheus will scrape, which should be api.cloud.zilliz.com, the host address of the Zilliz Cloud RESTful API. |
- Save the changes to the Prometheus.yml file.
For more details, refer to the official Prometheus documentation.
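Before pointing Prometheus at the endpoint, it can help to confirm that the API key and cluster ID work. The following is a minimal Python sketch, not an official client: it builds the same HTTPS request that the scrape configuration describes (Bearer authorization against /v2/clusters/{{clusterId}}/metrics/export on api.cloud.zilliz.com). The function names build_metrics_request and fetch_metrics are hypothetical, and the cluster ID and API key are placeholders you supply.

```python
import urllib.request

API_HOST = "api.cloud.zilliz.com"  # host taken from the scrape configuration


def build_metrics_request(cluster_id: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the cluster's metrics export endpoint."""
    url = f"https://{API_HOST}/v2/clusters/{cluster_id}/metrics/export"
    # Mirrors `authorization.type: Bearer` in the scrape configuration.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def fetch_metrics(cluster_id: str, api_key: str, timeout: float = 10.0) -> str:
    """Fetch the metrics text; urlopen raises HTTPError on an HTTP error status."""
    request = build_metrics_request(cluster_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch_metrics with a valid cluster ID and API key should return the metrics as plain text. An HTTP error at this stage suggests checking the key or the cluster ID before editing the Prometheus configuration.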
Example scraped metrics
The following are example Prometheus metrics scraped from the Zilliz Cloud /metrics/export endpoint:
# HELP zilliz_cluster_capacity Cluster capacity ratio
# TYPE zilliz_cluster_capacity gauge
zilliz_cluster_capacity 0.88
# HELP zilliz_cluster_computation Cluster computation ratio
# TYPE zilliz_cluster_computation gauge
zilliz_cluster_computation 0.1
# HELP zilliz_cluster_storage_bytes Cluster storage usage
# TYPE zilliz_cluster_storage_bytes gauge
zilliz_cluster_storage_bytes 8.9342782E7
# HELP zilliz_request_vectors_total Total number of vectors in requests
# TYPE zilliz_request_vectors_total counter
zilliz_request_vectors_total{request_type="bulk_insert"} 1.0
zilliz_request_vectors_total{request_type="delete"} 1.0
zilliz_request_vectors_total{request_type="insert"} 1.0
zilliz_request_vectors_total{request_type="search"} 1.0
zilliz_request_vectors_total{request_type="upsert"} 1.0
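To inspect this output outside Prometheus, for instance in a quick sanity check, the text can be parsed line by line. The following Python sketch is an illustration under simplifying assumptions rather than a full parser for the Prometheus text exposition format: it skips the # HELP and # TYPE comment lines, ignores optional timestamps, and does not handle escaped characters or closing braces inside label values. The name parse_samples is hypothetical.

```python
import re

# metric name, optional {labels}, then the sample value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this sketch understands
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples
```

For the line zilliz_request_vectors_total{request_type="insert"} 1.0, the function yields the tuple ("zilliz_request_vectors_total", {"request_type": "insert"}, 1.0).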
Zilliz Cloud metric labels
The metrics exposed by Zilliz Cloud are labeled with the following identifiers.
Label Name | Description | Values |
---|---|---|
cluster_id | The ID of the Zilliz Cloud cluster that the metrics are from. | - |
| The ID of the organization that owns the Zilliz Cloud cluster. | - |
| The ID of the project within the organization that the cluster belongs to. | - |
| The name of the collection being monitored. | - |
request_type | The type of operation performed on the data. | e.g. insert, upsert, delete, bulk_insert, search |
status | The outcome of the data operation. | e.g. success |
Available metrics
The following table lists the metrics available for Zilliz Cloud, along with their types, descriptions, and associated labels.
Metric Name | Type | Description | Labels |
---|---|---|---|
zilliz_cluster_computation | Gauge | The current computation capacity utilization. | |
zilliz_cluster_capacity | Gauge | The current storage capacity utilization. | |
zilliz_cluster_storage_bytes | Gauge | The total storage space used. | |
| Gauge | The current write throughput. | |
zilliz_requests_total | Counter | The total number of requests processed. | |
zilliz_request_vectors_total | Counter | The total number of vectors manipulated across all requests. | |
zilliz_request_duration_seconds | Histogram | The latency distribution of requests processed. | |
zilliz_slow_queries_total | Counter | The number of queries exceeding the latency threshold. | |
| Gauge | The total number of entities stored. | |
| Gauge | The number of entities currently loaded in memory. | |
| Gauge | The number of entities that have been indexed. | |
| Gauge | The total number of collections. | |
| Gauge | The number of unloaded collections. | |
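Example Prometheus queries
Here are some example queries you can use to analyze Zilliz Cloud metrics with Prometheus. Note that $__rate_interval is a Grafana dashboard variable; when you run a query directly in Prometheus, replace it with a fixed range such as 5m.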
- Calculate insert QPS
  rate(zilliz_requests_total{cluster_id='in01-xxxxx',request_type='insert'}[$__rate_interval])
- Calculate insert VPS (vectors per second)
  rate(zilliz_request_vectors_total{cluster_id='in01-xxxxx',request_type='insert'}[$__rate_interval])
- Calculate 70th percentile insert latency
  histogram_quantile(
    0.70,
    sum(
      rate(zilliz_request_duration_seconds_bucket{cluster_id='in01-xxxxx',request_type='insert'}[$__rate_interval])
    ) by (le)
  )
- Calculate insert request fail rate
  sum(rate(zilliz_requests_total{cluster_id='in01-xxxxx',request_type='insert',status!='success'}[$__rate_interval]))
  /
  sum(rate(zilliz_requests_total{cluster_id='in01-xxxxx',request_type='insert'}[$__rate_interval]))
- Calculate the number of slow queries per 1 minute
  sum(increase(zilliz_slow_queries_total{cluster_id='in01-xxxxx'}[1m]))
- Calculate the number of slow queries per 5 minutes
  sum(increase(zilliz_slow_queries_total{cluster_id='in01-xxxxx'}[5m]))
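Queries like these can also be run from a script through the Prometheus HTTP API, whose instant-query endpoint is /api/v1/query. The following Python sketch assumes a Prometheus server reachable at a base URL you provide (http://localhost:9090 is only the common default, not something this guide specifies). The function names build_query_url and run_query are hypothetical, and in01-xxxxx is a placeholder cluster ID.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Return the instant-query URL of the Prometheus HTTP API for `promql`."""
    query_string = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


def run_query(base_url: str, promql: str, timeout: float = 10.0) -> list:
    """Run an instant query and return the `result` list from the JSON response."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]


# Slow queries over the last 5 minutes; 'in01-xxxxx' is a placeholder cluster ID.
SLOW_QUERIES_5M = "sum(increase(zilliz_slow_queries_total{cluster_id='in01-xxxxx'}[5m]))"
```

For example, run_query("http://localhost:9090", SLOW_QUERIES_5M) would return the result vector for that expression, provided the server at that address is scraping the cluster. A fixed range such as 5m is used here because Grafana variables are not resolved by the Prometheus API.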