Version: User Guides (BYOC)

Integrate with Prometheus

Prometheus is a monitoring system that collects metrics from configured targets at specified intervals, evaluates rule expressions, displays the results, and can trigger alerts based on specific conditions.

By integrating Zilliz Cloud with Prometheus, you can collect and monitor metrics related to your Zilliz Cloud deployment.

📘Notes

Prometheus integration is supported only for Zilliz Cloud clusters running the Dedicated-Enterprise or BYOC plan.

Configure Prometheus to scrape Zilliz Cloud metrics

To monitor Zilliz Cloud clusters with Prometheus, follow these steps:

  1. Open the prometheus.yml configuration file on your Prometheus server. For more information, refer to Configuration in the Prometheus documentation.

  2. Add the following scrape job to the scrape_configs section of the prometheus.yml file. If the file already contains a scrape_configs section, add only the job entry under it. Replace the placeholders with the appropriate values:

    • {{apiKey}}: Your Zilliz Cloud API key for accessing cluster metrics.

    • {{clusterId}}: The ID of the Zilliz Cloud cluster you wish to monitor.

    scrape_configs:
      - job_name: in01-06b8404b623xxxx
        scheme: https
        metrics_path: /v2/clusters/{{clusterId}}/metrics/export
        authorization:
          type: Bearer
          credentials: {{apiKey}}
        static_configs:
          - targets: ["api.cloud.zilliz.com"]

    | Parameter | Description |
    | --- | --- |
    | job_name | Human-readable label assigned to scraped metrics. |
    | scheme | The protocol scheme used to scrape metrics from the Zilliz Cloud endpoints, which is set to https. |
    | metrics_path | The path on the target service that provides the metric data. |
    | authorization.type | The authentication type used to access the Zilliz Cloud metrics. Set the value to Bearer. |
    | authorization.credentials | The API key used for authorization to access the Zilliz Cloud metrics endpoints. |
    | static_configs.targets | The static target that Prometheus will scrape, which should be api.cloud.zilliz.com, the host address of the Zilliz Cloud RESTful API. |

  3. Save the changes to the prometheus.yml file, then restart or reload Prometheus so that the new scrape job takes effect.

For more details, refer to the official Prometheus documentation.
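Before relying on Prometheus to scrape the endpoint, it can help to confirm that the API key and cluster ID are accepted. The following Python sketch builds the same HTTPS request that the scrape job describes: host api.cloud.zilliz.com, path /v2/clusters/{{clusterId}}/metrics/export, and Bearer authorization. The function names build_metrics_request and fetch_metrics are illustrative and are not part of any Zilliz Cloud SDK; the timeout value is also an assumption.

```python
import urllib.request

# Host taken from static_configs.targets in the scrape job.
ZILLIZ_API_HOST = "api.cloud.zilliz.com"


def build_metrics_request(cluster_id: str, api_key: str) -> urllib.request.Request:
    """Build a GET request that mirrors the Prometheus scrape job (hypothetical helper)."""
    url = f"https://{ZILLIZ_API_HOST}/v2/clusters/{cluster_id}/metrics/export"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_metrics(cluster_id: str, api_key: str, timeout: float = 10.0) -> str:
    """Return the raw metrics text. urllib raises HTTPError on 4xx/5xx responses."""
    request = build_metrics_request(cluster_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Example call (placeholder values, requires network access):
#   print(fetch_metrics("in01-06b8404b623xxxx", "YOUR_API_KEY")[:500])
```

If the call raises an HTTP error, the same credentials are likely to fail in the Prometheus scrape job as well, so this is a quick way to separate credential problems from Prometheus configuration problems.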

Example scraped metrics

The following are example Prometheus metrics scraped from the Zilliz Cloud /metrics/export endpoint:

# HELP zilliz_cluster_capacity Cluster capacity ratio
# TYPE zilliz_cluster_capacity gauge
zilliz_cluster_capacity 0.88
# HELP zilliz_cluster_computation Cluster computation ratio
# TYPE zilliz_cluster_computation gauge
zilliz_cluster_computation 0.1
# HELP zilliz_cluster_storage_bytes Cluster storage usage
# TYPE zilliz_cluster_storage_bytes gauge
zilliz_cluster_storage_bytes 8.9342782E7
# HELP zilliz_request_vectors_total Total number of vectors in requests
# TYPE zilliz_request_vectors_total counter
zilliz_request_vectors_total{request_type="bulk_insert"} 1.0
zilliz_request_vectors_total{request_type="delete"} 1.0
zilliz_request_vectors_total{request_type="insert"} 1.0
zilliz_request_vectors_total{request_type="search"} 1.0
zilliz_request_vectors_total{request_type="upsert"} 1.0
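The output above is in the Prometheus text exposition format: lines starting with # carry HELP and TYPE metadata, and every other line is a sample made of a metric name, optional labels in braces, and a value. As a rough illustration of that structure, the following Python sketch turns such text into (name, labels, value) tuples using only the standard library. It is a simplified reader, not a full implementation of the format: it ignores timestamps and does not unescape label values. The name parse_samples is hypothetical.

```python
import re
from typing import Dict, List, Tuple

# One sample line: metric name, optional {label="value",...} block, then the value.
_SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
# One label pair inside the braces, allowing escaped characters in the value.
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float]


def parse_samples(text: str) -> List[Sample]:
    """Extract samples from exposition text, skipping comments and blank lines."""
    samples: List[Sample] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE metadata or empty line
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this simplified reader understands
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples
```

Applied to the example output, the function yields one tuple per sample line, with an empty label dictionary for the gauges and a request_type entry for each zilliz_request_vectors_total sample.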

Zilliz Cloud metric labels

The metrics exposed by Zilliz Cloud are labeled with the following identifiers.

| Label Name | Description | Values |
| --- | --- | --- |
| cluster_id | The ID of the Zilliz Cloud cluster that the metrics are from. | - |
| org_id | The ID of the organization that owns the Zilliz Cloud cluster. | - |
| project_id | The ID of the project within the organization that the cluster belongs to. | - |
| collection_name | The name of the collection being monitored. | - |
| request_type | The type of operation performed on the data. | insert, upsert, delete, bulk_insert, flush, search, query |
| status | The outcome of the data operation. | success, fail |

Available metrics

The following table lists the metrics available for Zilliz Cloud, along with their types, descriptions, and associated labels.

| Metric Name | Type | Description | Labels |
| --- | --- | --- | --- |
| zilliz_cluster_computation | Gauge | The current computation capacity utilization. | cluster_id, org_id, project_id |
| zilliz_cluster_capacity | Gauge | The current storage capacity utilization. | cluster_id, org_id, project_id |
| zilliz_storage_bytes | Gauge | The total storage space used. | cluster_id, org_id, project_id |
| zilliz_cluster_write_capacity | Gauge | The current write throughput. | cluster_id, org_id, project_id |
| zilliz_requests_total | Counter | The total number of requests processed. | cluster_id, org_id, project_id, request_type, status |
| zilliz_request_vectors_total | Counter | The total number of vectors manipulated across all requests. | cluster_id, org_id, project_id, request_type |
| zilliz_request_duration_seconds_bucket | Histogram | The latency distribution of requests processed. | cluster_id, org_id, project_id, request_type |
| zilliz_slow_queries_total | Counter | The number of queries exceeding the latency threshold. | cluster_id, org_id, project_id |
| zilliz_entities | Gauge | The total number of entities stored. | cluster_id, org_id, project_id, collection_name |
| zilliz_loaded_entities | Gauge | The number of entities currently loaded in memory. | cluster_id, org_id, project_id, collection_name |
| zilliz_indexed_entities | Gauge | The number of entities that have been indexed. | cluster_id, org_id, project_id, collection_name |
| zilliz_collections | Gauge | The total number of collections. | cluster_id, org_id, project_id |
| zilliz_unloaded_collections | Gauge | The number of unloaded collections. | cluster_id, org_id, project_id |
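Once Prometheus has scraped these metrics, any of them can also be read programmatically through the Prometheus HTTP API, whose instant-query endpoint is /api/v1/query. The following Python sketch combines a metric name and label values from the tables above into a selector and sends it to that endpoint. The server address http://localhost:9090 is an assumption (the default for a local Prometheus), and the helper names build_selector, build_query_url, and run_query are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict

# Assumption: Prometheus runs locally on its default port.
PROMETHEUS_URL = "http://localhost:9090"


def build_selector(metric: str, labels: Dict[str, str]) -> str:
    """Render a PromQL selector such as metric{a="x",b="y"} (labels sorted by name)."""
    if not labels:
        return metric
    matchers = ",".join(
        '{}="{}"'.format(key, value.replace("\\", "\\\\").replace('"', '\\"'))
        for key, value in sorted(labels.items())
    )
    return f"{metric}{{{matchers}}}"


def build_query_url(expr: str, base_url: str = PROMETHEUS_URL) -> str:
    """Build the instant-query URL, URL-encoding the PromQL expression."""
    return f"{base_url}/api/v1/query?" + urllib.parse.urlencode({"query": expr})


def run_query(expr: str, base_url: str = PROMETHEUS_URL) -> list:
    """Run an instant query and return the series list found under data.result."""
    with urllib.request.urlopen(build_query_url(expr, base_url), timeout=10) as resp:
        body = json.load(resp)
    if body.get("status") != "success":
        raise RuntimeError(f"query failed: {body}")
    return body["data"]["result"]


# Example call (requires a reachable Prometheus server):
#   run_query(build_selector("zilliz_entities", {"cluster_id": "in01-xxxxx"}))
```

Only labels listed for a metric in the table are meaningful in its selector; for example, collection_name applies to zilliz_entities but not to zilliz_collections.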

Example Prometheus queries

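Here are some example queries you can use to analyze Zilliz Cloud metrics with Prometheus. The queries use $__rate_interval, which is a Grafana variable; when you run a query directly in Prometheus, replace it with a fixed range such as 5m. Replace the cluster_id value with the ID of your own cluster.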

  • Calculate insert QPS (insert requests per second)

    rate(zilliz_requests_total{cluster_id='in01-xxxxx',request_type='insert'}[$__rate_interval])
  • Calculate insert VPS (inserted vectors per second)

    rate(zilliz_request_vectors_total{cluster_id='in01-xxxxx',request_type='insert'}[$__rate_interval])
  • Calculate 70th percentile insert latency

    histogram_quantile(
      0.70,
      sum(
        rate(zilliz_request_duration_seconds_bucket{cluster_id='in01-xxxxx',request_type='insert'}[$__rate_interval])
      ) by (le)
    )
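  • Calculate insert request failure rate. Both sides are aggregated with sum() so that the failed series are divided by the total of all series rather than matched one-to-one by label.

    sum(rate(zilliz_requests_total{cluster_id='in01-xxxxx',request_type='insert',status!='success'}[$__rate_interval]))
    /
    sum(rate(zilliz_requests_total{cluster_id='in01-xxxxx',request_type='insert'}[$__rate_interval]))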
  • Calculate the number of slow queries per minute

    sum(increase(zilliz_slow_queries_total{cluster_id='in01-xxxxx'}[1m]))
  • Calculate the number of slow queries per 5 minutes

    sum(increase(zilliz_slow_queries_total{cluster_id='in01-xxxxx'}[5m]))
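The percentile query relies on histogram_quantile, which estimates a quantile from cumulative bucket counts by finding the bucket that contains the target rank and interpolating linearly inside it. The following Python sketch is a simplified model of that calculation for classic histograms, meant only to show why the result is an estimate bounded by the bucket edges. The function mirrors the PromQL name for readability but is not Prometheus code, and the bucket values in the usage comment are invented for illustration.

```python
import math
from typing import List, Tuple


def histogram_quantile(q: float, buckets: List[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Simplified sketch: expects the last bucket to have upper bound +Inf,
    as the le="+Inf" bucket does in a Prometheus histogram.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be in [0, 1]")
    buckets = sorted(buckets)
    if len(buckets) < 2 or not math.isinf(buckets[-1][0]):
        raise ValueError("need at least two buckets, the last with le=+Inf")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations, nothing to estimate
    rank = q * total
    # First bucket whose cumulative count reaches the target rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # Rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0]
    upper, count = buckets[idx]
    if idx == 0:
        if upper <= 0:
            return upper
        lower, below = 0.0, 0.0  # the first bucket is assumed to start at 0
    else:
        lower, below = buckets[idx - 1]
    if count == below:
        return lower  # avoids 0/0 when q is 0 and the first bucket is empty
    # Linear interpolation inside the bucket that contains the rank.
    return lower + (upper - lower) * (rank - below) / (count - below)


# Usage with invented per-second bucket rates (le in seconds):
#   histogram_quantile(0.70, [(0.1, 10), (0.5, 70), (1.0, 90), (math.inf, 100)])
```

Because the interpolation assumes observations are spread evenly within a bucket, the accuracy of the estimate depends on how finely the bucket boundaries divide the latency range.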