Version: User Guides (Cloud)

Access Logs Overview
Public Preview

In high-volume workloads, understanding which data is accessed most frequently is critical for optimization decisions such as index tuning or partition strategy. Without visibility into query patterns, these decisions rely on guesswork.

Access Logs give you that visibility. When enabled on a Zilliz Cloud cluster, the access log pipeline captures query activity and delivers it as structured log files to your own object storage. You can then load these logs into a data warehouse and aggregate by entity ID to identify hot data, slow queries, and usage trends.

📘Notes

This release logs search- or query-class actions only: Search, HybridSearch, and Query. Support for the full action list is planned for a future release.

How the pipeline works

The access log pipeline has two phases: collection on the Zilliz Cloud side and analysis on yours.


Zilliz Cloud collects and delivers logs

When you enable Access Logs on a cluster, Zilliz Cloud begins capturing query activities at the proxy layer. You configure two settings at the cluster level:

  • Sample rate: Control what percentage of requests are logged. The value ranges from 0 to 100; requests are randomly sampled at that rate, so a sample rate of 1 means approximately 1% of requests produce access log entries. For high‑volume workloads, a lower sample rate can reduce log storage costs while still providing enough data to analyze access patterns.

  • Output fields: Control which additional response fields are included in each access log entry. Common options are:

    • params.ids: Records the list of primary key IDs returned in the query result. This lets you aggregate by entity later to identify hot data and access frequency.

    • params.scores: Records the similarity score for each ID in params.ids, helping you understand which results were high‑confidence matches and which were borderline matches.
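Conceptually, the sample-rate setting behaves like per-request random sampling. A minimal sketch of that model (this is an illustration, not Zilliz Cloud's actual implementation):

```python
import random

def should_log(sample_rate: float) -> bool:
    """Return True if this request should produce an access log entry.

    sample_rate is a percentage in [0, 100], as configured on the cluster.
    Illustrative model only -- the real sampling logic lives in the proxy.
    """
    return random.uniform(0, 100) < sample_rate

# With sample_rate=1, roughly 1% of requests are logged.
logged = sum(should_log(1) for _ in range(100_000))
```

This is why a low sample rate still surfaces access patterns: over a large request volume, a uniform random sample preserves the relative frequency of hot entities.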

Logs are written in JSON Lines format (one JSON object per line) and delivered automatically to the object storage bucket you configured during setup. Each file follows a predictable path convention:

/<Cluster ID>/<Log type>/<Date>/<HH:MM:SS>-<UUID>.log

For example: /in03-c7be749d5f403ad/access/2024-12-20/09:16:53-jz5l7D8Q.log
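The path convention is regular enough to split programmatically. A sketch that extracts the documented components from an object key:

```python
from pathlib import PurePosixPath

def parse_log_path(path: str) -> dict:
    """Split an access log object key into its components.

    Path convention (from the docs):
    /<Cluster ID>/<Log type>/<Date>/<HH:MM:SS>-<UUID>.log
    """
    parts = PurePosixPath(path).parts  # ('/', cluster_id, log_type, date, filename)
    filename = parts[4]
    # Split the filename at the first '-': time on the left, UUID on the right.
    time_part, _, uid = filename[: -len(".log")].partition("-")
    return {
        "cluster_id": parts[1],
        "log_type": parts[2],
        "date": parts[3],
        "time": time_part,
        "uuid": uid,
    }

info = parse_log_path("/in03-c7be749d5f403ad/access/2024-12-20/09:16:53-jz5l7D8Q.log")
```

Parsing the path at load time lets you attach the cluster ID and date to each log entry before aggregating, which is useful when one bucket holds logs from multiple clusters.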

For more information on parameters, refer to Access Log Reference.

You analyze the logs

Because logs arrive as standard JSON Lines files in your own bucket, you can process them with any tool that reads JSON. Each log entry contains structured fields including action, cluster_id, timestamp, and params.ids (the list of primary keys in the query result).
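For illustration, a hypothetical entry might look like the following. The exact field layout may differ from real logs (see the Access Log Reference); only the documented fields are shown, and the nesting of params.ids is an assumption:

```python
import json

# Hypothetical access log line; field values are made up for illustration.
line = (
    '{"action": "Search", "cluster_id": "in03-c7be749d5f403ad", '
    '"timestamp": "2024-12-20T09:16:53Z", "params": {"ids": [101, 205, 311]}}'
)

entry = json.loads(line)
ids = entry["params"]["ids"]  # primary keys returned by this query
```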

The general analysis approach is:

  1. Load the JSON Lines files into a data warehouse or analytics tool.

  2. Parse the action and params.ids fields from each entry.

  3. Aggregate by primary key across a time window to surface access frequency.
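The three steps above can be sketched in plain Python without a warehouse, which is often enough for a first look. This assumes a nested params.ids layout; adjust for the real schema:

```python
import json
from collections import Counter
from pathlib import Path

def hot_entities(log_dir: str, top_n: int = 10):
    """Count how often each primary key appears in query results.

    Walks a directory of JSON Lines access log files and tallies every ID
    in params.ids. Assumes params.ids is a nested list of primary keys;
    check the Access Log Reference for the real schema.
    """
    counts = Counter()
    for log_file in Path(log_dir).rglob("*.log"):
        with open(log_file) as f:
            for line in f:
                entry = json.loads(line)
                for pk in entry.get("params", {}).get("ids", []):
                    counts[pk] += 1
    return counts.most_common(top_n)
```

For larger volumes, the same aggregation translates directly into a warehouse query: unnest params.ids, group by the ID, and count rows per time window.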

The result is a heat map of your data: which entities are queried most often, through which actions, and at what times.

Reliability and billing

The access log pipeline is designed around a core principle: logging never degrades query performance.

Non-blocking guarantee

Access log collection never delays or blocks user requests. If the system must choose between completing a query and writing a log entry, the query always wins.

Graceful degradation

Under extreme load, the system may drop access log entries to preserve query throughput. This means access logs provide a best-effort record of query activity rather than a guaranteed complete record.

Billing

Access log billing is time-based, not volume-based. The cost is 12.5% of the Query CU unit price, billed for the duration that Access Logs remain enabled on a cluster. This makes costs predictable regardless of query volume: you pay the same whether your cluster handles 100 or 100,000 queries per hour.
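To make the model concrete, here is a worked example with a hypothetical Query CU unit price. The price and duration below are placeholders; the real unit price comes from your pricing plan:

```python
# Time-based billing: cost depends on how long the feature stays enabled,
# not on how many queries run. All figures here are illustrative.
query_cu_unit_price_per_hour = 0.20   # hypothetical price, in USD per hour
access_log_rate = 0.125               # 12.5% of the Query CU unit price
hours_enabled = 24 * 30               # e.g. enabled for a full 30-day month

monthly_cost = query_cu_unit_price_per_hour * access_log_rate * hours_enabled
# The query count never enters the formula.
```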

What's next