Use Large TopK
By default, a Zilliz Cloud collection returns at most 16,384 entities in a single search or query result. To retrieve more, set the collection's query mode to Large TopK, which lets Zilliz Cloud return up to one million entities in a single result without resorting to complex and time-consuming iterators.
This feature is available for Zilliz Cloud clusters that are compatible with Milvus v2.6.x. If you would like to try this feature, please get in touch with us.
Overview
By default, a Zilliz Cloud collection supports a maximum topK of 16,384 in search or query operations. When you need to retrieve more entities in a single request, such as batch similarity search or data mining scenarios, you can enable the Large TopK mode by setting the query_mode property to large_topk on your collection. This raises the topK limit to 1,000,000 (one million) entities.
Enabling Large TopK changes the underlying index strategy from the default Auto Index to IVF (Inverted File Index) with RaBitQ deep compression, which is optimized for high-recall, large-range retrieval at the cost of small-K query performance.
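The two limits above can be expressed as a small client-side check that runs before a request is sent. This is a minimal sketch, not part of any SDK: the names `DEFAULT_MAX_TOPK`, `LARGE_MAX_TOPK`, `max_topk`, and `check_limit` are hypothetical, and the values are simply the limits stated in this section.

```python
from typing import Optional

# Limits as stated in this guide; the names are illustrative, not SDK constants.
DEFAULT_MAX_TOPK = 16_384
LARGE_MAX_TOPK = 1_000_000


def max_topk(query_mode: Optional[str]) -> int:
    """Return the topK ceiling for a collection's query_mode property."""
    return LARGE_MAX_TOPK if query_mode == "large_topk" else DEFAULT_MAX_TOPK


def check_limit(limit: int, query_mode: Optional[str]) -> None:
    """Raise ValueError if `limit` cannot be served in the given mode."""
    ceiling = max_topk(query_mode)
    if not 1 <= limit <= ceiling:
        raise ValueError(
            f"limit={limit} is outside 1..{ceiling} for query_mode={query_mode!r}"
        )
```

Calling `check_limit(500000, None)` raises an error, while `check_limit(500000, "large_topk")` passes silently.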
When to use Large TopK
Large TopK is designed for scenarios where you need to retrieve a very large number of similar entities in a single search, such as:
- Batch similarity search: Find the top 100,000 or 1,000,000 most similar items for a given query vector.
- Data mining and analysis: Extract large candidate sets for downstream processing, filtering, or model training.
- Regression testing preparation: Retrieve large result sets to build test corpora for simulation teams.
For interactive, latency-sensitive online queries with small topK (e.g., top 10 or top 100), the default query mode is recommended.
Prerequisites and trade-offs
Before enabling Large TopK, be aware of the following trade-offs:
- Small-K performance degradation: After switching to large_topk, small-K queries (K < 16,384) will experience increased latency and reduced recall compared to the default mode.
- Query latency: Large TopK queries have significantly higher latency than standard queries. A topK of 100,000 may take several seconds, and a topK of 1,000,000 may take minutes.
- Resource usage: A single Large TopK query can consume several gigabytes of memory for result sorting. On Performance-optimized clusters, this may affect other queries running on the same cluster.
- Offline preference: For batch workloads, consider using an On-demand Compute database. These databases use on-demand CUs, so batch queries do not affect online services.
- Index rebuild required: If your collection already has a vector index, you must release the collection and drop the existing index before enabling Large TopK. Search will be unavailable during the rebuild.
Enable Large TopK
During collection creation (recommended)
If you know your collection will require Large TopK, specify it at creation time to avoid the cost of switching later:
from pymilvus import MilvusClient

client = MilvusClient(uri="your_uri", token="your_token")

# `schema` and `index_params` are assumed to be prepared beforehand.
client.create_collection(
    collection_name="scenarios_corpus",
    schema=schema,
    index_params=index_params,
    properties={"query_mode": "large_topk"}
)
On an existing collection
For an existing collection without a vector index, you can enable Large TopK directly:
client.alter_collection_properties(
    collection_name="scenarios_corpus",
    properties={"query_mode": "large_topk"}
)
For an existing collection with a vector index, you must first drop the index, then enable the mode, and finally recreate the index:
# 1. Release and drop the existing index
client.release_collection(collection_name="scenarios_corpus")
client.drop_index(collection_name="scenarios_corpus", index_name="vector_idx")

# 2. Enable Large TopK
client.alter_collection_properties(
    collection_name="scenarios_corpus",
    properties={"query_mode": "large_topk"}
)

# 3. Recreate the index (will use IVF + RaBitQ automatically)
client.create_index(
    collection_name="scenarios_corpus",
    index_params=index_params
)
client.load_collection(collection_name="scenarios_corpus")
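The two cases above (with and without an existing vector index) can be folded into one helper that picks the right path. This is a minimal sketch under stated assumptions: the function name `enable_large_topk` and its parameters are hypothetical, `client` is expected to behave like a pymilvus `MilvusClient`, and `list_indexes` is assumed to return the collection's index names.

```python
def enable_large_topk(client, collection_name, vector_index_name, index_params):
    """Switch a collection to large_topk, rebuilding the index only if needed.

    Returns True if an existing index had to be dropped and rebuilt.
    """
    # Assumption: list_indexes returns the names of the collection's indexes.
    has_index = vector_index_name in client.list_indexes(
        collection_name=collection_name
    )

    if has_index:
        # Search is unavailable from here until the collection is loaded again.
        client.release_collection(collection_name=collection_name)
        client.drop_index(
            collection_name=collection_name, index_name=vector_index_name
        )

    client.alter_collection_properties(
        collection_name=collection_name,
        properties={"query_mode": "large_topk"},
    )

    if has_index:
        client.create_index(
            collection_name=collection_name, index_params=index_params
        )
        client.load_collection(collection_name=collection_name)
    return has_index
```

The return value lets a caller log or alert on the rebuild, since that is the path that makes search temporarily unavailable.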
Check current query mode
info = client.describe_collection(collection_name="scenarios_corpus")
query_mode = info["properties"].get("query_mode") # None means default mode
Disable Large TopK
To return to the default query mode, drop the query_mode property. Note that this also requires releasing and dropping the existing index first:
client.drop_collection_properties(
    collection_name="scenarios_corpus",
    property_keys=["query_mode"]
)
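The snippet above shows only the property change, while the note says the index must be released and dropped first. The sketch below puts the whole sequence in order. It assumes the index is recreated and the collection reloaded afterwards, mirroring the enable procedure. The function name `disable_large_topk` and its parameters are hypothetical, and `client` is expected to behave like a pymilvus `MilvusClient`.

```python
def disable_large_topk(client, collection_name, vector_index_name, index_params):
    """Return a collection to the default query mode and rebuild its index."""
    # 1. Release the collection and drop the index built under large_topk.
    client.release_collection(collection_name=collection_name)
    client.drop_index(
        collection_name=collection_name, index_name=vector_index_name
    )

    # 2. Remove the property; an absent query_mode means default mode.
    client.drop_collection_properties(
        collection_name=collection_name,
        property_keys=["query_mode"],
    )

    # 3. Recreate the index and reload so the collection is searchable again.
    client.create_index(
        collection_name=collection_name, index_params=index_params
    )
    client.load_collection(collection_name=collection_name)
```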
Perform a Large TopK search
Once Large TopK is enabled, use the standard search method with a large limit value:
Online search (Serving Cluster)
results = client.search(
    collection_name="scenarios_serving",
    data=[query_vector],
    limit=500000
)
Offline search (On-demand Compute)
results = client.search(
    collection_name="scenarios_corpus",
    data=[query_vector],
    limit=500000
)
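Because a Large TopK search may run for seconds or minutes, it can help to set an explicit client-side timeout and record how long each request takes. The sketch below wraps a single search. The helper name `timed_search` and the default of 600 seconds are illustrative assumptions, not recommended values. `client` is expected to behave like a pymilvus `MilvusClient`, whose `search` method accepts a `timeout` argument in seconds.

```python
import time


def timed_search(client, collection_name, query_vector, limit, timeout_s=600.0):
    """Run one search with an explicit timeout and report elapsed seconds."""
    started = time.perf_counter()
    results = client.search(
        collection_name=collection_name,
        data=[query_vector],
        limit=limit,
        timeout=timeout_s,  # illustrative value; tune it for your workload
    )
    elapsed = time.perf_counter() - started
    # search returns one list of hits per query vector; exactly one was sent.
    return results[0], elapsed
```

A call such as `hits, seconds = timed_search(client, "scenarios_corpus", query_vector, 500000)` returns the hits for the single query together with the measured latency.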
Export search results
There is no dedicated export API for Large TopK results. You can compose existing capabilities to write results to a Managed Volume:
import pyarrow as pa
import pyarrow.parquet as pq

# Assumes `client`, `query_vectors`, and an initialized `volume_file_manager`
# are defined earlier.
writer = None
try:
    for i, qvec in enumerate(query_vectors):
        results = client.search(
            collection_name="corpus",
            data=[qvec],
            limit=100000,
            output_fields=["scenario_id", "title"]
        )
        # search returns one list of hits per query vector; only one was sent.
        table = pa.Table.from_pylist([
            {"query_id": i, "rank": j, **r}
            for j, r in enumerate(results[0])
        ])
        # Open the writer lazily so it can reuse the first table's schema.
        if writer is None:
            writer = pq.ParquetWriter("/tmp/results.parquet", table.schema)
        writer.write_table(table)
finally:
    if writer is not None:
        writer.close()

# Upload the finished Parquet file to the Managed Volume.
volume_file_manager.upload_file_to_volume(
    source_file_path="/tmp/results.parquet",
    target_volume_path="results/batch.parquet"
)
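If you prefer the requested output fields as top-level Parquet columns rather than a nested column, the row-building step in the loop above can be factored into a small pure function. This is a sketch under one assumption: each hit is a mapping with `id`, `distance`, and `entity` keys, where `entity` holds the requested output fields. The helper name `flatten_hits` is hypothetical.

```python
def flatten_hits(query_id, hits):
    """Turn one query's hits into flat rows, one dict per hit.

    Assumes each hit looks like {"id": ..., "distance": ..., "entity": {...}}.
    """
    rows = []
    for rank, hit in enumerate(hits):
        row = {
            "query_id": query_id,
            "rank": rank,
            "id": hit["id"],
            "distance": hit["distance"],
        }
        # Promote the requested output fields to top-level columns.
        row.update(hit["entity"])
        rows.append(row)
    return rows
```

The returned list can be passed directly to `pa.Table.from_pylist`, and the function can be unit-tested without a running cluster.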
Performance expectations
The following table summarizes the performance characteristics of Large TopK queries:
| Metric | Default mode | Large TopK mode |
|---|---|---|
| TopK limit | 16,384 | 1,000,000 |
| Small-K latency | Milliseconds | Higher (degraded) |
| Large-K latency | Not supported | Seconds to minutes |
| Memory per query | Low | Up to several GB |
| Concurrency | High | Limited (queued) |
| Best for | Online interaction | Batch, data mining |
Zilliz Cloud applies concurrency control to Large TopK queries to prevent resource exhaustion. Requests that exceed the concurrency limit are queued and processed when resources become available.
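Since the server queues requests beyond its concurrency limit, a batch job gains little from sending many Large TopK queries at once. A minimal client-side sketch that caps the number of in-flight requests is shown below. The helper name `run_batch` and the default of two workers are illustrative assumptions. `search_one` is any callable you supply that runs a single search, and whether one client can be shared across threads should be confirmed for your SDK version.

```python
from concurrent.futures import ThreadPoolExecutor


def run_batch(search_one, query_vectors, max_in_flight=2):
    """Run search_one(vec) for every vector with at most max_in_flight at once.

    Results are returned in the same order as query_vectors.
    """
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        return list(pool.map(search_one, query_vectors))
```

This version holds all results in memory until the batch finishes. For very large result sets, writing each result out as it arrives is likely the safer choice.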
Limitations
-
Switching query modes requires rebuilding the vector index. During the rebuild, search is unavailable for the collection.
-
Large TopK is a collection-level setting. All indexes on the collection are affected.
-
All three cluster types (Performance-optimized, Capacity-optimized, and Tiered Storage) support Large TopK.
FAQ
Q: Can I switch back and forth frequently?
Technically, yes, but it is not recommended. Each switch requires releasing, dropping, and recreating the index, during which search is unavailable. In an on-demand cluster, each rebuild also incurs Index Build CU charges.