Version: User Guides (BYOC)

With Iterators

Zilliz Cloud provides search and query iterators for iterating through results that contain a large number of entities.

Overview

Iterators help you navigate large datasets by using primary key values and Boolean expressions, which can significantly improve how you retrieve data from Zilliz Cloud. Paging with offset and limit parameters becomes less efficient as the offset grows, because the skipped entities typically still have to be retrieved before they are discarded. Iterators avoid this and offer a more scalable solution.
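To make the difference concrete, the following toy model in plain Python pages through 25 sorted primary keys both ways. It is an illustration under simplifying assumptions, not how Zilliz Cloud executes queries: "rows touched" is a stand-in cost measure, the cursor version assumes the keys are sorted so it can seek past the last key seen with a binary search, and all helper names are hypothetical.

```python
import bisect
from typing import List, Tuple

# Toy "collection": primary keys kept in sorted order (an assumption of this model).
IDS: List[int] = list(range(25))


def page_with_offset(offset: int, limit: int) -> Tuple[List[int], int]:
    """Offset/limit paging: the rows before `offset` are walked and then discarded."""
    page = IDS[offset:offset + limit]
    return page, offset + len(page)  # (page, rows touched)


def page_with_cursor(last_id: int, limit: int) -> Tuple[List[int], int]:
    """Keyset paging: seek past `last_id`, the largest key seen so far."""
    start = bisect.bisect_right(IDS, last_id)
    page = IDS[start:start + limit]
    return page, len(page)  # (page, rows touched)


def paginate_offset(limit: int) -> Tuple[List[List[int]], int]:
    pages, touched, offset = [], 0, 0
    while True:
        page, cost = page_with_offset(offset, limit)
        if not page:
            return pages, touched
        pages.append(page)
        touched += cost
        offset += limit


def paginate_cursor(limit: int) -> Tuple[List[List[int]], int]:
    pages, touched, last_id = [], 0, -1
    while True:
        page, cost = page_with_cursor(last_id, limit)
        if not page:
            return pages, touched
        pages.append(page)
        touched += cost
        last_id = page[-1]  # remember where this page ended


offset_pages, offset_cost = paginate_offset(10)
cursor_pages, cursor_cost = paginate_cursor(10)
print(offset_pages == cursor_pages, offset_cost, cursor_cost)  # True 55 25
```

Both approaches return the same pages, but in this model the offset version touches 55 rows versus 25, and the gap widens as the offset grows.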

Using iterators offers the following benefits:

  • Simplicity: Eliminates complex offset and limit settings.

  • Efficiency: Provides scalable data retrieval by fetching only the data you need.

  • Consistency: Ensures a consistent dataset size with Boolean filters.

📘Notes

This feature is available for the Zilliz Cloud clusters that are compatible with Milvus 2.3.x and above.

Preparations

The following steps connect to a Zilliz Cloud cluster, quickly set up a collection, and insert 10,000 randomly generated entities into it.

Step 1: Create a collection

from pymilvus import MilvusClient

CLUSTER_ENDPOINT = "YOUR_CLUSTER_ENDPOINT"
TOKEN = "YOUR_CLUSTER_TOKEN"

# 1. Set up a Milvus client
client = MilvusClient(
    uri=CLUSTER_ENDPOINT,
    token=TOKEN
)

# 2. Create a collection
client.create_collection(
    collection_name="quick_setup",
    dimension=5,  # The dimension value should be greater than 1
)
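By default, the quick setup above creates a collection whose schema has an `id` primary key and a 5-dimensional `vector` field, with the dynamic field enabled, so extra keys such as `color` and `tag` in Step 2 are stored as dynamic fields. The sketch below is a hypothetical client-side pre-check (`split_entity` and `DIMENSION` are illustrative names, not part of pymilvus) that catches a dimension mismatch before insertion and shows which keys would land in the dynamic field.

```python
from typing import Any, Dict, Tuple

DIMENSION = 5  # assumed to match the `dimension` passed to create_collection()


def split_entity(entity: Dict[str, Any], dim: int = DIMENSION) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Check the schema-defined fields and return (schema_fields, dynamic_fields)."""
    if not isinstance(entity.get("id"), int):
        raise ValueError("`id` must be an integer primary key")
    vector = entity.get("vector")
    if not isinstance(vector, list) or len(vector) != dim:
        raise ValueError(f"`vector` must be a list of {dim} floats")
    schema_fields = {"id": entity["id"], "vector": vector}
    # Every other key would be stored in the dynamic field.
    dynamic_fields = {k: v for k, v in entity.items() if k not in schema_fields}
    return schema_fields, dynamic_fields


fixed, extra = split_entity({"id": 0, "vector": [0.1] * 5, "color": "red", "tag": 1234})
print(sorted(extra))  # ['color', 'tag']
```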

Step 2: Insert randomly generated entities

import random

# 3. Insert randomly generated vectors
colors = ["green", "blue", "yellow", "red", "black", "white", "purple", "pink", "orange", "brown", "grey"]
data = []

for i in range(10000):
    current_color = random.choice(colors)
    current_tag = random.randint(1000, 9999)
    data.append({
        "id": i,
        "vector": [random.uniform(-1, 1) for _ in range(5)],
        "color": current_color,
        "tag": current_tag,
        "color_tag": f"{current_color}_{current_tag}"
    })

print(data[0])

# Output
#
# {
# "id": 0,
# "vector": [
# -0.5705990742218152,
# 0.39844925120642083,
# -0.8791287928610869,
# 0.024163154953680932,
# 0.6837669917169638
# ],
# "color": "purple",
# "tag": 7774,
# "color_tag": "purple_7774"
# }

res = client.insert(
    collection_name="quick_setup",
    data=data,
)

print(res)

# Output
#
# {
# "insert_count": 10000,
# "ids": [
# 0,
# 1,
# 2,
# 3,
# 4,
# 5,
# 6,
# 7,
# 8,
# 9,
# "(9990 more items hidden)"
# ]
# }
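Because Step 2 uses the global `random` module without a seed, the values you get will differ from the outputs shown on this page. If you want the same dataset on every run, one option is a seeded `random.Random` instance, as in this sketch; `make_entities` and the seed value are illustrative assumptions, not part of the original steps.

```python
import random
from typing import Any, Dict, List

COLORS = ["green", "blue", "yellow", "red", "black", "white",
          "purple", "pink", "orange", "brown", "grey"]


def make_entities(n: int, dim: int = 5, seed: int = 42) -> List[Dict[str, Any]]:
    """Build entities shaped like those in Step 2, but reproducibly."""
    rng = random.Random(seed)  # private generator; leaves the global one untouched
    entities = []
    for i in range(n):
        color = rng.choice(COLORS)
        tag = rng.randint(1000, 9999)
        entities.append({
            "id": i,
            "vector": [rng.uniform(-1, 1) for _ in range(dim)],
            "color": color,
            "tag": tag,
            "color_tag": f"{color}_{tag}",
        })
    return entities


first_run = make_entities(10_000)
second_run = make_entities(10_000)
print(first_run == second_run)  # True
```

You could then pass `make_entities(10_000)` as `data` to `client.insert()`. Using a private `random.Random` instance keeps the seed from affecting other code that relies on the global generator.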

Search with an iterator

Iterators make similarity searches more scalable. To search with an iterator, do as follows:

  1. Initialize the search iterator to define the search parameters and output fields.

  2. Call the next() method in a loop to page through the search results.

    • If the method returns an empty result, no more pages are available and the loop should end.

    • All results carry the specified output fields.

  3. Call the close() method to close the iterator once all data has been retrieved.
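The loop in these steps can be wrapped in a small helper that always closes the iterator, even if next() raises partway through. This is a sketch, not part of pymilvus: `drain` and `FakeIterator` are hypothetical, and the fake only mimics the next()/close() behavior described above so the pattern can be tried without a cluster.

```python
from typing import Any, List, Protocol


class BatchIterator(Protocol):
    """Anything exposing next() and close(), like the iterators on this page."""

    def next(self) -> Any: ...

    def close(self) -> None: ...


def drain(iterator: BatchIterator) -> List[Any]:
    """Call next() until it returns an empty batch; always close the iterator."""
    collected: List[Any] = []
    try:
        while True:
            batch = iterator.next()
            if not batch:  # empty batch: no more pages
                break
            collected.extend(batch)
    finally:
        iterator.close()  # runs even if next() raises
    return collected


class FakeIterator:
    """Hypothetical offline stand-in that serves pre-built batches."""

    def __init__(self, batches: List[List[Any]]) -> None:
        self._batches = list(batches)
        self.closed = False

    def next(self) -> List[Any]:
        return self._batches.pop(0) if self._batches else []

    def close(self) -> None:
        self.closed = True


fake = FakeIterator([[1, 2, 3], [4, 5]])
print(drain(fake), fake.closed)  # [1, 2, 3, 4, 5] True
```

With a real search iterator, you would pass the object returned by `search_iterator()` to `drain()` and convert each collected hit with `hit.to_dict()` afterwards.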

from pymilvus import Collection, connections

# 4. Search with iterator
connections.connect(uri=CLUSTER_ENDPOINT, token=TOKEN)
collection = Collection("quick_setup")

query_vectors = [[0.3580376395471989, -0.6023495712049978, 0.18414012509913835, -0.26286205330961354, 0.9029438446296592]]
search_params = {
    "metric_type": "IP",  # must match the metric type of the collection's index
    "params": {"nprobe": 10}
}

iterator = collection.search_iterator(
    data=query_vectors,
    anns_field="vector",
    batch_size=10,
    param=search_params,
    output_fields=["color_tag"],
    limit=3
)

results = []

while True:
    result = iterator.next()
    if not result:
        iterator.close()
        break

    for hit in result:
        results.append(hit.to_dict())

print(results)

# Output
#
# [
# {
# "id": 1756,
# "distance": 2.0642056465148926,
# "entity": {
# "color_tag": "black_9109"
# }
# },
# {
# "id": 6488,
# "distance": 1.9437453746795654,
# "entity": {
# "color_tag": "purple_8164"
# }
# },
# {
# "id": 3338,
# "distance": 1.9107104539871216,
# "entity": {
# "color_tag": "brown_8121"
# }
# }
# ]
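In this example, `limit=3` caps the total number of results across all pages, while `batch_size=10` caps each page, which is why a single page of three hits comes back. The function below is a simplified model of that interaction, not the iterator's actual implementation; `page_sizes` is a hypothetical helper, and the model assumes the collection holds at least `limit` entities and that each page is filled as far as possible.

```python
from typing import List


def page_sizes(limit: int, batch_size: int) -> List[int]:
    """Simplified model: sizes of the non-empty pages that next() would return."""
    sizes, remaining = [], limit
    while remaining > 0:
        size = min(batch_size, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


print(page_sizes(limit=3, batch_size=10))   # [3]
print(page_sizes(limit=25, batch_size=10))  # [10, 10, 5]
```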

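Query with an iterator

A query iterator returns, batch by batch, all entities that match a Boolean filter expression, without a query vector. As with the search iterator, call next() in a loop until it returns an empty batch, then call close().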

# 5. Query with iterator
iterator = collection.query_iterator(
    batch_size=10,
    expr="color_tag like \"brown_8%\"",
    output_fields=["color_tag"]
)

results = []

while True:
    result = iterator.next()
    if not result:
        iterator.close()
        break

    results += result

# 6. Check the results
print(f"There are {len(results)} entities found. The first 3 are as follows:")

# Output
#
# There are 99 entities found. The first 3 are as follows:

print(results[:3])

# Output
#
# [
# {
# "color_tag": "brown_8785",
# "id": 94
# },
# {
# "color_tag": "brown_8568",
# "id": 176
# },
# {
# "color_tag": "brown_8721",
# "id": 289
# }
# ]
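Since Step 2 keeps the generated entities in the local `data` list, you can cross-check the count returned by the query iterator. For this dataset, the filter `color_tag like "brown_8%"` behaves as a prefix match on `color_tag`, so a plain `str.startswith` check should agree with it. The helper name and the sample entities below are illustrative, and the counts match only if the collection contains exactly the Step 2 entities and all of them are already visible to the query.

```python
from typing import Any, Dict, List


def count_prefix_matches(entities: List[Dict[str, Any]], prefix: str) -> int:
    """Local counterpart of the filter `color_tag like "<prefix>%"` for this dataset."""
    return sum(1 for e in entities if str(e.get("color_tag", "")).startswith(prefix))


# Illustrative sample only; with the real data, compare against len(results).
sample = [
    {"id": 94, "color_tag": "brown_8785"},
    {"id": 95, "color_tag": "brown_7001"},
    {"id": 96, "color_tag": "grey_8123"},
]
print(count_prefix_matches(sample, "brown_8"))  # 1
```

With the real data, `count_prefix_matches(data, "brown_8") == len(results)` should hold under the assumptions above.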

Typical use cases

A typical use case for these iterators is exporting some or all entities from a collection. For details, refer to Export Data with Iterators.
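As one possible shape for such an export, the sketch below streams each batch to a JSON Lines file instead of accumulating all entities in memory first. `iterate_batches`, `export_jsonl`, and `StubIterator` are hypothetical helpers (the stub stands in for a real iterator so the code runs offline), and the approach described in Export Data with Iterators may differ.

```python
import json
import os
import tempfile
from typing import Any, Dict, Iterable, Iterator, List


def iterate_batches(iterator: Any) -> Iterator[List[Dict[str, Any]]]:
    """Turn a next()/close() iterator into a generator; close it when done."""
    try:
        while True:
            batch = iterator.next()
            if not batch:  # empty batch: iteration is finished
                return
            yield batch
    finally:
        iterator.close()


def export_jsonl(batches: Iterable[List[Dict[str, Any]]], path: str) -> int:
    """Stream every entity to a JSON Lines file and return how many were written."""
    written = 0
    with open(path, "w", encoding="utf-8") as f:
        for batch in batches:
            for entity in batch:
                f.write(json.dumps(entity) + "\n")
                written += 1
    return written


class StubIterator:
    """Hypothetical offline stand-in that serves fixed batches."""

    def __init__(self, batches: List[List[Dict[str, Any]]]) -> None:
        self._batches = list(batches)
        self.closed = False

    def next(self) -> List[Dict[str, Any]]:
        return self._batches.pop(0) if self._batches else []

    def close(self) -> None:
        self.closed = True


stub = StubIterator([
    [{"id": 94, "color_tag": "brown_8785"}, {"id": 176, "color_tag": "brown_8568"}],
    [{"id": 289, "color_tag": "brown_8721"}],
])
out_path = os.path.join(tempfile.mkdtemp(), "quick_setup.jsonl")
print(export_jsonl(iterate_batches(stub), out_path), stub.closed)  # 3 True
```

With a cluster, you could pass `iterate_batches(collection.query_iterator(...))` instead of the stub. Search iterator hits are not plain dicts, so convert them with `hit.to_dict()` before serializing.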