Version: User Guides (Cloud)

Search, Query, and Get

This guide walks you through performing nearest-neighbor searches and queries. A search looks for the vectors in a collection that are closest to a given query vector, while a query retrieves the entities that match a specified condition.

Overview

Zilliz Cloud uses Approximate Nearest Neighbor (ANN) search algorithms to process vector search requests. A search returns the top-K entities that are most similar to the query vector. To improve throughput, bulk search lets you search multiple query vectors in parallel within a single request. You can optionally apply filters, defined as boolean expressions, to narrow the scope of an ANN search.
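ANN indexes trade a small amount of accuracy for speed; the exact result they approximate is what a brute-force scan over every stored vector would return. The following is a minimal pure-Python sketch of that exact top-K baseline, assuming cosine similarity as the metric (an assumption; use whichever metric your index defines). It only illustrates the idea and is not how Zilliz Cloud computes results. The hit shape (id and distance, higher meaning more similar) mirrors the outputs in this guide.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Score every stored vector against the query and keep the k best (brute force)."""
    scored = [
        {"id": pk, "distance": cosine_similarity(query, vec)}
        for pk, vec in vectors.items()
    ]
    # With cosine similarity, a larger score means more similar, so sort descending.
    scored.sort(key=lambda hit: hit["distance"], reverse=True)
    return scored[:k]


# Toy data: primary key -> vector.
vectors = {0: [1.0, 0.0], 1: [3.0, 4.0], 2: [0.0, 1.0]}
print(exact_top_k([1.0, 0.0], vectors, k=2))
# [{'id': 0, 'distance': 1.0}, {'id': 1, 'distance': 0.6}]
```

A brute-force scan costs one similarity computation per stored vector for every query, which is exactly the cost ANN indexes are designed to avoid on large collections.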

A query, on the other hand, retrieves the entities in a collection that satisfy a condition defined as a boolean expression. The result of a query is the set of entities that match that condition. Unlike a search, a query takes no query vector and does not rank entities by similarity; it only applies the filter.
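To make the contrast concrete, here is a minimal sketch of a query, assuming client is a connected pymilvus MilvusClient (created as in the earlier steps of this guide). The helper name find_by_publication is hypothetical, and the filter assumes a scalar publication field like the one used in the search examples below; note that only a filter is passed, with no query vector.

```python
def find_by_publication(client, collection_name, publication):
    """Return entities whose publication field equals the given value.

    Hypothetical helper; `client` is assumed to be a connected pymilvus
    MilvusClient. Assumes `publication` contains no double quotes.
    """
    # A query only applies a boolean filter; there is no vector and no ranking.
    return client.query(
        collection_name=collection_name,
        filter=f'publication == "{publication}"',
        output_fields=["title", "publication"],
    )
```

Usage: find_by_publication(client, COLLECTION_NAME, "Towards Data Science") returns every matching entity with its title and publication, in no particular similarity order.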

Before you start

Before performing ANN searches and queries, ensure that

Single-vector search

A single-vector search request uses only one query vector and asks for the top-K entities that are most similar to it.

Here is an example of a single-vector search.

# Search using a MilvusClient object
import json

from pymilvus import MilvusClient

# (Continued)
# Assumes `client`, `DATASET_PATH`, and `COLLECTION_NAME` are defined in the previous steps.

# Read the dataset
with open(DATASET_PATH) as f:
    data = json.load(f)

# Single-vector search
res = client.search(
    collection_name=COLLECTION_NAME,
    data=[data["rows"][0]["title_vector"]],
    output_fields=["title", "link"],
    limit=5
)

print(res)

# Output
#
# [
# [
# {
# "id": 0,
# "distance": 1.0,
# "entity": {
# "title": "The Reported Mortality Rate of Coronavirus Is Not Important",
# "link": "https://medium.com/swlh/the-reported-mortality-rate-of-coronavirus-is-not-important-369989c8d912"
# }
# },
# {
# "id": 70,
# "distance": 0.7525784969329834,
# "entity": {
# "title": "How bad will the Coronavirus Outbreak get? \u2014 Predicting the outbreak figures",
# "link": "https://towardsdatascience.com/how-bad-will-the-coronavirus-outbreak-get-predicting-the-outbreak-figures-f0b8e8b61991"
# }
# },
# {
# "id": 160,
# "distance": 0.7132074236869812,
# "entity": {
# "title": "The Funeral Industry is a Killer",
# "link": "https://medium.com/swlh/the-funeral-industry-is-a-killer-1775118a7778"
# }
# },
# {
# "id": 111,
# "distance": 0.688888430595398,
# "entity": {
# "title": "The role of AI in web-based ADA and WCAG compliance",
# "link": "https://towardsdatascience.com/the-role-of-ai-in-web-based-ada-and-wcag-compliance-4fc09e69f416"
# }
# }
# ]
# ]

Before searching a collection, you can define search parameters. If you do, make sure the metric type matches the one defined in the index parameters. Then reference the search parameters in the search request and set the query vectors, the output fields, the limit (top-K), and any other applicable parameters.

The script above searches for articles with title vectors that are most similar to the given vector. The results display the top 5 most similar entities, along with their primary keys and distances.
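The example above relies on default search parameters. The following sketch passes them explicitly through the search_params argument of MilvusClient.search, assuming client is a connected pymilvus MilvusClient. The helper name search_titles is hypothetical, the default metric_type of "COSINE" is only a placeholder (use the metric from your index parameters), and the empty "params" dictionary is an assumption that leaves index-specific tuning at its defaults.

```python
def search_titles(client, collection_name, query_vector, metric_type="COSINE", top_k=5):
    """Single-vector search with explicit search parameters.

    Hypothetical helper. `metric_type` must match the metric used when the
    collection's index was built; "COSINE" here is only a placeholder.
    """
    search_params = {"metric_type": metric_type, "params": {}}
    return client.search(
        collection_name=collection_name,
        data=[query_vector],            # one query vector -> one list of hits
        search_params=search_params,
        output_fields=["title", "link"],
        limit=top_k,                    # the K in top-K
    )
```

Keeping the metric in a single parameter makes a mismatch with the index easy to spot when the index definition changes.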

The search result contains one list of hits per query vector. Each hit is a dictionary holding the entity's primary key (id), its distance to the query vector, and, under entity, the values of any output fields you requested. Iterate over the hits to visit each nearest neighbor, and use the get method to read the value of an output field.
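A minimal sketch of walking a result, assuming it has the shape printed in the single-vector search output earlier in this guide (a list of hit lists, each hit a dictionary with id, distance, and entity). The helper name titles_with_scores is hypothetical, and the sample result reuses the last two hits from that output.

```python
def titles_with_scores(res):
    """Flatten a search result into (id, distance, title) tuples.

    Assumes one list of hits per query vector, where each hit is a dict
    with "id", "distance", and an "entity" dict of output fields.
    """
    rows = []
    for hits in res:              # one entry per query vector
        for hit in hits:          # nearest neighbors, most similar first
            entity = hit.get("entity", {})
            rows.append((hit["id"], hit["distance"], entity.get("title")))
    return rows


# The last two hits from the single-vector search output.
res = [[
    {"id": 160, "distance": 0.7132074236869812,
     "entity": {"title": "The Funeral Industry is a Killer"}},
    {"id": 111, "distance": 0.688888430595398,
     "entity": {"title": "The role of AI in web-based ADA and WCAG compliance"}},
]]

for pk, distance, title in titles_with_scores(res):
    print(pk, round(distance, 3), title)
```

Using get for entity fields returns None instead of raising an error when a field was not requested in output_fields.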

Bulk-vector search

You can conduct a bulk search by providing multiple query vectors in a single request. A bulk search is usually more efficient than a series of single-vector searches because its total latency is much lower than that of sending one request per query vector.

Note that the RESTful API does not support bulk search. Instead, iterate over the rows in the dataset and send one search request per row.
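The per-row workaround for the RESTful API can be sketched as a simple loop. Here search_one is a hypothetical callable you supply (for example, a wrapper around your RESTful search call) that takes one query vector and returns that vector's list of hits; the vector field name title_vector matches the dataset used in this guide.

```python
def search_each_row(rows, search_one, vector_field="title_vector"):
    """Send one single-vector search per dataset row.

    `search_one` is a hypothetical, caller-supplied function that searches
    with a single query vector and returns that vector's list of hits.
    """
    results = []
    for row in rows:
        # One request per row, because bulk search is not available over REST.
        results.append(search_one(row[vector_field]))
    return results
```

Because the return value is a list of hit lists in query order, code written against a bulk search result can consume it unchanged.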

# Search using a MilvusClient object
import json

from pymilvus import MilvusClient

# (Continued)
# Assumes `client`, `DATASET_PATH`, and `COLLECTION_NAME` are defined in the previous steps.

# Read the dataset
with open(DATASET_PATH) as f:
    data = json.load(f)

# Bulk-vector search: one list of hits is returned per query vector
res = client.search(
    collection_name=COLLECTION_NAME,
    data=[data["rows"][0]["title_vector"], data["rows"][1]["title_vector"]],
    output_fields=["title", "link"],
    limit=5
)

print(res)

# Output
#
# [
# [
# {
# "id": 0,
# "distance": 1.0,
# "entity": {
# "title": "The Reported Mortality Rate of Coronavirus Is Not Important",
# "link": "https://medium.com/swlh/the-reported-mortality-rate-of-coronavirus-is-not-important-369989c8d912"
# }
# },
# {
# "id": 70,
# "distance": 0.7525784969329834,
# "entity": {
# "title": "How bad will the Coronavirus Outbreak get? \u2014 Predicting the outbreak figures",
# "link": "https://towardsdatascience.com/how-bad-will-the-coronavirus-outbreak-get-predicting-the-outbreak-figures-f0b8e8b61991"
# }
# },
# {
# "id": 160,
# "distance": 0.7132074236869812,
# "entity": {
# "title": "The Funeral Industry is a Killer",
# "link": "https://medium.com/swlh/the-funeral-industry-is-a-killer-1775118a7778"
# }
# },
# {
# "id": 111,
# "distance": 0.688888430595398,
# "entity": {
# "title": "The role of AI in web-based ADA and WCAG compliance",
# "link": "https://towardsdatascience.com/the-role-of-ai-in-web-based-ada-and-wcag-compliance-4fc09e69f416"
# }
# }
# ],
# [
# {
# "id": 1,
# "distance": 0.9999999403953552,
# "entity": {
# "title": "Dashboards in Python: 3 Advanced Examples for Dash Beginners and Everyone Else",
# "link": "https://medium.com/swlh/dashboards-in-python-3-advanced-examples-for-dash-beginners-and-everyone-else-b1daf4e2ec0a"
# }
# },
# {
# "id": 4,
# "distance": 0.7625511884689331,
# "entity": {
# "title": "Python NLP Tutorial: Information Extraction and Knowledge Graphs",
# "link": "https://medium.com/swlh/python-nlp-tutorial-information-extraction-and-knowledge-graphs-43a2a4c4556c"
# }
# },
# {
# "id": 155,
# "distance": 0.7575345039367676,
# "entity": {
# "title": "How To Use Web Sockets (Socket IO) With Digital Ocean Load Balancers And Kubernetes (DOK8S) With Ingress Nginx",
# "link": "https://medium.com/swlh/how-to-use-web-sockets-socket-io-with-digital-ocean-load-balancers-and-kubernetes-dok8s-with-e4dd5531c67e"
# }
# },
# {
# "id": 17,
# "distance": 0.7366296052932739,
# "entity": {
# "title": "Blockchain, IoT and AI \u2014 A Perfect Fit",
# "link": "https://medium.com/swlh/blockchain-iot-and-ai-a-perfect-fit-1-e04c6ad73fbc"
# }
# },
# {
# "id": 113,
# "distance": 0.7317826747894287,
# "entity": {
# "title": "AutoAI: The Magic of Converting Data to Models",
# "link": "https://towardsdatascience.com/autoai-the-magic-of-converting-data-to-models-185f26d22234"
# }
# }
# ]
# ]

A bulk search result contains one list of hits per query vector, in the same order as the query vectors in the request. To access the hits for a particular query vector, use its index in the query vector list; for example, res[1] holds the hits for the second query vector.
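Because the result lines up with the request by position, pairing each query with its hits is a zip over the two lists. A minimal sketch: the helper name pair_queries_with_hits is hypothetical, the hit ids come from the bulk search output above, and the query titles are placeholders standing in for data["rows"][:2].

```python
def pair_queries_with_hits(query_rows, res):
    """Pair each query row with the ids of the hits returned for it.

    res[i] holds the hits for the i-th query vector in the request,
    so the two lists line up by position.
    """
    if len(query_rows) != len(res):
        raise ValueError("expected one list of hits per query vector")
    return [
        (row["title"], [hit["id"] for hit in hits])
        for row, hits in zip(query_rows, res)
    ]


# Hit ids from the bulk search output; titles are placeholders.
res = [
    [{"id": 0}, {"id": 70}, {"id": 160}, {"id": 111}],
    [{"id": 1}, {"id": 4}, {"id": 155}, {"id": 17}, {"id": 113}],
]
query_rows = [{"title": "query 0"}, {"title": "query 1"}]

for title, ids in pair_queries_with_hits(query_rows, res):
    print(title, ids)
```

The length check guards against silently mismatched pairs if the request and result ever get out of step.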

Search with filters

A filter is a boolean expression used to specify the conditions for an ANN search. You can use arithmetic, logical, and comparison operators to construct filters.

Operator        Description
and (&&)        True if both operands are true
or (||)         True if either operand is true
+, -, *, /      Addition, subtraction, multiplication, and division
**              Exponent
%               Modulus
<, >            Less than, greater than
==, !=          Equal to, not equal to
<=, >=          Less than or equal to, greater than or equal to
not             Reverses the result of a given condition
like            Compares a value to similar values using wildcard operators. For example, like "prefix%" matches strings that begin with "prefix".
in              Tests whether an expression matches any value in a list of values
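When filter values come from program data, assembling the expression by hand is error-prone. The following sketch builds in / not in and combined filters as strings. The helper names are hypothetical, and it assumes that the double-quoted list literal produced by json.dumps is accepted as a filter list, which holds for the plain strings and numbers used in this guide (the first call reproduces the not in filter used below).

```python
import json


def build_in_filter(field, values, negate=False):
    """Build an `in` / `not in` filter for a scalar field.

    json.dumps renders a list of plain strings or numbers as a
    double-quoted list literal, e.g. ["a", "b"].
    """
    operator = "not in" if negate else "in"
    return f"{field} {operator} {json.dumps(values, ensure_ascii=False)}"


def and_filters(*expressions):
    """Join filter expressions with `and`, parenthesizing each one."""
    return " and ".join(f"({expr})" for expr in expressions)


print(build_in_filter("publication", ["Towards Data Science", "Personal Growth"], negate=True))
print(and_filters("claps > 1500", "responses > 15"))
```

Parenthesizing each sub-expression keeps the intended grouping explicit when and and or are mixed, as in the last example of this section.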

The following are some example ANN searches with filters:

  • Filter articles with a reading time between 10 and 15 minutes (both bounds exclusive).

    # Search using a MilvusClient object
    from pymilvus import MilvusClient

    res = client.search(
        collection_name=COLLECTION_NAME,
        data=[data["rows"][0]["title_vector"]],
        filter="10 < reading_time < 15",
        output_fields=["title", "reading_time"],
        limit=5
    )

    print(res)

    # Output
    #
    # [
    # [
    # {
    # "id": 0,
    # "distance": 1.0,
    # "entity": {
    # "title": "The Reported Mortality Rate of Coronavirus Is Not Important",
    # "reading_time": 13
    # }
    # },
    # {
    # "id": 7,
    # "distance": 0.6361639499664307,
    # "entity": {
    # "title": "Building Comprehensible Customer Churn Prediction Models",
    # "reading_time": 13
    # }
    # },
    # {
    # "id": 103,
    # "distance": 0.6340133547782898,
    # "entity": {
    # "title": "A Primer on Domain Adaptation",
    # "reading_time": 12
    # }
    # },
    # {
    # "id": 90,
    # "distance": 0.6230067610740662,
    # "entity": {
    # "title": "SVM: An optimization problem",
    # "reading_time": 11
    # }
    # }
    # ]
    # ]

  • Filter articles that have more than 1500 claps and more than 15 responses.

    # Search using a MilvusClient object

    res = client.search(
        collection_name=COLLECTION_NAME,
        data=[data["rows"][0]["title_vector"]],
        filter='claps > 1500 and responses > 15',
        output_fields=["title", "claps", "responses"],
        limit=5
    )

    print(res)

    # Output
    #
    # [
    # [
    # {
    # "id": 130,
    # "distance": 0.5737711787223816,
    # "entity": {
    # "title": "The Only \u201cCompetition\u201d Slide You\u2019ll Ever Need in a Pitch Deck",
    # "claps": 1940,
    # "responses": 25
    # }
    # },
    # {
    # "id": 66,
    # "distance": 0.5508044362068176,
    # "entity": {
    # "title": "How to Be Memorable in Social Settings",
    # "claps": 8600,
    # "responses": 34
    # }
    # },
    # {
    # "id": 69,
    # "distance": 0.4541875422000885,
    # "entity": {
    # "title": "Top 10 In-Demand programming languages to learn in 2020",
    # "claps": 3000,
    # "responses": 18
    # }
    # }
    # ]
    # ]

  • Filter articles published by Towards Data Science.

    # Search using a MilvusClient object

    res = client.search(
        collection_name=COLLECTION_NAME,
        data=[data["rows"][0]["title_vector"]],
        filter='publication == "Towards Data Science"',
        output_fields=["title", "publication"],
        limit=5
    )

    print(res)

    # Output
    #
    # [
    # [
    # {
    # "id": 70,
    # "distance": 0.7525784969329834,
    # "entity": {
    # "title": "How bad will the Coronavirus Outbreak get? \u2014 Predicting the outbreak figures",
    # "publication": "Towards Data Science"
    # }
    # },
    # {
    # "id": 111,
    # "distance": 0.688888430595398,
    # "entity": {
    # "title": "The role of AI in web-based ADA and WCAG compliance",
    # "publication": "Towards Data Science"
    # }
    # },
    # {
    # "id": 103,
    # "distance": 0.6340133547782898,
    # "entity": {
    # "title": "A Primer on Domain Adaptation",
    # "publication": "Towards Data Science"
    # }
    # },
    # {
    # "id": 94,
    # "distance": 0.6249956488609314,
    # "entity": {
    # "title": "Why Machine Learning Validation Sets Grow Stale",
    # "publication": "Towards Data Science"
    # }
    # },
    # {
    # "id": 90,
    # "distance": 0.6230067610740662,
    # "entity": {
    # "title": "SVM: An optimization problem",
    # "publication": "Towards Data Science"
    # }
    # }
    # ]
    # ]

  • Filter out articles published in Towards Data Science or Personal Growth.

    # Search using a MilvusClient object

    res = client.search(
        collection_name=COLLECTION_NAME,
        data=[data["rows"][0]["title_vector"]],
        filter='publication not in ["Towards Data Science", "Personal Growth"]',
        output_fields=["title", "publication"],
        limit=5
    )

    print(res)

    # Output
    #
    # [
    # [
    # {
    # "id": 0,
    # "distance": 1.0,
    # "entity": {
    # "title": "The Reported Mortality Rate of Coronavirus Is Not Important",
    # "publication": "The Startup"
    # }
    # },
    # {
    # "id": 160,
    # "distance": 0.7132074236869812,
    # "entity": {
    # "title": "The Funeral Industry is a Killer",
    # "publication": "The Startup"
    # }
    # },
    # {
    # "id": 196,
    # "distance": 0.6882869601249695,
    # "entity": {
    # "title": "The Question We Should Be Asking About the Cost of Youth Sports",
    # "publication": "The Startup"
    # }
    # },
    # {
    # "id": 51,
    # "distance": 0.6719912886619568,
    # "entity": {
    # "title": "What if Facebook had to pay you for the profit they are making?",
    # "publication": "The Startup"
    # }
    # }
    # ]
    # ]

  • Filter articles whose titles start with Top.

    # Search using a MilvusClient object

    res = client.search(
        collection_name=COLLECTION_NAME,
        data=[data["rows"][0]["title_vector"]],
        filter='title like "Top%"',
        output_fields=["title", "link"],
        limit=5
    )

    print(res)

    # Output
    #
    # [
    # [
    # {
    # "id": 75,
    # "distance": 0.5751269459724426,
    # "entity": {
    # "title": "Top Trends of Graph Machine Learning in 2020",
    # "link": "https://towardsdatascience.com/top-trends-of-graph-machine-learning-in-2020-1194175351a3"
    # }
    # },
    # {
    # "id": 76,
    # "distance": 0.5366824865341187,
    # "entity": {
    # "title": "Top 20 Data Science Discord servers to join in 2020",
    # "link": "https://towardsdatascience.com/top-20-data-science-discord-servers-to-join-in-2020-567b45738e9d"
    # }
    # },
    # {
    # "id": 74,
    # "distance": 0.5235060453414917,
    # "entity": {
    # "title": "Top 10 Artificial Intelligence Trends for 2020",
    # "link": "https://towardsdatascience.com/top-10-ai-trends-for-2020-d6294cfee2bd"
    # }
    # },
    # {
    # "id": 97,
    # "distance": 0.5228530168533325,
    # "entity": {
    # "title": "Top 5 AI Conferences To Visit in Europe in 2020",
    # "link": "https://towardsdatascience.com/top-5-ai-conferences-to-visit-in-europe-in-2020-7a6f068aff34"
    # }
    # },
    # {
    # "id": 69,
    # "distance": 0.4541875422000885,
    # "entity": {
    # "title": "Top 10 In-Demand programming languages to learn in 2020",
    # "link": "https://towardsdatascience.com/top-10-in-demand-programming-languages-to-learn-in-2020-4462eb7d8d3e"
    # }
    # }
    # ]
    # ]

  • Filter articles from Towards Data Science that either have a reading time between 10 and 15 minutes (exclusive) or have more than 1500 claps and more than 15 responses.

    # Search using a MilvusClient object

    res = client.search(
        collection_name=COLLECTION_NAME,
        data=[data["rows"][0]["title_vector"]],
        filter='(publication == "Towards Data Science") and ((claps > 1500 and responses > 15) or (10 < reading_time < 15))',
        output_fields=["title", "publication", "claps", "responses", "reading_time"],
        limit=5
    )

    print(res)

    # Output
    #
    # [
    # [
    # {
    # "id": 103,
    # "distance": 0.6340133547782898,
    # "entity": {
    # "title": "A Primer on Domain Adaptation",
    # "reading_time": 12,
    # "publication": "Towards Data Science",
    # "claps": 74,
    # "responses": 0
    # }
    # },
    # {
    # "id": 90,
    # "distance": 0.6230067610740662,
    # "entity": {
    # "title": "SVM: An optimization problem",
    # "reading_time": 11,
    # "publication": "Towards Data Science",
    # "claps": 44,
    # "responses": 0
    # }
    # },
    # {
    # "id": 75,
    # "distance": 0.5751269459724426,
    # "entity": {
    # "title": "Top Trends of Graph Machine Learning in 2020",
    # "reading_time": 11,
    # "publication": "Towards Data Science",
    # "claps": 1100,
    # "responses": 0
    # }
    # },
    # {
    # "id": 99,
    # "distance": 0.5726118087768555,
    # "entity": {
    # "title": "Finding optimal NBA physiques using data visualization with Python",
    # "reading_time": 13,
    # "publication": "Towards Data Science",
    # "claps": 89,
    # "responses": 0
    # }
    # },
    # {
    # "id": 80,
    # "distance": 0.564883828163147,
    # "entity": {
    # "title": "Understanding Natural Language Processing: how AI understands our languages",
    # "reading_time": 13,
    # "publication": "Towards Data Science",
    # "claps": 109,
    # "responses": 0
    # }
    # }
    # ]
    # ]
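To see why each hit in the last example qualifies, you can mirror the compound filter as a Python predicate and apply it to the returned entities. This is a local sanity check only, not how Zilliz Cloud evaluates filters; the function name is hypothetical, and the sample entities are copied from the output of the last example.

```python
def matches_compound_filter(entity):
    """Python mirror of the last example's filter, for local checks.

    (publication == "Towards Data Science") and
    ((claps > 1500 and responses > 15) or (10 < reading_time < 15))
    """
    popular = entity["claps"] > 1500 and entity["responses"] > 15
    quick_read = 10 < entity["reading_time"] < 15   # both bounds exclusive
    return entity["publication"] == "Towards Data Science" and (popular or quick_read)


# Two entities copied from the output of the last example.
returned = [
    {"title": "A Primer on Domain Adaptation", "reading_time": 12,
     "publication": "Towards Data Science", "claps": 74, "responses": 0},
    {"title": "Top Trends of Graph Machine Learning in 2020", "reading_time": 11,
     "publication": "Towards Data Science", "claps": 1100, "responses": 0},
]

for entity in returned:
    print(entity["title"], matches_compound_filter(entity))
```

In that output, every hit qualifies through the reading-time branch: all of them report 0 responses, so none satisfies the claps-and-responses branch.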