search()

Added: v2.3.x · Modified: v3.0.x

This operation conducts a vector similarity search with an optional scalar filtering expression.

📘Notes

This method applies only to dedicated serving clusters and on-demand compute.

  • To run this operation on a collection in a serving cluster, create MilvusClient with the cluster endpoint.

  • Free & Serverless

https://{cluster-id}.serverless.{region}.vectordb.zillizcloud.com

  • Dedicated

https://{cluster-id}.{region}.vectordb.zillizcloud.com:19530

  • To run this operation on a collection for on-demand compute, create MilvusClient with the project endpoint, and then create a session that attaches to an on-demand cluster for searches.

https://{project-id}.{region}.api.zillizcloud.com
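The endpoint patterns above can be filled in programmatically. The following is a minimal sketch; the helper names `cluster_endpoint` and `project_endpoint`, the `plan` argument, and the sample IDs are hypothetical and not part of the SDK.

```python
def cluster_endpoint(cluster_id: str, region: str, plan: str = "dedicated") -> str:
    """Build a serving-cluster endpoint from the URI patterns listed above."""
    if plan == "serverless":  # pattern listed for Free & Serverless clusters
        return f"https://{cluster_id}.serverless.{region}.vectordb.zillizcloud.com"
    if plan == "dedicated":  # pattern listed for Dedicated clusters
        return f"https://{cluster_id}.{region}.vectordb.zillizcloud.com:19530"
    raise ValueError(f"unknown plan: {plan!r}")


def project_endpoint(project_id: str, region: str) -> str:
    """Build the project endpoint used for on-demand compute."""
    return f"https://{project_id}.{region}.api.zillizcloud.com"


# Placeholder cluster ID and region; substitute your own values.
uri = cluster_endpoint("my-cluster-id", "gcp-us-west1")
print(uri)
```

The resulting string is what you would pass as `uri` when creating MilvusClient.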

Request syntax

search(
    self,
    collection_name: str,
    data: Union[List[list], list],
    ids: Union[List[str], List[int]],
    filter: str = "",
    limit: int = 10,
    output_fields: Optional[List[str]] = None,
    search_params: Optional[dict] = None,
    timeout: Optional[float] = None,
    partition_names: Optional[List[str]] = None,
    anns_field: Optional[str] = None,
    ranker: Optional[Union[Function, FunctionScore]] = None,
    highlighter: Optional[Highlighter] = None,
    group_by: Optional[GroupBy] = None,
    order_by_fields: Optional[List[dict]] = None,
    search_aggregation: Optional[SearchAggregation] = None,
    **kwargs,
) -> List[List[dict]]

PARAMETERS:

  • collection_name (str) -

    [REQUIRED]

    The name of an existing collection.

  • data (Union[List[list], list]) -

    [REQUIRED]

    A list of vector embeddings.

    Zilliz Cloud searches for the most similar vector embeddings to the specified ones.

    This parameter is mutually exclusive with ids.

  • ids (Union[List[str], List[int]]) -

    A list of primary keys.

    Zilliz Cloud searches for the most similar vector embeddings to those in the specified entities.

    This parameter is mutually exclusive with data.

  • anns_field (str) -

    The name of the target vector field of the current search.

  • filter (str) -

    A scalar filtering condition to filter matching entities.

    The value defaults to an empty string, indicating that no condition applies.

    You can set this parameter to an empty string to skip scalar filtering. To build a scalar filtering condition, refer to Filtering Overview.

  • filter_params (dict) -

    If you choose to use placeholders in filter as stated in Filtering Templating, then you can specify the actual values for these placeholders as key-value pairs as the value of this parameter.
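As a sketch of how filter and filter_params fit together, the snippet below builds the keyword arguments for a templated search and checks that every placeholder has a value. It assumes the `{name}` placeholder syntax described in Filtering Templating; the field values are taken from the sample data in the Examples section, and the placeholder check is a local convenience, not something the SDK requires.

```python
from string import Formatter

# Filter template with named placeholders; actual values are supplied separately.
filter_template = "color in {colors} and id > {min_id}"
filter_params = {"colors": ["red_7025", "red_4794"], "min_id": 0}

# Local sanity check: every placeholder in the template needs a value.
placeholders = {name for _, name, _, _ in Formatter().parse(filter_template) if name}
missing = placeholders - set(filter_params)
if missing:
    raise ValueError(f"no value supplied for placeholders: {sorted(missing)}")

search_kwargs = {
    "collection_name": "test_collection",
    "data": [[0.05, 0.23, 0.07, 0.45, 0.13]],
    "limit": 3,
    "filter": filter_template,
    "filter_params": filter_params,
}
# With a connected client: res = client.search(**search_kwargs)
```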

  • limit (int) -

    The total number of entities to return.

    You can use this parameter in combination with offset to enable pagination.

    The sum of this value and offset should be less than 16,384.

    In a grouping search, however, limit specifies the maximum number of groups to return, rather than individual entities. Each group is formed based on the specified group_by_field.

    📘Notes

    When group_by is specified for search aggregation, do not explicitly set limit. Use the root GroupBy.size value to control the number of top-level buckets to return.

  • output_fields (list[str]) -

    A list of field names to include in each entity in return.

    The value defaults to None. If left unspecified, only the primary field is included.

  • search_params (dict) -

    The parameter settings specific to this operation.

    • radius (float) -

      Determines the threshold of least similarity. When the collection's metric type is set to L2, ensure this value is greater than range_filter. Otherwise, this value should be lower than that of range_filter.

    • range_filter (float) -

      Refines the search to vectors within a specific similarity range. When the collection's metric type is set to IP or COSINE, ensure that this value is greater than that of radius. Otherwise, this value should be lower than that of radius.

    • level (int) -

      Zilliz Cloud exposes a single unified parameter to simplify search tuning, so that you do not have to work with the many search parameters specific to each index algorithm.

      The value defaults to 1 and ranges from 1 to 5. Increasing the value raises the recall rate at the cost of search performance.

    • page_retain_order (bool) -

      Whether to retain the order of the search result when offset is provided.

      This parameter applies only when you also set radius.

    • params (dict) -

      Additional parameters.

      📘Notes

      All additional parameters have moved to the top level of search_params, and the params key will be deprecated soon.

      • radius (float) -

        Determines the threshold of least similarity. When the collection's metric type is set to L2, ensure that this value is greater than that of range_filter. Otherwise, this value should be lower than that of range_filter.

      • range_filter (float) -

        Refines the search to vectors within a specific similarity range. When the collection's metric type is set to IP or COSINE, ensure that this value is greater than that of radius. Otherwise, this value should be lower than that of radius.

      • level (int) -

        Zilliz Cloud exposes a single unified parameter to simplify search tuning, so that you do not have to work with the many search parameters specific to each index algorithm.

        The value defaults to 1 and ranges from 1 to 5. Increasing the value raises the recall rate at the cost of search performance.

      • page_retain_order (bool) -

        Whether to retain the order of the search result when offset is provided.

        This parameter applies only when you also set radius.

    • ignore_growing (str) -

      When set, this option instructs the search to exclude data from growing segments. This can improve search performance because only indexed, fully processed data is searched.

    For details on other applicable search parameters, refer to In-memory Index, On-disk Index, and AUTOINDEX Explained.

  • group_by_field (str)

    Groups search results by a specified field to ensure diversity and avoid returning multiple results from the same group.

    This parameter is used by Grouping Search. It is mutually exclusive with group_by.

  • group_size (int)

    The target number of entities to return within each group in a grouping search. For example, setting group_size=2 instructs the system to return up to 2 of the most similar entities (e.g., document passages or vector representations) within each group. Without setting group_size, the system defaults to returning only 1 entity per group.

  • strict_group_size (bool)

    This Boolean parameter dictates whether group_size should be strictly enforced. When strict_group_size=True, the system will attempt to fill each group with exactly group_size results, as long as sufficient data exists within each group. If there is an insufficient number of entities in a group, it will return only the available entities, ensuring that groups with adequate data meet the specified group_size.

  • group_by (GroupBy | None) -

    A GroupBy object that defines a search aggregation. When this parameter is specified, Zilliz Cloud groups ANN search results into buckets based on the fields in the root GroupBy object. Each bucket can include per-bucket metrics, representative hits, and nested sub-groups. group_by is mutually exclusive with group_by_field. Use group_by_field for existing single-field Grouping Search workflows. Use group_by when you need per-bucket metrics, multi-field grouping, bucket ordering, hit sorting, or nested grouping.

    📘Notes

    Search aggregation metrics are computed over ANN-retrieved entities, not over the full collection. Bucket counts, metrics, and metric-based ordering are approximate.

  • order_by_fields (list[dict] | None) -

    A list of order-by specifications for sorting search results by supported scalar fields.

    Each dictionary in the list has the following keys:

    • field (str) -

      The name of the scalar field to sort by.

    • order (str) -

      The sort direction. Possible values are "asc" and "desc". If you omit this key, Zilliz Cloud sorts the field in ascending order.

    Zilliz Cloud applies multiple order-by fields in the order that you specify. For entities with the same values in all specified order-by fields, Zilliz Cloud keeps the original similarity-score order.

    In a grouping search, Zilliz Cloud orders groups by the specified scalar field value of each group's top entity. The limit parameter still controls the number of groups, and group_size controls the number of entities per group.
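The ordering rules above can be illustrated with a small local emulation. This is only a sketch of the described semantics, not the server's implementation; the function name `apply_order_by`, the fields `price` and `rating`, and the sample hits are hypothetical.

```python
from typing import Dict, List


def apply_order_by(hits: List[dict], order_by_fields: List[Dict[str, str]]) -> List[dict]:
    """Emulate locally: earlier specs take precedence, ties keep similarity order."""
    ordered = list(hits)  # hits are assumed to arrive in similarity-score order
    # Python's sort is stable, so sorting by the lowest-priority key first and the
    # highest-priority key last yields multi-key ordering and preserves the
    # original order among entities that tie on every key.
    for spec in reversed(order_by_fields):
        direction = spec.get("order", "asc")  # ascending when the key is omitted
        if direction not in ("asc", "desc"):
            raise ValueError(f"order must be 'asc' or 'desc', got {direction!r}")
        ordered.sort(key=lambda hit: hit["entity"][spec["field"]],
                     reverse=(direction == "desc"))
    return ordered


hits = [
    {"id": 7, "distance": 0.48, "entity": {"price": 20, "rating": 4}},
    {"id": 2, "distance": 0.32, "entity": {"price": 10, "rating": 5}},
    {"id": 1, "distance": 0.30, "entity": {"price": 20, "rating": 4}},
    {"id": 4, "distance": 0.13, "entity": {"price": 10, "rating": 3}},
]
order_by_fields = [{"field": "price"}, {"field": "rating", "order": "desc"}]
print([hit["id"] for hit in apply_order_by(hits, order_by_fields)])  # [2, 4, 7, 1]
```

Entities 7 and 1 tie on both fields, so they stay in their original similarity-score order.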

  • timeout (float | None) -

    The timeout duration for this operation. If set to None, the operation waits until a response arrives or an error occurs.

  • partition_names (list) -

    A list of partition names.

    The value defaults to None. If specified, only the listed partitions are searched.

  • ranker (Function | FunctionScore) -

    The ranker to use for the search.

    For details, refer to Decay Ranker Overview.

  • highlighter (Highlighter) -

    The highlighter to highlight matched terms in search operations. For details, refer to Lexical Highlighter and Semantic Highlighter.

  • search_aggregation (Optional[SearchAggregation]) -

    Hierarchical bucket aggregation spec. Mutually exclusive with group_by_field. When set, limit is ignored and the root SearchAggregation.size controls the top-level bucket count.

  • kwargs -

    • offset (int) -

      The number of records to skip in the search result.

      You can use this parameter in combination with limit to enable pagination.

      The sum of this value and limit should be less than 16,384.

    • round_decimal (int) -

      The number of decimal places that Zilliz Cloud rounds the calculated distances to.

      The value defaults to -1, indicating that Zilliz Cloud skips rounding the calculated distances and returns the raw value.

    • timezone (str)

      Temporarily override the collection or database default time zone for a single query by setting an IANA identifier (for example, Asia/Shanghai, America/Chicago, or UTC). This controls how TIMESTAMPTZ values are interpreted, displayed, and compared during that operation only; it does not modify stored data or collection settings.

      For more information, refer to TIMESTAMPTZ Field.

    • time_fields (str)

      Extract specific time components from a TIMESTAMPTZ field during query or search operations. Use a comma-separated list to specify which elements to extract. Supported elements include: year, month, day, hour, minute, second, and microsecond.

      For more information, refer to TIMESTAMPTZ Field.
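The limit and offset constraint described above lends itself to a small pagination helper. The following is a minimal sketch; `page_window` and `MAX_WINDOW` are hypothetical names, and the check simply mirrors the documented rule that the sum of limit and offset should be less than 16,384.

```python
MAX_WINDOW = 16384  # limit + offset should stay below this value


def page_window(page: int, page_size: int) -> dict:
    """Return the limit/offset pair for a zero-based page number."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    offset = page * page_size
    if offset + page_size >= MAX_WINDOW:
        raise ValueError(
            f"limit + offset = {offset + page_size} is not below {MAX_WINDOW}"
        )
    return {"limit": page_size, "offset": offset}


# Second page of three results: the same values as the offset example on this page.
window = page_window(page=1, page_size=3)
print(window)  # {'limit': 3, 'offset': 3}
# With a connected client:
# res = client.search(collection_name="test_collection", data=[query_vector], **window)
```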

RETURN TYPE:

List[List[dict]]

RETURNS: A list of result lists, one per query vector. Each inner list contains one dictionary per matched entity, holding its primary key, its distance, and the specified output fields.
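In the examples on this page, the returned value is nested: the outer list has one entry per query vector, and each inner list holds the hits for that query. The sketch below walks that structure using a shortened copy of the sample output from the Examples section; `flatten_hits` is a hypothetical helper, not part of the SDK.

```python
from typing import List, Optional, Tuple

# Shortened copy of the "Search with output fields" sample output on this page.
res = [[
    {"id": 7, "distance": 0.4801957309246063, "entity": {"color": "grey_8510"}},
    {"id": 2, "distance": 0.3205878734588623, "entity": {"color": "orange_6781"}},
    {"id": 1, "distance": 0.2993225157260895, "entity": {"color": "red_7025"}},
]]


def flatten_hits(results: List[List[dict]]) -> List[Tuple[int, int, float, Optional[str]]]:
    """Return (query_index, id, distance, color) for every hit."""
    rows = []
    for query_index, hits in enumerate(results):  # one inner list per query vector
        for hit in hits:
            # "entity" only carries the requested output fields, so use .get().
            rows.append((query_index, hit["id"], hit["distance"], hit["entity"].get("color")))
    return rows


for row in flatten_hits(res):
    print(row)
```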

EXCEPTIONS:

  • MilvusException

    This exception will be raised when any error occurs during this operation.

Examples

from pymilvus import MilvusClient

# 1. Set up a Milvus client
client = MilvusClient(
    uri="https://inxx-xxxxxxxxxxxx.api.gcp-us-west1.zillizcloud.com:19530",
    token="user:password"
)

# 2. Create a collection
client.create_collection(
    collection_name="test_collection",
    dimension=5
)

# 3. Insert data
client.insert(
    collection_name="test_collection",
    data=[
        {"id": 0, "vector": [0.3580376395471989, -0.6023495712049978, 0.18414012509913835, -0.26286205330961354, 0.9029438446296592], "color": "pink_8682"},
        {"id": 1, "vector": [0.19886812562848388, 0.06023560599112088, 0.6976963061752597, 0.2614474506242501, 0.838729485096104], "color": "red_7025"},
        {"id": 2, "vector": [0.43742130801983836, -0.5597502546264526, 0.6457887650909682, 0.7894058910881185, 0.20785793220625592], "color": "orange_6781"},
        {"id": 3, "vector": [0.3172005263489739, 0.9719044792798428, -0.36981146090600725, -0.4860894583077995, 0.95791889146345], "color": "pink_9298"},
        {"id": 4, "vector": [0.4452349528804562, -0.8757026943054742, 0.8220779437047674, 0.46406290649483184, 0.30337481143159106], "color": "red_4794"},
        {"id": 5, "vector": [0.985825131989184, -0.8144651566660419, 0.6299267002202009, 0.1206906911183383, -0.1446277761879955], "color": "yellow_4222"},
        {"id": 6, "vector": [0.8371977790571115, -0.015764369584852833, -0.31062937026679327, -0.562666951622192, -0.8984947637863987], "color": "red_9392"},
        {"id": 7, "vector": [-0.33445148015177995, -0.2567135004164067, 0.8987539745369246, 0.9402995886420709, 0.5378064918413052], "color": "grey_8510"},
        {"id": 8, "vector": [0.39524717779832685, 0.4000257286739164, -0.5890507376891594, -0.8650502298996872, -0.6140360785406336], "color": "white_9381"},
        {"id": 9, "vector": [0.5718280481994695, 0.24070317428066512, -0.3737913482606834, -0.06726932177492717, -0.6980531615588608], "color": "purple_4976"}
    ],
)

# {'insert_count': 10}

# 4. Conduct a search
search_params = {
    "params": {}
}

# Search with limit
res = client.search(
    collection_name="test_collection",
    data=[[0.05, 0.23, 0.07, 0.45, 0.13]],
    limit=3,
    search_params=search_params
)

# [[{'id': 7, 'distance': 0.4801957309246063, 'entity': {}},
# {'id': 2, 'distance': 0.3205878734588623, 'entity': {}},
# {'id': 1, 'distance': 0.2993225157260895, 'entity': {}}]]

# Search with filter
res = client.search(
    collection_name="test_collection",
    data=[[0.05, 0.23, 0.07, 0.45, 0.13]],
    limit=3,
    filter='color like "red%"',
    search_params=search_params
)

# [[{'id': 1, 'distance': 0.2993225157260895, 'entity': {}},
# {'id': 4, 'distance': 0.12666261196136475, 'entity': {}},
# {'id': 6, 'distance': -0.3535143733024597, 'entity': {}}]]

# Search with an offset
res = client.search(
    collection_name="test_collection",
    data=[[0.05, 0.23, 0.07, 0.45, 0.13]],
    limit=3,
    offset=3,
    search_params=search_params
)

# [[{'id': 4, 'distance': 0.12666261196136475, 'entity': {}},
# {'id': 3, 'distance': 0.11930042505264282, 'entity': {}},
# {'id': 5, 'distance': -0.05843167006969452, 'entity': {}}]]

# Search with output fields
res = client.search(
    collection_name="test_collection",
    data=[[0.05, 0.23, 0.07, 0.45, 0.13]],
    limit=3,
    output_fields=["vector", "color"],
    search_params=search_params
)

# [[{'id': 7,
# 'distance': 0.4801957309246063,
# 'entity': {'color': 'grey_8510',
# 'vector': [-0.33445146679878235,
# -0.25671350955963135,
# 0.8987540006637573,
# 0.9402995705604553,
# 0.537806510925293]}},
# {'id': 2,
# 'distance': 0.3205878734588623,
# 'entity': {'color': 'orange_6781',
# 'vector': [0.4374213218688965,
# -0.5597502589225769,
# 0.6457887887954712,
# 0.789405882358551,
# 0.20785793662071228]}},
# {'id': 1,
# 'distance': 0.2993225157260895,
# 'entity': {'color': 'red_7025',
# 'vector': [0.19886812567710876,
# 0.060235604643821716,
# 0.697696328163147,
# 0.2614474594593048,
# 0.8387295007705688]}}]]

# Conduct a range search
search_params = {
    "metric_type": "IP",
    "params": {
        "radius": 0.1,
        "range_filter": 0.8
    }
}

res = client.search(
    collection_name="test_collection",
    data=[[0.05, 0.23, 0.07, 0.45, 0.13]],
    limit=3,
    search_params=search_params
)

# [[{'id': 7, 'distance': 0.4801957309246063, 'entity': {}},
# {'id': 2, 'distance': 0.3205878734588623, 'entity': {}},
# {'id': 1, 'distance': 0.2993225157260895, 'entity': {}}]]
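
The ordering rules for radius and range_filter described under search_params can be checked before a request is sent. The following is a minimal sketch; `range_search_params` is a hypothetical helper, it covers only the metric types named on this page, and it places the values at the top level of search_params, as the note on the params key suggests.

```python
def range_search_params(metric_type: str, radius: float, range_filter: float) -> dict:
    """Build range-search parameters after checking the documented ordering rule."""
    metric = metric_type.upper()
    if metric == "L2":
        # L2 is a distance: radius must be greater than range_filter.
        valid = radius > range_filter
    elif metric in ("IP", "COSINE"):
        # IP and COSINE are similarities: range_filter must be greater than radius.
        valid = range_filter > radius
    else:
        raise ValueError(f"metric type not covered by this sketch: {metric_type!r}")
    if not valid:
        raise ValueError(
            f"radius={radius} and range_filter={range_filter} are in the wrong order for {metric}"
        )
    return {"metric_type": metric, "radius": radius, "range_filter": range_filter}


# Same values as the range search example above.
params = range_search_params("IP", radius=0.1, range_filter=0.8)
print(params)
# With a connected client:
# res = client.search(collection_name="test_collection", data=[query_vector],
#                     limit=3, search_params=params)
```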