
Elasticsearch Queries to Milvus

Elasticsearch, built on Apache Lucene, is a leading open-source search engine. However, it faces challenges in modern AI applications, including high update costs, poor real-time performance, inefficient shard management, a non-cloud-native design, and excessive resource demands. As a cloud-native vector database, Milvus overcomes these issues with decoupled storage and computing, efficient indexing for high-dimensional data, and seamless integration with modern infrastructures. It offers superior performance and scalability for AI workloads.

This article helps you migrate your code base from Elasticsearch to Milvus by showing how to convert common Elasticsearch queries into their Milvus equivalents.

Overview

In Elasticsearch, operations in the query context generate relevance scores, while those in the filter context do not. Similarly, Milvus searches produce similarity scores, whereas its filter-like queries do not. When migrating your code base from Elasticsearch to Milvus, the key principle is converting fields used in Elasticsearch's query context into vector fields to enable similarity score generation.

The table below outlines some Elasticsearch query patterns and their corresponding equivalents in Milvus.

| Elasticsearch Queries | Milvus Equivalents | Remarks |
|---|---|---|
| Full-text queries | | |
| Match query | Full-text search | Both provide similar sets of capabilities. |
| Non-scoring match query | Text match | All match queries in Elasticsearch generate relevance scores. Milvus provides a non-scoring filter named TEXT_MATCH to provide similar capabilities of an Elasticsearch match query. |
| Term-level queries | | |
| IDs | in operator | Both provide the same or similar set of capabilities when these Elasticsearch queries are used in the filter context. |
| Prefix query | like operator | Same as above. |
| Range query | Comparison operators like >, <, >=, and <= | Same as above. |
| Term query | Comparison operators like == | Same as above. |
| Terms query | in operator | Same as above. |
| Wildcard query | like operator | Same as above. |
| Boolean query | Logical operators like AND | Both provide similar sets of capabilities when used in the filter context. |
| Vector queries | | |
| kNN query | Search | Milvus provides more advanced vector search capabilities. |
| Reciprocal rank fusion | Hybrid Search | Milvus supports multiple reranking strategies. |

Full-text queries

In Elasticsearch, full-text queries let you search analyzed text fields, such as the body of an email. The query string is processed using the same analyzer that was applied to the field during indexing.

Match query

In Elasticsearch, a match query returns documents that match a provided text, number, date, or boolean value. The provided text is analyzed before matching.

The following is an example Elasticsearch search request with a match query.

resp = client.search(
    query={
        "match": {
            "message": {
                "query": "this is a test"
            }
        }
    },
)

Milvus provides the same capability through the full-text search feature. You can convert the above Elasticsearch query into Milvus as follows:

res = client.search(
    collection_name="my_collection",
    data=['this is a test'],  # same query text as the Elasticsearch example above
    anns_field="message_sparse",
    output_fields=["id", "message"]
)

In the example above, message_sparse is a sparse vector field derived from a VarChar field named message. Milvus uses the BM25 embedding model to convert the values in the message field into sparse vector embeddings and stores them in the message_sparse field. Upon receiving the search request, Milvus embeds the plain-text query payload using the same BM25 model, performs a sparse vector search, and returns the id and message fields specified in the output_fields parameter along with the corresponding similarity scores.

To use this functionality, you must enable the analyzer on the message field and define a function to derive the message_sparse field from it. For detailed instructions on enabling the analyzer and creating the derivative function in Milvus, refer to Full Text Search.
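To give a feel for the kind of relevance score a BM25-based search produces, the following is a minimal pure-Python sketch of the classic BM25 formula. It is an illustration only, not Milvus's implementation: the whitespace tokenizer, the `bm25_scores` helper, and the `k1` and `b` values are assumptions chosen for the example.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with the classic BM25 formula."""
    # Naive analyzer (an assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: the number of documents that contain each term.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


docs = ["this is a test", "the weather is nice", "another unrelated message"]
scores = bm25_scores("this is a test", docs)
```

The first document shares every query term and scores highest, the second shares only one term, and the third shares none and scores zero.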

Non-scoring match query equivalent in Milvus

Elasticsearch does not support non-scoring match queries: every match query generates relevance scores. In Milvus, however, you can combine a vector search with exact keyword filtering by using the TEXT_MATCH operator. This approach ensures that every search result contains the specified terms, which makes the results more precise.

# Filter entities whose message value contains the exact term `Jamaica`
filter = "TEXT_MATCH(message, 'Jamaica')"

# Assuming 'message_vector' is the vector field and 'message' is the VARCHAR field
result = client.search(
    collection_name="my_collection",
    anns_field="message_vector",
    data=[[1, -2.5, 3]],  # vector embeddings of the phrase `How is the weather in Jamaica?`
    filter=filter,
    search_params={"params": {"nprobe": 10}},
    limit=10,
    output_fields=["id", "message"]
)

Instead of having Milvus generate sparse vector embeddings into a derived field as shown in the Match query section, the above example converts the values in the message field into dense vectors and uses a TEXT_MATCH filter to ensure that every search result mentions Jamaica. Specifically, Milvus first performs a text match to keep only the entities that contain the word Jamaica in the message field, and then runs the vector search within this filtered subset.
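The filter-then-search order described above can be sketched in plain Python. This is a conceptual illustration, not how Milvus executes the request internally: the `filtered_search` helper, the sample entities, and the whitespace-based term matching are all assumptions made for the example.

```python
def filtered_search(entities, query_vector, term, limit=10):
    """Keep entities whose message contains `term`, then rank them by inner product."""

    def inner_product(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Step 1: text-match filter (naive lowercase whitespace tokenization).
    candidates = [
        entity for entity in entities
        if term.lower() in entity["message"].lower().split()
    ]
    # Step 2: vector search restricted to the filtered subset.
    ranked = sorted(
        candidates,
        key=lambda entity: inner_product(entity["message_vector"], query_vector),
        reverse=True,
    )
    return ranked[:limit]


entities = [
    {"id": 1, "message": "Sunny days in Jamaica", "message_vector": [1.0, -2.0, 3.0]},
    {"id": 2, "message": "Rain in Norway", "message_vector": [1.0, -2.5, 3.0]},
    {"id": 3, "message": "Jamaica travel tips", "message_vector": [0.0, 1.0, 0.0]},
]
hits = filtered_search(entities, [1, -2.5, 3], "Jamaica")
```

Entity 2 has the vector closest to the query but is excluded because its message does not mention Jamaica, so only entities 1 and 3 are ranked.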

Term-level queries

In Elasticsearch, term-level queries are used to find documents based on exact values in structured data, such as date ranges, IP addresses, prices, or product IDs. This section outlines the possible equivalents of some Elasticsearch term-level queries in Milvus. All examples in this section are adapted to operate within the filter context to align with Milvus's capabilities.

IDs

In Elasticsearch, you can find documents based on their IDs in the filter context as follows:

resp = client.search(
    query={
        "bool": {
            "filter": {
                "ids": {
                    "values": [
                        "1",
                        "4",
                        "100"
                    ]
                }
            }
        }
    },
)

In Milvus, you can also find entities based on their IDs as follows:

# Use the filter parameter
res = client.query(
    collection_name="my_collection",
    filter="id in [1, 4, 100]",
    output_fields=["id", "title"]
)

# Use the ids parameter
res = client.query(
    collection_name="my_collection",
    ids=[1, 4, 100],
    output_fields=["id", "title"]
)

You can find the Elasticsearch example on this page. For details on query and get requests as well as the filter expressions in Milvus, refer to Query and Filtering.

Prefix query

In Elasticsearch, you can find documents that contain a specific prefix in a provided field in the filter context as follows:

resp = client.search(
    query={
        "bool": {
            "filter": {
                "prefix": {
                    "user": {
                        "value": "ki"
                    }
                }
            }
        }
    },
)

In Milvus, you can find the entities whose values start with the specified prefix as follows:

res = client.query(
    collection_name="my_collection",
    filter='user like "ki%"',
    output_fields=["id", "user"]
)

You can find the Elasticsearch example on this page. For details on the like operator in Milvus, refer to Using LIKE for Pattern Matching.

Range query

In Elasticsearch, you can find documents that contain terms within a provided range as follows:

resp = client.search(
    query={
        "bool": {
            "filter": {
                "range": {
                    "age": {
                        "gte": 10,
                        "lte": 20
                    }
                }
            }
        }
    },
)

In Milvus, you can find the entities whose values in a specific field are within a provided range as follows:

res = client.query(
    collection_name="my_collection",
    filter='10 <= age <= 20',
    output_fields=["id", "user", "age"]
)

You can find the Elasticsearch example on this page. For details on comparison operators in Milvus, see Comparison operators.
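The example above has both a lower and an upper bound. For ranges with only one bound, or with exclusive bounds, the mapping from Elasticsearch range keys to Milvus comparison operators can be sketched as follows. The `range_to_milvus` helper is hypothetical and assumes numeric bounds only.

```python
def range_to_milvus(field, bounds):
    """Translate the body of an Elasticsearch range clause into a Milvus filter string."""
    operators = {"gt": ">", "gte": ">=", "lt": "<", "lte": "<="}
    unknown = set(bounds) - set(operators)
    if unknown:
        raise ValueError(f"unsupported range keys: {sorted(unknown)}")
    if not bounds:
        raise ValueError("at least one bound is required")

    parts = []
    # Emit one comparison per bound in a stable order, then join them with AND.
    for key in ("gt", "gte", "lt", "lte"):
        if key not in bounds:
            continue
        value = bounds[key]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError("this sketch handles numeric bounds only")
        parts.append(f"{field} {operators[key]} {value}")
    return " AND ".join(parts)


both_bounds = range_to_milvus("age", {"gte": 10, "lte": 20})
lower_only = range_to_milvus("age", {"gt": 5})
```

Here `both_bounds` is `age >= 10 AND age <= 20`, which expresses the same condition as the `10 <= age <= 20` filter shown above, and `lower_only` is `age > 5`.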

Term query

In Elasticsearch, you can find documents that contain an exact term in a provided field as follows:

resp = client.search(
    query={
        "bool": {
            "filter": {
                "term": {
                    "status": {
                        "value": "retired"
                    }
                }
            }
        }
    },
)

In Milvus, you can find the entities whose values in the specified field are exactly the specified term as follows:

res = client.query(
    collection_name="my_collection",
    filter='status == "retired"',
    output_fields=["id", "user", "status"]
)

You can find the Elasticsearch example on this page. For details on comparison operators in Milvus, see Comparison operators.

Terms query

In Elasticsearch, you can find documents that contain one or more exact terms in a provided field as follows:

resp = client.search(
    query={
        "bool": {
            "filter": {
                "terms": {
                    "degree": [
                        "graduate",
                        "post-graduate"
                    ]
                }
            }
        }
    }
)

Milvus does not have an exact equivalent of this query. However, you can find the entities whose values in the specified field are one of the specified terms as follows:

res = client.query(
    collection_name="my_collection",
    filter='degree in ["graduate", "post-graduate"]',
    output_fields=["id", "user", "degree"]
)

You can find the Elasticsearch example on this page. For details on range operators in Milvus, refer to Range operators.
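When the list of terms comes from application code rather than a literal, the filter string has to be assembled with correct quoting. The following sketch uses a hypothetical `terms_to_milvus` helper and assumes that JSON-style double-quoted strings are acceptable to the Milvus expression parser for plain values like the ones shown in this section.

```python
import json


def terms_to_milvus(field, values):
    """Build a Milvus `in` filter from a list of strings or numbers."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    # json.dumps renders strings with double quotes and numbers without quotes.
    return f"{field} in {json.dumps(values, ensure_ascii=False)}"


degree_filter = terms_to_milvus("degree", ["graduate", "post-graduate"])
id_filter = terms_to_milvus("id", [1, 4, 100])
```

Both results match the hand-written filters used earlier in this section: `degree in ["graduate", "post-graduate"]` and `id in [1, 4, 100]`.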

Wildcard query

In Elasticsearch, you can find documents that contain terms matching a wildcard pattern as follows:

resp = client.search(
    query={
        "bool": {
            "filter": {
                "wildcard": {
                    "user": {
                        "value": "ki*y"
                    }
                }
            }
        }
    },
)

Milvus does not support Elasticsearch-style wildcards in its filtering conditions. However, you can use the like operator to achieve a similar effect as follows:

res = client.query(
    collection_name="my_collection",
    filter='user like "ki%" AND user like "%y"',
    output_fields=["id", "user"]
)

You can find the Elasticsearch example on this page. For details on the range operators in Milvus, refer to Range operators.
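Splitting a wildcard pattern into a prefix clause and a suffix clause, as in the filter above, is only safe when the end of the prefix cannot overlap the start of the suffix. For example, `ab*ba` must not match `aba`, yet `aba` both starts with `ab` and ends with `ba`. The sketch below, built around a hypothetical `wildcard_to_like` helper, handles patterns with a single `*` and rejects anything it cannot translate safely.

```python
def wildcard_to_like(field, pattern):
    """Translate an Elasticsearch wildcard pattern containing one `*` into LIKE clauses."""
    if any(ch in pattern for ch in '?%_"\\'):
        raise ValueError("this sketch handles only plain text plus a single `*`")
    if pattern.count("*") != 1:
        raise ValueError("this sketch handles exactly one `*`")

    prefix, suffix = pattern.split("*")
    if not prefix and not suffix:
        raise ValueError("pattern must contain some literal text")

    # Reject patterns whose prefix tail overlaps the suffix head (as in "ab*ba"),
    # because two independent LIKE clauses would also match "aba".
    for size in range(1, min(len(prefix), len(suffix)) + 1):
        if prefix[-size:] == suffix[:size]:
            raise ValueError("prefix and suffix overlap; cannot translate safely")

    clauses = []
    if prefix:
        clauses.append(f'{field} like "{prefix}%"')
    if suffix:
        clauses.append(f'{field} like "%{suffix}"')
    return " AND ".join(clauses)


kimchy_filter = wildcard_to_like("user", "ki*y")
prefix_filter = wildcard_to_like("user", "ki*")
```

For the pattern `ki*y`, the helper produces the same filter as the example above.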

Boolean query

In Elasticsearch, a boolean query is a query that matches documents matching boolean combinations of other queries.

The following example is adapted from an example in Elasticsearch documentation on this page. The query will return users with kimchy in their names with a production tag.

resp = client.search(
    query={
        "bool": {
            # A Python dict cannot hold two "filter" keys, so pass both clauses as a list
            "filter": [
                {
                    "term": {
                        "user": "kimchy"
                    }
                },
                {
                    "term": {
                        "tags": "production"
                    }
                }
            ]
        }
    },
)

In Milvus, you can do something similar as follows:

res = client.query(
    collection_name="my_collection",
    filter='user like "%kimchy%" AND ARRAY_CONTAINS(tags, "production")',
    output_fields=["id", "user", "age", "tags"]
)

The above example assumes that you have a user field of the VarChar type and a tags field of the Array type, in the target collection. The query will return users with kimchy in their names with a production tag.
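For code bases with many boolean queries, the conversion can be automated for simple cases. The following is a minimal sketch of a hypothetical `to_milvus_filter` translator, not part of any Milvus SDK. It assumes only `term` leaves and the `filter` and `must_not` clauses of a `bool` query, maps exact terms to `==` (rather than the `like` pattern used above), and uses `ARRAY_CONTAINS` for fields the caller declares as arrays.

```python
import json


def to_milvus_filter(query, array_fields=frozenset()):
    """Translate a small subset of Elasticsearch filter-context queries into a Milvus filter."""
    if len(query) != 1:
        raise ValueError("expected a query with exactly one top-level key")
    (kind, body), = query.items()

    if kind == "term":
        (field, value), = body.items()
        if isinstance(value, dict):  # long form: {"field": {"value": ...}}
            value = value["value"]
        literal = json.dumps(value, ensure_ascii=False)
        if field in array_fields:
            return f"ARRAY_CONTAINS({field}, {literal})"
        return f"{field} == {literal}"

    if kind == "bool":
        unsupported = set(body) - {"filter", "must_not"}
        if unsupported:
            raise ValueError(f"unsupported bool clauses: {sorted(unsupported)}")
        parts = []
        for key in ("filter", "must_not"):
            clauses = body.get(key, [])
            if isinstance(clauses, dict):  # a single clause may be given without a list
                clauses = [clauses]
            for clause in clauses:
                expr = to_milvus_filter(clause, array_fields)
                if key == "must_not":
                    expr = f"NOT ({expr})"
                elif "bool" in clause:  # keep nested boolean groups together
                    expr = f"({expr})"
                parts.append(expr)
        if not parts:
            raise ValueError("empty bool query")
        return " AND ".join(parts)

    raise ValueError(f"unsupported query type: {kind}")


es_query = {
    "bool": {
        "filter": [
            {"term": {"user": "kimchy"}},
            {"term": {"tags": "production"}},
        ]
    }
}
milvus_filter = to_milvus_filter(es_query, array_fields={"tags"})
```

For the query above, the helper yields `user == "kimchy" AND ARRAY_CONTAINS(tags, "production")`. Raising on unsupported clauses is a deliberate choice, so that a query the sketch does not understand fails loudly instead of being silently mistranslated.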

Vector queries

In Elasticsearch, vector queries are specialized queries that work on vector fields to efficiently perform semantic search.

kNN query

Elasticsearch supports both approximate kNN queries and exact, brute-force kNN queries. You can find the k nearest vectors to a query vector in either way, as measured by a similarity metric, as follows:

resp = client.search(
    index="my-image-index",
    size=3,
    query={
        "knn": {
            "field": "image-vector",
            "query_vector": [
                -5,
                9,
                -12
            ],
            "k": 10
        }
    },
)

Milvus, as a specialized vector database, uses index types to optimize vector searches. Typically, it prioritizes approximate nearest neighbor (ANN) search for high-dimensional vector data. While brute-force kNN search with the FLAT index type delivers precise results, it is both time-consuming and resource-intensive. In contrast, ANN search using AUTOINDEX or other index types balances speed and accuracy, offering significantly faster and more resource-efficient performance than kNN. For details on index types and AUTOINDEX, you can read Manage Indexes and AUTOINDEX Explained.

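A rough Milvus equivalent of the above vector query looks like this:

res = client.search(
    collection_name="my_collection",
    # Milvus field names may contain only letters, digits, and underscores,
    # so this example assumes the field is named image_vector
    anns_field="image_vector",
    data=[[-5, 9, -12]],
    limit=10
)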

You can find the Elasticsearch example on this page. For details on ANN searches in Milvus, read Basic ANN Search.
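To make the contrast between exact kNN and ANN search concrete, the following is a minimal sketch of what a brute-force kNN search does conceptually: it measures the distance from the query to every stored vector and keeps the k closest. The `exact_knn` helper and the sample vectors are assumptions for illustration, and Euclidean distance is just one possible metric.

```python
import heapq
import math


def exact_knn(vectors, query, k):
    """Return the ids of the k stored vectors closest to `query` by Euclidean distance."""
    # Every stored vector is compared with the query, so the cost grows linearly
    # with the size of the collection. ANN indexes avoid this full scan.
    return heapq.nsmallest(k, vectors, key=lambda vid: math.dist(vectors[vid], query))


vectors = {
    1: [-5, 9, -12],
    2: [0, 0, 0],
    3: [-4, 9, -11],
    4: [10, 10, 10],
}
nearest = exact_knn(vectors, [-5, 9, -12], k=2)
```

The result is always exact, but the full scan is what makes brute-force search slow on large collections.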

Reciprocal Rank Fusion

Elasticsearch provides Reciprocal Rank Fusion (RRF) to combine multiple result sets with different relevance indicators into a single ranked result set.

The following example demonstrates combining a traditional term-based search with a k-nearest neighbors (kNN) vector search to improve search relevance:

client.search(
    index="my_index",
    size=10,
    # retriever is a top-level search parameter, not a query type
    retriever={
        "rrf": {
            "retrievers": [
                {
                    "standard": {
                        "query": {
                            "term": {
                                "text": "shoes"
                            }
                        }
                    }
                },
                {
                    "knn": {
                        "field": "vector",
                        "query_vector": [1.25, 2, 3.5],  # Example vector; replace with your actual query vector
                        "k": 50,
                        "num_candidates": 100
                    }
                }
            ],
            "rank_window_size": 50,
            "rank_constant": 20
        }
    }
)

In this example, RRF combines results from two retrievers:

  • A standard term-based search for documents containing the term "shoes" in the text field.

  • A kNN search on the vector field using the provided query vector.

Each retriever contributes up to 50 top matches, which are reranked by RRF, and the final top 10 results are returned.
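The fusion step itself is simple: each document receives 1 / (rank_constant + rank) from every result list it appears in, where rank starts at 1, and the documents are then sorted by their summed scores. The following is a minimal sketch of that formula. The `rrf_fuse` helper and the sample document IDs are hypothetical, and the rank constant of 20 mirrors the request above.

```python
def rrf_fuse(rankings, rank_constant=20, size=10):
    """Fuse several ranked lists of document ids with Reciprocal Rank Fusion."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # A document gains more score the closer it is to the top of a list.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (rank_constant + rank)
    ordered = sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)
    return ordered[:size]


term_hits = ["a", "b", "c"]  # ranked results of the term-based retriever
knn_hits = ["c", "a", "d"]   # ranked results of the kNN retriever
fused = rrf_fuse([term_hits, knn_hits])
```

Documents that appear in both lists (`a` and `c`) outrank those that appear in only one, even though `b` is ranked above `c` in the term-based results.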

In Milvus, you can achieve a similar hybrid search by combining searches across multiple vector fields, applying a reranking strategy, and retrieving the top-K results from the combined list. Milvus supports both RRF and weighted reranker strategies. For more details, refer to Reranking.

The following is a non-strict equivalence of the above Elasticsearch example in Milvus.

from pymilvus import AnnSearchRequest, RRFRanker

search_params_dense = {
    "data": [[1.25, 2, 3.5]],
    "anns_field": "vector",
    "param": {
        "metric_type": "IP",
        "params": {"nprobe": 10},
    },
    "limit": 100
}

req_dense = AnnSearchRequest(**search_params_dense)

search_params_sparse = {
    "data": ["shoes"],
    "anns_field": "text_sparse",
    "param": {
        "metric_type": "BM25",
        "params": {"drop_ratio_search": 0.2}
    },
    "limit": 100  # AnnSearchRequest requires a limit; 100 mirrors the dense request
}

req_sparse = AnnSearchRequest(**search_params_sparse)

res = client.hybrid_search(
    collection_name="my_collection",
    reqs=[req_dense, req_sparse],
    ranker=RRFRanker(),
    limit=10
)

This example demonstrates a hybrid search in Milvus that combines:

  1. Dense vector search: Using the inner product (IP) metric with nprobe set to 10 for approximate nearest neighbor (ANN) search on the vector field.

  2. Sparse vector search: Using the BM25 similarity metric with a drop_ratio_search parameter of 0.2 on the text_sparse field.

The two searches are executed separately, and their results are then combined and reranked using the Reciprocal Rank Fusion (RRF) ranker. The hybrid search returns the top 10 entities from the reranked list.

Unlike Elasticsearch's RRF ranking, which merges results from standard text-based queries and kNN searches, Milvus combines results from sparse and dense vector searches, providing a unique hybrid search capability optimized for multimodal data.

Recap

In this article, we covered the conversions of typical Elasticsearch queries to their Milvus equivalents, including term-level queries, boolean queries, full-text queries, and vector queries. If you have further questions about converting other Elasticsearch queries, feel free to reach out to us.