

In information retrieval, a reranker reorders the results of an initial retrieval step. Compared to retrieval with vector Approximate Nearest Neighbor (ANN) search alone, adding a reranker can improve search quality because it can better judge the semantic relevance between the documents and the query. A reranker can also improve the accuracy of generated answers in RAG applications, since fewer but higher-quality documents are placed in the context. Note that rerankers can be computationally heavy, leading to higher costs and longer query latency.

What is a reranker?


A reranker is typically a machine learning model trained to score the semantic relevance between a query text and a specific document. Production-level information retrieval systems, such as web search and recommender systems, usually consist of two stages: initial retrieval and reranking. The initial retrieval stage, often using vector Approximate Nearest Neighbor (ANN) search, efficiently sifts through vast datasets to identify relevant candidates, for example retrieving the top 20 items from millions. The reranker then refines the ordering of this candidate set based on relevance to the query text. A smaller set, say the top 5 from the reranked list, then provides better quality than the first 5 items in the original order.
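The two stages described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the product's implementation: the first stage uses brute-force cosine similarity as a stand-in for ANN search, and `toy_rerank_score` is a hypothetical word-overlap scorer standing in for a trained reranker model. All function names and the sample data are invented for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def initial_retrieval(query_vec, doc_vecs, top_k):
    """Stage 1: rank documents by vector similarity.
    Brute force here; a real system would use an ANN index."""
    order = sorted(range(len(doc_vecs)),
                   key=lambda i: cosine(query_vec, doc_vecs[i]),
                   reverse=True)
    return order[:top_k]

def toy_rerank_score(query, doc):
    """Hypothetical stand-in for a reranker model:
    the fraction of query words that appear in the document."""
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / len(q) if q else 0.0

def retrieve_and_rerank(query, query_vec, docs, doc_vecs,
                        retrieve_k, final_k, score_fn=toy_rerank_score):
    """Stage 1 selects retrieve_k candidates; stage 2 rescores them
    against the query text and keeps the best final_k."""
    candidates = initial_retrieval(query_vec, doc_vecs, retrieve_k)
    reranked = sorted(candidates,
                      key=lambda i: score_fn(query, docs[i]),
                      reverse=True)
    return reranked[:final_k]

# Invented sample data: 2-dimensional "embeddings" for four documents.
docs = [
    "how to reset a password",
    "password policy overview",
    "reset the router to factory settings",
    "cooking pasta at home",
]
doc_vecs = [[0.9, 0.1], [1.0, 0.0], [0.8, 0.3], [0.0, 1.0]]
query, query_vec = "reset password", [1.0, 0.0]

first_stage = initial_retrieval(query_vec, doc_vecs, 3)
final = retrieve_and_rerank(query, query_vec, docs, doc_vecs, 3, 2)
print("initial order:", first_stage)
print("after rerank: ", final)
```

In this toy data the vector stage ranks the policy document first, while the reranking stage promotes the document that matches both query words. That is the kind of reordering a reranker is meant to provide, although a real model scores semantic relevance rather than word overlap.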

Using a reranker in a RAG application

A RAG application consists of two phases: the retrieval phase, which retrieves relevant information from a knowledge base, and the generation phase, which uses an LLM to summarize and reason about the retrieved knowledge. The relevance of the context that the retrieval phase provides to the LLM is crucial to the quality of the final answer. A reranker is an effective tool for improving retrieval, and thus the answer quality of the RAG application.
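As a rough sketch of where the reranker sits between the two phases, the function below wires three caller-supplied callables together. The names `retrieve`, `rerank`, `generate`, `build_prompt`, and `answer_with_rag`, the prompt wording, and the default values of 20 and 5 are all assumptions made for illustration; they do not correspond to any specific library or service API.

```python
def build_prompt(question, passages):
    """Format the selected passages and the question into one prompt string."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

def answer_with_rag(question, retrieve, rerank, generate,
                    retrieve_k=20, context_k=5):
    """Retrieval phase (retrieve, then rerank) followed by generation.

    retrieve(question, k) -> list of candidate passages (hypothetical)
    rerank(question, passages) -> passages sorted best-first (hypothetical)
    generate(prompt) -> answer text from an LLM (hypothetical)
    """
    candidates = retrieve(question, retrieve_k)   # broad, cheap first pass
    ranked = rerank(question, candidates)         # precise, costlier second pass
    prompt = build_prompt(question, ranked[:context_k])  # fewer, better passages
    return generate(prompt)
```

Passing the stages in as callables keeps the sketch independent of any particular vector database, reranker model, or LLM client. Only the top `context_k` reranked passages reach the prompt, which is how the reranker limits the context to fewer but more relevant documents.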

The diagram below illustrates the position of a reranker in a RAG pipeline:


When to use a reranker

Initial retrieval with Approximate Nearest Neighbor (ANN) vector search alone is very efficient and often yields satisfactory results, so in many scenarios a second reranking stage is not essential. For applications with high quality standards, a reranker can improve accuracy at the cost of more computation and longer search times. Typically, a reranker model adds between 100 ms and a few seconds of latency to a search query, depending on factors such as the topK setting and the size of the reranker model. When the initial retrieval returns incorrect or irrelevant documents, a reranker can effectively filter them out, improving the quality of the final generated response.
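One way to judge whether the extra latency is acceptable is to time the reranking step at several candidate counts. The sketch below is illustrative only: `slow_toy_rerank` is a hypothetical reranker that sleeps for 1 ms per document to simulate per-document model inference, so the printed timings reflect that assumption rather than any real model.

```python
import time

def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return result, elapsed_ms

def slow_toy_rerank(query, docs):
    """Hypothetical reranker: one scoring call per (query, document) pair."""
    q = set(query.lower().split())

    def score(doc):
        time.sleep(0.001)  # simulated inference cost per document (assumption)
        return len(q & set(doc.lower().split()))

    return sorted(docs, key=score, reverse=True)

# Invented sample documents; cost grows with the number of candidates reranked.
sample_docs = [f"document {i} about vector search" for i in range(40)]
for top_k in (5, 20, 40):
    _, ms = timed(slow_toy_rerank, "vector search", sample_docs[:top_k])
    print(f"topK={top_k:>2}: rerank took {ms:.1f} ms")
```

Because the reranker scores every candidate against the query, its cost grows with the number of candidates passed to it. This is why the topK setting is one of the main factors in the added latency.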

If a reranker is deemed necessary for your use case, you can choose to enable it in your Search pipeline: