
Function & Model Inference Overview

Zilliz Cloud provides a unified search architecture for building modern retrieval systems, including semantic search, lexical search, hybrid search, and intelligent reranking. Rather than exposing these capabilities as isolated features, Zilliz Cloud organizes them around a single core abstraction: the Function.

What is a Function?

In Zilliz Cloud, a Function is a configurable execution unit that applies a specific operation at a defined stage of the search workflow.

A Function answers three practical questions:

  • When does this operation run? Before search or after search.

  • What input does it operate on? Raw text, vector representations, or retrieved candidate results.

  • What output does it produce? Vector embeddings used for retrieval, or reordered results returned to the user.
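To make this concrete, here is a minimal sketch of a Function declaration using the pymilvus SDK, with which Zilliz Cloud clusters are compatible. The field names are hypothetical; the point is how one declaration answers all three questions.

```python
from pymilvus import Function, FunctionType

# A BM25 Function declared for a collection schema. The declaration answers
# the three questions: it runs pre-search (when), reads raw text from the
# hypothetical `text` field (input), and writes sparse embeddings to the
# hypothetical `sparse` field (output).
bm25_function = Function(
    name="text_to_sparse",
    function_type=FunctionType.BM25,
    input_field_names=["text"],
    output_field_names=["sparse"],
)
```

Attaching such a Function to a collection schema (via `schema.add_function`) is what makes the engine run it automatically at ingestion and query time.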

From a workflow perspective, Functions participate in search in two distinct stages:

  • Pre-search: Functions run before search to convert text into vector representations. These vectors determine which candidates are retrieved.

  • Post-search: Functions run after candidate retrieval to refine the ordering of results without changing the candidate set.

The following diagram provides an abstraction of how Functions work in the search workflow.

[Diagram: Functions in the search workflow]

Every search request follows the same high-level flow:

  1. The Pre-search Function generates vector representations from input text

  2. The search engine retrieves candidate results based on those vectors

  3. (Optional) The Post-search Function reranks the retrieved candidates
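In code, this flow is largely invisible to the caller: with a Pre-search Function attached to the collection, you pass raw query text and the vector generation in step 1 happens server-side. A minimal sketch, assuming a collection named `docs` whose `sparse` field is populated by a BM25 Function:

```python
from pymilvus import MilvusClient

# Hypothetical endpoint and key; substitute your cluster's values.
client = MilvusClient(uri="https://YOUR_CLUSTER_ENDPOINT", token="YOUR_API_KEY")

# Step 1: the Pre-search Function converts this raw text into a query vector.
# Step 2: the engine retrieves candidates from the `sparse` vector field.
# Step 3 (reranking) is omitted here; see Post-search Functions below.
results = client.search(
    collection_name="docs",
    data=["how do I create an index?"],  # raw text, not a vector
    anns_field="sparse",
    limit=5,
    output_fields=["text"],
)
```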

Function categories

Functions in Zilliz Cloud are categorized based on when they run in the search workflow and what role they play. At a high level, Functions fall into two groups:

  • Pre-search Functions, which convert text into vector embeddings and determine candidate retrieval

  • Post-search Functions, which refine the ordering of retrieved candidates

Pre-search Functions: Convert text to vector embeddings

Pre-search Functions run before candidate retrieval. Their role is to convert raw text—both stored documents and incoming queries—into vector representations that the search engine uses to identify relevant candidates.

Different Pre-search Functions generate different types of embeddings, which directly affects how retrieval is performed.

The table below summarizes the available Pre-search Functions:

| Function Type | Vector Type | Description | Typical Scenarios |
| --- | --- | --- | --- |
| BM25 Function | Sparse embeddings | Computes lexical relevance based on term matching, term frequency, and document length normalization. Executes entirely within the database engine; no model inference is required. | Keyword-driven full-text search, documentation and code search, and workloads where term matching, low latency, and deterministic behavior are critical. |
| Model-based Functions | Dense embeddings | Encode the semantic meaning of text using machine learning models, enabling similarity-based retrieval beyond exact keywords. Model inference is required, via hosted models or third-party model services. | Semantic search, natural-language queries, Q&A and RAG pipelines, and use cases where conceptual similarity matters more than literal term overlap. |

All Pre-search Functions are applied consistently to both document data and query text, ensuring retrieval is performed within the same representation space.
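As a sketch of the model-based side (the BM25 declaration appears earlier), the following declares a dense-embedding Function following the text-embedding Function API in open-source Milvus, with which Zilliz Cloud is compatible. The provider, model name, and field names are illustrative; consult the provider table later in this page for what your cluster supports.

```python
from pymilvus import Function, FunctionType

# A model-based Pre-search Function: raw text in, dense embeddings out.
# Unlike BM25, this triggers model inference at both ingestion and query time.
embedding_function = Function(
    name="text_to_dense",
    function_type=FunctionType.TEXTEMBEDDING,
    input_field_names=["text"],    # hypothetical source text field
    output_field_names=["dense"],  # hypothetical dense vector field
    params={
        "provider": "openai",                     # example model service
        "model_name": "text-embedding-3-small",   # example embedding model
    },
)
```

Because the Function is attached to the schema, the same model encodes stored documents at insert time and query text at search time, which is what keeps both in one representation space.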

Post-search Functions: Rerank candidate results

Post-search Functions are applied after candidate retrieval. Their purpose is to refine the ranking of retrieved candidates without adding or removing items from the candidate set.

These functions operate exclusively on the results returned by the search stage and apply additional ranking logic or relevance signals to improve result quality. They do not affect indexing, retrieval, or filtering behavior—only the final ordering of results.

The table below summarizes the available Post-search Functions:

| Function Type | Operates On | Description | Typical Scenarios |
| --- | --- | --- | --- |
| Hybrid Search Rankers | Multiple result sets retrieved from hybrid search | Combine and rebalance results from different retrieval strategies using methods such as weighted ranking or reciprocal rank fusion (RRF). | Hybrid search scenarios that combine semantic and lexical retrieval and require balanced result fusion. |
| Rule-based Rankers | Candidate results from single-vector or hybrid search | Adjust ranking based on predefined rules or numeric signals, such as boosting or decay-based scoring. | Business-driven ranking logic, recency or popularity boosts, and scenarios requiring predictable, non-ML reranking. |
| Model-based Rankers | Candidate results from single-vector or hybrid search | Use machine learning models to evaluate relevance and reorder results based on learned or semantic signals. | Intelligent reranking, relevance refinement using semantic understanding, and LLM-based relevance evaluation. |

Because Post-search Functions operate only on retrieved candidates, they are refinement steps that affect result order but not retrieval scope.
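A sketch of a Post-search Function in use: a hybrid search that retrieves dense and sparse candidate sets and fuses them with reciprocal rank fusion. Collection and field names are hypothetical; `client` is a connected `MilvusClient` as in the earlier sketch.

```python
from pymilvus import AnnSearchRequest, RRFRanker

query = "wireless noise-cancelling headphones"

# Two retrieval strategies over the same query: semantic (dense) and
# lexical (sparse). With Pre-search Functions attached, raw text is accepted.
dense_req = AnnSearchRequest(data=[query], anns_field="dense", param={}, limit=20)
sparse_req = AnnSearchRequest(data=[query], anns_field="sparse", param={}, limit=20)

# The RRF ranker is the Post-search Function: it fuses the two candidate
# sets by reciprocal rank. It reorders but never expands the candidate set.
results = client.hybrid_search(
    collection_name="docs",
    reqs=[dense_req, sparse_req],
    ranker=RRFRanker(60),  # k=60 dampens the influence of any single list
    limit=10,
)
```

Rule-based and model-based rankers plug into the same `ranker` parameter; in recent releases they are declared as Functions with `FunctionType.RERANK`.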

Understand model inference

In the Function-based architecture of Zilliz Cloud, model inference is not a standalone concept or execution stage. Instead, it is an implementation detail used by specific Function types when machine learning-based signals are required.

Where model inference fits in

Model inference refers to the runtime execution of machine learning models to generate semantic signals, such as:

  • Dense vector embeddings derived from text

  • Relevance scores used to rerank search results

Within Zilliz Cloud, model inference is used only by model-based Functions, including:

  • Model-based Pre-search Functions, which generate dense embeddings from text

  • Model-based Rankers, which score and reorder retrieved candidates

Other Functions, such as the BM25 Function and rule-based rankers, run entirely within the database engine and do not require model inference.
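For the reranking case, a heavily hedged sketch of a model-based ranker, following the model-ranker shape in open-source Milvus (`FunctionType.RERANK` with `"reranker": "model"`). The provider, endpoint, and parameter values are assumptions; Zilliz Cloud's managed reranking providers may take different parameters.

```python
from pymilvus import Function, FunctionType

# A model-based Post-search Function: a reranking model scores each
# retrieved candidate's text against the query. All names are illustrative.
model_ranker = Function(
    name="semantic_ranker",
    function_type=FunctionType.RERANK,
    input_field_names=["text"],  # candidate field sent to the reranker
    params={
        "reranker": "model",
        "provider": "tei",                         # example inference provider
        "queries": ["how do I create an index?"],  # query used for scoring
        "endpoint": "http://my-reranker:8080",     # hypothetical endpoint
    },
)

# Passed at search time; inference runs only over the retrieved candidates.
results = client.search(
    collection_name="docs",
    data=["how do I create an index?"],
    anns_field="dense",
    limit=10,
    ranker=model_ranker,
)
```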

Sources of model inference

Zilliz Cloud supports two sources of model inference. Both provide model-based capabilities, but differ in how models are provisioned and managed:

| Aspect | Hosted Models | Third-Party Model Services |
| --- | --- | --- |
| Where models run | Inside Zilliz Cloud | External model provider (OpenAI, Voyage AI, etc.) |
| Who manages models | Zilliz Cloud | The external model provider |
| How access is set up | See Hosted Models | By you, through the provider's integration |
| Credentials | Provided during onboarding with Zilliz Cloud support | Provided by you (for example, API keys) |
| Typical use cases | Tightly integrated or customized deployments | Using standard models from established providers |
| Setup complexity | Higher (requires onboarding) | Lower (connect your existing API keys) |

Choose Hosted Models if you need:

  • Tight integration with Zilliz Cloud (single vendor, unified support)

  • Custom model fine-tuning or specialized models

  • Predictable performance and latency

  • Simplified credential management

Choose Third-Party Model Services if you:

  • Already have an existing relationship with a model provider

  • Want to leverage the latest models from providers like OpenAI

  • Prefer flexibility to switch providers
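As a sketch of the third-party path: register the provider credential with your cluster, then reference it from a model-based Function. The `credential` parameter name follows the open-source Milvus text-embedding Function; the registration flow on Zilliz Cloud, and the exact provider and model strings, are assumptions to verify against your cluster's settings.

```python
from pymilvus import Function, FunctionType

# Third-party inference: Voyage AI generates the embeddings, Zilliz Cloud
# orchestrates the calls. `my_voyage_key` names a credential assumed to be
# registered with the service beforehand (e.g., your Voyage AI API key).
voyage_embedding = Function(
    name="voyage_dense",
    function_type=FunctionType.TEXTEMBEDDING,
    input_field_names=["text"],
    output_field_names=["dense"],
    params={
        "provider": "voyageai",        # example third-party provider
        "model_name": "voyage-3",      # example embedding model
        "credential": "my_voyage_key", # assumed credential reference
    },
)
```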

Supported model providers

Zilliz Cloud integrates with leading model providers that offer different capabilities. The table below shows which providers support text embedding and reranking:

📘Notes

Provider availability and supported capabilities may vary by region and release. Refer to provider-specific documentation for the most up-to-date information.

| Model Provider | Text Embedding | Reranking |
| --- | --- | --- |
| OpenAI | Yes | No |
| Voyage AI | Yes | Yes |
| Cohere | Yes | Yes |