Version: User Guides (BYOC)

Index Scalar Fields

Zilliz Cloud supports indexing on scalar fields (non-vector fields) to speed up filtering and search, especially on large datasets.

Overview

Indexing a scalar field is optional, but recommended for any scalar field that you frequently use in filter conditions.

Zilliz Cloud supports AUTOINDEX for the following field types:

| Field Type | AUTOINDEX Resolves to | Description |
| --- | --- | --- |
| VARCHAR | BITMAP (C < 100) / INVERTED (C ≥ 100) | String data type. For details, refer to String Field. |
| INT8, INT16, INT32, INT64 | BITMAP (C < 100) / STL_SORT (C ≥ 100) | Integer. For details, refer to Boolean & Number. |
| FLOAT, DOUBLE | BITMAP (C < 100) / INVERTED (C ≥ 100) | Floating point. For details, refer to Boolean & Number. |
| BOOL | BITMAP | Boolean. For details, refer to Boolean & Number. |
| ARRAY | BITMAP (C < 100) / INVERTED (C ≥ 100) | Homogeneous array of scalar values. For details, refer to Array Field. |
| GEOMETRY | RTREE | Geometric data that stores spatial information. For details, refer to Geometry Field. |
| TIMESTAMPTZ | STL_SORT | Time zone-aware ISO 8601 inputs, stored as UTC for consistent filtering and ordering across time zones. For details, refer to TIMESTAMPTZ Field. |

📘Notes

Cardinality (C in the table above) is the number of unique values in a field across a whole collection. For example, the cardinality of a float field is the number of distinct float values in that field.

For an array field, the cardinality is the number of distinct element values across all arrays in the segment. For example:

[1, 2, 3]
[2, 3, 4]
[1, 4, 5]

The distinct element values are 1, 2, 3, 4, and 5, so the cardinality is 5. All elements from all arrays are flattened and the unique values are counted; the result is neither the number of distinct arrays nor the array lengths.
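The cardinality rule for array fields and the thresholds from the table above can be sketched in plain Python. This is only an illustration: `array_cardinality` and `resolve_autoindex` are hypothetical helpers, not part of pymilvus, and the actual resolution is performed by Zilliz Cloud itself.

```python
from itertools import chain

# Hypothetical helper: flatten all arrays, then count the unique element values.
def array_cardinality(rows):
    return len(set(chain.from_iterable(rows)))

# Hypothetical lookup tables that mirror the AUTOINDEX table above.
FIXED_INDEX = {"BOOL": "BITMAP", "GEOMETRY": "RTREE", "TIMESTAMPTZ": "STL_SORT"}
HIGH_CARDINALITY_INDEX = {
    "VARCHAR": "INVERTED",
    "INT8": "STL_SORT", "INT16": "STL_SORT", "INT32": "STL_SORT", "INT64": "STL_SORT",
    "FLOAT": "INVERTED", "DOUBLE": "INVERTED",
    "ARRAY": "INVERTED",
}

def resolve_autoindex(field_type, cardinality):
    """Return the index type the table lists for a field type and cardinality C."""
    if field_type in FIXED_INDEX:
        return FIXED_INDEX[field_type]
    if cardinality < 100:
        return "BITMAP"
    return HIGH_CARDINALITY_INDEX[field_type]

rows = [[1, 2, 3], [2, 3, 4], [1, 4, 5]]
c = array_cardinality(rows)
print(c)                              # 5
print(resolve_autoindex("ARRAY", c))  # BITMAP
```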

Preparations

Before creating indexes, define a collection that includes both vector and scalar fields. Zilliz Cloud requires a vector field in every collection.

In this example, we define a schema for a product catalog, including a required vector field (vector) and a scalar field of the DOUBLE type (price):

from pymilvus import MilvusClient, DataType

client = MilvusClient(uri="YOUR_CLUSTER_ENDPOINT") # Replace with your cluster endpoint

# Define schema with dynamic field support
schema = client.create_schema(
    auto_id=False,
    enable_dynamic_field=True  # Enable dynamic field
)

# Define fields
schema.add_field(field_name="product_id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="vector", datatype=DataType.FLOAT_VECTOR, dim=5) # Vector field
schema.add_field(field_name="price", datatype=DataType.DOUBLE) # Scalar field

# Create the collection
client.create_collection(
    collection_name="product_catalog",
    schema=schema
)

Index a scalar field

You can create an index on a scalar field using AUTOINDEX. No additional index parameters are needed. The example below creates an index on the price field:

index_params = client.prepare_index_params() # Prepare an empty IndexParams object, without having to specify any index parameters

index_params.add_index(
    field_name="price",       # Name of the scalar field to be indexed
    index_type="AUTOINDEX",   # Type of index to be created
    index_name="price_index"  # Name of the index to be created
)

After defining the index parameters, you can apply them to the collection using create_index():

client.create_index(
    collection_name="product_catalog",
    index_params=index_params
)
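Once the index exists, filter expressions on price are the kind of query it is meant to speed up. The following is a minimal sketch, assuming the collection also has a vector index, has been loaded, and contains data. `price_range_filter` and `query_by_price` are hypothetical helper names; `client` is the `MilvusClient` instance created earlier.

```python
def price_range_filter(low, high):
    """Build a boolean filter expression on the indexed price field."""
    return f"price >= {low} and price <= {high}"

def query_by_price(client, low, high, limit=10):
    """Run a scalar-filtered query; client is expected to be a MilvusClient."""
    return client.query(
        collection_name="product_catalog",
        filter=price_range_filter(low, high),
        output_fields=["product_id", "price"],
        limit=limit,
    )

# Example usage against a live cluster:
# rows = query_by_price(client, 10, 50)
```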

Check index details

Once you have created an index, you can check its details.

# List the names of all indexes on the collection
res = client.list_indexes(
    collection_name="product_catalog"
)

print(res)

# Describe a specific index by name
res = client.describe_index(
    collection_name="product_catalog",
    index_name="price_index"
)

print(res)
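The two calls can be combined to inspect every index on a collection in one pass. A small sketch, where `describe_all_indexes` is a hypothetical helper and `client` is assumed to be a connected `MilvusClient`; it makes no assumption about which keys the returned details contain.

```python
def describe_all_indexes(client, collection_name):
    """Return a {index_name: details} mapping for every index on the collection."""
    return {
        name: client.describe_index(
            collection_name=collection_name,
            index_name=name,
        )
        for name in client.list_indexes(collection_name=collection_name)
    }

# Example usage against a live cluster:
# for name, details in describe_all_indexes(client, "product_catalog").items():
#     print(name, details)
```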

Drop an index

Use the drop_index() method to remove an existing index from a collection.

📘Notes

In clusters compatible with Milvus v2.6.x, you can drop a scalar index directly once it is no longer needed, without releasing the collection first.

# Drop index
client.drop_index(
    collection_name="product_catalog",
    index_name="price_index"
)
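In scripts that may run more than once, it can help to check that the index is still present before dropping it. A minimal sketch, where `drop_index_if_exists` is a hypothetical helper built on the `list_indexes()` and `drop_index()` calls shown above:

```python
def drop_index_if_exists(client, collection_name, index_name):
    """Drop the index only if it is listed; return True if a drop was issued."""
    if index_name not in client.list_indexes(collection_name=collection_name):
        return False
    client.drop_index(collection_name=collection_name, index_name=index_name)
    return True

# Example usage against a live cluster:
# drop_index_if_exists(client, "product_catalog", "price_index")
```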

Advanced features

There are also several advanced features around scalar indexes that you may be interested in.

NGRAM

The `NGRAM` index in Zilliz Cloud is built to accelerate `LIKE` queries on `VARCHAR` fields or specific JSON paths within `JSON` fields. Before building the index, Zilliz Cloud splits text into short, overlapping substrings of a fixed length n, known as n-grams. For example, with n = 3, the word "Milvus" is split into 3-grams "Mil", "ilv", "lvu", and "vus". These n-grams are then stored in an inverted index that maps each gram to the document IDs in which it appears. At query time, this index allows Zilliz Cloud to quickly narrow the search to a small set of candidates, resulting in much faster query execution.
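The idea can be illustrated with a toy in-memory version. This is a conceptual sketch only, not how Zilliz Cloud implements the index: all function names are hypothetical, and the full-scan fallback for patterns shorter than n is an assumption made for this example.

```python
from collections import defaultdict

def ngrams(text, n=3):
    """Split text into overlapping substrings of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def build_ngram_index(docs, n=3):
    """Map each n-gram to the set of document IDs in which it appears."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in ngrams(text, n):
            index[gram].add(doc_id)
    return index

def like_contains(docs, index, pattern, n=3):
    """Emulate LIKE '%pattern%': narrow to candidates, then verify each one."""
    grams = ngrams(pattern, n)
    if not grams:
        # Assumption for this sketch: patterns shorter than n scan every document.
        candidates = set(docs)
    else:
        # A match must contain every n-gram of the pattern.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    return sorted(d for d in candidates if pattern in docs[d])

docs = {1: "Milvus", 2: "Zilliz Cloud", 3: "Milvus Lite"}
index = build_ngram_index(docs)
print(ngrams("Milvus"))                    # ['Mil', 'ilv', 'lvu', 'vus']
print(like_contains(docs, index, "ilvu"))  # [1, 3]
```

The final substring check matters because sharing all n-grams with the pattern does not guarantee that they appear contiguously and in order.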