Number Field

Number fields store non-vector numerical data in Zilliz Cloud clusters. They typically hold attributes that accompany vector data, such as age or price. You can use these attributes to describe each vector more fully and to filter or query entities by condition.

For example, an e-commerce recommender can filter candidates on a price field, and a user profile analysis can narrow results to an age range. Combined with vector data, number fields let a similarity search also satisfy exact constraints, so results match user needs more precisely.

Supported number field types

Zilliz Cloud supports various number field types to meet different data storage and query needs:

| Field Type | Description |
| --- | --- |
| BOOL | Boolean type for storing true or false, suitable for describing binary states. |
| INT8 | 8-bit integer, suitable for storing small-range integer data. |
| INT16 | 16-bit integer, for medium-range integer data. |
| INT32 | 32-bit integer, ideal for general integer data storage like product quantities or user IDs. |
| INT64 | 64-bit integer, suitable for storing large-range data like timestamps or identifiers. |
| FLOAT | 32-bit floating-point number, for data requiring general precision, such as ratings or temperature. |
| DOUBLE | 64-bit double-precision floating-point number, for high-precision data like financial information or scientific calculations. |
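To make the bit widths above concrete, the following minimal sketch computes the value range of each integer type, assuming signed two's-complement integers, and shows what happens to a value when it is rounded to a 32-bit float. The helpers `int_range`, `smallest_int_type`, and `as_float32` are hypothetical names introduced for illustration; they are not part of pymilvus.

```python
import struct

# Bit widths of the integer types listed in the table, narrowest first.
INT_BITS = {"INT8": 8, "INT16": 16, "INT32": 32, "INT64": 64}


def int_range(bits):
    """Return (min, max) for a signed two's-complement integer of this width."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def smallest_int_type(lo, hi):
    """Hypothetical helper: pick the narrowest type whose range covers [lo, hi]."""
    for name, bits in INT_BITS.items():  # dicts preserve insertion order
        type_min, type_max = int_range(bits)
        if type_min <= lo and hi <= type_max:
            return name
    raise ValueError("range does not fit in a 64-bit signed integer")


def as_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float and back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


print(int_range(8))                  # (-128, 127)
print(smallest_int_type(0, 120))     # INT8, e.g. an age in years
print(smallest_int_type(0, 50_000))  # INT32, too large for INT16
print(as_float32(99.99))             # 99.98999786376953, not exactly 99.99
print(as_float32(149.5))             # 149.5, exactly representable
```

The last two lines illustrate why a FLOAT field may return a value that differs slightly from what was inserted. If exact decimal digits matter, DOUBLE or a scaled integer (for example, a price in cents) may be a better fit.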

Add number field

To use number fields in Zilliz Cloud clusters, define the relevant fields in the collection schema, setting the datatype to a supported type such as BOOL or INT8. For a complete list of supported number field types, refer to Supported number field types.

The following example shows how to define a schema that includes number fields age and price:

from pymilvus import MilvusClient, DataType

client = MilvusClient(uri="YOUR_CLUSTER_ENDPOINT")

schema = client.create_schema(
    auto_id=False,
    enable_dynamic_field=True,  # the keyword is singular: enable_dynamic_field
)

schema.add_field(field_name="age", datatype=DataType.INT64)
schema.add_field(field_name="price", datatype=DataType.FLOAT)
schema.add_field(field_name="pk", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=3)

Notes

The primary field and vector field are mandatory when you create a collection. The primary field uniquely identifies each entity, while the vector field is crucial for similarity search. For more details, refer to Primary Field & AutoId, Dense Vector, Binary Vector, or Sparse Vector.

Set index params

Setting index parameters for number fields is optional but can significantly improve retrieval efficiency.

In the following example, we set the index type of the age field to AUTOINDEX, which lets Zilliz Cloud choose an appropriate index based on the field's data type. For more information, refer to AUTOINDEX Explained.

index_params = client.prepare_index_params()

index_params.add_index(
    field_name="age",
    index_type="AUTOINDEX",
    index_name="inverted_index"
)
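
As a rough intuition for why an index on a number field can speed up range filters, the sketch below compares a full scan with a lookup over values kept in sorted order. This is a conceptual illustration only, with made-up sample rows; it does not describe how AUTOINDEX is implemented in Zilliz Cloud.

```python
from bisect import bisect_left, bisect_right

# Sample rows; field names follow the schema above, values are made up.
rows = [
    {"pk": 1, "age": 25},
    {"pk": 2, "age": 30},
    {"pk": 3, "age": 35},
    {"pk": 4, "age": 52},
]


def scan(rows, lo, hi):
    """No index: test every row against the range."""
    return sorted(r["pk"] for r in rows if lo <= r["age"] <= hi)


# Toy "index": (age, pk) pairs kept in sorted order, built once up front.
sorted_pairs = sorted((r["age"], r["pk"]) for r in rows)
sorted_ages = [age for age, _ in sorted_pairs]


def indexed(lo, hi):
    """Sorted structure: two binary searches bound the matching slice."""
    start = bisect_left(sorted_ages, lo)
    stop = bisect_right(sorted_ages, hi)
    return sorted(pk for _, pk in sorted_pairs[start:stop])


print(scan(rows, 30, 40))  # [2, 3]
print(indexed(30, 40))     # [2, 3]
```

Both functions return the same primary keys, but the scan touches every row while the sorted lookup only needs two binary searches plus the matching slice.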

Moreover, before creating the collection, you must create an index for the vector field. In this example, we use AUTOINDEX to simplify vector index settings.

# Add vector index
index_params.add_index(
    field_name="embedding",
    index_type="AUTOINDEX",  # Use automatic indexing to simplify complex index settings
    metric_type="COSINE"     # Specify similarity metric type, options include L2, COSINE, or IP
)

Create collection

Once the schema and indexes are defined, you can create a collection that includes number fields.

# Create Collection
client.create_collection(
    collection_name="my_scalar_collection",  # same name as in the insert, query, and search examples
    schema=schema,
    index_params=index_params
)

Insert data

After creating the collection, you can insert data that includes number fields.

data = [
    {"age": 25, "price": 99.99, "pk": 1, "embedding": [0.1, 0.2, 0.3]},
    {"age": 30, "price": 149.50, "pk": 2, "embedding": [0.4, 0.5, 0.6]},
    {"age": 35, "price": 199.99, "pk": 3, "embedding": [0.7, 0.8, 0.9]},
]

client.insert(
    collection_name="my_scalar_collection",
    data=data
)

In this example, we insert data that includes age, price, pk (the primary field), and a vector representation (embedding). Each value must match the data type declared for its field in the schema, so it's recommended to check data types before inserting to avoid errors.
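
One way to check data types in advance is a small client-side validation pass before calling insert. The sketch below is a minimal example under stated assumptions: `EXPECTED_TYPES`, `EMBEDDING_DIM`, and `validate_row` are hypothetical names that mirror the schema defined earlier by hand; nothing here is read from the cluster or provided by pymilvus.

```python
# Local description of the schema above: field name -> accepted Python types.
# This mapping is an assumption for illustration, maintained by hand.
EXPECTED_TYPES = {
    "pk": (int,),
    "age": (int,),
    "price": (int, float),
    "embedding": (list,),
}
EMBEDDING_DIM = 3  # matches dim=3 in the schema


def validate_row(row):
    """Return a list of problems in one row; an empty list means it looks fine."""
    problems = []
    for name, types in EXPECTED_TYPES.items():
        if name not in row:
            problems.append(f"missing field {name!r}")
            continue
        value = row[name]
        # bool is a subclass of int in Python, so reject it explicitly here.
        if isinstance(value, bool) or not isinstance(value, types):
            problems.append(f"{name!r} has type {type(value).__name__}")
    emb = row.get("embedding")
    if isinstance(emb, list) and len(emb) != EMBEDDING_DIM:
        problems.append(f"'embedding' has {len(emb)} dimensions, expected {EMBEDDING_DIM}")
    return problems


good = {"age": 25, "price": 99.99, "pk": 1, "embedding": [0.1, 0.2, 0.3]}
bad = {"age": "25", "price": 99.99, "pk": 1, "embedding": [0.1, 0.2]}

print(validate_row(good))  # []
print(validate_row(bad))   # reports the string age and the 2-dimensional embedding
```

Rows that return a non-empty list can be fixed or skipped before they reach the cluster.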

If you set enable_dynamic_field=True when defining the schema, Zilliz Cloud allows you to insert number fields that were not defined in the schema. However, keep in mind that this may increase the complexity of queries and management, potentially impacting performance. For more information, refer to Dynamic Field.

Search and query

After adding number fields, you can use them for filtering in search and query operations to achieve more precise search results.

Filter queries

A filter expression can compare a number field against constants. For example, you can query all entities where age is between 30 and 40, inclusive:

filter = "30 <= age <= 40"

res = client.query(
    collection_name="my_scalar_collection",
    filter=filter,
    output_fields=["age", "price"]
)

print(res)

# Output
# data: ["{'age': 30, 'price': np.float32(149.5), 'pk': 2}", "{'age': 35, 'price': np.float32(199.99), 'pk': 3}"]

This query returns all matching entities and outputs their age and price fields, along with the primary key pk. For more information on filter queries, refer to Filtering.
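
To reason about which entities a range filter should return, it can help to build the expression from parameters and evaluate the same condition locally on known data. The following is a minimal sketch; `range_filter` and `matches_range` are hypothetical helpers introduced here, and the local check only stands in for the server-side evaluation.

```python
def range_filter(field, lo, hi):
    """Hypothetical helper: build an inclusive range expression string."""
    return f"{lo} <= {field} <= {hi}"


def matches_range(entity, field, lo, hi):
    """Local stand-in for the server-side check, for predicting results."""
    return lo <= entity[field] <= hi


# The three entities inserted earlier (embeddings omitted for brevity).
entities = [
    {"pk": 1, "age": 25, "price": 99.99},
    {"pk": 2, "age": 30, "price": 149.50},
    {"pk": 3, "age": 35, "price": 199.99},
]

expr = range_filter("age", 30, 40)
expected_pks = [e["pk"] for e in entities if matches_range(e, "age", 30, 40)]

print(expr)          # 30 <= age <= 40
print(expected_pks)  # [2, 3]
```

The predicted primary keys, 2 and 3, agree with the sample query output, which includes the entity with age 30 because both bounds are inclusive.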

Vector search with number filtering

In addition to basic number field filtering, you can combine vector similarity searches with number field filters. For example, the following code shows how to add a number field filter to a vector search:

filter = "25 <= age <= 35"

res = client.search(
    collection_name="my_scalar_collection",
    data=[[0.3, -0.6, 0.1]],
    limit=5,
    search_params={"params": {"nprobe": 10}},
    output_fields=["age", "price"],
    filter=filter
)

print(res)

# Output
# data: ["[{'id': 1, 'distance': -0.06000000238418579, 'entity': {'age': 25, 'price': 99.98999786376953}}, {'id': 2, 'distance': -0.12000000476837158, 'entity': {'age': 30, 'price': 149.5}}, {'id': 3, 'distance': -0.18000000715255737, 'entity': {'age': 35, 'price': 199.99000549316406}}]"]

In this example, we first define a query vector and add a filter condition 25 <= age <= 35 during the search. This ensures that the search results are not only similar to the query vector but also meet the specified age range. For more information, refer to Filtering.
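
When checking search scores by hand, it is useful to know how the common metrics relate. The sketch below computes the inner product and the cosine similarity between the query vector and each inserted embedding in plain Python. It is an illustration for sanity checks, not a description of how the cluster computes distances. Note that the distance values in the sample output above (-0.06, -0.12, -0.18) equal the raw inner products, whereas a cosine score additionally divides by the vector norms.

```python
import math

query = [0.3, -0.6, 0.1]
# Embeddings from the insert example, keyed by primary key.
embeddings = {1: [0.1, 0.2, 0.3], 2: [0.4, 0.5, 0.6], 3: [0.7, 0.8, 0.9]}


def inner_product(a, b):
    """Sum of element-wise products (the IP metric)."""
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    """Inner product divided by the product of the two vector norms."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


for pk, vec in embeddings.items():
    ip = round(inner_product(query, vec), 4)
    cos = round(cosine(query, vec), 4)
    print(pk, ip, cos)
```

Because the two metrics rank these three entities in opposite orders, it is worth confirming which metric_type a collection actually uses before interpreting distance values.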