Version: User Guides (BYOC)

String Field

In Zilliz Cloud clusters, VARCHAR is the data type for storing variable-length strings. It can store strings with both single- and multi-byte characters, with a maximum length of 65,535 characters. When defining a VARCHAR field, you must also specify the maximum length with the max_length parameter. VARCHAR offers an efficient and flexible way to store and manage text data, which makes it a good fit for applications that handle strings of varying lengths.

Add VARCHAR field

To use string data in Zilliz Cloud clusters, define a VARCHAR field when creating a collection. This process includes:

  1. Setting datatype to the supported string data type, i.e., VARCHAR.

  2. Specifying the maximum length of the string type using the max_length parameter, which cannot exceed 65,535 characters.

from pymilvus import MilvusClient, DataType

client = MilvusClient(uri="YOUR_CLUSTER_ENDPOINT")

# define schema
schema = client.create_schema(
    auto_id=False,
    enable_dynamic_field=True,
)

schema.add_field(field_name="varchar_field1", datatype=DataType.VARCHAR, max_length=100)
schema.add_field(field_name="varchar_field2", datatype=DataType.VARCHAR, max_length=200)
schema.add_field(field_name="pk", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="embedding", datatype=DataType.FLOAT_VECTOR, dim=3)

In this example, we add two VARCHAR fields, varchar_field1 and varchar_field2, with maximum lengths of 100 and 200 characters, respectively. Set max_length based on your data: it should accommodate the longest value you expect without allocating far more space than you need. We also add a primary field pk and a vector field embedding.
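One way to pick max_length is to measure a sample of your data. The following is a minimal sketch, not part of the Zilliz Cloud SDK: suggest_max_length and its headroom parameter are hypothetical names, and measuring in UTF-8 bytes is a conservative assumption so that multi-byte characters are not undercounted.

```python
import math

MAX_VARCHAR_LENGTH = 65_535  # upper bound accepted for max_length


def suggest_max_length(samples, headroom=1.25):
    """Return a max_length that fits the longest sample plus some headroom.

    Hypothetical helper: lengths are measured in UTF-8 bytes (an assumption).
    """
    if not samples:
        raise ValueError("need at least one sample string")
    longest = max(len(s.encode("utf-8")) for s in samples)
    suggested = math.ceil(longest * headroom)
    # Clamp to the valid range for a VARCHAR field
    return max(1, min(suggested, MAX_VARCHAR_LENGTH))


samples = ["Product A", "High quality product", "Crème brûlée"]
print(suggest_max_length(samples))
```

The returned value can then be passed as max_length when calling schema.add_field.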

📘Notes

The primary field and vector field are mandatory when you create a collection. The primary field uniquely identifies each entity, while the vector field is crucial for similarity search. For more details, refer to Primary Field & AutoId, Dense Vector, Binary Vector, or Sparse Vector.

Set index params

Setting index parameters for VARCHAR fields is optional, but an index can significantly speed up filtering and retrieval on those fields.

In the following example, we create an AUTOINDEX for varchar_field1, meaning Zilliz Cloud will automatically create an appropriate index based on the data type. For more information, refer to AUTOINDEX Explained.

index_params = client.prepare_index_params()

index_params.add_index(
    field_name="varchar_field1",
    index_type="AUTOINDEX",
    index_name="varchar_index"
)

You must also define an index for the vector field before creating the collection. In this example, we use AUTOINDEX to simplify the vector index settings.

# Add vector index
index_params.add_index(
    field_name="embedding",
    index_type="AUTOINDEX",  # Use automatic indexing to simplify complex index settings
    metric_type="COSINE"  # Specify similarity metric type, options include L2, COSINE, or IP
)

Create collection

Once the schema and index are defined, you can create a collection that includes string fields.

# Create Collection
client.create_collection(
    collection_name="my_varchar_collection",
    schema=schema,
    index_params=index_params
)

Insert data

After creating the collection, you can insert data that includes string fields.

data = [
    {"varchar_field1": "Product A", "varchar_field2": "High quality product", "pk": 1, "embedding": [0.1, 0.2, 0.3]},
    {"varchar_field1": "Product B", "varchar_field2": "Affordable price", "pk": 2, "embedding": [0.4, 0.5, 0.6]},
    {"varchar_field1": "Product C", "varchar_field2": "Best seller", "pk": 3, "embedding": [0.7, 0.8, 0.9]},
]

client.insert(
    collection_name="my_varchar_collection",
    data=data
)

In this example, we insert data that includes VARCHAR fields (varchar_field1 and varchar_field2), a primary field (pk), and a vector field (embedding). The inserted data must match the fields defined in the schema, so check data types and string lengths before inserting to avoid insertion errors.
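A pre-insert check can be written in plain Python. The sketch below is an illustration under stated assumptions: find_invalid_rows and varchar_limits are hypothetical names, the limits mirror the max_length values used in the schema above, and string length is measured in UTF-8 bytes as a conservative choice.

```python
def find_invalid_rows(rows, varchar_limits):
    """Return (row_index, field, reason) for each VARCHAR value that would not fit.

    Hypothetical helper; varchar_limits maps field name -> max_length.
    """
    problems = []
    for i, row in enumerate(rows):
        for field, max_length in varchar_limits.items():
            value = row.get(field)
            if not isinstance(value, str):
                problems.append((i, field, "not a string"))
            elif len(value.encode("utf-8")) > max_length:
                problems.append((i, field, "exceeds max_length"))
    return problems


limits = {"varchar_field1": 100, "varchar_field2": 200}
rows = [
    {"varchar_field1": "Product A", "varchar_field2": "High quality product", "pk": 1},
    {"varchar_field1": 42, "varchar_field2": "x" * 201, "pk": 2},
]
print(find_invalid_rows(rows, limits))
```

An empty list means every VARCHAR value passed the check; otherwise each tuple points at the row and field to fix before calling client.insert.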

If you set enable_dynamic_field=True when defining the schema, Zilliz Cloud allows you to insert string fields that were not defined in advance. Keep in mind that this can make queries and data management more complex and may affect performance. For more information, refer to Dynamic Field.
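To see which keys in a row would rely on the dynamic field, you can compare the row against the declared field names. This is a minimal sketch for auditing data on the client side; split_row and SCHEMA_FIELDS are hypothetical names, and the field set is copied from the schema in this guide.

```python
# Field names declared in the example schema (assumed to match your collection)
SCHEMA_FIELDS = {"pk", "varchar_field1", "varchar_field2", "embedding"}


def split_row(row, schema_fields=SCHEMA_FIELDS):
    """Split a row into declared fields and undeclared (dynamic) fields."""
    declared = {k: v for k, v in row.items() if k in schema_fields}
    dynamic = {k: v for k, v in row.items() if k not in schema_fields}
    return declared, dynamic


row = {
    "pk": 4,
    "varchar_field1": "Product D",
    "varchar_field2": "New arrival",
    "embedding": [0.2, 0.4, 0.6],
    "brand": "Acme",  # not declared in the schema
}
declared, dynamic = split_row(row)
print(sorted(dynamic))
```

Keys reported as dynamic are accepted only when the dynamic field is enabled; if a key shows up here often, declaring it as a regular VARCHAR field may be the simpler option.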

Search and query

After adding string fields, you can use them for filtering in search and query operations, achieving more precise search results.

Filter queries

You can filter query results on VARCHAR fields. For example, the following query returns all entities where varchar_field1 equals "Product A":

filter = 'varchar_field1 == "Product A"'

res = client.query(
    collection_name="my_varchar_collection",
    filter=filter,
    output_fields=["varchar_field1", "varchar_field2"]
)

print(res)

# Output
# data: ["{'varchar_field1': 'Product A', 'varchar_field2': 'High quality product', 'pk': 1}"]

This query expression returns all matching entities and outputs their varchar_field1 and varchar_field2 fields. For more information on filter queries, refer to Filtering.
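Filter expressions are plain strings, so they can be built programmatically. The sketch below assembles an equality filter, an in filter, and a like prefix filter for a VARCHAR field. The helper names (quote, equals, any_of, starts_with) are hypothetical, and the backslash escaping of embedded quotes is an assumption; check the Filtering reference for the exact escaping rules.

```python
def quote(value):
    """Wrap a string in double quotes, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def equals(field, value):
    return f"{field} == {quote(value)}"


def any_of(field, values):
    return f"{field} in [{', '.join(quote(v) for v in values)}]"


def starts_with(field, prefix):
    # A trailing % turns the pattern into a prefix match;
    # a % inside the prefix itself would also act as a wildcard.
    return f"{field} like {quote(prefix + '%')}"


print(equals("varchar_field1", "Product A"))
print(any_of("varchar_field1", ["Product A", "Product B"]))
print(starts_with("varchar_field2", "High"))
```

Each returned string can be passed as the filter argument of client.query or client.search.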

Vector search with string filtering

In addition to basic scalar field filtering, you can combine vector similarity searches with scalar field filters. For example, the following code shows how to add a scalar field filter to a vector search:

filter = 'varchar_field1 == "Product A"'

res = client.search(
    collection_name="my_varchar_collection",
    data=[[0.3, -0.6, 0.1]],
    limit=5,
    search_params={"params": {"nprobe": 10}},
    output_fields=["varchar_field1", "varchar_field2"],
    filter=filter
)

print(res)

# Output
# data: ["[{'id': 1, 'distance': -0.06000000238418579, 'entity': {'varchar_field1': 'Product A', 'varchar_field2': 'High quality product'}}]"]

In this example, we first define a query vector and add a filter condition varchar_field1 == "Product A" during the search. This ensures that the search results are not only similar to the query vector but also match the specified string filter condition. For more information, refer to Filtering.
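Search results are nested: one list of hits per query vector. The following sketch flattens that structure into simple rows. It runs against a hand-written stand-in (sample_res) shaped like the printed output above rather than a live response, and the key names id, distance, and entity are taken from that output; flatten_hits is a hypothetical helper.

```python
# Stand-in for a search response, shaped like the printed output above
sample_res = [[
    {
        "id": 1,
        "distance": -0.06,
        "entity": {"varchar_field1": "Product A", "varchar_field2": "High quality product"},
    }
]]


def flatten_hits(res):
    """Turn nested search results into one flat list of dicts."""
    rows = []
    for query_index, hits in enumerate(res):  # one list of hits per query vector
        for hit in hits:
            row = {"query": query_index, "id": hit["id"], "distance": hit["distance"]}
            row.update(hit.get("entity", {}))  # merge the requested output fields
            rows.append(row)
    return rows


for row in flatten_hits(sample_res):
    print(row["id"], row["distance"], row["varchar_field1"])
```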