Schema & Data Fields

A schema defines the data structure of a collection and determines the names, order, data types, and related attributes of the collection fields. This chapter mainly discusses the schema and related concepts.

Overview [READ MORE]

A schema defines the data structure of a collection. Before creating a collection, you need to work out a design of its schema. This page helps you understand the collection schema and design an example schema on your own.

Primary Field [READ MORE]

Every collection in Zilliz Cloud must have a primary field to uniquely identify each entity. This field ensures that every entity can be inserted, updated, queried, or deleted without ambiguity.

Dense Vector [READ MORE]

Dense vectors are numerical data representations widely used in machine learning and data analysis. They consist of arrays of real numbers, where most or all elements are non-zero. Compared to sparse vectors, dense vectors contain more information at the same dimensional level, as each dimension holds a meaningful value. This representation can effectively capture complex patterns and relationships, making data easier to analyze and process in high-dimensional spaces. Dense vectors typically have a fixed number of dimensions, ranging from a few dozen to several hundred or even thousands, depending on the specific application and requirements.

Binary Vector [READ MORE]

Binary vectors are a special form of data representation that converts traditional high-dimensional floating-point vectors into vectors containing only 0s and 1s. This transformation compresses the vector and reduces storage and computational costs while retaining semantic information. When high precision is not essential for non-critical features, binary vectors can preserve most of the integrity and utility of the original floating-point vectors.
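
To make the conversion concrete, the following is a minimal sketch in plain Python. It assumes a simple sign-based binarization (a value above a threshold becomes 1, anything else becomes 0) and packs eight bits into each byte. The helper names `binarize` and `hamming_distance` are hypothetical and are not part of any Zilliz Cloud SDK, and the binarization scheme your embedding model uses may differ.

```python
from typing import List


def binarize(vector: List[float], threshold: float = 0.0) -> bytes:
    """Map each float to a bit (1 if value > threshold, else 0) and pack 8 bits per byte.

    Assumption of this sketch: the dimension is a multiple of 8 so the bits
    fill whole bytes.
    """
    if len(vector) % 8 != 0:
        raise ValueError("dimension must be a multiple of 8")
    packed = bytearray()
    for start in range(0, len(vector), 8):
        byte = 0
        for value in vector[start:start + 8]:
            # Shift previous bits left and append the new bit on the right.
            byte = (byte << 1) | (1 if value > threshold else 0)
        packed.append(byte)
    return bytes(packed)


def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two packed binary vectors differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


dense = [0.12, -0.5, 0.33, 0.0, -0.9, 0.71, 0.05, -0.2]
binary = binarize(dense)  # 8 floats collapse into a single byte
```

In this sketch, eight floating-point values shrink to one byte, which illustrates where the storage savings come from. Similarity between packed vectors can then be measured with a bitwise metric such as Hamming distance.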

Sparse Vector [READ MORE]

Sparse vectors are an important method of capturing surface-level term matching in information retrieval and natural language processing. While dense vectors excel in semantic understanding, sparse vectors often provide more predictable matching results, especially when searching for special terms or textual identifiers.
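
As an illustration of term-level matching, the sketch below models a sparse vector as a Python dictionary that maps a dimension index (for example, a term ID) to a non-zero weight, and scores two vectors with an inner product. The `SparseVector` alias, the `sparse_dot` helper, and the sample indices are assumptions made for this example only.

```python
from typing import Dict

# Only non-zero dimensions are stored: {dimension index: weight}.
SparseVector = Dict[int, float]


def sparse_dot(a: SparseVector, b: SparseVector) -> float:
    """Inner product of two sparse vectors; only shared indices contribute."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(weight * b[index] for index, weight in a.items() if index in b)


query: SparseVector = {3: 1.0, 17: 0.5}
document: SparseVector = {3: 0.8, 42: 0.3}

# Only dimension 3 appears in both vectors, so it alone determines the score.
score = sparse_dot(query, document)
```

Because the score depends only on dimensions that both vectors share, a match on a specific term or identifier is easy to trace, which is the predictability described above.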

String [READ MORE]

In Zilliz Cloud clusters, `VARCHAR` is the data type used for storing string data.

Boolean & Number [READ MORE]

A boolean or number field is a scalar field that stores boolean or numeric values. A boolean value is one of two possible values (true or false), while a numeric value is either a whole number (integer) or a decimal number (floating-point number). These fields are typically used to represent flags, quantities, measurements, or any data that needs to be logically or mathematically processed.

JSON [READ MORE]

This chapter introduces the JSON field type, and provides guides on how to index a JSON field.

Array [READ MORE]

An ARRAY field stores an ordered set of elements of the same data type.

Structs [READ MORE]

An Array of Structs field, or a StructArray field, in an entity stores an ordered set of Struct elements. Each Struct in the Array shares the same pre-defined schema, comprising multiple vector and scalar fields.

Geometry [READ MORE]

When building applications like Geographic Information Systems (GIS), mapping tools, or location-based services, you often need to store and query geometric data. The `GEOMETRY` data type in Milvus solves this challenge by providing a native way to store and query flexible geometric data.

TIMESTAMPTZ [READ MORE]

Applications that track time across regions, such as e-commerce systems, collaboration tools, or distributed logging, need precise handling of timestamps with time zones. The `TIMESTAMPTZ` data type in Zilliz Cloud provides this capability by storing timestamps with their associated time zone.

Dynamic Field [READ MORE]

Zilliz Cloud allows you to insert entities with flexible, evolving structures through a special feature called the dynamic field. This field is implemented as a hidden JSON field named `$meta`, which automatically stores any fields in your data that are not explicitly defined in the collection schema.
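
The following sketch is a conceptual model of this behavior in plain Python, not the actual storage implementation. It assumes a hypothetical `split_entity` helper and a schema with only `id` and `vector` defined; any other key in the inserted entity is collected under `$meta`.

```python
from typing import Any, Dict, Set


def split_entity(entity: Dict[str, Any], schema_fields: Set[str]) -> Dict[str, Any]:
    """Keep schema-defined fields as-is and gather all other keys under "$meta"."""
    stored = {key: value for key, value in entity.items() if key in schema_fields}
    extras = {key: value for key, value in entity.items() if key not in schema_fields}
    if extras:
        stored["$meta"] = extras
    return stored


schema_fields = {"id", "vector"}  # fields explicitly defined in the schema
entity = {"id": 1, "vector": [0.1, 0.2], "color": "red", "tags": ["a", "b"]}

stored = split_entity(entity, schema_fields)
# "color" and "tags" are not in the schema, so they end up inside "$meta".
```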

Nullable Fields [READ MORE]

Zilliz Cloud supports nullable fields, which allow a field value to be missing or explicitly set to NULL. Nullability is defined at the schema level and applies consistently across data ingestion, indexing, search, and query operations.

Default Values [READ MORE]

Zilliz Cloud allows you to set default values for scalar fields (excluding the primary field). When a field has a default value configured, Zilliz Cloud automatically applies this value if no data is provided during insertion.

Analyzer [READ MORE]

In text processing, an analyzer is a crucial component that converts raw text into a structured, searchable format. Each analyzer typically consists of two core elements: a tokenizer and one or more filters. Together, they transform input text into tokens, refine these tokens, and prepare them for efficient indexing and retrieval. This chapter provides thorough information about using analyzers in Zilliz Cloud.
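
The tokenizer-then-filter pipeline can be sketched in plain Python as follows. This is an illustrative toy, not one of the built-in analyzers: the `tokenize`, `lowercase_filter`, `stop_filter`, and `analyze` functions and the stop-word list are all assumptions made for this example.

```python
import re
from typing import List, Set

STOP_WORDS: Set[str] = {"the", "and", "a", "of"}  # assumed stop-word list


def tokenize(text: str) -> List[str]:
    """Tokenizer: split raw text into runs of ASCII letters and digits."""
    return re.findall(r"[A-Za-z0-9]+", text)


def lowercase_filter(tokens: List[str]) -> List[str]:
    """Filter: normalize every token to lowercase."""
    return [token.lower() for token in tokens]


def stop_filter(tokens: List[str], stop_words: Set[str]) -> List[str]:
    """Filter: drop tokens that carry little search value."""
    return [token for token in tokens if token not in stop_words]


def analyze(text: str) -> List[str]:
    """Run the tokenizer first, then apply each filter in order."""
    tokens = tokenize(text)
    tokens = lowercase_filter(tokens)
    return stop_filter(tokens, STOP_WORDS)


tokens = analyze("The Quick brown fox and the lazy dog")
# tokens == ["quick", "brown", "fox", "lazy", "dog"]
```

The order of the filters matters in this sketch: lowercasing runs before stop-word removal so that "The" and "the" are treated the same way.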

Alter Field [READ MORE]

You can alter the properties of a collection field to change column constraints or enforce stricter data integrity rules.

Alter Collection Schema [READ MORE]

As a collection moves from development to production, the fields around each entity often change. You might add scalar fields such as `sourceuri` or `reviewstatus` for filtering and application logic, or add a new vector field for embeddings generated by your application. Alter Collection Schema lets you make supported field changes in place instead of recreating the collection.

Best Practices [READ MORE]

This chapter covers best practices for schema design related to your dataset.