Schema & Data Fields
A schema defines the data structure of a collection and determines the names, order, data types, and related attributes of the collection fields. This chapter mainly discusses the schema and related concepts.
Schema Explained [READ MORE]
A schema defines the data structure of a collection. Before creating a collection, you need to work out a design of its schema. This page helps you understand the collection schema and design an example schema on your own.
Primary Field & AutoID [READ MORE]
The primary field uniquely identifies an entity. This page introduces how to add the primary field of two different data types and how to enable Zilliz Cloud to automatically allocate primary field values.
Dense Vector [READ MORE]
Dense vectors are numerical data representations widely used in machine learning and data analysis. They consist of arrays of real numbers, where most or all elements are non-zero. Compared to sparse vectors, dense vectors contain more information at the same dimensional level, as each dimension holds meaningful values. This representation can effectively capture complex patterns and relationships, making data easier to analyze and process in high-dimensional spaces. Dense vectors typically have a fixed number of dimensions, ranging from a few dozen to several hundred or even thousands, depending on the specific application and requirements.
Binary Vector [READ MORE]
Binary vectors are a special form of data representation that converts traditional high-dimensional floating-point vectors into vectors containing only 0s and 1s. This transformation compresses the vector, which reduces storage and computational costs while retaining semantic information. When high precision is not essential for non-critical features, binary vectors can preserve most of the integrity and utility of the original floating-point vectors.
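As a rough illustration of the idea, the following sketch binarizes a floating-point vector and packs the bits into bytes. The sign-threshold rule and the helper names (`binarize`, `pack_bits`, `hamming_distance`) are assumptions made for this example; they are not part of the Zilliz Cloud API, and real pipelines may use a different binarization method.

```python
def binarize(vector, threshold=0.0):
    """Map each float to 1 if it exceeds the threshold, else 0 (assumed rule)."""
    return [1 if x > threshold else 0 for x in vector]


def pack_bits(bits):
    """Pack 0/1 values into bytes, 8 bits per byte, most significant bit first."""
    if len(bits) % 8 != 0:
        raise ValueError("number of bits must be a multiple of 8")
    packed = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | bit
        packed.append(byte)
    return bytes(packed)


def hamming_distance(a, b):
    """Count the differing bits between two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("byte strings must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


floats = [0.3, -1.2, 0.8, 0.0, -0.1, 2.4, -0.7, 0.5]
bits = binarize(floats)   # eight floats become eight bits
packed = pack_bits(bits)  # eight bits fit in a single byte
```

In this sketch, eight floating-point values shrink to one byte, and two packed vectors can be compared with a bitwise XOR instead of floating-point arithmetic.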
Sparse Vector [READ MORE]
Sparse vectors are an important method of data representation in information retrieval and natural language processing. While dense vectors are popular for their excellent semantic understanding capabilities, sparse vectors often provide more accurate results when it comes to applications that require precise matching of keywords or phrases.
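To make the keyword-matching behavior concrete, here is a minimal sketch that represents sparse vectors as `{dimension index: weight}` dictionaries. The toy vocabulary, the raw term-count weighting, and the helper names (`to_sparse`, `sparse_dot`) are assumptions for illustration only; production systems usually rely on weighting schemes such as BM25 or on learned sparse models.

```python
from collections import Counter


def to_sparse(text, vocabulary):
    """Build a sparse vector of term counts, keeping only known terms."""
    counts = Counter(text.lower().split())
    return {vocabulary[t]: float(c) for t, c in counts.items() if t in vocabulary}


def sparse_dot(a, b):
    """Inner product over the dimensions that both vectors share."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(weight * b[index] for index, weight in a.items() if index in b)


# Hypothetical vocabulary mapping terms to dimension indices.
vocabulary = {"vector": 0, "database": 1, "search": 2, "keyword": 3, "index": 4}

doc = to_sparse("keyword search in a vector database", vocabulary)
query = to_sparse("keyword search", vocabulary)
score = sparse_dot(doc, query)
```

Only dimensions that are non-zero in both vectors contribute to the score, which is why exact term overlap drives the ranking.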
String Field [READ MORE]
In Zilliz Cloud clusters, `VARCHAR` is the data type used for storing string data. When you define a `VARCHAR` field, two parameters are mandatory
Number Field [READ MORE]
A number field is a scalar field that stores numeric values. These values can be whole numbers (integers) or decimal numbers (floating-point numbers). They are typically used to represent quantities, measurements, or any data that needs to be mathematically processed.
JSON Field [READ MORE]
A JSON field is a scalar field that stores additional information along with vector embeddings, in key-value pairs. Here's an example of how data is stored in JSON format
Array Field [READ MORE]
An ARRAY field stores an ordered set of elements of the same data type. Here's an example of how ARRAY fields store data
Dynamic Field [READ MORE]
All fields defined in the schema of a collection must be included in the entities to be inserted. If you want some fields to be optional, consider enabling the dynamic field. This topic describes how to enable and use the dynamic field.
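The rule above can be pictured with a small conceptual model. The function below is a hypothetical sketch, not the Zilliz Cloud implementation: it assumes that entities missing a schema field are rejected, and that fields outside the schema are accepted only when the dynamic field is enabled.

```python
def split_entity(entity, schema_fields, enable_dynamic_field=False):
    """Separate schema-defined fields from extra (dynamic) fields."""
    missing = [name for name in schema_fields if name not in entity]
    if missing:
        raise ValueError(f"missing schema fields: {missing}")

    defined = {name: entity[name] for name in schema_fields}
    extra = {k: v for k, v in entity.items() if k not in schema_fields}

    # Assumption: without the dynamic field, unknown keys are an error.
    if extra and not enable_dynamic_field:
        raise ValueError(f"unexpected fields: {sorted(extra)}")
    return defined, extra


schema_fields = ["id", "vector"]
entity = {"id": 1, "vector": [0.1, 0.2], "color": "red"}

defined, extra = split_entity(entity, schema_fields, enable_dynamic_field=True)
```

Here `color` is not declared in the schema, so it ends up in `extra`, which stands in for the values the dynamic field would hold.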
Nullable & Default [READ MORE]
Zilliz Cloud allows you to set the `nullable` attribute and default values for scalar fields, except the primary field. For fields marked as `nullable=True`, you can skip the field when inserting data, or set it directly to a null value, and the system will treat it as null without causing an error. When a field has a default value, the system will automatically apply this value if no data is specified for the field during insertion.
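The insertion behavior described above can be sketched as a simple resolution rule. This is a conceptual model with hypothetical field descriptors, not the Zilliz Cloud implementation; in particular, letting the default value win when a field has both a default and `nullable` set is an assumption of this sketch.

```python
def resolve_value(field, entity):
    """Return the value stored for `field` when `entity` is inserted."""
    value = entity.get(field["name"])  # a skipped field and an explicit None look the same
    if value is not None:
        return value
    if "default" in field:
        return field["default"]        # assumption: default takes precedence over null
    if field.get("nullable", False):
        return None                    # stored as null, no error
    raise ValueError(f"field '{field['name']}' requires a value")


# Hypothetical field descriptors for illustration.
fields = [
    {"name": "age", "nullable": True},
    {"name": "status", "default": "active"},
    {"name": "title"},
]

row = {f["name"]: resolve_value(f, {"title": "intro"}) for f in fields}
```

In this sketch the entity supplies only `title`; `age` resolves to null and `status` falls back to its default.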
Analyzer [READ MORE]
In text processing, an analyzer is a crucial component that converts raw text into a structured, searchable format. Each analyzer typically consists of two core elements: a tokenizer and a filter. Together, they transform input text into tokens, refine these tokens, and prepare them for efficient indexing and retrieval. This chapter provides thorough information about using analyzers in Zilliz Cloud.
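The tokenizer-then-filter flow can be sketched in plain Python. The tokenizer, the two filters, and the stop-word list below are simplified stand-ins chosen for this example; they are not the built-in analyzers of Zilliz Cloud.

```python
import re

STOP_WORDS = frozenset({"a", "an", "and", "of", "the"})  # illustrative list


def word_tokenizer(text):
    """Split raw text into word tokens."""
    return re.findall(r"\w+", text)


def lowercase_filter(tokens):
    """Normalize tokens to lowercase."""
    return [t.lower() for t in tokens]


def stop_word_filter(tokens):
    """Drop common words that carry little meaning for retrieval."""
    return [t for t in tokens if t not in STOP_WORDS]


def analyze(text, tokenizer, filters):
    """Run the tokenizer first, then apply each filter in order."""
    tokens = tokenizer(text)
    for token_filter in filters:
        tokens = token_filter(tokens)
    return tokens


tokens = analyze(
    "The Analyzer splits and refines Text",
    word_tokenizer,
    [lowercase_filter, stop_word_filter],
)
```

Filter order matters in this sketch: lowercasing runs before stop-word removal so that "The" matches the lowercase stop-word list.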
Alter Collection Field [READ MORE]
You can alter the properties of a collection field to change column constraints or enforce stricter data integrity rules.
Hands-On [READ MORE]
Information retrieval (IR) systems, also known as search engines, are essential for various AI applications such as retrieval-augmented generation (RAG), image search, and product recommendation. The first step in developing an IR system is designing the data model, which involves analyzing business requirements, determining how to organize information, and indexing data to make it semantically searchable.