Version: User Guides (Cloud)

Connect to Your Data

A connector is a built-in tool that makes it easy to connect various data sources to a vector database. This guide explains the concept of a connector and provides instructions on how to create and manage connectors in Zilliz Cloud Pipelines.

Understanding Connectors

A connector is a tool for ingesting data into Zilliz Cloud from various data sources, including object storage, Kafka (coming soon), and more. Taking the object storage connector as an example: a connector can monitor a directory in an object storage bucket and sync files such as PDF and HTML documents to Zilliz Cloud Pipelines, where they are converted into vector representations and stored in a vector database collection for search. With ingestion and deletion pipelines, the files and their vector representations in Zilliz Cloud are kept in sync: any addition or removal of files in the object storage is mapped to the vector database collection.
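The sync behavior described above can be sketched as a simple diff between the bucket listing and the files already indexed. This is a hypothetical illustration of the idea, not Zilliz Cloud's actual implementation; the function and field names are invented for clarity.

```python
# Hypothetical sketch of the sync logic a connector applies on each scan:
# compare the current bucket listing against the files already indexed, then
# route additions to the ingestion pipeline and removals to the deletion
# pipeline. Names here are illustrative, not a real Zilliz Cloud API.

def plan_sync(bucket_files: set[str], indexed_files: set[str]) -> dict[str, list[str]]:
    """Return which files to ingest and which vector entries to delete."""
    return {
        "ingest": sorted(bucket_files - indexed_files),  # new files in the bucket
        "delete": sorted(indexed_files - bucket_files),  # files removed from the bucket
    }

# Example: two files were added and one was removed since the last scan.
plan = plan_sync(
    bucket_files={"docs/a.pdf", "docs/b.html", "docs/c.pdf"},
    indexed_files={"docs/a.pdf", "docs/old.pdf"},
)
print(plan)  # {'ingest': ['docs/b.html', 'docs/c.pdf'], 'delete': ['docs/old.pdf']}
```

Because both sides of the diff come from full listings, this scheme catches deletions as well as additions, which is why the collection stays in step with the bucket.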


Why use a connector?

  1. Real-time Data Ingestion

Effortlessly ingest and index data in real time, guaranteeing that the freshest content is instantly accessible to all search queries.

  2. Scalable and Adaptive

Easily scale up your data ingestion pipeline with zero DevOps hassle. The adaptive connectors seamlessly handle fluctuating traffic loads, ensuring smooth scalability.

  3. Search Index Kept in Sync With Heterogeneous Sources

Automatically sync the addition and deletion of documents to the search index. Moreover, fuse all common types of data sources (coming soon).

  4. Observability

Gain insight into your dataflow with detailed logging, ensuring transparency and detecting any anomalies that may arise.

Create Connector

Zilliz Cloud Pipelines provides flexible options when you create a connector. Once a connector is created, it will periodically scan your data sources and ingest data into your vector database at regular intervals.


Before creating a connector:

  • Ensure you have created a collection.

  • Ensure the created collection has an ingestion pipeline and one or more deletion pipelines.


  1. Navigate to your project. Click on Pipelines from the navigation panel. Then switch to the Connectors tab. Click + Connectors.


  2. Link to your data source.

    1. Set up the basic information of the connector.

      Connector Name: The name of the connector to create.
      Description (optional): The description of the connector.
    2. Configure the data source information.

      Object Storage Service: Select the object storage service of your data source. Available options include:
      - AWS S3
      - Google Cloud Storage
      Bucket URL: Provide the bucket URL used for accessing your source data. Make sure you enter the URL of the file directory instead of a specific file.
      To learn more about how to obtain the URL, refer to:
      - Accessing and listing an Amazon S3 bucket
      - Discover object storage with the Google Cloud console
      Access Keys for authorization (optional): Provide the following information for authorization if necessary:
      - For AWS S3, provide the access key and secret key.
      - For Google Cloud Storage, provide the access key ID and secret access key.

      Click Link and Continue to proceed to the next step.


      Zilliz Cloud will verify the connection to your data source before moving to the next step.


  3. Add target Pipelines.

    First, choose a target cluster, then a collection with one ingestion pipeline and one or more deletion pipelines. The target ingestion pipeline should only have an INDEX_DOC function. If multiple deletion pipelines are available, select the appropriate one manually.


    This step can be skipped and completed later before initiating a scan.


  4. Choose whether to enable auto scan.

    • When it is disabled, you will need to manually trigger a scan if there are any updates to the source data.

    • When it is enabled, Zilliz Cloud will periodically scan the data source and sync file additions/deletions to the vector database collection through the designated ingestion/deletion pipelines. You will need to set up the auto scan schedule.

      Frequency: Set how often the system performs scans.
      - Daily: Choose any number from 1 to 7.
      - Hourly: Options are 1, 6, 12, or 18 hours.
      Next Run at: Specify the time for the next scan. The time zone is consistent with the system time zone in organization settings.


  5. Click Create.
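Two of the settings above lend themselves to a quick client-side sanity check before you click Create: the bucket URL (step 2) and the hourly scan schedule (step 4). The sketch below is a hedged illustration only; the `s3://`/`gs://` URL shapes and the schedule semantics are assumptions about the form, not a documented Zilliz Cloud API.

```python
from datetime import datetime, timedelta

# Hedged sketch mirroring the Create Connector form. The URL schemes and
# schedule semantics below are assumptions for illustration, not a real API.

def check_bucket_url(url: str) -> bool:
    """Accept a directory-style object storage URL (not a single file)."""
    has_scheme = url.startswith(("s3://", "gs://"))  # AWS S3 or Google Cloud Storage
    return has_scheme and url.endswith("/")          # a directory, not a specific file

def next_runs(start: datetime, every_hours: int, count: int) -> list[datetime]:
    """Project the next scan times for an hourly schedule (1, 6, 12, or 18 h)."""
    assert every_hours in (1, 6, 12, 18), "hourly options per the schedule form"
    return [start + timedelta(hours=every_hours * i) for i in range(1, count + 1)]

print(check_bucket_url("s3://my-bucket/docs/"))       # True
print(check_bucket_url("s3://my-bucket/docs/a.pdf"))  # False: points at a file
runs = next_runs(datetime(2024, 1, 1, 0, 0), every_hours=6, count=3)
print([r.isoformat() for r in runs])
# ['2024-01-01T06:00:00', '2024-01-01T12:00:00', '2024-01-01T18:00:00']
```

Catching a file-shaped bucket URL before submission saves a failed connection-verification round trip in step 2.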

Manage Connector

Managing connectors efficiently is integral to maintaining a smooth data integration process. This guide provides detailed instructions on how to manage connectors.

Enable or disable a connector

  1. Locate the connector you want to manage.

  2. Click ... under Actions.

  3. Choose Enable or Disable.


To activate a connector, ensure the target pipelines are configured.


Trigger a manual scan

Perform a manual scan if the auto scan feature is off.

Click ... under Actions next to the target connector, then click Scan.


Ensure the connector is enabled before initiating a manual scan.
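With auto scan off, a manual scan is only worthwhile when the source data has actually changed. One way to decide is to compare object modification times against the last scan; the helper below is purely hypothetical (its name and inputs are invented for illustration and are not a Zilliz Cloud API).

```python
from datetime import datetime

# Hypothetical helper for deciding whether to trigger a manual scan:
# scan only if some object changed (or appeared) after the last scan.
# Name and inputs are illustrative, not a real Zilliz Cloud API.

def needs_scan(last_scan: datetime, object_mtimes: list[datetime]) -> bool:
    """True if any source file changed since the last scan."""
    return any(mtime > last_scan for mtime in object_mtimes)

last = datetime(2024, 1, 1, 12, 0)
print(needs_scan(last, [datetime(2024, 1, 1, 9, 0)]))   # False: nothing new
print(needs_scan(last, [datetime(2024, 1, 2, 8, 30)]))  # True: a file changed later
```

Note that a modification-time check alone misses deletions, since removed files no longer have a timestamp to compare; comparing full file listings between scans would also catch those.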

Configure a connector

You can modify the following settings of a connector:

  • Storage bucket access credentials:

    • (For AWS S3) access key and secret key

    • (For Google Cloud Storage) access key ID and secret access key

  • Auto scan schedule. For more information, refer to step 4 in the procedure for creating connectors.


Drop a connector

You can drop a connector if it is no longer necessary.


The connector must be disabled before being dropped.


View connector logs

Monitor connector activities and troubleshoot issues:

  1. Access the connector's activity page to view logs.


  2. An abnormal status indicates an error. Click the "?" icon next to the status for detailed error messages.