Connect to Your Data

A connector is a built-in, free tool that makes it easy to connect various data sources to a vector database. This guide explains the concept of a connector and describes how to create and manage connectors in Zilliz Cloud Pipelines.

Understanding Connectors

A connector is a tool for ingesting data into Zilliz Cloud from various data sources, including object storage, Kafka (coming soon), and more. Take the object storage connector as an example: it monitors a directory in an object storage bucket and syncs files such as PDF and HTML documents to Zilliz Cloud Pipelines, where they are converted to vector representations and stored in a vector database collection for search. With the ingestion and deletion pipelines, the files and their vector representations in Zilliz Cloud are kept in sync: any addition or removal of files in the object storage is reflected in the vector database collection.

connector-overview
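Conceptually, each scan diffs the current bucket contents against what has already been synced, sending new files to the ingestion pipeline and removed files to the deletion pipeline. The sketch below illustrates this loop in Python; it is not a Zilliz SDK API, and the listing and pipeline calls are passed in as placeholder callables.

```python
from typing import Callable, Iterable, Set

def sync_once(
    list_bucket_files: Callable[[], Iterable[str]],  # lists object keys under the monitored directory
    ingest_file: Callable[[str], None],              # runs the doc ingestion pipeline for one file
    delete_file: Callable[[str], None],              # runs the deletion pipeline for one file
    synced_files: Set[str],                          # state left by the previous scan
) -> Set[str]:
    """One connector scan: ingest additions, delete removals, return the new state."""
    current = set(list_bucket_files())
    for path in sorted(current - synced_files):      # files added since the last scan
        ingest_file(path)
    for path in sorted(synced_files - current):      # files removed since the last scan
        delete_file(path)
    return current

# Toy run with in-memory stand-ins for the bucket and pipelines:
state = sync_once(lambda: ["docs/a.pdf", "docs/b.html"], print, print, set())
```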

Why use a connector?

  1. Real-time Data Ingestion

    Effortlessly ingest and index data in real time, ensuring that the freshest content is immediately available to search queries.

  2. Scalable and Adaptive

    Easily scale up your data ingestion pipeline with zero DevOps hassle. The adaptive connectors seamlessly handle fluctuating traffic loads, ensuring smooth scalability.

  3. Search Index Kept in Sync With Heterogeneous Sources

    Automatically sync document additions and deletions to the search index. Support for fusing all common types of data sources is coming soon.

  4. Observability

    Gain insight into your dataflow with detailed logging, ensuring transparency and helping you detect any anomalies that may arise.

Create Connector

Zilliz Cloud Pipelines provides flexible options when you create a connector. Once created, a connector scans your data source at regular intervals and ingests new data into your vector database.

Prerequisites

  • Ensure you have created a collection.

  • Ensure the created collection has a doc ingestion pipeline and deletion pipeline(s). A sketch for creating a doc ingestion pipeline through the API follows the note below.

📘Notes

Currently, Zilliz Cloud Connector only supports processing doc data.
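If the required pipelines do not exist yet, you can create them in the console or through the Zilliz Cloud Pipelines REST API. The following is a minimal sketch of creating a doc ingestion pipeline with a single INDEX_DOC function; the endpoint path, payload fields, and all IDs shown here are assumptions and placeholders, so verify them against the current Pipelines API reference.

```python
import requests

# Placeholders: substitute your own region, API key, project ID, cluster ID, and collection.
CLOUD_REGION = "gcp-us-west1"
API_KEY = "YOUR_API_KEY"

# Assumed create-pipeline endpoint and payload shape; check the Pipelines API reference.
resp = requests.post(
    f"https://controller.api.{CLOUD_REGION}.zillizcloud.com/v1/pipelines",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "projectId": "proj-xxxx",
        "clusterId": "in01-xxxx",
        "collectionName": "my_collection",
        "name": "my_doc_ingestion_pipeline",
        "type": "INGESTION",
        "functions": [
            {
                "name": "index_my_doc",
                "action": "INDEX_DOC",    # the connector requires the ingestion pipeline
                "inputField": "doc_url",  # to contain only this one function
                "language": "ENGLISH",
            }
        ],
    },
    timeout=30,
)
print(resp.status_code, resp.json())
```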

Procedures

  1. Navigate to your project. Click Pipelines in the navigation panel. Then switch to the Connectors tab and click + Connectors.

    create-connector

  2. Link to your data source.

    1. Set up the basic information of the connector.

      • Connector Name: The name of the connector to create.

      • Description (optional): The description of the connector.

    2. Configure the data source information.

      • Object Storage Service: Select the object storage service of your data source. Available options include AWS S3 and Google Cloud Storage.

      • Bucket URL: Provide the bucket URL used to access your source data. Make sure you enter the URL of a file directory rather than a specific file; the root directory is not supported. To learn how to obtain the URL, refer to the documentation for your object storage service.

      • Access Keys for authorization (optional): If the bucket is not publicly accessible, provide the access key and secret key (AWS S3) or the access key ID and secret access key (Google Cloud Storage).

      Click Link and Continue to proceed to the next step.

      📘Notes

      Zilliz Cloud will verify the connection to your data source before moving to the next step. For a quick local sanity check of the bucket URL and credentials, see the sketch after this step.

      link-data-source
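    Before linking, you can sanity-check the bucket directory and credentials yourself. The sketch below assumes an AWS S3 source and uses boto3 to list a few objects under the monitored directory (prefix); the bucket name, prefix, and credentials are placeholders.

    ```python
    import boto3

    # Placeholders: substitute your own bucket, directory prefix, and credentials.
    s3 = boto3.client(
        "s3",
        aws_access_key_id="YOUR_ACCESS_KEY",
        aws_secret_access_key="YOUR_SECRET_KEY",
    )

    # List a few objects under the directory the connector will monitor.
    # An empty result or an access error means the URL or credentials need fixing.
    resp = s3.list_objects_v2(Bucket="my-bucket", Prefix="docs/", MaxKeys=5)
    for obj in resp.get("Contents", []):
        print(obj["Key"])
    ```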

  3. Add target Pipelines.

    First, choose a target cluster, and then a collection that has one ingestion pipeline and one or more deletion pipelines. The target ingestion pipeline must contain only an INDEX_DOC function. If multiple deletion pipelines are available, manually select the appropriate one. A sketch for listing a collection's pipelines programmatically follows this step.

    📘Notes

    This step can be skipped and completed later before initiating a scan.

    add-target-pipelines
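    To check which pipelines are bound to a collection before choosing targets, you can list the project's pipelines. This sketch assumes the same Pipelines REST API as above; the list endpoint, query parameter, and response fields are assumptions to verify against the API reference.

    ```python
    import requests

    CLOUD_REGION = "gcp-us-west1"   # placeholders
    API_KEY = "YOUR_API_KEY"

    resp = requests.get(
        f"https://controller.api.{CLOUD_REGION}.zillizcloud.com/v1/pipelines",
        headers={"Authorization": f"Bearer {API_KEY}"},
        params={"projectId": "proj-xxxx"},
        timeout=30,
    )

    # Keep only pipelines bound to the target collection so you can confirm there is one
    # INGESTION pipeline (with a single INDEX_DOC function) and at least one DELETION pipeline.
    for p in resp.json().get("data", []):
        if p.get("collectionName") == "my_collection":
            print(p.get("type"), p.get("name"))
    ```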

  4. Choose whether to enable auto scan.

    • When it is disabled, you will need to manually trigger a scan if there are any updates to the source data.

    • When it is enabled, Zilliz Cloud will periodically scan the data source and sync file additions and deletions to the vector database collection through the designated ingestion and deletion pipelines. You will need to set up the auto scan schedule; a small sketch of how these settings map to scan times follows this step.

      • Frequency: Set how often the system performs scans.

        • Daily: Choose any number of days from 1 to 7.

        • Hourly: Options are 1, 6, 12, or 18 hours.

      • Next Run at: Specify the time of the next scan. The time zone follows the system time zone in your organization settings.

      enable-auto-scan
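    The schedule only determines the interval between scans. The snippet below is purely illustrative (not a Zilliz API) and shows how a frequency setting and a "Next Run at" time translate into subsequent scan times.

    ```python
    from datetime import datetime, timedelta

    def scan_times(next_run_at: datetime, every_hours: int, count: int = 4) -> list[datetime]:
        """Return the next few scan times for an hourly schedule (illustrative only)."""
        return [next_run_at + timedelta(hours=every_hours * i) for i in range(count)]

    # A 6-hour frequency starting at 02:00 scans at 02:00, 08:00, 14:00, and 20:00.
    print(scan_times(datetime(2024, 5, 1, 2, 0), every_hours=6))
    ```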

  5. Click Create.

Manage Connector

Managing connectors efficiently is integral to maintaining a smooth data integration process. This section provides detailed instructions on how to manage connectors.

Enable or disable a connector

  1. Locate the connector you want to manage.

  2. Click ... under Actions.

  3. Choose Enable or Disable.

📘Notes

To activate a connector, ensure the target pipelines are configured.

enable-connector

Trigger a manual scan

Perform a manual scan if the auto scan feature is off.

Click "..." under Actions next to the target connector, then click Scan.

📘Notes

Ensure the connector is enabled before initiating a manual scan.

Configure a connector

You can modify the following settings of a connector:

  • Storage bucket access credentials:

    • (For AWS S3) access key and secret key

    • (For Google Cloud Storage) access key ID and secret access key

  • Auto scan schedule. For more information, refer to step 4 in the procedure for creating connectors.

configure-connector

Drop a connector

You can drop a connector if it is no longer necessary.

📘Notes

The connector must be disabled before being dropped.

drop-connector

View connector logs

Monitor connector activities and troubleshoot issues:

  1. Access the connector's activity page to view logs.

    view-connector-logs

  2. An abnormal status indicates an error. Click the "?" icon next to the status for detailed error messages.

To view all the connectors linked to a pipeline, check the pipeline details.