Version: User Guides (Cloud)

Connect to Your Data

A connector is a built-in tool that makes it easy to connect various data sources to a vector database. This guide explains the concept of a connector and provides instructions on how to create and manage connectors in Zilliz Cloud Pipelines.

Understanding Connectors

A connector is a tool for ingesting data into Zilliz Cloud from various data sources, including object storage, Kafka (coming soon), and more. Taking the object storage connector as an example: a connector can monitor a directory in an object storage bucket and sync files such as PDF and HTML documents to Zilliz Cloud Pipelines, where they are converted into vector representations and stored in a vector database collection for search. With ingestion and deletion pipelines, the files and their vector representations in Zilliz Cloud are kept in sync: any addition or removal of files in the object storage is mapped to the vector database collection.
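The sync behavior described above can be sketched as a simple diff between the bucket listing and the files already indexed. This is a hypothetical illustration of the idea, not Zilliz Cloud's actual implementation; the function and field names are invented for clarity.

```python
# Hypothetical sketch of the sync logic a connector applies on each scan:
# compare the current bucket listing against the files already indexed, then
# route additions to the ingestion pipeline and removals to the deletion
# pipeline. Names here are illustrative, not a real Zilliz Cloud API.

def plan_sync(bucket_files: set[str], indexed_files: set[str]) -> dict[str, list[str]]:
    """Return which files to ingest and which vector entries to delete."""
    return {
        "ingest": sorted(bucket_files - indexed_files),  # new files in the bucket
        "delete": sorted(indexed_files - bucket_files),  # files removed from the bucket
    }

# Example: two files were added and one was removed since the last scan.
plan = plan_sync(
    bucket_files={"docs/a.pdf", "docs/b.html", "docs/c.pdf"},
    indexed_files={"docs/a.pdf", "docs/old.pdf"},
)
print(plan)  # {'ingest': ['docs/b.html', 'docs/c.pdf'], 'delete': ['docs/old.pdf']}
```

Because both sides of the diff come from full listings, this scheme catches deletions as well as additions, which is why the collection stays in step with the bucket.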


Why use a connector?

  1. Real-time Data Ingestion

Effortlessly ingest and index data in real time, guaranteeing that the freshest content is instantly accessible to all search queries.

  2. Scalable and Adaptive

Easily scale up your data ingestion pipeline with zero DevOps hassle. The adaptive connectors seamlessly handle fluctuating traffic loads, ensuring smooth scalability.

  3. Search Index Kept in Sync With Heterogeneous Sources

Automatically sync the addition and deletion of documents to the search index. Moreover, fuse all common types of data sources (coming soon).

  4. Observability

Gain insight into your dataflow with detailed logging, ensuring transparency and detecting any anomalies that may arise.

Create Connector

Zilliz Cloud Pipelines provides flexible options when you create a connector. Once a connector is created, it will periodically scan your data sources and ingest data into your vector database at regular intervals.


Before creating a connector:

  • Ensure you have created a collection.

  • Ensure the created collection has an ingestion pipeline and one or more deletion pipelines.


  1. Navigate to your project. Click on Pipelines from the navigation panel. Then switch to the Connectors tab. Click + Connectors.


  2. Link to your data source.

    1. Set up the basic information of the connector.

      Connector Name: The name of the connector to create.
      Description (optional): The description of the connector.
    2. Configure the data source information.

      Object Storage Service: Select the object storage service of your data source. Available options include:
      - AWS S3
      - Google Cloud Storage
      Bucket URL: Provide the bucket URL used for accessing your source data. Make sure you enter the URL of the file directory instead of a specific file.
      To learn more about how to obtain the URL, refer to:
      - Accessing and listing an Amazon S3 bucket
      - Discover object storage with the Google Cloud console
      Access Keys for authorization (optional): Provide the following information for authorization if necessary:
      - For AWS S3, provide the access key and secret key.
      - For Google Cloud Storage, provide the access key ID and secret access key.

      Click Link and Continue to proceed to the next step.


      Zilliz Cloud will verify the connection to your data source before moving to the next step.


  3. Add target Pipelines.

    First, choose a target cluster, then a collection with one ingestion pipeline and one or more deletion pipelines. The target ingestion pipeline should only have an INDEX_DOC function. If multiple deletion pipelines are available, select the appropriate one manually.


    This step can be skipped and completed later before initiating a scan.


  4. Choose whether to enable auto scan.

    • When it is disabled, you will need to manually trigger a scan if there are any updates to the source data.

    • When it is enabled, Zilliz Cloud will periodically scan the data source and sync file additions/deletions to the vector database collection through the designated ingestion/deletion pipelines. You will need to set up the auto scan schedule.

      Frequency: Set how often the system performs scans.
      - Daily: Choose any number from 1 to 7.
      - Hourly: Options are 1, 6, 12, or 18 hours.
      Next Run at: Specify the time for the next scan. The time zone is consistent with the system time zone in organization settings.


  5. Click Create.
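Two of the settings above lend themselves to a quick client-side sanity check before you click Create: the bucket URL (step 2) and the hourly scan schedule (step 4). The sketch below is a hedged illustration only; the `s3://`/`gs://` URL shapes and the schedule semantics are assumptions about the form, not a documented Zilliz Cloud API.

```python
from datetime import datetime, timedelta

# Hedged sketch mirroring the Create Connector form. The URL schemes and
# schedule semantics below are assumptions for illustration, not a real API.

def check_bucket_url(url: str) -> bool:
    """Accept a directory-style object storage URL (not a single file)."""
    has_scheme = url.startswith(("s3://", "gs://"))  # AWS S3 or Google Cloud Storage
    return has_scheme and url.endswith("/")          # a directory, not a specific file

def next_runs(start: datetime, every_hours: int, count: int) -> list[datetime]:
    """Project the next scan times for an hourly schedule (1, 6, 12, or 18 h)."""
    assert every_hours in (1, 6, 12, 18), "hourly options per the schedule form"
    return [start + timedelta(hours=every_hours * i) for i in range(1, count + 1)]

print(check_bucket_url("s3://my-bucket/docs/"))       # True
print(check_bucket_url("s3://my-bucket/docs/a.pdf"))  # False: points at a file
runs = next_runs(datetime(2024, 1, 1, 0, 0), every_hours=6, count=3)
print([r.isoformat() for r in runs])
# ['2024-01-01T06:00:00', '2024-01-01T12:00:00', '2024-01-01T18:00:00']
```

Catching a file-shaped bucket URL before submission saves a failed connection-verification round trip in step 2.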

Manage Connector

Managing connectors efficiently is integral to maintaining a smooth data integration process. This guide provides detailed instructions on how to manage connectors.

Enable or disable a connector

  1. Locate the connector you want to manage.

  2. Click ... under Actions.

  3. Choose Enable or Disable.


To activate a connector, ensure the target pipelines are configured.


Trigger a manual scan

Perform a manual scan if the auto scan feature is off.

Click ... under Actions next to the target connector, then click Scan.


Ensure the connector is enabled before initiating a manual scan.
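With auto scan off, a manual scan is only worthwhile when the source data has actually changed. One way to decide is to compare object modification times against the last scan; the helper below is purely hypothetical (its name and inputs are invented for illustration and are not a Zilliz Cloud API).

```python
from datetime import datetime

# Hypothetical helper for deciding whether to trigger a manual scan:
# scan only if some object changed (or appeared) after the last scan.
# Name and inputs are illustrative, not a real Zilliz Cloud API.

def needs_scan(last_scan: datetime, object_mtimes: list[datetime]) -> bool:
    """True if any source file changed since the last scan."""
    return any(mtime > last_scan for mtime in object_mtimes)

last = datetime(2024, 1, 1, 12, 0)
print(needs_scan(last, [datetime(2024, 1, 1, 9, 0)]))   # False: nothing new
print(needs_scan(last, [datetime(2024, 1, 2, 8, 30)]))  # True: a file changed later
```

Note that a modification-time check alone misses deletions, since removed files no longer have a timestamp to compare; comparing full file listings between scans would also catch those.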

Configure a connector

You can modify the following settings of a connector:

  • Storage bucket access credentials:

    • (For AWS S3) access key and secret key

    • (For Google Cloud Storage) access key ID and secret access key

  • Auto scan schedule. For more information, refer to step 4 in the procedure for creating connectors.


Drop a connector

You can drop a connector if it is no longer necessary.


The connector must be disabled before being dropped.


View connector logs

Monitor connector activities and troubleshoot issues:

  1. Access the connector's activity page to view logs.


  2. An abnormal status indicates an error. Click the "?" icon next to the status for detailed error messages.