Import Data (RESTful API)
This page describes how to import prepared data using the Zilliz Cloud RESTful API.
Before you start
Make sure the following conditions are met:
- You have obtained an API key for your cluster. For details, see API Keys.
- You have prepared your data in one of the supported formats. For details on how to prepare your data, refer to Prepare Source Data. For a complete walkthrough, see the end-to-end notebook Data Import from Zero to Hero.
- You have created a collection whose schema matches the example dataset, and the collection is already indexed and loaded. For details, see Example Dataset and Manage Collections.
Import data using the RESTful API
To import data from files using the RESTful API, you must first upload the files to an object storage bucket. Once they are uploaded, obtain the path to the files in the remote bucket and the bucket credentials that Zilliz Cloud will use to pull data from your bucket. For details on supported object paths, refer to From remote buckets.
Based on your data security requirements, you can use either long-term credentials or session tokens during data import.
For more information about obtaining credentials, refer to:
- Amazon S3: Authenticate using long-term credentials
- Google Cloud Storage: Manage HMAC keys for service accounts
- Azure Blob Storage: View account access keys
For more information about using session tokens, refer to the FAQ.
For the import to succeed, ensure that the target collection has fewer than 10 running or pending import jobs.
Once the object path and bucket credentials are obtained, call the API as follows:
# Replace the URL and TOKEN with your own values
curl --request POST \
    --url "https://api.cloud.zilliz.com/v2/vectordb/jobs/import/create" \
    --header "Authorization: Bearer ${TOKEN}" \
    --header "Accept: application/json" \
    --header "Content-Type: application/json" \
    -d '{
        "clusterId": "inxx-xxxxxxxxxxxxxxx",
        "collectionName": "medium_articles",
        "partitionName": "",
        "objectUrl": "https://s3.us-west-2.amazonaws.com/publicdataset.zillizcloud.com/medium_articles_2020_dpr/medium_articles_2020_dpr.json",
        "accessKey": "",
        "secretKey": ""
    }'
To import data into a specific partition, include partitionName in the request.
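The same request can also be issued from a script. The following is a minimal Python sketch using only the standard library. It mirrors the fields and headers of the curl call above. The helper names build_import_payload and create_import_job, the partition name in the usage comment, and the ZILLIZ_TOKEN environment variable are all hypothetical and not part of the Zilliz Cloud API.

```python
import json
import urllib.request

IMPORT_URL = "https://api.cloud.zilliz.com/v2/vectordb/jobs/import/create"


def build_import_payload(cluster_id, collection_name, object_url,
                         access_key="", secret_key="", partition_name=""):
    """Build the JSON body for the import request (same fields as the curl call)."""
    return {
        "clusterId": cluster_id,
        "collectionName": collection_name,
        # Left empty, as in the curl example, when no partition is targeted.
        "partitionName": partition_name,
        "objectUrl": object_url,
        "accessKey": access_key,
        "secretKey": secret_key,
    }


def create_import_job(token, payload, url=IMPORT_URL):
    """POST the payload and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/json",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (performs a real network call, so it is left commented out):
#
#   import os
#   payload = build_import_payload(
#       cluster_id="inxx-xxxxxxxxxxxxxxx",
#       collection_name="medium_articles",
#       object_url="https://s3.us-west-2.amazonaws.com/publicdataset.zillizcloud.com/medium_articles_2020_dpr/medium_articles_2020_dpr.json",
#       partition_name="my_partition",  # hypothetical partition name
#   )
#   print(create_import_job(os.environ["ZILLIZ_TOKEN"], payload))
```

Keeping the payload builder separate from the network call makes it easy to inspect or log the request body before anything is sent.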
After Zilliz Cloud processes the above request, you will receive a job ID. Use this job ID to monitor the import progress with the following command:
curl --request POST \
    --url "https://api.cloud.zilliz.com/v2/vectordb/jobs/import/getProgress" \
    --header "Authorization: Bearer ${TOKEN}" \
    --header "Accept: application/json" \
    --header "Content-Type: application/json" \
    -d '{
        "clusterId": "inxx-xxxxxxxxxxxxxxx",
        "jobId": "job-xxxxxxxxxxxxxxxxxxxxx"
    }'
For details, see Import and Get Import Progress.
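Because an import job runs asynchronously, the progress request is usually repeated until the job finishes. The sketch below shows one possible polling loop in Python. It rests on several assumptions that this page does not confirm: that the progress response carries "code": 0 on success, that the job state is reported in a data.state field, and that "Completed" and "Failed" are the terminal state values. Check the Get Import Progress reference and adjust accordingly. The function name wait_for_import and its parameters are hypothetical.

```python
import time

# Assumed terminal states. This page does not list the state values,
# so adjust the set to match the actual progress response.
TERMINAL_STATES = {"Completed", "Failed"}


def wait_for_import(fetch_progress, interval_seconds=5.0, max_attempts=120,
                    sleep=time.sleep):
    """Poll until the job reaches a terminal state or the attempts run out.

    fetch_progress is a zero-argument callable that performs the progress
    request and returns the decoded JSON response as a dict.
    """
    for _ in range(max_attempts):
        response = fetch_progress()
        if response.get("code") != 0:
            raise RuntimeError(f"progress request returned an error: {response}")
        data = response.get("data") or {}
        if data.get("state") in TERMINAL_STATES:  # "state" is an assumed field name
            return data
        sleep(interval_seconds)
    raise TimeoutError(f"import job did not finish after {max_attempts} polls")
```

Passing the request function in as an argument keeps the loop independent of any particular HTTP client, and the attempt limit prevents a stuck job from blocking the caller indefinitely.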
Verify the result
If the command output is similar to the following, the import job has been submitted successfully:
{
    "code": 0,
    "data": {
        "jobID": "job-xxxxxxxxxxxxxxxxxxxxx"
    }
}
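In a script, the job ID can be pulled out of this response before checking progress. The following Python sketch treats "code": 0 as the success marker, as in the sample output above. The function name extract_job_id is hypothetical. Accepting both the "jobID" and "jobId" spellings is a defensive assumption, made because the sample response and the progress request on this page spell the key differently.

```python
import json


def extract_job_id(response_text):
    """Return the job ID from an import response, or raise if the call failed."""
    response = json.loads(response_text)
    if response.get("code") != 0:
        raise RuntimeError(f"import request failed: {response}")
    data = response.get("data") or {}
    # Accept either key spelling (an assumption, see the note above).
    job_id = data.get("jobID") or data.get("jobId")
    if not job_id:
        raise ValueError("response does not contain a job ID")
    return job_id
```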
You can also call the RESTful API to get the progress of the current import job or to list all import jobs. Alternatively, go to the Zilliz Cloud console to view the result and job details.