Merge Data (Private Preview)
Performs a left join between an existing collection and new field data records stored at a specified location, using a common merge key, and generates a new collection containing the merged data. You must be a project owner to perform this operation.
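For example, suppose the existing collection holds entities with id values 1, 2, and 3, and the new data records carry a score field for id values 2, 3, and 4 (the field names here are illustrative). Merging on id produces a new collection with entities 1, 2, and 3: entities 2 and 3 receive their score values, entity 1 carries no score value, and the record with id 4 is dropped, because a left join keeps only the keys present in the existing collection.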
The base URL for this API is in the following format:
https://api.cloud.zilliz.com
The endpoints on the control plane currently support up to 20 requests per second per user per endpoint.
This endpoint is currently in Private Preview. If you encounter any issue related to this endpoint, contact Zilliz Cloud support.
export BASE_URL="https://api.cloud.zilliz.com"
The authentication token should be an API key with appropriate privileges.
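Because the control plane is rate limited, a busy client should back off when a request is rejected. Below is a minimal shell sketch, assuming BASE_URL and TOKEN are exported as shown on this page, that merge_request.json holds the request body from the example below, and that rate-limit rejections return HTTP 429 (an assumption, not confirmed by this page):
# Retry up to three times with increasing backoff when the control
# plane returns HTTP 429 (assumed rate-limit status).
for attempt in 1 2 3; do
  STATUS=$(curl --silent --output /dev/null --write-out "%{http_code}" \
    --request POST \
    --url "${BASE_URL}/v2/etl/merge" \
    --header "Authorization: Bearer ${TOKEN}" \
    --header "Content-Type: application/json" \
    -d @merge_request.json)
  [ "${STATUS}" != "429" ] && break
  sleep "${attempt}"
done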
clusterId: ID of the cluster that holds the target collection of this operation.
dbName: Name of the database that holds the target collection of this operation.
collectionName: Name of the target collection of this operation.
targetDbName: Name of the database that holds the collection to create.
targetCollectionName: Name of the collection to create. This collection will hold the merged data.
mergeData: The data to merge with the existing collection. The data should be a PARQUET file stored in a location that Zilliz Cloud can access. You need to provide the URL of the PARQUET file and the credentials to access the bucket that holds the file.
mergeData.type: The type of the data source. Set this to stage to load the file from a Zilliz Cloud stage.
mergeData.stageName: The name of the Zilliz Cloud stage that holds the PARQUET file. This parameter applies only when you set type to stage. For details on how to create a stage, see the documentation for the Create Stage operation. A sketch of a stage-based request follows the request example below.
mergeData.dataPath: The URL of the PARQUET file to merge with the existing collection.
mergeData.credential: The credentials to access the bucket that holds the PARQUET file.
mergeData.credential.accessKey: The access key of the bucket that holds the PARQUET file.
mergeData.credential.secretKey: The secret key of the bucket that holds the PARQUET file.
mergeKey: The name of the field to use as the merge field. The merge field must be present in both the existing collection and the new data records. In most cases, you can use the primary key as the merge field.
mergeFieldSchema: The schemas of the fields to create in the new collection, expressed as an array of field schemas. Each element describes one field to add:
mergeFieldSchema[].name: Name of the field to add.
mergeFieldSchema[].dataType: Data type of the field to add.
mergeFieldSchema[].params: Extra settings for the field to add.
mergeFieldSchema[].params.maxLength: The maximum length of a VARCHAR field value. This parameter is applicable only when dataType is set to VARCHAR.
export TOKEN="YOUR_API_KEY"
curl --request POST \
  --url "${BASE_URL}/v2/etl/merge" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "Content-Type: application/json" \
  -d '{
    "clusterId": "in00-xxxxxxxxxxxxxxx",
    "dbName": "my_database",
    "collectionName": "my_collection",
    "targetDbName": "my_database",
    "targetCollectionName": "my_merged_collection",
    "mergeData": {
        "dataPath": "s3://my-bucket/my_data.parquet",
        "regionId": "us-west-2",
        "credential": {
            "accessKey": "my-access-key",
            "secretKey": "my-secret-key"
        }
    },
    "mergeKey": "id",
    "mergeFieldSchema": [
        {
            "name": "my_field1",
            "dataType": "VARCHAR",
            "params": {
                "maxLength": 512
            }
        }
    ]
}'
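The example above reads the PARQUET file from a URL with bucket credentials. For the stage-based variant, here is a minimal sketch: the type parameter and its stage value come from the parameter descriptions above, while the stageName field and the exact payload shape are assumptions, so consult the Create Stage documentation for the authoritative format.
# Stage-based variant (sketch). The "stageName" field and the payload
# shape are assumptions; "type" and its "stage" value are documented above.
curl --request POST \
  --url "${BASE_URL}/v2/etl/merge" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "Content-Type: application/json" \
  -d '{
    "clusterId": "in00-xxxxxxxxxxxxxxx",
    "dbName": "my_database",
    "collectionName": "my_collection",
    "targetDbName": "my_database",
    "targetCollectionName": "my_merged_collection",
    "mergeData": {
        "type": "stage",
        "stageName": "my_stage",
        "dataPath": "my_data.parquet"
    },
    "mergeKey": "id",
    "mergeFieldSchema": [
        {
            "name": "my_field1",
            "dataType": "VARCHAR",
            "params": {
                "maxLength": 512
            }
        }
    ]
}'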
code: Response code.
data: Response payload, which carries the created data-merge job.
data.jobId: The ID of the created data-merge job.
If the request fails, the response carries an error message instead:
code: Response code.
message: Error message.
{
    "code": 0,
    "data": {
        "jobId": "job-xxxxxxxxxxxxxxxxxxxxx"
    }
}
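To use the job ID programmatically, capture the response and extract data.jobId. Below is a minimal shell sketch using jq, assuming merge_request.json holds the request body shown above; this page does not document a job-status endpoint, so only the extraction step is shown.
# Send the merge request and pull the job ID out of the JSON response.
RESPONSE=$(curl --silent --request POST \
  --url "${BASE_URL}/v2/etl/merge" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "Content-Type: application/json" \
  -d @merge_request.json)

JOB_ID=$(echo "${RESPONSE}" | jq -r '.data.jobId')
echo "Created data-merge job: ${JOB_ID}"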