
Merge Data
Private Preview

Performs a left join between an existing collection and a set of data records carrying new fields, stored at a specified location, using a common merge key, and generates a new collection that contains the merged data. You must be a project owner to perform this operation.

POST
/v2/etl/merge
Base URL

The base URL for this API is in the following format:

https://api.cloud.zilliz.com

📘 Notes

The endpoints on the control plane currently support up to 20 requests per second per user per endpoint.

This endpoint is currently in Private Preview. If you encounter any issues related to this endpoint, contact Zilliz Cloud support.

export BASE_URL="https://api.cloud.zilliz.com"
Parameters
Authorization (string, header, required)

The authentication token should be an API key with appropriate privileges.

Example Value: Bearer {{TOKEN}}
Request Body (application/json)
clusterId (string)

Name of the cluster that holds the target collection of this operation.

Example Value: in00-xxxxxxxxxxxxxxxxxx
dbName (string)

Name of the database that holds the target collection of this operation.

collectionName (string)

Name of the target collection of this operation.

destDbName (string)

Name of the database that holds the collection to create.

destCollectionName (string)

Name of the collection to create. This collection will hold the merged data.

dataSource (object)

The data to merge with the existing collection. The data should be a PARQUET file stored in a location that Zilliz Cloud can access. You need to provide the URL of the PARQUET file and the credentials to access the bucket that holds the file. See the sketch after the credential fields below.

type (string)

The type of the data source.

stageName (string)

The name of a Zilliz Cloud stage that holds the PARQUET file. This parameter applies only when you set type to stage. For details on how to create a stage, see the documentation for the Create Stage operation.

dataPath (string)

The URL of the PARQUET file to merge with the existing collection.

Example Value: s3://my-bucket/my_data.parquet
credential (object)

The credentials to access the bucket that holds the PARQUET file.

accessKey (string)

The access key of the bucket that holds the PARQUET file.

secretKey (string)

The secret key of the bucket that holds the PARQUET file.
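
For reference, a dataSource object that reads the PARQUET file directly from an S3 bucket might look like the following sketch; all values are placeholders, and type is omitted because its accepted values for direct URLs are not listed here.

{
  "dataPath": "s3://my-bucket/my_data.parquet",
  "credential": {
    "accessKey": "my-access-key",
    "secretKey": "my-secret-key"
  }
}

If the file resides in a Zilliz Cloud stage instead, set type to stage and reference the stage by name; the stage name and the relative dataPath below are assumptions for illustration.

{
  "type": "stage",
  "stageName": "my_stage",
  "dataPath": "my_data.parquet"
}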

mergeField (string)

The name of the field to use as the merge field. The merge field must be present in both the existing collection and the new data records. In most cases, you can use the primary key as the merge field.

newFields (array)

The schemas of the fields to create in the new collection, provided as an array of field schema objects. See the sketch after the field descriptions below.

newFields[] (object)

The schema of a field to add.

fieldName (string)

Name of the current field to add.

dataType (string)

Data type of the current field to add.

params (object)

Extra settings for the current field to add.

maxLength (integer)

The maximum length of a VARCHAR field value. This parameter is applicable only when dataType is set to VARCHAR.
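
As an illustration, a newFields array that adds two VARCHAR fields could look like the following sketch; the field names are hypothetical.

[
  {
    "fieldName": "title",
    "dataType": "VARCHAR",
    "params": {
      "maxLength": 256
    }
  },
  {
    "fieldName": "summary",
    "dataType": "VARCHAR",
    "params": {
      "maxLength": 2048
    }
  }
]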

export TOKEN="YOUR_API_KEY"

curl --request POST \
--url "${BASE_URL}/v2/etl/merge" \
--header "Authorization: Bearer ${TOKEN}" \
--header "Content-Type: application/json" \
-d '{
"clusterId": "in00-xxxxxxxxxxxxxxx",
"dbName": "my_database",
"collectionName": "my_collection",
"targetDbName": "my_database",
"targetCollectionName": "my_merged_collection",
"mergeData": {
"dataPath": "s3://my-bucket/my_data.parquet",
"regionId": "us-west-2",
"credential": {
"accessKey": "my-access-key",
"secretKey": "my-secret-key"
}
},
"mergeKey": "id",
"mergeFieldSchema": [
{
"name": "my_field1",
"dataType": "VARCHAR",
"params": {
"maxLength": 512
}
}
]
}'
Responses
200 - application/json
code (integer)

Response code.

Example Value: 0
data (array)

Response payload which carries the IDs of the created data-merge jobs.

data[] (object)

A created data-merge job.

jobId (string)

The ID of the current data-merge job.

Error responses return an error message with the following fields.

code (integer)

Response code.

message (string)

Error message.

{
  "code": 0,
  "data": [
    {
      "jobId": "job-xxxxxxxxxxxxxxxxxxxxx"
    }
  ]
}
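
To capture the job ID from a script, you can pipe the response through jq. The sketch below assumes jq is installed, the request body shown above has been saved to a local file named merge_request.json, and the response carries data as the documented array of job objects.

export JOB_ID=$(curl --silent --request POST \
  --url "${BASE_URL}/v2/etl/merge" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "Content-Type: application/json" \
  -d @merge_request.json | jq -r '.data[0].jobId')

echo "${JOB_ID}"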