
Merge Data
Private Preview

You can use this interface to add fields with or without data to an existing collection.

  • If you choose to add fields with data, you need to upload the data file to an AWS S3 bucket or a Zilliz Cloud Stage and ensure that the data share the same merge key as the source collection. The merged data will be stored in a new collection that you specify.
  • If you choose to add fields without data, the new fields will be created with null values.
You need to be a project admin or above to perform this operation. For details, refer to Merge Data.

POST
/v2/etl/merge
Base URL

The base URL for this API is in the following format:

https://api.cloud.zilliz.com

📘 Notes

The endpoints on the control plane currently support up to 20 requests per second per user per endpoint.

If you encounter any issues related to this endpoint, contact Zilliz Cloud support.

export BASE_URL="https://api.cloud.zilliz.com"
Parameters
Authorization (string, header, required)

The authentication token should be an API key with appropriate privileges.

Example Value: Bearer {{TOKEN}}
Request Body (application/json)
Merging data from a Zilliz Cloud stage

clusterId (string, required)

Name of the cluster that holds the target collection of this operation.

Example Value: in00-xxxxxxxxxxxxxxxxxx
dbName (string, required)

Name of the database that holds the target collection of this operation.

collectionName (string, required)

Name of the target collection of this operation.

destDbName (string, required)

Name of the database that holds the collection to create.

destCollectionName (string, required)

Name of the collection to create. This collection will hold the merged data.

dataSource (object)

The data to be merged with the collection specified above. Upload the data file in PARQUET format to an AWS S3 bucket or a Zilliz Cloud stage, then provide the URL of the data file, optionally with access credentials.

type (string)

The type of the data source. When using a stage, set this parameter to stage.

Example Value: stage
stageName (string)

The name of a Zilliz Cloud stage that holds the PARQUET file. This parameter applies only when you set type to stage. For details on how to create a stage, see the documentation for the Create Stage operation.

dataPath (string)

The URL of the PARQUET file to merge with the existing collection.

Example Value: path/to/your/data.parquet
mergeField (string)

The data merging operation is similar to a LEFT JOIN in relational database systems: the merge field serves as the shared key between the source collection and the Parquet file that contains the column-wise data. Provide the name of this shared key as the merge field. The merge field must be present in both the existing collection and the Parquet file. In most cases, you can use the primary key as the merge field.
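As an illustration of the LEFT JOIN semantics, the coreutils `join` command behaves the same way on sorted text files: every row from the first file is kept, matching rows from the second file contribute the new column, and unmatched rows are filled with nulls. The file names and values below are made up for demonstration only.

```shell
# Toy stand-in for the merge: collection.txt plays the source collection,
# parquet.txt plays the column-wise Parquet file; the first column is the merge field.
printf 'id1 vecA\nid2 vecB\nid3 vecC\n' > collection.txt
printf 'id1 label1\nid3 label3\n' > parquet.txt

# -a 1 keeps unmatched rows from the first file (LEFT JOIN);
# -e NULL fills in the missing new-field values, as the merge does with nulls.
join -a 1 -e NULL -o 0,1.2,2.2 collection.txt parquet.txt
# id1 vecA label1
# id2 vecB NULL
# id3 vecC label3
```

Note that `id2` survives with a NULL label: rows in the existing collection that have no match in the Parquet file keep their data and receive null values for the new fields.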

newFields (array)

The schema of the fields to create in the new collection. The schema should be an array of field schemas.

newFields[] (object)

The schema of a field to add.

fieldName (string)

Name of the current field to add.

dataType (string)

Data type of the current field to add.

params (object)

Extra settings for the current field to add.

maxLength (integer)

The maximum length of a VARCHAR field value. This parameter is applicable only when dataType is set to VARCHAR.

Merging data from an AWS S3 bucket

clusterId (string, required)

Name of the cluster that holds the target collection of this operation.

Example Value: in00-xxxxxxxxxxxxxxxxxx
dbName (string, required)

Name of the database that holds the target collection of this operation.

collectionName (string, required)

Name of the target collection of this operation.

destDbName (string, required)

Name of the database that holds the collection to create.

destCollectionName (string, required)

Name of the collection to create. This collection will hold the merged data.

dataSource (object)

The data to be merged with the collection specified above. Upload the data file in PARQUET format to an AWS S3 bucket or a Zilliz Cloud stage, then provide the URL of the data file, optionally with access credentials.

type (string)

The type of the data source. When using an AWS S3 bucket, set this parameter to s3.

dataPath (string)

The URL of the PARQUET file to merge with the existing collection.

credential (object)

The credentials to access the bucket that holds the PARQUET file. This parameter applies only when you set type to s3.

accessKey (string)

The access key of the bucket that holds the PARQUET file.

secretKey (string)

The secret key of the bucket that holds the PARQUET file.

mergeField (string)

The data merging operation is similar to a LEFT JOIN in relational database systems: the merge field serves as the shared key between the source collection and the Parquet file that contains the column-wise data. Provide the name of this shared key as the merge field. The merge field must be present in both the existing collection and the Parquet file. In most cases, you can use the primary key as the merge field.

newFields (array)

The schema of the fields to create in the new collection. The schema should be an array of field schemas.

newFields[] (object)

The schema of a field to add.

fieldName (string)

Name of the current field to add.

dataType (string)

Data type of the current field to add.

params (object)

Extra settings for the current field to add.

maxLength (integer)

The maximum length of a VARCHAR field value. This parameter is applicable only when dataType is set to VARCHAR.

export TOKEN="YOUR_API_KEY"

curl --request POST \
  --url "${BASE_URL}/v2/etl/merge" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "Content-Type: application/json" \
  -d '{
    "clusterId": "in00-xxxxxxxxxxxxxxx",
    "dbName": "my_database",
    "collectionName": "my_collection",
    "destDbName": "my_database",
    "destCollectionName": "my_merged_collection",
    "dataSource": {
      "type": "stage",
      "stageName": "my_stage",
      "dataPath": "/path/to/your/data.parquet"
    },
    "mergeField": "id",
    "newFields": [
      {
        "fieldName": "my_field1",
        "dataType": "VARCHAR",
        "params": {
          "maxLength": 512
        }
      }
    ]
  }'
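The call above uses a Zilliz Cloud stage as the data source. A sketch of the equivalent call with an AWS S3 data source follows; the bucket URL and credentials are placeholders to substitute with your own values.

```shell
export TOKEN="YOUR_API_KEY"
export BASE_URL="https://api.cloud.zilliz.com"

# Placeholder cluster ID, bucket URL, and credentials; replace before running.
curl --request POST \
  --url "${BASE_URL}/v2/etl/merge" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "Content-Type: application/json" \
  -d '{
    "clusterId": "in00-xxxxxxxxxxxxxxx",
    "dbName": "my_database",
    "collectionName": "my_collection",
    "destDbName": "my_database",
    "destCollectionName": "my_merged_collection",
    "dataSource": {
      "type": "s3",
      "dataPath": "s3://my-bucket/path/to/your/data.parquet",
      "credential": {
        "accessKey": "YOUR_ACCESS_KEY",
        "secretKey": "YOUR_SECRET_KEY"
      }
    },
    "mergeField": "id",
    "newFields": [
      {
        "fieldName": "my_field1",
        "dataType": "VARCHAR",
        "params": {
          "maxLength": 512
        }
      }
    ]
  }'
```

The only differences from the stage-based request are the dataSource block: type becomes s3, stageName is dropped, and the credential object supplies the bucket's access and secret keys.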
Responses: 200 (application/json)
code (integer)

Response code.

Example Value: 0
data (array)

Response payload, which carries the IDs of the created data-merge jobs.

data[] (object)

A created data-merge job.

jobId (string)

The ID of the current data-merge job.

Returns an error message.

code (integer)

Response code.

message (string)

Error message.

{
  "code": 0,
  "data": {
    "jobId": "job-xxxxxxxxxxxxxxxxxxxxx"
  }
}
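On success, you typically want to capture the job ID for later status checks. A minimal sketch using the sample response above, assuming python3 is available for JSON parsing (swap in jq if you prefer); in practice, assign RESPONSE from the curl output:

```shell
# Sample response from the merge request; in a real script this would be
# RESPONSE=$(curl ...) capturing the API output.
RESPONSE='{"code": 0, "data": {"jobId": "job-xxxxxxxxxxxxxxxxxxxxx"}}'

# Extract data.jobId from the JSON response.
JOB_ID=$(printf '%s' "$RESPONSE" | python3 -c 'import json, sys; print(json.load(sys.stdin)["data"]["jobId"])')
echo "$JOB_ID"
# job-xxxxxxxxxxxxxxxxxxxxx
```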