Merge Data (Private Preview)
You can use this interface to add fields with or without data to an existing collection.
- If you choose to add fields with data, upload the data file to an AWS S3 bucket or a Zilliz Cloud Stage and ensure that the data shares the same merge key as the source collection. The merged data is stored in a new collection that you specify.
- If you choose to add fields without data, the new fields are created with null values.
The base URL for this API is in the following format:
https://api.cloud.zilliz.com
The endpoints on the control plane currently support up to 20 requests per second per user per endpoint.
If you encounter any issues with this endpoint, contact Zilliz Cloud support.
export BASE_URL="https://api.cloud.zilliz.com"
The authentication token should be an API key with appropriate privileges.
Name of the cluster that holds the target collection of this operation.
Name of the database that holds the target collection of this operation.
Name of the target collection of this operation.
Name of the database that holds the collection to create.
Name of the collection to create. This collection will hold the merged data.
The data to be merged into the collection specified above. You need to upload the data file in PARQUET format to an AWS S3 bucket or a Zilliz Cloud Stage as the data source and then provide the URL of the data file, optionally with access credentials.
The type of the data source. When using a stage, set this parameter to stage.
The name of a Zilliz Cloud stage that holds the PARQUET file. This parameter applies only when you set type to stage. For details on how to create a stage, see the documentation for the Create Stage operation.
The URL of the PARQUET file to merge with the existing collection.
The data merging operation is similar to a LEFT JOIN in a relational database system, with the merge field serving as the shared key between the source collection and the Parquet file containing the column-wise data. Provide the name of this shared key as the merge field; it must be present in both the existing collection and the Parquet file. In most cases, you can use the primary key as the merge field.
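To see why this behaves like a LEFT JOIN, the following stand-alone sketch uses the Unix join tool on two plain-text files that stand in for the collection and the Parquet data. The file names and values are made up for illustration; column 1 plays the role of the merge field.

```shell
# Illustration only (not the API): merge semantics as a LEFT JOIN on the shared key.
printf 'id1 vec_a\nid2 vec_b\nid3 vec_c\n' > collection.txt   # rows of the existing collection
printf 'id1 hello\nid2 world\n' > newdata.txt                 # new column data from the Parquet file
# -a 1 keeps every collection row even without a match; -e NULL fills the gap,
# mirroring how unmatched rows receive null values for the new field.
join -a 1 -e NULL -o 0,1.2,2.2 collection.txt newdata.txt
```

The last output line reads "id3 vec_c NULL": id3 has no counterpart in the new data, so its new field is null, while every row of the original collection is preserved.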
The schema of the fields to create in the new collection. The schema should be an array of field schemas.
The schema of a field to add.
Name of the current field to add.
Data type of the current field to add.
Extra settings for the current field to add.
The maximum length of a VARCHAR field value. This parameter is applicable only when dataType is set to VARCHAR.
Name of the cluster that holds the target collection of this operation.
Name of the database that holds the target collection of this operation.
Name of the target collection of this operation.
Name of the database that holds the collection to create.
Name of the collection to create. This collection will hold the merged data.
The data to be merged into the collection specified above. You need to upload the data file in PARQUET format to an AWS S3 bucket or a Zilliz Cloud Stage as the data source and then provide the URL of the data file, optionally with access credentials.
The type of the data source. Set this parameter to s3 when you use an AWS S3 bucket.
The URL of the PARQUET file to merge with the existing collection.
The credentials to access the bucket that holds the PARQUET file. This parameter applies only when you set type to s3.
The access key of the bucket that holds the PARQUET file.
The secret key of the bucket that holds the PARQUET file.
The data merging operation is similar to a LEFT JOIN in a relational database system, with the merge field serving as the shared key between the source collection and the Parquet file containing the column-wise data. Provide the name of this shared key as the merge field; it must be present in both the existing collection and the Parquet file. In most cases, you can use the primary key as the merge field.
The schema of the fields to add to the existing collection. The schema should be an array of field schemas.
The schema of a field to add.
Name of the current field to add.
Data type of the current field to add.
Extra settings for the current field to add.
The maximum length of a VARCHAR field value. This parameter is applicable only when dataType is set to VARCHAR.
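For the S3 variant, a request body might look like the sketch below. All values are placeholders, and the exact nesting of the credential fields (here accessKey and secretKey under credential) is an assumption inferred from the parameter descriptions above, not a confirmed schema.

```shell
# Sketch of an S3-based merge payload; placeholders throughout, credential nesting assumed.
PAYLOAD='{
  "clusterId": "in00-xxxxxxxxxxxxxxx",
  "dbName": "my_database",
  "collectionName": "my_collection",
  "destDbName": "my_database",
  "destCollectionName": "my_merged_collection",
  "dataSource": {
    "type": "s3",
    "dataPath": "https://my-bucket.s3.amazonaws.com/path/to/data.parquet",
    "credential": {
      "accessKey": "YOUR_ACCESS_KEY",
      "secretKey": "YOUR_SECRET_KEY"
    }
  },
  "mergeField": "id",
  "newFields": [
    {
      "fieldName": "my_field1",
      "dataType": "VARCHAR",
      "params": { "maxLength": 512 }
    }
  ]
}'

# Check that the payload is well-formed JSON before sending it.
echo "$PAYLOAD" | python3 -m json.tool > /dev/null && echo "payload OK"
```

You can then send the payload with the same curl invocation shown in the example below, passing -d "$PAYLOAD" as the request body.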
export TOKEN="YOUR_API_KEY"
curl --request POST \
--url "${BASE_URL}/v2/etl/merge" \
--header "Authorization: Bearer ${TOKEN}" \
--header "Content-Type: application/json" \
-d '{
"clusterId": "in00-xxxxxxxxxxxxxxx",
"dbName": "my_database",
"collectionName": "my_collection",
"destDbName": "my_database",
"destCollectionName": "my_merged_collection",
"dataSource": {
"type": "stage",
"stageName": "my_stage",
"dataPath": "/path/to/your/data.parquet"
},
"mergeField": "id",
"newFields": [
{
"fieldName": "my_field1",
"dataType": "VARCHAR",
"params": {
"maxLength": 512
}
}
]
}'
Response code.
Response payload that carries the ID of the created data-merge job.
A created data-merge job.
The ID of the current data-merge job.
Returns an error message.
Response code.
Error message.
{
"code": 0,
"data": {
"jobId": "job-xxxxxxxxxxxxxxxxxxxxx"
}
}
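If you want to capture the returned job ID for later use, a minimal sketch follows. The RESPONSE variable stands in for the body returned by the curl call above; in practice you would capture curl's output instead of hard-coding it.

```shell
# Sketch: extract data.jobId from a successful response body.
RESPONSE='{"code": 0, "data": {"jobId": "job-xxxxxxxxxxxxxxxxxxxxx"}}'
JOB_ID=$(echo "$RESPONSE" | python3 -c 'import json,sys; print(json.load(sys.stdin)["data"]["jobId"])')
echo "$JOB_ID"   # prints job-xxxxxxxxxxxxxxxxxxxxx
```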