Merge Data (Private Preview)
You can use this interface to add fields with or without data to an existing collection.
- If you choose to add fields with data, upload the data file to an AWS S3 bucket or a Zilliz Cloud Stage and ensure that the data shares the same merge key as the source collection. The merged data is stored in a new collection that you specify.
- If you choose to add fields without data, the new fields are created with null values.
The base URL for this API is in the following format:
https://api.cloud.zilliz.com
The endpoints on the control plane currently support up to 20 requests per second per user per endpoint.
If you encounter any issues with this endpoint, contact Zilliz Cloud support.
export BASE_URL="https://api.cloud.zilliz.com"
The authentication token should be an API key with appropriate privileges.
Name of the cluster that holds the target collection of this operation.
Name of the database that holds the target collection of this operation.
Name of the target collection of this operation.
Name of the database that holds the collection to create.
Name of the collection to create. This collection will hold the merged data.
The data to be merged into the collection specified above. You need to upload the data file in PARQUET format to an AWS S3 bucket or a Zilliz Cloud Stage as the data source and then provide the URL of the data file, optionally with access credentials.
The type of the data source. When using a stage, set this parameter to stage.
The name of a Zilliz Cloud stage that holds the PARQUET file. This parameter applies only when you set type to stage. For details on how to create a stage, see the documentation for the Create Stage operation.
The URL of the PARQUET file to merge with the existing collection.
The data merging operation is similar to a LEFT JOIN in a relational database system, with the merge field serving as the shared key between the source collection and the Parquet file containing the column-wise data. Provide the name of this shared key as the merge field; it must be present in both the existing collection and the Parquet file. In most cases, you can use the primary key as the merge field.
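To see why this behaves like a LEFT JOIN, the following stand-alone sketch uses the Unix join tool on two plain-text files that stand in for the collection and the Parquet data. The file names and values are made up for illustration; column 1 plays the role of the merge field.

```shell
# Illustration only (not the API): merge semantics as a LEFT JOIN on the shared key.
printf 'id1 vec_a\nid2 vec_b\nid3 vec_c\n' > collection.txt   # rows of the existing collection
printf 'id1 hello\nid2 world\n' > newdata.txt                 # new column data from the Parquet file
# -a 1 keeps every collection row even without a match; -e NULL fills the gap,
# mirroring how unmatched rows receive null values for the new field.
join -a 1 -e NULL -o 0,1.2,2.2 collection.txt newdata.txt
```

The last output line reads "id3 vec_c NULL": id3 has no counterpart in the new data, so its new field is null, while every row of the original collection is preserved.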
The schema of the fields to create in the new collection. The schema should be an array of field schemas.
The schema of a field to add.
Name of the current field to add.
Data type of the current field to add.
Extra settings for the current field to add.
The maximum length of a VARCHAR field value. This parameter is applicable only when dataType is set to VARCHAR.
Name of the cluster that holds the target collection of this operation.
Name of the database that holds the target collection of this operation.
Name of the target collection of this operation.
Name of the database that holds the collection to create.
Name of the collection to create. This collection will hold the merged data.
The data to be merged into the collection specified above. You need to upload the data file in PARQUET format to an AWS S3 bucket or a Zilliz Cloud Stage as the data source and then provide the URL of the data file, optionally with access credentials.
The type of the data source. Set this parameter to s3 when you use an AWS S3 bucket.
The URL of the PARQUET file to merge with the existing collection.
The credentials to access the bucket that holds the PARQUET file. This parameter applies only when you set type to s3.
The access key of the bucket that holds the PARQUET file.
The secret key of the bucket that holds the PARQUET file.
The data merging operation is similar to a LEFT JOIN in a relational database system, with the merge field serving as the shared key between the source collection and the Parquet file containing the column-wise data. Provide the name of this shared key as the merge field; it must be present in both the existing collection and the Parquet file. In most cases, you can use the primary key as the merge field.
The schema of the fields to add to the existing collection. The schema should be an array of field schemas.
The schema of a field to add.
Name of the current field to add.
Data type of the current field to add.
Extra settings for the current field to add.
The maximum length of a VARCHAR field value. This parameter is applicable only when dataType is set to VARCHAR.
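For the S3 variant, a request body might look like the sketch below. All values are placeholders, and the exact nesting of the credential fields (here accessKey and secretKey under credential) is an assumption inferred from the parameter descriptions above, not a confirmed schema.

```shell
# Sketch of an S3-based merge payload; placeholders throughout, credential nesting assumed.
PAYLOAD='{
  "clusterId": "in00-xxxxxxxxxxxxxxx",
  "dbName": "my_database",
  "collectionName": "my_collection",
  "destDbName": "my_database",
  "destCollectionName": "my_merged_collection",
  "dataSource": {
    "type": "s3",
    "dataPath": "https://my-bucket.s3.amazonaws.com/path/to/data.parquet",
    "credential": {
      "accessKey": "YOUR_ACCESS_KEY",
      "secretKey": "YOUR_SECRET_KEY"
    }
  },
  "mergeField": "id",
  "newFields": [
    {
      "fieldName": "my_field1",
      "dataType": "VARCHAR",
      "params": { "maxLength": 512 }
    }
  ]
}'

# Check that the payload is well-formed JSON before sending it.
echo "$PAYLOAD" | python3 -m json.tool > /dev/null && echo "payload OK"
```

You can then send the payload with the same curl invocation shown in the example below, passing -d "$PAYLOAD" as the request body.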
export TOKEN="YOUR_API_KEY"
curl --request POST \
--url "${BASE_URL}/v2/etl/merge" \
--header "Authorization: Bearer ${TOKEN}" \
--header "Content-Type: application/json" \
-d '{
"clusterId": "in00-xxxxxxxxxxxxxxx",
"dbName": "my_database",
"collectionName": "my_collection",
"destDbName": "my_database",
"destCollectionName": "my_merged_collection",
"dataSource": {
"type": "stage",
"stageName": "my_stage",
"dataPath": "/path/to/your/data.parquet"
},
"mergeField": "id",
"newFields": [
{
"fieldName": "my_field1",
"dataType": "VARCHAR",
"params": {
"maxLength": 512
}
}
]
}'
Response code.
Response payload that carries the ID of the created data-merge job.
A created data-merge job.
The ID of the current data-merge job.
Returns an error message.
Response code.
Error message.
{
"code": 0,
"data": {
"jobId": "job-xxxxxxxxxxxxxxxxxxxxx"
}
}
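If you want to capture the returned job ID for later use, a minimal sketch follows. The RESPONSE variable stands in for the body returned by the curl call above; in practice you would capture curl's output instead of hard-coding it.

```shell
# Sketch: extract data.jobId from a successful response body.
RESPONSE='{"code": 0, "data": {"jobId": "job-xxxxxxxxxxxxxxxxxxxxx"}}'
JOB_ID=$(echo "$RESPONSE" | python3 -c 'import json,sys; print(json.load(sys.stdin)["data"]["jobId"])')
echo "$JOB_ID"   # prints job-xxxxxxxxxxxxxxxxxxxxx
```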