
VolumeBulkWriter

Added v2.6.x

A VolumeBulkWriter instance rewrites your raw data locally into files in a format that Zilliz Cloud can import, and then uploads those files to a remote volume in Zilliz Cloud.

class pymilvus.bulk_writer.VolumeBulkWriter(LocalBulkWriter)

Constructor

VolumeBulkWriter(
    schema: CollectionSchema,
    remote_path: str,
    cloud_endpoint: str,
    api_key: str,
    volume_name: str,
    chunk_size: int = 1024 * MB,
    file_type: BulkFileType = BulkFileType.PARQUET,
    config: Optional[dict] = None,
    **kwargs,
)

PARAMETERS:

  • schema (CollectionSchema) -

    [REQUIRED]

    The schema of a target collection to which the rewritten data is to be imported.

  • remote_path (str) -

    [REQUIRED]

    The path to the directory in the remote volume that is to hold the rewritten data.

  • cloud_endpoint (str) -

    [REQUIRED]

    The endpoint URL of the Zilliz Cloud instance.

  • api_key (str) -

    [REQUIRED]

    The API key used to authenticate with the Zilliz Cloud instance.

  • volume_name (str) -

    [REQUIRED]

    The name of the remote volume in Zilliz Cloud to which the files are uploaded.

  • chunk_size (int) -

    The maximum size of a file segment.

    While rewriting your raw data, the writer splits it into batches and stores each batch in a separate file.

    The value defaults to 1,073,741,824 bytes (1 GB).

  • file_type (BulkFileType) -

    The file type of the output files.

    The value defaults to BulkFileType.PARQUET.

  • config (dict) -

    Optional configuration parameters for the bulk writer.
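The default chunk_size of 1024 * MB works out to 1,073,741,824 bytes when MB is 1024 * 1024 bytes. The following is a minimal sketch of how a size cap splits rows into separate batches (one file per batch). It is a hypothetical illustration, not the pymilvus implementation: `split_into_chunks` is a made-up name, and each row's size is estimated from its JSON encoding, which is an assumption made only for this example.

```python
import json

MB = 1024 * 1024  # same arithmetic as the default chunk_size = 1024 * MB


def split_into_chunks(rows, chunk_size):
    """Group rows into batches whose estimated size stays within chunk_size bytes.

    Hypothetical illustration: row size is estimated from the row's JSON
    encoding, which is NOT how pymilvus measures buffered data.
    """
    chunks, current, current_size = [], [], 0
    for row in rows:
        row_size = len(json.dumps(row).encode("utf-8"))
        # Start a new batch if adding this row would exceed the size limit.
        if current and current_size + row_size > chunk_size:
            chunks.append(current)
            current, current_size = [], 0
        current.append(row)
        current_size += row_size
    # Flush the last, partially filled batch.
    if current:
        chunks.append(current)
    return chunks


# Each of these rows encodes to 41 bytes, so a 100-byte cap fits two rows per batch.
rows = [{"id": i, "vector": [0.1] * 4} for i in range(10)]
chunks = split_into_chunks(rows, 100)
print(len(chunks))  # 5
```

With the real default of 1 GB, a batch only closes once roughly a gigabyte of data has accumulated, so small imports typically produce a single file per commit.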

Notes

A VolumeBulkWriter is a context manager and can be used in a with statement. When the context exits, the local working directory is cleaned up.
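The cleanup-on-exit behavior follows Python's standard context-manager protocol. The sketch below shows that pattern with a hypothetical `LocalWorkDir` class that creates a temporary working directory on entry and removes it on exit. It only illustrates the idea; it is not the VolumeBulkWriter source, and the staged file name is made up.

```python
import os
import shutil
import tempfile


class LocalWorkDir:
    """Hypothetical stand-in showing the cleanup-on-exit pattern."""

    def __init__(self):
        self.path = None

    def __enter__(self):
        # Create a private local working directory for intermediate files.
        self.path = tempfile.mkdtemp(prefix="bulk_writer_")
        return self

    def __exit__(self, exc_type, exc, tb):
        # Remove the working directory whether or not the block raised.
        shutil.rmtree(self.path, ignore_errors=True)
        return False  # do not suppress exceptions


with LocalWorkDir() as work:
    # Stage a placeholder file, as a writer would before uploading it.
    staged = os.path.join(work.path, "1.parquet")
    with open(staged, "wb") as f:
        f.write(b"placeholder")
    print(os.path.exists(staged))  # True

print(os.path.exists(work.path))  # False: removed when the block exited
```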

Properties

The following are the properties of the VolumeBulkWriter class.

  • data_path (str)

    Returns the remote path where the uploaded files are stored.

  • batch_files (List[List[str]])

    Returns the list of uploaded file batches. Each inner list contains the remote paths of files uploaded in a single commit.
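Because batch_files is a list of lists, code that consumes it usually either walks it per commit or flattens it. The sketch below uses a hand-written value with hypothetical paths (not real writer output) to show both patterns; `summarize_batches` is an illustrative helper, not part of pymilvus.

```python
from typing import List

# Illustrative value only: these paths are hypothetical, not actual writer output.
batch_files: List[List[str]] = [
    ["/data/bulk_import/commit-1/1.parquet", "/data/bulk_import/commit-1/2.parquet"],
    ["/data/bulk_import/commit-2/1.parquet"],
]


def summarize_batches(batches: List[List[str]]) -> List[str]:
    """Return one line per commit describing how many files it uploaded."""
    return [f"commit {i}: {len(files)} file(s)" for i, files in enumerate(batches, start=1)]


# Flatten the nested structure when a single list of remote paths is needed.
all_files = [path for files in batch_files for path in files]

for line in summarize_batches(batch_files):
    print(line)
print(len(all_files))  # 3
```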

Examples

from pymilvus import CollectionSchema, DataType, FieldSchema
from pymilvus.bulk_writer import BulkFileType
from pymilvus.bulk_writer.volume_bulk_writer import VolumeBulkWriter

# Define the collection schema
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=False),
    FieldSchema(name="vector", dtype=DataType.FLOAT_VECTOR, dim=128),
]
schema = CollectionSchema(fields, "example_collection")

# Create a VolumeBulkWriter; the local working directory is cleaned up on exit
with VolumeBulkWriter(
    schema=schema,
    remote_path="/data/bulk_import",
    cloud_endpoint="https://your-cloud-endpoint.zillizcloud.com",
    api_key="your-api-key",
    volume_name="my-volume",
    chunk_size=1024 * 1024 * 1024,  # 1 GB per file segment
    file_type=BulkFileType.PARQUET,
) as writer:
    # Append rows that match the schema
    for i in range(1000):
        writer.append_row({
            "id": i,
            "vector": [0.1] * 128,
        })

    # Rewrite the buffered rows into files and upload them to the volume
    writer.commit()

    # Inspect where the files were uploaded
    print(writer.data_path)
    print(writer.batch_files)
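The example appends the same constant vector for every row. To produce varied test data, rows can be generated with random vectors of the dimension declared in the schema. The sketch below assumes the schema from the example (an INT64 id and a 128-dimensional float vector); `generate_rows` and `check_row` are hypothetical helpers, and the length check is an extra safeguard written for illustration, not a description of what append_row validates.

```python
import random
from typing import Dict, Iterator, List, Union

DIM = 128  # must match dim=128 of the "vector" field in the example schema

Row = Dict[str, Union[int, List[float]]]


def generate_rows(count: int, dim: int = DIM, seed: int = 0) -> Iterator[Row]:
    """Yield rows shaped like the example schema: an INT64 id and a float vector."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    for i in range(count):
        yield {"id": i, "vector": [rng.random() for _ in range(dim)]}


def check_row(row: Row, dim: int = DIM) -> None:
    """Hypothetical pre-check: reject rows whose vector length does not match the schema."""
    vector = row["vector"]
    if not isinstance(vector, list) or len(vector) != dim:
        raise ValueError(f"row {row['id']}: expected a vector of length {dim}")


for row in generate_rows(3):
    check_row(row)
    print(row["id"], len(row["vector"]))
```

Inside the with block, the append loop could then iterate over `generate_rows(1000)`, call `check_row(row)`, and pass each row to `writer.append_row(row)`.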