
Manage Stages
Private Preview

A stage is an intermediate storage location that holds your data for further processing, such as data import, merging, or migration. This page explains what a stage is on Zilliz Cloud and how to use it to manage your data.

📘Note

To import, merge, or migrate data from a stage to a cluster, ensure that the stage and the cluster are within the same cloud region.

Overview

When using a Zilliz Cloud stage, you upload data from a supported external source, such as local files or third-party object storage, to create files in the stage for further processing. The following diagram shows the major application scenarios of Zilliz Cloud stages.

[Diagram: major application scenarios of Zilliz Cloud stages]

You can use stages in data import, data migration, and data merging, all of which need to fetch data from external sources but use the fetched data in different ways.

  • Data import

    During data import, you can upload prepared datasets into a stage and import them from the stage into a Zilliz Cloud collection. For details, refer to Import Data (RESTful API) and Import Data (SDK).

  • Data merging

    You can merge the data in an existing Zilliz Cloud collection with data from a local file uploaded to a stage, creating a collection that combines data from both sources. For details, refer to Merge Data.

  • Data migration

    In data migration, you upload backup files of your Milvus instance into a stage and use the staged data to restore the Milvus instance as a Zilliz Cloud cluster. For details, refer to Migrate from Milvus to Zilliz Cloud Via Stage.

Create, list, and delete stages

You can manage the lifecycle of stages to suit your service requirements: create stages, list the available stages, and delete stages you no longer need.

Initialize a stage manager

A stage manager maintains the connection to Zilliz Cloud's Stage service. You need to initialize a stage manager before managing stages.

from pymilvus.bulk_writer.stage_manager import StageManager

stage_manager = StageManager(
    cloud_endpoint="https://api.cloud.zilliz.com",
    api_key="YOUR_API_KEY"
)
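Hard-coding the API key, as in the snippet above, is convenient for a quick test. A common alternative is to read credentials from environment variables. The following is a minimal sketch under that approach. The variable names ZILLIZ_API_KEY and ZILLIZ_CLOUD_ENDPOINT and the helper stage_manager_kwargs are hypothetical. The helper only builds the two keyword arguments that the StageManager call above already uses.

```python
import os

# Hypothetical environment variable names; rename them to suit your setup.
API_KEY_VAR = "ZILLIZ_API_KEY"
ENDPOINT_VAR = "ZILLIZ_CLOUD_ENDPOINT"
DEFAULT_ENDPOINT = "https://api.cloud.zilliz.com"


def stage_manager_kwargs() -> dict:
    """Return cloud_endpoint and api_key for StageManager, read from the environment."""
    api_key = os.environ.get(API_KEY_VAR)
    if not api_key:
        # Fail early instead of sending requests with an empty key.
        raise RuntimeError(f"Set the {API_KEY_VAR} environment variable first")
    return {
        # Fall back to the endpoint used in the example above.
        "cloud_endpoint": os.environ.get(ENDPOINT_VAR, DEFAULT_ENDPOINT),
        "api_key": api_key,
    }
```

You would then create the manager with StageManager(**stage_manager_kwargs()). The same two arguments also appear in the StageFileManager call used for uploads, together with stage_name.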

Create a stage

A stage is specific to a Zilliz Cloud project. When creating a stage, you need to provide the project ID, region ID, and the name of the stage, as follows:

stage_manager.create_stage(
    project_id="proj-xxxxxxxxxxxxxxxxxxxxxxx",
    region_id="aws-us-west-1",
    stage_name="my_stage"
)

print(f"\nStage my_stage created")

# Stage my_stage created

List stages

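To list the stages that already exist in a Zilliz Cloud project, do as follows. The results are paginated: current_page selects the page to return, and page_size sets the maximum number of stages per page.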

stage_list = stage_manager.list_stages(
    project_id="proj-xxxxxxxxxxxxxxxxxxxxxxx",
    current_page=1,
    page_size=10
)

print(f"\nlistStages results: \n", stage_list.json()['data'])

# listStages results:
#
# {
# "count": 1,
# "currentPage": 1,
# "pageSize": 10,
# "stages": [
# {
# "stageName": "my_stage"
# }
# ]
# }
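A project can hold more stages than fit on one page. One way to collect every stage name is to request pages until a short page comes back. The following is a minimal sketch. It assumes that pages start at 1, as in the example above, and that a page with fewer than page_size entries is the last one. The helpers list_all_stage_names and make_fetch_page are hypothetical and are not part of pymilvus.

```python
from typing import Callable, Dict, List

# fetch_page(current_page, page_size) returns the "data" object of one
# listStages response, shaped like the sample output above.
FetchPage = Callable[[int, int], Dict]


def list_all_stage_names(fetch_page: FetchPage, page_size: int = 10) -> List[str]:
    """Walk the pages (starting at 1) and collect every stageName."""
    if page_size < 1:
        raise ValueError("page_size must be at least 1")
    names: List[str] = []
    current_page = 1
    while True:
        data = fetch_page(current_page, page_size)
        stages = data.get("stages", [])
        names.extend(stage["stageName"] for stage in stages)
        # Assumption: a page shorter than page_size is the last one.
        if len(stages) < page_size:
            return names
        current_page += 1


def make_fetch_page(stage_manager, project_id: str) -> FetchPage:
    """Wrap the list_stages call shown above so it matches FetchPage."""
    def fetch_page(current_page: int, page_size: int) -> Dict:
        response = stage_manager.list_stages(
            project_id=project_id,
            current_page=current_page,
            page_size=page_size,
        )
        return response.json()["data"]
    return fetch_page
```

For example: names = list_all_stage_names(make_fetch_page(stage_manager, "proj-xxxxxxxxxxxxxxxxxxxxxxx")). The loop stops on a short page instead of relying on count, because the sample output does not make clear whether count is the total number of stages or the number on the current page.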

Delete a stage

You can delete a stage once it is no longer needed. To delete a stage, do as follows:

stage_manager.delete_stage(
    stage_name="my_stage"
)

print(f"\nStage my_stage deleted")

# Stage my_stage deleted
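A stage that serves a single job, such as one import, may be worth removing even if your script fails partway through. The following is a minimal sketch that wraps the create_stage and delete_stage calls from the examples above in a context manager. It assumes those calls behave as shown. The helper name temporary_stage is hypothetical and is not part of pymilvus.

```python
from contextlib import contextmanager


@contextmanager
def temporary_stage(stage_manager, project_id: str, region_id: str, stage_name: str):
    """Create a stage, yield its name, and delete it when the with-block exits."""
    stage_manager.create_stage(
        project_id=project_id,
        region_id=region_id,
        stage_name=stage_name,
    )
    try:
        yield stage_name
    finally:
        # Runs on normal exit and also when the with-block raises an exception.
        stage_manager.delete_stage(stage_name=stage_name)
```

Usage might look like: with temporary_stage(stage_manager, "proj-xxxxxxxxxxxxxxxxxxxxxxx", "aws-us-west-1", "my_stage") as name: followed by the upload and import steps. Use this pattern only for stages whose files you do not need after the block exits. As an assumption, make sure any import, merge, or migration that reads from the stage has finished before leaving the with block.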

Upload data into a stage

Once a stage is ready, you can upload data to it.

Initialize a stage file manager

A stage file manager maintains the connection to a specific stage on Zilliz Cloud's Stage service. You need to initialize a stage file manager before uploading files to the stage.

from pymilvus.bulk_writer.stage_file_manager import StageFileManager

stage_file_manager = StageFileManager(
    cloud_endpoint='https://api.cloud.zilliz.com',
    api_key='YOUR_API_KEY',
    stage_name='my_stage',
)

Upload files

Once the stage file manager is ready, use it to upload files to the specified stage. The following example uploads the local file at source_file_path to the location given by target_stage_path (here, data/) within the stage.

result = stage_file_manager.upload_file_to_stage(
    source_file_path="/path/to/your/local/data/file",
    target_stage_path="data/"
)

print(f"\nuploadFileToStage results: {result}")

# uploadFileToStage results:
#
# {
# "stageName": "my_stage",
# "path": "data/"
# }
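upload_file_to_stage, as used above, takes a single source file path. To stage a whole local folder, you could loop over its files yourself. The following is a minimal sketch. It assumes that target_stage_path names a folder in the stage, as data/ does above, and that each uploaded file keeps its own name there. The helpers plan_uploads and upload_directory are hypothetical.

```python
from pathlib import Path
from typing import Callable, List, Tuple


def plan_uploads(local_dir: str, target_prefix: str = "data/") -> List[Tuple[str, str]]:
    """Pair each file under local_dir with a stage folder that mirrors its subfolder."""
    root = Path(local_dir)
    plan = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel_dir = path.relative_to(root).parent.as_posix()  # "." for top-level files
        parts = [part for part in (target_prefix.strip("/"), rel_dir) if part and part != "."]
        target = ("/".join(parts) + "/") if parts else ""
        plan.append((str(path), target))
    return plan


def upload_directory(upload_fn: Callable, local_dir: str, target_prefix: str = "data/") -> list:
    """Call upload_fn once per file, using the keyword arguments from the example above."""
    return [
        upload_fn(source_file_path=source, target_stage_path=target)
        for source, target in plan_uploads(local_dir, target_prefix)
    ]
```

With the stage file manager above, this might be called as upload_directory(stage_file_manager.upload_file_to_stage, "/path/to/your/local/data"). Passing the upload function as an argument keeps the path-mapping logic separate from the network call, so you can check the plan before any data is uploaded.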