
Manage Snapshots
Private Preview

In this guide, you will learn how to create and manage snapshots, including how to list, describe, pin, restore, and drop them, and how to track restoration jobs.

Create snapshot

Before creating a snapshot, you are advised to stop writing data to the target collection and call flush() to avoid possible data loss.

📘Notes

Calling flush() is not mandatory but highly recommended to avoid data loss. If you skip this, the snapshot contains only the data that has already been flushed.

When naming a snapshot, use a clear, descriptive name, such as "daily_backup_20240101" or "v2.1_production_release", and avoid generic names, such as "backup1" or "test". Descriptive names help you distinguish snapshots across versions, environments, and stages.
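To keep names consistent, you can generate them in code instead of typing them by hand. The following is a minimal sketch; make_snapshot_name and its parameters are hypothetical and not part of pymilvus.

```python
from datetime import date


def make_snapshot_name(purpose: str, day: date, environment: str = "") -> str:
    """Build a descriptive snapshot name (hypothetical helper, not a pymilvus API)."""
    parts = [environment, purpose, day.strftime("%Y%m%d")]
    # Drop empty parts so an omitted environment leaves no stray underscore.
    return "_".join(part for part in parts if part)


print(make_snapshot_name("daily_backup", date(2024, 1, 1)))
print(make_snapshot_name("daily_backup", date(2024, 1, 1), environment="prod"))
```

The first call prints daily_backup_20240101, which matches the naming style recommended above.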

The code examples below assume that you already have a collection named my_collection.

from pymilvus import MilvusClient

client = MilvusClient(
    uri="YOUR_CLUSTER_ENDPOINT",
    token="YOUR_CLUSTER_TOKEN"
)

# Recommended: Flush data before creating snapshot to ensure all data is included
client.flush(collection_name="my_collection")

# Create snapshot for entire collection
client.create_snapshot(
    collection_name="my_collection",
    snapshot_name="backup_20240101",
    description="Daily backup for January 1st, 2024"
)

List snapshots

You can list the names of existing snapshots.

# List all snapshots for a collection
snapshots = client.list_snapshots(
    collection_name="my_collection"
)
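A common use of the list is to check whether a name is already taken before you create a snapshot. The following is a minimal sketch; snapshot_exists is a hypothetical helper, and it assumes that list_snapshots() returns an iterable of snapshot name strings.

```python
def snapshot_exists(client, collection_name: str, snapshot_name: str) -> bool:
    """Return True if snapshot_name is already listed for the collection.

    Hypothetical helper. Assumption: client.list_snapshots() returns an
    iterable of snapshot name strings.
    """
    names = client.list_snapshots(collection_name=collection_name)
    return snapshot_name in set(names)
```

You can call snapshot_exists(client, "my_collection", "backup_20240101") and create the snapshot only when the call returns False.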

Describe snapshot

You can get detailed information about a specific snapshot.

snapshot_info = client.describe_snapshot(
    snapshot_name="backup_20240101",
    include_collection_info=True
)

print(f"Snapshot ID: {snapshot_info.id}")
print(f"Collection: {snapshot_info.collection_name}")
print(f"Created: {snapshot_info.create_ts}")
print(f"Description: {snapshot_info.description}")

Pin/unpin snapshot data

During restoration, you can pin a snapshot to temporarily protect its underlying data from garbage collection, and unpin it to release the data.

You can also set a time-to-live (TTL) duration for the pin operation so that the pinned data will be released when the duration expires.

pin_id = client.pin_snapshot_data(
    snapshot_name="backup_20240101",
    collection_name="my_collection",
    ttl_seconds=3600,
)

client.unpin_snapshot_data(
    pin_id=pin_id
)
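If the code between the pin and unpin calls raises an exception, the pin stays in place until its TTL expires. One way to guard against this is to wrap the two calls in a context manager. The following is a minimal sketch; pinned_snapshot is a hypothetical helper that only calls pin_snapshot_data() and unpin_snapshot_data() with the parameters shown above.

```python
from contextlib import contextmanager


@contextmanager
def pinned_snapshot(client, snapshot_name: str, collection_name: str,
                    ttl_seconds: int = 3600):
    """Pin snapshot data for the duration of a with-block (hypothetical helper)."""
    pin_id = client.pin_snapshot_data(
        snapshot_name=snapshot_name,
        collection_name=collection_name,
        ttl_seconds=ttl_seconds,
    )
    try:
        yield pin_id
    finally:
        # Runs even if the with-block raises, so the pin is released
        # explicitly instead of waiting for the TTL to expire.
        client.unpin_snapshot_data(pin_id=pin_id)
```

With this helper, you can write with pinned_snapshot(client, "backup_20240101", "my_collection"): and place the work that depends on the pinned data inside the block.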

Restore snapshot

You can restore a snapshot to a new collection. This operation is asynchronous and returns a job ID for tracking the restoration progress.

The restoration uses a copy-segment mechanism instead of data import, which is more efficient because it

  • directly copies segment files (binlogs, deltalogs, index files) from snapshot storage

  • preserves field IDs and index IDs to ensure compatibility with existing data files

  • avoids data rewriting and index rebuilding, resulting in significantly faster restore times, and

  • restores data 10 to 100 times faster than traditional backup and restore methods

To restore a snapshot, do as follows:

# Restore snapshot to new collection
job_id = client.restore_snapshot(
    snapshot_name="backup_20240101",
    collection_name="restored_collection",
)

For details on monitoring the progress of a restoration job, refer to Monitor restoration progress.

Drop snapshot

You can drop a snapshot if it is no longer needed. You are advised to remove old snapshots regularly to save storage.

client.drop_snapshot(
    snapshot_name="backup_20240101"
)
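If your snapshot names end with a date, you can automate the regular cleanup. The following is a minimal sketch of a retention rule; snapshots_to_drop is a hypothetical helper, and it assumes that dated snapshot names end with an underscore and a YYYYMMDD date, as in "backup_20240101".

```python
import re

# Assumption: dated snapshot names end with "_YYYYMMDD".
_DATE_SUFFIX = re.compile(r"_(\d{8})$")


def snapshots_to_drop(names, keep_latest: int = 7):
    """Return dated snapshot names older than the keep_latest most recent ones."""
    dated = []
    for name in names:
        match = _DATE_SUFFIX.search(name)
        if match:
            # Names without a date suffix are never selected for removal.
            dated.append((match.group(1), name))
    # YYYYMMDD strings sort chronologically, so newest comes first here.
    dated.sort(reverse=True)
    return [name for _, name in dated[keep_latest:]]


names = ["backup_20240101", "backup_20240103", "backup_20240102",
         "v2.1_production_release"]
print(snapshots_to_drop(names, keep_latest=2))
```

The example prints ['backup_20240101']. You can then pass each returned name to drop_snapshot().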

List restoration jobs

You can list restoration jobs, either all of them or only those for a specific collection.

# List all restore jobs
jobs = client.list_restore_snapshot_jobs()

for job in jobs:
    print(f"Job {job.job_id}: {job.snapshot_name} -> Collection {job.collection_id}")
    print(f"  State: {job.state}, Progress: {job.progress}%")

# List restore jobs for a specific collection
jobs = client.list_restore_snapshot_jobs(collection_name="my_collection")
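When many restoration jobs exist, a tally per state can be easier to read than the full list. The following is a minimal sketch; count_jobs_by_state is a hypothetical helper, and it assumes only that each job exposes the state attribute used in the example above.

```python
from collections import Counter


def count_jobs_by_state(jobs) -> Counter:
    """Count restoration jobs per state (hypothetical helper)."""
    # str() keeps the keys printable whether state is a string or an enum.
    return Counter(str(job.state) for job in jobs)
```

For example, count_jobs_by_state(jobs).most_common() returns the states ordered by how many jobs are in each.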

Get restoration state

Once you have a restoration job ID, you can use it to retrieve restoration progress.

state = client.get_restore_snapshot_state(job_id=12345)

print(f"Job ID: {state.job_id}")
print(f"Snapshot Name: {state.snapshot_name}")
print(f"Collection ID: {state.collection_id}")
print(f"State: {state.state}")
print(f"Progress: {state.progress}%")
if state.state == "RestoreSnapshotFailed":
    print(f"Failure Reason: {state.reason}")
print(f"Time Cost: {state.time_cost}ms")
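Because restoration is asynchronous, you may want to block until a job finishes. The following is a minimal polling sketch. wait_for_restore is a hypothetical helper. The "RestoreSnapshotFailed" state comes from the example above, whereas the "RestoreSnapshotCompleted" state name, the polling interval, and the timeout are assumptions that you should adjust to what your cluster actually returns.

```python
import time


def wait_for_restore(client, job_id,
                     completed_state="RestoreSnapshotCompleted",  # assumed name
                     failed_state="RestoreSnapshotFailed",
                     poll_interval=5.0, timeout=3600.0):
    """Poll a restoration job until it completes, fails, or times out."""
    deadline = time.monotonic() + timeout
    while True:
        state = client.get_restore_snapshot_state(job_id=job_id)
        if state.state == completed_state:
            return state
        if state.state == failed_state:
            raise RuntimeError(f"Restore job {job_id} failed: {state.reason}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Restore job {job_id} did not finish in {timeout}s")
        # Wait before asking again to avoid flooding the cluster with requests.
        time.sleep(poll_interval)
```

You can pass the job ID returned by restore_snapshot() to this helper and read time_cost from the returned state once the job completes.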