
Cluster
Public Preview

A cluster is a set of compute resources that runs your vector database workloads. Zilliz Cloud offers two types: serving clusters, which run continuously for production workloads requiring always-on, low-latency access, and on-demand clusters, which spin up only when requests arrive and scale to zero when idle.

This topic describes how to create, view, and drop on-demand clusters.

📘Note

This feature is available only to Enterprise projects.

Currently, you can create on-demand clusters only in AWS us-west-2 (aws-us-west-2). For other regions, contact us.

Limitations

  • To manage an on-demand cluster, you need to be a Project Admin.

  • You can create up to 20 on-demand clusters per project.

  • An on-demand cluster can query up to 3 TB of raw data per 8 CUs (for example, up to 6 TB with 16 CUs). Queries that exceed this limit return an error.
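
You can use the per-8-CU limit for a quick sizing check. The following shell sketch is illustrative only. min_cu_for_tb is a hypothetical helper. It assumes the limit scales linearly (3 TB per 8 CUs) up to the 256-CU maximum described in the parameter table, which would cap a single cluster at 96 TB of raw data.

```shell
# Hypothetical helper (not part of any Zilliz tool): prints the smallest cuSize
# whose query limit covers a raw data volume given in whole terabytes.
# Assumes the limit scales linearly at 3 TB per 8 CUs, up to the 256-CU maximum.
min_cu_for_tb() {
  local tb="${1:-}" blocks cu
  case "$tb" in
    ''|*[!0-9]*|0[0-9]*)
      echo "usage: min_cu_for_tb <whole number of TB>" >&2
      return 2
      ;;
  esac
  # Round up to whole 8-CU blocks (3 TB each), with at least one block.
  blocks=$(( (tb + 2) / 3 ))
  if [ "$blocks" -lt 1 ]; then
    blocks=1
  fi
  cu=$(( blocks * 8 ))
  if [ "$cu" -gt 256 ]; then
    echo "error: ${tb} TB exceeds the 96 TB that 256 CUs could query" >&2
    return 1
  fi
  echo "$cu"
}

# Example: 7 TB needs 3 blocks of 8 CUs, so this prints 24.
min_cu_for_tb 7
```

Under this assumption, 4 TB already requires 16 CUs, because 8 CUs cover only 3 TB.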

Create an on-demand cluster

  • Via RESTful API

    export BASE_URL="https://api.cloud.zilliz.com"
    export TOKEN="YOUR_API_KEY"

    curl --request POST \
    --url "${BASE_URL}/v2/clusters/createOnDemandCluster" \
    --header "Authorization: Bearer ${TOKEN}" \
    --header "Accept: application/json" \
    --header "Content-Type: application/json" \
    --data-raw '{
        "projectId": "proj-09ee1f4b1151d5dd1edbc5",
        "regionId": "aws-us-west-2",
        "clusterName": "my-on-demand",
        "cuSize": 8,
        "autoSuspend": 120
    }'

    # {
    #     "code": 0,
    #     "data": {
    #         "clusterId": "inxx-xxxxxxxxxxxxxxx",
    #         "regionId": "aws-us-west-2",
    #         "projectId": "proj-09ee1f4b1151d5dd1edbc5"
    #     }
    # }

    The following table describes the parameters.

    | Parameter | Description |
    |---|---|
    | projectId | ID of the project where the on-demand cluster will be created. |
    | regionId | Region where the cluster is deployed. Must match the project's region. |
    | cuSize | Number of query CUs to allocate. The cluster scales between zero and this value based on workload: it spins up to the specified CU size when requests arrive and scales back to zero when idle. The minimum is 8 CUs, the maximum is 256 CUs, and sizes increase in increments of 8 (for example, 8, 16, or 24). Clusters with more than 8 CUs require a payment method. A value of 8 supports searches across up to 3 TB of raw data; to search more data, increase the CU size. This value cannot be changed after creation. |
    | clusterName | Name of the cluster to create. |
    | autoSuspend | Idle timeout, in seconds (integer), before the cluster auto-suspends. If no requests arrive within this period, the cluster suspends to stop incurring compute costs. Minimum: 60. Default: 60. |

  • Via web console

    To create an on-demand cluster on the web console, do as follows:

    1. Click On-Demand Compute > Clusters.

    2. Click + Cluster.

    3. Configure the cluster settings. The following table explains the parameters.

       | Parameter | Description |
       |---|---|
       | Cluster Name | Name of the cluster to create. |
       | Query CU | Number of query CUs to allocate. The cluster scales between zero and this value based on workload: it spins up to the specified CU size when requests arrive and scales back to zero when idle. The minimum is 8 CUs, the maximum is 256 CUs, and sizes increase in increments of 8 (for example, 8, 16, or 24). Clusters with more than 8 CUs require a payment method. This value cannot be changed after creation. |
       | Auto suspend | Idle time, in seconds, before the cluster auto-suspends. Default: 60 seconds (1 minute). If no requests arrive within this period, the cluster suspends to stop incurring compute costs. |

    4. Click Create.
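
If you script cluster creation, you can check the documented limits before calling the API. cuSize must be between 8 and 256 in increments of 8, and autoSuspend must be at least 60 seconds. The following shell sketch shows one way to do this; it is not an official client.

- build_create_body, create_on_demand_cluster, and is_uint are hypothetical names.
- The script assumes BASE_URL and TOKEN are exported as in the RESTful API example.
- It fixes regionId to aws-us-west-2, currently the only supported region.
- It assumes the project ID and cluster name need no JSON escaping.

```shell
# Hypothetical helpers (not an official client). Assumes BASE_URL and TOKEN are
# exported as in the examples above, and that the project ID and cluster name
# contain no characters that need JSON escaping.

# Succeed only for plain decimal integers (no sign, no leading zero).
is_uint() {
  case "${1:-}" in
    ''|*[!0-9]*|0[0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Print a createOnDemandCluster request body after checking the documented limits.
# Usage: build_create_body <projectId> <clusterName> <cuSize> [autoSuspend]
build_create_body() {
  local project_id="${1:-}" cluster_name="${2:-}" cu_size="${3:-}" auto_suspend="${4:-60}"
  if [ -z "$project_id" ] || [ -z "$cluster_name" ]; then
    echo "usage: build_create_body <projectId> <clusterName> <cuSize> [autoSuspend]" >&2
    return 2
  fi
  if ! is_uint "$cu_size" || [ "$cu_size" -lt 8 ] || [ "$cu_size" -gt 256 ] \
     || [ $(( cu_size % 8 )) -ne 0 ]; then
    echo "cuSize must be 8 to 256 in increments of 8" >&2
    return 1
  fi
  if ! is_uint "$auto_suspend" || [ "$auto_suspend" -lt 60 ]; then
    echo "autoSuspend must be an integer of at least 60 (seconds)" >&2
    return 1
  fi
  # regionId is fixed because aws-us-west-2 is currently the only supported region.
  printf '{"projectId":"%s","regionId":"aws-us-west-2","clusterName":"%s","cuSize":%s,"autoSuspend":%s}\n' \
    "$project_id" "$cluster_name" "$cu_size" "$auto_suspend"
}

# Validate locally, then send the request and print the API response.
create_on_demand_cluster() {
  local body
  body=$(build_create_body "$@") || return 1
  curl --request POST \
    --url "${BASE_URL}/v2/clusters/createOnDemandCluster" \
    --header "Authorization: Bearer ${TOKEN}" \
    --header "Accept: application/json" \
    --header "Content-Type: application/json" \
    --data-raw "$body"
}
```

For example, create_on_demand_cluster proj-09ee1f4b1151d5dd1edbc5 my-on-demand 8 120 builds the same body as the curl example above. A cuSize of 12 is rejected locally without contacting the API.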

View all on-demand clusters

  • Via RESTful API

    You can list all on-demand clusters as follows:

    export BASE_URL="https://api.cloud.zilliz.com"
    export TOKEN="YOUR_API_KEY"
    export PROJECT_ID="YOUR_PROJECT_ID"

    curl --request GET \
    --url "${BASE_URL}/v2/clusters/onDemandClusters?projectId=${PROJECT_ID}&regionId=aws-us-west-2" \
    --header "Authorization: Bearer ${TOKEN}" \
    --header "Accept: application/json"

    The following is an example output.

    {
        "code": 0,
        "data": {
            "count": 1,
            "onDemandClusters": [
                {
                    "clusterId": "inxx-xxxxxxxxxxxxxxx",
                    "clusterName": "xxx",
                    "regionId": "aws-us-west-2",
                    "cuSize": 8,
                    "status": "SUSPENDED",
                    "endpoint": "https://proj-09ee1f4b1151d5dd1edbc5.aws-us-west-2.vectordb-uat3.zillizcloud.com",
                    "privateLink": "",
                    "createdBy": "john.doe@zilliz.com",
                    "createTime": 1745396115000
                }
            ]
        }
    }

  • Via web console

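
The list endpoint takes the project and region as query parameters, and each shell variable in the URL needs its $ prefix to expand. The following sketch wraps the request in two hypothetical helpers, list_url and list_on_demand_clusters. It assumes TOKEN is exported as in the example above and defaults regionId to aws-us-west-2.

```shell
# Hypothetical helpers. Assumes TOKEN is exported as in the examples above;
# BASE_URL falls back to the public endpoint if it is not already set.
BASE_URL="${BASE_URL:-https://api.cloud.zilliz.com}"

# Print the list URL. Usage: list_url <projectId> [regionId]
list_url() {
  printf '%s/v2/clusters/onDemandClusters?projectId=%s&regionId=%s\n' \
    "$BASE_URL" "${1:-}" "${2:-aws-us-west-2}"
}

# Call the list endpoint. Usage: list_on_demand_clusters <projectId> [regionId]
list_on_demand_clusters() {
  curl --request GET \
    --url "$(list_url "${1:-}" "${2:-}")" \
    --header "Authorization: Bearer ${TOKEN}" \
    --header "Accept: application/json"
}
```

For example, list_on_demand_clusters proj-09ee1f4b1151d5dd1edbc5 lists the on-demand clusters of that project in aws-us-west-2.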

Check the details of an on-demand cluster

  • Via RESTful API

    You can describe an on-demand cluster as follows:

    export BASE_URL="https://api.cloud.zilliz.com"
    export TOKEN="YOUR_API_KEY"

    curl --request GET \
    --url "${BASE_URL}/v2/clusters/onDemandClusters/inxx-xxxxxxxxxxxxxxx" \
    --header "Authorization: Bearer ${TOKEN}" \
    --header "Accept: application/json"

    The following is an example output.

    {
        "code": 0,
        "data": {
            "clusterId": "inxx-xxxxxxxxxxxxxxx",
            "clusterName": "xxx",
            "regionId": "aws-us-west-2",
            "cuSize": 8,
            "status": "RUNNING",
            "endpoint": "https://proj-09ee1f4b1151d5dd1edbc5.aws-us-west-2.vectordb-uat3.zillizcloud.com",
            "privateLink": "",
            "createdBy": "john.doe@zilliz.com",
            "createTime": 1745396115000
        }
    }

  • Via web console

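
You can poll the status field in the describe response (for example, RUNNING or SUSPENDED in the outputs above) from a script. The following sketch is a hypothetical helper, not part of any Zilliz tool, and it assumes BASE_URL and TOKEN are exported as above. It reads status with a rough sed expression that expects a single status key per response. A real JSON parser such as jq is safer where available. The loop gives up after a fixed number of attempts instead of waiting indefinitely.

```shell
# Hypothetical helpers. Assumes BASE_URL and TOKEN are exported as in the
# examples above. extract_status is a rough sed-based reader that expects a
# single "status" key, as in the example outputs; prefer a real JSON parser
# such as jq if one is available.

# Read a describe (or drop) response on stdin and print its status value.
extract_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([A-Z_]*\)".*/\1/p' | head -n 1
}

# Poll the describe endpoint until the cluster reports the target status.
# Usage: wait_for_status <clusterId> <status> [maxAttempts] [intervalSeconds]
wait_for_status() {
  local cluster_id="${1:-}" target="${2:-}" attempts="${3:-30}" interval="${4:-10}" status
  while [ "$attempts" -gt 0 ]; do
    status=$(curl --silent --request GET \
      --url "${BASE_URL}/v2/clusters/onDemandClusters/${cluster_id}" \
      --header "Authorization: Bearer ${TOKEN}" \
      --header "Accept: application/json" | extract_status)
    echo "status: ${status:-unknown}" >&2
    if [ "$status" = "$target" ]; then
      return 0
    fi
    attempts=$(( attempts - 1 ))
    if [ "$attempts" -gt 0 ]; then
      sleep "$interval"
    fi
  done
  return 1
}
```

For example, wait_for_status inxx-xxxxxxxxxxxxxxx RUNNING 30 10 checks every 10 seconds for up to 30 attempts. It returns a non-zero status if the cluster never reports RUNNING.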

Drop an on-demand cluster

🚧Warning

Once dropped, a cluster is removed immediately and cannot be recovered.

  • Via RESTful API

    You can drop an on-demand cluster as follows:

    export BASE_URL="https://api.cloud.zilliz.com"
    export TOKEN="YOUR_API_KEY"

    curl --request DELETE \
    --url "${BASE_URL}/v2/clusters/onDemandClusters/inxx-xxxxxxxxxxxxxxx" \
    --header "Authorization: Bearer ${TOKEN}" \
    --header "Accept: application/json"

    The following is an example output.

    {
        "code": 0,
        "data": {
            "clusterId": "inxx-xxxxxxxxxxxxxxx",
            "status": "DELETING"
        }
    }

  • Via web console

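
A drop takes effect immediately and cannot be undone, so a script may need an explicit confirmation step. The following sketch, drop_on_demand_cluster, is a hypothetical wrapper around the DELETE request above. It assumes BASE_URL and TOKEN are exported as in the example, and it sends the request only if you retype the exact cluster ID.

```shell
# Hypothetical wrapper. Assumes BASE_URL and TOKEN are exported as in the
# examples above. Because a drop cannot be undone, it asks you to retype the
# cluster ID and sends the DELETE request only if the two match.
# Usage: drop_on_demand_cluster <clusterId>
drop_on_demand_cluster() {
  local cluster_id="${1:-}" answer
  if [ -z "$cluster_id" ]; then
    echo "usage: drop_on_demand_cluster <clusterId>" >&2
    return 2
  fi
  printf 'Retype the cluster ID (%s) to confirm the drop: ' "$cluster_id" >&2
  if ! read -r answer || [ "$answer" != "$cluster_id" ]; then
    echo "Confirmation did not match; nothing was dropped." >&2
    return 1
  fi
  curl --request DELETE \
    --url "${BASE_URL}/v2/clusters/onDemandClusters/${cluster_id}" \
    --header "Authorization: Bearer ${TOKEN}" \
    --header "Accept: application/json"
}
```

A mismatched or empty confirmation returns a non-zero status before any request is sent.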