Performance Benchmarking with VectorDBBench

VectorDBBench is an open-source benchmarking tool designed specifically for vector databases.

This topic describes how to use VectorDBBench to reproduce the performance test results of Zilliz Cloud.

Overview

VectorDBBench does more than publish benchmark results for mainstream vector databases and cloud services. It is also a tool you can use to compare their performance and cost-effectiveness yourself.

VectorDBBench provides an intuitive visual interface that lets you start benchmarks with ease and view comparative result reports, so you can reproduce benchmark results with little effort.

To closely mimic real-world production environments, VectorDBBench provides diverse testing scenarios, including insertion, search, and filtered search. To provide credible and reliable data, it also includes public datasets from actual production scenarios, such as SIFT, GIST, Cohere, and a dataset generated by OpenAI from an open-source raw dataset.

Benchmark metrics

Metric	Description	Test scenario
Max_load_count	The capacity of a vector database. VectorDBBench keeps inserting vector data until the database fails or rejects insertion requests more than 10 times, and records the maximum number of inserted entities. Higher Max_load_count values indicate better vector database performance.	Insertion
QPS	The number of concurrent queries a vector database can handle per second. VectorDBBench runs top-100 searches multiple times and selects the highest QPS value as the final result. Higher QPS values indicate better vector database performance.	Search & filtered search
Recall	The search accuracy, measured by comparing search results with the ground truth. Higher recall values indicate better vector database performance.	Search & filtered search
Load_duration	The time it takes for Zilliz Cloud to finish inserting entities and building indexes. Lower Load_duration values indicate better vector database performance.	Search & filtered search
Serial_latency_p99	The latency within which 99% of queries complete. VectorDBBench records the latency of each top-100 search and uses the 99th percentile as the final result. Lower Serial_latency_p99 values indicate better vector database performance.	Search & filtered search
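
The metric definitions above can be illustrated with a short Python sketch. It assumes that recall is the overlap between returned IDs and ground-truth IDs at k, that the p99 latency is a nearest-rank percentile, and that QPS is the number of completed queries divided by elapsed time. The function names are hypothetical and are not part of VectorDBBench, whose internal calculations may differ in detail.

```python
import math


def recall_at_k(retrieved_ids, ground_truth_ids, k):
    """Share of the top-k ground-truth IDs that appear in the top-k results."""
    expected = set(ground_truth_ids[:k])
    if not expected:
        return 0.0
    hits = len(set(retrieved_ids[:k]) & expected)
    return hits / len(expected)


def percentile_nearest_rank(values, pct):
    """Smallest sample such that at least pct percent of samples are <= it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def queries_per_second(query_count, elapsed_seconds):
    """Throughput of a run: completed queries divided by wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return query_count / elapsed_seconds
```

For example, with per-query latencies collected in a list, `percentile_nearest_rank(latencies, 99)` gives a value comparable in spirit to Serial_latency_p99, and taking the maximum of `queries_per_second` over several runs mirrors the "highest QPS" rule described in the table.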

Prerequisites

You need a registered Zilliz Cloud account.
You need at least one cluster. Zilliz Cloud provides free clusters so that you can get started quickly and explore the Zilliz Cloud vector database.
You need Python 3.11 or later installed.
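
As a quick way to confirm the Python requirement before installing anything, a minimal sketch such as the following can be run on the client machine. The helper name `python_is_supported` is hypothetical and used only for illustration.

```python
import sys

# Minimum interpreter version stated in the prerequisites.
MIN_PYTHON = (3, 11)


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True if the (major, minor) version is at least the minimum."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if python_is_supported() else "too old"
    print(f"Python {current}: {status}")
```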

Procedures

Set up testing environment

Provision a machine.

To test the ultimate performance of Zilliz Cloud, we recommend provisioning a client machine with more than 8 vCPUs so that the client can run multiple threads concurrently.

Configure network.

Network communication influences the test results, especially in the query testing scenario. To reduce the impact of network latency, we recommend:
- Deploying the client in the same cloud provider and region as your Zilliz Cloud cluster.
- Configuring your client so that it shares the same VPC with your Zilliz Cloud cluster. Compared to the public Internet, a VPC connection can have lower latency. Learn more at Set up a Private Endpoint.
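
To sanity-check the client before a run, the recommendations above can be approximated with a small Python sketch. It compares a vCPU count against the "more than 8" guideline and times TCP connections to an endpoint as a rough proxy for network latency. All function names here are hypothetical, and TCP connect time is only an approximation of the latency a real search request would see.

```python
import socket
import statistics
import time


def meets_vcpu_recommendation(cpu_count, threshold=8):
    """True if the machine has more vCPUs than the recommended threshold."""
    return cpu_count is not None and cpu_count > threshold


def tcp_connect_latency_ms(host, port, attempts=5, timeout=3.0):
    """Time several TCP handshakes to host:port, in milliseconds."""
    samples = []
    for _ in range(attempts):
        start = time.perf_counter()
        # Open and immediately close a connection; only the handshake is timed.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        samples.append((time.perf_counter() - start) * 1000)
    return samples


def summarize(samples):
    """Reduce latency samples to min, median, and max."""
    return {
        "min_ms": min(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }
```

A typical use would be to call `meets_vcpu_recommendation(os.cpu_count())` after importing `os`, and `summarize(tcp_connect_latency_ms(host, port))` with the host and port taken from your own cluster endpoint, once from a client in the same region and once from elsewhere, to see how much the placement matters.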

Install and start VectorDBBench

# Install VectorDBBench
$ pip install vectordb-bench

# Start VectorDBBench
$ init_bench

Below is an example output. You will obtain a local URL in the output. Use it to open the web user interface of VectorDBBench.

      👋 Welcome to Streamlit!

      If you’d like to receive helpful onboarding emails, news, offers, promotions,
      and the occasional swag, please enter your email address below. Otherwise,
      leave this field blank.

      Email:  
  You can find our privacy policy at https://streamlit.io/privacy-policy

  Summary:
  - This open source library collects usage statistics.
  - We cannot see and do not store information contained inside Streamlit apps,
    such as text, charts, images, etc.
  - Telemetry data is stored in servers in the United States.
  - If you'd like to opt out, add the following to ~/.streamlit/config.toml,
    creating that file if necessary:

    [browser]
    gatherUsageStats = false

  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://172.16.20.46:8501

On the homepage, you can see some pre-defined testing datasets provided by VectorDBBench and use them for quick performance benchmarking.

Scroll to the bottom of the page and click Run Your Test > to configure your own benchmarking test.

Configure your benchmarking test

View benchmarking results

Click Results to view and analyze benchmarking results. Below are some example results.

Optionally, you can set up the DB Filter and Case Filter in the left navigation pane to compare the benchmarking results of pre-defined vector databases and cases.

📘Notes

The databases are named in the format of [databasename]-[dblabel].
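
If you post-process result names in a script, the naming format above can be split with a sketch like the one below. It assumes the database name itself contains no hyphen, so everything after the first hyphen is treated as the label. The function name and the sample value are hypothetical.

```python
def split_database_name(full_name):
    """Split '[databasename]-[dblabel]' at the first hyphen."""
    name, separator, label = full_name.partition("-")
    if not separator or not name or not label:
        raise ValueError(f"expected '[databasename]-[dblabel]', got {full_name!r}")
    return name, label


# Hypothetical example value; a label may itself contain hyphens.
print(split_database_name("ZillizCloud-8cu-perf"))
```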
