
Performance Benchmarking with VectorDBBench

VectorDBBench is an open-source benchmarking tool designed specifically for vector databases.

This topic describes how to use VectorDBBench to reproduce the performance test results of Zilliz Cloud.

Overview

VectorDBBench offers more than benchmark results for mainstream vector databases and cloud services; it is also a tool for comparing their performance and cost-effectiveness.

VectorDBBench provides an intuitive visual interface that lets you start benchmarks with ease and view comparative result reports, so you can reproduce benchmark results with little effort.

Closely mimicking real-world production environments, VectorDBBench has set up diverse testing scenarios including insertion, searching, and filtered searching. To provide you with credible and reliable data, VectorDBBench has also included public datasets from actual production scenarios, such as SIFT, GIST, Cohere, and a dataset generated by OpenAI from an open-source raw dataset.

Benchmark metrics

| Metric | Description | Test Scenario |
| --- | --- | --- |
| Max_load_count | The capacity of a vector database. VectorDBBench keeps inserting vector data into the vector database until the database fails or rejects the insertion request more than 10 times, and records the maximum number of inserted entities. Higher Max_load_count values indicate better vector database performance. | Insertion |
| QPS | The capability of a vector database to handle concurrent queries, measured in queries per second. VectorDBBench runs top-100 searches multiple times and selects the highest QPS value as the final result. Higher QPS values indicate better vector database performance. | Search & filtered search |
| Recall | The measure of search accuracy, obtained by comparing search results with the ground truth. Higher recall values indicate better vector database performance. | Search & filtered search |
| Load_duration | The time it takes for Zilliz Cloud to complete the process of inserting entities and building indexes. Lower Load_duration values indicate better vector database performance. | Search & filtered search |
| Serial_latency_p99 | The latency within which 99% of queries complete. VectorDBBench records the search latency of each top-100 search and uses the 99th percentile as the final result. Lower Serial_latency_p99 values indicate better vector database performance. | Search & filtered search |
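
To make the metric definitions above concrete, the following is a minimal Python sketch of how recall, a 99th-percentile latency, and a best-of-runs QPS could be computed from raw measurements. It is an illustration only and is not taken from VectorDBBench; the function names (`recall_at_k`, `percentile_nearest_rank`, `best_qps`) and the choice of the nearest-rank percentile method are assumptions made for this example.

```python
import math
from typing import Sequence, Tuple


def recall_at_k(retrieved: Sequence[int], ground_truth: Sequence[int], k: int) -> float:
    """Fraction of the true top-k neighbors that appear in the retrieved top-k."""
    truth = set(ground_truth[:k])
    if not truth:
        return 0.0
    return len(set(retrieved[:k]) & truth) / len(truth)


def percentile_nearest_rank(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def best_qps(runs: Sequence[Tuple[int, float]]) -> float:
    """Highest queries-per-second over runs given as (query_count, elapsed_seconds)."""
    return max(count / seconds for count, seconds in runs)


if __name__ == "__main__":
    # Hypothetical measurements, used only to exercise the helpers.
    print(recall_at_k([1, 2, 3, 4], [1, 2, 5, 6], k=4))
    print(percentile_nearest_rank([0.8, 1.1, 0.9, 4.2, 1.0], 99))
    print(best_qps([(1000, 2.0), (1000, 1.0)]))
```

Other percentile conventions (for example, linear interpolation) give slightly different values on small samples, so results computed this way may not match the tool's reported numbers exactly.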

Prerequisites

Procedures

Set up testing environment

Install and start VectorDBBench

# Install VectorDBBench
$ pip install vectordb-bench

# Start VectorDBBench
$ init_bench

Below is an example output. It includes a local URL that you can use to open the web user interface of VectorDBBench.


👋 Welcome to Streamlit!

If you’d like to receive helpful onboarding emails, news, offers, promotions,
and the occasional swag, please enter your email address below. Otherwise,
leave this field blank.

Email:
You can find our privacy policy at https://streamlit.io/privacy-policy

Summary:
- This open source library collects usage statistics.
- We cannot see and do not store information contained inside Streamlit apps,
such as text, charts, images, etc.
- Telemetry data is stored in servers in the United States.
- If you'd like to opt out, add the following to ~/.streamlit/config.toml,
creating that file if necessary:

[browser]
gatherUsageStats = false

You can now view your Streamlit app in your browser.

Local URL: http://localhost:8501
Network URL: http://172.16.20.46:8501
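
If the page does not load, a quick sanity check of the local setup can help. The sketch below, which uses only the Python standard library, reports the installed version of the `vectordb-bench` distribution and checks whether something is listening on the port shown in the example output. The helper names are hypothetical, and port 8501 is taken from the example output above; your port may differ.

```python
import socket
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Distribution name from the pip command; port from the example output.
    print("vectordb-bench version:", installed_version("vectordb-bench"))
    print("UI reachable on localhost:8501:", is_port_open("localhost", 8501))
```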

On the homepage, you can see some pre-defined testing datasets provided by VectorDBBench and use them for quick performance benchmarking.

Scroll to the bottom of the page and click Run Your Test > to configure your own benchmarking test.


Configure your benchmarking test

View benchmarking results

Click Results to view and analyze benchmarking results. Below are some example results.

[Screenshots: example benchmarking results]

Optionally, you can set up the DB Filter and Case Filter in the left navigation pane to compare the benchmarking results of pre-defined vector databases and cases.

📘 Notes

The databases are named in the format of [database_name]-[db_label].
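
When post-processing result names outside the UI, the naming format in the note above can be split into its two parts. The following is a minimal sketch that assumes the database name itself contains no hyphen, so everything after the first hyphen is treated as the label; `parse_result_name` and the sample name `ExampleDB-8cu-perf` are hypothetical and do not come from VectorDBBench.

```python
from typing import NamedTuple


class ResultName(NamedTuple):
    database: str
    label: str


def parse_result_name(name: str) -> ResultName:
    """Split a result name at the first hyphen into a database name and a label."""
    # Assumption: the database name has no hyphen; the label may contain hyphens.
    database, sep, label = name.partition("-")
    if not sep or not database or not label:
        raise ValueError(f"expected a name with a hyphen separator, got {name!r}")
    return ResultName(database, label)


if __name__ == "__main__":
    print(parse_result_name("ExampleDB-8cu-perf"))
```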