
vdb_benchmark: indexing parameter values from config files are not being used #463

@ram-sangle

Description


Issue 1: In default.yaml and 10m.yaml, the index type is DiskANN, but index_params lists M and ef_construction. For DiskANN, the expected indexing parameters are max_degree and search_list_size, not M and ef_construction.
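A small validation step when loading the config would catch this kind of mismatch early. The sketch below is hypothetical: `EXPECTED_INDEX_PARAMS` and `check_index_params` are not part of vdbbench, and the parameter names per index type are taken only from this issue (HNSW: M, ef_construction; DiskANN: max_degree, search_list_size), so they should be checked against the Milvus index documentation before relying on them.

```python
# Hypothetical table of build parameters per index type. Names come from this
# issue, not from vdbbench or Milvus sources; verify before use.
EXPECTED_INDEX_PARAMS = {
    "HNSW": {"M", "ef_construction"},
    "DISKANN": {"max_degree", "search_list_size"},
}


def check_index_params(index_section: dict) -> list[str]:
    """Return a list of problems found in an `index:` config section."""
    index_type = str(index_section.get("index_type", "")).upper()
    params = set(index_section.get("index_params") or {})
    expected = EXPECTED_INDEX_PARAMS.get(index_type)
    if expected is None:
        return [f"unknown index_type: {index_type!r}"]
    problems = []
    # Keys present in the config that do not belong to this index type.
    for name in sorted(params - expected):
        problems.append(f"{name!r} is not a {index_type} parameter")
    # Keys this index type expects that the config does not provide.
    for name in sorted(expected - params):
        problems.append(f"{index_type} parameter {name!r} is missing")
    return problems


# The index section currently in default.yaml, as described in this issue.
current = {
    "index_type": "DISKANN",
    "metric_type": "COSINE",
    "index_params": {"M": 64, "ef_construction": 200},
}
for problem in check_index_params(current):
    print(problem)
```

Run against the current default.yaml section, this reports both HNSW keys as foreign to DiskANN and both DiskANN keys as missing.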

Issue 2: The YAML config file parser does not correctly parse parameters nested at the second level under index_params:.
As a result, even after changing the default config file as shown below, the datagen command still uses the default values from load_db.py.

```diff
+++ b/configs/vectordbbench/default.yaml
@@ -16,11 +16,11 @@ dataset:
   vector_dtype: FLOAT_VECTOR

 index:
-  index_type: DISKANN
+  index_type: HNSW
   metric_type: COSINE
   index_params:
-    M: 64
-    ef_construction: 200
+    M: 32
+    ef_construction: 100
```
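One way to make sure second-level keys under index_params: are not lost is to overlay the parsed YAML on the defaults recursively rather than shallowly. This is a minimal sketch, not a diagnosis of the actual parser bug: it assumes the config has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`), and `deep_merge` and `DEFAULTS` are hypothetical names, not functions or variables from vdbbench.

```python
import copy


def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Recursively overlay `overrides` on `defaults` without mutating either.

    Nested dicts (such as index -> index_params) are merged key by key, so a
    value set two levels deep in the YAML file replaces the default instead
    of being dropped.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical fallback values; M=16 and ef_construction=200 mirror what the
# datagen log reports, but the real defaults live in load_db.py.
DEFAULTS = {
    "index": {
        "index_type": "HNSW",
        "metric_type": "COSINE",
        "index_params": {"M": 16, "ef_construction": 200},
    }
}

# What yaml.safe_load() would return for the edited index: section above.
user_config = {
    "index": {
        "index_type": "HNSW",
        "metric_type": "COSINE",
        "index_params": {"M": 32, "ef_construction": 100},
    }
}

config = deep_merge(DEFAULTS, user_config)
print(config["index"]["index_params"])  # {'M': 32, 'ef_construction': 100}
```

With a recursive merge, a config that sets only `M: 32` still keeps the default `ef_construction`, while any key the user does set wins.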

Datagen command output after the above changes:

```
blr-host01:~/psangle/storage (main)$ ./mlpstorage vectordb datagen --host 127.0.0.1 --port 19530 --config default --force --results-dir /tmp/vdb_results
warning: The `UV_NATIVE_TLS` environment variable is deprecated and will be removed in a future release. Use `UV_SYSTEM_CERTS` instead.
2026-06-17 12:14:55|INFO: Environment validation passed
2026-06-17 12:14:55|STATUS: Benchmark results directory: /tmp/vdb_results/vector_database/datagen/20260617_121455
2026-06-17 12:14:56|INFO: Created benchmark run: vector_database_datagen_20260617_121455
2026-06-17 12:14:56|STATUS: Verifying benchmark run for vector_database_datagen_20260617_121455
2026-06-17 12:14:56|STATUS: Open: [OPEN] VectorDB benchmark is in preview status - not accepted for closed submissions (Parameter: benchmark_status)
2026-06-17 12:14:56|STATUS: Benchmark run qualifies for OPEN category ([RunID(program='vector_database', command='datagen', model=None, run_datetime='20260617_121455')])
2026-06-17 12:14:56|WARNING: Running the benchmark without verification for open or closed configurations. These results are not valid for submission. Use --open or --closed to specify a configuration.
2026-06-17 12:14:56|STATUS: Instantiated the VectorDB Benchmark...
warning: The `UV_NATIVE_TLS` environment variable is deprecated and will be removed in a future release. Use `UV_SYSTEM_CERTS` instead.
2026-06-17 12:14:56,365 - INFO - Connected to Milvus server at 127.0.0.1:19530
2026-06-17 12:14:56,365 - WARNING - FLOAT16 data type not available in this version of pymilvus. Using FLOAT_VECTOR instead.
2026-06-17 12:14:56,375 - INFO - Dropped existing collection: mlps_1m_1shards_1536dim_uniform
2026-06-17 12:14:56,385 - INFO - Created collection 'mlps_1m_1shards_1536dim_uniform' with 1536 dimensions and 1 shards
2026-06-17 12:14:56,385 - INFO - Creating index with parameters: {'index_type': 'HNSW', 'metric_type': 'COSINE', 'params': {'M': 16, 'efConstruction': 200}}
```

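In the last line of the log above, the index type reflects the updated value (HNSW), but the indexing parameters do not follow the default.yaml changes: the index is created with M: 16 and efConstruction: 200 instead of M: 32 and ef_construction: 100. Note that M: 16 matches neither the original (64) nor the updated (32) config value. The issue persists with other combinations of index type and parameter values as well.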
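For comparison, the sketch below builds the index-parameter dict in the same shape as that log line, which is what it would be expected to contain if the edited config were honored. It is an illustration under assumptions: `build_index_params` and `YAML_TO_MILVUS_KEYS` are hypothetical, and the ef_construction to efConstruction rename is inferred only from the key spelling in the log; all other keys are passed through unchanged.

```python
# Hypothetical mapping from the snake_case keys in the vdbbench YAML files to
# the key names seen in the "Creating index with parameters" log line. Only
# ef_construction -> efConstruction is visible in the log; other names are
# passed through unchanged (an assumption).
YAML_TO_MILVUS_KEYS = {"ef_construction": "efConstruction"}


def build_index_params(index_section: dict) -> dict:
    """Build the index-parameter dict from the `index:` config section."""
    params = {
        YAML_TO_MILVUS_KEYS.get(name, name): value
        for name, value in (index_section.get("index_params") or {}).items()
    }
    return {
        "index_type": index_section["index_type"],
        "metric_type": index_section["metric_type"],
        "params": params,
    }


index_section = {
    "index_type": "HNSW",
    "metric_type": "COSINE",
    "index_params": {"M": 32, "ef_construction": 100},
}
print(build_index_params(index_section))
# {'index_type': 'HNSW', 'metric_type': 'COSINE', 'params': {'M': 32, 'efConstruction': 100}}
```

Keeping the key translation in one explicit table means a config key with no mapping is forwarded as written rather than silently replaced by a default.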
Please cross-check this in your setup and make sure the indexing parameters are taken from the config files under vdb_benchmark/vdbbench/configs/.
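A quick way to cross-check is to pull the parameters actually used out of the datagen output and compare them with what the config requests. A minimal sketch, assuming the "Creating index with parameters" log line keeps the format shown above; `logged_index_params` and `mismatches` are hypothetical helpers, and the expected values use the `efConstruction` spelling seen in the log.

```python
import ast
import re

# Matches the "Creating index with parameters: {...}" line printed by datagen.
INDEX_LINE = re.compile(r"Creating index with parameters: (\{.*\})")


def logged_index_params(log_text: str) -> dict:
    """Extract the index-parameter dict from datagen output."""
    match = INDEX_LINE.search(log_text)
    if match is None:
        raise ValueError("no 'Creating index with parameters' line found")
    # The logged dict is a Python literal, so literal_eval parses it safely.
    return ast.literal_eval(match.group(1))


def mismatches(logged: dict, expected_params: dict) -> dict:
    """Return {name: (expected, logged)} for every parameter that differs."""
    actual = logged.get("params", {})
    return {
        name: (value, actual.get(name))
        for name, value in expected_params.items()
        if actual.get(name) != value
    }


log_line = (
    "2026-06-17 12:14:56,385 - INFO - Creating index with parameters: "
    "{'index_type': 'HNSW', 'metric_type': 'COSINE', "
    "'params': {'M': 16, 'efConstruction': 200}}"
)
print(mismatches(logged_index_params(log_line), {"M": 32, "efConstruction": 100}))
# {'M': (32, 16), 'efConstruction': (100, 200)}
```

An empty result means every requested parameter reached the index creation call.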

Metadata


Assignees

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
