Issue 1: In default.yaml and 10m.yaml, the index type is DISKANN, but the index parameters are given as M and ef_construction. DiskANN expects max_degree and search_list_size instead.
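One way to catch this kind of mismatch early is a small check of parameter names against the index type. The sketch below is only an illustration: the mapping and the function name are hypothetical, and the expected parameter names are taken from this issue rather than from the vdbbench source.

```python
# Hypothetical mapping of index type to the parameter names it expects.
# The names come from this issue, not from the vdbbench code.
EXPECTED_INDEX_PARAMS = {
    "HNSW": {"M", "ef_construction"},
    "DISKANN": {"max_degree", "search_list_size"},
}


def unexpected_index_params(index_type, index_params):
    """Return the parameter names that do not belong to the given index type."""
    expected = EXPECTED_INDEX_PARAMS.get(index_type.upper())
    if expected is None:
        return []  # unknown index type: nothing to check against
    return sorted(set(index_params) - expected)


# The combination currently shipped in default.yaml, as described above.
print(unexpected_index_params("DISKANN", {"M": 64, "ef_construction": 200}))
```

A check like this could run right after the config is loaded, so that a config with DISKANN plus HNSW-style parameters is reported instead of being silently accepted.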
Issue 2: The YAML config file parser does not pick up second-level nested parameters under index_params.
As a result, even after changing the default config file as shown below, the datagen command still uses the default values from load_db.py.
+++ b/configs/vectordbbench/default.yaml
@@ -16,11 +16,11 @@ dataset:
   vector_dtype: FLOAT_VECTOR
 index:
-  index_type: DISKANN
+  index_type: HNSW
   metric_type: COSINE
   index_params:
-    M: 64
-    ef_construction: 200
+    M: 32
+    ef_construction: 100
Datagen command output after the above changes:
blr-host01:~/psangle/storage (main)$ ./mlpstorage vectordb datagen --host 127.0.0.1 --port 19530 --config default --force --results-dir /tmp/vdb_results
warning: The `UV_NATIVE_TLS` environment variable is deprecated and will be removed in a future release. Use `UV_SYSTEM_CERTS` instead.
⠋ Validating environment... 0:00:00
2026-06-17 12:14:55|INFO: Environment validation passed
2026-06-17 12:14:55|STATUS: Benchmark results directory: /tmp/vdb_results/vector_database/datagen/20260617_121455
2026-06-17 12:14:56|INFO: Created benchmark run: vector_database_datagen_20260617_121455
2026-06-17 12:14:56|STATUS: Verifying benchmark run for vector_database_datagen_20260617_121455
2026-06-17 12:14:56|STATUS: Open: [OPEN] VectorDB benchmark is in preview status - not accepted for closed submissions (Parameter: benchmark_status)
2026-06-17 12:14:56|STATUS: Benchmark run qualifies for OPEN category ([RunID(program='vector_database', command='datagen', model=None, run_datetime='20260617_121455')])
2026-06-17 12:14:56|WARNING: Running the benchmark without verification for open or closed configurations. These results are not valid for submission. Use --open or --closed to specify a configuration.
2026-06-17 12:14:56|STATUS: Instantiated the VectorDB Benchmark...
warning: The `UV_NATIVE_TLS` environment variable is deprecated and will be removed in a future release. Use `UV_SYSTEM_CERTS` instead.
2026-06-17 12:14:56,365 - INFO - Connected to Milvus server at 127.0.0.1:19530
2026-06-17 12:14:56,365 - WARNING - FLOAT16 data type not available in this version of pymilvus. Using FLOAT_VECTOR instead.
2026-06-17 12:14:56,375 - INFO - Dropped existing collection: mlps_1m_1shards_1536dim_uniform
2026-06-17 12:14:56,385 - INFO - Created collection 'mlps_1m_1shards_1536dim_uniform' with 1536 dimensions and 1 shards
2026-06-17 12:14:56,385 - INFO - Creating index with parameters: **{'index_type': 'HNSW', 'metric_type': 'COSINE', 'params': {'M': 16, 'efConstruction': 200}}**
See the last line of the log snippet above: the index type reflects the updated value, HNSW, but the index parameters M and ef_construction are NOT changed to match the config file edits. The issue persists with other combinations of index type and parameter values as well.
Please cross-check this in your setup and align the index parameter handling with the config files under vdb_benchmark/vdbbench/configs/.
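For reference, the behaviour I would expect is a recursive merge of the parsed YAML over the built-in defaults, so that nested keys under index_params override the defaults too. This is a minimal sketch, not the actual load_db.py logic: deep_merge, DEFAULTS and from_yaml are hypothetical names, and the default values are assumed for illustration only.

```python
import copy


def deep_merge(defaults, overrides):
    """Recursively overlay `overrides` on `defaults` without mutating either."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested section (e.g. index_params): merge key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical built-in defaults, standing in for whatever load_db.py uses.
DEFAULTS = {
    "index": {
        "metric_type": "COSINE",
        "index_params": {"M": 16, "ef_construction": 200},
    }
}

# Values as they would be parsed from the edited default.yaml.
from_yaml = {
    "index": {
        "index_type": "HNSW",
        "index_params": {"M": 32, "ef_construction": 100},
    }
}

config = deep_merge(DEFAULTS, from_yaml)
print(config["index"]["index_params"])
```

With a merge like this, the index creation step would receive M = 32 and ef_construction = 100 from the config file rather than the defaults seen in the log.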