feat: add async evaluation and BM25 baseline to eval_beir.py#10

Open
vaishnavidesai09 wants to merge 1 commit into vignesh2027:main from vaishnavidesai09:feat/beir-async-bm25

@vaishnavidesai09

Collaborator

Closes #3

Extends benchmarks/eval_beir.py as suggested in #3.

What's added

  • --async-eval flag: evaluates multiple datasets concurrently via asyncio + ThreadPoolExecutor instead of one after another
  • --bm25 flag: runs BM25 alongside VORTEXRAG and adds side-by-side comparison columns to the output table (requires pip install rank-bm25)
  • --max-workers parameter: controls thread-pool size for async mode (default: 4)
  • print_table() extended to display VORTEXRAG vs BM25 metrics side-by-side
  • save_csv() updated so the CSV columns adapt to whether the BM25 comparison is enabled or disabled
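
For reviewers, the concurrency pattern behind --async-eval looks roughly like the sketch below. It is a minimal illustration under stated assumptions, not the code in this PR: `evaluate_dataset` and `evaluate_all` are hypothetical names, and the blocking evaluation is simulated with a short sleep. Each dataset is submitted to a thread pool through the event loop, and the results are gathered in input order.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def evaluate_dataset(name: str) -> dict:
    """Hypothetical stand-in for the blocking per-dataset evaluation."""
    time.sleep(0.1)  # simulate blocking retrieval + metric computation
    return {"dataset": name, "ndcg@10": None}  # placeholder, no real metric


async def evaluate_all(datasets: list[str], max_workers: int = 4) -> list[dict]:
    """Run one blocking evaluation per dataset on a bounded thread pool."""
    loop = asyncio.get_running_loop()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # run_in_executor wraps each blocking call in an awaitable future
        futures = [
            loop.run_in_executor(pool, evaluate_dataset, name)
            for name in datasets
        ]
        # gather preserves the order of `datasets` in its result list
        return await asyncio.gather(*futures)


results = asyncio.run(evaluate_all(["scifact", "nq"], max_workers=4))
print([r["dataset"] for r in results])
```

Threads suit this case when the per-dataset work is a blocking call that cannot itself be awaited; `max_workers` caps how many datasets are evaluated at once.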

Validation

Executed:

python benchmarks/eval_beir.py --datasets scifact --async-eval --bm25

Results:

FINAL RESULTS
----------------------------------------------------------------------------------------------------------------------
Dataset                Domain          VRTX NDCG@10  VRTX R@100  VRTX MAP    ms/q   BM25 NDCG@10  BM25 R@100  BM25 MAP
----------------------------------------------------------------------------------------------------------------------
scifact                biomedical            0.3725      0.6330    0.2981   180.0         0.2569      0.6609    0.4496
----------------------------------------------------------------------------------------------------------------------
AVERAGE                                      0.3725      0.6330    0.2981                 0.2569      0.6609    0.4496
----------------------------------------------------------------------------------------------------------------------

The benchmark completed successfully in async mode and produced side-by-side VORTEXRAG vs BM25 metrics as expected.

Example usage

Async + BM25 comparison

python benchmarks/eval_beir.py --async-eval --bm25

Specific datasets with CSV export

python benchmarks/eval_beir.py --datasets nq scifact --bm25 --output results/beir.csv
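
One way the CSV export can cope with optional BM25 columns is sketched below. This is an assumption-laden illustration rather than the actual `save_csv()` from this PR: the column names and the `write_rows` helper are hypothetical, and the two values are the scifact NDCG@10 figures from the results table above. The header is built from the base columns plus the BM25 columns only when the comparison is enabled, and `csv.DictWriter` is told to ignore keys that are not in the header.

```python
import csv
import io

# Hypothetical column names, chosen for illustration only
BASE_COLUMNS = ["dataset", "vrtx_ndcg@10"]
BM25_COLUMNS = ["bm25_ndcg@10"]


def write_rows(rows, out, include_bm25):
    """Write rows to `out`, adding BM25 columns only when requested."""
    fieldnames = BASE_COLUMNS + (BM25_COLUMNS if include_bm25 else [])
    writer = csv.DictWriter(
        out,
        fieldnames=fieldnames,
        restval="",             # missing keys become empty cells
        extrasaction="ignore",  # keys outside the header are dropped
    )
    writer.writeheader()
    writer.writerows(rows)


rows = [{"dataset": "scifact", "vrtx_ndcg@10": 0.3725, "bm25_ndcg@10": 0.2569}]

buf_with = io.StringIO()
write_rows(rows, buf_with, include_bm25=True)
with_bm25 = buf_with.getvalue().splitlines()

buf_without = io.StringIO()
write_rows(rows, buf_without, include_bm25=False)
without_bm25 = buf_without.getvalue().splitlines()

print(with_bm25[0])     # header including the BM25 column
print(without_bm25[0])  # header without it
```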

- --async-eval flag: evaluates datasets concurrently via asyncio + ThreadPoolExecutor
- --bm25 flag: adds BM25 baseline columns to output table (requires rank-bm25)
- --max-workers param: controls thread-pool size for async mode
- print_table() extended to show side-by-side VORTEXRAG vs BM25 columns
- save_csv() handles variable columns gracefully
Closes vignesh2027#3

Signed-off-by: Vaishnavi Desai <vaishnavidesai957@gmail.com>
@vaishnavidesai09

Collaborator Author

Hi @vignesh2027! I've submitted the PR for #3.

Implemented async dataset evaluation (--async-eval), BM25 comparison support (--bm25), configurable worker count (--max-workers), updated table rendering, and CSV export handling.

Validated with:

python benchmarks/eval_beir.py --datasets scifact --async-eval --bm25

The output shows the expected side-by-side VORTEXRAG vs BM25 metrics. Looking forward to your feedback!

Development

Successfully merging this pull request may close these issues.

Add BEIR benchmark evaluation script
