feat: add async evaluation and BM25 baseline to eval_beir.py by vaishnavidesai09 · Pull Request #10 · vignesh2027/VORTEXRAG

vaishnavidesai09 · 2026-06-24T14:16:06Z

Closes #3

Extends benchmarks/eval_beir.py as suggested in #3.

What's added

- --async-eval flag: evaluates multiple datasets concurrently via asyncio + ThreadPoolExecutor instead of sequentially
- --bm25 flag: runs BM25 alongside VORTEXRAG and adds side-by-side comparison columns to the output table (requires pip install rank-bm25)
- --max-workers parameter: controls the thread-pool size for async mode (default: 4)
- print_table(): extended to display VORTEXRAG and BM25 metrics side-by-side
- save_csv(): updated to handle variable columns, whether BM25 comparison is enabled or disabled
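The async mode described above can be approximated as follows. This is a minimal sketch, not the code from this PR: evaluate_dataset is a hypothetical placeholder for the script's blocking per-dataset evaluation, and its return shape is assumed. Each dataset is submitted to a ThreadPoolExecutor sized by --max-workers, and asyncio.gather collects the results in input order.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-in for the script's per-dataset evaluation; the real
# function name and result fields in eval_beir.py are not shown in this PR.
def evaluate_dataset(name: str) -> dict:
    time.sleep(0.1)  # simulate blocking retrieval/scoring work
    return {"dataset": name, "ndcg@10": 0.0}


async def evaluate_all(datasets: list[str], max_workers: int = 4) -> list[dict]:
    loop = asyncio.get_running_loop()
    # Blocking evaluations run in a bounded thread pool; the event loop only
    # schedules them and awaits their completion.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        tasks = [loop.run_in_executor(pool, evaluate_dataset, d) for d in datasets]
        # gather preserves input order, so results line up with `datasets`.
        return await asyncio.gather(*tasks)


if __name__ == "__main__":
    for row in asyncio.run(evaluate_all(["scifact", "nq"], max_workers=4)):
        print(row)
```

Threads only shorten wall-clock time when the per-dataset work releases the GIL (network or disk I/O, NumPy or PyTorch kernels); a pure-Python CPU-bound loop would see little speedup, and a process pool would be the usual alternative.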

Validation

Executed:

python benchmarks/eval_beir.py --datasets scifact --async-eval --bm25

Results:

FINAL RESULTS
----------------------------------------------------------------------------------------------------------------------
Dataset                Domain          VRTX NDCG@10  VRTX R@100  VRTX MAP    ms/q   BM25 NDCG@10  BM25 R@100  BM25 MAP
----------------------------------------------------------------------------------------------------------------------
scifact                biomedical            0.3725      0.6330    0.2981   180.0         0.2569      0.6609    0.4496
----------------------------------------------------------------------------------------------------------------------
AVERAGE                                      0.3725      0.6330    0.2981                 0.2569      0.6609    0.4496
----------------------------------------------------------------------------------------------------------------------

The benchmark completed successfully in async mode and produced side-by-side VORTEXRAG vs BM25 metrics as expected.
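For readers comparing the columns above, the sketch below shows one common way to compute NDCG@10, Recall@100 and MAP from a ranked run and BEIR-style qrels (query id to document id to relevance grade). It is an illustration only: BEIR's own evaluator delegates to pytrec_eval, this PR does not show which implementation eval_beir.py uses, and the table does not state a cutoff for MAP, so average_precision here is uncut by default. All function names are hypothetical.

```python
import math
from typing import Optional


def ndcg_at_k(ranked: list[str], qrels: dict[str, int], k: int = 10) -> float:
    """NDCG@k with linear gain (the relevance grade is used directly as gain)."""
    dcg = sum(qrels.get(doc, 0) / math.log2(rank + 2)
              for rank, doc in enumerate(ranked[:k]))
    ideal = sorted(qrels.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def recall_at_k(ranked: list[str], qrels: dict[str, int], k: int = 100) -> float:
    """Fraction of relevant documents that appear in the top k."""
    relevant = {doc for doc, rel in qrels.items() if rel > 0}
    if not relevant:
        return 0.0
    return len(relevant.intersection(ranked[:k])) / len(relevant)


def average_precision(ranked: list[str], qrels: dict[str, int],
                      k: Optional[int] = None) -> float:
    """Mean of precision@rank over relevant hits, divided by all relevant docs."""
    relevant = {doc for doc, rel in qrels.items() if rel > 0}
    if not relevant:
        return 0.0
    hits, precision_sum = 0, 0.0
    for rank, doc in enumerate(ranked if k is None else ranked[:k], start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_metrics(run: dict[str, list[str]],
                 qrels: dict[str, dict[str, int]]) -> dict[str, float]:
    # Average each per-query metric over queries that have relevance judgments.
    qids = [q for q in run if q in qrels]
    n = len(qids) or 1
    return {
        "ndcg@10": sum(ndcg_at_k(run[q], qrels[q], 10) for q in qids) / n,
        "recall@100": sum(recall_at_k(run[q], qrels[q], 100) for q in qids) / n,
        "map": sum(average_precision(run[q], qrels[q]) for q in qids) / n,
    }
```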

Example usage

Async + BM25 comparison

python benchmarks/eval_beir.py --async-eval --bm25

Specific datasets with CSV export

python benchmarks/eval_beir.py --datasets nq scifact --bm25 --output results/beir.csv
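The flags used in these examples could be wired up with argparse roughly as follows, together with a save_csv that writes whatever columns the result rows carry. This is a hedged sketch: the flag names come from this PR, but the defaults (other than --max-workers=4), the Path type for --output, and the helper collect_fieldnames are assumptions, not the script's actual code.

```python
import argparse
import csv
from pathlib import Path
from typing import Optional


def parse_args(argv: Optional[list[str]] = None) -> argparse.Namespace:
    # Flag names follow the PR description; defaults other than
    # --max-workers (documented as 4) are assumptions.
    p = argparse.ArgumentParser(description="BEIR evaluation (sketch)")
    p.add_argument("--datasets", nargs="+", default=None)
    p.add_argument("--async-eval", action="store_true")
    p.add_argument("--bm25", action="store_true")
    p.add_argument("--max-workers", type=int, default=4)
    p.add_argument("--output", type=Path, default=None)
    return p.parse_args(argv)


def collect_fieldnames(rows: list[dict]) -> list[str]:
    # Union of keys in first-seen order: BM25 columns appear only if at
    # least one row carries them (e.g. when --bm25 was passed).
    fieldnames: list[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    return fieldnames


def save_csv(rows: list[dict], path: Path) -> None:
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="", encoding="utf-8") as f:
        # restval="" leaves cells blank for rows missing an optional column.
        writer = csv.DictWriter(f, fieldnames=collect_fieldnames(rows), restval="")
        writer.writeheader()
        writer.writerows(rows)
```

argparse maps --async-eval and --max-workers to the attributes async_eval and max_workers, so parse_args(["--datasets", "nq", "scifact", "--bm25"]) yields args.bm25 == True and args.async_eval == False.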

- --async-eval flag: evaluates datasets concurrently via asyncio + ThreadPoolExecutor
- --bm25 flag: adds BM25 baseline columns to output table (requires rank-bm25)
- --max-workers param: controls thread-pool size for async mode
- print_table() extended to show side-by-side VORTEXRAG vs BM25 columns
- save_csv() handles variable columns gracefully

Closes vignesh2027#3

Signed-off-by: Vaishnavi Desai <vaishnavidesai957@gmail.com>

vaishnavidesai09 · 2026-06-24T14:17:26Z

Hi @vignesh2027! I've submitted the PR for #3.

Implemented async dataset evaluation (--async-eval), BM25 comparison support (--bm25), configurable worker count (--max-workers), updated table rendering, and CSV export handling.

Validated with:

python benchmarks/eval_beir.py --datasets scifact --async-eval --bm25

The output shows the expected side-by-side VORTEXRAG vs BM25 metrics. Looking forward to your feedback!

vaishnavidesai09 requested a review from vignesh2027 as a code owner June 24, 2026 14:16

vaishnavidesai09 wants to merge 1 commit into vignesh2027:main from vaishnavidesai09:feat/beir-async-bm25