How do we compare different runs when each run consists of multiple folds?
For instance, assume we have 10 folds for each of run_1, ..., run_5.
from ranx import compare

# Compare different runs and perform Two-sided Paired Student's t-Test
report = compare(
    qrels=qrels,
    runs=[run_1, run_2, run_3, run_4, run_5],
    metrics=["map@100", "mrr@100", "ndcg@10"],
    max_p=0.01,  # P-value threshold
)
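
One possible approach, sketched here under the assumption that the folds partition the queries (each query ID appears in exactly one fold, as in standard k-fold cross-validation), is to concatenate the per-fold results of each model into a single run and then pass the merged runs to compare as above. The helper `merge_folds` and the toy fold data are hypothetical and not part of ranx; the sketch works on plain query-to-document-score dictionaries.

```python
from typing import Dict, List

# query_id -> {doc_id: score}
RunDict = Dict[str, Dict[str, float]]


def merge_folds(folds: List[RunDict]) -> RunDict:
    """Concatenate per-fold results into one run dict (hypothetical helper)."""
    merged: RunDict = {}
    for fold in folds:
        for query_id, doc_scores in fold.items():
            # Assumption: folds are disjoint, so a repeated query is an error.
            if query_id in merged:
                raise ValueError(f"query {query_id!r} appears in more than one fold")
            merged[query_id] = dict(doc_scores)
    return merged


# Toy data: two folds produced by the same model
fold_a = {"q_1": {"d_1": 0.9, "d_2": 0.4}}
fold_b = {"q_2": {"d_3": 0.7}}
run_1_dict = merge_folds([fold_a, fold_b])

# Assumption: the merged dict can then be wrapped as a ranx run, e.g.
#   from ranx import Run
#   run_1 = Run(run_1_dict, name="run_1")
```

If the folds instead share queries (for example, repeated evaluation on one test set), merging in this way does not apply, and the per-fold scores would need to be aggregated some other way before testing for significance.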