Fisher's Randomization Test yields nondeterministic results

**Describe the bug**
When Fisher's Randomization Test is run multiple times on the same data with the same parameters, the significance results vary between runs. This happens even with a fixed random seed and a single thread.

**To Reproduce**
Run `ranx.compare(qrels, runs, metrics=["precision@1", "recall@20"], stat_test="fisher", max_p=0.05, n_permutations=1000, make_comparable=True, threads=1, random_seed=0)`.
`qrels` has a few thousand entries, and each element in `runs` has about a thousand entries.

When `compare()` is run multiple times, the significance assessments differ slightly between runs. On my data, a difference shows up within three to five `compare()` calls.
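
A repeated-call check like the one described above can be sketched as follows. This is only a minimal, library-agnostic helper: `find_divergent_runs` and `run_once` are hypothetical names, not part of ranx, and the caller decides how to turn a `compare()` report into a value that can be compared with `!=`.

```python
from typing import Any, Callable, List


def find_divergent_runs(run_once: Callable[[], Any], n_runs: int = 5) -> List[int]:
    """Call run_once() n_runs times and return the indices of the calls
    whose result differs from the first call (index 0 is the baseline)."""
    baseline = run_once()
    divergent = []
    for i in range(1, n_runs):
        if run_once() != baseline:
            divergent.append(i)
    return divergent


# Hypothetical usage: run_once would wrap the compare() call from this report
# and return some comparable representation of the significance results.
# An empty list means every call agreed with the first one.
print(find_divergent_runs(lambda: "same result every time", n_runs=5))
```

An empty list is the outcome I would expect for `compare()` with a fixed seed; on my data the list would be non-empty after a handful of calls.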

**Expected behavior**
I expect multiple `compare()` runs on the same data and with the same parameters to always show the exact same significance results.
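
To illustrate the expectation, here is a minimal pure-Python sketch of a paired, two-sided randomization test that draws all randomness from a local seeded generator. It is not ranx's implementation and makes no claim about where the nondeterminism in ranx comes from; the function name and signature are assumptions made for this example only.

```python
import random
from typing import Sequence


def fisher_randomization_p_value(
    a: Sequence[float],
    b: Sequence[float],
    n_permutations: int = 1000,
    seed: int = 0,
) -> float:
    """Paired two-sided randomization test on per-query scores a and b."""
    rng = random.Random(seed)  # local generator: no shared global state
    diffs = [x - y for x, y in zip(a, b)]
    observed = abs(sum(diffs) / len(diffs))
    hits = 0
    for _ in range(n_permutations):
        # Randomly swap the two systems per query, i.e. flip the sign of each difference.
        total = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(total / len(diffs)) >= observed:
            hits += 1
    return hits / n_permutations


scores_a = [0.9, 0.4, 0.7, 0.8, 0.6, 0.5]
scores_b = [0.7, 0.5, 0.6, 0.6, 0.6, 0.3]
p_first = fisher_randomization_p_value(scores_a, scores_b, seed=0)
p_second = fisher_randomization_p_value(scores_a, scores_b, seed=0)
print(p_first == p_second)  # same seed and same data give the same p-value
```

Because the generator is created inside the function from the seed, two calls with identical inputs walk through the same sequence of permutations and therefore return the same p-value.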



Fisher's Randomization Test yields nondeterministic results #70
