New --all-reruns-need-to-pass argument #311

Open
msaelices wants to merge 3 commits into pytest-dev:master from smithai:all-reruns-need-to-pass-argument

@msaelices

This addresses your requirement to verify that non-deterministic tests (that work ~90% of the time) pass consistently by requiring all reruns to pass after an initial failure.

Signed-off-by: Manuel Saelices <msaelices@gmail.com>
@icemac
Contributor

icemac commented Oct 9, 2025

Thank you for your PR. Unfortunately, I do not understand which problem it solves. Could you please explain your use case with an example?

Additionally, the merge conflict and the lint errors need to be resolved.

@icemac
Contributor

icemac commented May 22, 2026

@msaelices Are you still interested in this PR? Otherwise I'll close it as abandoned.

@msaelices
Author

msaelices commented May 22, 2026

> @msaelices Are you still interested in this PR, otherwise I'll close it as abandoned.

Yes. We are using it to run evals that are not deterministic, such as LLM-based ones.

Scenario: Let's say you have 100 evals, each with a failure rate of about 5%, which is acceptable to you.

Even with a seemingly "good" 5% per-eval failure rate, running 100 evals makes it very likely, with a ~99.4% chance, that at least one fails.
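The ~99.4% figure can be checked with a short calculation. This is a minimal sketch that assumes the evals fail independently of each other; the function name `prob_at_least_one_failure` is made up for illustration and is not part of pytest-rerunfailures.

```python
def prob_at_least_one_failure(per_eval_failure_rate: float, num_evals: int) -> float:
    """Chance that at least one of `num_evals` independent evals fails."""
    # P(no eval fails) is the product of the individual pass probabilities.
    prob_all_pass = (1.0 - per_eval_failure_rate) ** num_evals
    return 1.0 - prob_all_pass


# 100 evals at a 5% failure rate each (assumed independent).
print(f"{prob_at_least_one_failure(0.05, 100):.1%}")  # prints 99.4%
```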

How do you differentiate between a harmless flaky failure and an actual regression, where the failure rate jumps from something acceptable like 5% to, for example, 50%?

We use --reruns together with --all-reruns-need-to-pass to handle this.

For example, with --reruns 20 and --all-reruns-need-to-pass, we can check that the observed failure rate for that test stays around 5%, instead of letting a much higher regression slip through unnoticed.
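To see how 20 reruns separate the two failure rates mentioned above, the chance that every rerun passes can be computed directly. This is a rough sketch, assuming reruns are independent and that the flag behaves as the PR description states (all reruns must pass after an initial failure); `prob_all_reruns_pass` is a hypothetical helper, not plugin code.

```python
def prob_all_reruns_pass(failure_rate: float, reruns: int) -> float:
    """Chance that `reruns` independent reruns of one test all pass."""
    return (1.0 - failure_rate) ** reruns


# Compare an acceptable failure rate (5%) with a regressed one (50%).
for rate in (0.05, 0.50):
    print(f"failure rate {rate:.0%}: all 20 reruns pass with p={prob_all_reruns_pass(rate, 20):.2e}")
```

Under these assumptions the number of reruns acts as a tuning knob: more reruns make the check stricter for every failure rate, so the value is worth choosing against the failure rate you consider acceptable.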

msaelices added 2 commits May 22, 2026 12:04
…o-pass-argument

# Conflicts:
#	src/pytest_rerunfailures.py
#	tests/test_pytest_rerunfailures.py