Rebasebot merge onboard#95
Conversation
When a downstream two-parent merge commit contains tree changes that do
not exist in any replayed non-merge commit (e.g. manual conflict
resolutions), those changes were silently dropped by RebaseBot.
This commit implements the "later-run path" fix:
- Add `_find_merge_only_deltas(gitwd, source, dest)` which locates all
eligible downstream merge commits (two-parent, non-synthetic) in the
replay window and compares each commit's actual tree against the
`git merge-tree --write-tree parent1 parent2` auto-merge baseline.
Any difference is a merge-only delta.
- Add `_apply_merge_only_delta(gitwd, merge_commit, diff_output)` which
restores the affected paths to their exact state in the original merge
commit tree and records a synthetic commit:
UPSTREAM: <carry>: Recover merge-only delta from merge {sha[:12]}
- Call both functions at the end of `_do_rebase`, after all regular
cherry-picks, so the synthetic carry commits appear in topo/history
order. No-op deltas (content already present after replay) are
silently skipped.
- Add regression test `test_later_run_merge_only_delta_preserved` that
exercises the full dry-run rebase entrypoint:
1. First rebase runs; its PR is merged into dest.
2. A downstream feature branch is merged into dest with an extra file
staged only at merge time (merge-only content).
3. Source advances; the second (later-run) rebase must produce a
synthetic carry commit restoring that file.
Interleave recovered merge-only carries with downstream replay so they land in their historical position instead of being appended after later commits.
When no prior RebaseBot synthetic merge marker exists on the dest branch (i.e., the first automated rebase), _find_merge_only_deltas now scans for eligible downstream merge commits using the merge_base as the cutoff instead of returning an empty list. Legacy downstream histories that previously synced upstream through real merge commits (with merge-only content such as manual conflict resolutions) will now have that content preserved during the first automated rebase, using the same synthetic UPSTREAM: <carry> semantics established for later-run histories. Existing ordinary merged-branch replay behavior is unchanged: only merges with a non-empty merge-only delta get a recovery carry.
Preserve recovered carries through strict-tag replays, honor source-merge exclusions, and detect mode-only tree deltas so merge recovery stays safe and lossless.
…a recovery Add four focused regression tests that assert INFO-level detection, recovery, no-op skip, and excluded-skip log messages are emitted during merge-only delta handling. Also add a WARNING log in _apply_merge_only_delta before the RepoException raise so the manual-intervention path is explicitly visible in operator logs. TDD: all four tests were written without caplog.set_level / without the new WARNING log (confirmed RED), then caplog.set_level(logging.INFO) was added to the integration tests and logging.warning() was added to _apply_merge_only_delta to make them GREEN.
Treat only RebaseBot's own synthetic upstream merge as excludable and stop the rebase when merge-delta baselines cannot be computed, so real upstream-sync resolutions are preserved instead of being skipped silently.
Require the bot's configured git identity before treating a merge as RebaseBot's internal synthetic marker, and fail merge-only delta detection when merge-tree exits fatally. This preserves real upstream-sync merges that resolve to upstream and stops dry runs from silently continuing without a recoverable baseline.
|
Skipping CI for Draft Pull Request. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: RadekManak The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository: openshift-eng/coderabbit/.coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
WalkthroughAdds merge-only delta recovery to rebase replay, with stricter synthetic merge detection, phased downstream commit handling, and expanded recovery, exclusion, logging, and failure coverage. ChangesMerge-only recovery flow
Estimated code review effort: 5 (Critical) | ~120 minutes Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error, 1 warning, 1 inconclusive)
✅ Passed checks (8 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
tests/test_rebases.py (1)
917-1032: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winUse
_make_rebase_argsin these scenarios instead of re-declaring the args shape.These three tests duplicate the same CLI-args surface that Lines 39-73 already centralize. Any future arg/default change can drift these cases away from the rest of the merge-only suite.
♻️ Suggested cleanup
- def _make_args(src, rbs, dst, wdir): - a = MagicMock() - a.source = src - a.source_repo = None - a.dest = dst - a.rebase = rbs - a.working_dir = wdir - a.git_username = "test_rebasebot" - a.git_email = "test@rebasebot.ocp" - a.tag_policy = "soft" - a.bot_emails = [] - a.exclude_commits = [] - a.update_go_modules = False - a.conflict_policy = "auto" - a.ignore_manual_label = False - a.dry_run = True - a.always_run_hooks = False - a.title_prefix = "" - a.pre_rebase_hook = None - a.post_rebase_hook = None - a.pre_carry_commit_hook = None - a.pre_push_rebase_branch_hook = None - a.pre_create_pr_hook = None - a.source_ref_hook = None - return a - - args1 = _make_args(source, rebase, dest, first_run_dir) + args1 = _make_rebase_args(source, rebase, dest, first_run_dir)Apply the same replacement to the later duplicated builder and the hand-built
argsblock in the first-run test.Also applies to: 1033-1121, 1122-1226
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_rebases.py` around lines 917 - 1032, The test is re-declaring the CLI args shape instead of using the shared builder, which can drift from the rest of the suite. Replace the local `_make_args`/hand-built args setup in `test_later_run_merge_only_delta_preserved` with the existing `_make_rebase_args` helper used by the merge-only tests so all scenarios share the same defaults and fields. Keep the test logic unchanged; only centralize args construction through `_make_rebase_args`.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@rebasebot/bot.py`:
- Around line 669-672: The path restore logic in bot.py currently uses
gitwd.git.checkout in the rebasing flow, which can overwrite rebased upstream
edits without surfacing conflicts. Update the restore step around
added_or_modified to avoid blindly checking out from the old merge commit;
instead use a 3-way apply/merge strategy or add a divergence check against HEAD
before restoring. Keep the fix localized to the path-restoration block and
ensure any conflicting upstream changes are preserved or reported.
- Around line 682-687: The deleted-path cleanup in bot.py’s merge carry logic is
too permissive: gitwd.git.rm is being called without a pathspec separator, so
paths starting with dashes may be treated as options, and the broad
git.GitCommandError handler hides real rm failures. Update the deleted-path
removal loop to pass an explicit pathspec separator and an
ignore-missing/unmatch style option when removing paths, and only suppress the
expected “already absent” case while re-raising unexpected git rm errors so the
synthetic carry commit does not silently become partial.
---
Nitpick comments:
In `@tests/test_rebases.py`:
- Around line 917-1032: The test is re-declaring the CLI args shape instead of
using the shared builder, which can drift from the rest of the suite. Replace
the local `_make_args`/hand-built args setup in
`test_later_run_merge_only_delta_preserved` with the existing
`_make_rebase_args` helper used by the merge-only tests so all scenarios share
the same defaults and fields. Keep the test logic unchanged; only centralize
args construction through `_make_rebase_args`.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift-eng/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 6081360c-6e12-409c-935a-9c69d7d09719
📒 Files selected for processing (2)
rebasebot/bot.pytests/test_rebases.py
Reject recovered merge-only carries once rebased paths have diverged past the original merge baseline, and surface deleted-path rm failures instead of silently partial-applying the carry. Reuse the shared rebase arg builder in the merge-only regression tests so these scenarios stay aligned with the dry-run defaults.
Bring in the latest bot-email and container updates from main while preserving the merge-only replay and partial-carry safety work on this branch.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@rebasebot/bot.py`:
- Around line 186-196: Restrict _normalize_bot_email so it only rewrites GitHub
noreply addresses and leaves plus-addresses on other domains untouched. Update
the logic in _normalize_bot_email(email) to first verify the domain is a GitHub
domain before stripping the numeric prefix, and keep the current plus-address
handling only for those noreply patterns so real emails like
123+user@example.com remain unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift-eng/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 543b890a-8f0b-40fa-9f7a-08c8ba2e7614
📒 Files selected for processing (2)
rebasebot/bot.pytests/test_rebases.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_rebases.py
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@rebasebot/bot.py`:
- Around line 186-196: Restrict _normalize_bot_email so it only rewrites GitHub
noreply addresses and leaves plus-addresses on other domains untouched. Update
the logic in _normalize_bot_email(email) to first verify the domain is a GitHub
domain before stripping the numeric prefix, and keep the current plus-address
handling only for those noreply patterns so real emails like
123+user@example.com remain unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift-eng/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 543b890a-8f0b-40fa-9f7a-08c8ba2e7614
📒 Files selected for processing (2)
rebasebot/bot.pytests/test_rebases.py
🚧 Files skipped from review as they are similar to previous changes (1)
- tests/test_rebases.py
🛑 Comments failed to post (1)
rebasebot/bot.py (1)
186-196: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Restrict noreply normalization to GitHub domains.
The helper says it avoids collapsing real plus-addresses, but
123+user@example.comcurrently becomesuser@example.com. That can misclassify non-bot commits during bot squash matching.Proposed fix
def _normalize_bot_email(email: str) -> str: """Normalize GitHub noreply addresses without collapsing real plus-addresses.""" local, sep, domain = email.partition("@") if not sep: return email + domain = domain.lower() + if domain != "users.noreply.github.com": + return email + prefix, plus, rest = local.partition("+") if plus and prefix.isdigit() and rest: return f"{rest}@{domain}" return email📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.def _normalize_bot_email(email: str) -> str: """Normalize GitHub noreply addresses without collapsing real plus-addresses.""" local, sep, domain = email.partition("@") if not sep: return email domain = domain.lower() if domain != "users.noreply.github.com": return email prefix, plus, rest = local.partition("+") if plus and prefix.isdigit() and rest: return f"{rest}@{domain}" return email🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@rebasebot/bot.py` around lines 186 - 196, Restrict _normalize_bot_email so it only rewrites GitHub noreply addresses and leaves plus-addresses on other domains untouched. Update the logic in _normalize_bot_email(email) to first verify the domain is a GitHub domain before stripping the numeric prefix, and keep the current plus-address handling only for those noreply patterns so real emails like 123+user@example.com remain unchanged.
This PR recovers changes in merge commits by creating synthetic carry commit.
This is needed for repositories that are onboarding and had previous changes made in merge commits. Rebasebot does not carry merge commits, so those changes are lost without this change.
Summary by CodeRabbit
New Features
UPSTREAM: <carry>commits, with correct ordering relative to subsequent commits.Bug Fixes
Tests