Skip to content

Rebasebot merge onboard#95

Draft
RadekManak wants to merge 9 commits into
openshift-eng:mainfrom
RadekManak:rebasebot-merge-onboard
Draft

Rebasebot merge onboard#95
RadekManak wants to merge 9 commits into
openshift-eng:mainfrom
RadekManak:rebasebot-merge-onboard

Conversation

@RadekManak

@RadekManak RadekManak commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

This PR recovers changes in merge commits by creating synthetic carry commit.

This is needed for repositories that are onboarding and had previous changes made in merge commits. Rebasebot does not carry merge commits, so those changes are lost without this change.

Summary by CodeRabbit

  • New Features

    • Improved rebase handling to preserve merge-only content introduced by downstream merges, including file and permission-bit changes.
    • Automatically recovers missing “carry” changes as synthetic upstream UPSTREAM: <carry> commits, with correct ordering relative to subsequent commits.
  • Bug Fixes

    • More accurately detects the bot’s synthetic upstream merges to avoid false matches.
    • Excluded commits are respected during both replay and merge-only recovery.
    • Skips no-op recoveries and fails safely when merge-only restoration can’t be applied cleanly.
  • Tests

    • Expanded unit test coverage for merge-only delta recovery scenarios and failure modes.

When a downstream two-parent merge commit contains tree changes that do
not exist in any replayed non-merge commit (e.g. manual conflict
resolutions), those changes were silently dropped by RebaseBot.

This commit implements the "later-run path" fix:

- Add `_find_merge_only_deltas(gitwd, source, dest)` which locates all
  eligible downstream merge commits (two-parent, non-synthetic) in the
  replay window and compares each commit's actual tree against the
  `git merge-tree --write-tree parent1 parent2` auto-merge baseline.
  Any difference is a merge-only delta.

- Add `_apply_merge_only_delta(gitwd, merge_commit, diff_output)` which
  restores the affected paths to their exact state in the original merge
  commit tree and records a synthetic commit:
    UPSTREAM: <carry>: Recover merge-only delta from merge {sha[:12]}

- Call both functions at the end of `_do_rebase`, after all regular
  cherry-picks, so the synthetic carry commits appear in topo/history
  order.  No-op deltas (content already present after replay) are
  silently skipped.

- Add regression test `test_later_run_merge_only_delta_preserved` that
  exercises the full dry-run rebase entrypoint:
  1. First rebase runs; its PR is merged into dest.
  2. A downstream feature branch is merged into dest with an extra file
     staged only at merge time (merge-only content).
  3. Source advances; the second (later-run) rebase must produce a
     synthetic carry commit restoring that file.
Interleave recovered merge-only carries with downstream replay so they land in their historical position instead of being appended after later commits.
When no prior RebaseBot synthetic merge marker exists on the dest branch
(i.e., the first automated rebase), _find_merge_only_deltas now scans
for eligible downstream merge commits using the merge_base as the cutoff
instead of returning an empty list.

Legacy downstream histories that previously synced upstream through real
merge commits (with merge-only content such as manual conflict
resolutions) will now have that content preserved during the first
automated rebase, using the same synthetic UPSTREAM: <carry> semantics
established for later-run histories.

Existing ordinary merged-branch replay behavior is unchanged: only
merges with a non-empty merge-only delta get a recovery carry.
Preserve recovered carries through strict-tag replays, honor source-merge exclusions, and detect mode-only tree deltas so merge recovery stays safe and lossless.
…a recovery

Add four focused regression tests that assert INFO-level detection, recovery, no-op
skip, and excluded-skip log messages are emitted during merge-only delta handling.
Also add a WARNING log in _apply_merge_only_delta before the RepoException raise so
the manual-intervention path is explicitly visible in operator logs.

TDD: all four tests were written without caplog.set_level / without the new WARNING
log (confirmed RED), then caplog.set_level(logging.INFO) was added to the integration
tests and logging.warning() was added to _apply_merge_only_delta to make them GREEN.
Treat only RebaseBot's own synthetic upstream merge as excludable and stop the rebase when merge-delta baselines cannot be computed, so real upstream-sync resolutions are preserved instead of being skipped silently.
Require the bot's configured git identity before treating a merge as RebaseBot's internal synthetic marker, and fail merge-only delta detection when merge-tree exits fatally. This preserves real upstream-sync merges that resolve to upstream and stops dry runs from silently continuing without a recoverable baseline.
@openshift-ci

openshift-ci Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@openshift-ci openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 29, 2026
@openshift-ci

openshift-ci Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RadekManak

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 29, 2026
@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown
ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository: openshift-eng/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 543b890a-8f0b-40fa-9f7a-08c8ba2e7614

📥 Commits

Reviewing files that changed from the base of the PR and between d96f8f2 and 8c3ac4b.

📒 Files selected for processing (2)
  • rebasebot/bot.py
  • tests/test_rebases.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_rebases.py

Walkthrough

Adds merge-only delta recovery to rebase replay, with stricter synthetic merge detection, phased downstream commit handling, and expanded recovery, exclusion, logging, and failure coverage.

Changes

Merge-only recovery flow

Layer / File(s) Summary
Detection and phasing
rebasebot/bot.py
Adds shared synthetic merge messaging, first-parent-aware synthetic merge detection, and two-phase downstream commit identification with cutoff tracking and phase logging.
Merge-only delta core
rebasebot/bot.py
Computes merge-tree baselines, detects merge-only diffs, validates replay state, restores affected paths, and creates synthetic carry commits for merge-only deltas.
Replay and branch prep
rebasebot/bot.py
Replays phase 1 before phase 2, interleaves merge-only recovery during phase 2, and generates the synthetic upstream merge message through the shared helper.
Test harness and helper coverage
tests/test_rebases.py
Adds test helpers for CLI args and branch merging, plus direct coverage for _tree_entry_for_path lookup failures.
Recovery behavior tests
tests/test_rebases.py
Covers later-run and first-run recovery, replay ordering, strict tag/squash behavior, exclusion handling, mode-only recovery, and no-op skipping.
Failure and logging tests
tests/test_rebases.py
Covers failure-closed behavior, deleted-path handling, synthetic merge classification, merge-tree detection failures, and INFO/WARNING logging.

Estimated code review effort: 5 (Critical) | ~120 minutes


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning, 1 inconclusive)

Check name Status Explanation Resolution
No-Sensitive-Data-In-Logs ❌ Error New ancestry-path logging prints --pretty=%H || %s || %aE, exposing commit author emails (PII) and subjects. Remove or sanitize that log; don’t emit %aE or full commit lines, or redact emails before logging.
Docstring Coverage ⚠️ Warning Docstring coverage is 35.71% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title is related to RebaseBot merge support, but it is too vague to clearly describe the main change. Use a more specific title such as "RebaseBot recovers merge-only changes with synthetic carry commits".
✅ Passed checks (8 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
No-Weak-Crypto ✅ Passed No weak-crypto or custom-crypto code was added; scans of rebasebot/bot.py and tests/test_rebases.py found no MD5/SHA1/DES/RC4/3DES/Blowfish/ECB, hashlib, or compare_digest usage.
Container-Privileges ✅ Passed PR changes only Python code/tests; no container/K8s manifest changes or privileged settings were introduced.
No-Hardcoded-Secrets ✅ Passed No hardcoded secrets found; scans found no api_key/secret/token/password literals or credential URLs in the changed files.
No-Injection-Vectors ✅ Passed No flagged sinks appear in the changed files; the new subprocess use passes argv lists and there’s no eval/exec/pickle/yaml/os.system/dangerouslySetInnerHTML in the patch.
Ai-Attribution ✅ Passed No AI-tool mention was found in PR/commit text, and no Assisted-by/Generated-by or Co-Authored-By trailers were present.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
tests/test_rebases.py (1)

917-1032: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Use _make_rebase_args in these scenarios instead of re-declaring the args shape.

These three tests duplicate the same CLI-args surface that Lines 39-73 already centralize. Any future arg/default change can drift these cases away from the rest of the merge-only suite.

♻️ Suggested cleanup
-        def _make_args(src, rbs, dst, wdir):
-            a = MagicMock()
-            a.source = src
-            a.source_repo = None
-            a.dest = dst
-            a.rebase = rbs
-            a.working_dir = wdir
-            a.git_username = "test_rebasebot"
-            a.git_email = "test@rebasebot.ocp"
-            a.tag_policy = "soft"
-            a.bot_emails = []
-            a.exclude_commits = []
-            a.update_go_modules = False
-            a.conflict_policy = "auto"
-            a.ignore_manual_label = False
-            a.dry_run = True
-            a.always_run_hooks = False
-            a.title_prefix = ""
-            a.pre_rebase_hook = None
-            a.post_rebase_hook = None
-            a.pre_carry_commit_hook = None
-            a.pre_push_rebase_branch_hook = None
-            a.pre_create_pr_hook = None
-            a.source_ref_hook = None
-            return a
-
-        args1 = _make_args(source, rebase, dest, first_run_dir)
+        args1 = _make_rebase_args(source, rebase, dest, first_run_dir)

Apply the same replacement to the later duplicated builder and the hand-built args block in the first-run test.

Also applies to: 1033-1121, 1122-1226

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_rebases.py` around lines 917 - 1032, The test is re-declaring the
CLI args shape instead of using the shared builder, which can drift from the
rest of the suite. Replace the local `_make_args`/hand-built args setup in
`test_later_run_merge_only_delta_preserved` with the existing
`_make_rebase_args` helper used by the merge-only tests so all scenarios share
the same defaults and fields. Keep the test logic unchanged; only centralize
args construction through `_make_rebase_args`.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@rebasebot/bot.py`:
- Around line 669-672: The path restore logic in bot.py currently uses
gitwd.git.checkout in the rebasing flow, which can overwrite rebased upstream
edits without surfacing conflicts. Update the restore step around
added_or_modified to avoid blindly checking out from the old merge commit;
instead use a 3-way apply/merge strategy or add a divergence check against HEAD
before restoring. Keep the fix localized to the path-restoration block and
ensure any conflicting upstream changes are preserved or reported.
- Around line 682-687: The deleted-path cleanup in bot.py’s merge carry logic is
too permissive: gitwd.git.rm is being called without a pathspec separator, so
paths starting with dashes may be treated as options, and the broad
git.GitCommandError handler hides real rm failures. Update the deleted-path
removal loop to pass an explicit pathspec separator and an
ignore-missing/unmatch style option when removing paths, and only suppress the
expected “already absent” case while re-raising unexpected git rm errors so the
synthetic carry commit does not silently become partial.

---

Nitpick comments:
In `@tests/test_rebases.py`:
- Around line 917-1032: The test is re-declaring the CLI args shape instead of
using the shared builder, which can drift from the rest of the suite. Replace
the local `_make_args`/hand-built args setup in
`test_later_run_merge_only_delta_preserved` with the existing
`_make_rebase_args` helper used by the merge-only tests so all scenarios share
the same defaults and fields. Keep the test logic unchanged; only centralize
args construction through `_make_rebase_args`.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift-eng/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 6081360c-6e12-409c-935a-9c69d7d09719

📥 Commits

Reviewing files that changed from the base of the PR and between 2b7b2bc and d96f8f2.

📒 Files selected for processing (2)
  • rebasebot/bot.py
  • tests/test_rebases.py

Comment thread rebasebot/bot.py
Comment thread rebasebot/bot.py Outdated
Reject recovered merge-only carries once rebased paths have diverged past the original merge baseline, and surface deleted-path rm failures instead of silently partial-applying the carry. Reuse the shared rebase arg builder in the merge-only regression tests so these scenarios stay aligned with the dry-run defaults.
Bring in the latest bot-email and container updates from main while preserving the merge-only replay and partial-carry safety work on this branch.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@rebasebot/bot.py`:
- Around line 186-196: Restrict _normalize_bot_email so it only rewrites GitHub
noreply addresses and leaves plus-addresses on other domains untouched. Update
the logic in _normalize_bot_email(email) to first verify the domain is a GitHub
domain before stripping the numeric prefix, and keep the current plus-address
handling only for those noreply patterns so real emails like
123+user@example.com remain unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift-eng/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 543b890a-8f0b-40fa-9f7a-08c8ba2e7614

📥 Commits

Reviewing files that changed from the base of the PR and between d96f8f2 and 8c3ac4b.

📒 Files selected for processing (2)
  • rebasebot/bot.py
  • tests/test_rebases.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_rebases.py

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@rebasebot/bot.py`:
- Around line 186-196: Restrict _normalize_bot_email so it only rewrites GitHub
noreply addresses and leaves plus-addresses on other domains untouched. Update
the logic in _normalize_bot_email(email) to first verify the domain is a GitHub
domain before stripping the numeric prefix, and keep the current plus-address
handling only for those noreply patterns so real emails like
123+user@example.com remain unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository: openshift-eng/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 543b890a-8f0b-40fa-9f7a-08c8ba2e7614

📥 Commits

Reviewing files that changed from the base of the PR and between d96f8f2 and 8c3ac4b.

📒 Files selected for processing (2)
  • rebasebot/bot.py
  • tests/test_rebases.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/test_rebases.py
🛑 Comments failed to post (1)
rebasebot/bot.py (1)

186-196: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Restrict noreply normalization to GitHub domains.

The helper says it avoids collapsing real plus-addresses, but 123+user@example.com currently becomes user@example.com. That can misclassify non-bot commits during bot squash matching.

Proposed fix
 def _normalize_bot_email(email: str) -> str:
     """Normalize GitHub noreply addresses without collapsing real plus-addresses."""
     local, sep, domain = email.partition("@")
     if not sep:
         return email
 
+    domain = domain.lower()
+    if domain != "users.noreply.github.com":
+        return email
+
     prefix, plus, rest = local.partition("+")
     if plus and prefix.isdigit() and rest:
         return f"{rest}@{domain}"
 
     return email
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

def _normalize_bot_email(email: str) -> str:
    """Normalize GitHub noreply addresses without collapsing real plus-addresses."""
    local, sep, domain = email.partition("@")
    if not sep:
        return email

    domain = domain.lower()
    if domain != "users.noreply.github.com":
        return email

    prefix, plus, rest = local.partition("+")
    if plus and prefix.isdigit() and rest:
        return f"{rest}@{domain}"

    return email
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@rebasebot/bot.py` around lines 186 - 196, Restrict _normalize_bot_email so it
only rewrites GitHub noreply addresses and leaves plus-addresses on other
domains untouched. Update the logic in _normalize_bot_email(email) to first
verify the domain is a GitHub domain before stripping the numeric prefix, and
keep the current plus-address handling only for those noreply patterns so real
emails like 123+user@example.com remain unchanged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant