Skip to content

pr-review heuristic mispairs after issue transfer (branch encodes pre-transfer issue number) #105

@ellen-goc

Description

@ellen-goc

Symptom

askcc pr-review repeatedly mispaired weyucou/issue-tracker#5 (YAML event schema, transferred from weyucou/task-management#25) with weyucou/issue-tracker#6 (.gitignore, closes weyucou/task-management#52). Each daily cron tick posted a fresh review comment on PR#6 from 2026-05-01 to 2026-05-06 — 7 redundant misposted comments before the issue was relabeled.

Root cause

Two distinct defects in _find_linked_pr_number (askcc/functions.py:97):

  1. Substring match on #N: the body fallback uses f"#{issue_number}" in body. For issue_number=5, the substring #5 matches #52, #53, … This is how PR#6 (body referencing task-management#52) was incorrectly paired with issue#5.
  2. No transfer-history awareness: after weyucou/task-management#25 was transferred to weyucou/issue-tracker#5, the real fix-PR (PR#2, branch chore/25-yaml-event-schema-v1, body Closes weyucou/task-management#25) referenced only the pre-transfer coordinates. The heuristic never consults the GitHub Timeline transferred_event, so it cannot recover the link.

The wrong PR then passed substring matching and was returned as the linked PR — the rest of pr-review had no way to detect the mismatch and posted a review on it.

Compounding issue

No per-PR dedup on the resulting review comment — every cron tick re-posted the same mismatch warning to PR#6. The 24-hour comment dedup gate evaluates the issue, not the misposted PR. (Caller/cron side — see notes below.)

Current state

  • askcc/functions.py:97_find_linked_pr_number(gh, owner, repo, issue_number):
    • Branch match: re.compile(rf"^[^/]+/{issue_number}-") — correctly anchored, no bug here.
    • Body fallback: if f"#{issue_number}" in body or f"/issues/{issue_number}" in bodysubstring bug.
  • askcc/functions.py:131fetch_pr_content raises ValueError("No linked pull request found …") when _find_linked_pr_number returns None; the CLI propagates this and pr-review exits without posting. So fixing the matcher to return None for the ambiguous case is sufficient — no extra "abort" plumbing required.
  • askcc/skills/handle-github-issue/SKILL.md:353-368 — mirrors the same heuristic in prose for the standalone-skill install. Must be kept in sync.
  • tests/test_askcc.py:1623TestFindLinkedPrNumber covers the happy paths but has no negative test for substring collision or transfer-history matching.

Implementation Plan

Step 1 — Word-boundary body matching (fixes the immediate mispair)

In askcc/functions.py:_find_linked_pr_number, replace the substring fallback with regex matches that enforce a digit boundary:

  • re.search(rf"(?<!\d)#{issue_number}(?!\d)", body)
  • re.search(rf"/issues/{issue_number}(?!\d)", body)

This alone prevents #5 from matching #52 and is the minimum fix to stop the mispair recurrence.

Step 2 — Prefer explicit close keywords over bare #N

Tighten the body matcher to require GitHub's auto-link keywords for the primary positive match (case-insensitive):

CLOSE_KEYWORD_RE = re.compile(
    rf"\b(?:closes|closed|close|fixes|fixed|fix|resolves|resolved|resolve)\s+"
    rf"(?:(?P<qualified>[\w.-]+/[\w.-]+)#)?(?P<num>\d+)\b",
    re.IGNORECASE,
)

Walk all matches; for each, if qualified is set check it against (owner, repo) or any predecessor (Step 3), and check num against issue_number (or predecessor prior_N). Keep the bare #N//issues/N matchers as a secondary fallback so non-conventional PR bodies are not regressed.

Step 3 — Transfer-history aware lookup

Add _resolve_transferred_predecessors(gh, owner, repo, issue_number) -> list[tuple[str, str, int]]:

  • Call gh api --paginate repos/{owner}/{repo}/issues/{N}/timeline.
  • For each event where event == "transferred", extract from source.issue:
    • repository.owner.loginprior_owner
    • repository.nameprior_repo
    • numberprior_number
  • Append (prior_owner, prior_repo, prior_number) to the list. (GitHub does not currently return multi-hop chains via this endpoint — predecessors point only to the immediately prior issue. The function returns a list to keep the API stable if that changes.)
  • Wrap all subprocess/JSON errors and return [] — never raise. Log a warning so misconfigured permissions surface in cron logs.

Plumb the predecessor list into _find_linked_pr_number:

  • Branch convention: also match ^[^/]+/{prior_number}- (PRs live in the current repo after transfer; the branch number stays as it was at creation).
  • Body keywords (Step 2): also accept Closes {prior_owner}/{prior_repo}#{prior_number} (and Fixes/Resolves variants).

Step 4 — No-positive-match → return None

After Steps 1–3, drop the unconditional fallback "any PR body containing #N". A PR may only be returned when:

  • Its branch matches the issue number or a predecessor's number, or
  • Its body explicitly closes the issue (or a predecessor) via the keyword regex, or
  • Its body contains /issues/N with a digit boundary referencing the issue or a predecessor.

If none of those positive matches hold, return None. fetch_pr_content already raises ValueError on None, which makes the cron run a no-op. This is mitigation #1 from the issue: "Don't run pr-review when the chosen PR's Closes reference doesn't include the input issue."

Step 5 — Sync the standalone skill

Update askcc/skills/handle-github-issue/SKILL.md:353-368 to describe the new heuristic in prose:

  • Word-boundary matching (so #5 does not match #52).
  • Prefer explicit Closes/Fixes/Resolves references over bare #N.
  • Before falling back, query gh api repos/<owner>/<repo>/issues/<N>/timeline for transferred events and accept matches against each predecessor's owner/repo#N and branch number.
  • If no positive match is found after all checks, abort the action without posting.

Step 6 — Tests

In tests/test_askcc.py, extend TestFindLinkedPrNumber and add TestResolveTransferredPredecessors:

  • Regression for the original bug: PR body "Closes weyucou/task-management#52" with issue_number=5 returns None (was returning the PR's number).
  • Substring-collision: PR body "See #51, #500" with issue_number=5 returns None.
  • Word-boundary positive: PR body "Fixes #5" with issue_number=5 returns the PR's number.
  • Transfer-aware branch: predecessor list [("weyucou", "task-management", 25)] + PR branch chore/25-yaml-event-schema-v1 + issue_number=5 returns the PR's number.
  • Transfer-aware body: PR body "Closes weyucou/task-management#25" + same predecessor + issue_number=5 returns the PR's number.
  • _resolve_transferred_predecessors: parses a synthetic timeline payload with one transferred event correctly; returns [] for a clean timeline; returns [] (with a logged warning) on gh failure or bad JSON.

Step 7 — Caller-side dedup (out of scope, documented)

Mitigation #3 ((action, repo, target_number) dedup) is enforced by the cron caller, not askcc. Note this in the summary comment so the cron operator can coordinate the change.

Affected Files

  • askcc/functions.py — rewrite _find_linked_pr_number (~L97); add _resolve_transferred_predecessors; introduce the CLOSE_KEYWORD_RE constant.
  • askcc/skills/handle-github-issue/SKILL.md — update PR-lookup prose (~L353-368).
  • tests/test_askcc.py — extend TestFindLinkedPrNumber (~L1623); add TestResolveTransferredPredecessors.

askcc/definitions.py:REVIEWPR_AGENT_PROMPT does not need editing: the Python heuristic runs before the agent in the askcc package flow, so the agent prompt is unaffected. The skill prose change at Step 5 is the only documentation surface.

Acceptance Criteria

  • _find_linked_pr_number returns None for issue_number=5 against a PR whose only body reference is Closes weyucou/task-management#52 (regression of the reported root cause).
  • _find_linked_pr_number returns the PR number for issue_number=5 against a PR with branch chore/25-yaml-event-schema-v1 when _resolve_transferred_predecessors reports [("weyucou", "task-management", 25)].
  • _find_linked_pr_number returns the PR number for issue_number=5 against a PR body Closes weyucou/task-management#25 under the same predecessor set.
  • _find_linked_pr_number returns None when no PR body uses an explicit close keyword referencing the issue or any predecessor and no branch matches.
  • _resolve_transferred_predecessors parses one synthetic transferred event payload, returns [] on a clean timeline, and returns [] on gh / JSON errors without raising.
  • askcc/skills/handle-github-issue/SKILL.md PR-lookup section describes word-boundary matching, close-keyword preference, and the timeline transferred_event lookup.
  • tests/test_askcc.py adds the six tests listed in Step 6; full suite passes.
  • uv run poe check and uv run poe test both pass.

Dependencies

  • GitHub Timeline API: gh api repos/{owner}/{repo}/issues/{N}/timeline returns transferred events containing source.issue.repository.{owner.login,name} and source.issue.number. No preview-header opt-in is required on the current REST API. Verified payload shape via GitHub REST docs.
  • No new Python dependencies — uses stdlib re and the already-imported subprocess / gh.

Risks / Open Questions

  • Timeline API cost: one extra paginated gh api call per pr-review run. Per-cron impact is small (one issue at a time), but flag it for any future batched caller.
  • Multi-hop transfers: GitHub Timeline currently only exposes the immediately prior issue. If A → B → C and we are at C, only B is recoverable. Returning a list keeps the API stable; chase deeper only if a real case appears.
  • Per-target dedup (mitigation Add release workflow and bump to v0.1.1 #3): belongs to the cron caller. Will note in the summary comment for the operator. Not implemented here.

Metadata

Metadata

Assignees

Labels

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions