feat(skills): improve triage skills scalability for large repos#1697

Open
FlorianBruniaux wants to merge 1 commit into develop from feature/skills-triage-scalability-update

Conversation

@FlorianBruniaux
Collaborator

Summary

  • Add --focus modes (recent/critical/stale/all) with auto-activation above 200 items — prevents token explosion on large repos
  • Two-pass data gathering: metadata first, bodies only for scoped items (faster, cheaper)
  • Cap parallel agents: 20 for issue-triage, 15 for pr-triage, with batching above the limit
  • Duplicate detection rewrite: n-gram pre-filter cuts Jaccard comparisons from ~160K to ~2-5K on a 400-issue repo
  • Overlap detection capped to a targeted sample (updatedAt <30d, additions <1000, not stale draft)
  • Pagination support via gh api cursor for repos beyond 400 items
  • Cross-analysis window in rtk-triage capped at 100 issues x 100 PRs to avoid combinatorial explosion
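The duplicate-detection rewrite can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the `find_duplicates` helper, the trigram size, and the 0.5 threshold are all assumptions. The point is the shape of the optimization — bucket titles by shared n-grams first, then run Jaccard only on pairs that share at least one bucket, instead of on all ~n²/2 pairs.

```python
from collections import defaultdict
from itertools import combinations

def ngrams(text, n=3):
    """Character n-grams of a lowercased title."""
    t = text.lower()
    return {t[i:i + n] for i in range(len(t) - n + 1)}

def jaccard(a, b):
    """Jaccard similarity of two n-gram sets."""
    return len(a & b) / len(a | b) if a | b else 0.0

def find_duplicates(issues, threshold=0.5):
    """Pre-filter with shared n-grams, then confirm with Jaccard.

    `issues` is a list of (number, title) pairs. Only pairs that share
    at least one n-gram become candidates, so unrelated titles are
    never compared at all.
    """
    grams = {num: ngrams(title) for num, title in issues}
    buckets = defaultdict(set)
    for num, gs in grams.items():
        for g in gs:
            buckets[g].add(num)
    candidates = set()
    for nums in buckets.values():
        candidates.update(combinations(sorted(nums), 2))
    return [(a, b) for a, b in sorted(candidates)
            if jaccard(grams[a], grams[b]) >= threshold]
```

On a 400-issue repo, most title pairs share no n-gram, which is where the drop from ~160K comparisons to a few thousand comes from.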

Test plan

  • Run /issue-triage on a repo with >200 open issues, verify --focus recent activates automatically
  • Run /pr-triage --focus critical and confirm only CI-dirty + CONFLICTING PRs appear
  • Run /rtk-triage --all and verify pagination kicks in beyond 400 items
  • Confirm agent batching works when selecting >20 issues for deep analysis
  • Verify duplicate detection still catches obvious dupes with n-gram pre-filter

🤖 Generated with Claude Code

Signed-off-by: Florian BRUNIAUX <florian@bruniaux.com>
Copilot AI review requested due to automatic review settings May 3, 2026 19:18

Copilot AI left a comment


Pull request overview

This PR updates the Claude triage skills to scale better on large repositories by introducing focus-based scoping, reducing expensive data pulls, and adding caps to avoid combinatorial/token explosions during cross-analysis.

Changes:

  • Add --focus filtering modes (recent/critical/stale/all) with an auto “recent” fallback above 200 items.
  • Switch triage workflows toward a two-pass gathering strategy (metadata first, bodies/files only for scoped items).
  • Add explicit batching limits for parallel deep-analysis agents and cap cross-analysis windows/overlap detection.
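The batching limit described above can be sketched in a few lines. This is a hypothetical `batch` helper, not the skill's wording: items selected for deep analysis are split into slices no larger than the cap, and each slice is dispatched as one wave of parallel agents.

```python
def batch(items, cap=20):
    """Split selected items into batches of at most `cap`, so no more
    than `cap` deep-analysis agents run in parallel at once."""
    return [items[i:i + cap] for i in range(0, len(items), cap)]
```

For example, selecting 47 issues with a cap of 20 yields three waves of 20, 20, and 7.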

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

File Description
.claude/skills/rtk-triage/SKILL.md Documents new focus modes, two-pass gathering, overlap/window caps, and pagination guidance for combined triage.
.claude/skills/pr-triage/SKILL.md Adds --focus modes, two-pass fetching, and batching limits for deep PR reviews/overlap detection.
.claude/skills/issue-triage/SKILL.md Adds --focus modes, duplicate-detection scaling notes, two-pass strategy, and batching limits for deep issue analysis.

Comment on lines +81 to +85
**PRs — Pass 1 (metadata)**:
```bash
gh pr list --state open --limit 500 \
--json number,title,author,createdAt,updatedAt,additions,deletions,changedFiles,isDraft,mergeable,reviewDecision,statusCheckRollup,body
```
Comment on lines +5 to +6
Args: "all" to review all, PR numbers to focus (e.g. "42 57"), "en"/"fr" for language,
"--focus recent" (default, last 60d), "--focus critical" (bugs/security), "--all" (everything).
Comment on lines +44 to +47
| `--focus recent` | PRs with updatedAt < 60d (default if >200 PRs) | Regular weekly triage |
| `--focus critical` | CI dirty + CONFLICTING + detected overlaps | Urgent sprint, before merge |
| `--focus stale` | No activity for >14d | Backlog cleanup |
| `--all` | All open PRs, paginated | Exhaustive monthly audit |
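A focus-mode filter over the Pass 1 metadata could look like the sketch below. The `scope` helper and the flat dict shape are assumptions for illustration: in the real `gh pr list --json` output, `statusCheckRollup` is a list of check objects rather than a single string, so the string comparison here is a stand-in.

```python
from datetime import datetime, timedelta, timezone

def scope(prs, focus="recent", now=None):
    """Apply a --focus mode to PR metadata dicts.

    Assumed dict shape (hypothetical): `updatedAt` as an ISO-8601
    string, `mergeable`, and a simplified `statusCheckRollup` string.
    """
    now = now or datetime.now(timezone.utc)
    if focus == "recent":
        cutoff = now - timedelta(days=60)
        return [p for p in prs
                if datetime.fromisoformat(p["updatedAt"]) >= cutoff]
    if focus == "critical":
        return [p for p in prs
                if p.get("mergeable") == "CONFLICTING"
                or p.get("statusCheckRollup") == "FAILURE"]
    if focus == "stale":
        cutoff = now - timedelta(days=14)
        return [p for p in prs
                if datetime.fromisoformat(p["updatedAt"]) < cutoff]
    return list(prs)  # --all: no filtering, pagination handles volume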
Comment on lines +5 to +6
Args: "all" for deep analysis of all, issue numbers to focus (e.g. "42 57"), "en"/"fr" for language,
"--focus recent" (default, last 60d), "--focus critical", "--focus stale", "--all" (everything).
then cross-references the data to detect double coverage, security gaps,
P0s without a PR, and internal conflicts. Saves to claudedocs/RTK-YYYY-MM-DD.md.
Args: "en"/"fr" for the language (default: fr), "save" to force saving.
Args: "en"/"fr" for the language (default: fr), "--focus recent/critical/stale/all", "save" to force saving.
then cross-references the data to detect double coverage, security gaps,
P0s without a PR, and internal conflicts. Saves to claudedocs/RTK-YYYY-MM-DD.md.
Args: "en"/"fr" for the language (default: fr), "save" to force saving.
Args: "en"/"fr" for the language (default: fr), "--focus recent/critical/stale/all", "save" to force saving.
Comment on lines +75 to 81
# Open issues: metadata only, no body or comments (lightweight)
gh issue list --state open --limit 500 \
--json number,title,author,createdAt,updatedAt,labels,assignees

# Open PRs (for cross-referencing)
gh pr list --state open --limit 50 --json number,title,body
# Open PRs for cross-referencing (body needed to catch "fixes #N")
gh pr list --state open --limit 500 --json number,title,body

Comment on lines +101 to +106
```bash
gh issue view {num} --json body,comments \
--jq '{body: .body, comments: (.comments[-5:] | map(.body))}'
```

If an issue has >50 comments, fetch only the last 5.

Comment on lines +100 to +106
**Pagination if >400 items**: page through with `gh api`:
```bash
# Issues beyond 400
gh api "repos/{owner}/{repo}/issues?state=open&per_page=100&page=2" \
--jq '[.[] | {number: .number, title: .title, updatedAt: .updated_at}]'
# Repeat page=3, page=4... until the response is empty
```
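The loop above can be sketched with an injectable page fetcher, which keeps the pagination logic testable apart from the CLI. The `fetch_all_issues` helper is hypothetical; in practice `fetch_page` would shell out to the `gh api` call shown above. One note on the endpoint itself: the REST `repos/{owner}/{repo}/issues` endpoint also returns pull requests, which carry a `pull_request` key, so they should be filtered out.

```python
def fetch_all_issues(fetch_page):
    """Collect all open issues across pages.

    `fetch_page(page)` returns a list of issue dicts for that page,
    or an empty list once the pages are exhausted — e.g. a wrapper
    around `gh api "repos/{owner}/{repo}/issues?state=open
    &per_page=100&page={page}"`.
    """
    items, page = [], 1
    while True:
        raw = fetch_page(page)
        if not raw:  # empty response: no more pages
            break
        # The issues endpoint also lists PRs; drop them.
        items.extend(i for i in raw if "pull_request" not in i)
        page += 1
    return items
```

Checking the *raw* page for emptiness (rather than the filtered batch) matters: a page consisting only of PRs must not end the loop early.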
