diff --git a/docs/showcases/README.md b/docs/showcases/README.md
index 5fef4a34..6d0f3158 100644
--- a/docs/showcases/README.md
+++ b/docs/showcases/README.md
@@ -82,6 +82,19 @@ commit-backed self-iteration case, and one contributor-approved interactive
 workflow case that shows how LoopX coordinates generated scripts and worker
 agents under a shared control plane.
 
+## Additional Public Evidence Cases
+
+| Case | Pattern | Status | Public Surface |
+| --- | --- | --- | --- |
+| [0623 agent-to-agent PR comment and fix loop](cases/0623-agent-to-agent-pr-comments.md) | Agent handoff, PR comment loop, review packet | Public-safe pattern case | Redacted lifecycle narrative |
+| [0623 overnight project refactor](cases/0623-overnight-project-refactor.md) | PR-sized slices, todo follow-up, supersede | Public-safe pattern case | Redacted lifecycle narrative |
+| [0624 PR issue automatic fix loop](cases/0624-pr-issue-auto-fix.md) | Issue-fix workflow, repro smoke, reviewer handoff | Public-safe pattern case | Redacted workflow narrative |
+| [0627 overnight PR batch with reviewable control](cases/0627-overnight-pr-batch.md) | PR-sized slices, validation writeback, public-boundary discipline | Public Git evidence case | 22 merged commits over a 10-hour public Git window |
+
+Additional evidence cases stay in the catalog as appendix surfaces, but they
+are not part of the first three canonical PoC cards until they gain a
+reproducible demo or a deeper public evidence packet.
+
 ## Appendix Cases
 
 | Case | Pattern | Status | Public Surface |
diff --git a/docs/showcases/cases/0623-agent-to-agent-pr-comments.md b/docs/showcases/cases/0623-agent-to-agent-pr-comments.md
new file mode 100644
index 00000000..ce5f01c2
--- /dev/null
+++ b/docs/showcases/cases/0623-agent-to-agent-pr-comments.md
@@ -0,0 +1,58 @@
+# 0623: Agent-To-Agent PR Comment And Fix Loop
+
+## Summary
+
+This case captures a public-safe version of a multi-agent review loop: one
+agent lane can notice or respond to PR review feedback, while another lane
+keeps the implementation and fix evidence reviewable. The important behavior is
+not the chat transcript. It is the control-plane loop around a PR: comment,
+handoff, fix, validation, and review packet.
+
+The original evidence included operator-side screenshots and review context, so
+this repository keeps only the reusable pattern. Public PR surfaces can be used
+as evidence, but raw screenshots and private coordination details stay out of
+the repo.
+
+## Pattern
+
+A review comment is a good boundary object for long-running agents:
+
+- it is concrete enough to turn into a todo;
+- it belongs to a public or reviewable PR surface;
+- it can be routed to the agent that owns the implementation lane;
+- it can be closed only after a fix and validation are visible.
+
+LoopX keeps that flow explicit instead of relying on a human to remember which
+agent saw the comment.
+
+## LoopX Behavior
+
+LoopX contributes the following control-plane pieces:
+
+- a claimed todo names the PR feedback or comment thread;
+- a handoff gate keeps the blocked agent from guessing outside its lane;
+- the implementation agent records the fix and validation evidence;
+- the review packet points the reviewer back to the public PR surface;
+- follow-up work becomes a successor todo rather than a loose chat note.
+
+## User-Facing Value
+
+The operator does not need to manually shepherd every PR comment across agent
+threads. LoopX turns review feedback into a bounded work item with owner,
+evidence, and handoff state. That makes agent-to-agent collaboration useful
+without hiding the final review responsibility.
+
+## Evidence Boundary
+
+This case excludes private screenshots, raw chats, internal review notes, local
+state, credentials, and unpublished artifacts. The public-safe evidence shape
+is the PR comment/fix lifecycle itself: a visible PR surface, a claimed todo,
+the fix diff, validation output, and the resulting review packet.
+
+## Website Story Beats
+
+1. A PR receives feedback that should become executable work.
+2. LoopX turns the feedback into an owned todo instead of a chat reminder.
+3. Another agent lane implements or verifies the fix.
+4. The review packet links the comment, fix, and validation evidence.
+5. Follow-up work remains explicit as successor todos.
diff --git a/docs/showcases/cases/0623-overnight-project-refactor.md b/docs/showcases/cases/0623-overnight-project-refactor.md
new file mode 100644
index 00000000..6e36789c
--- /dev/null
+++ b/docs/showcases/cases/0623-overnight-project-refactor.md
@@ -0,0 +1,58 @@
+# 0623: Overnight Project Refactor As PR-Sized Slices
+
+## Summary
+
+This case captures a long unattended refactor that stayed reviewable because
+LoopX kept splitting the work into bounded PR-sized slices. The reusable lesson
+is that autonomous refactoring should not land as one huge diff. It should keep
+todo follow-up, supersede decisions, validation, and review boundaries visible.
+
+The source note described an overnight refactor wave. This repository records
+the public-safe control-plane pattern rather than private screenshots or local
+project state.
+
+## Before
+
+Large refactors are a bad fit for naive autonomous loops. Without a control
+plane, an agent can keep editing after the original plan is stale, mix cleanup
+with behavior changes, or produce a broad diff that is hard to review.
+
+The desired behavior is:
+
+1. keep the goal and current slice explicit;
+2. finish one reviewable unit at a time;
+3. create follow-up todos for remaining work;
+4. supersede stale todos when the refactor discovers a better route;
+5. validate each slice before merge or handoff.
+
+## LoopX Behavior
+
+LoopX makes that refactor loop durable:
+
+- `todo follow-up` turns discoveries into the next concrete slice;
+- `supersede` prevents stale tasks from staying runnable;
+- quota and status keep the current slice separate from adjacent cleanup;
+- review packets and focused smokes keep each PR independently checkable;
+- public/private boundary scans prevent local planning material from leaking
+  into public docs.
+
+## User-Facing Value
+
+The operator can let a refactor continue overnight while still waking up to
+reviewable units. The project moves faster, but the review surface remains
+human-sized.
+
+## Evidence Boundary
+
+This case excludes private screenshots, raw chats, internal planning notes,
+local paths, credentials, raw logs, and unpublished project artifacts. Public
+evidence should come from the resulting PR-sized diffs, validation commands,
+and follow-up/supersede state, not from raw agent traces.
+
+## Website Story Beats
+
+1. A broad refactor starts as a long-running goal.
+2. LoopX keeps the current slice explicit.
+3. Follow-up and supersede convert discoveries into reviewable next steps.
+4. Each slice gets validation and a review packet.
+5. The operator reviews bounded PRs instead of a giant autonomous diff.
diff --git a/docs/showcases/cases/0624-pr-issue-auto-fix.md b/docs/showcases/cases/0624-pr-issue-auto-fix.md
new file mode 100644
index 00000000..0cee4c91
--- /dev/null
+++ b/docs/showcases/cases/0624-pr-issue-auto-fix.md
@@ -0,0 +1,59 @@
+# 0624: PR Issue Automatic Fix Loop
+
+## Summary
+
+This case captures the issue-to-fix loop: review feedback, issue text, or a PR
+comment should become an executable repair plan with a repro or focused smoke,
+not an informal note in chat. LoopX turns that signal into a bounded workflow
+that can classify the problem, prepare a branch, implement a fix, validate it,
+and report the result back to the review surface.
+
+The original showcase included private visual evidence. This public case keeps
+only the reusable product pattern and the repository surfaces that support it.
+
+## Pattern
+
+Automatic issue fixing needs more than "read the issue and edit files." A safe
+workflow needs to:
+
+- classify whether the issue body or review comment is enough to act on;
+- create or identify a focused reproduction path;
+- keep private or gated issue bodies out of public fixtures;
+- make the implementation branch explicit;
+- run a small validation command before reporting success;
+- record any unresolved reviewer decision as a concrete todo.
+
+## LoopX Behavior
+
+LoopX supports the loop with issue-fix planning and command-pack style
+contracts:
+
+- the initial signal becomes ordered todos rather than prose;
+- gated reads remain explicit when a body or comment is not safe to consume;
+- implementation and validation steps stay separate;
+- review feedback can create a successor todo instead of being lost after a PR
+  comment;
+- the final packet records what was fixed, what was validated, and what still
+  needs a reviewer.
+
+## User-Facing Value
+
+The operator can point LoopX at a review issue and expect a controlled repair
+loop: understand the request, create a repro, implement the fix, validate it,
+and surface remaining review decisions. The user does not have to translate
+every PR comment into a manual agent prompt.
+
+## Evidence Boundary
+
+This case excludes private screenshots, raw issue bodies from gated sources,
+internal review notes, local paths, raw logs, credentials, and unpublished
+repository artifacts. Public evidence should be the sanitized workflow plan,
+focused smoke, branch diff, and public PR review outcome.
+
+## Website Story Beats
+
+1. A PR issue or review comment appears.
+2. LoopX classifies the issue and creates ordered repair todos.
+3. The agent builds or finds a focused repro.
+4. The fix lands as a reviewable branch diff with validation.
+5. Remaining reviewer decisions are written back as concrete todos.
diff --git a/docs/showcases/cases/0627-overnight-pr-batch.md b/docs/showcases/cases/0627-overnight-pr-batch.md
new file mode 100644
index 00000000..e62735e4
--- /dev/null
+++ b/docs/showcases/cases/0627-overnight-pr-batch.md
@@ -0,0 +1,102 @@
+# 0627: Overnight PR Batch With Reviewable Control
+
+## Summary
+
+LoopX produced an overnight burst of public repository progress without turning
+the project into an unreadable pile of agent output. In the ten-hour public Git
+window from `2026-06-27 01:29 +08:00` to `2026-06-27 11:29 +08:00`, the public
+repository advanced by 22 merged commits touching 60 files, with 6695 insertions
+and 223 deletions.
+
+This case is useful because the signal is PR-shaped and reviewable. The work
+landed as small slices across docs, state projection, issue-fix workflow,
+event-sourced state, benchmark launch contracts, status/quota smokes, and
+release/runtime guardrails. LoopX did not make a single giant change that a
+maintainer had to trust blindly.
+
+The public case deliberately uses merged Git history as the evidence floor. The
+operator-side note also tracked a larger contemporaneous PR queue, but this
+page only claims what the public repository can support.
+
+## Public Repository Signal
+
+The evidence window is anchored to public Git history and can be reproduced
+locally:
+
+```bash
+git log --since="2026-06-27T01:29:00+08:00" \
+  --until="2026-06-27T11:29:00+08:00" --oneline
+
+git log --since="2026-06-27T01:29:00+08:00" \
+  --until="2026-06-27T11:29:00+08:00" --numstat
+```
+
+| Signal | Value |
+| --- | --- |
+| Public evidence window | 2026-06-27 01:29 +08:00 to 2026-06-27 11:29 +08:00 |
+| Merged commits in window | 22 |
+| Unique files touched | 60 |
+| Public insertions / deletions | 6695 / 223 |
+| Commit messages with explicit PR numbers | 10 |
+| Evidence floor | Public Git history only |
+
+Representative merged slices in the window included:
+
+- issue-fix workflow planning and command-pack guidance;
+- event-sourced LoopX state contracts, API, compaction, and downstream read
+  path checks;
+- Terminal-Bench and SkillsBench launch or prerequisite contracts;
+- status/quota performance budget and projection smokes;
+- rollout-state documentation and README workflow refinement;
+- agent-scope wait scheduler progression.
+
+## LoopX Behavior
+
+The product behavior was not "make more commits." The useful behavior was that
+high-throughput work stayed bounded and reviewable:
+
+- each slice remained small enough to review as a PR or PR-sized commit;
+- public docs, examples, and runtime code moved together when the contract
+  changed;
+- focused smokes validated reusable control-plane behavior instead of
+  preserving raw run traces;
+- self-merge stayed limited to narrow validated changes;
+- broader review gates and handoffs remained visible instead of being hidden
+  behind the throughput number;
+- public/private boundary checks kept internal screenshots, local state, raw
+  logs, and private planning out of the repository.
+
+## User-Facing Value
+
+For an operator, this case shows a different shape of agent productivity:
+overnight progress can be high-throughput without becoming high-risk. The user
+can wake up to a batch of merged, reviewable public slices, while the control
+plane still records what changed, which validations ran, which gates remained,
+and which evidence is safe to publish.
+
+For an agent-platform developer, the reusable pattern is a PR-scale work loop:
+LoopX keeps each lane tied to todo ownership, validation, review policy, and
+public evidence, so a long-running agent team can move quickly without relying
+on chat memory or private screenshots.
+
+## Evidence Boundary
+
+This case intentionally excludes private workspace state, internal documents,
+screenshots, raw chats, local paths, raw benchmark logs, credentials, and any
+unpublished operator notes. The public evidence floor is Git history and the
+public repository surfaces it changed.
+
+The 22-commit window is not a universal productivity benchmark. It is a
+showcase of reviewable control-plane throughput in one public repository at one
+point in time. Future versions can strengthen the case by linking each public
+PR number to its validation evidence and review outcome.
+
+## Website Story Beats
+
+1. A long-running LoopX project enters an overnight autonomous work window.
+2. Many small slices land across runtime, docs, benchmark contracts, smokes, and
+   state projection.
+3. LoopX keeps each slice tied to todo ownership, validation, and review policy.
+4. The operator sees public Git evidence instead of raw agent logs.
+5. The evidence boundary keeps private screenshots and internal planning out of
+   the showcase.
diff --git a/docs/showcases/showcase-catalog.json b/docs/showcases/showcase-catalog.json
index 2a2ab432..ea69e29c 100644
--- a/docs/showcases/showcase-catalog.json
+++ b/docs/showcases/showcase-catalog.json
@@ -1,6 +1,6 @@
 {
   "schema_version": "loopx_showcase_catalog_v0",
-  "updated_at": "2026-06-22",
+  "updated_at": "2026-06-27",
   "redaction_policy": {
     "public_repo_may_include": [
       "sanitized domain labels",
@@ -228,6 +228,175 @@
         ]
       }
     },
+    {
+      "id": "2026-06-23-agent-to-agent-pr-comments",
+      "date": "2026-06-23",
+      "title": "Agent-to-agent PR comment and fix loop",
+      "status": "public_safe_pattern_case",
+      "case_page": "docs/showcases/cases/0623-agent-to-agent-pr-comments.md",
+      "demo_command": null,
+      "domain": "pull-request-review",
+      "audience": [
+        "operator",
+        "agent-platform-developer",
+        "open-source-maintainer"
+      ],
+      "pattern_tags": [
+        "agent_to_agent_handoff",
+        "pr_comment_loop",
+        "review_packet",
+        "claimed_todo",
+        "successor_todo"
+      ],
+      "headline": "PR review feedback can become an owned agent todo with fix evidence instead of a loose chat reminder.",
+      "problem": "Multiple agent lanes can see or act on the same PR feedback, but without ownership and handoff state the comment-to-fix loop becomes hard to audit.",
+      "loopx_behavior": [
+        "turn review feedback into a claimed todo",
+        "route implementation through the owning agent lane",
+        "record fix and validation evidence in the review packet",
+        "keep successor work explicit after the comment is handled"
+      ],
+      "user_value": "The operator can let agents coordinate around PR comments without losing final review visibility or fix evidence.",
+      "evidence_boundary": "Public-safe pattern case only; no private screenshots, raw chats, internal review notes, local state, credentials, or unpublished artifacts.",
+      "appendix_surface": {
+        "reason": "Pattern is durable, but the original evidence included private screenshots; keep as appendix until a fully public PR-comment packet is curated.",
+        "public_surface": "appendix_only",
+        "links": [
+          "docs/showcases/cases/0623-agent-to-agent-pr-comments.md"
+        ]
+      }
+    },
+    {
+      "id": "2026-06-23-overnight-project-refactor",
+      "date": "2026-06-23",
+      "title": "Overnight project refactor as PR-sized slices",
+      "status": "public_safe_pattern_case",
+      "case_page": "docs/showcases/cases/0623-overnight-project-refactor.md",
+      "demo_command": null,
+      "domain": "repository-refactor",
+      "audience": [
+        "operator",
+        "agent-platform-developer",
+        "technical-lead"
+      ],
+      "pattern_tags": [
+        "long_unattended_goal",
+        "pr_sized_slices",
+        "todo_follow_up",
+        "supersede",
+        "validation_writeback"
+      ],
+      "headline": "A broad refactor can run overnight while staying split into human-sized review units.",
+      "problem": "Autonomous refactors become risky when discoveries, stale tasks, cleanup, and behavior changes collapse into one broad diff.",
+      "loopx_behavior": [
+        "keep the current refactor slice explicit",
+        "convert discoveries into follow-up todos",
+        "supersede stale tasks when the route changes",
+        "validate each slice before merge or handoff"
+      ],
+      "user_value": "The operator can wake up to reviewable PR-sized refactor slices instead of a single giant autonomous diff.",
+      "evidence_boundary": "Public-safe pattern case only; no private screenshots, raw chats, internal planning notes, local paths, credentials, raw logs, or unpublished project artifacts.",
+      "appendix_surface": {
+        "reason": "Pattern is durable, but keep outside canonical cards until public PR slices and validations are curated into a deeper packet.",
+        "public_surface": "appendix_only",
+        "links": [
+          "docs/showcases/cases/0623-overnight-project-refactor.md"
+        ]
+      }
+    },
+    {
+      "id": "2026-06-24-pr-issue-auto-fix",
+      "date": "2026-06-24",
+      "title": "PR issue automatic fix loop",
+      "status": "public_safe_pattern_case",
+      "case_page": "docs/showcases/cases/0624-pr-issue-auto-fix.md",
+      "demo_command": null,
+      "domain": "issue-fix-workflow",
+      "audience": [
+        "operator",
+        "agent-platform-developer",
+        "open-source-maintainer"
+      ],
+      "pattern_tags": [
+        "issue_fix_workflow",
+        "review_feedback",
+        "repro_smoke",
+        "command_pack",
+        "successor_todo"
+      ],
+      "headline": "Review feedback should become an ordered repair workflow with repro, fix, validation, and reviewer handoff.",
+      "problem": "PR comments and issues are often concrete enough to fix, but unsafe to feed into an agent as unstructured prompt text without routing, repro, and validation.",
+      "loopx_behavior": [
+        "classify the issue or review feedback",
+        "create ordered repair todos",
+        "keep gated source reads explicit",
+        "separate repro, implementation, validation, and reviewer handoff"
+      ],
+      "user_value": "The operator can turn a PR issue into a controlled fix loop without manually rewriting the review comment as an agent plan.",
+      "evidence_boundary": "Public-safe pattern case only; no private screenshots, raw gated issue bodies, internal review notes, local paths, raw logs, credentials, or unpublished repository artifacts.",
+      "appendix_surface": {
+        "reason": "Pattern is durable, but keep outside canonical cards until a public issue-fix PR packet and focused smoke are curated together.",
+        "public_surface": "appendix_only",
+        "links": [
+          "docs/showcases/cases/0624-pr-issue-auto-fix.md"
+        ]
+      }
+    },
+    {
+      "id": "2026-06-27-overnight-pr-batch",
+      "date": "2026-06-27",
+      "title": "Overnight PR batch with reviewable control",
+      "status": "public_evidence_case",
+      "case_page": "docs/showcases/cases/0627-overnight-pr-batch.md",
+      "demo_command": null,
+      "domain": "agent-platform-self-improvement",
+      "audience": [
+        "operator",
+        "agent-platform-developer",
+        "technical-lead"
+      ],
+      "pattern_tags": [
+        "high_throughput_reviewable_work",
+        "pr_sized_slices",
+        "self_merge_policy",
+        "validation_writeback",
+        "public_boundary"
+      ],
+      "headline": "An overnight LoopX run can produce many PR-sized slices while keeping review, validation, and public evidence boundaries visible.",
+      "problem": "High-throughput autonomous work is only useful if the resulting changes remain reviewable, validated, and safe to publish.",
+      "loopx_behavior": [
+        "keep work broken into PR-sized slices instead of a giant unreviewable diff",
+        "tie runtime, docs, and focused smoke updates together when a control-plane contract changes",
+        "limit self-merge to narrow validated changes while preserving broader review gates",
+        "record public evidence from Git history instead of raw agent logs or private screenshots",
+        "keep public/private boundary checks in the showcase path"
+      ],
+      "user_value": "The operator can wake up to a compact batch of merged public slices, see what changed, and still trust that gates, validation, and evidence boundaries were not bypassed.",
+      "workload_signal": {
+        "scope": "public_repository_window",
+        "window": {
+          "from": "2026-06-27T01:29:00+08:00",
+          "to": "2026-06-27T11:29:00+08:00",
+          "hours": 10
+        },
+        "public_git": {
+          "merged_commits": 22,
+          "files_touched": 60,
+          "insertions": 6695,
+          "deletions": 223,
+          "commit_messages_with_pr_numbers": 10
+        },
+        "claim_boundary": "public Git history only; contemporaneous private queue notes are not used as public evidence"
+      },
+      "evidence_boundary": "Public Git evidence only; no private documents, internal screenshots, raw chats, local active-state bodies, raw logs, credentials, or machine-specific paths.",
+      "appendix_surface": {
+        "reason": "Public Git evidence case for an overnight throughput window; keep outside the first canonical frontstage cards until each PR slice has a deeper public evidence packet or reproducible demo.",
+        "public_surface": "appendix_only",
+        "links": [
+          "docs/showcases/cases/0627-overnight-pr-batch.md"
+        ]
+      }
+    },
     {
       "id": "2026-06-20-creator-operator-case-spec",
       "date": "2026-06-20",