
Draft: feat: add /awos:regression command#114

Open
FlySpot wants to merge 5 commits into main from feat/regression


@FlySpot FlySpot commented May 7, 2026

Collaborator

Summary

  • /awos:regression — new command for managing the long-term regression suite. After a feature's Testing & Regression slice is complete, it: extracts test candidates from annotated test files (@spec, @regression), deduplicates against the existing suite, asks the user to confirm the selection, updates context/qa/regression-suite.md, optionally runs the suite, and generates a dated report at context/qa/regression-reports/regression-YYYY-MM-DD-[spec].md.
  • templates/regression-suite-template.md — starter template for context/qa/regression-suite.md, used on first run if the file doesn't exist.
  • commands/tasks.md — uncomments the /awos:regression sub-task inside the Feature Testing & Regression slice (it was commented out in PR #109, "feat: Feature Testing & Regression slice, QA audit command, and verification hardening", pending this merge).

Merge order

Merge after PR #109. This PR's commands/tasks.md includes all changes from #109 plus the enabled regression sub-task.

File map

commands/regression.md          ← /awos:regression command
claude/commands/regression.md   ← thin Claude Code wrapper
templates/
  regression-suite-template.md  ← regression suite scaffold
commands/tasks.md               ← /awos:regression sub-task enabled

Test plan

  • Complete all implementation slices + Feature Testing & Regression slice on a spec
  • Run /awos:regression [spec-name] — confirm candidates are extracted from annotated test files
  • Confirm deduplication works against an existing regression-suite.md
  • Confirm regression-suite.md is updated after user approval
  • Confirm dated report is generated in context/qa/regression-reports/
  • Run without argument — confirm auto-detection finds the completed spec
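
For the dated-report check above, a small sketch of how the report path could be derived from the pattern `context/qa/regression-reports/regression-YYYY-MM-DD-[spec].md` given in the summary. The helper name `report_path` and the spec name `003-auth` are hypothetical.

```python
from datetime import date
from pathlib import PurePosixPath

# Directory taken from the PR summary.
REPORT_DIR = PurePosixPath("context/qa/regression-reports")


def report_path(spec: str, run_date: date) -> PurePosixPath:
    """Build regression-YYYY-MM-DD-[spec].md under the report directory."""
    return REPORT_DIR / f"regression-{run_date.isoformat()}-{spec}.md"


print(report_path("003-auth", date(2026, 5, 7)))
```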

🤖 Generated with Claude Code

FlySpot and others added 2 commits May 7, 2026 20:15
…on-suite template

Introduces the Regression Suite Manager command that promotes feature tests
to the long-term regression suite, deduplicates entries, optionally runs the
suite, and generates a dated report.

Note: commands/tasks.md reference to /awos:regression is commented out in
feat/qa-pyramid-agent — uncomment after that branch merges.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… Regression slice

Now that the regression command is in this branch, the sub-task calling
/awos:regression is active. Merges after feat/qa-pyramid-agent.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@FlySpot FlySpot requested a review from kmakarychev-dev May 7, 2026 18:25

coderabbitai Bot commented May 7, 2026

Walkthrough

Adds a regression-suite manager: new /awos:regression command, a regression-suite template, candidate extraction/deduplication with user confirmation, optional test execution and dated reporting, and task workflow updates enforcing artifact cleanup and the regression slice template.

Changes

Regression Suite Manager Feature

| Layer / File(s) | Summary |
| --- | --- |
| Command Declaration: `claude/commands/regression.md` | /awos:regression command metadata with directives to use AskUserQuestion for user interactions and reference canonical instructions. |
| Regression Suite Template: `templates/regression-suite-template.md` | Template defines header metadata (Last updated, total test count), spec subsection placeholders, fixed table schemas for Unit/Integration/E2E/Contract, and allowed Status/Polarity values. |
| Regression Suite Manager Procedure: `commands/regression.md` | Procedure: inputs/outputs, spec selection (explicit or auto-detect), candidate extraction (prefer @spec+@regression, fallback to tasks.md), duplicate detection/classification (DUPLICATE/EXTEND/NEW), user confirmation, suite update rules (add/extend/skip, refresh totals/Last updated), optional test-run with runner detection, result collection, and dated report generation; includes operational constraints (never write tests, never auto-delete entries, require confirmation, mark missing paths as pending discovery). |
| Vertical Slice Integration: `commands/tasks.md` | Require artifact deletion after verification for slices (except Feature Testing & Regression), add per-slice cleanup sub-task, enforce exact Feature Testing & Regression slice template (including testing-expert and /awos:regression [spec-directory-name] usage), and update example slices accordingly. |
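
The DUPLICATE/EXTEND/NEW classification named above could look roughly like the sketch below. The actual matching rules live in `commands/regression.md` and are not quoted in this thread, so the rule here is an assumption: an entry with the same file and test name is a DUPLICATE when its behavior text also matches, an EXTEND when the behavior text differs, and anything else is NEW. The `Verdict` enum and `classify` helper are hypothetical names.

```python
from enum import Enum


class Verdict(Enum):
    DUPLICATE = "DUPLICATE"
    EXTEND = "EXTEND"
    NEW = "NEW"


def classify(candidate: dict, suite: list[dict]) -> Verdict:
    """Compare one candidate against existing suite rows (assumed rule, see lead-in)."""
    key = (candidate["file"], candidate["test_name"])
    for entry in suite:
        if (entry["file"], entry["test_name"]) == key:
            same_behavior = (
                entry["behavior"].strip().lower()
                == candidate["behavior"].strip().lower()
            )
            return Verdict.DUPLICATE if same_behavior else Verdict.EXTEND
    return Verdict.NEW


# Existing row, mirroring the sample table row from commands/regression.md.
suite = [{
    "file": "tests/test_auth.py",
    "test_name": "test_token_payload",
    "behavior": "token payload, expiry, signing",
}]
print(classify(dict(suite[0]), suite).value)
```

Because the command must never auto-delete entries and requires confirmation, a verdict like this would only feed the summary shown to the user, not trigger a write by itself.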

Sequence Diagram

sequenceDiagram
  participant User
  participant Command as /awos:regression
  participant TaskFile as tasks.md
  participant SuiteFile as regression-suite.md
  participant Runner as Test Runner
  participant Report as Regression Report

  User->>Command: Invoke /awos:regression [spec]
  Command->>TaskFile: Extract test candidates (annotations or fallback)
  Command->>SuiteFile: Detect existing entries (duplicates/extensions)
  Command->>User: Confirm changes (proceed/review/cancel)
  User->>Command: Approve updates
  Command->>SuiteFile: Update suite (add/merge/skip entries)
  Command->>SuiteFile: Update metadata (Last updated, totals)
  Command->>User: Ask to run tests
  User->>Command: Confirm or skip execution
  Command->>Runner: Execute tests (full or new-only)
  Runner->>Report: Generate dated report (results + recommendations)
  Report->>User: Return report

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 A regression suite is born today,
with @spec tags showing the way,
duplicates caught, extensions penned,
reports dated, fixes to append,
cleanup tidy — the meadow hops along!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title accurately describes the main change: introducing a new /awos:regression command with supporting documentation and templates. |


Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@commands/regression.md`:
- Around line 55-66: The guidance in Step 2 and the later promotion step
contradicts: ensure the command prioritizes annotated test files first and only
uses context/spec/[target-spec]/tasks.md when no `@spec/`@regression annotations
are found; specifically modify the promotion logic referenced at "Line 230" to
conditionally promote entries from tasks.md only when the initial annotated-file
search returns zero candidates, and update the related text to state this
explicit fallback behavior and that promoted entries are marked "pending
discovery."
- Around line 69-75: The fenced code blocks in the document (notably the
Markdown table and other sample blocks referenced in the review) are missing
language identifiers; update each opening fence to include a language tag (e.g.,
change ``` to ```markdown) for the table block and the other fenced blocks
starting around the sample sections so the linter MD040 is satisfied, ensuring
every triple-backtick has a language specifier.

In `@commands/tasks.md`:
- Line 130: The bold/escape is malformed in the regression example text: replace
the accidental escaped closing bold marker `agent.\*\*` with a proper closing
bold marker so the sentence reads using normal Markdown bolding, e.g. `>
**Requires \`testing-expert\` agent.** If necessary, ensure the surrounding
backticks around testing-expert remain and both opening and closing `**` are
present to render the emphasis correctly (locate the string that starts with `>
**Requires` in the tasks example).
- Around line 72-74: The markdown fenced code blocks that contain checklist
items (e.g., the block starting with "- [ ] Cleanup: Delete any screenshots,
videos, or e2e scripts generated during this slice's verification. **[Agent:
general-purpose]**" and the subsequent fenced block covering lines 83–103) lack
language identifiers and trigger MD040; update the opening fences to include an
explicit language (for example, use ```markdown) for both fenced blocks so
linting passes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: bb87aaa5-ee6f-4fba-93c9-c2cc1f8eb8cc

📥 Commits

Reviewing files that changed from the base of the PR and between b749393 and 544797a.

📒 Files selected for processing (2)
  • commands/regression.md
  • commands/tasks.md

Comment thread commands/regression.md Outdated
Comment thread commands/regression.md Outdated
Comment thread commands/tasks.md Outdated
Comment thread commands/tasks.md
FlySpot and others added 2 commits May 11, 2026 11:19
- Add language tags to fenced code blocks in regression.md and tasks.md (MD040)
- Fix malformed bold escape in tasks.md example (agent.\*\* → agent.**)
- Clarify Step 2 fallback priority in regression.md: annotated files first,
  tasks.md only when zero annotations found

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
commands/regression.md (1)

230-230: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Unify candidate-source rule with Step 2 fallback logic

Line 230 contradicts Step 2: the flow says “annotations first, tasks.md only as fallback,” but this line says promotion is only from tasks.md. This can cause incorrect command behavior.

Suggested wording fix
-- Never write new tests — only promote existing ones from tasks.md.
+- Never write new tests — only promote existing tests discovered from annotated test files (`@spec` + `@regression`), or from `tasks.md` only when annotation discovery returns zero candidates.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@commands/regression.md` at line 230, Update the candidate-source rule text so
it matches Step 2's fallback logic: replace the sentence "Never write new tests
— only promote existing ones from tasks.md." with a unified rule that prefers
annotations as the primary source for test candidates and only uses tasks.md as
a fallback; ensure references to "candidate-source", "Step 2", "annotations",
and "tasks.md" are consistent so the doc clearly states "use annotations first;
if no annotations exist, promote existing tests from tasks.md."
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@commands/regression.md`:
- Line 27: Update the wording so the "Empty = auto-detect the most recently
completed spec (all tasks ✅, Status Completed)" promise matches Step 1 behavior:
either add a deterministic recency rule to Step 1 (e.g., "When multiple
completed specs exist, choose the one with the latest completed_at timestamp")
or change the Line 27 phrase to a relaxed form like "auto-detect a recently
completed spec" and apply the same change to the similar block covering lines
44-51; reference the "Empty = auto-detect..." line and the Step 1 selection
description when making the edit.

---

Duplicate comments:
In `@commands/regression.md`:
- Line 230: Update the candidate-source rule text so it matches Step 2's
fallback logic: replace the sentence "Never write new tests — only promote
existing ones from tasks.md." with a unified rule that prefers annotations as
the primary source for test candidates and only uses tasks.md as a fallback;
ensure references to "candidate-source", "Step 2", "annotations", and "tasks.md"
are consistent so the doc clearly states "use annotations first; if no
annotations exist, promote existing tests from tasks.md."

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: a1ae317f-75c3-4d05-842e-97517e69e569

📥 Commits

Reviewing files that changed from the base of the PR and between 544797a and 285bdba.

📒 Files selected for processing (2)
  • commands/regression.md
  • commands/tasks.md
🚧 Files skipped from review as they are similar to previous changes (1)
  • commands/tasks.md

Comment thread commands/regression.md
# INPUTS & OUTPUTS

- **User Prompt (Optional):** <user_prompt>$ARGUMENTS</user_prompt>
- Empty = auto-detect the most recently completed spec (all tasks ✅, Status Completed)


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Align “most recently completed spec” promise with Step 1 selection behavior

Line 27 promises automatic selection of the most recently completed spec, but Step 1 currently asks the user to choose when multiple candidates exist and doesn’t define recency sorting. Please either define a deterministic recency rule in Step 1 or relax the wording in Line 27.

Suggested wording fix (minimal)
-- Empty = auto-detect the most recently completed spec (all tasks ✅, Status Completed)
+- Empty = auto-detect a completed spec candidate (all tasks ✅, Status Completed); if multiple are found, ask the user to choose

Also applies to: 44-51

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@commands/regression.md` at line 27, Update the wording so the "Empty =
auto-detect the most recently completed spec (all tasks ✅, Status Completed)"
promise matches Step 1 behavior: either add a deterministic recency rule to Step
1 (e.g., "When multiple completed specs exist, choose the one with the latest
completed_at timestamp") or change the Line 27 phrase to a relaxed form like
"auto-detect a recently completed spec" and apply the same change to the similar
block covering lines 44-51; reference the "Empty = auto-detect..." line and the
Step 1 selection description when making the edit.

@FlySpot FlySpot changed the title from "Feat/regression" to "feat: add /awos:regression command" May 11, 2026
Comment thread commands/tasks.md
Comment on lines +127 to +130
- `[ ] **Slice 3: Feature Testing & Regression**`
- `> Verifies the complete feature works end-to-end as described in functional-spec.md.`
- `> Run AFTER all implementation slices are complete.`
- `> **Requires \`testing-expert\` agent.\*\* If it is not present in \`.claude/agents/\`, stop and run \`/awos:hire\` before executing this slice.`

Contributor

A slice is something that covers the implementation of some functionality, something that can be committed. It feels like the "regression" command should be similar to "verify". What do you think?

Comment thread commands/regression.md
# INPUTS & OUTPUTS

- **User Prompt (Optional):** <user_prompt>$ARGUMENTS</user_prompt>
- Empty = auto-detect the most recently completed spec (all tasks ✅, Status Completed)

Contributor

Get the most recent spec from the diff, or look up the commits to see which feature was added most recently.

Comment thread commands/regression.md
## Step 1: Identify target spec

1. Read `<user_prompt>`. If it names a spec, use that directory.
2. If empty, scan `context/spec/*/tasks.md` files. Find the spec where:

Contributor

I also suggest looking up the log on context/spec, finding the most recent commit, and checking exactly which feature was added there.
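
One way to read this suggestion, as a hedged sketch rather than the command's actual behavior: list the files touched by commits under `context/spec` (newest first) and take the first spec directory that appears. The helper names are hypothetical; the parsing is kept separate from the `git` call so it can be checked without a repository.

```python
import subprocess
from typing import Optional


def most_recent_spec(log_output: str, root: str = "context/spec/") -> Optional[str]:
    """Return the first spec directory named in newest-first `git log` file output."""
    for line in log_output.splitlines():
        path = line.strip()
        if path.startswith(root):
            rest = path[len(root):]
            if "/" in rest:  # skip files that sit directly in context/spec/
                return rest.split("/", 1)[0]
    return None


def git_log_names(root: str = "context/spec") -> str:
    """File names touched by each commit under `root`, newest commit first."""
    result = subprocess.run(
        ["git", "log", "--name-only", "--pretty=format:", "--", root],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Hypothetical log output: 004-avatar was touched by the newest commit.
sample = "\ncontext/spec/004-avatar/tasks.md\n\ncontext/spec/003-auth/functional-spec.md\n"
print(most_recent_spec(sample))
```

Adding `--diff-filter=A` to the `git log` call would restrict the listing to commits that added files, which is closer to "which feature was added". The completed-spec check (all tasks ✅, Status Completed) would still need to run on the result.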

Comment thread commands/regression.md
- **File** — the test file path (already known from the search)
- **Test Name** — the test function name

**Fallback (only if primary source returns zero results):** Read `context/spec/[target-spec]/tasks.md`. Find the "Feature Testing & Regression" slice and list each `**[Agent: testing-expert]**` sub-task as a single candidate entry, marking Layer/Behavior/Polarity as "pending discovery". Inform the user that annotations were not found and entries are marked for future discovery.

Contributor

I'm a bit confused that we wire the command to a single agent. What if we use this command on a brownfield project with existing tests? Or what if, in some particular project, the tests are written by client-side specialists? With this pipeline, we provide no value in those cases. Wouldn't it be reliable to map existing feature specs onto the tests to detect which ones are primary for regression?
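
For reference, the fallback quoted in this thread (reading the "Feature Testing & Regression" slice from tasks.md) can be sketched as below. The checkbox and slice-heading layout is assumed from the examples elsewhere in this PR (`- [ ] **Slice N: ...**` and a trailing `**[Agent: ...]**` marker); the `fallback_candidates` helper and the sample task text are hypothetical.

```python
import re

# Layout assumed from the tasks.md examples quoted in this PR.
SLICE_RE = re.compile(r"^\s*- \[[ xX]\] \*\*Slice \d+: (.+?)\*\*")
SUBTASK_RE = re.compile(r"^\s*- \[[ xX]\] (.+?)\s*\*\*\[Agent: testing-expert\]\*\*")
PENDING = "pending discovery"


def fallback_candidates(tasks_md: str) -> list[dict]:
    """List testing-expert sub-tasks of the Feature Testing & Regression slice."""
    in_slice = False
    found = []
    for line in tasks_md.splitlines():
        heading = SLICE_RE.match(line)
        if heading:
            in_slice = heading.group(1).strip() == "Feature Testing & Regression"
            continue
        subtask = SUBTASK_RE.match(line) if in_slice else None
        if subtask:
            # Layer, behavior, and polarity are unknown without annotations.
            found.append({
                "task": subtask.group(1),
                "layer": PENDING,
                "behavior": PENDING,
                "polarity": PENDING,
            })
    return found
```

This covers only tests produced through the testing-expert sub-tasks, which is the limitation the comment above raises for brownfield projects.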

Comment thread commands/regression.md

### Integration

...

Contributor

That could confuse the LLM about what kind of information is expected here. It's better to provide a template, even if it's copy-paste.

Comment thread commands/regression.md

| File | Test Name | Behavior | Polarity | Status | Notes |
| ------------------ | ------------------ | ------------------------------ | -------- | ------ | ----- |
| tests/test_auth.py | test_token_payload | token payload, expiry, signing | positive | OK | — |

Contributor

What exactly can go into Notes? What is the value of this field?

Comment thread commands/regression.md
3. If user chooses to run:
- Detect test runner: check for `docker-compose.yml`, `Makefile` (with `test` target), `package.json` (`test` script), `pytest.ini` / `pyproject.toml`, `justfile`.
- If runner found: spin up infrastructure if needed, run the selected tests, capture output.
- If NO runner found: inform user — "No test runner detected. Tests are saved in regression-suite.md. Run them manually using your project's test command." Proceed to Step 7.

Contributor

Doesn't it make sense to ask the user how the tests are supposed to run and, if the user provides that information, store it somewhere in an MD file? If the user explicitly answers that the tests are run manually, store that as well, to avoid repeating the detection loop later.
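
A sketch of the detection step quoted above, combined with the stored-answer idea from this comment. The marker files come from Step 6 of `commands/regression.md` as quoted; the check order, the returned labels, and the `stored` parameter (for example `"manual"`) are assumptions, and where a stored answer would live is left open.

```python
import json
import re
from pathlib import Path
from typing import Optional


def detect_runner(project: Path, stored: Optional[str] = None) -> Optional[str]:
    """Return a label for the detected runner, or None if nothing is found.

    A previously stored answer (hypothetical, e.g. "manual") short-circuits
    detection so the user is not asked again on later runs.
    """
    if stored is not None:
        return stored
    if (project / "docker-compose.yml").is_file():
        return "docker-compose"
    makefile = project / "Makefile"
    # Only count a Makefile that defines a `test` target.
    if makefile.is_file() and re.search(r"^test\s*:", makefile.read_text(), re.MULTILINE):
        return "make"
    package = project / "package.json"
    # Only count a package.json that defines a `test` script.
    if package.is_file() and "test" in json.loads(package.read_text()).get("scripts", {}):
        return "npm"
    if (project / "pytest.ini").is_file() or (project / "pyproject.toml").is_file():
        return "pytest"
    if (project / "justfile").is_file():
        return "just"
    return None
```

Returning `None` corresponds to the "No test runner detected" branch, where the command informs the user and proceeds to Step 7.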

Comment thread commands/tasks.md
8. For each slice's verification sub-task, identify required MCPs/services (browser MCP, curl, database access, etc.) and note any that may be missing.
5. After the verification sub-task, add a cleanup sub-task as the last item of the slice:
```md
- [ ] Cleanup: Delete any screenshots, videos, or e2e scripts generated during this slice's verification. **[Agent: general-purpose]**

@dustyo-O dustyo-O May 21, 2026

Contributor

This is likely to get lost after a compact on the one hand, and has some value for debugging on the other. Consider creating a gitignored folder for the artifacts on the first run.

Contributor

"e2e scripts" is a scarily broad definition.

FlySpot added a commit that referenced this pull request May 25, 2026
- hire.md: restructure complementary pairs to search-first pattern with
  testing-expert as fallback; remove duplicate Python entry; replace
  playwright MCP with playwright CLI
- qa.md: update description to "QA health check"; add user confirmation
  for full-scope audit; make architecture.md required with warning;
  rewrite Step 3 to reflect list-of-tests.md is maintained by
  testing-expert, not created by /awos:qa; implement risk-based gap
  analysis in Step 5 (two-pass: project-level + per-AC); implement
  coordinator pattern in Step 6 (specialist agent writes, testing-expert
  validates and updates registry); add guard for missing functional-spec;
  remove regression suite steps (moved to PR #114); add staleness and
  delta-coverage notes to TODO

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
FlySpot added a commit that referenced this pull request May 25, 2026
The command had no active Claude Code wrapper and its functionality
is covered by two dedicated commands:
- testing-expert agent (proactive test writing via /awos:tasks)
- /awos:regression (regression suite management, PR #114)

Retroactive staleness/gap auditing can be revisited as a plugin
if needed in the future.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
AlexanderMakarov pushed a commit that referenced this pull request May 29, 2026
Two cleanups to the emitted slice, from testing the branch on a real spec:

- Collapse the three blockquote note lines into one description. The
  "QA agent for this slice: `{agent}` (selected from .claude/agents/ and
  the Agent tool's description block)" line leaked the internal selection
  rationale into the generated artifact — the `**[Agent: ...]**` markers
  already say who runs the slice.
- Drop the `<!-- TODO: enable when feat/regression merges -->` block.
  Its trigger lives in a different repo (the feat/regression PR), so it
  can't be tracked from here and just sits as dead commentary in every
  user's tasks.md. The `/awos:regression` wiring belongs in PR #114,
  which owns that command.
FlySpot added a commit that referenced this pull request Jun 2, 2026
…ication hardening (#109)

* feat: add testing-expert agent for test pyramid generation

* fix: address code quality issues in testing-expert agent

* feat: extend /awos:tasks to generate test pyramid tasks per vertical slice

* fix: split positive/negative test examples into separate tasks in tasks.md

* feat: add /awos:qa optional full-audit command

* fix: restore testing-expert cross-reference and improve qa.md clarity

* feat: add QA context templates for test registry and regression suite

* docs: add QA pyramid agent implementation plan

* chore: remove unnecessary testing-expert command wrapper — agent is internal-only

* docs: remove testing-expert wrapper entry from plan file table

* fix: resolve contradictions in testing-expert and qa commands

* docs(qa): add TODO section with known limitations

Documents three open gaps discovered during end-to-end testing:
ephemeral E2E artifacts, coverage-by-inspection vs measurement,
and no regression baseline/enforcement mechanism.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Delete docs/superpowers/plans/2026-04-07-qa-pyramid-agent.md

* chore: fix prettier formatting across markdown files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tasks): sequential numbering, opt-out tests, ignore docs/superpowers

- Renumber thought process steps sequentially (3b → 5, shift 5-8 → 6-9)
- Make test generation opt-out instead of REQUIRED
- Add docs/superpowers/ to .gitignore and untrack pushed files

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tasks): sequential numbering and opt-out test generation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(testing-expert): move from commands/ to plugins/awos/agents/

* fix(tasks): replace vague 'planning mode' with explicit Agent tool invocation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tasks): clarify Agent tool context-passing and move opt-out check first

* fix(tasks): rebalance examples to show layer judgment and consistent test coverage

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(testing-expert): replace caller-based mode detection with condition-based

* fix(tasks): add missing verify sub-tasks to JWT and avatar Slice 1 examples

* fix(qa): remove stale testing-expert cross-reference, clarify e2e-tester, add regression-suite template init

* fix: clean up garbled e2e-tester TODO and rename mode headings to match condition-based detection

* fix: update stale 'execution mode' reference and restore step 4 formatting

* fix(testing-expert): remove remaining caller-specific references from role description and output format

* fix(tasks): use Task tool instead of Agent tool for testing-expert invocation

* fix(installer): deploy testing-expert agent to .claude/agents/ during setup

* fix(tasks): move Verify step after test sub-tasks in Slice 2 avatar example

* fix(testing-expert): normalize positive/negative suffix format across all pyramid layers

* fix(template): clarify test registry maintenance attribution

* fix: delegate qa gap tests to testing-expert, fix code fence, update installer docs

- commands/qa.md: Step 6 now delegates to testing-expert via Task tool instead of writing tests inline; update TODO to reflect current architecture
- commands/tasks.md: fix malformed single-backtick example block → proper code fence
- plugins/awos/agents/testing-expert.md: Step 7 uses generic "caller" instead of hardcoded /awos:implement; add explicit completion signal on no-gap path
- src/CLAUDE.md: add plugins/awos/agents/ → .claude/agents/ row to copy table

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tasks): replace per-slice testing with Feature Testing & Regression final slice

* feat(qa): remove /awos:qa slash command — preserved as future plugin reference

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(regression): add /awos:regression command with dedup, confirmation, run, and report

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(regression): fix extraction logic, fully-processed check, and constraints wording

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(regression): add claude/commands wrapper for /awos:regression

* fix(regression): strip wrapper to standard pattern — remove prose and double-tagged $ARGUMENTS

* fix(installer): remove testing-expert auto-deploy — now hired via awos-recruitment

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(verify): enforce mandatory Step 3 with fallback chain — prevent silent skips

* feat(template): update regression-suite with layered format; fix stale qa-context attribution

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(testing-expert): remove from awos core — now lives in awos-recruitment registry

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(hire): add QA complement rule — auto-suggest testing-expert alongside tech agents

* fix(audit): update SDD-07 to recognize Feature Testing & Regression slice model

Check item 7 now handles both AWOS 2.x (single final QA slice) and the
legacy per-slice QA assignment model, preventing false WARN/FAIL on
projects that use the new tasks format.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore(tasks): remove regression files, comment out /awos:regression sub-task

Regression command moves to feat/regression branch.
/awos:regression sub-task in Feature Testing & Regression slice is commented
out with TODO — uncomment when feat/regression merges.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tasks): add artifact cleanup sub-task after each implementation slice

After a slice's verification, agents now delete temporary artifacts
(screenshots, videos, e2e scripts) generated by e2e-tester or browser MCP.
The Feature Testing & Regression slice is explicitly excluded — its artifacts
are retained for the regression suite.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore: fix prettier formatting

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address CodeRabbit review comments

- Add language tags to fenced code blocks in tasks.md and qa.md (MD040)
- Add plugins/awos/agents → .claude/agents copy operation to setup-config.js
  to match src/CLAUDE.md documentation

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore: fix prettier formatting

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: address review comments on hire.md and qa.md

- hire.md: restructure complementary pairs to search-first pattern with
  testing-expert as fallback; remove duplicate Python entry; replace
  playwright MCP with playwright CLI
- qa.md: update description to "QA health check"; add user confirmation
  for full-scope audit; make architecture.md required with warning;
  rewrite Step 3 to reflect list-of-tests.md is maintained by
  testing-expert, not created by /awos:qa; implement risk-based gap
  analysis in Step 5 (two-pass: project-level + per-AC); implement
  coordinator pattern in Step 6 (specialist agent writes, testing-expert
  validates and updates registry); add guard for missing functional-spec;
  remove regression suite steps (moved to PR #114); add staleness and
  delta-coverage notes to TODO

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* remove: delete commands/qa.md

The command had no active Claude Code wrapper and its functionality
is covered by two dedicated commands:
- testing-expert agent (proactive test writing via /awos:tasks)
- /awos:regression (regression suite management, PR #114)

Retroactive staleness/gap auditing can be revisited as a plugin
if needed in the future.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* feat(tasks): add --no-tests / skip tests flag to suppress verification and testing slice

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* chore: fix prettier formatting in tasks.md

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix: resolve 3 consistency issues found in PR review

- hire.md line 109: playwright MCP → playwright CLI (matches line 116
  and global CLAUDE.md rule; MCP vs CLI was a contradiction)
- hire.md line 120: complete the Terraform/IaC entry — was truncated
  with no agent reference
- tasks.md example: replace "chrome MCP" with playwright-cli phrasing
  (legacy example text, inconsistent with playwright-cli convention)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(tasks): trim Feature Testing & Regression slice noise

Two cleanups to the emitted slice, from testing the branch on a real spec:

- Collapse the three blockquote note lines into one description. The
  "QA agent for this slice: `{agent}` (selected from .claude/agents/ and
  the Agent tool's description block)" line leaked the internal selection
  rationale into the generated artifact — the `**[Agent: ...]**` markers
  already say who runs the slice.
- Drop the `<!-- TODO: enable when feat/regression merges -->` block.
  Its trigger lives in a different repo (the feat/regression PR), so it
  can't be tracked from here and just sits as dead commentary in every
  user's tasks.md. The `/awos:regression` wiring belongs in PR #114,
  which owns that command.

* fix(verify): require real UI rendering + screenshots in docs/screenshots/

Reframing /awos:verify as "look-and-feel" wasn't enough — on a UI-heavy
spec the agent satisfied it with in-process component tests (NiceGUI
test-client / pytest) and never rendered the UI, so no visual evidence
was produced.

- Visual/UI acceptance criteria now MUST be verified by driving the
  actual running UI through the project's browser-automation tool
  (Playwright MCP/CLI, Cypress, chrome MCP, …). A passing component or
  test-client test confirms logic, not look-and-feel, and no longer
  counts as evidence for a visual criterion. Non-visual criteria keep
  the pick-by-fit freedom.
- Screenshots are saved to `docs/screenshots/` — the same evidence
  folder the testing-expert agent (awos-recruitment) writes E2E captures
  to — named `<spec-directory>-<state>.png` so they sort by spec. The
  browser tool creates the folder on first write; verify does NOT edit
  .gitignore (git-ignoring docs/screenshots/ is one-time project setup),
  matching testing-expert's scope guarantee. The report lists the paths.
- skip-tests still suppresses test suites but not look-and-feel.

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Aleksandr Makarov <amakarov@provectus.com>
@FlySpot FlySpot changed the title from "feat: add /awos:regression command" to "Draft: feat: add /awos:regression command" Jun 8, 2026

3 participants