Skip to content

feat(spec): raise completeness and quality of /awos:spec output#140

Open
FlySpot wants to merge 2 commits into
mainfrom
spec-command-optimization
Open

feat(spec): raise completeness and quality of /awos:spec output#140
FlySpot wants to merge 2 commits into
mainfrom
spec-command-optimization

Conversation

@FlySpot

@FlySpot FlySpot commented Jun 25, 2026

Copy link
Copy Markdown
Collaborator

What

Applies a set of targeted improvements to `/awos:spec` that close five quality gaps identified in the optimization plan. All changes were validated end-to-end in the awos-qa fork before being ported here.

Changes to `commands/spec.md`

Step 3 — Functional Requirements & Acceptance Criteria (Points 2)

  • New bullet: "Where applicable, capture the boundary and error behavior the user sees: what error message appears, what limits exist (file size, format, count), and what happens when the action fails."
  • New bullet: "For requirements that capture boundary or error behavior, include at least one acceptance criterion covering the failure path."

Step 4 — Self-Review (Point 4)

  • Renamed to "Self-Review (Language and Ambiguity Check)"
  • Added a second pass that scans §Functional Requirements for vague or unmeasurable wording (weasel words like "fast", "user-friendly", "as appropriate") and requires each to be made concrete in user-perceivable terms or converted to a [NEEDS CLARIFICATION: …] marker.

Step 5 — Specification Definition of Done (Points 1, 5, 7) — new step

  • Pre-write self-check (not a gate — file is always written):
    1. No vague wording remains in requirements or acceptance criteria.
    2. Every requirement has at least one When/Then acceptance criterion.
    3. No unresolved [NEEDS CLARIFICATION: …] markers. In unattended runs, each marker is replaced with a visible **Assumption:** … line so open decisions are recorded rather than hidden.
  • Old Step 5 (Final Review) → Step 6; old Step 6 (File Generation) → Step 7.

New file: `docs/optimization/spec-command-optimization.md`

Documents the problem statement, fix direction, and placement for all seven optimization points, including the rationale for carving out Points 3 (non-functional requirements) and 6 (relative-importance signal) as separate cross-command tasks.

Why

Before this change:

  • Error and boundary conditions appeared only in an example question, not as a rule — requirements routinely omitted failure paths.
  • The self-review caught technical language leakage but not vague, untestable wording like "user-friendly" or "as appropriate".
  • No pre-write gate verified that every requirement had a criterion, or that all [NEEDS CLARIFICATION] markers were resolved.

Out of scope (separate tasks)

  • Point 3 — Non-functional requirements with measurable metrics. These belong in `technical-considerations.md`; adding them to the functional spec breaks the what/how boundary. Requires coordinated changes to `tech.md` and `technical-considerations-template.md`.
  • Point 6 — Per-requirement importance signal. No current consumer exists in `tasks.md`, `tech.md`, or `verify.md`; adding the signal without consumption is dead state. Requires coordinated changes to `functional-spec-template.md` and `tasks.md`.

Tests

Three new lint checks in `tests/lint-prompts.test.js` pin the contracts so they cannot be silently dropped:

  • Definition of Done checklist presence (anchors: Definition of Done, **Assumption:**, at least one acceptance criterion, No vague wording remains)
  • Ambiguity self-review presence (vague or unmeasurable wording, "user-friendly", concrete in user-perceivable terms)
  • Boundary/error rules presence (boundary and error behavior the user sees, at least one acceptance criterion covering the failure path)

Behavioral coverage (no residual markers in produced `functional-spec.md`, every requirement carries a criterion, no unverifiable wording) is in the awos-qa repository.

Summary by CodeRabbit

  • Documentation

    • Updated the spec-writing guide to better capture user-facing edge cases, failure behavior, and clearer acceptance criteria.
    • Added a new definition-of-done checklist and a stronger self-review step for ambiguous or overly technical wording.
    • Introduced a new optimization guide outlining quality checks for generated specifications.
  • Tests

    • Added lint checks to ensure the updated spec guidance is present and followed, including clarity, boundary cases, and complete acceptance criteria.

FlySpot and others added 2 commits June 25, 2026 14:00
Applies all spec-command-optimization changes validated in the awos-qa
fork (PRs #1#3 on feat/spec-command-optimization):

- Step 3: add boundary/error elicitation rule (item 2) and require at
  least one failure-path AC for boundary requirements (item 3)
- Step 4: extend self-review to catch vague/unmeasurable wording in
  requirements and acceptance criteria, with concrete weasel-word
  examples and the user-perceivable-terms resolution rule
- Step 5 (new): Specification Definition of Done — pre-write checklist
  that gates on no vague wording, every requirement having ≥1 AC, and
  no unresolved [NEEDS CLARIFICATION] markers; unattended runs replace
  markers with **Assumption:** lines and still save the file
- Steps 5→6, 6→7: renumber Final Review and File Generation to
  accommodate the new DoD step
- tests: add three lint checks pinning the DoD checklist, ambiguity
  self-review, and boundary/error rules so contracts cannot be silently
  dropped

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Records the problem statement, fix direction, placement, and scope
boundary for each of the seven optimization points (Points 3 and 6
explicitly carved out as requiring coordinated changes across other
commands).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 25, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: c22cacee-d0fd-40e7-93ed-e1dba140258a

📥 Commits

Reviewing files that changed from the base of the PR and between 7df798b and 9f8d428.

📒 Files selected for processing (3)
  • commands/spec.md
  • docs/optimization/spec-command-optimization.md
  • tests/lint-prompts.test.js

📝 Walkthrough

Walkthrough

Updates /awos:spec guidance to require boundary/error behavior in functional requirements, failure-path acceptance criteria, a language-and-ambiguity self-review, and a Definition of Done that resolves clarification markers. It also adds a companion optimization document and lint tests for the updated prompt text.

Changes

Spec Prompt Guidance

Layer / File(s) Summary
Optimization spec draft
docs/optimization/spec-command-optimization.md
A new optimization document lists requirement-generation changes for clarification markers, boundary/error coverage, ambiguity review, acceptance criteria, and a consolidated Definition of Done.
Boundary and failure-path rules
commands/spec.md, tests/lint-prompts.test.js
commands/spec.md adds user-visible boundary and error behavior to functional requirements and requires failure-path acceptance criteria; the lint prompt test checks both rules.
Ambiguity check and Definition of Done
commands/spec.md, tests/lint-prompts.test.js
commands/spec.md replaces the self-review step with language and ambiguity checks, adds a Definition of Done step for wording, acceptance criteria, and clarification markers, and renumbers later workflow steps; the lint prompt test checks the updated prompts.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

minor

Suggested reviewers

  • workshur
  • AlexanderMakarov

Poem

Hop, hop — I nibbled the vague bits away,
Added boundary signs for the error-day.
One DoD checklist, neat as a row,
With clear little carrots for specs to show,
Hooray, says Bunny, “Now the prompts glow!” 🐇

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly matches the PR’s main goal of improving /awos:spec output completeness and quality.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch spec-command-optimization

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@FlySpot FlySpot requested a review from AlexanderMakarov June 25, 2026 11:06

@AlexanderMakarov AlexanderMakarov left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tightens /awos:spec in sensible places: it makes boundary/error behavior an explicit elicitation rule rather than a buried example, requires a failure-path acceptance criterion for those requirements, adds a weasel-word pass to the self-review, and closes the loop with a pre-write Definition of Done that sweeps [NEEDS CLARIFICATION] markers. The prompt edits read clean — the Step 4↔Step 5 cross-references line up, the marker-sweep "runs last so it also catches markers from items 1 and 2" ordering is thought through, the old-Step-5/6 → 6/7 renumbering is applied consistently, and the wording stays plain-declarative (one bold emphasis per rule) per the repo's emphasis convention. Nothing in the actual command blocks merge.

What is docs/optimization/spec-command-optimization.md for? It's a new pattern (no docs/optimization/ today) and it isn't shipped by the installer, so this is purely about the repo's own docs. After merge, five of its seven points (1, 2, 4, 5, 7) are implemented — the doc then describes problems that no longer exist, in the present tense. The repo's own rule is "docs describe the present, not the history." The genuinely live content is Points 3 and 6 (the deferred cross-command work). Worth deciding: keep it as a durable record (then it wants section/step references instead of line numbers, and past-tense problem statements), or trim it to the deferred tasks and let the PR description carry the rest? Your call — I just don't want it to rot silently.

'commands/spec.md Definition of Done must tell unattended runs to replace each unresolved [NEEDS CLARIFICATION] marker with a visible **Assumption:** and still save the file'
);
assert.ok(
/at least one acceptance criterion/i.test(body),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion is meant to pin DoD item 2 ("Every requirement has at least one acceptance criterion"), but /at least one acceptance criterion/i also matches the Step 3 failure-path rule you added at spec.md:99 ("include at least one acceptance criterion covering the failure path"). So if someone deleted the DoD completeness gate but left the Step 3 rule, this stays green — the item it names isn't actually covered. The other three assertions in this test (Definition of Done, No vague wording remains, **Assumption:**) are unique, so the section as a whole is pinned; it's just this one item that slips through. Anchoring on the unique bolded lead-in would fix it:

assert.ok(
  body.includes('Every requirement has at least one acceptance criterion'),
  '...'
);

'commands/spec.md self-review must name concrete weasel-word examples (e.g. "user-friendly") so the model knows what to hunt'
);
assert.ok(
/concrete in user-perceivable terms/i.test(body),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor, optional — same flavor as the DoD one but not load-bearing: /concrete in user-perceivable terms/i matches both the Step 4 pass (spec.md:111) and DoD item 1 (spec.md:117), so on its own it wouldn't catch Step 4 being removed. The sibling assertion /vague or unmeasurable wording/i is unique to Step 4, so the test still does its job. Leave it or tighten to a Step-4-only phrase (e.g. the search results appear within 2 seconds example) — your preference.


## Point 1 — Unresolved `[NEEDS CLARIFICATION]` markers reaching the saved file

- **Problem:** The marker is only required to be closed on the parent requirement before acceptance criteria are written — `commands/spec.md:96`: _"If any `[NEEDS CLARIFICATION: …]` markers remain on the parent requirement in §Functional Requirements, ask clarifying questions and resolve the markers before writing acceptance criteria."_ Neither the self-review (Step 4, line 108) nor file generation (Step 6, lines 114–119) blocks saving a file with markers still open in other sections. The marker is introduced at `commands/spec.md:90`: _"mark every unresolved detail with `[NEEDS CLARIFICATION: your specific question]`"_.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commands/spec.md:NN citations here are accurate against main as it stands, but this PR's own edits shift every one of them down (~+2 lines) and renumber the steps — so on merge they point at the wrong lines, and "file generation (Step 6, lines 114–119)" names a step that's now Step 7. Same root shows in Point 4 (line 30): "the self-review catches only leakage … weasel words [are] not present in spec.md today" reads in the present tense, but this PR is exactly what makes that false. If the doc stays, citing by section/step name and writing the problem statements in past tense ("before this change…") keeps it from contradicting the file it ships with.

Comment thread commands/spec.md
### Step 4: Self-Review (Language and Ambiguity Check)

- Before presenting to the user, re-read the entire draft end-to-end. For every sentence, ask: "Would this make sense to someone who has never seen the codebase?" Replace any developer-facing language with plain, non-technical wording in the same language the user is using. Remove any references to internal system behavior, code, or architecture that slipped in.
- Then re-read §Functional Requirements — both the requirement statements and their acceptance criteria — for vague or unmeasurable wording: words like "fast", "user-friendly", or "as appropriate" that a tester could not verify. Make each one concrete in user-perceivable terms (e.g., "the search feels fast" becomes "search results appear within 2 seconds"), or, if the user has not decided the specific value yet, replace it with a `[NEEDS CLARIFICATION: …]` marker so it is resolved in Step 5. Quantify only where the user would notice the difference — do not force a number onto every sentence; narrative sections like §Overview and Rationale may stay qualitative.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

like §Overview and Rationale may stay qualitative.

Weak reference - I don't understand where name of section ends. The same for re-read §Functional Requirements — both.

Please use [Link Text](#section-name) formatting. Here and below in this file.

Comment thread commands/spec.md

Before the final review and saving the file, check the draft against this list. It is a self-review, not a new approval gate — the file is still written at the end of the process.

1. **No vague wording remains in requirements or acceptance criteria.** Each requirement and criterion is concrete enough that a non-developer can tell unambiguously whether it passes or fails. If a vague term slipped through, resolve it as described in Step 4: make it concrete in user-perceivable terms, or convert it to a `[NEEDS CLARIFICATION: …]` marker — the marker is a temporary hand-off, not a final state; item 3 closes every marker before the file is written.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No vague wording remains in requirements or acceptance criteria.

I guess it is excessive - big part of step 4 was dedicated to this check. If we make LLM to check itself in the same session (the same as asking person "Are you done?") it rarely works - LLM reads from it's own context that everything is completed already and do nothing (as the person answers "Yes" without checking, from memory). So this usually just a waste of tokens.

We may ask it to Grep in the specific file specific word instead - item 3 below.

Comment thread commands/spec.md

1. **No vague wording remains in requirements or acceptance criteria.** Each requirement and criterion is concrete enough that a non-developer can tell unambiguously whether it passes or fails. If a vague term slipped through, resolve it as described in Step 4: make it concrete in user-perceivable terms, or convert it to a `[NEEDS CLARIFICATION: …]` marker — the marker is a temporary hand-off, not a final state; item 3 closes every marker before the file is written.
2. **Every requirement has at least one acceptance criterion.** Confirm that each functional requirement in §Functional Requirements carries at least one acceptance criterion in the When/Then shape (Given optional). If a requirement has none, write one for it before saving.
3. **No unresolved clarification markers.** Scan the whole document for `[NEEDS CLARIFICATION: …]` markers, not only the section you edited last — this check runs last so it also catches markers created by items 1 and 2. In an interactive session, resolve each one with the user and fold the answer into the relevant requirement and its acceptance criteria. In an unattended session where no one can answer, **do not stop**: replace each marker with a visible `**Assumption:** …` line in plain language and save the file anyway, so the open decision is recorded rather than hidden. Example: `The user sees an error message. [NEEDS CLARIFICATION: exact wording?]` becomes `The user sees an error message. **Assumption:** the message reads "Something went wrong — please try again."`

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In an interactive session, resolve each one with the user

Please be more precise - like using AskUserQuestion tool with reasonable defaults - it would make LLM understand (and, therefore, implement) task better and be proactive for user (don't invent units for metrics).

@@ -0,0 +1,51 @@
# Spec Command Optimization

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR: better to move this file into separate PR. I don't see how it is used here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants