Functional core, imperative shell: encode as essential principle#27
Open
andrewemark wants to merge 2 commits into
Open
Functional core, imperative shell: encode as essential principle#27andrewemark wants to merge 2 commits into
andrewemark wants to merge 2 commits into
Conversation
…iple Establishes the functional-core / imperative-shell architecture (Bernhardt; also Hexagonal / Ports-and-Adapters) as a load-bearing commitment that flows through every phase of the SDLC, not just a testing preference. The motivating problem: assessment-curated test suites still felt low-value and mock-heavy. Root cause was that the planning skill designed test strategy independently of architecture (test pyramid, coverage targets, "mock at unit boundaries"), which made mock-heavy unit tests structurally inevitable. The fix is to invert the dependency — architecture determines test shape. Changes: - CLAUDE.md: Add the principle as the first Key Principle. Expand in Engineering Practices with how it flows through each phase. Register `kind` (core/shell/both) as a task frontmatter field and require a Core/Shell Decomposition subsection in plan Architecture Fit. - skills/planning-guidance: Add the architectural commitment up front. Restructure Test Strategy to be derived from a required Core/Shell Decomposition subsection. Replace test-pyramid ratios, coverage quotas, and "mock at unit boundaries" guidance with: pure-core unit tests (values in / values out, no mocks) and shell integration tests (real I/O). Remove the "scaffolding test" category — every test is now durable by construction. Add boundary self-review. - skills/implementation-guidance: Add core/shell rules to the sub-agent prompt template. Add "core/shell boundary" as a deviation category — needing a mock for an internal module is a stop-and-surface signal, not a thing to paper over with a mock. - skills/materialize-tasks: Read the plan's Core/Shell Decomposition and propagate per-task classification into task doc frontmatter (`kind`) and a `## Core/Shell` section carrying the rules sub-agents enforce. BLOCK materialization for plans missing the decomposition. - skills/research-guidance: Surface the natural core/shell decomposition for the feature during "How" research. Add Core/Shell Decomposition to the overview template so planning has the seam in hand. - commands/assess: Rewrite PRUNE/KEEP/PROMOTE around mock presence and core/shell membership. Mock-on-internal-module is the strongest PRUNE signal. Reverse the "when uncertain, KEEP" default for mock-heavy tests — promote them so the architecture failure surfaces. Add §FCIS boundary check to code quality review (always runs, independent of whether the codebase has its own standards artifact). - commands/dry-run-plan: Add boundary checks — plans missing the decomposition, pure-core items that aren't pure, internal-module mocks in proposed tests, and test/item classification mismatches all surface as gaps (mostly HIGH severity). - agents/standards-review: Add §FCIS check (Step 2a, always runs) alongside codebase standards (Step 2b, conditional). Standards artifact path becomes optional. - commands/docs-artifacts: Add Functional Core, Imperative Shell section to the architecture.md handoff template so reviewers can verify the boundary is intact in the diff. - commands/review, commands/setup, templates/CLAUDE.md.template: Update descriptions to reflect the new commitment. Steering: when the surrounding code isn't already in core/shell shape, the plugin pushes for extraction anyway. Local mess from a clean extraction is preferred to a clean fit with an entangled neighbor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…verdict
Three softenings to avoid forcing the functional-core / imperative-shell
commitment onto code where it doesn't pay off, while keeping the pressure
where it does.
1. Shell-only carve-out
Some features are genuinely shell-only by nature: thin CRUD endpoints
with no business logic, webhook forwarders, glue code, integration
wrappers. The principle's value (extracting pure logic for clean
unit testing) doesn't apply because there's no meaningful pure logic
to extract.
Plans can now declare "Pure core: (none — shell-only feature)" with
a required Rationale that names the type of work. Vague rationales
("doesn't fit", "everything is I/O") still get pushed back at
dry-run as MEDIUM-severity gaps. Specific rationales pass.
For shell-only plans: test strategy is integration-only; the pure-
core checks in self-review, dry-run, assess, and standards-review
are skipped; only shell-side checks apply. Routing/dispatch
branching in shell-only code is no longer flagged as "shell carrying
logic" — the branching IS the feature.
2. Broader, justified mocking rule
The previous rule (mocks only at "hard external boundaries: third-
party APIs without sandboxes, cost-bearing services, hardware
absent in test") was too narrow. It rejected legitimate cases:
mocking an internal LLM client (real calls cost money), mocking
an internal billing collaborator (don't want to actually charge),
mocking time in retry-logic tests (real wall clock makes tests
flaky).
The rule now permits mocks of internal modules when the real
collaborator is:
- External (third-party API with no sandbox)
- Expensive (real money per invocation)
- Non-deterministic in ways you can't control
- Absent in the test environment
Every mock must be named with its justification (a comment on the
mock or a note in the test docstring). Unjustified mocks of
internal modules remain FAIL. Mocks of pure-core logic remain FAIL
(boundary failure). "The real thing is slow" or "I don't want to
set up the DB" are explicitly called out as not-acceptable
justifications.
3. N/A verdict in §FCIS checks
The §FCIS check in assess (Step 4a) and standards-review (Step 2a)
was binary PASS/FAIL. That generated noise for cases where the
decomposition genuinely doesn't apply: a routing function with
URL-path branching, a framework-mandated hook with a one-line
body, a justified-mock test, files attributed to a shell-only
plan.
Both checks now support N/A, recorded for transparency but not
counted as FAIL and not triggering fix workflows. Each N/A row
includes a brief reason ("shell-only plan, routing dispatch",
"mock of LLM client — justified as expensive") so reviewers can
audit the exceptions.
Files changed: planning-guidance (decomposition template, test
strategy template, mocking rules, constraints, confirm-approach,
self-review, anti-patterns); dry-run-plan (checks 14-19 + gap-type
table); assess (Step 4a verdicts + table example + opening framing);
standards-review (Step 2a verdicts + example table); implementation-
guidance (architectural commitment block, default, sub-agent prompt
in both standards and no-standards variants, deviation table, anti-
patterns); materialize-tasks (Step 3.5 shell-only handling, task doc
template Core/Shell section).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
The plugin's curated test suites still felt low-value and mock-heavy even after
/drvr:assess. We traced it to a structural cause in the planning skill:The fix flips the dependency: architecture determines test shape, not the other way around.
What this PR does
Encodes functional core, imperative shell (Bernhardt; also Hexagonal / Ports-and-Adapters) as a load-bearing architectural commitment across every phase of the SDLC. Under this commitment:
When the surrounding code isn't already in core/shell shape, the plugin steers each new feature toward extraction anyway — local mess from a clean extraction is preferred to a clean fit with an entangled neighbor. We're fighting architectural entropy actively.
Where it lands, by phase
CLAUDE.md+templates/CLAUDE.md.templatekindtask frontmatter field (core/shell/both). Plan schema requires a### Core/Shell Decompositionsubsection under## Architecture Fit.skills/research-guidance/SKILL.mdskills/planning-guidance/SKILL.md(heaviest edit)commands/dry-run-plan.mdskills/materialize-tasks/SKILL.mdkind) and a## Core/Shellsection. BLOCKs if plan lacks the decomposition.skills/implementation-guidance/SKILL.mdcommands/assess.mdagents/standards-review.mdcommands/docs-artifacts.mdarchitecture.mdhandoff template so reviewers can verify the boundary in the diff.commands/review.md,commands/setup.mdWhat's deliberately NOT in this PR
test_plan_doc_sectionsprobably doesn't know about the required### Core/Shell Decompositionsubsection. Worth adding as a follow-up.skills/intent-guidance,skills/sdlc-orchestration. Neither phase materially shapes architecture, so left untouched.Backward compatibility
This is a real philosophical shift. Plans written before this change won't have a Core/Shell Decomposition subsection —
materialize-taskswill BLOCK on them and route back to planning. That's intentional: we don't want pre-existing plans silently treated as conforming.Test plan
/drvr:feature, run through Research → Planning → see the Core/Shell Decomposition prompts land in the expected places/drvr:assesson a test suite known to contain mock-heavy tests; confirm they surface as PRUNE/PROMOTE rather than KEEP/drvr:reviewon a codebase without its own CLAUDE.md; confirm §FCIS check still runs🤖 Generated with Claude Code