Skip to content

Functional core, imperative shell: encode as essential principle#27

Open
andrewemark wants to merge 2 commits into
mainfrom
andrew/functional-core-imperative-shell
Open

Functional core, imperative shell: encode as essential principle#27
andrewemark wants to merge 2 commits into
mainfrom
andrew/functional-core-imperative-shell

Conversation

@andrewemark
Copy link
Copy Markdown

Motivation

The plugin's curated test suites still felt low-value and mock-heavy even after /drvr:assess. We traced it to a structural cause in the planning skill:

  • Test strategy was designed independently of architecture (test pyramid ratios, coverage quotas, "mock at unit boundaries").
  • Under that framing, mock-heavy unit tests are structurally inevitable — the planner has nowhere to put logic except inside classes whose collaborators then get mocked.
  • The "scaffolding vs durable" framing leaned on assessment to clean up later, but the KEEP criterion ("asserts observable behavior") was a weaker bar than what we actually want: a test a reader can use to understand the code.

The fix flips the dependency: architecture determines test shape, not the other way around.

What this PR does

Encodes functional core, imperative shell (Bernhardt; also Hexagonal / Ports-and-Adapters) as a load-bearing architectural commitment across every phase of the SDLC. Under this commitment:

  • Pure-core code takes values in, returns values out — no I/O, no time, no randomness, no mutable shared state. Unit-tested with values in / values out and no mocks.
  • Imperative shell performs I/O and calls into the core. Integration-tested against real I/O.
  • A ""unit test"" that needs a mock for an internal module is a signal the boundary is broken — fix the architecture, not the test.
  • Mocks are permitted only at hard external boundaries (third-party APIs without sandboxes, cost-bearing services, hardware absent in test).

When the surrounding code isn't already in core/shell shape, the plugin steers each new feature toward extraction anyway — local mess from a clean extraction is preferred to a clean fit with an entangled neighbor. We're fighting architectural entropy actively.

Where it lands, by phase

Phase File Change
Constitution CLAUDE.md + templates/CLAUDE.md.template Principle as first Key Principle. Expanded in Engineering Practices. New kind task frontmatter field (core/shell/both). Plan schema requires a ### Core/Shell Decomposition subsection under ## Architecture Fit.
Research skills/research-guidance/SKILL.md Surfaces the natural decomposition during ""How"" research. Required section in overview.
Planning skills/planning-guidance/SKILL.md (heaviest edit) Test Strategy is now derived from a required Core/Shell Decomposition subsection. Test pyramid + coverage quotas + ""mock at unit boundaries"" replaced with: pure-core unit tests (values in / values out, no mocks) and shell integration tests (real I/O). ""Scaffolding test"" category removed — every test is now durable by construction. Added boundary self-review.
Validation commands/dry-run-plan.md New gap checks (mostly HIGH severity): missing decomposition, pure-core items that aren't pure, internal-module mocks in proposed tests, test/item classification mismatches.
Materialization skills/materialize-tasks/SKILL.md Reads the plan's decomposition, propagates per-task classification into task doc frontmatter (kind) and a ## Core/Shell section. BLOCKs if plan lacks the decomposition.
Implementation skills/implementation-guidance/SKILL.md Sub-agent prompts now carry the core/shell rules + per-task classification. New ""core/shell boundary"" deviation category — needing a mock for an internal module is a stop-and-surface signal.
Assessment commands/assess.md PRUNE/KEEP/PROMOTE keyed on mock presence and core/shell membership. ""When uncertain, KEEP"" reversed for mock-heavy tests (PROMOTE instead so the architecture failure surfaces). New §FCIS boundary check in Step 4a — runs regardless of whether codebase has its own standards artifact.
Standards review agents/standards-review.md §FCIS check runs unconditionally (Step 2a). Codebase-specific standards layer on top (Step 2b, conditional).
Handoff commands/docs-artifacts.md New Functional Core, Imperative Shell section in architecture.md handoff template so reviewers can verify the boundary in the diff.
Pass-through commands/review.md, commands/setup.md Updated descriptions to reflect the new commitment.

What's deliberately NOT in this PR

  • Validation tests for the new contract. The existing test_plan_doc_sections probably doesn't know about the required ### Core/Shell Decomposition subsection. Worth adding as a follow-up.
  • CHANGELOG entry. Template version bumped to 1.1.0, but no CHANGELOG entry written.
  • README mention. README doesn't currently surface the commitment; worth adding for plugin users browsing the repo.
  • skills/intent-guidance, skills/sdlc-orchestration. Neither phase materially shapes architecture, so left untouched.

Backward compatibility

This is a real philosophical shift. Plans written before this change won't have a Core/Shell Decomposition subsection — materialize-tasks will BLOCK on them and route back to planning. That's intentional: we don't want pre-existing plans silently treated as conforming.

Test plan

  • Open a new feature with /drvr:feature, run through Research → Planning → see the Core/Shell Decomposition prompts land in the expected places
  • Try to materialize a plan missing the decomposition; confirm it BLOCKs
  • Run /drvr:assess on a test suite known to contain mock-heavy tests; confirm they surface as PRUNE/PROMOTE rather than KEEP
  • Run /drvr:review on a codebase without its own CLAUDE.md; confirm §FCIS check still runs

🤖 Generated with Claude Code

…iple

Establishes the functional-core / imperative-shell architecture (Bernhardt;
also Hexagonal / Ports-and-Adapters) as a load-bearing commitment that flows
through every phase of the SDLC, not just a testing preference.

The motivating problem: assessment-curated test suites still felt low-value
and mock-heavy. Root cause was that the planning skill designed test strategy
independently of architecture (test pyramid, coverage targets, "mock at unit
boundaries"), which made mock-heavy unit tests structurally inevitable. The
fix is to invert the dependency — architecture determines test shape.

Changes:

- CLAUDE.md: Add the principle as the first Key Principle. Expand in
  Engineering Practices with how it flows through each phase. Register
  `kind` (core/shell/both) as a task frontmatter field and require a
  Core/Shell Decomposition subsection in plan Architecture Fit.

- skills/planning-guidance: Add the architectural commitment up front.
  Restructure Test Strategy to be derived from a required Core/Shell
  Decomposition subsection. Replace test-pyramid ratios, coverage quotas,
  and "mock at unit boundaries" guidance with: pure-core unit tests
  (values in / values out, no mocks) and shell integration tests (real
  I/O). Remove the "scaffolding test" category — every test is now durable
  by construction. Add boundary self-review.

- skills/implementation-guidance: Add core/shell rules to the sub-agent
  prompt template. Add "core/shell boundary" as a deviation category —
  needing a mock for an internal module is a stop-and-surface signal,
  not a thing to paper over with a mock.

- skills/materialize-tasks: Read the plan's Core/Shell Decomposition and
  propagate per-task classification into task doc frontmatter (`kind`)
  and a `## Core/Shell` section carrying the rules sub-agents enforce.
  BLOCK materialization for plans missing the decomposition.

- skills/research-guidance: Surface the natural core/shell decomposition
  for the feature during "How" research. Add Core/Shell Decomposition to
  the overview template so planning has the seam in hand.

- commands/assess: Rewrite PRUNE/KEEP/PROMOTE around mock presence and
  core/shell membership. Mock-on-internal-module is the strongest PRUNE
  signal. Reverse the "when uncertain, KEEP" default for mock-heavy
  tests — promote them so the architecture failure surfaces. Add §FCIS
  boundary check to code quality review (always runs, independent of
  whether the codebase has its own standards artifact).

- commands/dry-run-plan: Add boundary checks — plans missing the
  decomposition, pure-core items that aren't pure, internal-module mocks
  in proposed tests, and test/item classification mismatches all surface
  as gaps (mostly HIGH severity).

- agents/standards-review: Add §FCIS check (Step 2a, always runs)
  alongside codebase standards (Step 2b, conditional). Standards artifact
  path becomes optional.

- commands/docs-artifacts: Add Functional Core, Imperative Shell section
  to the architecture.md handoff template so reviewers can verify the
  boundary is intact in the diff.

- commands/review, commands/setup, templates/CLAUDE.md.template: Update
  descriptions to reflect the new commitment.

Steering: when the surrounding code isn't already in core/shell shape,
the plugin pushes for extraction anyway. Local mess from a clean
extraction is preferred to a clean fit with an entangled neighbor.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…verdict

Three softenings to avoid forcing the functional-core / imperative-shell
commitment onto code where it doesn't pay off, while keeping the pressure
where it does.

1. Shell-only carve-out

   Some features are genuinely shell-only by nature: thin CRUD endpoints
   with no business logic, webhook forwarders, glue code, integration
   wrappers. The principle's value (extracting pure logic for clean
   unit testing) doesn't apply because there's no meaningful pure logic
   to extract.

   Plans can now declare "Pure core: (none — shell-only feature)" with
   a required Rationale that names the type of work. Vague rationales
   ("doesn't fit", "everything is I/O") still get pushed back at
   dry-run as MEDIUM-severity gaps. Specific rationales pass.

   For shell-only plans: test strategy is integration-only; the pure-
   core checks in self-review, dry-run, assess, and standards-review
   are skipped; only shell-side checks apply. Routing/dispatch
   branching in shell-only code is no longer flagged as "shell carrying
   logic" — the branching IS the feature.

2. Broader, justified mocking rule

   The previous rule (mocks only at "hard external boundaries: third-
   party APIs without sandboxes, cost-bearing services, hardware
   absent in test") was too narrow. It rejected legitimate cases:
   mocking an internal LLM client (real calls cost money), mocking
   an internal billing collaborator (don't want to actually charge),
   mocking time in retry-logic tests (real wall clock makes tests
   flaky).

   The rule now permits mocks of internal modules when the real
   collaborator is:
   - External (third-party API with no sandbox)
   - Expensive (real money per invocation)
   - Non-deterministic in ways you can't control
   - Absent in the test environment

   Every mock must be named with its justification (a comment on the
   mock or a note in the test docstring). Unjustified mocks of
   internal modules remain FAIL. Mocks of pure-core logic remain FAIL
   (boundary failure). "The real thing is slow" or "I don't want to
   set up the DB" are explicitly called out as not-acceptable
   justifications.

3. N/A verdict in §FCIS checks

   The §FCIS check in assess (Step 4a) and standards-review (Step 2a)
   was binary PASS/FAIL. That generated noise for cases where the
   decomposition genuinely doesn't apply: a routing function with
   URL-path branching, a framework-mandated hook with a one-line
   body, a justified-mock test, files attributed to a shell-only
   plan.

   Both checks now support N/A, recorded for transparency but not
   counted as FAIL and not triggering fix workflows. Each N/A row
   includes a brief reason ("shell-only plan, routing dispatch",
   "mock of LLM client — justified as expensive") so reviewers can
   audit the exceptions.

Files changed: planning-guidance (decomposition template, test
strategy template, mocking rules, constraints, confirm-approach,
self-review, anti-patterns); dry-run-plan (checks 14-19 + gap-type
table); assess (Step 4a verdicts + table example + opening framing);
standards-review (Step 2a verdicts + example table); implementation-
guidance (architectural commitment block, default, sub-agent prompt
in both standards and no-standards variants, deviation table, anti-
patterns); materialize-tasks (Step 3.5 shell-only handling, task doc
template Core/Shell section).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@andrewemark andrewemark changed the title Functional core, imperative shell: encode as load-bearing principle Functional core, imperative shell: encode as essential principle May 26, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant