Add implement-design-doc-java skill #6

Draft
szjanikowski wants to merge 12 commits into main from feature/implement-design-doc-java-skill

szjanikowski (Contributor) commented Mar 27, 2026

Summary

  • New skill that translates DesignDoc JSON contracts into Java domain model code
  • Adapts to target project's coding style (Lombok, plain Java, records, sealed interfaces, Vavr Either)
  • Detects and replicates project patterns: facade+services, command/handler, service-per-use-case, nested repository interfaces, sealed event interfaces
  • Reuses existing shared types (IDs, Money) instead of creating duplicates
  • Enforces package-private encapsulation for entities within aggregates
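As an illustration of the last point, a minimal sketch of package-private encapsulation inside an aggregate. The `Order` and `OrderLine` names are hypothetical and do not come from the skill or its fixtures; the sketch only shows the visibility rule, assuming both classes live in the same package.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical entity inside the aggregate. No `public` modifier, so code
// outside this package cannot create or mutate it directly.
final class OrderLine {
    private final String sku;
    private int quantity;

    OrderLine(String sku, int quantity) {
        this.sku = sku;
        this.quantity = quantity;
    }

    String sku() { return sku; }
    int quantity() { return quantity; }
    void add(int extra) { quantity += extra; }
}

// Hypothetical aggregate root: the only entry point for changing lines.
// In a real project this class would be declared `public` in Order.java.
final class Order {
    private final List<OrderLine> lines = new ArrayList<>();

    public void addItem(String sku, int quantity) {
        if (quantity <= 0) {
            throw new IllegalArgumentException("quantity must be positive");
        }
        for (OrderLine line : lines) {
            if (line.sku().equals(sku)) {
                line.add(quantity); // merge into the existing line
                return;
            }
        }
        lines.add(new OrderLine(sku, quantity));
    }

    public int lineCount() { return lines.size(); }

    public int totalQuantity() {
        return lines.stream().mapToInt(OrderLine::quantity).sum();
    }
}
```

Callers outside the package see only the root's public methods, so invariants such as "quantity must be positive" are enforced in one place.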

Eval results (10 iterations, final 4 discriminating scenarios)

| Scenario | With Skill | Without Skill | Delta |
|---|---|---|---|
| Pricing (Plain Java + Vavr) | 9/9 (100%) | 7/9 (78%) | +22pp |
| Payroll (Mixed patterns) | 9/9 (100%) | 7/9 (78%) | +22pp |
| Receiving (Unusual conventions) | 8/8 (100%) | 6/8 (75%) | +25pp |
| Returns (Cross-module) | 9/9 (100%) | 9/9 (100%) | 0pp |
| Total | 35/35 (100%) | 29/35 (83%) | +17pp |

Results confirmed repeatable across 2 independent iterations with identical pass/fail patterns.
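
The totals and deltas above follow directly from the per-scenario counts. A minimal sketch of that arithmetic, assuming percentages are rounded to the nearest whole number; `EvalTotals` and its methods are illustrative names and are not taken from `grade_eval.py`.

```java
import java.util.stream.IntStream;

final class EvalTotals {
    /** Pass rate as a whole percentage, rounded to the nearest integer. */
    static long percent(int passed, int total) {
        return Math.round(100.0 * passed / total);
    }

    static int sum(int[] counts) {
        return IntStream.of(counts).sum();
    }

    public static void main(String[] args) {
        // Per-scenario counts in table order: pricing, payroll, receiving, returns.
        int[] assertions = {9, 9, 8, 9};
        int[] withSkill = {9, 9, 8, 9};
        int[] withoutSkill = {7, 7, 6, 9};

        int total = sum(assertions);
        long withPct = percent(sum(withSkill), total);
        long withoutPct = percent(sum(withoutSkill), total);

        // The delta is taken between the rounded percentages, in percentage points.
        System.out.printf("with %d/%d (%d%%), without %d/%d (%d%%), delta %+dpp%n",
                sum(withSkill), total, withPct,
                sum(withoutSkill), total, withoutPct,
                withPct - withoutPct);
    }
}
```

Running it prints `with 35/35 (100%), without 29/35 (83%), delta +17pp`, which matches the Total row.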

Structure

implement_design_doc_java/
├── SKILL.md                          # Main skill instructions
├── references/designdoc_mapping.md   # BuildingBlock type → Java class mapping rules
├── README.md                         # Installation & usage guide
├── implement_design_doc_java.skill   # Installable package (7.7KB)
├── evals/
│   ├── evals.json                    # 4 eval definitions
│   └── fixtures/                     # 7 Java project fixtures + 7 DesignDoc JSONs
└── implement-design-doc-java-workspace/
    ├── grade_eval.py                 # Programmatic grading script
    └── iteration-1..10/              # All eval run outputs and grading results

Key design decisions

  • DesignDoc JSON is authoritative — skill implements what's described, doesn't "fix" the design
  • Style detection before code generation — reads 3-5 existing classes to detect conventions
  • "Search before creating" — explicit instruction to find existing types before creating new ones
  • Neutral eval prompts — identical for with/without skill, no pattern hints
  • Removed non-discriminating evals (shipping, transfers — both 9/9 without skill)
  • Removed invalid factory assertion (DesignDoc didn't describe a factory building block)

Methodology notes

  • Iterations 1-2: biased prompts (baseline got style hints) — discarded
  • Iterations 3-4: neutral prompts, fixed Money fixture
  • Iterations 5-7: added hard scenarios (mixed patterns, delta/modify existing, unusual conventions, cross-module)
  • Iterations 8-9: repeatability confirmation on final 4 evals
  • Iteration 10: focused receiving x2 after skill improvement

🤖 Generated with Claude Code

Szymon Janikowski and others added 12 commits March 27, 2026 16:35
Skill translates DesignDoc JSON contracts into Java domain model code,
adapting to the target project's coding style (Lombok, plain Java, records).
Includes 3 eval test cases with diverse Java project fixtures and DesignDoc
JSONs covering shipping, pricing, and transfers domains.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Includes generated Java outputs (with/without skill), grading script,
timing data, and benchmark JSONs for reviewer inspection.
Iteration 3 with neutral prompts pending.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… fix

Iteration 3: neutral prompts (no style hints in baseline), with_skill 96.3%
vs without_skill 92.6%. Money fixture was missing — caused false failure.

Iteration 4: added Money.java to plain-java fixture, with_skill 100% (27/27)
vs without_skill 96.3% (26/27). Consistent skill advantage on
package-private entity encapsulation.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…iter 5-6

New fixtures: style-unusual-conventions (nested repo, sealed events in
separate file, factory as inner class), style-cross-module (2 bounded
contexts with shared types).

Cleaned all DesignDoc JSONs of "Already exists"/"NEW" hints.
Simplified all eval prompts to minimal form.

Iteration 5 (payroll + order-delta): with_skill 100% vs without 95%.
Iteration 6 (receiving + returns): with_skill 94.4% vs without 77.8%
— first significant delta (+16.6pp) on hard scenarios. Key skill wins:
reusing shared types and detecting subtle conventions.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Payroll: with_skill 9/9 vs without 7/9 (+22pp on payroll alone; +10pp over the combined 20 assertions with order-delta). Removing "Already exists"
hint from JSON exposed baseline failure on EmployeeId reuse.
Order-delta: both 11/11 — non-discriminating, will be removed.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Removed shipping and transfers evals (both 9/9 with and without skill).
Kept 4 discriminating evals: pricing, payroll, receiving, returns.

Iteration 8 (clean prompts, original 3 evals): with_skill 100% vs
without 92.6%. Shipping/transfers confirmed non-discriminating.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Results identical to iter 6-8: with_skill 97.2% (35/36) vs without 80.6%
(29/36), delta +16.7pp. All pass/fail patterns reproduced exactly.

Consistent skill wins: package-private entities, shared type reuse.
Consistent shared failure: factory_as_inner_class (both configs).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…er 10

Factory_as_inner_class assertion was invalid — DesignDoc has no factory
building block, so expecting one violates "DesignDoc is authoritative".
Reverted skill change that encouraged adding unrequested factories.

Iteration 10 (receiving x2): with_skill 8/8 (100%) vs without 6/8 (75%)
on both runs. Delta +25pp, fully repeatable.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Expanded description with more trigger phrases and broader coverage.
Baseline trigger eval: 11/20 (55%) - all not-trigger correct, most
should-trigger failing. May be eval tooling limitation (claude -p
command-based triggering vs real skill system).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Packaged implement-design-doc-java as installable .skill file (399KB ZIP).
Fixed skill name from "Implement Design Doc (Java)" to
"implement-design-doc-java" (kebab-case required by packager).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

## Input

A DesignDoc JSON conforming to the schema at `contracts/design-doc-schema.json`. The JSON contains:
szjanikowski (Contributor, Author) commented:

Shouldn't the contract be bundled with the skill? In which scenarios would that make a difference to the skill's effectiveness?


4. **Replicate the project's exact patterns.** When the project uses a specific pattern for domain events (e.g., `sealed interface Event permits ...` with inner records), aggregates (e.g., `pendingEvents` + `flushEvents()`), or use case coordination (e.g., facade + package-private services + @Configuration), replicate that exact pattern. Do not substitute a different pattern even if it's valid DDD — the goal is consistency with the existing codebase.

5. **Implement domain model only.** Infrastructure implementations (persistence, messaging, HTTP) are out of scope unless the contract explicitly includes `external_integration` building blocks. Repository interfaces are in scope; their implementations are not.
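
For readers unfamiliar with the patterns named in rule 4, a minimal sketch of a sealed event interface with inner records combined with a `pendingEvents` buffer and `flushEvents()`. The `Shipment` and `ShipmentEvent` names are hypothetical and are not taken from the fixtures; Java 17 or later is assumed for `sealed` and `record`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event hierarchy: the sealed interface lists every permitted event,
// and each event is an inner record.
sealed interface ShipmentEvent permits ShipmentEvent.Created, ShipmentEvent.Dispatched {
    record Created(String shipmentId) implements ShipmentEvent {}
    record Dispatched(String shipmentId) implements ShipmentEvent {}
}

// Hypothetical aggregate that records events as it changes state.
final class Shipment {
    private final String id;
    private final List<ShipmentEvent> pendingEvents = new ArrayList<>();
    private boolean dispatched;

    Shipment(String id) {
        this.id = id;
        pendingEvents.add(new ShipmentEvent.Created(id));
    }

    void dispatch() {
        if (dispatched) {
            throw new IllegalStateException("already dispatched");
        }
        dispatched = true;
        pendingEvents.add(new ShipmentEvent.Dispatched(id));
    }

    /** Returns the events recorded so far and clears the buffer. */
    List<ShipmentEvent> flushEvents() {
        List<ShipmentEvent> flushed = List.copyOf(pendingEvents);
        pendingEvents.clear();
        return flushed;
    }
}
```

If the target project already uses this shape, rule 4 asks new aggregates to follow it rather than, say, publishing events through an injected publisher, even though both are valid.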
szjanikowski (Contributor, Author) commented:

We need to talk this through together. The skill should definitely be responsible for implementing the domain model. IF, however, infrastructure-level classes appear in the contract and are properly described, I think they should be implemented as usual. We absolutely must add evals for this.

szjanikowski (Contributor, Author) left a comment:

Comments on the skill


2. **Adapt to the project's style.** Before writing any code, read existing sources to detect the project's conventions (see Style Detection below). The contract dictates *what* exists; the project dictates *how* it looks.

3. **Reuse existing classes — never duplicate.** When a building block's `description` says "Already exists in the project" or a class with that name already exists in the codebase, import it from its current location. Do not create a new copy in a different package. Search the project for the class before creating it. This is critical for types like shared value objects (IDs, Money) that are used across modules.
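
The skill is a set of instructions rather than code, but the "search the project before creating" step in rule 3 amounts to a lookup like the one sketched below. `ExistingTypeLookup` is a hypothetical helper, and matching on the file name `<SimpleName>.java` is a simplifying assumption that ignores nested and non-public types.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.stream.Stream;

final class ExistingTypeLookup {
    /**
     * Finds a source file named <simpleName>.java anywhere under sourceRoot.
     * An empty result means the type may be created; otherwise import the existing one.
     */
    static Optional<Path> find(Path sourceRoot, String simpleName) throws IOException {
        String fileName = simpleName + ".java";
        try (Stream<Path> paths = Files.walk(sourceRoot)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().equals(fileName))
                    .findFirst();
        }
    }
}
```

A call such as `ExistingTypeLookup.find(Path.of("src/main/java"), "Money")` would return the existing file's location if there is one, which is the case where the rule requires an import instead of a new copy.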
szjanikowski (Contributor, Author) commented:

The contract does not strictly guarantee that an "Already exists in the project" note will be present every time reuse is needed. It is a nice convention, but we also need tests where the note is absent and the skill is STILL able to work out that it should use or extend the existing classes.
