feat(mycelium): add distill subcommand + harden pr-retro capture quality #173
Hermes-style knowledge distillation loop (periodic nudge + skill-candidate synthesis prep) on top of the existing lessons table.

Part 0 - pr-retrospective capture hardening (upstream signal quality):
- confidence no longer hardcoded to 7; differentiate by source (user-stated 8-9, cross-model 8, inferred 5-6; +1 on recurrence)
- Q4 lessons unlock from fixed "3 points" to 0-5 (quality over padding)
- lessons add --skill now records the subject skill, not the producer
- key slug gains a domain prefix convention to converge recurrence/dedup

Part A - mycelium distill subcommand:
- distill_service.py: harvest (read-only window) -> cluster (deterministic token-Jaccard + type + key-prefix; pluggable for future embeddings) -> score (>=3 members, >=2 distinct retro_pr, avg_conf>=7, procedural type, new-evidence-since-watermark) -> DigestReport JSON
- models.py: DistilledCluster / SkillCandidate / DigestReport
- cli.py: `distill run` (read-only + watermark) and `distill promote-tiers` (wires the previously unscheduled tier promotion)
- config.py: DISTILL_DIR / DISTILL_STATE_PATH
- watermark state.json makes it idempotent (periodic nudge only on new accrual)

Tests: 20 new (clustering, thresholds, watermark idempotency). Full mycelium suite green (389). Real-data dry-run over 233 lessons surfaced 4 coherent cross-PR candidates.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XprgLv8LiF2zM3kPhNtyRU
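For readers skimming the thresholds in the `score` step, here is a minimal sketch of how such a promotion gate could look. It is not the code in `distill_service.py`: the `Lesson` dataclass, the `is_skill_candidate` name, and the rule that every member must be `procedural` are assumptions made for illustration. Only the numeric thresholds (>=3 members, >=2 distinct `retro_pr`, `avg_conf` >= 7) and the new-evidence-since-watermark condition come from the commit message.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Lesson:
    """Hypothetical stand-in for a row of the lessons table."""
    key: str
    type: str
    confidence: int
    retro_pr: int
    created_at: datetime


# Thresholds as stated in the commit message.
MIN_MEMBERS = 3
MIN_DISTINCT_PRS = 2
MIN_AVG_CONFIDENCE = 7.0


def is_skill_candidate(members: list[Lesson], watermark: Optional[datetime]) -> bool:
    """Return True when a cluster clears every gate listed for `score`."""
    if len(members) < MIN_MEMBERS:
        return False
    if len({m.retro_pr for m in members}) < MIN_DISTINCT_PRS:
        return False
    avg_conf = sum(m.confidence for m in members) / len(members)
    if avg_conf < MIN_AVG_CONFIDENCE:  # exactly 7.0 passes
        return False
    # Assumption: "procedural type" means every member is procedural.
    if any(m.type != "procedural" for m in members):
        return False
    # No watermark yet (first run): everything counts as new evidence.
    if watermark is None:
        return True
    # Otherwise require at least one member newer than the watermark.
    return any(m.created_at > watermark for m in members)
```

Under this sketch, a cluster that was already reported stays quiet on later runs until a newer lesson joins it, which is the idempotency property the watermark is meant to provide.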
Final Aggregated Review - PR #173
Mode: group-review (3/3 voices active: Claude 4-subagent / agy / Codex)
Voice verdicts
Consensus Important (must fix)
Single-voice, verified real (must fix)
Disputed → refuted with evidence (drop)
Actionable NIT (must fix — convention cleans all NITs)
Voices unavailable
… tests)

Resolves the consensus + verified findings from the 3-voice mob review on PR #173.

Clustering (consensus Claude + agy):
- prefix no longer unconditionally unions same-prefix lessons; it lowers the similarity bar to PREFIX_SIMILARITY_FLOOR (0.15) instead of overriding it, so zero-overlap same-prefix lessons no longer collapse into a grab-bag mega-cluster

Observability / silent-failure hardening (Claude silent-failure-hunter):
- harvest returns HarvestResult with dropped_unparseable_ts + truncated; CLI warns on both so "0 candidates" is diagnosable
- load_watermark distinguishes corrupt (warn) from missing state.json
- DigestReport gains dropped_unparseable_ts / truncated fields

Docs (Claude comment-analyzer):
- pr-retro SKILL.md: parametrize --source (was hardcoded inferred while the rubric scores confidence by source; source also drives decay); LESSON_CONFIDENCE 5->10
- tighten "read-only" claims to "lesson rows only" (init_db does schema writes)
- clarify /knowledge-distill consumer lives in the ainization-skill repo

NITs:
- remove dead `now` param from score(); fix _slug_title double-prefix

Tests (Claude pr-test-analyzer, the [Critical] blind spot):
- add the positive watermark resurface path (DT-017/018); previously only the suppression direction was tested
- add union-find transitivity (DT-005), zero-overlap-same-prefix no-merge (DT-002b), jaccard ratios (EG-001), boundary avg_conf==7.0 (BVA-001), unparseable-ts counted (EG-003), corrupt watermark (EG-004)

Refuted (agy single-voice criticals, evidence in PR comment): UTC import needs 3.11 (repo requires-python>=3.11) and null-confidence TypeError (schema is NOT NULL).

27 distill tests + full mycelium suite (396) green; ruff/mypy/pylint/bandit clean.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XprgLv8LiF2zM3kPhNtyRU
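The clustering change is easier to see in code. The sketch below illustrates the idea under stated assumptions and is not the implementation in `distill_service.py`. `PREFIX_SIMILARITY_FLOOR = 0.15` is taken from the commit message. The default bar `SIMILARITY_THRESHOLD = 0.5`, the tokenizer, the "first hyphen-separated segment" prefix rule, and the `(key, type, insight)` tuple shape are all hypothetical.

```python
from __future__ import annotations

import re
from collections import defaultdict

SIMILARITY_THRESHOLD = 0.5      # hypothetical default bar, not stated in the PR
PREFIX_SIMILARITY_FLOOR = 0.15  # value quoted in the commit message


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens (assumed tokenizer)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """|a & b| / |a | b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def key_prefix(key: str) -> str:
    """Assumed convention: the domain prefix is the first '-' segment."""
    return key.split("-", 1)[0]


def cluster(lessons: list[tuple[str, str, str]]) -> list[list[int]]:
    """Group (key, type, insight) tuples; returns sorted lists of indices."""
    n = len(lessons)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    toks = [tokens(insight) for _, _, insight in lessons]
    for i in range(n):
        for j in range(i + 1, n):
            if lessons[i][1] != lessons[j][1]:
                continue  # never merge across lesson types
            same_prefix = key_prefix(lessons[i][0]) == key_prefix(lessons[j][0])
            # A shared prefix lowers the bar; it does not bypass it.
            bar = PREFIX_SIMILARITY_FLOOR if same_prefix else SIMILARITY_THRESHOLD
            if jaccard(toks[i], toks[j]) >= bar:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[max(ri, rj)] = min(ri, rj)

    groups: dict[int, list[int]] = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    return sorted(groups.values())
```

With this shape, two same-prefix lessons that share no tokens score 0.0, fall below the 0.15 floor, and stay in separate clusters. Union-find still merges transitively: if A joins B and B joins C, all three land in one cluster even when A and C would not pair directly.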
…ert)

Round-2 re-review: Claude (4 subagents) and Codex both LGTM with all round-1 findings verified resolved (incl. mutation-checked watermark resurface tests). Remaining items addressed:
- load_watermark: guard non-dict-but-valid JSON (e.g. `[1,2]`) so .get() can't raise AttributeError; move `import sys` to function top (also clears the local-import nit)
- run_distill: document the >scan-limit watermark-skip limitation (surfaced via HarvestResult.truncated + CLI warn; no SQL-paging redesign at current scale)
- test DT-018: also assert has_new_evidence is True (symmetry with DT-017)

agy round-2 criticals refuted again with evidence (same as round-1): null confidence / null insight are impossible (schema NOT NULL + Pydantic constraints); the agy voice reviews the diff without full-repo schema context.

27 distill tests + ruff/mypy/pylint/bandit green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01XprgLv8LiF2zM3kPhNtyRU
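The `load_watermark` hardening from both rounds can be sketched as follows. This is an illustration only: the real function's signature, return type, and warning text are not shown in the PR, and the `last_seen` field name is a hypothetical placeholder. The three cases it separates (missing file, corrupt JSON, valid JSON that is not an object) are the ones named in the commit messages.

```python
from __future__ import annotations

import json
import sys
from pathlib import Path
from typing import Optional


def load_watermark(state_path: Path) -> Optional[str]:
    """Return the stored watermark string, or None when there is none.

    Missing file  -> None, silently (expected on the first run).
    Corrupt JSON  -> None, with a warning on stderr.
    Non-dict JSON -> None, with a warning (so .get() is never called on a list).
    """
    try:
        raw = state_path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return None

    try:
        state = json.loads(raw)
    except json.JSONDecodeError:
        print(f"warning: corrupt watermark state in {state_path}; ignoring",
              file=sys.stderr)
        return None

    if not isinstance(state, dict):
        print(f"warning: unexpected watermark shape in {state_path}; ignoring",
              file=sys.stderr)
        return None

    value = state.get("last_seen")  # hypothetical field name
    return value if isinstance(value, str) else None
```

Treating the corrupt and wrong-shape cases as "no watermark" means the next run reports everything again rather than crashing, and the stderr warning makes that diagnosable.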
Round-2 Mob Review - PR #173 (convergence)
Voice verdicts (round 2, on the fix commit)
agy round-2 items → disposition
Convergence
Functional convergence: Claude LGTM + Codex LGTM; agy's remaining objections are …
What & Why
Adds a Hermes-style knowledge-distillation loop on top of the existing mycelium
`lessons` table, modeled on NousResearch/hermes-agent's Skills System (procedural memory +
periodic nudges + autonomous skill-creation prep).
`/pr-retro` already captures per-PR `lessons`, but nothing periodically reviews accumulated lessons to surface recurring patterns;
this fills that gap (the synthesis/skill-drafting half lands in
`ainization-skill` as the `knowledge-distill` skill).

Part 0 - pr-retro capture hardening (upstream signal quality)
The distill scoring relies on `confidence` and recurrence signals that `/pr-retro` was flattening:
- `--confidence` no longer hardcoded to 7; differentiate by source (user-stated 8-9, cross-model 8, inferred 5-6; +1 on recurrence)
- `lessons add --skill` records the subject skill, not the producer (`pr-retrospective`)
- `--key` slug gains a domain-prefix convention so recurrence/dedup converge across PRs

Part A - `mycelium distill` subcommand
- `distill_service.py`: `harvest` (read-only time window) → `cluster` (deterministic token-Jaccard + type + key-prefix; pluggable `_similarity` for when embeddings land) → `score` (≥3 members, ≥2 distinct `retro_pr`, avg_conf ≥7, procedural type, new-evidence since watermark) → `DigestReport` JSON
- `models.py`: `DistilledCluster` / `SkillCandidate` / `DigestReport`
- `cli.py`: `distill run` (read-only + watermark) and `distill promote-tiers` (wires the previously unscheduled tier promotion)
- `config.py`: `DISTILL_DIR` / `DISTILL_STATE_PATH`
- `state.json` makes runs idempotent; periodic nudge only fires on new accrual

Tests / verification
Full mycelium suite green (389)
🤖 Generated with Claude Code