feat(mycelium): add distill subcommand + harden pr-retro capture quality by howie · Pull Request #173 · heyu-ai/yibi-stack

howie · 2026-06-28T13:40:19Z

What & Why

Adds a Hermes-style knowledge-distillation loop on top of the existing mycelium
lessons table, modeled on NousResearch/hermes-agent's Skills System (procedural memory +
periodic nudges + autonomous skill-creation prep). /pr-retro already captures per-PR
lessons, but nothing periodically reviews accumulated lessons to surface recurring patterns —
this fills that gap (the synthesis/skill-drafting half lands in ainization-skill as the
knowledge-distill skill).

Part 0 — pr-retro capture hardening (upstream signal quality)

The distill scoring relies on confidence and recurrence signals that /pr-retro was
flattening:

--confidence no longer hardcoded to 7 — differentiate by source (user-stated 8-9,
cross-model 8, inferred 5-6; +1 on recurrence)
Q4 lessons unlock from a fixed "3 points" to 0-5 (quality over padding)
lessons add --skill records the subject skill, not the producer (pr-retrospective)
--key slug gains a domain-prefix convention so recurrence/dedup converge across PRs

Part A — `mycelium distill` subcommand

distill_service.py: harvest (read-only time window) → cluster (deterministic
token-Jaccard + type + key-prefix; pluggable _similarity for when embeddings land) →
score (≥3 members, ≥2 distinct retro_pr, avg_conf ≥7, procedural type, new-evidence
since watermark) → DigestReport JSON
models.py: DistilledCluster / SkillCandidate / DigestReport
cli.py: distill run (read-only + watermark) and distill promote-tiers (wires the
previously unscheduled tier promotion)
config.py: DISTILL_DIR / DISTILL_STATE_PATH
watermark state.json makes runs idempotent — periodic nudge only fires on new accrual

Tests / verification

20 new tests (clustering, score thresholds DT, watermark idempotency); full mycelium suite
green (389)
ruff / mypy / pylint / bandit / lint-skill-bash / markdownlint all green
Real-data dry-run over 233 lessons surfaced 4 coherent cross-PR candidates

🤖 Generated with Claude Code

Hermes-style knowledge distillation loop (periodic nudge + skill-candidate synthesis prep) on top of the existing lessons table. Part 0 — pr-retrospective capture hardening (upstream signal quality): - confidence no longer hardcoded to 7; differentiate by source (user-stated 8-9, cross-model 8, inferred 5-6; +1 on recurrence) - Q4 lessons unlock from fixed "3 points" to 0-5 (quality over padding) - lessons add --skill now records the subject skill, not the producer - key slug gains a domain prefix convention to converge recurrence/dedup Part A — mycelium distill subcommand: - distill_service.py: harvest (read-only window) -> cluster (deterministic token-Jaccard + type + key-prefix; pluggable for future embeddings) -> score (>=3 members, >=2 distinct retro_pr, avg_conf>=7, procedural type, new-evidence-since-watermark) -> DigestReport JSON - models.py: DistilledCluster / SkillCandidate / DigestReport - cli.py: `distill run` (read-only + watermark) and `distill promote-tiers` (wires the previously unscheduled tier promotion) - config.py: DISTILL_DIR / DISTILL_STATE_PATH - watermark state.json makes it idempotent (periodic nudge only on new accrual) Tests: 20 new (clustering, thresholds, watermark idempotency). Full mycelium suite green (389). Real-data dry-run over 233 lessons surfaced 4 coherent cross-PR candidates. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XprgLv8LiF2zM3kPhNtyRU

howie · 2026-06-28T13:53:08Z

Final Aggregated Review — PR #173

Mode

group-review (3/3 voices active: Claude 4-subagent / agy / Codex)

Voice verdicts

Claude (code-reviewer + silent-failure-hunter + pr-test-analyzer + comment-analyzer): NEEDS_CHANGES
agy / Gemini: NEEDS_CHANGES (2 Critical refuted on verification, 1 consensus Important)
Codex: LGTM (diff inspected, no substantive findings; test exec blocked by sandbox temp dir)

Consensus Important (must fix)

Prefix union-find over-clusters (Claude code-reviewer + agy) — distill_service.py cluster():
same_prefix unconditionally unions any two same-type lessons sharing a key prefix, regardless
of token overlap. With union-find transitivity, all bash-* pattern lessons collapse into one
grab-bag mega-cluster, diluting candidates (contradicts the PR's own "湊數稀釋 cluster" goal).
Fix: make prefix lower the similarity bar, not override it — union iff
_similarity >= threshold OR (same_prefix AND _similarity >= PREFIX_FLOOR) (PREFIX_FLOOR=0.15).
Update DT-002 to assert corrected behavior + add a zero-overlap-same-prefix → no-merge test.

Single-voice, verified real (must fix)

[test gap, raised Critical] Positive watermark resurface path untested (Claude pr-test-analyzer)
— verified by inspection: every "candidate present" test uses wm=None; the periodic-nudge
positive branch (wm != None AND a member newer than wm → candidate surfaces) has zero coverage.
An inverted > would pass the whole suite. Fix: add a resurface test (watermark older than
members → 1 candidate; advance past → 0; insert fresh post-watermark member → resurfaces).
Silent data loss on parse failures (Claude silent-failure-hunter; agy raised the Z-suffix
variant) — _parse_ts returns None and harvest folds unparseable-ts rows into the same
"skip" branch as out-of-window rows; load_watermark swallows corruption → silent re-flood +
overwrite. Fix: count + surface dropped-unparseable-ts (CLI warn + DigestReport field);
distinguish corrupt watermark from missing (warn before treating as first run).
Silent harvest truncation at 2000 (Claude code-reviewer + silent-failure-hunter) — warn when
len(rows) == _HARVEST_SCAN_LIMIT so window truncation is observable.
SKILL.md confidence rubric contradicts hardcoded --source inferred (Claude comment-analyzer)
— the new by-source confidence guidance is impossible to honor while --source stays hardcoded.
Fix: parametrize --source in the lessons-add template alongside --confidence.
/knowledge-distill forward reference (Claude comment-analyzer) — referenced as the downstream
consumer but lives in the separate ainization-skill repo. Fix: clarify cross-repo / planned.

Disputed → refuted with evidence (drop)

agy [Critical] from datetime import UTC breaks Python < 3.11 → REFUTED: pyproject.toml
requires-python = ">=3.11"; existing modules already import UTC; CI runs 3.12. The Z-suffix
parse concern is likewise moot on 3.11+ (fromisoformat handles Z) — but the underlying
silent-drop hardening (refactor: reorganize skills into 7 plugin packs for selective marketplace install #3) is kept regardless.
agy [Critical] TypeError on null confidence → REFUTED: schema is confidence INTEGER NOT NULL CHECK(confidence BETWEEN 1 AND 10); Pydantic confidence: int = Field(ge=1, le=10). Never None.
(Claude code-reviewer independently confirmed.)

Actionable NIT (must fix — convention cleans all NITs)

Dead now param in score() — remove (implies time logic that isn't there).
_slug_title double-prefixes (bash-bash-cd-git) — strip redundant prefix.
SKILL.md LESSON_CONFIDENCE placeholder {{5-9}} → {{5-10}} (recurrence +1 ceilings at 10).
"唯讀 lessons table" overstates — tighten to "只讀 lessons 資料列；schema init 由 init_db 寫入".
Add high-value test assertions: union-find transitivity (A~~B,B~~C ⇒ one cluster), _jaccard
partial-overlap ratio, boundary avg_confidence == 7.0 passes.

Voices unavailable

None (all 3 active; Codex test-exec blocked by sandbox but review completed).

… tests) Resolves the consensus + verified findings from the 3-voice mob review on PR #173. Clustering (consensus Claude + agy): - prefix no longer unconditionally unions same-prefix lessons; it lowers the similarity bar to PREFIX_SIMILARITY_FLOOR (0.15) instead of overriding it, so zero-overlap same-prefix lessons no longer collapse into a grab-bag mega-cluster Observability / silent-failure hardening (Claude silent-failure-hunter): - harvest returns HarvestResult with dropped_unparseable_ts + truncated; CLI warns on both so "0 candidates" is diagnosable - load_watermark distinguishes corrupt (warn) from missing state.json - DigestReport gains dropped_unparseable_ts / truncated fields Docs (Claude comment-analyzer): - pr-retro SKILL.md: parametrize --source (was hardcoded inferred while the rubric scores confidence by source — source also drives decay); LESSON_CONFIDENCE 5->10 - tighten "read-only" claims to "lesson rows only" (init_db does schema writes) - clarify /knowledge-distill consumer lives in the ainization-skill repo NITs: - remove dead `now` param from score(); fix _slug_title double-prefix Tests (Claude pr-test-analyzer — the [Critical] blind spot): - add the positive watermark resurface path (DT-017/018) — previously only the suppression direction was tested - add union-find transitivity (DT-005), zero-overlap-same-prefix no-merge (DT-002b), jaccard ratios (EG-001), boundary avg_conf==7.0 (BVA-001), unparseable-ts counted (EG-003), corrupt watermark (EG-004) Refuted (agy single-voice criticals, evidence in PR comment): UTC import needs 3.11 (repo requires-python>=3.11) and null-confidence TypeError (schema is NOT NULL). 27 distill tests + full mycelium suite (396) green; ruff/mypy/pylint/bandit clean. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XprgLv8LiF2zM3kPhNtyRU

…ert) Round-2 re-review: Claude (4 subagents) and Codex both LGTM with all round-1 findings verified resolved (incl. mutation-checked watermark resurface tests). Remaining items addressed: - load_watermark: guard non-dict-but-valid JSON (e.g. `[1,2]`) so .get() can't raise AttributeError; move `import sys` to function top (also clears the local-import nit) - run_distill: document the >scan-limit watermark-skip limitation (surfaced via HarvestResult.truncated + CLI warn; no SQL-paging redesign at current scale) - test DT-018: also assert has_new_evidence is True (symmetry with DT-017) agy round-2 criticals refuted again with evidence (same as round-1): null confidence / null insight are impossible (schema NOT NULL + Pydantic constraints); the agy voice reviews the diff without full-repo schema context. 27 distill tests + ruff/mypy/pylint/bandit green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> Claude-Session: https://claude.ai/code/session_01XprgLv8LiF2zM3kPhNtyRU

howie · 2026-06-29T02:46:15Z

Round-2 Mob Review — PR #173 (convergence)

Voice verdicts (round 2, on the fix commit)

Claude (code-reviewer + silent-failure-hunter + pr-test-analyzer + comment-analyzer): LGTM
— all round-1 findings verified resolved; pr-test-analyzer mutation-checked the new watermark
resurface tests (an inverted > in score() now fails DT-017/DT-015). One new NIT (non-dict
watermark JSON → AttributeError) — fixed in cc83f55.
Codex: LGTM — no substantive findings (both rounds).
agy / Gemini: NEEDS_CHANGES — all items refuted or documented (see below).

agy round-2 items → disposition

[Critical] null confidence → int(None) crash → REFUTED (3rd time): schema
confidence INTEGER NOT NULL CHECK(BETWEEN 1 AND 10) + Pydantic Field(ge=1, le=10). Never None.
[Important] str(None) → "None" in clustering → REFUTED: insight/key are NOT NULL;
nullable skill is already guarded by if m.get("skill").
[Critical] scan-limit truncation + watermark skip → documented limitation: only triggers at

2000 in-window lessons (current total ~233); surfaced via HarvestResult.truncated + CLI WARN;
docstring on run_distill now records it. No SQL-paging redesign at current scale.
[NIT] local import sys → addressed by moving it to load_watermark top.

agy reviews the diff without full-repo schema context, so it re-raises the same nullability
false-positives each round. Per this skill's "single-voice Critical → empirically refute", these
are dropped with evidence rather than cargo-culting a wrong fix.

Convergence

Functional convergence: Claude LGTM + Codex LGTM; agy's remaining objections are
evidence-refuted false-positives or a documented (warned) limitation. All real findings from both
rounds are resolved. 27 distill tests + 396 full mycelium suite + ruff/mypy/pylint/bandit green.

howie and others added 2 commits June 28, 2026 22:05

howie merged commit 9151a6e into main Jun 29, 2026
1 check passed

howie deleted the feat/mycelium-distill branch June 29, 2026 08:17

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat(mycelium): add distill subcommand + harden pr-retro capture quality#173

feat(mycelium): add distill subcommand + harden pr-retro capture quality#173
howie merged 3 commits into
mainfrom
feat/mycelium-distill

howie commented Jun 28, 2026

Uh oh!

howie commented Jun 28, 2026

Uh oh!

howie commented Jun 29, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

howie commented Jun 28, 2026

What & Why

Part 0 — pr-retro capture hardening (upstream signal quality)

Part A — mycelium distill subcommand

Tests / verification

Uh oh!

howie commented Jun 28, 2026

Final Aggregated Review — PR #173

Mode

Voice verdicts

Consensus Important (must fix)

Single-voice, verified real (must fix)

Disputed → refuted with evidence (drop)

Actionable NIT (must fix — convention cleans all NITs)

Voices unavailable

Uh oh!

howie commented Jun 29, 2026

Round-2 Mob Review — PR #173 (convergence)

Voice verdicts (round 2, on the fix commit)

agy round-2 items → disposition

Convergence

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Part A — `mycelium distill` subcommand