freshie + catalog hygiene: D+F regression, 250 orphan skills, SQLite LFS plan #660

@jeremylongshore

Description

Context

Freshie discovery run 6 (PR #659, 2026-05-03) — first refresh since 2026-04-06 — surfaced three independent issues that the inventory had been hiding while it was 27 days stale:

  1. Compliance regression under schema 3.3.1 (zero D/F → 115 D/F)
  2. Three different skill counts across the discovery / catalog / filesystem surfaces (250-skill catalog gap)
  3. Freshie SQLite over the 50 MB recommended GitHub limit (now 63 MB after run 6)

Filing as one issue since the three share an investigation path (all surfaced by the same Freshie refresh), but they should be solved as three separate PRs.


Issue 1 — D+F skill regression vs. "zero D/F" baseline

Old state (CLAUDE.md memory `project_severity_audit.md` + operator audit):

"Zero D/F grade skills — all 2,834 skills score C+ (70+) or better after v4.24.0 remediation"

New state (run 6, schema 3.3.1, 2026-05-03):

| Grade | Count |
| --- | --- |
| 📉 D | 113 |
| ❌ F | 2 |
| **D+F total** | **115** |

Most likely cause: schema 3.3.1 added stricter required-field rules at the marketplace tier (8 fields including `compatibility`, `tags`, `author`, `license`, `version`). Skills that passed under the previous schema may now fail because:

  • A field is missing entirely
  • A field is present but has the deprecated `compatible-with` form (the migration covered ~3,200 skills but some may have been missed)
  • A field has wrong type (e.g., `allowed-tools` as a string when YAML list is expected)
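As an illustrative sketch of the auto-fixable cases above (the real remediation rules live in schema 3.3.1 and `scripts/batch-remediate.py`; this is only an assumption about the two failure modes named in the bullets), a normalization pass might rename the deprecated `compatible-with` key and coerce a string-valued `allowed-tools` into a list:

```python
# Hypothetical frontmatter normalizer for two failure modes listed above.
# The field names (`compatible-with`, `compatibility`, `allowed-tools`) come
# from this issue; the actual schema-3.3.1 rules may differ.

def normalize_frontmatter(fm: dict) -> dict:
    """Return a copy of a skill's YAML frontmatter with known fixes applied."""
    fixed = dict(fm)

    # Deprecated key form: `compatible-with` -> `compatibility`
    if "compatible-with" in fixed and "compatibility" not in fixed:
        fixed["compatibility"] = fixed.pop("compatible-with")

    # Wrong type: `allowed-tools` given as a comma-separated string
    # where the schema expects a YAML list.
    tools = fixed.get("allowed-tools")
    if isinstance(tools, str):
        fixed["allowed-tools"] = [t.strip() for t in tools.split(",") if t.strip()]

    return fixed
```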

Resolution path:

  1. Query `skill_compliance` table to extract the 115 failing skills + their specific error categories
  2. Cluster failures by error type (missing field X, wrong type Y, etc.)
  3. For each cluster: either run `scripts/batch-remediate.py --execute` (auto-fixable) or open targeted PRs (manual)
  4. Re-run the validator to confirm grade improvement
  5. Update CLAUDE.md memory + operator audit when zero D/F is restored (or update the baseline if some D/F is acceptable)

Acceptance criteria:

  • D+F count back to 0 (or documented threshold)
  • CLAUDE.md memory `project_severity_audit.md` updated to reflect schema-3.3.1-era reality
  • Operator audit gist's "What's Working" section reflects the new baseline

Issue 2 — Three different skill counts across surfaces

| Surface | Count | Source |
| --- | --- | --- |
| Filesystem | 3,061 | `find plugins -name SKILL.md` |
| Freshie discovery (run 6) | 3,000 | `freshie/scripts/rebuild-inventory.py` |
| Marketplace catalog | 2,810 | `marketplace/scripts/discover-skills.mjs` |

Filesystem → Marketplace gap (251 skills):

  • 250 "orphaned" — skills under plugin directories that aren't in `marketplace.extended.json`
  • 1 "failed to process" — likely malformed frontmatter
  • Separately, 19 duplicate-slug clusters within the catalog (not part of the 251 count, which is already accounted for by 250 + 1)

Filesystem → Freshie gap (61 skills):
Unknown — Freshie's intermediate filtering needs investigation. Could be the same orphan-skip logic with slightly different criteria, or could be excluding files with empty/missing frontmatter.

Resolution path:

  1. Query Freshie's `skill_files` table for the 61-skill delta vs filesystem
  2. For each orphan: decide ship-or-delete (add parent plugin to `marketplace.extended.json` OR `rm` the orphan)
  3. For each duplicate slug: pick the canonical, deprecate or rename the others
  4. Document the "3,061 → 3,000 → 2,810" funnel in `marketplace/scripts/discover-skills.mjs` header comment so future maintainers don't re-investigate
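Steps 2-3 above boil down to two set operations: skills on disk but not in the catalog, and slugs claimed by more than one entry. A minimal sketch, assuming catalog entries carry `slug` and `path` fields (an assumption about the shape of `marketplace.extended.json`, not its real schema):

```python
from collections import Counter

def triage(catalog_entries: list, filesystem_paths: list):
    """Return (orphans, duplicate_slugs) for the ship-or-delete pass.
    Entry fields "slug" and "path" are assumed, not confirmed."""
    listed = {entry["path"] for entry in catalog_entries}
    # Orphans: SKILL.md files on disk whose plugin isn't in the catalog.
    orphans = sorted(p for p in filesystem_paths if p not in listed)
    # Duplicate slugs: more than one catalog entry claiming the same slug.
    slug_counts = Counter(entry["slug"] for entry in catalog_entries)
    duplicate_slugs = {s: n for s, n in slug_counts.items() if n > 1}
    return orphans, duplicate_slugs
```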

Acceptance criteria:

  • All three surfaces produce the same number, OR the difference is clearly documented as intentional filtering at each stage
  • Zero duplicate slugs in the marketplace catalog
  • Each orphan has been triaged (kept-and-listed OR deleted)

Issue 3 — Freshie SQLite past 50 MB GitHub recommended limit

Current size: 63 MB after run 6 (was 53 MB after run 5).
GitHub thresholds:

  • Soft warning: 50 MB ✗ (currently over, at 63 MB)
  • Hard limit: 100 MB ✓ for now (will hit in ~3-4 more discovery runs at +10 MB/run)
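The "~3-4 more runs" estimate can be sanity-checked with simple ceiling division:

```python
import math

def runs_until_limit(current_mb: float, growth_mb_per_run: float,
                     limit_mb: float = 100.0) -> int:
    """Number of discovery runs until the DB first reaches limit_mb,
    assuming linear growth per run."""
    return max(0, math.ceil((limit_mb - current_mb) / growth_mb_per_run))
```

At 63 MB and +10 MB/run this gives 4 runs to reach the 100 MB hard limit, consistent with the estimate above.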

Options:

A. Git LFS migration

  • Move `freshie/inventory.sqlite` to LFS
  • One-time conversion via `git lfs migrate import --include="freshie/inventory.sqlite"`
  • LFS bandwidth costs apply on clone/checkout — currently free tier supports the use case but worth tracking
  • Most defensive option

B. Archive old runs

  • Add a maintenance script that periodically vacuums old `run_id`s into a separate `freshie/archives/inventory-runs-1-N.sqlite` and removes them from the active DB
  • Active DB stays small; historical data still accessible
  • More complex; needs careful design to preserve compliance trends

C. Don't commit the SQLite at all

  • Generate it on-demand during CI from `marketplace.extended.json` + filesystem
  • Loses the historical `discovery_runs` time series
  • Would require rebuilding the comparison/trend reports as CI artifacts

Recommendation: Option A (LFS) — minimal disruption, preserves history, standard pattern. File a separate PR with the LFS migration before run 7.

Acceptance criteria:

  • DB stored in LFS (or chosen alternative)
  • CI confirms next discovery run still works post-migration
  • CLAUDE.md updated with the new "how to refresh Freshie" instructions if the workflow changed

Related

jeremylongshore.com made me do it
-claude
intentsolutions.io
