freshie + catalog hygiene: D+F regression, 250 orphan skills, SQLite LFS plan #660

@jeremylongshore

Description

Context

Freshie discovery run 6 (PR #659, 2026-05-03) — first refresh since 2026-04-06 — surfaced three independent issues that the inventory had been hiding while it was 27 days stale:

  1. Compliance regression under schema 3.3.1 (zero D/F → 115 D/F)
  2. Three different skill counts across the discovery / catalog / filesystem surfaces (250-skill catalog gap)
  3. Freshie SQLite over the 50 MB recommended GitHub limit (now 63 MB after run 6)

Filing as one issue since the three share an investigation path (all surfaced by the same Freshie refresh), but they should be solved as three separate PRs.


Issue 1 — D+F skill regression vs. "zero D/F" baseline

Old state (CLAUDE.md memory `project_severity_audit.md` + operator audit):

"Zero D/F grade skills — all 2,834 skills score C+ (70+) or better after v4.24.0 remediation"

New state (run 6, schema 3.3.1, 2026-05-03):

| Grade | Count |
| --- | --- |
| 📉 D | 113 |
| ❌ F | 2 |
| **D+F total** | **115** |

Most likely cause: schema 3.3.1 added stricter required-field rules at the marketplace tier (8 fields including `compatibility`, `tags`, `author`, `license`, `version`). Skills that passed under the previous schema may now fail because:

  • A field is missing entirely
  • A field is present but has the deprecated `compatible-with` form (the migration covered ~3,200 skills but some may have been missed)
  • A field has wrong type (e.g., `allowed-tools` as a string when YAML list is expected)
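As an illustrative sketch of the auto-fixable cases above (the real remediation rules live in schema 3.3.1 and `scripts/batch-remediate.py`; this is only an assumption about the two failure modes named in the bullets), a normalization pass might rename the deprecated `compatible-with` key and coerce a string-valued `allowed-tools` into a list:

```python
# Hypothetical frontmatter normalizer for two failure modes listed above.
# The field names (`compatible-with`, `compatibility`, `allowed-tools`) come
# from this issue; the actual schema-3.3.1 rules may differ.

def normalize_frontmatter(fm: dict) -> dict:
    """Return a copy of a skill's YAML frontmatter with known fixes applied."""
    fixed = dict(fm)

    # Deprecated key form: `compatible-with` -> `compatibility`
    if "compatible-with" in fixed and "compatibility" not in fixed:
        fixed["compatibility"] = fixed.pop("compatible-with")

    # Wrong type: `allowed-tools` given as a comma-separated string
    # where the schema expects a YAML list.
    tools = fixed.get("allowed-tools")
    if isinstance(tools, str):
        fixed["allowed-tools"] = [t.strip() for t in tools.split(",") if t.strip()]

    return fixed
```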

Resolution path:

  1. Query `skill_compliance` table to extract the 115 failing skills + their specific error categories
  2. Cluster failures by error type (missing field X, wrong type Y, etc.)
  3. For each cluster: either run `scripts/batch-remediate.py --execute` (auto-fixable) or open targeted PRs (manual)
  4. Re-run the validator to confirm grade improvement
  5. Update CLAUDE.md memory + operator audit when zero D/F is restored (or update the baseline if some D/F is acceptable)

Acceptance criteria:

  • D+F count back to 0 (or documented threshold)
  • CLAUDE.md memory `project_severity_audit.md` updated to reflect schema-3.3.1-era reality
  • Operator audit gist's "What's Working" section reflects the new baseline

Issue 2 — Three different skill counts across surfaces

| Surface | Count | Source |
| --- | --- | --- |
| Filesystem | 3,061 | `find plugins -name SKILL.md` |
| Freshie discovery (run 6) | 3,000 | `freshie/scripts/rebuild-inventory.py` |
| Marketplace catalog | 2,810 | `marketplace/scripts/discover-skills.mjs` |

Filesystem → Marketplace gap (251 skills):

  • 250 "orphaned" — skills under plugin directories that aren't in `marketplace.extended.json`
  • 1 "failed to process" — likely malformed frontmatter
  • Separately, 19 duplicate-slug clusters within the catalog (not part of the 251 count, which is already accounted for by 250 + 1)

Filesystem → Freshie gap (61 skills):
Unknown — Freshie's intermediate filtering needs investigation. Could be the same orphan-skip logic with slightly different criteria, or could be excluding files with empty/missing frontmatter.

Resolution path:

  1. Query Freshie's `skill_files` table for the 61-skill delta vs filesystem
  2. For each orphan: decide ship-or-delete (add parent plugin to `marketplace.extended.json` OR `rm` the orphan)
  3. For each duplicate slug: pick the canonical, deprecate or rename the others
  4. Document the "3,061 → 3,000 → 2,810" funnel in `marketplace/scripts/discover-skills.mjs` header comment so future maintainers don't re-investigate
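Steps 2-3 above boil down to two set operations: skills on disk but not in the catalog, and slugs claimed by more than one entry. A minimal sketch, assuming catalog entries carry `slug` and `path` fields (an assumption about the shape of `marketplace.extended.json`, not its real schema):

```python
from collections import Counter

def triage(catalog_entries: list, filesystem_paths: list):
    """Return (orphans, duplicate_slugs) for the ship-or-delete pass.
    Entry fields "slug" and "path" are assumed, not confirmed."""
    listed = {entry["path"] for entry in catalog_entries}
    # Orphans: SKILL.md files on disk whose plugin isn't in the catalog.
    orphans = sorted(p for p in filesystem_paths if p not in listed)
    # Duplicate slugs: more than one catalog entry claiming the same slug.
    slug_counts = Counter(entry["slug"] for entry in catalog_entries)
    duplicate_slugs = {s: n for s, n in slug_counts.items() if n > 1}
    return orphans, duplicate_slugs
```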

Acceptance criteria:

  • All three surfaces produce the same number, OR the difference is clearly documented as intentional filtering at each stage
  • Zero duplicate slugs in the marketplace catalog
  • Each orphan has been triaged (kept-and-listed OR deleted)

Issue 3 — Freshie SQLite past 50 MB GitHub recommended limit

Current size: 63 MB after run 6 (was 53 MB after run 5).
GitHub thresholds:

  • Soft warning: 50 MB ✗ (currently over, at 63 MB)
  • Hard limit: 100 MB ✓ for now (will hit in ~3-4 more discovery runs at +10 MB/run)
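The "~3-4 more runs" estimate can be sanity-checked with simple ceiling division:

```python
import math

def runs_until_limit(current_mb: float, growth_mb_per_run: float,
                     limit_mb: float = 100.0) -> int:
    """Number of discovery runs until the DB first reaches limit_mb,
    assuming linear growth per run."""
    return max(0, math.ceil((limit_mb - current_mb) / growth_mb_per_run))
```

At 63 MB and +10 MB/run this gives 4 runs to reach the 100 MB hard limit, consistent with the estimate above.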

Options:

A. Git LFS migration

  • Move `freshie/inventory.sqlite` to LFS
  • One-time conversion via `git lfs migrate import --include="freshie/inventory.sqlite"`
  • LFS bandwidth costs apply on clone/checkout — currently free tier supports the use case but worth tracking
  • Most defensive option

B. Archive old runs

  • Add a maintenance script that periodically vacuums old `run_id`s into a separate `freshie/archives/inventory-runs-1-N.sqlite` and removes them from the active DB
  • Active DB stays small; historical data still accessible
  • More complex; needs careful design to preserve compliance trends

C. Don't commit the SQLite at all

  • Generate it on-demand during CI from `marketplace.extended.json` + filesystem
  • Loses the historical `discovery_runs` time series
  • Would require rebuilding the comparison/trend reports as CI artifacts

Recommendation: Option A (LFS) — minimal disruption, preserves history, standard pattern. File a separate PR with the LFS migration before run 7.

Acceptance criteria:

  • DB stored in LFS (or chosen alternative)
  • CI confirms next discovery run still works post-migration
  • CLAUDE.md updated with the new "how to refresh Freshie" instructions if the workflow changed

Related

jeremylongshore.com made me do it
-claude
intentsolutions.io
