Add leakage-aware-splitting skill #47

Closed
jsilter wants to merge 1 commit into anthropics:main from jsilter:add-leakage-aware-splitting

jsilter commented Jun 16, 2026

Summary

  • Adds a new skill, leakage-aware-splitting, for building leakage-aware train/validation/test splits for biological ML data so benchmarks measure real generalization instead of memorized similarity.
  • Registers it in marketplace.json and documents it in README.md.

Changes

  • New leakage-aware-splitting/ skill directory (SKILL.md, scripts/, references/, LICENSE.txt Apache-2.0).
  • Added a marketplace.json entry (source: "./", strict: false, skills: ["./leakage-aware-splitting"]).
  • Added a README Skills entry and install commands.

Modalities

sequence (MMseqs2) · structure (Foldseek) · small_molecule (RDKit scaffold/Butina) · protein_ligand (double-cold) · temporal (deposition date) · metadata/grouped (group-aware).
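
For reviewers unfamiliar with the temporal modality, the idea can be sketched as follows. This is a minimal illustration and not the skill's actual code: the function name `temporal_split`, the cutoff parameters, and the example IDs and dates are all hypothetical. Records deposited on or before the first cutoff go to train, those up to the second cutoff go to validation, and anything later goes to test, so the model is never evaluated on data older than what it trained on.

```python
from datetime import date


def temporal_split(deposited, train_end, val_end):
    """Assign each record to a split by its deposition date.

    deposited: dict mapping record id -> datetime.date (hypothetical input shape)
    train_end: last date (inclusive) that belongs to train
    val_end:   last date (inclusive) that belongs to validation
    """
    if train_end > val_end:
        raise ValueError("train_end must not be later than val_end")
    split_of = {}
    for record_id, when in deposited.items():
        if when <= train_end:
            split_of[record_id] = "train"
        elif when <= val_end:
            split_of[record_id] = "val"
        else:
            split_of[record_id] = "test"
    return split_of


# Made-up example records, for illustration only.
example_dates = {
    "rec_a": date(2018, 5, 1),
    "rec_b": date(2020, 1, 15),
    "rec_c": date(2021, 7, 30),
    "rec_d": date(2023, 3, 9),
}
temporal_assignment = temporal_split(
    example_dates, train_end=date(2020, 12, 31), val_end=date(2021, 12, 31)
)
```

With these cutoffs, every test record is strictly newer than every train record, which is the property a temporal split is meant to guarantee.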

Testing

  • Validated locally (deterministic invariants, calibration controls, golden/regression checks) with MMseqs2 + Foldseek; the full suite and evals are maintained in the skill's source repo.
  • After merge: /plugin install leakage-aware-splitting@life-sciences

🤖 Generated with Claude Code

…litting

Clusters biological datasets by similarity (sequence/structure/small-molecule/
protein-ligand/temporal) and/or shared metadata groups, assigns whole clusters to a
single split, and reports honest diagnostics (realized ratios, train/test nearest-
neighbour similarity, kNN baseline, provenance manifest) so benchmarks measure real
generalization instead of memorized similarity.
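
The whole-cluster assignment described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the skill's implementation: cluster IDs are assumed to be precomputed (for example by a clustering tool), the greedy largest-deficit heuristic is one simple choice among many, and the names `assign_clusters`, `cluster_of`, and `ratios` are hypothetical.

```python
from collections import defaultdict


def assign_clusters(cluster_of, ratios):
    """Assign every cluster, as a whole, to exactly one split.

    cluster_of: dict mapping item id -> cluster id (assumed precomputed)
    ratios:     dict mapping split name -> target fraction
    Returns (split_of, realized), where realized holds the achieved fractions.
    """
    members = defaultdict(list)
    for item, cluster_id in cluster_of.items():
        members[cluster_id].append(item)

    total = len(cluster_of)
    targets = {split: frac * total for split, frac in ratios.items()}
    counts = {split: 0 for split in ratios}
    split_of = {}

    # Place the largest clusters first; break ties by cluster id so the
    # result is deterministic for a given input.
    ordered = sorted(members, key=lambda c: (-len(members[c]), str(c)))
    for cluster_id in ordered:
        # Greedy choice: the split currently furthest below its target.
        best = max(ratios, key=lambda s: targets[s] - counts[s])
        for item in members[cluster_id]:
            split_of[item] = best
        counts[best] += len(members[cluster_id])

    realized = {split: counts[split] / total for split in ratios}
    return split_of, realized


# Made-up example: ten items in five clusters of uneven size.
cluster_of = {
    "a1": "A", "a2": "A", "a3": "A", "a4": "A",
    "b1": "B", "b2": "B",
    "c1": "C", "c2": "C",
    "d1": "D",
    "e1": "E",
}
split_of, realized = assign_clusters(
    cluster_of, {"train": 0.6, "val": 0.2, "test": 0.2}
)
```

Because clusters are indivisible, the realized ratios generally differ from the requested ones on real data, which is why reporting them alongside the split is useful.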

Registers the skill in marketplace.json and documents it in README.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
jsilter force-pushed the add-leakage-aware-splitting branch from 02a1e58 to 8a0869b Jun 16, 2026 03:08
jsilter closed this Jun 16, 2026