This repo contains the tuning control plane, FSM skills, stress-test harness, and per-DSL work directories for non-stop GPU kernel tuning.
Use /perf-tune as the only tuning entrypoint.
Examples:
/perf-tune
/perf-tune cuda f16
/perf-tune croqtile f16 4096x8192x4096
The expected workflow is always:
PROFILE -> IDEATE -> IMPLEMENT -> MEASURE -> DECIDE -> STORE
Measured KEEP/DISCARD iterations must always execute STORE.
Compile-failed attempts are stored as attempt records and do not consume public iteration ids.
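The two STORE rules above can be sketched as a small record store. This is a minimal illustration, not the repo's implementation: the `TuningLog` class, its field names, the `iterNNN` id format, and the throughput values are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Phase order copied from the workflow line above.
PHASES = ["PROFILE", "IDEATE", "IMPLEMENT", "MEASURE", "DECIDE", "STORE"]


@dataclass
class TuningLog:
    """Hypothetical in-memory record store; not the repo's real schema."""
    next_iter: int = 1  # assumes id 0 is reserved for the baseline
    iterations: list = field(default_factory=list)
    attempts: list = field(default_factory=list)

    def store_attempt(self, reason: str) -> None:
        # Compile failures become attempt records and leave next_iter untouched.
        self.attempts.append({"reason": reason, "before_iter": self.next_iter})

    def store_iteration(self, decision: str, tflops: float) -> str:
        # Measured KEEP/DISCARD results always reach STORE and consume an id.
        if decision not in ("KEEP", "DISCARD"):
            raise ValueError(f"unexpected decision: {decision}")
        iter_id = f"iter{self.next_iter:03d}"
        self.iterations.append({"id": iter_id, "decision": decision, "tflops": tflops})
        self.next_iter += 1
        return iter_id


log = TuningLog()
log.store_attempt("compile error")            # no id consumed
first = log.store_iteration("DISCARD", 41.2)  # made-up measurement
log.store_attempt("compile error")            # no id consumed
second = log.store_iteration("KEEP", 47.9)    # made-up measurement
print(first, second, len(log.attempts))       # iter001 iter002 2
```

Keeping attempts in a separate list means the public iteration ids stay dense even when several builds fail in a row.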
Only resume from artifacts inside the current git repository.
Forbidden resume sources:
- sibling repos such as *-paper
- absolute includes that resolve outside this repo
- stale state.json entries without a valid local checkpoint and local active loop-state
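The first two forbidden sources come down to one question: does a path resolve to somewhere under the repository root? The sketch below shows one way to answer that with `pathlib`. The function name, the checkout location, and the sample paths are hypothetical, and `scripts/validate_tuning_session.py` may check this differently. The stale state.json case is not covered because it depends on the repo's checkpoint schema.

```python
from pathlib import Path


def is_inside_repo(candidate: str, repo_root: str) -> bool:
    """Return True when candidate resolves to a path under repo_root."""
    root = Path(repo_root).resolve()
    target = Path(candidate)
    if not target.is_absolute():
        # Relative paths are interpreted relative to the repo root.
        target = root / target
    target = target.resolve()
    return target == root or root in target.parents


repo = "/work/kernel-tuning"  # hypothetical checkout location
print(is_inside_repo("tuning/a100/cuda/checkpoints/k.json", repo))  # True
print(is_inside_repo("../kernel-tuning-paper/seed.cu", repo))       # False
print(is_inside_repo("/opt/other/include/gemm.cuh", repo))          # False
```

Resolving both sides before comparing matters here: a plain string-prefix test would wrongly accept a sibling directory such as `kernel-tuning-paper`.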
Before resuming any in-progress tuning session, validate the resume source:
python3 scripts/validate_tuning_session.py --dsl cuda
python3 scripts/validate_tuning_session.py --dsl all
If validation fails, clean invalid work state and restart from INIT:
python3 scripts/clean_kernel_work_state.py --dsl cuda --invalid-only
python3 scripts/clean_kernel_work_state.py --dsl all --invalid-only
To intentionally clear all active in-progress kernel work state before starting over:
python3 scripts/clean_kernel_work_state.py --dsl all
At every STORE step, the system must update:
- tuning/<gpu>/<dsl>/logs/<key>/results.tsv
- tuning/<gpu>/<dsl>/checkpoints/<key>.json
- tuning/<gpu>/<dsl>/memory/<key>/rounds.raw.jsonl
- tuning/<gpu>/<dsl>/memory/<key>/rounds.md
- .claude/skills/fsm-engine/state/<dsl>/compaction-summary.md
These files are what make crash-safe resume and post-mortem debugging possible.
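To make the path templates named above concrete, the following sketch expands them for one gpu/dsl/key combination and reports which artifacts are absent. Both helper names are hypothetical and the sample values are placeholders; the repo's own STORE step is not shown in this README and may work differently.

```python
from pathlib import Path


def store_artifacts(gpu: str, dsl: str, key: str) -> list[Path]:
    """Build the five STORE artifact paths, relative to the repo root."""
    base = Path("tuning") / gpu / dsl
    return [
        base / "logs" / key / "results.tsv",
        base / "checkpoints" / f"{key}.json",
        base / "memory" / key / "rounds.raw.jsonl",
        base / "memory" / key / "rounds.md",
        Path(".claude/skills/fsm-engine/state") / dsl / "compaction-summary.md",
    ]


def missing_artifacts(repo_root: Path, gpu: str, dsl: str, key: str) -> list[Path]:
    """Return the expected artifacts that do not exist under repo_root."""
    return [p for p in store_artifacts(gpu, dsl, key) if not (repo_root / p).is_file()]


# Placeholder values; substitute the real gpu, dsl, and key of a session.
for path in store_artifacts("a100", "cuda", "f16_4096x8192x4096"):
    print(path.as_posix())
```

A check like `missing_artifacts(Path("."), ...)` after each round is one cheap way to notice that a STORE step was skipped before it blocks a later resume.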
The mock harness simulates long tuning sessions and failure modes:
python3 scripts/mock_skill_ab_test.py \
--agents-per-dsl 24 \
--round-target 336 \
--seed 20260411 \
--run-name strict_ab_20260411_336iters
The strict variant models:
- explicit iter000 baseline
- attempt-only compile failures
- win/stall-triggered profiling
- research escalation
- raw/md round-history persistence
- active-state and resume-source validation at long-run checkpoints
If a tuning session behaved strangely, do this in order:
- python3 scripts/validate_tuning_session.py --dsl <dsl>
- python3 scripts/clean_kernel_work_state.py --dsl <dsl> --invalid-only
- restart with /perf-tune <dsl> <dtype> [shape]
This avoids carrying forward polluted seed.cu, stale checkpoint state, or external-source includes.
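The two script steps of the recovery sequence can be wrapped in a small helper so they always run in the same order. This is only a sketch: the function names and the `dry_run` flag are assumptions, and the wrapper does not interpret exit codes because this README does not document how the scripts signal failure. The final /perf-tune restart is still issued by hand.

```python
import subprocess


def recovery_commands(dsl: str) -> list[list[str]]:
    """Return the validate and clean commands for one DSL, in order."""
    return [
        ["python3", "scripts/validate_tuning_session.py", "--dsl", dsl],
        ["python3", "scripts/clean_kernel_work_state.py", "--dsl", dsl, "--invalid-only"],
    ]


def run_recovery(dsl: str, dry_run: bool = True) -> None:
    """Print each command and, unless dry_run is set, execute it from the repo root."""
    for cmd in recovery_commands(dsl):
        print("$", " ".join(cmd))
        if not dry_run:
            # check=False: exit codes are not interpreted in this sketch.
            subprocess.run(cmd, check=False)


run_recovery("cuda")  # dry run: only prints the two commands
```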