Research Workflow

Domain-agnostic research-coding workflow discipline for computational science (the JAX/Python research family — gravax, stellax, progenax, radax, …), packaged as a Claude Code plugin. The human is the scientist-in-the-loop, PI-level collaborator, and supervisor; the skills enforce evidence-first execution, structural correctness over compatibility, falsifiability, and reproducible artifacts. Domain specifics (e.g. MESA parity) live in thin lenses, so the stances stay sharp while the suite stays general.

Skills (70, by workflow phase)

Phase	Skill
Collaborate	`researcher-in-the-loop` · `high-impact-checkpoint`
Ideate	`research-ideation` · `research-brainstorming`
Literature	`prior-art-check` · `reading-notes-discipline` · `related-work-map`
Scope	`minimal-falsifiable-slice` · `discriminating-experiment-design` · `testing-strategist`
Build correctly	`ownership-and-structure` · `correct-cutover` · `numerical-precision` · `derivation-before-implementation` · `staleness-sweep` · `no-silent-except`
Equation-critical sources	`pdf-equation-extraction` · `equation-to-code-traceability` · `reference-license-firewall` · `equation-errata-ledger`
Verify	`evidence-first-execution` · `verification-gate` · `numerical-method-validation` · `gradient-validation` · `reference-parity-audit` · `adversarial-result-check` · `uncertainty-reporting-gate` · `plausibility-envelope` · `ai-self-distrust` · `seed-and-stochasticity` · `prior-sensitivity` · `systematic-error-hunting` · `figure-interpretation-guard` · `no-stub-when-done`
Inference rigor	`mcmc-convergence-gate` · `predictive-checks` · `model-selection-discipline`
Review (audit written code/figures)	`scientific-code-reviewer` · `numerical-methods-auditor` · `jax-code-validator` · `error-handling-reviewer` · `code-craft-reviewer` · `benchmark-generator` · `plot-faithfulness-inspector` · `plot-craft-reviewer`
Performance & scale	`profiling-discipline` · `scaling-validation` · `jax-performance` · `cluster-run-contract`
Record	`decision-log-and-commits` · `provenance-of-constants` · `experiment-tracking` · `data-provenance` · `data-io-validator` · `null-result-integrity` · `assumption-ledger` · `no-secrets-in-git`
Communicate (docs & figures)	`myst-expert` · `docs-writing-voice` · `myst-ci` · `interactive-figures` · `mystmd-plugin-dev` · `astro-plotting-craft` · `plot-design-inspector` · `publication-figure-validator`
Reproduce & release	`artifact-first-reproducibility` · `reproducible-environment-contract` · `software-citation` · `research-release-checklist` · `data-management-plan`

Each skill's description carries a "Don't use when… (→ sibling)" partition and a ## Related block, so the suite reads as one ordered protocol. reference-parity-audit loads a domain lens when one exists (lenses/mesa.md and lenses/nbody.md ship; lenses/rad-transfer.md is added on first need).

The Ideate and Literature clusters (v1.4.0) complete the front of the funnel the suite previously lacked: research-ideation (divergent — generate and triage directions) → research-brainstorming (convergent — sharpen one into a falsifiable hypothesis + discriminating observable) → prior-art-check (is it novel?) → discriminating-experiment-design → minimal-falsifiable-slice → Build. The Inference rigor cluster gates the inference itself for the NumPyro family — sampler convergence (R-hat/ESS/divergences), prior/posterior predictive fit, and honest out-of-sample model selection — distinct from the forward-numerics Verify cluster. Performance & scale covers measure-first profiling, strong/weak scaling, JAX compile-boundary performance, and the HPC job→artifact contract. Reproduce & release extends reproducibility to the citable public artifact (CITATION.cff/DOI, the figure→release trace, FAIR data management plans).

The Equation-critical sources cluster is for papers whose equations become code, tests, or benchmark fixtures. It keeps rendered-PDF verification, implementation traceability, reference-code licensing boundaries, and errata/conflict decisions separate on purpose. The equation-verifier agent is the adversarial row checker for promoting digest rows to verified.

The Review and Communicate clusters and several MyST references were consolidated in v1.2.0 from the former astro-code-review and myst plugins (now retired) — see the Status section. MyST authoring skills ship co-located references (myst-cheatsheet, math-and-gotchas, myst-projects-and-workflows, voice-fingerprint, page-anatomy) and the shippable mystmd-plugins/interactive.mjs directive bundle.

Hooks (enforcement)

The skills document the discipline; eight path-/command-scoped, self-limiting hooks (hooks/hooks.json) enforce it. Each stays inert outside research code (e.g. during course work or quick edits) and fails open on any error, so it never blocks legitimate work.

Hook	Event	Fires on	Action
deletion gate	`PreToolUse(Bash)`	`rm` / `git rm` / `git clean` / `shred`	asks for confirmation before a destructive op
no-secrets-in-git	`PreToolUse(Bash)`	`git add`/`commit` that names a credential file (`.env`, `*.pem`, …) or stages a secret signature (AWS/GitHub/Slack/Google token, `PRIVATE KEY` block, `api_key=…`)	asks before a secret enters git history
test-integrity	`PreToolUse(Edit/Write)`	edits to `test_.py` / `tests/*` that loosen a tolerance, drop an `assert`, or add `skip`/`xfail`	asks before a test is weakened to pass
no-silent-except	`PreToolUse(Edit/Write)`	new Python that catches an exception and does nothing (bare `except:`, or `except …: pass/…/continue`)	asks before an error is silently swallowed
myst-docs-hygiene	`PreToolUse(Edit/Write)`	MyST docs (`docs/*/.md`, `myst.yml`) with legacy Sphinx-MyST syntax (`{toctree}`/`{eval-rst}`/autodoc/RST), or a page missing the house-minimum `title`+`description` frontmatter	asks before legacy/incomplete MyST docs land (pairs with the `myst@myst-dev` plugin)
provenance	`PreToolUse(Edit/Write)`	uncited numeric literals in constants/calibration files, or references to external datasets/checkpoints (data-file URLs, `data/raw/…`) with no source/version/checksum	asks for a source (DOI/arXiv/Zenodo/checksum)
evidence-before-done	`Stop` (+ `SubagentStop` when `RWF_SUBAGENT_EVIDENCE` set)	a code/test/result/build claim ("fixed / passing / converged / built") with no fresh command output in the turn	blocks until the verification command + output are shown
no-stub-when-done	`Stop` (+ `SubagentStop` when `RWF_SUBAGENT_EVIDENCE` set)	a completion claim ("implemented / complete / ready") while an edit this turn left a stub in code (`NotImplementedError`, `TODO`/`FIXME`, placeholder body)	blocks until the stub is finished or the scope is restated
jq sanity check	`SessionStart`	`jq` not on `PATH`	warns that the gates are inactive (they need `jq`)

Hooks load at session start — restart Claude Code after installing or updating the plugin to activate them. Smoke tests: bash hooks/tests/run_tests.sh.

Prerequisite: the hooks use jq. If jq is not on PATH they fail open (no-op) — so install it (brew install jq) for the gates to be active.

Debugging: the hooks are silent by default. To see when each fires and what it decided (allow:* / ask:* / block:*), set RWF_HOOK_DEBUG=1; entries append to $RWF_HOOK_LOG (default ${TMPDIR:-/tmp}/research-workflow-hooks.log). Example:

export RWF_HOOK_DEBUG=1
tail -f "${TMPDIR:-/tmp}/research-workflow-hooks.log"
# 2026-06-15T22:41:48 [evidence] block:claim-without-evidence — All tests pass.
# 2026-06-15T22:41:48 [deletion] ask:destructive — rm -rf x
# 2026-06-15T22:41:49 [skill] invoke:research-workflow:numerical-precision

The log also records skill invocations ([skill] invoke:<name>, via a PreToolUse(Skill) hook), so a week of RWF_HOOK_DEBUG data shows not just which gates fired but which of the 70 skills actually surface in real work — the missing signal for auditing the advisory layer. (Caveat: this captures skills invoked through the Skill tool; guidance the model follows without an explicit invocation is not logged — it's a lower bound.)

Commands

Six slash commands give deliberate entry points (skills also auto-surface by description); each does more than restate a skill:

Command	Does
`/checkpoint [action]`	Go/no-go before an expensive or irreversible run (`high-impact-checkpoint`).
`/review [target]`	Multi-lens scientific code/figure review of a changeset — the deterministic entry point for the Review cluster (correctness · numerics · JAX · robustness · craft · figures), producing a severity-tagged report. Beats hoping the review skills auto-surface.
`/parity <ref>`	Reference-parity audit vs. an external reference, loading the matching lens (`mesa`/`nbody`).
`/reproduce`	Capture a reproducibility contract — env lock, seeds, precision, input ids, commit.
`/equation-digest <source>`	Create or review an equation-critical digest from a PDF/source note, with rendered-PDF verification states, traceability, reference-license firewalling, and errata handling. In installed plugin form, Claude may expose it as `/research-workflow:equation-digest`.
`/hooks-debug [status\|tail\|on\|off]`	Inspect/enable the hook decision log (see Debugging above; enabling needs a `settings.json` env entry + restart).

Agents

Agent	Does
`equation-verifier`	Adversarially checks equation-digest rows against rendered PDFs or trusted publisher sources before rows are promoted to `verified`.

Installation

This plugin is research-workflow; the dev marketplace (in .claude-plugin/marketplace.json) is research-workflow-dev. Public repo: https://github.com/drannarosen/research-workflow.

git clone https://github.com/drannarosen/research-workflow.git
# then, in Claude Code:
/plugin marketplace add ./research-workflow
/plugin install research-workflow@research-workflow-dev

Then restart Claude Code (hooks load at session start). The version is single-sourced in .claude-plugin/plugin.json; keep marketplace.json in sync.

Development

CI (.github/workflows/ci.yml) runs on every push / PR: shellcheck, the consistency checks, and the hook smoke tests. Run the same locally before committing:

bash scripts/checks.sh         # version sync (plugin.json == marketplace.json) + skill/command/agent/hook/lens lint
bash hooks/tests/run_tests.sh  # hook smoke tests (60 cases)

Status

Consolidated 2026-05-30 from a former 15-skill scientific-workflow plugin: the ownership cluster merged → ownership-and-structure + correct-cutover; the MESA pair → reference-parity-audit + lenses/mesa.md; decision + commit → decision-log-and-commits; the rest were renamed and de-stellarified into a domain-agnostic numerical-research substrate.

v1.1.0 added the gradient-validation skill (finite-difference grad-checks, NaN/zero-gradient traps) and the first four enforcement hooks, and refined every skill — sharper "Use when…" descriptions with sibling disambiguation, concrete computational-astrophysics worked examples, dedupe-by-pointer cross-references, and explicit hard-vs-adaptable stances.

v1.1.x then grew the suite to 32 skills and added an epistemic-integrity set (derivation-before-implementation, plausibility-envelope, ai-self-distrust, null-result-integrity, the inference-robustness trio) plus four more deterministic gates (no-silent-except, no-secrets-in-git, no-stub-when-done, myst-docs-hygiene).

v1.2.0 consolidates this into one comprehensive research plugin: the former astro-code-review plugin (11 of its 12 skills — reproducibility-auditor dropped as a duplicate of reproducible-environment-contract / artifact-first-reproducibility) and the former myst plugin (5 skills + the interactive.mjs directive bundle) were migrated in, adding the Review (computational-physics code/figure review) and Communicate (MyST docs + figure design/publication) clusters → 48 skills, eight enforcement hooks. Both source plugins are retired (disabled, marked deprecated). MyST skills were re-scoped to research docs (teaching moved to the sophie platform).

v1.3.0 adds the equation-critical source layer: pdf-equation-extraction, equation-to-code-traceability, reference-license-firewall, equation-errata-ledger, the /equation-digest command, and the equation-verifier agent. This is additive to the research workflow rather than a refactor: ordinary source ingest stays lightweight, while implementation-critical equations now have rendered-PDF verification, traceability, firewall, and errata gates.

v1.3.1 hardens the plugin after adversarial review: Task delegation no longer counts as verification by itself, completion claims scan final touched code files for stubs, the shipped interactive.mjs escapes JSON/JS values safely, CI runs official Claude plugin validation, and scripts/checks.sh enforces the skill graph promises.

v1.4.0 extends the suite from 52 to 65 skills across four new clusters, completing the research lifecycle end-to-end: a true front-of-funnel (Ideate — research-ideation, research-brainstorming; Literature — prior-art-check), Inference rigor for the Bayesian/NumPyro family (mcmc-convergence-gate, predictive-checks, model-selection-discipline), Performance & scale for HPC work (profiling-discipline, scaling-validation, jax-performance, cluster-run-contract), and a Reproduce & release tail for citable artifacts (software-citation, research-release-checklist, data-management-plan). Additive — no hooks added, no skills renamed; sibling plugins keep their boundaries (manuscripts → manuscript-workflow, grants → grant-writing).

v1.4.1 completes the Literature cluster deferred in v1.4.0: reading-notes-discipline (per-paper claim/evidence/caveat intake) and related-work-map (durable cross-paper field map) — 67 skills. The boundary against manuscript-workflow:lit-scan (per-manuscript citation completeness) was confirmed clean before building: reading-notes is per-paper intake, related-work-map is cross-paper synthesis you maintain, prior-art-check is the one-time novelty gate, lit-scan is late-stage citation completeness in a different plugin.

v1.5.0 adds the figure craft & interpretation layer (70 skills), completing the figure lifecycle from authoring to reading: astro-plotting-craft (author publication-grade plots in the house style — the jaxstroviz theme/figure helpers as source of truth, seaborn perceptually-uniform colormaps, CVD-safe color discipline, log/linear and LaTeX-not-unicode rules), plot-craft-reviewer (audit existing plot code for craft defects — wrong axis scale, unicode-vs-LaTeX, mathtext syntax errors, overlays, off-brand/unsafe colormaps), and figure-interpretation-guard (what a finished figure lets you conclude — over-reading, visual traps, reproducing/comparing paper figures, AI-misread plots). These sit beside the existing design/faithfulness/publication trio (ADR-0009) without overlap; the house style matches the jaxstroviz package as-is (its themes/architecture are near-SoTA), with the colorblind-safety, color×marker, reproducible-font, and perceptually-uniform-colormap upgrade targets named in astro-plotting-craft/references/house-style.md.

Name		Name	Last commit message	Last commit date
Latest commit History 44 Commits
.claude-plugin		.claude-plugin
.github/workflows		.github/workflows
agents		agents
commands		commands
docs/reviews		docs/reviews
hooks		hooks
mystmd-plugins		mystmd-plugins
scripts		scripts
skills		skills
.gitignore		.gitignore
CHANGELOG.md		CHANGELOG.md
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Research Workflow

Skills (70, by workflow phase)

Hooks (enforcement)

Commands

Agents

Installation

Development

Status

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Research Workflow

Skills (70, by workflow phase)

Hooks (enforcement)

Commands

Agents

Installation

Development

Status

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages