Two Claude Code skills that take computational chemistry seriously.
Documented method selection over feature lists.
Honest limits over silent fallbacks.
Mandatory cross-validation over "just trust the ML potential."
ase-chemist · amber-chemist · test harness
Stack: ASE · tblite-xTB · MACE-MP-0 / MACE-OFF · AmberTools (GAFF2, pmemd, cpptraj, MMPBSA) · Gaussian DFT
chemist is the dev workspace for two sibling Agent Skills for Claude Code — small, scoped extensions that turn natural-language requests like "thermalize this solvated ligand at 300 K" into the right calculation, run through the right backend, with the limits stated up front.
- ase-chemist: atomistic / molecular simulation on top of ASE. Seven backends behind one method-selection router: ASE built-ins (EMT, LJ, TIP3P), tblite-xTB, MACE foundation models, an Amber-GAFF2 carve-out for small-molecule MD, and Gaussian DFT (SP / Opt / Freq).
- amber-chemist: the Amber-native MD sibling. Single-replica MD with restart / extend, T-REMD as a first-class capability, plus add-ons for cpptraj-driven analysis, single-point energies, and MMPBSA endpoint scoring.
Both ship a trigger-test harness that runs prompts through claude -p in fresh sessions to catch activation and method-selection drift.
This is not an application. There's no library to import and no service to run. The skills are markdown contracts (`SKILL.md`) plus Python scripts; Claude Code loads them on demand based on what the user asked for.
Comp-chem tooling tends to be either "here are 50 tools, you figure it out" (general-purpose ASE) or "here's a black box, give me your structure" (opinionated wrapper). These skills sit between the two and share the same design rigor:
- Right method for the system, not the request. A "minimize this molecule" prompt walks a documented `task → calculator → install check` tree, each rule with a stated why. EMT on an organic gets caught; GFN2-xTB on a 5000-atom system gets redirected to MACE.
- Honest about limits. xTB MD stops being practical at ~1k atoms; MACE-medium tops out near 1–2k on a 40 GB GPU; ASE's `Amber` calculator can't drive production MD. These surface in plain language whenever they're load-bearing and are never hidden.
- Cross-validation is non-negotiable for ML potentials. Every MACE MD run validates against GFN2-xTB every 1 ps and aborts at force MAE > 100 meV/Å. Opt-out is per-run, not the default: ML potentials produce plausible-but-wrong PESs users can't spot on their own.
- No DFT method / basis defaults. The Gaussian scripts refuse to run without explicit `--method` / `--basis` (plus charge, multiplicity, resources). A silently-picked B3LYP/6-31G(d) on a transition metal is the same wrong-physics failure mode guarded against elsewhere.
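The "no defaults" rule in the last bullet can be illustrated with a small argument parser. This is a minimal sketch, not the actual Gaussian scripts: only `--method` and `--basis` are named in this README, while the `--charge` / `--multiplicity` spellings, the program name, and the `parse_or_refuse` helper are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser: every physics-defining option is required, none is defaulted."""
    parser = argparse.ArgumentParser(prog="gaussian_sp_sketch")
    parser.add_argument("--method", required=True, help="e.g. B3LYP-D3 (never picked silently)")
    parser.add_argument("--basis", required=True, help="e.g. def2-TZVP (never picked silently)")
    # Flag names below are assumed; the README only says charge and multiplicity are required.
    parser.add_argument("--charge", type=int, required=True)
    parser.add_argument("--multiplicity", type=int, required=True)
    return parser


def parse_or_refuse(argv):
    """Return parsed arguments, or None when a required option is missing."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse prints its own error and exits; here that counts as a refusal to run.
        return None


if __name__ == "__main__":
    ok = parse_or_refuse(["--method", "B3LYP-D3", "--basis", "def2-TZVP",
                          "--charge", "0", "--multiplicity", "1"])
    print(ok.method, ok.basis)
    print(parse_or_refuse(["--charge", "0", "--multiplicity", "1"]))  # refused
```

Making the options `required=True` instead of giving them defaults means a missing level of theory fails loudly at the command line, before any input file is written.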
A user types this in a Claude Code session in their working directory:
"Run NVT MD on a 5000-atom organic system at 300 K for 50 ps. Don't actually run it — I just want the command."
ase-chemist activates and does this:
- Walks the method-selection tree. GFN2-xTB MD on 5000 atoms is impractical (xTB size cliff at ~1k atoms). Routes to MACE-OFF (pure-organic foundation model, element-set auto-detected).
- Turns on mandatory cross-validation against GFN2-xTB every 1 ps. The MD aborts if force MAE exceeds 100 meV/Å — the published rule of thumb for "trajectory drifted out of training distribution."
- Writes a runnable command and the why:
```bash
python scripts/run_md.py --structure system.xyz --calculator mace \
    --ensemble nvt-langevin --temperature 300 --n-steps 50000 \
    --output md.traj
# validation.csv written every 1 ps; aborts at MAE_F > 100 meV/Å
```

Other prompts route differently: "compute G_298 for caffeine at B3LYP-D3/def2-TZVP" → `gaussian_opt.py` → `gaussian_freq.py` (in-house thermochem parsing); "build a Pt(111) slab with a CO adsorbate" stays inline with `ase.build`, with no script for a 5-line task.
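The abort rule used in the MACE example (force MAE above 100 meV/Å, checked every 1 ps) can be sketched in a few lines. This is an illustration under stated assumptions, not the code in `run_md.py`: the function names are hypothetical, the MAE is taken over all Cartesian force components, forces are assumed to be in eV/Å (ASE's unit), and the 1 fs timestep is inferred from 50,000 steps covering 50 ps.

```python
MAE_ABORT_MEV_PER_A = 100.0  # threshold quoted in this README


def force_mae_mev(forces_ml, forces_ref):
    """Mean absolute error over all force components, eV/Å in, meV/Å out."""
    diffs = [
        abs(f_ml - f_ref)
        for atom_ml, atom_ref in zip(forces_ml, forces_ref)
        for f_ml, f_ref in zip(atom_ml, atom_ref)
    ]
    if not diffs:
        raise ValueError("no force components to compare")
    return 1000.0 * sum(diffs) / len(diffs)


def should_abort(forces_ml, forces_ref, threshold=MAE_ABORT_MEV_PER_A):
    """True when the ML forces have drifted too far from the reference forces."""
    return force_mae_mev(forces_ml, forces_ref) > threshold


def validation_interval_steps(timestep_fs, interval_ps=1.0):
    """Number of MD steps between two validation checks."""
    return round(interval_ps * 1000.0 / timestep_fs)


if __name__ == "__main__":
    reference = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    drifted = [[0.9, 0.0, 0.0], [0.0, 0.0, 0.0]]
    print(force_mae_mev(drifted, reference), should_abort(drifted, reference))
    print(validation_interval_steps(timestep_fs=1.0))
```

In a real run the two force arrays would come from evaluating the same frame with the MACE calculator and with GFN2-xTB.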
See ase-chemist/README.md and amber-chemist/README.md for the full user-facing walkthroughs.
| Backend | Reach for it when... | Skill | Through |
|---|---|---|---|
| EMT | Quick metallic answers: Al, Cu, Ag, Au, Ni, Pd, Pt + H/C/N/O | ase-chemist | `optimize.py` / `run_md.py` |
| Lennard-Jones | Toy systems, noble gases, methodology training | ase-chemist | `optimize.py` / `run_md.py` |
| TIP3P | Pure-water MD where rigid O–H bonds matter | ase-chemist | `run_md.py` |
| GFN2-xTB (tblite) | Default for organic / main-group up to ~1k atoms | ase-chemist | `optimize.py` / `run_md.py` / `single_point.py` |
| MACE (MP-0 + OFF) | Past the xTB size cliff (~1–2k atoms), with mandatory cross-validation | ase-chemist | `optimize.py --calculator mace` / `run_md.py --calculator mace` |
| Amber + GAFF2 | Small-mol production MD, plain NPT (carve-out; deeper Amber lives next door) | ase-chemist | `parameterize_gaff2.py` → `run_amber.py` |
| Gaussian DFT | Publication-quality DFT: SP, Opt, Freq + thermochem | ase-chemist | `gaussian_sp.py` / `gaussian_opt.py` / `gaussian_freq.py` |
| Amber (deep) | Restart / extend, T-REMD, implicit GB, MMPBSA, cpptraj analysis | amber-chemist | `amber_run.py` (easy mode) or `amber_md.py` / `amber_remd.py` directly |
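To make the "reach for it when" column concrete, here is a deliberately simplified routing sketch. It is not the method-selection tree from `SKILL.md`: the function name, the rule order, and the hard 1000-atom cutoff are assumptions, and only the element list and the approximate xTB size limit come from the table above.

```python
# Element sets copied from the EMT row of the table; everything else is illustrative.
EMT_METALS = frozenset({"Al", "Cu", "Ag", "Au", "Ni", "Pd", "Pt"})
EMT_LIGHT = frozenset({"H", "C", "N", "O"})
XTB_MAX_ATOMS = 1000  # the "~1k atoms" size cliff, treated here as a hard cutoff


def choose_backend(symbols):
    """Hypothetical router: map a list of element symbols to a backend name."""
    if not symbols:
        raise ValueError("empty structure")
    elements = set(symbols)
    # EMT needs a supported metal to be present; a pure H/C/N/O organic
    # is deliberately not sent to EMT even though those elements are listed.
    if elements <= (EMT_METALS | EMT_LIGHT) and elements & EMT_METALS:
        return "emt"
    if len(symbols) <= XTB_MAX_ATOMS:
        return "gfn2-xtb"
    return "mace"


if __name__ == "__main__":
    print(choose_backend(["Cu"] * 32))               # metallic system
    print(choose_backend(["C", "H", "H", "H", "H"]))  # small organic
    print(choose_backend(["C"] * 5000))              # past the xTB size cliff
```

A real router would also have to account for the task (optimize, MD, single point), periodicity, and which backends are installed, none of which this sketch models.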
The two skills install independently — only what you need. Conda is preferred on HPC; pip works for laptops.
ase-chemist (required + optional backends):

```bash
# Required
conda install -c conda-forge ase tblite-python mdanalysis matplotlib numpy
# Pip-only fallback (libgfortran-fragile on some HPC):
pip install ase tblite mdanalysis matplotlib numpy
# Optional, install only what you need
pip install mace-torch                      # MACE; CUDA strongly recommended
conda install -c conda-forge ambertools     # Amber GAFF2 carve-out
# Gaussian: license-gated; install per https://gaussian.com/
# Sanity-check what your environment actually supports
python ase-chemist/scripts/check_env.py
```

amber-chemist (AmberTools + MPI / CUDA):

```bash
# Required (AmberTools25 is fully open-source)
conda install -c conda-forge ambertools
# T-REMD needs MPI builds (pmemd.MPI / pmemd.cuda.MPI)
# MMPBSA.py.MPI is shipped with AmberTools
python amber-chemist/scripts/check_env.py
```

`check_env.py` ends with a one-line `[SUMMARY]` listing exactly which workflows the box can run right now. The skills recommend methods that are actually installed: they won't ask the user to install Gaussian when EMT or LJ would already cover the question.
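The idea behind an environment check that ends in a one-line summary can be sketched with the standard library alone. This is not the shipped `check_env.py`: the function names, the workflow labels, and the module and executable names in the usage example (`tblite`, `mace`, `antechamber`, `g16`) are assumptions chosen for illustration.

```python
import importlib.util
import shutil


def available_workflows(module_checks, executable_checks):
    """Return the workflow labels whose Python module or executable is present.

    Both arguments map a workflow label to the name that must be found.
    """
    found = []
    for workflow, module in module_checks.items():
        if importlib.util.find_spec(module) is not None:
            found.append(workflow)
    for workflow, executable in executable_checks.items():
        if shutil.which(executable) is not None:
            found.append(workflow)
    return found


def summary_line(found):
    """Format the final one-line report."""
    body = ", ".join(found) if found else "no workflows available"
    return "[SUMMARY] " + body


if __name__ == "__main__":
    # Assumed names; adjust to whatever the installed packages actually expose.
    modules = {"ASE built-ins": "ase", "GFN2-xTB": "tblite", "MACE": "mace"}
    executables = {"Amber GAFF2": "antechamber", "Gaussian": "g16"}
    print(summary_line(available_workflows(modules, executables)))
```

Probing with `find_spec` and `shutil.which` avoids importing heavy packages or launching binaries just to learn whether they exist.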
`run_tests.sh` runs prompts in fresh `claude -p` sessions (180 s cap each), each tagged `trigger` (should activate + write correct code), `no_trigger` (should stay out), or `borderline` (either response defensible; human review).
```bash
python generate_test.py               # regenerate fixtures
bash run_tests.sh                     # full sweep
TIMEOUT_SECS=300 bash run_tests.sh    # longer per-prompt cap
```

The cap is intentionally too short to finish a simulation: the test asks "did Claude write the right code?", not "did it run?". Every prompt says not to execute its output. `evals/evals.json` (per skill) holds richer prompts for manual review.
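How the three tags translate into a per-prompt outcome can be sketched as a small pure function. This is an assumption about the grading logic rather than a copy of `run_tests.sh`: the function names, the `"pass"` / `"fail"` / `"review"` labels, and the `code_ok` flag are hypothetical; only the tag names and the 180 s default come from the text above.

```python
DEFAULT_TIMEOUT_SECS = 180  # per-prompt cap quoted above


def timeout_secs(env):
    """Read the per-prompt cap from a mapping such as os.environ."""
    return int(env.get("TIMEOUT_SECS", DEFAULT_TIMEOUT_SECS))


def verdict(tag, activated, code_ok=False):
    """Classify one prompt result.

    tag       -- "trigger", "no_trigger", or "borderline"
    activated -- whether the skill activated for this prompt
    code_ok   -- whether the generated code was judged correct
    """
    if tag == "borderline":
        return "review"  # either response is defensible, so a human decides
    if tag == "trigger":
        return "pass" if activated and code_ok else "fail"
    if tag == "no_trigger":
        return "pass" if not activated else "fail"
    raise ValueError(f"unknown tag: {tag!r}")


if __name__ == "__main__":
    print(timeout_secs({"TIMEOUT_SECS": "300"}))
    print(verdict("trigger", activated=True, code_ok=True))
    print(verdict("no_trigger", activated=True))
    print(verdict("borderline", activated=False))
```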
```
chemist/
├── ase-chemist/                          # skill #1 dev source: SKILL.md, README.md, scripts/, references/, evals/
├── amber-chemist/                        # skill #2 dev source: same shape
├── .claude/skills/{ase,amber}-chemist/   # project copies (what `claude -p` loads)
├── ~/.claude/skills/{ase,amber}-chemist/ # user copies (kept in parity)
├── CLAUDE.md                             # design decisions + three-copy sync rules
├── run_tests.sh / generate_test.py       # trigger-test harness + fixtures
└── test-inputs/ , results/               # generated fixtures + run logs (gitignored)
```
Each skill has three copies: dev source (edit here), the project copy, and the user copy. Tests load the copies, not the dev source — so rsync after every edit (see Contributing).
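Because tests load the copies rather than the dev source, it can help to check parity before a sweep. The following is a minimal sketch using the standard-library `filecmp` module; the `out_of_sync` helper is hypothetical and is not part of the repository, which relies on `rsync`.

```python
import filecmp
from pathlib import Path


def out_of_sync(dev_dir, copy_dir):
    """Return sorted relative paths that differ or exist on only one side.

    Note: filecmp.dircmp compares files shallowly by default (type, size,
    mtime first), so same-size edits with identical timestamps can be missed.
    """
    problems = []

    def walk(comparison, prefix):
        changed = (comparison.left_only + comparison.right_only
                   + comparison.diff_files + comparison.funny_files)
        for name in changed:
            problems.append((prefix / name).as_posix())
        for name, sub in comparison.subdirs.items():
            walk(sub, prefix / name)

    walk(filecmp.dircmp(str(dev_dir), str(copy_dir)), Path("."))
    return sorted(problems)


if __name__ == "__main__":
    # Paths follow the layout above; run from the repository root.
    for copy in (".claude/skills/ase-chemist",
                 Path.home() / ".claude/skills/ase-chemist"):
        print(copy, out_of_sync("ase-chemist", copy))
```

An empty list for both copies means the dev source and the loaded copies match under the shallow comparison described in the docstring.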
- ase-chemist: `README.md` (user-facing: backends, examples, install) · `SKILL.md` (trigger contract + method-selection tree) · `references/` (scoped reference files).
- amber-chemist: `README.md` · `SKILL.md` · `references/` (`md_core`, `remd`, `scoring`, `failure_modes`, …).
- Repo-level: `CLAUDE.md` for load-bearing design decisions and the three-copy sync rules.
Bug reports and feature requests welcome via GitHub issues. For changes that touch the trigger contract (SKILL.md description field) or method-selection rules, please open an issue first — those are the parts that move eval results.
Workflow for skill edits:
- Edit dev source under `ase-chemist/` or `amber-chemist/`.
- `rsync` to both loaded copies (`.claude/skills/...` and `~/.claude/skills/...`).
- `bash run_tests.sh` and confirm no regressions on trigger / no-trigger prompts.
- Update the relevant `references/*.md` if you changed a contract (cross-validation threshold, method-selection rule, deferral surface).
See CLAUDE.md for the full design-decision rationale.
Released under the MIT License. The backends the skills orchestrate (ASE, tblite/xTB, MACE, AmberTools, Gaussian) carry their own licenses and citation requirements — install and cite each per its own terms.