chemist

Two Claude Code skills that take computational chemistry seriously.
Documented method selection over feature lists. Honest limits over silent fallbacks. Mandatory cross-validation over "just trust the ML potential."

ase-chemist · amber-chemist · test harness

Stack: ASE · tblite-xTB · MACE-MP-0 / MACE-OFF · AmberTools (GAFF2, pmemd, cpptraj, MMPBSA) · Gaussian DFT

What this is

chemist is the dev workspace for two sibling Agent Skills for Claude Code — small, scoped extensions that turn natural-language requests like "thermalize this solvated ligand at 300 K" into the right calculation, run through the right backend, with the limits stated up front.

ase-chemist — atomistic / molecular simulation on top of ASE. Seven backends behind one method-selection router: ASE built-ins (EMT, LJ, TIP3P), tblite-xTB, MACE foundation models, an Amber-GAFF2 carve-out for small-molecule MD, and Gaussian DFT (SP / Opt / Freq).
amber-chemist — Amber-native MD sibling. Single-replica MD with restart / extend, T-REMD as a first-class capability, plus add-ons for cpptraj-driven analysis, single-point energies, and MMPBSA endpoint scoring.

Both ship a trigger-test harness that runs prompts through claude -p in fresh sessions to catch activation and method-selection drift.

This is not an application. There's no library to import and no service to run. The skills are markdown contracts (SKILL.md) plus Python scripts; Claude Code loads them on demand based on what the user asked for.

Why you might care

Comp-chem tooling tends to be either "here are 50 tools, you figure it out" (general-purpose ASE) or "here's a black box, give me your structure" (opinionated wrapper). These skills sit in between and share four design rules:

Right method for the system, not the request. A "minimize this molecule" prompt walks a documented task → calculator → install check tree, each rule with a stated why. EMT on an organic gets caught; GFN2-xTB on a 5000-atom system gets redirected to MACE.
Honest about limits. xTB MD stops being practical at ~1k atoms; MACE-medium tops out near 1–2k on a 40 GB GPU; ASE's Amber calculator can't drive production MD. These surface in plain language whenever they're load-bearing — never hidden.
Cross-validation is non-negotiable for ML potentials. Every MACE MD run validates against GFN2-xTB every 1 ps and aborts at force MAE > 100 meV/Å. Opt-out is per-run, not the default — ML potentials produce plausible-but-wrong PESs users can't spot on their own.
No DFT method / basis defaults. The Gaussian scripts refuse to run without explicit --method and --basis (plus charge, multiplicity, and resources). A silently picked B3LYP/6-31G(d) on a transition metal is the same wrong-physics failure that the other rules guard against.
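
The "no defaults" rule for DFT can be illustrated with a small argument parser. This is a minimal sketch, not the parser used by the Gaussian scripts: only `--method` and `--basis` are flag names given in this README, while `--structure`, `--charge`, `--multiplicity`, `--nproc`, and `--mem` are hypothetical names for the remaining mandatory inputs.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser with no chemistry defaults: every physics-relevant choice is required."""
    parser = argparse.ArgumentParser(description="Gaussian single point (sketch)")
    # Named in the README; deliberately given no default value.
    parser.add_argument("--method", required=True, help="DFT functional, chosen by the user")
    parser.add_argument("--basis", required=True, help="basis set, chosen by the user")
    # Hypothetical flag names for the other mandatory inputs.
    parser.add_argument("--structure", required=True, help="input geometry file")
    parser.add_argument("--charge", type=int, required=True)
    parser.add_argument("--multiplicity", type=int, required=True)
    parser.add_argument("--nproc", type=int, required=True)
    parser.add_argument("--mem", required=True, help="memory request, e.g. 16GB")
    return parser


def route_line(args: argparse.Namespace) -> str:
    """Build a Gaussian single-point route line from the explicit choices."""
    return f"#P {args.method}/{args.basis} SP"
```

With `required=True`, leaving out `--method` makes argparse exit with an error instead of falling back to a functional nobody chose.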

A minute in the skill

A user types this in a Claude Code session in their working directory:

"Run NVT MD on a 5000-atom organic system at 300 K for 50 ps. Don't actually run it — I just want the command."

ase-chemist activates and does this:

Walks the method-selection tree. GFN2-xTB MD on 5000 atoms is impractical (xTB size cliff at ~1k atoms). Routes to MACE-OFF (pure-organic foundation model, element-set auto-detected).
Turns on mandatory cross-validation against GFN2-xTB every 1 ps. The MD aborts if force MAE exceeds 100 meV/Å — the published rule of thumb for "trajectory drifted out of training distribution."
Writes a runnable command and the why:

python scripts/run_md.py --structure system.xyz --calculator mace \
    --ensemble nvt-langevin --temperature 300 --n-steps 50000 \
    --output md.traj
# validation.csv written every 1 ps; aborts at MAE_F > 100 meV/Å
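
The abort criterion in the comment above can be sketched in a few lines. This is an illustration under stated assumptions, not the code in `run_md.py`: the function names are hypothetical, forces are taken in eV/Å (ASE's unit), and the MAE is averaged over all Cartesian force components, which is one common convention but is not confirmed by this README.

```python
from typing import Sequence

Vector = Sequence[float]

MAE_ABORT_MEV_PER_A = 100.0  # abort threshold quoted in this README
VALIDATE_EVERY_PS = 1.0      # validation interval quoted in this README


def validation_stride(timestep_fs: float) -> int:
    """Number of MD steps between two validation checks."""
    return max(1, round(VALIDATE_EVERY_PS * 1000.0 / timestep_fs))


def force_mae_mev(ml_forces: Sequence[Vector], ref_forces: Sequence[Vector]) -> float:
    """Mean absolute error over all force components; eV/Å in, meV/Å out."""
    if len(ml_forces) != len(ref_forces):
        raise ValueError("force arrays must cover the same atoms")
    diffs = [
        abs(a - b)
        for f_ml, f_ref in zip(ml_forces, ref_forces)
        for a, b in zip(f_ml, f_ref)
    ]
    return 1000.0 * sum(diffs) / len(diffs)


def should_abort(ml_forces: Sequence[Vector], ref_forces: Sequence[Vector]) -> bool:
    """True when the ML forces have drifted past the abort threshold."""
    return force_mae_mev(ml_forces, ref_forces) > MAE_ABORT_MEV_PER_A
```

For the command above, 50000 steps covering 50 ps implies a 1 fs timestep, so `validation_stride(1.0)` gives one check every 1000 steps.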

Other prompts route differently: "compute G_298 for caffeine at B3LYP-D3/def2-TZVP" → gaussian_opt.py → gaussian_freq.py (in-house thermochem parsing); "build a Pt(111) slab with a CO adsorbate" stays inline with ase.build — no script for a 5-line task.
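
The routing decisions described in this section can be sketched as a small pure function. It is a deliberately simplified stand-in for the documented method-selection tree, which lives in SKILL.md and is not reproduced here. The function name, the return shape, the "needs at least one EMT metal" rule, and the MACE-OFF element set are assumptions made for illustration; the ~1k-atom xTB cutoff and the EMT element list come from this README.

```python
from typing import Optional, Sequence, Tuple

XTB_SIZE_CLIFF = 1000  # atoms; the "~1k" limit quoted in this README
EMT_METALS = {"Al", "Cu", "Ag", "Au", "Ni", "Pd", "Pt"}
EMT_ELEMENTS = EMT_METALS | {"H", "C", "N", "O"}
# Assumed element coverage used to choose MACE-OFF over MACE-MP-0.
MACE_OFF_ELEMENTS = {"H", "C", "N", "O", "F", "P", "S", "Cl", "Br", "I"}


def select_calculator(
    symbols: Sequence[str], requested: Optional[str] = None
) -> Tuple[str, str]:
    """Return (calculator, reason) for a list of chemical symbols."""
    elements = set(symbols)
    n_atoms = len(symbols)

    # Keep EMT only when the system is metallic and fully inside its element set.
    if requested == "emt" and elements <= EMT_ELEMENTS and elements & EMT_METALS:
        return "emt", "metallic system within EMT's element set"

    # Small systems go to GFN2-xTB, whatever was requested.
    if n_atoms <= XTB_SIZE_CLIFF:
        return "gfn2-xtb", f"{n_atoms} atoms is within the xTB size limit"

    # Past the size cliff: pick the MACE model from the element set.
    if elements <= MACE_OFF_ELEMENTS:
        return "mace-off", f"{n_atoms} atoms is past the xTB size cliff; organic element set"
    return "mace-mp-0", f"{n_atoms} atoms is past the xTB size cliff; general element set"
```

Returning the reason alongside the calculator mirrors the idea that every rule carries a stated why.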

See ase-chemist/README.md and amber-chemist/README.md for the full user-facing walkthroughs.

Backends at a glance

| Backend | Reach for it when... | Skill | Entry point (script) |
| --- | --- | --- | --- |
| EMT | Quick metallic answers: Al, Cu, Ag, Au, Ni, Pd, Pt + H/C/N/O | `ase-chemist` | `optimize.py` / `run_md.py` |
| Lennard-Jones | Toy systems, noble gases, methodology training | `ase-chemist` | `optimize.py` / `run_md.py` |
| TIP3P | Pure-water MD where rigid O–H bonds matter | `ase-chemist` | `run_md.py` |
| GFN2-xTB (tblite) | Default for organic / main-group up to ~1k atoms | `ase-chemist` | `optimize.py` / `run_md.py` / `single_point.py` |
| MACE (MP-0 + OFF) | Past the xTB size cliff (~1–2k atoms), with mandatory cross-validation | `ase-chemist` | `optimize.py --calculator mace` / `run_md.py --calculator mace` |
| Amber + GAFF2 | Small-molecule production MD, plain NPT (carve-out; deeper Amber lives next door) | `ase-chemist` | `parameterize_gaff2.py` → `run_amber.py` |
| Gaussian DFT | Publication-quality DFT: SP, Opt, Freq + thermochem | `ase-chemist` | `gaussian_sp.py` / `gaussian_opt.py` / `gaussian_freq.py` |
| Amber (deep) | Restart / extend, T-REMD, implicit GB, MMPBSA, cpptraj analysis | `amber-chemist` | `amber_run.py` (easy mode) or `amber_md.py` / `amber_remd.py` directly |

Install

The two skills install independently, so install only what you need. Conda is preferred on HPC; pip works for laptops.

ase-chemist — required + optional backends

# Required
conda install -c conda-forge ase tblite-python mdanalysis matplotlib numpy
# Pip-only fallback (libgfortran-fragile on some HPC):
pip install ase tblite mdanalysis matplotlib numpy

# Optional, install only what you need
pip install mace-torch                              # MACE — CUDA strongly recommended
conda install -c conda-forge ambertools             # Amber GAFF2 carve-out
# Gaussian: license-gated; install per https://gaussian.com/

# Sanity-check what your environment actually supports
python ase-chemist/scripts/check_env.py

amber-chemist — AmberTools + MPI / CUDA

# Required — AmberTools25 is fully open-source
conda install -c conda-forge ambertools

# T-REMD needs MPI builds (pmemd.MPI / pmemd.cuda.MPI)
# MMPBSA.py.MPI is shipped with AmberTools

python amber-chemist/scripts/check_env.py

check_env.py ends with a one-line [SUMMARY] listing exactly which workflows the box can run right now. The skills recommend methods that are actually installed — they won't ask the user to install Gaussian when EMT or LJ would already cover the question.
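
A rough idea of what such an environment probe involves, as a sketch only: the real `check_env.py` is not shown in this README, so the function names, the probed module and binary names (`tblite`, `mace`, `sander`, `g16`), and everything in the summary line after the `[SUMMARY]` prefix are assumptions.

```python
import importlib.util
import shutil
from typing import Callable, Dict

# Assumed import names for the Python backends.
PYTHON_BACKENDS = {"ase": "ase", "xtb": "tblite", "mace": "mace"}
# Assumed executable names for the external backends.
BINARY_BACKENDS = {"amber": "sander", "gaussian": "g16"}


def detect_backends(
    find_module: Callable = importlib.util.find_spec,
    find_binary: Callable = shutil.which,
) -> Dict[str, bool]:
    """Map each backend to True/False without importing or running anything heavy."""
    found = {}
    for name, module in PYTHON_BACKENDS.items():
        found[name] = find_module(module) is not None
    for name, binary in BINARY_BACKENDS.items():
        found[name] = find_binary(binary) is not None
    return found


def summary_line(found: Dict[str, bool]) -> str:
    """One-line report of the backends that are usable right now."""
    ready = [name for name, ok in found.items() if ok]
    return "[SUMMARY] available: " + (", ".join(ready) if ready else "none")


if __name__ == "__main__":
    print(summary_line(detect_backends()))
```

Passing the two lookup functions as parameters keeps the probe testable on a machine where none of the backends are installed.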

The trigger-test harness

run_tests.sh runs each prompt in a fresh claude -p session with a 180 s cap. Every prompt is tagged trigger (should activate + write correct code), no_trigger (should stay out), or borderline (either response defensible; human review).

python generate_test.py            # regenerate fixtures
bash run_tests.sh                  # full sweep
TIMEOUT_SECS=300 bash run_tests.sh # longer per-prompt cap

The cap is intentionally too short to finish a simulation — the test asks "did Claude write the right code?", not "did it run?". Every prompt says not to execute its output. evals/evals.json (per skill) holds richer prompts for manual review.
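
The three tags map onto a simple grading rule, sketched below in Python. `run_tests.sh` itself is a shell script and is not reproduced here; the function names are hypothetical, how activation is detected from the output is left out, and the sketch does not check whether the generated code is correct.

```python
import subprocess


def run_prompt(prompt: str, timeout_secs: int = 180) -> str:
    """Run one prompt in a fresh `claude -p` session; return '' on timeout."""
    try:
        done = subprocess.run(
            ["claude", "-p", prompt],
            capture_output=True,
            text=True,
            timeout=timeout_secs,
        )
    except subprocess.TimeoutExpired:
        return ""
    return done.stdout


def grade(tag: str, activated: bool) -> str:
    """Turn a prompt tag plus the observed activation into a verdict."""
    if tag == "trigger":
        return "pass" if activated else "fail"
    if tag == "no_trigger":
        return "fail" if activated else "pass"
    if tag == "borderline":
        return "review"  # either outcome is defensible, so a human decides
    raise ValueError(f"unknown tag: {tag}")
```

Keeping `grade` separate from `run_prompt` means the verdict logic can be checked without spending a session on it.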

Layout

chemist/
├── ase-chemist/      # skill #1 dev source — SKILL.md, README.md, scripts/, references/, evals/
├── amber-chemist/    # skill #2 dev source — same shape
├── .claude/skills/{ase,amber}-chemist/    # project copies (what `claude -p` loads)
├── ~/.claude/skills/{ase,amber}-chemist/  # user copies (kept in parity)
├── CLAUDE.md         # design decisions + three-copy sync rules
├── run_tests.sh / generate_test.py        # trigger-test harness + fixtures
└── test-inputs/ , results/                # generated fixtures + run logs (gitignored)

Each skill has three copies: dev source (edit here), the project copy, and the user copy. Tests load the copies, not the dev source — so rsync after every edit (see Contributing).

Project documentation

ase-chemist — README.md (user-facing: backends, examples, install) · SKILL.md (trigger contract + method-selection tree) · references/ (scoped reference files).
amber-chemist — README.md · SKILL.md · references/ (md_core, remd, scoring, failure_modes, …).
Repo-level — CLAUDE.md for load-bearing design decisions and the three-copy sync rules.

Contributing

Bug reports and feature requests welcome via GitHub issues. For changes that touch the trigger contract (SKILL.md description field) or method-selection rules, please open an issue first — those are the parts that move eval results.

Workflow for skill edits:

Edit dev source under ase-chemist/ or amber-chemist/.
rsync to both loaded copies (.claude/skills/... and ~/.claude/skills/...).
bash run_tests.sh and confirm no regressions on trigger / no-trigger prompts.
Update the relevant references/*.md if you changed a contract (cross-validation threshold, method-selection rule, deferral surface).
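
The rsync step of this workflow can be scripted so that neither loaded copy is forgotten. The helper below only builds the command lists. It is a sketch: the function name is hypothetical, and using `--delete` to make each copy an exact mirror of the dev source is an assumption, so check CLAUDE.md for the actual sync rules before relying on it.

```python
from pathlib import PurePosixPath
from typing import List

SKILLS = ("ase-chemist", "amber-chemist")


def sync_commands(repo_root: str, home: str) -> List[List[str]]:
    """Build one rsync command per (skill, loaded copy) pair."""
    copies = (
        PurePosixPath(repo_root) / ".claude" / "skills",  # project copy
        PurePosixPath(home) / ".claude" / "skills",       # user copy
    )
    commands = []
    for skill in SKILLS:
        # A trailing slash makes rsync copy the directory's contents.
        source = f"{PurePosixPath(repo_root) / skill}/"
        for base in copies:
            commands.append(["rsync", "-a", "--delete", source, f"{base / skill}/"])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repo root before running the test harness.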

See CLAUDE.md for the full design-decision rationale.

License

Released under the MIT License. The backends the skills orchestrate (ASE, tblite/xTB, MACE, AmberTools, Gaussian) carry their own licenses and citation requirements — install and cite each per its own terms.
