You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
nvalchemi-toolkit-ops's batch naive neighbor list method (batch_naive_neighbor_list), which is torch_sim's default neighbor list, silently returns an incorrect neighbor list (and therefore wrong energies/forces) for non-periodic systems that still carry a non-zero cell.
Issue in the upstream repository: NVIDIA/nvalchemi-toolkit-ops#104
It shows up in common setups:
a molecule (pbc=[F,F,F]) carrying a dummy cell (e.g. ase.Atoms.get_cell(complete=True) returns a 1 Å identity)
a slab (pbc=[T,T,F]) whose atoms straddle the non-periodic z boundary
It is silent: the neighbor count can look right, but actually distances/energies etc are wrong.
Downstream, e.g. a MACE single point on an isolated molecule comes back with the wrong energy.
Note that only the default (naive) backend is affected.
The other backends, such as vesin_nl and alchemiops cell_list, return correct values.
Code sample
Minimal reproduction: a pbc=False H2O molecule carrying a 1 Å dummy cell, run through ts.static with the MACE-MPA-0 model while swapping the neighbor list backend, and compared against a plain ASE MACE single point.
importtorchfromase.buildimportmoleculefrommace.calculatorsimportmace_mpimporttorch_simastsfromtorch_sim.models.maceimportMaceModelfromtorch_sim.neighborsimport (
alchemiops_nl_cell_list,
alchemiops_nl_n2,
vesin_nl,
)
device="cuda"dtype=torch.float64MODEL="medium-mpa-0"# Non-periodic H2O carrying a 1 A dummy cell. pbc stays False, so the energy# must be cell-independent and match a plain ASE MACE single point.atoms=molecule("H2O")
atoms.set_cell([1.0, 1.0, 1.0])
# Reference: the standard ASE MACE calculator, which uses its own# neighbor list and is unaffected by the dummy cell.ref_atoms=atoms.copy()
ref_atoms.calc=mace_mp(
model=MODEL, dispersion=False, device=device, default_dtype="float64"
)
ref_energy=ref_atoms.get_potential_energy()
raw_model=mace_mp(
model=MODEL,
dispersion=False,
device=device,
default_dtype="float64",
return_raw_model=True,
)
defenergy_with(neighbor_list_fn):
model=MaceModel(
model=raw_model,
device=torch.device(device),
dtype=dtype,
neighbor_list_fn=neighbor_list_fn,
compute_forces=True,
compute_stress=True,
)
result=ts.static(system=[atoms.copy()], model=model)[0]
returnresult["potential_energy"].reshape(-1)[-1].item()
backends= {
"naive (default)": alchemiops_nl_n2,
"cell_list": alchemiops_nl_cell_list,
"vesin": vesin_nl,
}
print(f"ASE reference : E = {ref_energy:.6f} eV")
forname, fninbackends.items():
energy=energy_with(fn)
delta=energy-ref_energyprint(f"{name:16s}: E = {energy:.6f} eV (dE = {delta:+.6f} eV)")
Output — only the default naive backend disagrees with ASE, while cell_list and vesin reproduce the reference energy exactly:
ASE reference : E = -13.785927 eV
naive (default) : E = 25.357815 eV (dE = +39.143741 eV)
cell_list : E = -13.785927 eV (dE = +0.000000 eV)
vesin : E = -13.785927 eV (dE = +0.000000 eV)
Possible solutions (looking for guidance)
Wait for the upstream fix and update
Quick guard for fully non-periodic systems: in alchemiops_nl_n2, when not pbc.any(), call the underlying batch_naive_neighbor_list with pbc=None and cell=None (the no-PBC kernel, which skips wrapping entirely). pbc=None while cell is still passed raises ValueError ("If cell is provided, pbc must also be provided"), so the cell must be dropped too; the no-PBC path returns a 2-tuple, which the existing len(res) == 3 handling in alchemiops_nl_n2 already absorbs as zero shifts. Cheap, safe, and fixes the molecule case immediately. Slabs / partial pbc need more, and I think it's better to wait for upstream fix to mitigate the complexity.
Questions:
Would you like a stop-gap guard now (option 2), or prefer to wait for the upstream fix?
Setting a dummy cell on a molecule may not be physically meaningful, but the neighbor list and energy/force should still be correct, as vesin and cell_list already are for the same input.
I'm happy to open PRs once we agree on the directions.
Summary
nvalchemi-toolkit-ops's batch naive neighbor list method (
batch_naive_neighbor_list), which is torch_sim's default neighbor list, silently returns an incorrect neighbor list (and therefore wrong energies/forces) for non-periodic systems that still carry a non-zero cell.Issue in the upstream repository: NVIDIA/nvalchemi-toolkit-ops#104
It shows up in common setups:
pbc=[F,F,F]) carrying a dummy cell (e.g.ase.Atoms.get_cell(complete=True)returns a 1 Å identity)pbc=[T,T,F]) whose atoms straddle the non-periodic z boundaryIt is silent: the neighbor count can look right, but actually distances/energies etc are wrong.
Downstream, e.g. a MACE single point on an isolated molecule comes back with the wrong energy.
Note that only the default (naive) backend is affected.
The other backends, such as
vesin_nland alchemiopscell_list, return correct values.Code sample
Minimal reproduction: a
pbc=FalseH2O molecule carrying a 1 Å dummy cell, run throughts.staticwith the MACE-MPA-0 model while swapping the neighbor list backend, and compared against a plain ASE MACE single point.Output — only the default naive backend disagrees with ASE, while cell_list and vesin reproduce the reference energy exactly:
Possible solutions (looking for guidance)
alchemiops_nl_n2, whennot pbc.any(), call the underlyingbatch_naive_neighbor_listwithpbc=Noneandcell=None(the no-PBC kernel, which skips wrapping entirely).pbc=Nonewhilecellis still passed raisesValueError("If cell is provided, pbc must also be provided"), so the cell must be dropped too; the no-PBC path returns a 2-tuple, which the existinglen(res) == 3handling inalchemiops_nl_n2already absorbs as zero shifts. Cheap, safe, and fixes the molecule case immediately. Slabs / partial pbc need more, and I think it's better to wait for upstream fix to mitigate the complexity.Questions:
I'm happy to open PRs once we agree on the directions.