Skip to content

Naive neighbor list is wrong for non-periodic systems carrying a cell #575

@lil-lon

Description

@lil-lon

Summary

nvalchemi-toolkit-ops's batch naive neighbor list method (batch_naive_neighbor_list), which is torch_sim's default neighbor list, silently returns an incorrect neighbor list (and therefore wrong energies/forces) for non-periodic systems that still carry a non-zero cell.
Issue in the upstream repository: NVIDIA/nvalchemi-toolkit-ops#104

It shows up in common setups:

  • a molecule (pbc=[F,F,F]) carrying a dummy cell (e.g. ase.Atoms.get_cell(complete=True) returns a 1 Å identity)
  • a slab (pbc=[T,T,F]) whose atoms straddle the non-periodic z boundary

It is silent: the neighbor count can look right, but actually distances/energies etc are wrong.
Downstream, e.g. a MACE single point on an isolated molecule comes back with the wrong energy.

Note that only the default (naive) backend is affected.
The other backends, such as vesin_nl and alchemiops cell_list, return correct values.

Code sample

Minimal reproduction: a pbc=False H2O molecule carrying a 1 Å dummy cell, run through ts.static with the MACE-MPA-0 model while swapping the neighbor list backend, and compared against a plain ASE MACE single point.

import torch
from ase.build import molecule
from mace.calculators import mace_mp

import torch_sim as ts
from torch_sim.models.mace import MaceModel
from torch_sim.neighbors import (
    alchemiops_nl_cell_list,
    alchemiops_nl_n2,
    vesin_nl,
)

device = "cuda"
dtype = torch.float64
MODEL = "medium-mpa-0"

# Non-periodic H2O carrying a 1 A dummy cell. pbc stays False, so the energy
# must be cell-independent and match a plain ASE MACE single point.
atoms = molecule("H2O")
atoms.set_cell([1.0, 1.0, 1.0])

# Reference: the standard ASE MACE calculator, which uses its own
# neighbor list and is unaffected by the dummy cell.
ref_atoms = atoms.copy()
ref_atoms.calc = mace_mp(
    model=MODEL, dispersion=False, device=device, default_dtype="float64"
)
ref_energy = ref_atoms.get_potential_energy()

raw_model = mace_mp(
    model=MODEL,
    dispersion=False,
    device=device,
    default_dtype="float64",
    return_raw_model=True,
)


def energy_with(neighbor_list_fn):
    model = MaceModel(
        model=raw_model,
        device=torch.device(device),
        dtype=dtype,
        neighbor_list_fn=neighbor_list_fn,
        compute_forces=True,
        compute_stress=True,
    )
    result = ts.static(system=[atoms.copy()], model=model)[0]
    return result["potential_energy"].reshape(-1)[-1].item()


backends = {
    "naive (default)": alchemiops_nl_n2,
    "cell_list": alchemiops_nl_cell_list,
    "vesin": vesin_nl,
}

print(f"ASE reference   : E = {ref_energy:.6f} eV")
for name, fn in backends.items():
    energy = energy_with(fn)
    delta = energy - ref_energy
    print(f"{name:16s}: E = {energy:.6f} eV  (dE = {delta:+.6f} eV)")

Output — only the default naive backend disagrees with ASE, while cell_list and vesin reproduce the reference energy exactly:

ASE reference   : E = -13.785927 eV
naive (default) : E = 25.357815 eV  (dE = +39.143741 eV)
cell_list       : E = -13.785927 eV  (dE = +0.000000 eV)
vesin           : E = -13.785927 eV  (dE = +0.000000 eV)

Possible solutions (looking for guidance)

  1. Wait for the upstream fix and update
  2. Quick guard for fully non-periodic systems: in alchemiops_nl_n2, when not pbc.any(), call the underlying batch_naive_neighbor_list with pbc=None and cell=None (the no-PBC kernel, which skips wrapping entirely). pbc=None while cell is still passed raises ValueError ("If cell is provided, pbc must also be provided"), so the cell must be dropped too; the no-PBC path returns a 2-tuple, which the existing len(res) == 3 handling in alchemiops_nl_n2 already absorbs as zero shifts. Cheap, safe, and fixes the molecule case immediately. Slabs / partial pbc need more, and I think it's better to wait for upstream fix to mitigate the complexity.

Questions:

  • Would you like a stop-gap guard now (option 2), or prefer to wait for the upstream fix?
  • OK to add the non-periodic dummy-cell test as part of Verify everything that should work with zero volume cell does work. #549?
    • Setting a dummy cell on a molecule may not be physically meaningful, but the neighbor list and energy/force should still be correct, as vesin and cell_list already are for the same input.

I'm happy to open PRs once we agree on the directions.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions