Skip to content

Add PADS generation script and config#136

Open
hanaol wants to merge 6 commits into
mainfrom
hanaol/pads-generation
Open

Add PADS generation script and config#136
hanaol wants to merge 6 commits into
mainfrom
hanaol/pads-generation

Conversation

@hanaol

@hanaol hanaol commented May 19, 2026

Copy link
Copy Markdown
Collaborator

Summary

This PR implements the generation of precomputed atomic density superposition (PADS) inputs for the model. This serves as an alternative to the SAD approach used in VASP and avoids the need to run or invoke VASP during the workflow.

  • Adds src/electrai/pads_generation/create_pads.py: reconstructs charge density grids from atomic radial profiles using a Numba-accelerated parallel kernel
  • Adds src/electrai/configs/config_pads.yaml: config file for the PADS generation pipeline
  • Adds numba as a project dependency

This is also relevant to #96.

Hananeh Oliaei and others added 2 commits May 19, 2026 14:55
Adds create_pads.py to reconstruct charge density grids from atomic
radial profiles using a Numba-accelerated kernel, and config_pads.yaml
to src/electrai/configs/. Also adds numba as a project dependency.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
zarr v3 raises an error when chunks=False is passed to root.create().
Resolve falsy chunks to the full array shape (single chunk) instead.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@hanaol

hanaol commented May 19, 2026

Copy link
Copy Markdown
Collaborator Author

@claude review this PR

@claude

claude Bot commented May 19, 2026

Copy link
Copy Markdown

Claude finished @hanaol's task in 1m 34s —— View job


PR Review: Add PADS generation script and config

  • Gather context and read changed files
  • Review create_pads.py
  • Review config_pads.yaml
  • Review zarr_writer.py changes
  • Review dependency additions
  • Post final review

The implementation is generally clean and the Numba-accelerated kernel is well-structured. A few issues to address before merging:


Potential Bug: nmax underestimation for oblique lattices

create_pads.py:119-120

lat_lengths = np.linalg.norm(cell, axis=1)
nmax = int(np.ceil(r_max / lat_lengths.min())) + extra_shells

lat_lengths are the norms of the lattice vectors, not the perpendicular distances between opposite faces. For oblique (non-orthogonal) cells — which are common in materials datasets — the distance between parallel face planes is V / |a × b| etc., which can be much smaller than the lattice vector lengths. Using lat_lengths.min() will undercount the number of required periodic images, potentially missing atom contributions at the cell boundary.

The correct formula uses the minimum inter-planar distance:

a, b, c = cell[0], cell[1], cell[2]
d_ab = np.linalg.norm(np.cross(a, b)) / np.linalg.norm(cell)  # not right either
# Better: perpendicular heights
h_a = vol / np.linalg.norm(np.cross(b, c))
h_b = vol / np.linalg.norm(np.cross(a, c))
h_c = vol / np.linalg.norm(np.cross(a, b))
nmax = int(np.ceil(r_max / min(h_a, h_b, h_c))) + extra_shells

Fix this →


Implicit assumption: uniform radial spacing

create_pads.py:111

dr = float(r_valid[1] - r_valid[0])

_accumulate_images uses dr as a constant step size for O(1) index lookup, which is only valid if r_valid is uniformly spaced. This invariant is enforced in main() (lines 236–240) but not in reconstruct_from_radial_function. If anyone calls reconstruct_from_radial_function directly with non-uniform data it will silently produce wrong interpolation results. Add a docstring note, or add a cheap assertion:

assert np.allclose(np.diff(r_valid), r_valid[1] - r_valid[0], rtol=1e-3), \
    "r_valid must be uniformly spaced"

Deprecated pymatgen API: site.specie

create_pads.py:173

specie = str(site.specie)

site.specie is deprecated in modern pymatgen. Use site.species_string instead:

specie = site.species_string

Fix this →


Config contains cluster-specific absolute paths

config_pads.yaml:2,5,6

input_file: /scratch/gpfs/ROSENGROUP/ho0950/electrai/data/...
radial_dir: /scratch/gpfs/ROSENGROUP/common/...
zval_file: /scratch/gpfs/ROSENGROUP/common/...

These hardcoded Princeton HPC paths will not work for other users or CI. Consider using placeholder values (e.g. ??? for Hydra-style required fields) or environment variable references, with a comment explaining what each path should point to.


Type hint doesn't reflect False as valid chunks value

zarr_writer.py:29

chunks: tuple[int, int, int] = (16, 16, 16),

After the fix allowing chunks=False, the type hint is stale. Update to:

chunks: tuple[int, int, int] | bool = (16, 16, 16),

Fix this →


Minor: missing __init__.py in pads_generation/

zarr_conversion/ has an __init__.py but pads_generation/ does not, making the package structure inconsistent. Add an empty __init__.py.


No tests

There are no tests for the new PADS generation logic. At minimum, a smoke test with a small synthetic structure (using a known radial profile like a Gaussian) would help verify the normalization and periodic image logic are correct.


Nits

  • zarr_writer.py:50: docstring still says Default: True for write_diff but the default is False. Fix this →
  • create_pads.py:85-87: Adding 0.5/N shifts voxel positions to cell centers — this is intentional, but it would be worth a brief comment explaining the coordinate convention.

@hanaol hanaol requested a review from forklady42 May 19, 2026 20:22
forklady42 and others added 2 commits May 20, 2026 11:21
Resolved conflicts:
- src/electrai/zarr_conversion/zarr_writer.py: took main's version
  of the chunks-fallback fix (semantically equivalent to ours)
- uv.lock: regenerated via 'uv lock' against merged pyproject.toml,
  adding numba 0.65.1 + llvmlite 0.47.0

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the empty __init__.py in zarr_conversion/ so the package
structure is consistent. Addresses PR #136 review feedback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@forklady42

Copy link
Copy Markdown
Collaborator

@claude re-review

@claude

claude Bot commented May 20, 2026

Copy link
Copy Markdown

Claude finished @forklady42's task in 1m 32s —— View job


PR Re-Review: Add PADS generation script and config

  • Read changed files
  • Review create_pads.py
  • Review config_pads.yaml
  • Review dependency additions
  • Post final review

Compared to the previous review, here's the current status of each item:

Addressed since last review

  • __init__.py added to pads_generation/ — fixed.
  • write_diff docstring in zarr_writer.py:50 now correctly says Default: False — fixed.
  • Uniform spacing invariant is now explicitly documented via a comment in main() (lines 234–239) explaining the resampling step — addressed in the right place.
  • zarr_writer.py chunks default changed from (16, 16, 16) to None, which is a better default.

Still outstanding

Bug: nmax underestimation for oblique lattices

create_pads.py:119-120 — unchanged from last review. Using lattice vector lengths instead of perpendicular inter-planar distances will silently miss periodic images for non-orthogonal cells. This is the highest-priority issue given materials datasets frequently have oblique cells.

# Still present:
lat_lengths = np.linalg.norm(cell, axis=1)
nmax = int(np.ceil(r_max / lat_lengths.min())) + extra_shells

Fix using perpendicular heights:

vol = abs(np.dot(cell[0], np.cross(cell[1], cell[2])))
h_a = vol / np.linalg.norm(np.cross(cell[1], cell[2]))
h_b = vol / np.linalg.norm(np.cross(cell[0], cell[2]))
h_c = vol / np.linalg.norm(np.cross(cell[0], cell[1]))
nmax = int(np.ceil(r_max / min(h_a, h_b, h_c))) + extra_shells

Fix this →


Deprecated pymatgen API: site.specie

create_pads.py:173 — unchanged. site.specie is deprecated; use site.species_string directly (no str() wrapper needed):

# Change:
specie = str(site.specie)
# To:
specie = site.species_string

Fix this →


New: chunks=False passed to function that only accepts tuple | None

create_pads.py:193:

write_chgcar_to_zarr(cd, out_zarr, write_diff=False, chunks=False)

zarr_writer.py now declares chunks: tuple[int, int, int] | None = None. Passing chunks=False works at runtime (because False or total_density.shape evaluates to total_density.shape) but it's a type error and reads as unintentional. Use chunks=None instead:

write_chgcar_to_zarr(cd, out_zarr, write_diff=False, chunks=None)

Fix this →


Config contains cluster-specific absolute paths

config_pads.yaml:2,5,6 — unchanged. The input_file, radial_dir, and zval_file paths are hardcoded Princeton HPC paths. Consider replacing with placeholder comments or ??? (Hydra-style required fields) so it's clear these need to be set by the user.


No tests

The PADS reconstruction logic remains untested. A small smoke test using a known radial profile (e.g., a Gaussian) on a synthetic orthogonal cell would catch regressions and validate the normalization/periodic-image logic.


Summary

Issue Status
nmax oblique lattice bug Still present
site.specie deprecation Still present
Config absolute paths Still present
chunks=False type inconsistency New (introduced by zarr_writer change)
__init__.py missing Fixed
write_diff docstring default Fixed
Uniform spacing comment Fixed (via main() comment)
No tests Still absent

The oblique-lattice nmax bug is the only correctness issue; the rest are code quality concerns.

write_chgcar_to_zarr declares chunks: tuple[int, int, int] | None,
so passing False was a type mismatch even though it worked at runtime.
Addresses PR #136 review feedback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@forklady42

Copy link
Copy Markdown
Collaborator

@hanaol I addressed the merge conflicts and most of the code quality concerns. What are your thoughts on Claude's nmax comment and site.specie deprecation? Does the nmax oblique lattice bug impact any of our materials, or is it irrelevant here?

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@hanaol

hanaol commented May 22, 2026

Copy link
Copy Markdown
Collaborator Author

@forklady42 I benchmarked the proposed code (based on nmax) across different materials, and the results were identical to those obtained with our current approach; so, I'll skip the fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants