Various updates for GMAO, NAS, and NCCS #2029

Draft
mathomp4 wants to merge 29 commits into JCSDA:develop from GMAO-SI-Team:feature/nas-nccs-test
@mathomp4
Collaborator

Description

This PR updates the NASA Discover (NCCS) tier1 site configuration to match the new tier2 discover-gmao site, where I'm doing my testing. It modernizes the tier1 discover config and will hopefully let @ashley314 or me use the util/gmao/batch_install.sh script for easier installs, a la @climbfuji.

I've also added py-cmocean as a dependency of geos-gcm-env (see #1502), updated several package versions in the common configs, and included GMAO-internal site and utility improvements.

Dependencies

I guess #2026? Technically, this PR has the same submodule update. I wanted to get this PR in so I didn't forget, and to make sure nothing breaks CI elsewhere.

Issues addressed

Applications affected

  • GEOS (addition of py-cmocean to geos-gcm-env)
  • Anything that might depend on libpng being exactly 1.6.37? But I doubt it.

Systems affected

All platforms:

  • py-cmocean added as a dependency of geos-gcm-env
    • libpng bumped from 1.6.37 to 1.6.55 (needed for py-cmocean)
    • py-matplotlib bumped from 3.7.4 to 3.10.8 (needed for py-cmocean)
  • FMS yaml support enabled in configs/common/packages.yaml
  • repos/builtin submodule updated (libpng, ncurses, openblas, py-matplotlib, git-lfs)
  • geos-dev and geos-dev-nag templates cleaned up for dual esmf debug/release builds

NASA Discover (NCCS) — new tier1 site config:

  • Replaces the old discover-scu17 placeholder with a full discover site config
  • Includes external package definitions for gcc 14.2.0 + oneapi 2024.2.0 and 2025.3.0 compilers

GMAO-internal (util/gmao, tier2 sites):

  • Various improvements to batch_install.sh (scheduler handling, logging, workarounds)
  • New utility scripts: monitor_install.py (monitor running installs) and patch_ecbuild_ectrans.py (ectrans/oneapi workaround at NAS)
  • New discover-gmao tier2 site (gcc 15.2.0 + oneapi, GMAO-specific cache paths)
  • NAS gcc 13→14 upgrade, macos.gmao NAG openblas static-only workaround

Testing

  • CI: Note whether the automatic tests (GitHub actions tests that run automatically for every commit) pass or not
    • GitHub actions CI tests pass
    • GitHub actions CI tests do not pass (provide explanation)
    • GitHub actions CI tests skipped (provide explanation if necessary)
  • New tests added: List and describe any new tests added to GitHub actions
    • ...
  • Additional testing: Add information on any additional tests conducted
    • ...

Checklist

  • This PR addresses one issue/problem/enhancement or has a very good reason for not doing so.
  • These changes have been tested on the affected systems and applications.
  • All dependency PRs/issues have been resolved and this PR can be merged.
  • All necessary updates to the documentation (spack-stack wiki) will be made when this PR is merged

mathomp4 changed the title from "Move GEOS envs to ESMF 9.0.0b11d" to "Various updates for GMAO, NAS, and NCCS" on May 29, 2026
mathomp4 self-assigned this on May 29, 2026
mathomp4 requested a review from climbfuji on May 29, 2026 19:04
Comment thread configs/sites/tier1/discover/packages_oneapi-2024.2.0.yaml Outdated
Comment thread configs/sites/tier1/discover/packages_oneapi-2025.3.0.yaml Outdated
  esmf:
    require:
    - +python
    - '@=9.0.0b11'
Collaborator

That's a great idea

Collaborator Author

This was part of my exploration of ESMF debug handling on my Mac. For reasons I don't yet know, ESMF_BOPT=O just doesn't seem to work with gfortranclang (at least).

I was trying everything to figure out how to say "if on Mac, only build esmf +debug, but build both debug and non-debug elsewhere". Eventually the "fix" was a hack in my install script that just removes a line from spack.yaml before concretizing, or something like that. 🤷🏼
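
For reference, a hack of the kind described above could look roughly like the sketch below. This is not the code in the install script: the function name `drop_spec_on_mac`, the example spec string `esmf ~debug`, and the optional third argument (an OS name override, there only to make the function testable) are all assumptions for illustration.

```shell
# Sketch only: drop every line containing a fixed string from spack.yaml,
# but only on macOS. All names here are hypothetical, not from batch_install.sh.
drop_spec_on_mac() {
  local spack_yaml="$1"              # path to the environment's spack.yaml
  local pattern="$2"                 # fixed string identifying the line to drop
  local os_name="${3:-$(uname -s)}"  # override for testing; defaults to this host

  # Only macOS (Darwin) gets the workaround; other systems keep the file as is.
  [ "${os_name}" = "Darwin" ] || return 0

  local tmp
  tmp="$(mktemp)"
  # grep -v exits non-zero when every line matches, so do not treat that as fatal.
  grep -vF -- "${pattern}" "${spack_yaml}" > "${tmp}" || true
  # Write back through the original file so its permissions are preserved.
  cat "${tmp}" > "${spack_yaml}"
  rm -f "${tmp}"
}
```

It would be called once before `spack concretize`, for example `drop_spec_on_mac spack.yaml 'esmf ~debug'`. Filtering on a fixed string is fragile if the spec line is ever reworded, which is part of why this counts as a hack rather than a fix.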

fi
slurm_log="${job_name}.log"
echo "INFO: salloc output redirected to ${slurm_log}"
salloc --nodes=1 --ntasks-per-node=${tpn} --time=${walltime} \
Collaborator

It looks like, on Blueback at least, this isn't working as expected. When run from GitHub Actions, ${script} is still executed on the login node; it works from the command line, though. Replacing salloc with srun should fix this, but I am still testing.

Collaborator Author

Huh. It works as expected at NCCS. This might be due to how SLURM is set up on discover? When I do an salloc there, it drops me into a bash shell.

But I know on some SLURM setups I've been on, you need to do things like srun --pty bash to get that.
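
To make the srun suggestion concrete, here is a minimal sketch of a launcher that does not depend on where salloc happens to start its command. It is untested on Blueback or discover. The function name `run_on_compute_node` and the `DRY_RUN` switch are hypothetical; `tpn`, `walltime`, and `script` mirror the variables in the snippet under review.

```shell
# Sketch only: run the install script as an srun job step so that it lands
# on a compute node regardless of the site's salloc configuration.
run_on_compute_node() {
  local tpn="$1" walltime="$2" script="$3"

  # --ntasks=1 keeps srun from starting one copy of the script per task;
  # --cpus-per-task hands the single task all requested cores instead.
  set -- srun --nodes=1 --ntasks=1 "--cpus-per-task=${tpn}" \
         "--time=${walltime}" bash "${script}"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"   # print the command instead of submitting it
  else
    "$@"
  fi
}
```

Note the switch from `--ntasks-per-node=${tpn}` to a single task with `--cpus-per-task=${tpn}`: under srun, the original option would launch `${tpn}` copies of the script. Running `DRY_RUN=1 run_on_compute_node 24 02:00:00 ./install.sh` prints the command without contacting the scheduler.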

Development

Successfully merging this pull request may close these issues.

  • [INSTALL]: openblas 0.3.32
  • [INSTALL]: libpng 1.6.55
  • [INSTALL]: py-matplotlib 3.10.8

2 participants