Scheduled Halo Exchange by philip-paul-mueller · Pull Request #980 · C2SM/icon4py

philip-paul-mueller · 2025-12-18T12:06:05Z

This PR introduces the scheduled exchange feature from GHEX into ICON4Py.

This exchange allows the exchange function to be called before all work has completed, i.e. the exchange will wait until the previous work is done. A similar feature is the "scheduled wait", which allows the receive to be initiated without the need to wait for its completion.

In addition, this PR renames the functions related to halo exchange:

exchange() was renamed to start().
wait() was renamed to finish() (which might now return before the transfer has fully concluded).
exchange_and_wait() was renamed to exchange().

All of these functions now accept an argument called stream, which defaults to DEFAULT_STREAM. It indicates how synchronization with the stream should be performed.
For start() it means that the actual exchange should not start until all work previously submitted to stream has finished. For finish() it means that further work submitted to stream should not start until the exchange has ended. For finish() it is also possible to specify BLOCK, which means that finish() waits until the transfer has fully finished.
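
The stream semantics described above can be illustrated with a small toy model. This is only a sketch: `ToyExchange`, its event log, the field names, and the local `DEFAULT_STREAM` and `BLOCK` sentinels are hypothetical stand-ins, not the ICON4Py or GHEX implementation. Only the method names `start()`, `finish()`, `exchange()` and the `stream` argument come from the description, and forwarding `stream` to both halves of `exchange()` is an assumption.

```python
class _Sentinel:
    """Tiny named marker object (hypothetical helper for this sketch)."""

    def __init__(self, name: str) -> None:
        self._name = name

    def __repr__(self) -> str:
        return self._name


# Stand-ins for the sentinels named in the PR description.
DEFAULT_STREAM = _Sentinel("DEFAULT_STREAM")
BLOCK = _Sentinel("BLOCK")


class ToyExchange:
    """Records which synchronization each call would request; moves no data."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def start(self, *fields: str, stream=DEFAULT_STREAM) -> None:
        # The transfer must not begin before prior work on `stream` is done.
        self.log.append(f"start {','.join(fields)} after work on {stream!r}")

    def finish(self, stream=DEFAULT_STREAM) -> None:
        if stream is BLOCK:
            # The host blocks until the transfer has fully completed.
            self.log.append("finish: block host until transfer is complete")
        else:
            # May return early; later work on `stream` waits for the transfer.
            self.log.append(f"finish: later work on {stream!r} waits for transfer")

    def exchange(self, *fields: str, stream=DEFAULT_STREAM) -> None:
        # Former exchange_and_wait(): start and finish in one call.
        self.start(*fields, stream=stream)
        self.finish(stream=stream)


ex = ToyExchange()
ex.start("field_a", "field_b")  # formerly exchange()
ex.finish(stream=BLOCK)         # formerly wait(), here with full blocking
ex.exchange("field_c")          # formerly exchange_and_wait()
for line in ex.log:
    print(line)
```

The log makes the difference visible: only the `BLOCK` variant of `finish()` stalls the host, while the stream variants merely order the transfer relative to work on the stream.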

The orchestrator was not updated, but the changes were made in such a way that it continues to work in diffusion, although using the original, blocking behaviour.

Note:
The CI fails for cscs/extra, but it also fails on current main; see this test PR: #982

philip-paul-mueller · 2025-12-18T12:06:36Z

cscs-ci run default

philip-paul-mueller · 2025-12-18T12:06:40Z

cscs-ci run extra

philip-paul-mueller · 2025-12-18T14:45:26Z

cscs-ci run default

philip-paul-mueller · 2025-12-18T14:45:28Z

cscs-ci run dace

philip-paul-mueller · 2025-12-18T14:45:32Z

cscs-ci run extra

**NOTE:** This commit still follows the old nomenclature, where `None` means the default stream. Most likely this will change so that `None` means "not using the `schedule_*()` functions" and another singleton is used for the default stream.

philip-paul-mueller · 2025-12-19T05:48:53Z

cscs-ci run default

- There are now two protocols that describe how to extract the underlying address. They are probably at the wrong location.
- `stream=None` no longer means "default stream" but is now equivalent to "do not use the scheduled version".
- To indicate the default stream, the singleton `DefaultStream` is used. The `cupy.cuda.Stream.null` singleton was not used, because it would require `cupy` to be present.
- However, using the default stream is still the default behaviour.
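
How `stream=None` and the default-stream singleton might be told apart is sketched below. Everything here is illustrative: the helper names `stream_address` and `select_mode`, the `__cuda_stream__` and `ptr` lookups standing in for the two address protocols, and the use of address `0` for the default stream are assumptions, not the code in this PR.

```python
from typing import Any, Optional, Tuple


class DefaultStreamType:
    """cupy-free stand-in for the default stream (hypothetical sketch)."""

    def __cuda_stream__(self) -> Tuple[int, int]:
        # (protocol version, stream address); address 0 is assumed to
        # denote the default stream in this sketch.
        return (0, 0)


DefaultStream = DefaultStreamType()


def stream_address(stream: Any) -> int:
    """Extract the raw stream address via one of two assumed protocols."""
    getter = getattr(stream, "__cuda_stream__", None)
    if callable(getter):
        return int(getter()[1])
    ptr = getattr(stream, "ptr", None)  # cupy-style attribute
    if ptr is not None:
        return int(ptr)
    raise TypeError(f"cannot extract a stream address from {stream!r}")


def select_mode(stream: Optional[Any] = DefaultStream) -> Tuple[str, Optional[int]]:
    """Decide between the plain and the scheduled exchange variant."""
    if stream is None:
        # None now means: do not use the scheduled functions at all.
        return ("unscheduled", None)
    return ("scheduled", stream_address(stream))


class FakeCupyStream:
    """Hypothetical object exposing only a cupy-like `ptr` attribute."""

    ptr = 42


print(select_mode())                  # default argument: scheduled on the default stream
print(select_mode(None))              # explicit opt-out of scheduling
print(select_mode(FakeCupyStream()))  # address taken from `ptr`
```

Keeping `None` as the opt-out and a dedicated singleton as the default means the common call without arguments still schedules on the default stream, without importing `cupy`.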

philip-paul-mueller · 2025-12-19T06:43:22Z

cscs-ci run default

philip-paul-mueller · 2025-12-19T06:43:26Z

cscs-ci run dace

philip-paul-mueller · 2025-12-19T06:43:31Z

cscs-ci run extra

philip-paul-mueller · 2025-12-19T07:50:24Z

cscs-ci run default

philip-paul-mueller · 2025-12-19T07:50:28Z

cscs-ci run dace

philip-paul-mueller · 2025-12-19T07:50:33Z

cscs-ci run extra

philip-paul-mueller · 2025-12-19T13:34:14Z

There is a failure in extra; however, this error is also present on main.

See this test PR: #982

philip-paul-mueller · 2025-12-19T14:10:31Z

cscs-ci run default

philip-paul-mueller · 2025-12-19T14:10:33Z

cscs-ci run dace

msimberg · 2026-03-17T16:30:28Z

cscs-ci run default

msimberg · 2026-03-17T16:30:33Z

cscs-ci run distributed

msimberg · 2026-03-17T16:39:17Z

cscs-ci run default

msimberg · 2026-03-17T16:39:22Z

cscs-ci run distributed

havogt · 2026-03-17T17:22:44Z

+"""
+
+
+class Block:


Suggested change: rename `class Block:` to `class BlockType:` to avoid accidentally passing `Block` instead of `BLOCK`. Or alternatively call it `_Block` and use `type[BLOCK]` as annotation? Not sure which option is best, but currently it's just too tempting to pass `Block`...

Actually, we need to make this a proper singleton, otherwise we might have problems if someone does `Block()`.

Maybe this? @egparedes

    class BlockType:
        _instance = None

        def __new__(cls):
            if cls._instance is None:
                cls._instance = super().__new__(cls)
            return cls._instance


    BLOCK = BlockType()

According to SO this should be the correct way, although the SO answer is way more fancy, but do we need that?

I have implemented it, but improvements are appreciated.
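
A quick check of the pattern proposed in this thread shows why it addresses the concern about calling the class: every construction returns the same object, so an identity test against `BLOCK` stays reliable. The helper `is_blocking` is a hypothetical name added for illustration only and is not part of the PR.

```python
class BlockType:
    _instance = None

    def __new__(cls):
        # Create the single instance on first use, then always reuse it.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance


BLOCK = BlockType()


def is_blocking(stream: object) -> bool:
    """Hypothetical helper: identity check against the singleton."""
    return stream is BLOCK


print(BlockType() is BLOCK)      # True: a second construction yields the same object
print(is_blocking(BlockType()))  # True: identity checks keep working
print(is_blocking(BlockType))    # False: the class itself is still not BLOCK
```

The last line is the case the rename to `BlockType` guards against: passing the class is not caught by the singleton alone, only made less tempting by the name.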

havogt

lgtm

Co-authored-by: Mikael Simberg <mikael.simberg@iki.fi>

havogt · 2026-03-18T08:46:06Z

cscs-ci run default

havogt · 2026-03-18T08:46:15Z

cscs-ci run distributed

havogt · 2026-03-18T10:52:07Z

cscs-ci run default

havogt · 2026-03-18T10:52:11Z

cscs-ci run distributed

github-actions · 2026-03-18T10:52:35Z

Mandatory Tests

Please make sure you run these tests via comment before you merge!

cscs-ci run default
cscs-ci run distributed

Optional Tests

To run benchmarks you can use:

cscs-ci run benchmark-bencher

To run tests and benchmarks with the DaCe backend you can use:

cscs-ci run dace

To run test levels ignored by the default test suite (mostly simple datatest for static fields computations) you can use:

cscs-ci run extra

For more detailed information please look at CI in the EXCLAIM universe.

The request was from an early state. We'll address further cleanup in future PRs.

* main: (29 commits)
  Scheduled Halo Exchange (#980)
  Add missing metrics fields to `test_parallel_grid_manager.py` test (#1114)
  Muphys: Lowering with single precision (#1101)
  Add single-rank lsq pseudoinv factory test (#1099)
  Cleanup Diffusion config (#1060)
  Fortran bindings: fix numpy allocation and cleanups (#1112)
  fix: fix gt4py metrics extractor in the StencilTest benchmarking (#1111)
  py2fgen: don't recompile if unchanged (#1110)
  CI for standalone_driver (#1070)
  Update mpi4py and pymetis groups to make them optional (#1100)
  Bump mshick/add-pr-comment from 2 to 3 (#1109)
  Use inout fields for full_muphys as well (#1108)
  Update GPU configuration for graupel (#1104)
  Move the mask of _q_t_update outside in graupel (#1093)
  Update gt4py to v1.1.7 (#1105)
  cleanup for ugly if condition of single node default in lsq coeffs (#1103)
  Domain decomposition and halo construction (#540)
  Muphys: Add flag to wait for graupel completion (#1095)
  Give each gt4py program a return type hint (#1087)
  Turn data download off for distributed CI (#1092)
  ...

* main:
  Scheduled Halo Exchange (#980)
  Add missing metrics fields to `test_parallel_grid_manager.py` test (#1114)
  Muphys: Lowering with single precision (#1101)
  Add single-rank lsq pseudoinv factory test (#1099)
  Cleanup Diffusion config (#1060)
  Fortran bindings: fix numpy allocation and cleanups (#1112)
  fix: fix gt4py metrics extractor in the StencilTest benchmarking (#1111)
  py2fgen: don't recompile if unchanged (#1110)
  CI for standalone_driver (#1070)
  Update mpi4py and pymetis groups to make them optional (#1100)
  Bump mshick/add-pr-comment from 2 to 3 (#1109)
  Use inout fields for full_muphys as well (#1108)
  Update GPU configuration for graupel (#1104)
  Move the mask of _q_t_update outside in graupel (#1093)
  Update gt4py to v1.1.7 (#1105)
  cleanup for ugly if condition of single node default in lsq coeffs (#1103)

[PR#980](#980) introduced streams into the halo exchanges. For this it also added `DEFAULT_STREAM`, which models the default stream and implements the [CUDA Stream Protocol](https://nvidia.github.io/cuda-python/cuda-core/latest/interoperability.html#cuda-stream-protocol). However, the original implementation identified itself as protocol version `1` instead of version `0`. Because of a related bug in [GHEX](ghex-org/GHEX#202) this error was hidden. This PR fixes the Python implementation and also updates GHEX.
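
For context, the CUDA Stream Protocol asks an object to expose a `__cuda_stream__()` method returning a 2-tuple of the protocol version (`0`) and the stream address. The sketch below shows a consumer-side version check of the kind that would have surfaced the mismatch. The function `read_stream_handle`, both class names, and the address `0` for the default stream are assumptions made for illustration; this is not the ICON4Py or GHEX code.

```python
from typing import Any, Tuple

SUPPORTED_VERSION = 0  # protocol version named in the text above


class FixedDefaultStream:
    """Reports version 0, as the follow-up fix describes."""

    def __cuda_stream__(self) -> Tuple[int, int]:
        return (0, 0)  # (version, address); address 0 is an assumption


class BuggyDefaultStream:
    """Mimics the original behaviour of identifying as version 1."""

    def __cuda_stream__(self) -> Tuple[int, int]:
        return (1, 0)


def read_stream_handle(obj: Any) -> int:
    """Return the stream address, rejecting unknown protocol versions."""
    version, handle = obj.__cuda_stream__()
    if version != SUPPORTED_VERSION:
        raise ValueError(f"unsupported __cuda_stream__ version: {version}")
    return handle


print(read_stream_handle(FixedDefaultStream()))
try:
    read_stream_handle(BuggyDefaultStream())
except ValueError as err:
    print(f"rejected: {err}")
```

A consumer that skips this check accepts both objects, which matches the description of how the wrong version number went unnoticed.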

philip-paul-mueller added 2 commits December 18, 2025 09:26

Modified versions.

26d685b

Made some adaptations towards the asynchronous exchange.

6518ce9

More uniformity.

ec7fca2

havogt reviewed Dec 18, 2025

Comment thread model/common/src/icon4py/model/common/decomposition/definitions.py Outdated

havogt reviewed Dec 18, 2025

Comment thread model/common/src/icon4py/model/common/decomposition/definitions.py Outdated

philip-paul-mueller added 2 commits December 18, 2025 15:43

Updated ghex version.

1f5e9e6

Fixed at least that issue.

e69cb82

Made the components aware of async stuff.

f60a1f8

philip-paul-mueller added 2 commits December 19, 2025 07:30

Fixed some stray stream argument.

e11da41

philip-paul-mueller added 2 commits December 19, 2025 08:10

Realized that the streams are disabled.

383f959

Let's see if that helps, but it is strange that it takes longer now.

41322f7

philip-paul-mueller requested a review from havogt December 19, 2025 08:45

Updated ghex version.

ae6db39

philip-paul-mueller changed the title ~~[DO NOT MERGE]: Scheduled Halo Exchange~~ Scheduled Halo Exchange Dec 19, 2025

msimberg reviewed Mar 17, 2026

Comment thread model/common/src/icon4py/model/common/decomposition/mpi_decomposition.py Outdated

msimberg added 3 commits March 17, 2026 17:24

Update model/common/src/icon4py/model/common/decomposition/mpi_decomp…

e226f9d

…osition.py

Point to test.pypi for ghex

a99d701

Remove custom ghex branch

e648e54

Merge remote-tracking branch 'origin/main' into phimuell__async_mpi_test

4ad834e

havogt reviewed Mar 17, 2026

philip-paul-mueller added 2 commits March 18, 2026 07:59

Made BLOCK a singleton.

0786fb8

Merge remote-tracking branch 'origin/main' into phimuell__async_mpi_test

1d0254e

havogt approved these changes Mar 18, 2026

Update model/common/src/icon4py/model/common/metrics/metrics_factory.py

61f9873

Co-authored-by: Mikael Simberg <mikael.simberg@iki.fi>

havogt reviewed Mar 18, 2026

Comment thread pyproject.toml Outdated

philip-paul-mueller added 2 commits March 18, 2026 09:34

Updated GHEX

7cde49a

Fixed formatting.

5be457a

msimberg approved these changes Mar 18, 2026

Merge branch 'main' into phimuell__async_mpi_test

8dc3b30

philip-paul-mueller merged commit a967314 into main Mar 18, 2026
54 checks passed

philip-paul-mueller mentioned this pull request Mar 25, 2026

Fixed Cuda Stream Protocol #1123

Merged
