Scheduled Halo Exchange #980

Merged
philip-paul-mueller merged 61 commits into main from phimuell__async_mpi_test
Mar 18, 2026

@philip-paul-mueller

@philip-paul-mueller philip-paul-mueller commented Dec 18, 2025

Collaborator

This PR introduces the scheduled exchange feature from GHEX into ICON4Py.

A scheduled exchange allows the exchange function to be called before all preceding work has completed; the exchange itself waits until that work is done. A similar feature is the "scheduled wait", which allows the receive to be initiated without waiting for its completion.

In addition, this PR renames the functions related to halo exchange:

  • exchange() was renamed to start().
  • wait() was renamed to finish(), which may now return before the transfer has fully concluded.
  • exchange_and_wait() was renamed to exchange().

All of these functions now accept an argument called stream, which defaults to DEFAULT_STREAM. It indicates how synchronization with the stream should be performed.
For start(), it means that the actual exchange does not start until all work previously submitted to stream has finished. For finish(), it means that further work submitted to stream does not start until the exchange has ended. For finish() it is also possible to specify BLOCK, which means that finish() waits until the transfer has fully finished.
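
As a rough illustration of the renamed interface and the stream argument, here is a toy model. It is not the ICON4Py or GHEX implementation: the class `ToyExchange`, its event log, the field names, and the `_Sentinel` helper are all hypothetical, and forwarding the same stream from exchange() to both start() and finish() is an assumption. Only the names start(), finish(), exchange(), DEFAULT_STREAM, and BLOCK come from the description above.

```python
"""Toy model of the renamed halo-exchange interface (hypothetical, not ICON4Py code)."""


class _Sentinel:
    """Small named marker object used for DEFAULT_STREAM and BLOCK in this sketch."""

    def __init__(self, name: str) -> None:
        self._name = name

    def __repr__(self) -> str:
        return self._name


DEFAULT_STREAM = _Sentinel("DEFAULT_STREAM")  # stands in for the real default-stream object
BLOCK = _Sentinel("BLOCK")  # stands in for the real blocking marker


class ToyExchange:
    """Records what a real implementation would ask the communication layer to do."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def start(self, *fields: str, stream: object = DEFAULT_STREAM) -> None:
        # The transfer must not begin before work already submitted to `stream` is done.
        self.log.append(f"start {list(fields)} after work on {stream!r}")

    def finish(self, stream: object = DEFAULT_STREAM) -> None:
        if stream is BLOCK:
            # Blocking behaviour: return only once the transfer has fully finished.
            self.log.append("finish: block until transfer is complete")
        else:
            # May return early; later work on `stream` waits for the transfer instead.
            self.log.append(f"finish: later work on {stream!r} waits for transfer")

    def exchange(self, *fields: str, stream: object = DEFAULT_STREAM) -> None:
        # Formerly exchange_and_wait(): start and finish in one call (assumed forwarding).
        self.start(*fields, stream=stream)
        self.finish(stream=stream)


halo = ToyExchange()
halo.start("vn", "w")      # may be called while earlier work is still running
halo.finish()              # does not necessarily wait on the host
halo.exchange("theta_v")   # start() followed by finish()
halo.start("rho")
halo.finish(stream=BLOCK)  # explicit wait until the transfer is complete
```

Inspecting `halo.log` afterwards shows the order of requests; in a real setup the ordering guarantees would be enforced by the stream rather than by a Python list.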

The orchestrator was not updated, but the changes were made in such a way that it continues to work in diffusion, although using the original, blocking behaviour.

Note:
The CI fails for cscs/extra, but it also fails on current main; see this test PR: #982

@philip-paul-mueller

Collaborator Author

cscs-ci run default

@philip-paul-mueller

Collaborator Author

cscs-ci run extra

Comment thread model/common/src/icon4py/model/common/decomposition/definitions.py Outdated
Comment thread model/common/src/icon4py/model/common/decomposition/definitions.py Outdated
@philip-paul-mueller

Collaborator Author

cscs-ci run default

@philip-paul-mueller

Collaborator Author

cscs-ci run dace

@philip-paul-mueller

Collaborator Author

cscs-ci run extra

**NOTE:**
This commit still follows the old nomenclature, where `None` means default stream.
Most likely this will change such that `None` means "not using `schedule_*()` functions" and another singleton is used for the default stream.
@philip-paul-mueller

Collaborator Author

cscs-ci run default

- There are now two protocols that describe how to extract the underlying address.
	They are probably in the wrong location.
- `stream=None` no longer means "default stream" but is now equivalent to "do not use the scheduled version".
- To indicate the default stream, the singleton `DefaultStream` is used.
	The `cupy.cuda.Stream.null` singleton was not used, because it would require that `cupy` is present.
- However, using the default stream is still the default behaviour.
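
A minimal sketch of the convention in the list above, assuming a hypothetical helper `describe_stream()` and a stand-in `DefaultStream` class defined here (the real class lives in ICON4Py and is not reproduced). `None` selects the unscheduled path, the `DefaultStream` singleton selects the default stream without importing `cupy`, and anything else is treated as a user-provided stream.

```python
from typing import Any


class DefaultStream:
    """Stand-in for the singleton described above; no cupy import is needed."""

    _instance = None

    def __new__(cls):
        # Always hand out the same object so identity checks are reliable.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance


DEFAULT_STREAM = DefaultStream()


def describe_stream(stream: Any = DEFAULT_STREAM) -> str:
    """Hypothetical helper: report which exchange path a `stream` argument selects."""
    if stream is None:
        return "unscheduled: do not use the schedule_*() functions"
    if isinstance(stream, DefaultStream):
        return "scheduled on the default stream"
    return "scheduled on a user-provided stream"
```

Calling `describe_stream()` without arguments takes the default-stream branch, which mirrors the last bullet: the default stream remains the default behaviour.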
@philip-paul-mueller

Collaborator Author

cscs-ci run default

@philip-paul-mueller

Collaborator Author

cscs-ci run dace

@philip-paul-mueller

Collaborator Author

cscs-ci run extra

@philip-paul-mueller

Collaborator Author

cscs-ci run default

@philip-paul-mueller

Collaborator Author

cscs-ci run dace

@philip-paul-mueller

Collaborator Author

cscs-ci run extra

@philip-paul-mueller

philip-paul-mueller commented Dec 19, 2025

Collaborator Author

There is a failure in extra; however, this error is also present on main.

See this test PR: #982

@philip-paul-mueller

Collaborator Author

cscs-ci run default

@philip-paul-mueller

Collaborator Author

cscs-ci run dace

@philip-paul-mueller philip-paul-mueller changed the title from "[DO NOT MERGE]: Scheduled Halo Exchange" to "Scheduled Halo Exchange" Dec 19, 2025
Comment thread model/common/src/icon4py/model/common/decomposition/mpi_decomposition.py Outdated
@msimberg

Contributor

cscs-ci run default

@msimberg

Contributor

cscs-ci run distributed

@msimberg

Contributor

cscs-ci run default

@msimberg

Contributor

cscs-ci run distributed

"""


class Block:

Contributor

Suggested change
- class Block:
+ class BlockType:

to avoid accidentally passing Block instead of BLOCK. Or alternatively call it _Block and use type[BLOCK] as the annotation? Not sure which option is best, but currently it's just too tempting to pass Block...

Contributor

Actually, we need to make this a proper singleton, otherwise we might have problems if someone does Block().

@havogt havogt Mar 17, 2026

Contributor

Maybe this? @egparedes

class BlockType:
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

BLOCK = BlockType()
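
To make the intent of the snippet concrete, here is a self-contained check, written as a sketch. With `__new__` overridden this way, calling `BlockType()` again returns the existing object, so an identity test against `BLOCK` still succeeds. The function `is_blocking()` is a hypothetical helper and not part of the PR.

```python
class BlockType:
    """Singleton marker type, as proposed in the comment above."""

    _instance = None

    def __new__(cls):
        # Create the instance once and hand out the same object afterwards.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance


BLOCK = BlockType()


def is_blocking(stream: object) -> bool:
    """Hypothetical helper: identity check against the singleton."""
    return stream is BLOCK


constant_ok = is_blocking(BLOCK)        # the module-level constant
new_call_ok = is_blocking(BlockType())  # a fresh call returns the same object
class_ok = is_blocking(BlockType)       # the class itself is a different object
```

Note that the singleton only addresses the `BlockType()` case; passing the class instead of the instance is still not recognised as `BLOCK`, which is the mistake the rename is meant to make less likely.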

Collaborator Author

According to SO this should be the correct way, although the SO answer is way more fancy, but do we need that?

Collaborator Author

I have implemented it, but improvements are appreciated.

@havogt havogt left a comment

Contributor

lgtm

Co-authored-by: Mikael Simberg <mikael.simberg@iki.fi>
Comment thread pyproject.toml Outdated
@havogt

havogt commented Mar 18, 2026

Contributor

cscs-ci run default

@havogt

havogt commented Mar 18, 2026

Contributor

cscs-ci run distributed

@havogt

havogt commented Mar 18, 2026

Contributor

cscs-ci run default

@havogt

havogt commented Mar 18, 2026

Contributor

cscs-ci run distributed

@github-actions

Mandatory Tests

Please make sure you run these tests via comment before you merge!

  • cscs-ci run default
  • cscs-ci run distributed

Optional Tests

To run benchmarks you can use:

  • cscs-ci run benchmark-bencher

To run tests and benchmarks with the DaCe backend you can use:

  • cscs-ci run dace

To run test levels ignored by the default test suite (mostly simple datatest for static fields computations) you can use:

  • cscs-ci run extra

For more detailed information please look at CI in the EXCLAIM universe.

@havogt havogt dismissed muellch’s stale review March 18, 2026 11:49

request was from an early state. we'll address further cleanup in future PRs.

@philip-paul-mueller philip-paul-mueller merged commit a967314 into main Mar 18, 2026
54 checks passed
jcanton added a commit that referenced this pull request Mar 18, 2026
* main: (29 commits)
  Scheduled Halo Exchange (#980)
  Add missing metrics fields to `test_parallel_grid_manager.py` test (#1114)
  Muphys: Lowering with single precision (#1101)
  Add single-rank lsq pseudoinv factory test (#1099)
  Cleanup Diffusion config (#1060)
  Fortran bindings: fix numpy allocation and cleanups (#1112)
  fix: fix gt4py metrics extractor in the StencilTest benchmarking (#1111)
  py2fgen: don't recompile if unchanged (#1110)
  CI for standalone_driver (#1070)
  Update mpi4py and pymetis groups to make them optional (#1100)
  Bump mshick/add-pr-comment from 2 to 3 (#1109)
  Use inout fields for full_muphys as well (#1108)
  Update GPU configuration for graupel (#1104)
  Move the mask of _q_t_update outside in graupel (#1093)
  Update gt4py to v1.1.7 (#1105)
  cleanup for ugly if condition of single node default in lsq coeffs (#1103)
  Domain decomposition and halo construction (#540)
  Muphys: Add flag to wait for graupel completion (#1095)
  Give each gt4py program a return type hint (#1087)
  Turn data download off for distributed CI (#1092)
  ...
jcanton added a commit that referenced this pull request Mar 19, 2026
* main:
  Scheduled Halo Exchange (#980)
  Add missing metrics fields to `test_parallel_grid_manager.py` test (#1114)
  Muphys: Lowering with single precision (#1101)
  Add single-rank lsq pseudoinv factory test (#1099)
  Cleanup Diffusion config (#1060)
  Fortran bindings: fix numpy allocation and cleanups (#1112)
  fix: fix gt4py metrics extractor in the StencilTest benchmarking (#1111)
  py2fgen: don't recompile if unchanged (#1110)
  CI for standalone_driver (#1070)
  Update mpi4py and pymetis groups to make them optional (#1100)
  Bump mshick/add-pr-comment from 2 to 3 (#1109)
  Use inout fields for full_muphys as well (#1108)
  Update GPU configuration for graupel (#1104)
  Move the mask of _q_t_update outside in graupel (#1093)
  Update gt4py to v1.1.7 (#1105)
  cleanup for ugly if condition of single node default in lsq coeffs (#1103)
philip-paul-mueller added a commit that referenced this pull request Mar 27, 2026
[PR#980](#980) introduced streams
into the halo exchanges. For this it also added `DEFAULT_STREAM`, which models the
default stream and implements the [CUDA Stream Protocol](https://nvidia.github.io/cuda-python/cuda-core/latest/interoperability.html#cuda-stream-protocol). However, the original
implementation identified as protocol version `1` instead of version `0`.
Because of a related bug in [GHEX](ghex-org/GHEX#202),
this error was hidden.

This PR fixes the Python implementation and also updates GHEX.
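
For context, a sketch of what a default-stream object following the CUDA Stream Protocol could look like. The protocol asks for a `__cuda_stream__()` method returning a `(version, handle)` tuple, and the fix described above concerns reporting version `0`. The class name `DefaultStreamSketch` and the handle value `0` (taken here to denote the null stream) are assumptions for illustration; the actual ICON4Py code is not reproduced.

```python
class DefaultStreamSketch:
    """Hypothetical default-stream object; not the actual ICON4Py class."""

    def __cuda_stream__(self) -> tuple[int, int]:
        protocol_version = 0  # version the fix reports (the original reported 1)
        stream_handle = 0     # assumption: address 0 denotes the null (default) stream
        return (protocol_version, stream_handle)


version, handle = DefaultStreamSketch().__cuda_stream__()
```

A consumer that validates the version field would reject a `1` here, which is presumably why the mismatch only surfaced once the related GHEX bug was fixed.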
6 participants