DArray/stencil: Allocate HaloArray outside spawn_datadeps #703

Open: jpsamaroo wants to merge 7 commits into master from jps/stencil-outer-alloc

jpsamaroo (Member) commented May 22, 2026

This PR changes the stencil code to allocate HaloArray objects outside of the spawn_datadeps region, allowing Datadeps to properly parallelize stencil tasks accessing them (no more serial dependency because of UnknownAliasing). This appears to improve performance by about 3x on a 512x512 benchmark with 8 threads, which is pretty nice! I expect that improvement to scale as we increase threads as well.

This PR also caches and reuses allocated HaloArray objects on the same task (caching by DArray and per-chunk-index), netting another 2x performance benefit.
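
To illustrate the caching idea described above, here is a minimal sketch in plain Julia. It is not the code from this PR: `PaddedBuffer`, `BUFFER_CACHE`, and `get_buffer!` are hypothetical names, the buffer type is a stand-in rather than Dagger's `HaloArray`, and the per-task aspect of the real cache is ignored. It only shows the pattern of keying a cache by the parent array's identity plus a chunk index, so that repeated requests reuse one allocation.

```julia
# Hypothetical stand-in for a halo-padded buffer; NOT Dagger's HaloArray.
struct PaddedBuffer
    data::Matrix{Float64}
end

# Hypothetical cache keyed by (identity of the parent array, chunk index).
const BUFFER_CACHE = Dict{Tuple{UInt,Int},PaddedBuffer}()

# Return the cached buffer for this (array, chunk) pair, allocating on first use.
# `dims` is the interior chunk size and `halo` the padding width on each side.
function get_buffer!(parent::AbstractArray, chunk_idx::Int,
                     dims::Tuple{Int,Int}, halo::Int)
    key = (objectid(parent), chunk_idx)
    get!(BUFFER_CACHE, key) do
        PaddedBuffer(zeros(dims[1] + 2halo, dims[2] + 2halo))
    end
end

A = rand(8, 8)
b1 = get_buffer!(A, 1, (4, 4), 1)
b2 = get_buffer!(A, 1, (4, 4), 1)   # same key: reuses the first allocation
b3 = get_buffer!(A, 2, (4, 4), 1)   # different chunk index: new allocation
```

A real cache would also need an eviction or lifetime policy, since keying by `objectid` does not keep the parent array alive; that is left out of this sketch.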

This PR also includes changes to reduce task spawns and memory allocations, which will hopefully help ROCm CI (which currently triggers the OOM killer).

Written by Claude Sonnet and Cursor Composer

jpsamaroo force-pushed the jps/stencil-outer-alloc branch from 1965160 to 7a1416b on May 29, 2026 17:51
jpsamaroo added the gpu label on May 29, 2026