Home
SW4's parallel scaling has not yet plateaued at the tested core and node counts for either strong or weak scaling. The super-linear scaling observed at some points is not completely understood, but may simply reflect random run-to-run variation. In practice, four nodes is the largest configuration that can be reliably scheduled from the Mahuika queue, but the Cascade investigation will be extended to higher node counts to identify where scaling eventually plateaus.
| HPC (binary build) | Throughput (Giga cell-updates / core-hour) | Scaling efficiency (%) |
|---|---|---|
| Cascade (znver4) | 3.5 | 99 |
| Mahuika Genoa (znver4) | 3.0 | 96 |
| Mahuika Genoa (znver3) | 2.8 | 90* |
| Mahuika Milan (znver3) | 1.6 | 90* |
| RCH (znver3) | 1.4 | 90 |

\* Estimated as the largest efficiency drop that could be hidden by the inter-run variability.
Throughputs are the median across the four weak-scaling runs (see the figure below). Enabling SW4's optional NaN check adds approximately 5 % overhead.
Cascade's throughput drops by roughly 30 % for simulation domains shaped as thin slabs, like those used in the strong-scaling investigations (see the lower panel of the figure below). The cause isn't fully understood, but likely relates to the interplay of its processor architecture and memory system.
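As a rough illustration of how the throughput figures in the table translate into a compute budget, the sketch below converts a cell-update count into core-hours and an ideal-scaling wall-clock time. It relies only on the table's unit (Giga cell-updates per core-hour). The function name, its parameters, and the example workload of 10^12 cell-updates on 100 cores are hypothetical and chosen for illustration; the optional overhead factor is shown here with the approximate 5 % NaN-check cost.

```python
def estimate_cost(cell_updates, throughput_gcu_per_core_hour, n_cores,
                  overhead_fraction=0.0):
    """Estimate (core_hours, wall_clock_hours) for a simulation.

    cell_updates: total number of cell-updates in the run (an input here;
        this sketch does not assume how it is derived from the domain).
    throughput_gcu_per_core_hour: throughput in Giga cell-updates per
        core-hour, e.g. a value from the table above.
    n_cores: number of cores used; ideal scaling is assumed.
    overhead_fraction: optional extra cost, e.g. 0.05 for the NaN check.
    """
    if cell_updates < 0:
        raise ValueError("cell_updates must be non-negative")
    if throughput_gcu_per_core_hour <= 0 or n_cores <= 0:
        raise ValueError("throughput and n_cores must be positive")

    # Giga cell-updates per core-hour -> cell-updates per core-hour.
    updates_per_core_hour = throughput_gcu_per_core_hour * 1e9

    core_hours = cell_updates / updates_per_core_hour
    core_hours *= 1.0 + overhead_fraction

    # Ideal scaling: wall-clock time shrinks in proportion to core count.
    wall_clock_hours = core_hours / n_cores
    return core_hours, wall_clock_hours


# Hypothetical workload: 1e12 cell-updates on 100 cores at the Cascade
# (znver4) throughput of 3.5, with the NaN check enabled (about 5 %).
core_hours, wall_hours = estimate_cost(1e12, 3.5, 100, overhead_fraction=0.05)
print(f"{core_hours:.1f} core-hours, {wall_hours:.2f} h wall-clock")
```

For this hypothetical workload the estimate comes to about 300 core-hours, or about 3 hours of wall-clock time on 100 cores. Real runs will take longer wherever scaling efficiency falls below 100 %.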
The memory required for a simulation domain of
where
The total number of cell-updates in an SW4 simulation is given by
The compute,
where
Assuming ideal scaling, the required wall-clock time,
where