Skip to content

Fix/perf 3bc capture coalesce#22

Open
jzmiller1 wants to merge 5 commits into
LeonidGrr:masterfrom
jzmiller1:fix/perf-3bc-capture-coalesce
Open

Fix/perf 3bc capture coalesce#22
jzmiller1 wants to merge 5 commits into
LeonidGrr:masterfrom
jzmiller1:fix/perf-3bc-capture-coalesce

Conversation

@jzmiller1

Copy link
Copy Markdown
Contributor

Note: This PR stacks on #21 (fix/perf-3a-roche-scalars — "perf(system): eliminate two Planetesimal clones in planetesimals_intersect"). #21 introduces the std::mem::swap pattern in planetesimals_intersect that the mem::replace extraction in this PR builds on. Merging in the order #21 → this pr produces a clean fast-forward. If #21 hasn't landed yet, please review/merge it first.

Problem

After #21 stripped the call-site clones in planetesimals_intersect, two deep-clone hot spots remained in the post-accretion path:

capture_moon still cloned both arguments internally:

let mut planet = larger.clone();   // deep tree clone of larger body + its moons + rings
let mut moon = smaller.clone();

For heavily-mooned planets late in bombardment, the larger-body clone is the dominant per-iteration cost. Every captured satellite gets duplicated just to update one orbit.

coalesce_planetesimals allocated a shadow Vec and cloned every Planetesimal on every call, regardless of whether any orbits actually intersected:

let mut next_planets = Vec::new();
for (i, p) in planets.iter_mut().enumerate() {
    if i == 0 {
        next_planets.push(p.clone());            // unconditional clone
    } else if let Some(prev_p) = next_planets.last_mut() {
        if check_orbits_intersect(...) { ... } else { next_planets.push(p.clone()); }
    }
}

coalesce_planetesimals runs once per planet per bombardment iteration (and recursively on moons after each capture). In the common case, no intersection, every clone is unneeded.

This was caught in the same profiling pass as #21 (2.56M-system Rayon batch on 12-core), where these two paths together dominated the remaining clone cost after the Roche-limit refactor.

Change

Fix 3B — capture_moon ownership. Change the signature to take owned (Planetesimal, Planetesimal) and modify the larger body in place:

fn capture_moon(
    mut planet: Planetesimal,
    mut moon: Planetesimal,
    stellar_mass: &f64,
    rng: &mut dyn RngCore,
    events_log: &mut AccreteEvents,
) -> Planetesimal { ... }

Only moon.id (a String) is cloned, for the event message — not the struct. The float-precision wrappers and planet.b = semi_minor_axis(planet.a, planet.e) line added in commit 7351622 are preserved (the moon orbit loop's m.b = m.a * (1.0 - m.e^2).sqrt() line as well).

Caller in planetesimals_intersect. Take owned values out of the &mut Planetesimal slots via std::mem::replace with a placeholder:

if p.mass >= prev_p.mass { std::mem::swap(p, prev_p); }   // from #21
let prev_owned = std::mem::replace(prev_p, Planetesimal::placeholder());
let p_owned    = std::mem::replace(p,      Planetesimal::placeholder());
*prev_p = capture_moon(prev_owned, p_owned, primary_star_mass, rng, events_log);

The placeholder left in *p is dropped at scope end — the slot is either a Vec::remove local (from coalesce_planetesimals) or a freshly-allocated outer_body local (from post_accretion); never read after the branch. *prev_p is overwritten by the capture_moon result in the same statement.

New Planetesimal::placeholder() constructor. A zero-initialized stub used solely by the mem::replace extraction above. Visibility is pub(crate) so it cannot leak into serialized output or downstream APIs. Never observed.

Fix 3C coalesce_planetesimals in place. Replace the shadow-Vec + clone-everything loop with an in-place walk that only removes a body when an actual intersection occurs:

let mut i = 1;
while i < planets.len() {
    let do_intersect = { /* read-only check on planets[i] and planets[i-1] */ };
    if do_intersect {
        let mut p = planets.remove(i);
        planetesimals_intersect(&mut p, &mut planets[i - 1], ..., events_log);
        // don't advance i — the merged body may now intersect planets[i]
    } else {
        i += 1;
    }
}

No Vec allocation, no clones in the common case.

Four new tests are added (coalesce_produces_correct_count, capture_moon_adds_moon, no_spurious_coalescence, same_seed_same_system_after_perf_fixes) covering the merged-count invariant and same-seed reproducibility across the refactor.

Impact

  • Zero behavioral change. All 16 fixture tests pass bit-identically with no fixture regeneration needed. Same-seed reproducibility is exercised by same_seed_same_system_after_perf_fixes.
  • Performance: a private fork validated against 2.56M stellar systems attributed cumulative 3A + 3B + 3C to roughly a 23% improvement (1362s → 1052s) on the full batch. This means this PR alone accounts for the gap from perf(system): eliminate two Planetesimal clones in planetesimals_inte… #21 's ~15% to the full ~23%. Combined with the opt-out from the events_log PR (also in this series), the same fork's run dropped to 333s. Those numbers are reference-fork data, not re-measured for this exact patch; the structural costs being eliminated are the same.
  • Planetesimal::placeholder() is pub(crate) is not part of the public API. It exists only as a mem::replace filler and is overwritten in the same statement that creates it. No risk of leaking a zero-id planet into serialized output or downstream code.
  • Preserved upstream's commit 7351622 precision behavior: float_to_precision(new_axis), float_to_precision(new_eccn), planet.b = semi_minor_axis(planet.a, planet.e), and the moon-loop m.b = m.a * (1.0 - m.e^2).sqrt() are all retained. Earlier internal variants of this refactor dropped those lines; this PR does not.
  • Public API surface: capture_moon and coalesce_planetesimals are both private (fn, no pub). Their signatures change, but no downstream caller can observe it. Planetesimal::placeholder() is pub(crate). Net public-API delta: none.
  • Stacks on perf(system): eliminate two Planetesimal clones in planetesimals_inte… #21 (see note at top). Once both land, the cumulative effect on heavily-stressed batch workloads is the larger story; on a single-system call the practical effect is invisible.

jzmiller1 added 5 commits May 11, 2026 15:33
…rsect

The Roche-limit computation in planetesimals_intersect cloned both `p`
and `prev_p` purely to satisfy the borrow checker — only three scalar
fields (mass, mass, radius) are read from the clones. Planetesimal
carries Vec<Planetesimal> moons + Vec<Ring> rings, so each clone is a
deep tree copy. In hot post-accretion loops this is per-iteration waste.

Replace the two clones with direct scalar extraction:

    let (larger_mass, smaller_mass, smaller_radius) = if p.mass >= prev_p.mass {
        (p.mass, prev_p.mass, prev_p.radius)
    } else {
        (prev_p.mass, p.mass, p.radius)
    };

Note the third argument: the original code passed `&smaller.radius`
(the smaller body's radius, matching roche_limit_au's documented
contract that the third arg is the moon/smaller body's radius). The
refactor preserves this — using smaller_radius rather than the larger
body's radius keeps the physics identical.

For the capture_moon call site, use std::mem::swap with a `>=` tiebreak
to orient the bodies in place: after the swap, prev_p holds the larger
body. This matches the original `match p.mass >= prev_p.mass { ... }`
ordering exactly (p wins on equal mass).

All 16 fixture tests remain bit-identical after the refactor — zero
fixture drift under cargo test, as the clone-elimination itself does
not change the arithmetic order.
Two coupled hot-path fixes that build on the Roche-scalar refactor:

Fix 3B (capture_moon ownership): the function previously cloned both
the larger and smaller bodies internally, even after the Roche-scalar
fix removed the caller-side clones. With heavily-mooned planets these
are deep tree clones (Vec<Planetesimal> moons + Vec<Ring> rings each).
Change the signature to take owned (Planetesimal, Planetesimal) values
and modify the larger body in place. Only moon.id (a String) is
cloned for the event message — not the struct.

Add a pub(crate) Planetesimal::placeholder() constructor: a zero-
initialized stub used solely by `std::mem::replace` to extract owned
values from &mut Planetesimal slots in planetesimals_intersect.
The placeholder is overwritten in the same statement that creates it
and is never observed. Visibility is pub(crate) so it cannot leak
into serialized output or downstream APIs.

Fix 3C (coalesce in-place): the function allocated a shadow Vec and
cloned every Planetesimal on every call, regardless of whether any
coalescence actually occurred. Most calls on the post-accretion hot
path have no intersections. Replace with an in-place loop that uses
Vec::remove(i) only on actual coalescence events:

    while i < planets.len() {
        if check_orbits_intersect(...) {
            let mut p = planets.remove(i);
            planetesimals_intersect(&mut p, &mut planets[i - 1], ...);
            // don't advance i — merged body may intersect next one
        } else {
            i += 1;
        }
    }

Preserved from upstream commit 7351622: `planet.b = semi_minor_axis(
planet.a, planet.e)` in capture_moon, plus the float_to_precision
wrappers on new_axis/new_eccn, plus `m.b = m.a * (1.0 - m.e^2).sqrt()`
in the moon orbit loop.

All 16 fixture tests pass bit-identically after the refactor — no
fixture regeneration needed. 4 new tests added: coalesce_produces_
correct_count, capture_moon_adds_moon, no_spurious_coalescence,
same_seed_same_system_after_perf_fixes.
# Conflicts:
#	src/structs/system.rs
The merge of master (post PR 1 + PR 2) changed events_log from
&mut AccreteEvents to Option<&mut AccreteEvents>. The capture_moon
call site at planetesimals_intersect was missed during conflict
resolution and moved the Option instead of reborrowing it, leaving
the subsequent coalesce_planetesimals and moons_to_rings calls
trying to use a moved value.

Add .as_deref_mut() to match the surrounding calls.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant