Ephemerons: wait on all domains to be done marking before asking for another round #14774

Open: gasche wants to merge 1 commit into ocaml:trunk from gasche:ephemerons-wait-on-all-domains-before-next-round

Member

@gasche gasche commented Apr 25, 2026

Ephemeron marking must be done after normal marking, and may in turn reveal more normal-marking work, which then requires a new round of ephemeron marking (re-examining all "TODO" ephemerons to check whether their keys have been marked), and so on, until a fixpoint is reached where no additional normal marking happens.

On a single-domain system there is an obvious good strategy: wait until all normal-marking work is done, and then do ephemeron marking, and repeat. On multi-domain systems we only know when we (the current domain) are done marking. The trunk code asks for a new ephemeron round (on all domains) whenever a domain is done marking; this can lead to useless repetition of ephemeron-marking work: if two domains finish marking one after the other, each will ask all domains to do a round of ephemeron-marking.
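For readers unfamiliar with the fixpoint, here is a minimal single-domain sketch. It is a toy model, not the runtime's code: the `child` and `marked` arrays, `struct ephe`, and all function names are made up for illustration, and each object has at most one outgoing pointer to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy heap (hypothetical): object i points to child[i], or -1 for none. */
#define NOBJ 6
int  child[NOBJ];
bool marked[NOBJ];

/* Toy ephemeron: data is kept alive only if key is alive. */
struct ephe { int key; int data; bool todo; };

/* Normal marking: follow pointers until an already-marked object. */
void mark_from(int obj) {
  while (obj >= 0 && !marked[obj]) {
    assert(obj < NOBJ);
    marked[obj] = true;
    obj = child[obj];
  }
}

/* One ephemeron round: re-examine every TODO ephemeron.
   Returns true if it triggered new normal marking. */
bool ephe_round(struct ephe *es, size_t n) {
  bool new_work = false;
  for (size_t i = 0; i < n; i++) {
    if (es[i].todo && marked[es[i].key]) {
      es[i].todo = false;
      if (!marked[es[i].data]) {
        mark_from(es[i].data);
        new_work = true;
      }
    }
  }
  return new_work;
}

/* Finish normal marking, then run ephemeron rounds until a round
   reveals no new marking work. Returns the number of rounds run. */
int mark_to_fixpoint(int root, struct ephe *es, size_t n) {
  int rounds = 0;
  mark_from(root);
  do { rounds++; } while (ephe_round(es, n));
  return rounds;
}
```

In this model each round runs only after all normal marking has finished, which is the property that is hard to establish once several domains mark concurrently.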

This PR changes the end-of-normal-marking logic to ensure that we ask for a new round of ephemeron marking only when all domains are done marking, rather than whenever the current domain is done marking.

@OlivierNicole and @damiendoligez reviewed an earlier version of this branch yesterday and we found a concurrency bug together. The new version uses a CAS loop to avoid concurrency issues, as we discussed.
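The general shape of such a CAS loop can be sketched with C11 atomics. This is only an illustration under stated assumptions: the names `still_marking`, `ephe_rounds_requested`, `begin_mark_phase` and `finish_marking` are hypothetical, and the actual logic in `runtime/major_gc.c` may differ in what it counts and in what it does on reaching zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state, not taken from the runtime. */
atomic_int still_marking;          /* domains that still have marking work */
atomic_int ephe_rounds_requested;  /* stand-in for "ask for a new round"   */

void begin_mark_phase(int num_domains) {
  atomic_store(&still_marking, num_domains);
}

/* Called by a domain when it runs out of normal-marking work.
   Returns true only for the domain that brings the count to zero. */
bool finish_marking(void) {
  int seen = atomic_load(&still_marking);
  for (;;) {
    assert(seen > 0);  /* debug check: this domain was counted */
    /* On failure, seen is updated to the current value and we retry. */
    if (atomic_compare_exchange_weak(&still_marking, &seen, seen - 1)) {
      if (seen == 1) {
        /* Last domain to finish: request exactly one ephemeron round. */
        atomic_fetch_add(&ephe_rounds_requested, 1);
        return true;
      }
      return false;
    }
  }
}
```

With three domains, only the third call to `finish_marking` requests a round, so one round is requested where asking on every domain's completion would request three.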

Contributor

@eutro eutro left a comment

The CAS loop looks sound to me, and I see why it is necessary instead of just checking `caml_atomic_counter_decr(&num_domains_to_mark) == 0` (although we miss a debug assert that `domains_still_marking > 0`).

Comment thread runtime/major_gc.c Outdated
Comment on lines 350 to 351
Contributor

It sounds like this comment is no longer true?

Member Author

Thanks! I fixed the comment.

@gasche gasche force-pushed the ephemerons-wait-on-all-domains-before-next-round branch from b41950f to 560029c on May 18, 2026 15:48
@gasche gasche force-pushed the ephemerons-wait-on-all-domains-before-next-round branch from 560029c to c21e04c on May 27, 2026 04:44
@gasche gasche force-pushed the ephemerons-wait-on-all-domains-before-next-round branch from c21e04c to cefbed6 on May 27, 2026 04:47

4 participants