WIP: fast beam startup#21
Draft
ziopio wants to merge 37 commits into
Replay was aborting with "function erl_init:start/2 not found in
active code index" because the export and fun tables were always
reinitialized empty. Their IndexTable metadata lives as static
storage via erl_code_staged.h and load_preloaded() is skipped in
replay mode, so the recorded bucket/seg arrays in the mapped arena
were unreachable.
Extend the existing atom/module root-dump pipeline to cover
export_tables[] and fun_tables[]:
- export.c / erl_fun.c: tag each per-code-index table with
erts_alloc_trace_note_alloc(...); add init_{export,fun}_table_replay
that copy the recorded roots into the static tables, restore
htable.fun to the current build's function pointers, and re-init
the staging rwmutex and entry-bytes atomic.
- erl_alloc.c: whitelist the two new tags in
erts_alloc_struct_should_snapshot and derive dump file names from
the tag directly.
- erl_init.c: restore_struct_roots_for_replay dispatches on tag and
fills atom/module/export/fun arrays uniformly; erl_init moves
erts_init_fun_table and init_export_table into the replay/record
branch, calling the _replay variants when replay is enabled.
With this change, validate_replay_module_tables succeeds for all
22 preloaded modules and erl_init:start/2 resolves against the
restored export table without re-running load_preloaded().
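The restore step for a per-code-index table can be sketched as below. This is a minimal illustration, not the ERTS code: `table_root`, `hash_fn`, `current_build_hash` and `table_root_restore` are hypothetical names, and the struct layout is an assumption chosen only to show the split between data pointers (which stay valid when the arena is mapped at the same address) and function pointers (which must be re-pointed at the running build).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an index-table root; not the real ERTS layout. */
typedef unsigned long (*hash_fn)(const void *key);

typedef struct {
    hash_fn hash;      /* code address: only valid in the build that wrote it */
    void  **buckets;   /* data pointer: valid if the arena maps at the same address */
    size_t  nbuckets;
    size_t  entries;
} table_root;

/* The hash function as it exists in the currently running binary. */
static unsigned long current_build_hash(const void *key)
{
    return (unsigned long)((uintptr_t)key * 2654435761UL);
}

/* Copy the recorded root over the static table, then overwrite every
 * function pointer with the current build's implementation. */
void table_root_restore(table_root *live, const table_root *recorded)
{
    assert(live != NULL && recorded != NULL);
    memcpy(live, recorded, sizeof *live);
    live->hash = current_build_hash;
}
```

Overwriting the function pointer unconditionally, rather than checking whether the recorded value still looks valid, keeps the sketch independent of where the recording binary placed its code.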
The active/staging code-index atomics are BSS-allocated and thus reset
to 0 at replay process startup, so modules that were active at record
time ended up being dispatched through a stale code index during
replay. Depending on the indices at record time this manifested as a
SIGSEGV in process_main (e.g. select_val_lin) because Export dispatch
addresses for the active index were unpopulated.

Add a plain-int32 shadow of the_active_code_index and
the_staging_code_index, register it as "code_ix.root" in the
struct-root-dump pipeline, and restore it into the live atomics after
the index tables are loaded on replay.

In replay mode, skip the preload-driven
erts_end_staging_code_ix()/erts_commit_staging_code_ix() pair in
erl_start, since the indices have already been restored from the
snapshot and must not be advanced again.
Support infrastructure for the struct-root-dumps record/replay
pipeline:
- atom.c/.h: init_atom_table_replay() rebuilds erts_atom_table from a
snapshotted IndexTable root, re-establishing the hash/alloc function
pointers and recomputing atom_space. atom_table_replay_debug_dump()
and atom_replay_debug_lookup() are tooling used by
debug_replay_roots_sanity() to probe atom-hash integrity after
replay.
- module.c/.h: init_module_table_replay() performs the same for all
ERTS_NUM_CODE_IX module_tables, plus module_table_replay_debug_dump().
- erl_global_literals.c: route global_literal_chunk allocations through
erts_mmap_record_alloc when record mode is enabled so literal chunks
land in the mapped arena and are replayable. Register each chunk with
erts_alloc_trace_note_alloc("global_literal.chunk", ...). Add
erts_global_literal_is_in_range() and extend erts_is_in_literal_range
(erl_alloc.h) to recognize addresses inside those chunks on 64-bit
builds.
- erl_mmap.h/erl_mmap_record.c: expose
erts_mmap_record_option_replay_enabled() so the beam side can
conditionally skip record-time-only initialization during replay.
In replay mode load_preloaded() is skipped because the atom, module,
export and fun tables (plus the active/staging code indices) are
restored directly from the struct-root snapshots, and the module code
pages are restored with the mmap arena.

The side effect is that erts_update_ranges() is never called, so the
sorted per-code-index Range array in beam_ranges.c stays empty and
erts_find_function_from_pc() returns NULL for every PC. This silently
breaks tracing, stack walking, exception handling and anything else
that resolves a PC to an MFA, and manifests in practice as
hard-to-diagnose BTI / Illegal-instruction crashes during scheduler
startup and process spawning.

Add erts_ranges_replay_rebuild() which reconstructs r[ix] for every
ERTS_NUM_CODE_IX directly from the restored Module table, including
both curr and old instances, and invoke it from erl_start() right after
validate_replay_module_tables(). The logic mirrors what
erts_update_ranges() + erts_end_staging_ranges() would have built
during a normal load, but drives it from already-restored state so we
never need to advance the code indices.
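A minimal sketch of rebuilding a sorted range array from a module table, and of the PC lookup that depends on it, is shown below. The types `code_range`, `code_instance` and `module_entry` and the functions `ranges_rebuild` and `range_find` are hypothetical simplifications; the real Range and Module structures carry more state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct { uintptr_t start, end; int module_id; } code_range; /* [start, end) */
typedef struct { uintptr_t code; size_t len; } code_instance;      /* len 0 = absent */
typedef struct { int id; code_instance curr, old; } module_entry;

static int cmp_range(const void *a, const void *b)
{
    const code_range *x = a, *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/* Emit one range per present curr/old instance, then sort by start address.
 * `out` must have room for 2 * n entries. Returns the number of ranges. */
size_t ranges_rebuild(const module_entry *mods, size_t n, code_range *out)
{
    size_t k = 0;
    assert(out != NULL);
    for (size_t i = 0; i < n; i++) {
        const code_instance *inst[2] = { &mods[i].curr, &mods[i].old };
        for (int j = 0; j < 2; j++) {
            if (inst[j]->len == 0)
                continue;
            out[k].start = inst[j]->code;
            out[k].end = inst[j]->code + inst[j]->len;
            out[k].module_id = mods[i].id;
            k++;
        }
    }
    qsort(out, k, sizeof *out, cmp_range);
    return k;
}

/* Binary search over the sorted ranges; returns -1 when no range covers pc,
 * which is what every lookup yields while the array is still empty. */
int range_find(const code_range *r, size_t n, uintptr_t pc)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (pc < r[mid].start)
            hi = mid;
        else if (pc >= r[mid].end)
            lo = mid + 1;
        else
            return r[mid].module_id;
    }
    return -1;
}
```

The sketch only reads the module table, so nothing in it needs to advance a code index.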
On 64-bit the literal allocator (ERTS_ALC_A_LITERAL) has its own mmapper
(erts_literal_mmapper) that reserves a 1 GB virtual super-carrier and
services mseg allocations directly via erts_alcu_mmapper_mseg_alloc().
That path bypasses erts_mmap_record_alloc(), so literal contents were
never written into the shared record arena file. On replay, the carrier
was re-reserved at the same virtual address (ASLR disabled) but the
pages were zero-filled, so any baked-in literal pointer in restored
code dereferenced zeroed memory and faulted with badarg
(e.g. erlang:display_string/1 returned badarg and then the catch
handler crashed with "Catch not found").
Implementation:
- Track live (ptr, size) regions inside the literal super-carrier in
erts_alcu_mmapper_mseg_alloc / _realloc / _dealloc, keyed on
alloc_no == ERTS_ALC_A_LITERAL and only when -record is enabled.
- On process exit, dump those regions and their raw bytes to a sidecar
file next to the main record arena (<arena>.literals). Registered via
atexit next to the existing struct-root-dumps hook.
- During -replay, after erts_mseg_init() has initialised
erts_literal_mmapper (so the 1 GB range is virtually reserved) but
BEFORE set_au_allocator(ERTS_ALC_A_LITERAL, ...) creates the
allocator's main carrier, read the sidecar and for each region:
1. Call mm->reserve_physical(ptr, size) to flip the pages to
PROT_READ|PROT_WRITE.
2. Advance mm->sa.top past the region (new erts_mmap_mark_allocated
helper) so subsequent erts_mmap() calls won't hand the same
pages back out and overwrite the restored bytes.
3. memcpy the recorded bytes into place.
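The three restore steps above can be sketched with plain POSIX calls. This is an illustration under stated assumptions: `carrier`, `saved_region`, `carrier_reserve`, `carrier_restore_region` and `carrier_alloc` are hypothetical names, regions are addressed by offset rather than by absolute pointer, and `mprotect` stands in for the mapper's reserve_physical callback.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef struct {
    char  *base;   /* start of the virtually reserved range */
    size_t size;   /* size of the reservation, a multiple of the page size */
    size_t top;    /* offset of the first byte the allocator may still hand out */
} carrier;

typedef struct {
    size_t      offset;  /* page-aligned offset inside the carrier */
    size_t      size;    /* number of recorded bytes */
    const void *bytes;   /* recorded contents */
} saved_region;

size_t page_size(void) { return (size_t)sysconf(_SC_PAGESIZE); }

static size_t round_up(size_t n, size_t page) { return (n + page - 1) / page * page; }

/* Reserve address space only: no access rights yet. */
int carrier_reserve(carrier *c, size_t size)
{
    void *p = mmap(NULL, size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    c->base = p;
    c->size = size;
    c->top = 0;
    return 0;
}

int carrier_restore_region(carrier *c, const saved_region *r)
{
    size_t page = page_size();
    size_t span = round_up(r->size, page);
    assert(c != NULL && r != NULL);
    if (r->offset % page != 0 || r->offset > c->size || span > c->size - r->offset)
        return -1;
    /* 1. Make the pages readable and writable. */
    if (mprotect(c->base + r->offset, span, PROT_READ | PROT_WRITE) != 0)
        return -1;
    /* 2. Move top past the region so later allocations cannot reuse it. */
    if (c->top < r->offset + span)
        c->top = r->offset + span;
    /* 3. Copy the recorded bytes into place. */
    memcpy(c->base + r->offset, r->bytes, r->size);
    return 0;
}

/* Bump allocation from top, standing in for later allocator requests. */
void *carrier_alloc(carrier *c, size_t size)
{
    size_t span = round_up(size, page_size());
    char *p;
    if (span > c->size - c->top)
        return NULL;
    p = c->base + c->top;
    if (mprotect(p, span, PROT_READ | PROT_WRITE) != 0)
        return NULL;
    c->top += span;
    return p;
}
```

Without step 2, the first `carrier_alloc` call after a restore would return the restored pages themselves.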
With this change replay reaches Erlang code, hello:hello/1 dereferences
its literal "hello, world\n" cons cell correctly, and
erlang:display_string/1 executes as expected. Subsequent crashes come
from entirely different paths and are tracked separately.
New helpers added to erl_mmap.{h,c} so that the record/replay TU can
reserve physical backing and mark super-carrier ranges as in-use
without touching the opaque ErtsMemMapper_ struct directly.
The record arena was being opened O_RDWR and mmapped MAP_SHARED even in
-replay mode, so any write the VM performed against restored memory
propagated back to the on-disk file. A crash partway through replay
therefore left the arena in a partially-modified state, and the next
replay started from that corrupted snapshot producing a different
failure.

Changes:
- erts_mmap_record_init(): in replay, open the arena O_RDONLY and map
it MAP_PRIVATE (copy-on-write). The VM can still mutate restored pages
as before, but nothing reaches the backing file.
- erts_alloc_struct_dump_snapshots_on_exit(): return early when running
under -replay so the struct-root-dumps directory (replay INPUT) is
never rewritten.

The literal sidecar (<arena>.literals) was already safe because its
contents are read() into freshly-allocated pages, not mmapped.

Running the same replay twice in a row now produces identical output
and the arena / struct-root-dumps files are left untouched on disk.
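The copy-on-write mapping can be demonstrated with a small POSIX round trip. The function names `arena_map_for_replay` and `arena_replay_demo` and the temporary file path are hypothetical; only `open`, `mmap`, `mkstemp`, `read`, `write`, `munmap` and `unlink` are real calls.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map an existing file copy-on-write. PROT_WRITE is permitted on a
 * read-only descriptor because MAP_PRIVATE never writes back to the file. */
void *arena_map_for_replay(const char *path, size_t len)
{
    int fd;
    void *p;
    assert(path != NULL && len > 0);
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);   /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}

/* Returns 0 if a write through the mapping left the file untouched. */
int arena_replay_demo(void)
{
    char path[] = "/tmp/arena-XXXXXX";
    char back[3] = { 0 };
    char *m;
    ssize_t n;
    int ok;
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    if (write(fd, "abc", 3) != 3) {
        close(fd);
        unlink(path);
        return -1;
    }
    close(fd);

    m = arena_map_for_replay(path, 3);
    if (m == NULL) {
        unlink(path);
        return -1;
    }
    m[0] = 'X';   /* lands in a private copy of the page */

    fd = open(path, O_RDONLY);
    n = fd < 0 ? -1 : read(fd, back, 3);
    if (fd >= 0)
        close(fd);
    ok = (n == 3 && back[0] == 'a' && m[0] == 'X');
    munmap(m, 3);
    unlink(path);
    return ok ? 0 : -1;
}
```

The process sees its own modification while the file keeps the recorded bytes, which is the property that makes repeated replays start from the same snapshot.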
Fixes "Catch not found" crashes in replay-mode as soon as any
try/catch handler fires (e.g. hello:test_catches/0).
The catch-table pool lives in two places:
1. bccix[ERTS_NUM_CODE_IX], a small file-static header array in
beam_catches.c's BSS holding {free_list, high_mark, tabsize,
beam_catches*} per code-ix.
2. bccix[i].beam_catches, a dynamically-sized array of
{handler_cp, cdr} pairs allocated via ERTS_ALC_T_CATCHES
(CATCHES -> LONG_LIVED -> CODE -> default mseg super-carrier),
whose bytes end up in the record arena.
At module load time patchCatches() / emu_load.c call
beam_catches_cons() to allocate an index and bake
make_catch(index) immediates into the generated code. handle_error()
later resolves those via beam_catches_car(index) ->
bccix[active].beam_catches[index].cp.
Replay preserves (2) automatically via the arena's MAP_PRIVATE
mapping, but (1) is BSS and was being reinitialised to a fresh
empty table by beam_catches_init() during init_emulator().
load_preloaded() is skipped in replay so no beam_catches_cons()
calls ever repopulate the fresh table, which left every baked-in
catch index pointing at slot 0 of an empty table, and the first
exception unwound past the real handler producing "Catch not found".
Register bccix[] under tag "beam_catches.bccix" via
erts_alloc_trace_note_alloc() so the struct-root-dump pipeline
snapshots it, add beam_catches_apply_replay_root() to memcpy the
dumped bytes back over bccix[], and call it from erl_start() in
replay mode after erts_ranges_replay_rebuild(). The restored
pointers refer to long-lived-allocator carriers which the default
mseg mapper brings back at the same virtual addresses, so the
entries they index resolve correctly.
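The split between the static header array and the arena-resident entries can be modelled as below. This is a simplified sketch: `catch_header`, `catch_entry`, `catches_init`, `catches_apply_replay_root` and `catches_car` are hypothetical names that mirror the description above, and `NUM_CODE_IX` is an assumed constant.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_CODE_IX 3   /* assumed number of code indices, for illustration */

typedef struct { const void *cp; unsigned cdr; } catch_entry;

typedef struct {
    int          free_list;
    unsigned     high_mark;
    unsigned     tabsize;
    catch_entry *entries;   /* lives in the arena, so it survives replay */
} catch_header;

/* Static storage: reset on every process start. */
static catch_header headers[NUM_CODE_IX];

/* Normal init: fresh, empty tables. */
void catches_init(void)
{
    memset(headers, 0, sizeof headers);
    for (int i = 0; i < NUM_CODE_IX; i++)
        headers[i].free_list = -1;
}

/* Replay: copy the dumped header bytes back over the static array.
 * Returns -1 if the dump does not have the expected size. */
int catches_apply_replay_root(const void *dump, size_t len)
{
    if (dump == NULL || len != sizeof headers)
        return -1;
    memcpy(headers, dump, sizeof headers);
    return 0;
}

/* Resolve a baked-in catch index to its handler address, or NULL. */
const void *catches_car(int code_ix, unsigned index)
{
    const catch_header *h;
    assert(code_ix >= 0 && code_ix < NUM_CODE_IX);
    h = &headers[code_ix];
    if (h->entries == NULL || index >= h->high_mark)
        return NULL;
    return h->entries[index].cp;
}
```

The size check is an addition of the sketch; a dump written by a build with a different header layout would otherwise be copied blindly.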
…n corruption

Replay could crash in two coupled ways during early boot/logger
traffic:
1) Corrupted stacktrace terms (c_p->ftrace) were propagated into
terminate/logging paths and then copied as regular terms, leading to
invalid boxed/list traversal and SIGSEGV in copy paths.
2) Some callback slots in gen_server server_data could resolve to
invalid external-fun metadata during replay-sensitive startup paths,
causing badfun/undef-style failures while starting logger supervision.

This hotfix adds targeted mitigations:
- beam_common.c: add replay term sanity validation (pointer-range and
tuple-recursive checks) and drop invalid ftrace to NIL both before
add_stacktrace() and after save_stacktrace() packaging.
- utils.c: in ERTS_REPLAY_COPY_DEBUG mode, bypass error-term copying in
do_send_term_to_logger() and send text-only logger payload to avoid
re-triggering deep term copy on already-corrupt args.
- gen_server.erl: store callback cache entries as explicit closures,
preserving stable fun identity/arity under replay and avoiding fragile
direct external-fun references in startup-critical paths.

This is a pragmatic mitigation to keep replay progressing and
observability intact; it does not claim to fully solve the upstream
memory corruption source.
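A pointer-range and tuple-recursive check of the kind described for beam_common.c can be sketched on a toy term encoding. Everything here is an assumption made for illustration: the two-bit tags, `heap_range`, `NIL_TERM`, `term_is_sane` and `sanitize_ftrace` are hypothetical and do not match the real Eterm layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy encoding: low two bits are the tag. */
typedef uintptr_t term;
#define TAG_MASK  ((term)0x3)
#define TAG_IMMED ((term)0x3)   /* immediate value, nothing to follow */
#define TAG_TUPLE ((term)0x2)   /* pointer to [arity, elem1, ..., elemN] */
#define NIL_TERM  (~(term)0)    /* an immediate standing in for NIL */

typedef struct { uintptr_t lo, hi; } heap_range;   /* valid bytes [lo, hi) */

term make_small(uintptr_t n) { return (n << 2) | TAG_IMMED; }

term make_tuple(const term *p)
{
    assert(((uintptr_t)p & TAG_MASK) == 0);   /* needs 4-byte alignment */
    return (term)(uintptr_t)p | TAG_TUPLE;
}

/* Returns 1 only if every pointer reachable from t stays inside h and
 * the nesting depth does not exceed `depth`. */
int term_is_sane(term t, heap_range h, int depth)
{
    uintptr_t addr, words_left;
    const term *p;
    if ((t & TAG_MASK) == TAG_IMMED)
        return 1;
    if ((t & TAG_MASK) != TAG_TUPLE || depth <= 0)
        return 0;
    addr = t & ~TAG_MASK;
    if (addr < h.lo || addr >= h.hi)
        return 0;                      /* pointer-range check */
    words_left = (h.hi - addr) / sizeof(term);
    p = (const term *)addr;
    if (words_left == 0 || p[0] >= words_left)
        return 0;                      /* arity would run past the range */
    for (uintptr_t i = 1; i <= p[0]; i++)
        if (!term_is_sane(p[i], h, depth - 1))
            return 0;                  /* tuple-recursive check */
    return 1;
}

/* Drop an invalid stacktrace term to NIL instead of letting it be copied. */
term sanitize_ftrace(term ftrace, heap_range h)
{
    return term_is_sane(ftrace, h, 16) ? ftrace : NIL_TERM;
}
```

The depth limit is a choice of the sketch; it bounds the recursion when a corrupt term happens to form a cycle.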
Replay was reusing snapshot-backed Export->lambda terms. In practice
these terms can carry stale runtime state and later surface as badfun
errors when erlang:make_fun/3 returns ep->lambda (for example
fun M:F/A callback funs used by logger/gen_server paths).

Fix by rebuilding every export lambda from current export metadata
after replay roots/code-index restoration via
erts_export_replay_repair_all_lambdas(). Each rebuilt ErlFunThing is
re-linked to its current Export entry and header.

Also ensure the lambda dump artifact exists in record mode (create
parent dirs + create file with CSV header if missing) and keep the
dump-at-exit hook registered.
Problem
- Replay startup could execute static NIF reinitialization too early
(during erl_init()) and on an unmanaged startup context.
- In replay snapshots, NIF/module refc state can be restored in a
stale/under-initialized form (notably zero), while runtime structures
still assume a live baseline reference.
- These conditions lead to brittle behavior in shell-driven module
operations and can cascade into crashes/hangs when later code paths
(dirty scheduling/resource-type takeover/load callbacks) consume
invalid refcount assumptions.

Root causes addressed
1) Reinit timing/context mismatch
- Reinitializing static NIFs in erl_init() happens before init process
identity is available and before a context aligned with code-mod
permission expectations.
2) Replay refc baseline drift
- dynlib_refc/refc for module/resource ownership may be observed as < 1
during replay even though objects are logically live, causing
follow-up increments/assertions to operate on invalid baseline state.
3) prim_file load argument during replay
- prim_file load path expects a process identity in relevant replay
scenarios; SMALL_ZERO can be semantically wrong when init pid is
available.

What this commit changes
- Move replay static NIF reinit from erl_init() to erl_start(),
immediately after init process creation (erts_init_process_id
assignment).
- Keep reinit wrapped with unmanaged thread progress delay/continue,
and under lock-check builds soften code-mod permission checks around
replay reinit call.
- In replay-specific paths, defensively re-establish refc baselines
before increments/assertions:
  - schedule(): ensure env->mod_nif->dynlib_refc >= 1 before first
  dirty scheduling ownership bind.
  - prepare_opened_rt(): ensure previous owner refc/dynlib_refc and new
  owner lib refc/dynlib_refc are initialized to a valid baseline when
  replay restored state is < 1.
- Relax open_resource_type() lock-check assertion under replay to
permit replay reinit path semantics.
- In erts_replay_reinit_loaded_static_nifs(), pass init pid as load_arg
for prim_file when available; keep SMALL_ZERO default for others.

Why this is safe
- All baseline reinitializations are gated behind replay mode and only
applied when current value is < 1.
- Normal non-replay execution paths are unchanged.
- Changes preserve existing ownership/reference increment logic; they
only prevent invalid zero/negative-baseline transitions in replay.

Files
- erts/emulator/beam/erl_init.c
- erts/emulator/beam/erl_nif.c
Add helpers used to diagnose the ets:insert deep-copy crash and the
empty-tuple mismatch that caused it.
- erl_mmap.h / erl_mmap_record.c: expose
erts_mmap_record_arena_contains() and erts_mmap_record_arena_bounds()
so any file can classify a pointer as ARENA / LITERAL / HEAP.
- copy.c: add replay_classify_ptr, replay_subtag_name and
erts_replay_dump_term_to_stderr which recursively walks and prints an
Erlang term, annotating each slot with its class. Controlled by
ERTS_REPLAY_COPY_DEBUG env var.
- erl_db.c: add ets_insert_replay_dump helper that calls the above
before every ets:insert / ets:insert_new during replay. Activated by
ERTS_REPLAY_ETS_INSERT_DEBUG env var.
- erl_gc.c: add a hook for tracing suspicious pointer ranges during GC,
controlled by the ERTS_REPLAY_GC_PTR_MIN/MAX env vars.
- global.h: declare erts_replay_dump_term_to_stderr and
erts_replay_static_nif_phase for cross-file access.

All helpers are compiled-in debug aids; they are gated behind env vars
so they are silent by default in normal and replay production runs.
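The ARENA / LITERAL / HEAP classification reduces to two half-open bounds checks. The sketch below uses hypothetical names (`addr_span`, `ptr_class`, `classify_ptr`, `ptr_class_name`) and assumes the arena and literal bounds are already known to the caller.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { PTR_ARENA, PTR_LITERAL, PTR_HEAP } ptr_class;
typedef struct { uintptr_t start, end; } addr_span;   /* [start, end) */

static int span_contains(addr_span s, uintptr_t a)
{
    return a >= s.start && a < s.end;
}

/* Anything outside both known ranges is reported as ordinary heap. */
ptr_class classify_ptr(const void *p, addr_span arena, addr_span literal)
{
    uintptr_t a = (uintptr_t)p;
    assert(arena.start <= arena.end && literal.start <= literal.end);
    if (span_contains(arena, a))
        return PTR_ARENA;
    if (span_contains(literal, a))
        return PTR_LITERAL;
    return PTR_HEAP;
}

/* Label used when annotating a dumped term slot. */
const char *ptr_class_name(ptr_class c)
{
    switch (c) {
    case PTR_ARENA:   return "ARENA";
    case PTR_LITERAL: return "LITERAL";
    default:          return "HEAP";
    }
}
```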
In DEBUG builds the staged-table template (erl_code_staged.h) and
module.c track whether staging is currently in progress using sentinel
variables (fun/export _debug_stage_ix == ~0, dbg_load_code_ix == -1
means idle). BSS zero-initialises all three to 0, which is not the idle
value. In a normal boot they are set to their idle sentinel when the
preloaded-module staging cycle runs end_staging; replay skips that
cycle entirely, leaving all three at 0.

When compile:file later calls erts_start_staging_code_ix the very first
assertion in fun_staged_start_staging fires:
beam/erl_code_staged.h:385: Assertion failed: fun_debug_stage_ix == ~0

Fix: reset each sentinel to the idle value at the end of the
replay-specific table init functions:
- fun_debug_stage_ix = ~0 in erts_init_fun_table_replay (erl_fun.c)
- export_debug_stage_ix = ~0 in init_export_table_replay (export.c)
- dbg_load_code_ix = -1 in init_module_table_replay (module.c)

Also move the dbg_load_code_ix static declaration earlier in module.c
so it is visible at the call site inside init_module_table_replay.

All changes are #ifdef DEBUG / IF_DEBUG guarded, so release builds are
unaffected.

Verified: compile:file succeeds across 5 deterministic replay runs.
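The sentinel problem can be reproduced in a few lines. In this sketch the names `debug_stage_ix`, `STAGE_IDLE`, `start_staging`, `end_staging` and `table_init_replay` are hypothetical, and `start_staging` returns false where the real DEBUG build would hit an assertion, so that the behaviour can be checked without aborting.

```c
#include <assert.h>
#include <stdbool.h>

#define STAGE_IDLE (~0u)

/* Static storage starts at 0, which is a valid index and NOT the idle value. */
static unsigned debug_stage_ix;

/* Refuses to start when a staging cycle appears to be in progress. */
bool start_staging(unsigned ix)
{
    assert(ix != STAGE_IDLE);
    if (debug_stage_ix != STAGE_IDLE)
        return false;
    debug_stage_ix = ix;
    return true;
}

void end_staging(void)
{
    debug_stage_ix = STAGE_IDLE;
}

/* Replay-specific init: no staging cycle ran, so set the idle value here. */
void table_init_replay(void)
{
    /* ... restore table roots from the snapshot ... */
    debug_stage_ix = STAGE_IDLE;
}
```

A fresh process that skips both the normal staging cycle and `table_init_replay` fails its first `start_staging` call, which is the failure mode the commit removes.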