WIP: fast beam startup by ziopio · Pull Request #21 · stritzinger/otp

ziopio · 2026-04-29T14:27:56Z

No description provided.

… sys_drivers

Replay was aborting with "function erl_init:start/2 not found in active code index" because the export and fun tables were always reinitialized empty. Their IndexTable metadata lives in static storage (via erl_code_staged.h), and load_preloaded() is skipped in replay mode, so the recorded bucket/seg arrays in the mapped arena were unreachable. Extend the existing atom/module root-dump pipeline to cover export_tables[] and fun_tables[]:
- export.c / erl_fun.c: tag each per-code-index table with erts_alloc_trace_note_alloc(...); add init_{export,fun}_table_replay, which copy the recorded roots into the static tables, restore htable.fun to the current build's function pointers, and re-initialize the staging rwmutex and the entry-bytes atomic.
- erl_alloc.c: whitelist the two new tags in erts_alloc_struct_should_snapshot and derive dump file names directly from the tag.
- erl_init.c: restore_struct_roots_for_replay dispatches on the tag and fills the atom/module/export/fun arrays uniformly; erl_init moves erts_init_fun_table and init_export_table into the replay/record branch, calling the _replay variants when replay is enabled.

With this change, validate_replay_module_tables succeeds for all 22 preloaded modules, and erl_init:start/2 resolves against the restored export table without re-running load_preloaded().
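The restore step for export_tables[]/fun_tables[] can be sketched as below. This is a minimal illustration, not the PR's code: ToyTableRoot, toy_tables, toy_hash and toy_init_table_replay are hypothetical stand-ins for the real IndexTable roots and init_{export,fun}_table_replay, and the rwmutex/atomic re-initialization is left out. What it shows is the split the commit describes: data pointers into the arena can be copied verbatim, while function pointers such as htable.fun must be re-pointed at the current build.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_NUM_CODE_IX 3   /* assumption: mirrors ERTS_NUM_CODE_IX */

typedef unsigned long (*toy_hash_fn)(const void *key);

/* Hypothetical, heavily simplified stand-in for an IndexTable root. */
typedef struct {
    toy_hash_fn hash;   /* code pointer: only meaningful for the build that wrote it */
    void **buckets;     /* data pointer into the mapped arena (same VA on replay) */
    size_t nbuckets;
    size_t entries;
} ToyTableRoot;

static ToyTableRoot toy_tables[TOY_NUM_CODE_IX];

static unsigned long toy_hash(const void *key)
{
    return (unsigned long)((uintptr_t)key * 2654435761u);
}

/* Copy the recorded roots into the static tables, then re-point function
 * pointers at the current binary instead of trusting the dumped values. */
static void toy_init_table_replay(const ToyTableRoot *recorded, size_t n)
{
    for (size_t i = 0; i < n && i < TOY_NUM_CODE_IX; i++) {
        memcpy(&toy_tables[i], &recorded[i], sizeof toy_tables[i]);
        toy_tables[i].hash = toy_hash;
    }
}
```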

The active/staging code-index atomics are allocated in BSS and are therefore reset to 0 when the replay process starts, so modules that were active at record time ended up being dispatched through a stale code index during replay. Depending on the indices at record time, this manifested as a SIGSEGV in process_main (e.g. in select_val_lin), because the Export dispatch addresses for the active index were unpopulated.

Add a plain int32 shadow of the_active_code_index and the_staging_code_index, register it as "code_ix.root" in the struct-root-dump pipeline, and restore it into the live atomics after the index tables are loaded on replay. In replay mode, skip the preload-driven erts_end_staging_code_ix()/erts_commit_staging_code_ix() pair in erl_start, since the indices have already been restored from the snapshot and must not be advanced again.
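A minimal sketch of the shadow-and-restore idea, assuming C11 atomics stand in for the ERTS atomics; ToyCodeIxRoot, toy_set_code_ix and toy_restore_code_ix are hypothetical names, not the PR's. The shadow is a plain struct so it can be dumped byte-for-byte like any other struct root, while the live indices stay atomic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Live indices: atomics in BSS, so they read 0 when a replay process starts. */
static atomic_uint toy_active_code_ix;
static atomic_uint toy_staging_code_ix;

/* Plain-int shadow that can be snapshotted byte-for-byte as a struct root. */
typedef struct {
    int32_t active;
    int32_t staging;
} ToyCodeIxRoot;

static ToyCodeIxRoot toy_code_ix_root;

/* Record side: keep the shadow in sync whenever the indices change. */
static void toy_set_code_ix(int32_t active, int32_t staging)
{
    atomic_store(&toy_active_code_ix, (unsigned)active);
    atomic_store(&toy_staging_code_ix, (unsigned)staging);
    toy_code_ix_root.active = active;
    toy_code_ix_root.staging = staging;
}

/* Replay side: run after the index tables are restored. The indices are
 * written back as-is and NOT advanced by another end/commit staging pair. */
static void toy_restore_code_ix(const ToyCodeIxRoot *dumped)
{
    atomic_store(&toy_active_code_ix, (unsigned)dumped->active);
    atomic_store(&toy_staging_code_ix, (unsigned)dumped->staging);
}
```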

Support infrastructure for the struct-root-dumps record/replay pipeline:
- atom.c/.h: init_atom_table_replay() rebuilds erts_atom_table from a snapshotted IndexTable root, re-establishing the hash/alloc function pointers and recomputing atom_space. atom_table_replay_debug_dump() and atom_replay_debug_lookup() are tooling used by debug_replay_roots_sanity() to probe atom-hash integrity after replay.
- module.c/.h: init_module_table_replay() does the same for all ERTS_NUM_CODE_IX module_tables, plus module_table_replay_debug_dump().
- erl_global_literals.c: when record mode is enabled, route global_literal_chunk allocations through erts_mmap_record_alloc so that literal chunks land in the mapped arena and can be replayed. Register each chunk with erts_alloc_trace_note_alloc("global_literal.chunk", ...). Add erts_global_literal_is_in_range() and extend erts_is_in_literal_range (erl_alloc.h) to recognize addresses inside those chunks on 64-bit builds.
- erl_mmap.h/erl_mmap_record.c: expose erts_mmap_record_option_replay_enabled() so the beam side can conditionally skip record-time-only initialization during replay.
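The extended literal-range check amounts to "inside the literal super-carrier, or inside any registered global-literal chunk". The sketch below assumes a simple linked list of chunks; ToyLiteralChunk, toy_register_chunk, toy_global_literal_is_in_range and toy_is_in_literal_range are hypothetical and only approximate what erts_global_literal_is_in_range() and erts_is_in_literal_range do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chunk registry; in the PR the chunks come from erts_mmap_record_alloc. */
typedef struct ToyLiteralChunk {
    struct ToyLiteralChunk *next;
    uintptr_t start;
    size_t size;
} ToyLiteralChunk;

static ToyLiteralChunk *toy_chunks;

static void toy_register_chunk(ToyLiteralChunk *c, void *mem, size_t size)
{
    c->start = (uintptr_t)mem;
    c->size = size;
    c->next = toy_chunks;
    toy_chunks = c;
}

/* Half-open check: start <= p < start + size, written to avoid overflow. */
static int toy_global_literal_is_in_range(const void *p)
{
    uintptr_t a = (uintptr_t)p;
    for (const ToyLiteralChunk *c = toy_chunks; c; c = c->next)
        if (a >= c->start && a - c->start < c->size)
            return 1;
    return 0;
}

/* Combined check: the literal super-carrier range OR a global-literal chunk. */
static int toy_is_in_literal_range(const void *p, uintptr_t lit_start, size_t lit_size)
{
    uintptr_t a = (uintptr_t)p;
    if (a >= lit_start && a - lit_start < lit_size)
        return 1;
    return toy_global_literal_is_in_range(p);
}
```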

In replay mode, load_preloaded() is skipped because the atom, module, export and fun tables (plus the active/staging code indices) are restored directly from the struct-root snapshots, and the module code pages are restored along with the mmap arena. The side effect is that erts_update_ranges() is never called, so the sorted per-code-index Range array in beam_ranges.c stays empty and erts_find_function_from_pc() returns NULL for every PC. This silently breaks tracing, stack walking, exception handling and anything else that resolves a PC to an MFA; in practice it manifests as hard-to-diagnose BTI / illegal-instruction crashes during scheduler startup and process spawning.

Add erts_ranges_replay_rebuild(), which reconstructs r[ix] for every ERTS_NUM_CODE_IX directly from the restored Module table, including both the curr and old instances, and invoke it from erl_start() right after validate_replay_module_tables(). The logic mirrors what erts_update_ranges() + erts_end_staging_ranges() would have built during a normal load, but is driven from already-restored state, so the code indices never need to be advanced.
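The rebuild can be pictured as: collect one [start, end) range per non-empty module instance (curr and old), sort by start address, and answer PC lookups with a binary search. The sketch below is a simplified model under that assumption; ToyModuleInstance, ToyRange, toy_ranges_rebuild and toy_find_range are hypothetical, and the real beam_ranges.c keeps per-code-index arrays with richer per-range data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified module instance: one contiguous code region. */
typedef struct {
    const char *name;
    uintptr_t code_start;   /* 0 means the instance is empty */
    size_t code_size;
} ToyModuleInstance;

typedef struct {
    uintptr_t start, end;   /* half-open [start, end) */
    const char *name;
} ToyRange;

static int toy_range_cmp(const void *a, const void *b)
{
    const ToyRange *x = a, *y = b;
    return (x->start > y->start) - (x->start < y->start);
}

/* Build a sorted range array from both curr and old instances of each module.
 * `out` must have room for 2 * nmods entries. Returns the number of ranges. */
static size_t toy_ranges_rebuild(const ToyModuleInstance *curr,
                                 const ToyModuleInstance *old,
                                 size_t nmods, ToyRange *out)
{
    size_t n = 0;
    for (size_t i = 0; i < nmods; i++) {
        const ToyModuleInstance *insts[2] = { &curr[i], &old[i] };
        for (int k = 0; k < 2; k++) {
            if (insts[k]->code_start == 0)
                continue;
            out[n].start = insts[k]->code_start;
            out[n].end = insts[k]->code_start + insts[k]->code_size;
            out[n].name = insts[k]->name;
            n++;
        }
    }
    qsort(out, n, sizeof *out, toy_range_cmp);
    return n;
}

/* Binary search for the range containing pc; NULL if none (the symptom the
 * commit describes when the array is left empty). */
static const ToyRange *toy_find_range(const ToyRange *r, size_t n, uintptr_t pc)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (pc < r[mid].start)
            hi = mid;
        else if (pc >= r[mid].end)
            lo = mid + 1;
        else
            return &r[mid];
    }
    return NULL;
}
```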

On 64-bit, the literal allocator (ERTS_ALC_A_LITERAL) has its own mmapper (erts_literal_mmapper), which reserves a 1 GB virtual super-carrier and services mseg allocations directly via erts_alcu_mmapper_mseg_alloc(). That path bypasses erts_mmap_record_alloc(), so literal contents were never written into the shared record arena file. On replay, the carrier was re-reserved at the same virtual address (ASLR disabled), but the pages were zero-filled, so any literal pointer baked into restored code dereferenced zeroed memory and failed with badarg (e.g. erlang:display_string/1 returned badarg, and the catch handler then crashed with "Catch not found").

Implementation:
- Track live (ptr, size) regions inside the literal super-carrier in erts_alcu_mmapper_mseg_alloc / _realloc / _dealloc, keyed on alloc_no == ERTS_ALC_A_LITERAL and only when -record is enabled.
- On process exit, dump those regions and their raw bytes to a sidecar file next to the main record arena (<arena>.literals). The dump is registered via atexit next to the existing struct-root-dumps hook.
- During -replay, after erts_mseg_init() has initialised erts_literal_mmapper (so the 1 GB range is virtually reserved) but BEFORE set_au_allocator(ERTS_ALC_A_LITERAL, ...) creates the allocator's main carrier, read the sidecar and, for each region:
  1. Call mm->reserve_physical(ptr, size) to flip the pages to PROT_READ|PROT_WRITE.
  2. Advance mm->sa.top past the region (new erts_mmap_mark_allocated helper) so that subsequent erts_mmap() calls won't hand the same pages back out and overwrite the restored bytes.
  3. memcpy the recorded bytes into place.

With this change, replay reaches Erlang code: hello:hello/1 dereferences its literal "hello, world\n" cons cell correctly, and erlang:display_string/1 executes as expected. Subsequent crashes come from entirely different paths and are tracked separately.

New helpers added to erl_mmap.{h,c} let the record/replay TU reserve physical backing and mark super-carrier ranges as in-use without touching the opaque ErtsMemMapper_ struct directly.
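The sidecar round trip can be sketched with a simple record layout. The format used here ({uint64 addr, uint64 size, bytes}) is an assumption, not necessarily what <arena>.literals contains, and toy_dump_regions / toy_restore_regions are hypothetical. The reserve_physical and mark-allocated steps are not modelled: the sketch assumes the destination memory is already mapped and writable at the recorded address, which only holds when the address-space layout matches between runs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sidecar layout: repeated { uint64 addr; uint64 size; uint8 bytes[size]; } */
typedef struct {
    void *ptr;
    size_t size;
} ToyRegion;

/* Record side: write each live region's address, size and raw bytes. */
static int toy_dump_regions(FILE *f, const ToyRegion *regs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t hdr[2] = { (uint64_t)(uintptr_t)regs[i].ptr, (uint64_t)regs[i].size };
        if (fwrite(hdr, sizeof hdr, 1, f) != 1)
            return -1;
        if (regs[i].size && fwrite(regs[i].ptr, regs[i].size, 1, f) != 1)
            return -1;
    }
    return 0;
}

/* Replay side: read the bytes straight back to their recorded addresses.
 * Assumes each destination is already mapped read/write (in the PR that is
 * what reserve_physical + erts_mmap_mark_allocated arrange beforehand). */
static int toy_restore_regions(FILE *f)
{
    uint64_t hdr[2];
    while (fread(hdr, sizeof hdr, 1, f) == 1) {
        void *dst = (void *)(uintptr_t)hdr[0];
        size_t size = (size_t)hdr[1];
        if (size && fread(dst, size, 1, f) != 1)
            return -1;   /* truncated sidecar */
    }
    return ferror(f) ? -1 : 0;
}
```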

The record arena was being opened O_RDWR and mmapped MAP_SHARED even in -replay mode, so any write the VM performed against restored memory propagated back to the on-disk file. A crash partway through replay therefore left the arena in a partially modified state, and the next replay started from that corrupted snapshot and produced a different failure.

Changes:
- erts_mmap_record_init(): in replay, open the arena O_RDONLY and map it MAP_PRIVATE (copy-on-write). The VM can still mutate restored pages as before, but nothing reaches the backing file.
- erts_alloc_struct_dump_snapshots_on_exit(): return early when running under -replay so the struct-root-dumps directory (replay INPUT) is never rewritten.

The literal sidecar (<arena>.literals) was already safe, because its contents are read() into freshly allocated pages rather than mmapped. Running the same replay twice in a row now produces identical output, and the arena / struct-root-dumps files are left untouched on disk.
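The O_RDONLY + MAP_PRIVATE combination is what keeps the on-disk arena pristine: a private mapping may be mapped PROT_WRITE even from a read-only descriptor, and writes land in copy-on-write pages that never reach the file. A minimal POSIX sketch, assuming Linux/glibc; toy_map_arena_for_replay is a hypothetical helper, and any fixed-address placement the real arena may rely on is omitted.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map an arena file for replay: read-only fd, private (copy-on-write) mapping.
 * Stores on the returned memory modify private page copies, never the file. */
static void *toy_map_arena_for_replay(const char *path, size_t size, int *fd_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    *fd_out = fd;
    return p;
}
```

With MAP_SHARED (and an O_RDWR descriptor) the same stores would be written back to the file, which is the corruption the commit removes.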

Fixes "Catch not found" crashes in replay mode as soon as any try/catch handler fires (e.g. hello:test_catches/0). The catch-table pool lives in two places:
1. bccix[ERTS_NUM_CODE_IX], a small file-static header array in beam_catches.c's BSS holding {free_list, high_mark, tabsize, beam_catches*} per code index.
2. bccix[i].beam_catches, a dynamically sized array of {handler_cp, cdr} pairs allocated via ERTS_ALC_T_CATCHES (CATCHES -> LONG_LIVED -> CODE -> default mseg super-carrier), whose bytes end up in the record arena.

At module load time, patchCatches() / emu_load.c call beam_catches_cons() to allocate an index and bake make_catch(index) immediates into the generated code. handle_error() later resolves those via beam_catches_car(index) -> bccix[active].beam_catches[index].cp.

Replay preserves (2) automatically via the arena's MAP_PRIVATE mapping, but (1) is BSS and was being reinitialised to a fresh empty table by beam_catches_init() during init_emulator(). Because load_preloaded() is skipped in replay, no beam_catches_cons() calls ever repopulate the fresh table; this left every baked-in catch index pointing at slot 0 of an empty table, and the first exception unwound past the real handler, producing "Catch not found".

Register bccix[] under the tag "beam_catches.bccix" via erts_alloc_trace_note_alloc() so the struct-root-dump pipeline snapshots it, add beam_catches_apply_replay_root() to memcpy the dumped bytes back over bccix[], and call it from erl_start() in replay mode after erts_ranges_replay_rebuild(). The restored pointers refer to long-lived-allocator carriers, which the default mseg mapper brings back at the same virtual addresses, so the entries they index resolve correctly.
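A sketch of the bccix[] snapshot/restore, under the assumption that the header array can be restored with a single memcpy because its beam_catches pointers target arena memory that comes back at the same address. ToyCatchesHeader, toy_bccix, toy_catches_apply_replay_root and toy_catches_car are hypothetical simplifications; the size check and the high_mark bounds check are illustrative additions, not something the PR is described as doing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_NUM_CODE_IX 3   /* assumption: mirrors ERTS_NUM_CODE_IX */

typedef struct { void *handler_cp; int cdr; } ToyCatchEntry;

/* Per-code-ix header living in BSS, shaped like the bccix[] entries described above. */
typedef struct {
    int free_list;
    unsigned high_mark;
    unsigned tabsize;
    ToyCatchEntry *beam_catches;   /* points into the record arena */
} ToyCatchesHeader;

static ToyCatchesHeader toy_bccix[TOY_NUM_CODE_IX];

/* Overwrite the whole header array with the dumped root; reject size mismatches. */
static int toy_catches_apply_replay_root(const void *dump, size_t len)
{
    if (len != sizeof toy_bccix)
        return -1;
    memcpy(toy_bccix, dump, sizeof toy_bccix);
    return 0;
}

/* Resolve a baked-in catch index through the active code index. */
static void *toy_catches_car(unsigned active_ix, unsigned index)
{
    const ToyCatchesHeader *h = &toy_bccix[active_ix];
    return index < h->high_mark ? h->beam_catches[index].handler_cp : NULL;
}
```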

…n corruption

Replay could crash in two coupled ways during early boot/logger traffic:
1) Corrupted stacktrace terms (c_p->ftrace) were propagated into terminate/logging paths and then copied as regular terms, leading to invalid boxed/list traversal and SIGSEGV in the copy paths.
2) Some callback slots in gen_server server_data could resolve to invalid external-fun metadata during replay-sensitive startup paths, causing badfun/undef-style failures while starting logger supervision.

This hotfix adds targeted mitigations:
- beam_common.c: add replay term sanity validation (pointer-range and recursive tuple checks) and drop an invalid ftrace to NIL both before add_stacktrace() and after save_stacktrace() packaging.
- utils.c: in ERTS_REPLAY_COPY_DEBUG mode, bypass error-term copying in do_send_term_to_logger() and send a text-only logger payload to avoid re-triggering a deep term copy on already-corrupt args.
- gen_server.erl: store callback cache entries as explicit closures, preserving stable fun identity/arity under replay and avoiding fragile direct external-fun references in startup-critical paths.

This is a pragmatic mitigation to keep replay progressing and observability intact; it does not claim to fully solve the upstream memory corruption source.
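The replay term sanity check is described as pointer-range plus recursive tuple checks. The sketch below illustrates that shape on a deliberately simplified term model (ToyTerm is NOT the real Eterm tagging scheme), with a depth cap as an assumed guard against cyclic garbage; toy_term_sane and toy_ptr_ok are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy term model: every term is a pointer to a ToyTerm. */
typedef enum { TOY_IMMED, TOY_TUPLE } ToyKind;

typedef struct ToyTerm {
    ToyKind kind;
    size_t arity;                  /* TOY_TUPLE only */
    const struct ToyTerm **elems;  /* TOY_TUPLE only */
} ToyTerm;

typedef struct { uintptr_t start, end; } ToySpan;

/* Is p inside one of the known-good memory spans? */
static int toy_ptr_ok(const void *p, const ToySpan *spans, size_t nspans)
{
    uintptr_t a = (uintptr_t)p;
    for (size_t i = 0; i < nspans; i++)
        if (a >= spans[i].start && a < spans[i].end)
            return 1;
    return 0;
}

/* Check each pointer before dereferencing it, recurse into tuple elements,
 * and give up past a fixed depth so corrupt cyclic data cannot recurse forever. */
static int toy_term_sane(const ToyTerm *t, const ToySpan *spans, size_t nspans, int depth)
{
    if (depth > 64 || !toy_ptr_ok(t, spans, nspans))
        return 0;
    if (t->kind == TOY_IMMED)
        return 1;
    if (t->kind != TOY_TUPLE)
        return 0;
    if (t->arity && !toy_ptr_ok(t->elems, spans, nspans))
        return 0;
    for (size_t i = 0; i < t->arity; i++)
        if (!toy_term_sane(t->elems[i], spans, nspans, depth + 1))
            return 0;
    return 1;
}
```

A caller would then do the equivalent of replacing ftrace with NIL whenever the check fails, before the term reaches copy or logging paths.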

Replay was reusing snapshot-backed Export->lambda terms. In practice these terms can carry stale runtime state and later surface as badfun errors when erlang:make_fun/3 returns ep->lambda (for example, fun M:F/A callback funs used by logger/gen_server paths). Fix this by rebuilding every export lambda from the current export metadata after replay root/code-index restoration, via erts_export_replay_repair_all_lambdas(). Each rebuilt ErlFunThing is re-linked to its current Export entry and header. Also ensure the lambda dump artifact exists in record mode (create the parent directories, and create the file with a CSV header if it is missing), and keep the dump-at-exit hook registered.
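The "ensure the dump artifact exists" step is essentially mkdir -p on the parent directories plus an exclusive create that writes a header only the first time. A POSIX sketch follows; toy_mkdir_parents and toy_ensure_dump_file are hypothetical, and the header text is supplied by the caller because the PR does not say what the CSV columns are.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* mkdir -p for every parent directory of `path` (the last component is a file). */
static int toy_mkdir_parents(const char *path)
{
    char buf[4096];
    size_t len = strlen(path);
    if (len >= sizeof buf)
        return -1;
    memcpy(buf, path, len + 1);
    for (char *p = buf + 1; *p; p++) {
        if (*p != '/')
            continue;
        *p = '\0';
        if (mkdir(buf, 0755) != 0 && errno != EEXIST)
            return -1;
        *p = '/';
    }
    return 0;
}

/* Create the file with a header line only if it does not exist yet;
 * an existing file (and its contents) is left alone. */
static int toy_ensure_dump_file(const char *path, const char *csv_header)
{
    if (toy_mkdir_parents(path) != 0)
        return -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return errno == EEXIST ? 0 : -1;
    size_t n = strlen(csv_header);
    ssize_t w = write(fd, csv_header, n);
    close(fd);
    return w == (ssize_t)n ? 0 : -1;
}
```

Using O_EXCL makes "create if missing" a single atomic step, so a second call never truncates or re-headers an existing dump.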

… boot arguments

Problem
- Replay startup could execute static NIF reinitialization too early (during erl_init()) and in an unmanaged startup context.
- In replay snapshots, NIF/module refc state can be restored in a stale/under-initialized form (notably zero), while runtime structures still assume a live baseline reference.
- These conditions lead to brittle behavior in shell-driven module operations and can cascade into crashes/hangs when later code paths (dirty scheduling, resource-type takeover, load callbacks) consume invalid refcount assumptions.

Root causes addressed
1) Reinit timing/context mismatch
- Reinitializing static NIFs in erl_init() happens before the init process identity is available, and before a context aligned with code-mod permission expectations exists.
2) Replay refc baseline drift
- dynlib_refc/refc for module/resource ownership may be observed as < 1 during replay even though the objects are logically live, causing follow-up increments/assertions to operate on an invalid baseline.
3) prim_file load argument during replay
- The prim_file load path expects a process identity in the relevant replay scenarios; SMALL_ZERO can be semantically wrong when the init pid is available.

What this commit changes
- Move replay static NIF reinit from erl_init() to erl_start(), immediately after init process creation (erts_init_process_id assignment).
- Keep the reinit wrapped with unmanaged thread progress delay/continue, and under lock-check builds soften the code-mod permission checks around the replay reinit call.
- In replay-specific paths, defensively re-establish refc baselines before increments/assertions:
  - schedule(): ensure env->mod_nif->dynlib_refc >= 1 before the first dirty-scheduling ownership bind.
  - prepare_opened_rt(): ensure the previous owner's refc/dynlib_refc and the new owner lib's refc/dynlib_refc are initialized to a valid baseline when the replay-restored value is < 1.
- Relax the open_resource_type() lock-check assertion under replay to permit the replay reinit path semantics.
- In erts_replay_reinit_loaded_static_nifs(), pass the init pid as load_arg for prim_file when available; keep the SMALL_ZERO default for others.

Why this is safe
- All baseline reinitializations are gated behind replay mode and only applied when the current value is < 1.
- Normal non-replay execution paths are unchanged.
- The changes preserve the existing ownership/reference increment logic; they only prevent invalid zero/negative-baseline transitions in replay.

Files
- erts/emulator/beam/erl_init.c
- erts/emulator/beam/erl_nif.c
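The refc baseline fix-up is a conditional "raise to 1 if below 1, only in replay" operation. A minimal C11 sketch, assuming the refcount is an atomic_long and the replay flag is a plain global; toy_replay_enabled and toy_replay_ensure_refc_baseline are hypothetical. A compare-and-swap loop is used so the fix-up does not clobber a concurrent increment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical replay-mode flag; in the PR this follows from -replay. */
static bool toy_replay_enabled;

/* Raise a restored refcount to the live baseline of 1, but only in replay
 * mode and only if the current value is < 1. Non-replay runs are untouched. */
static void toy_replay_ensure_refc_baseline(atomic_long *refc)
{
    if (!toy_replay_enabled)
        return;
    long cur = atomic_load(refc);
    while (cur < 1) {
        /* On failure, cur is reloaded with the current value and re-checked. */
        if (atomic_compare_exchange_weak(refc, &cur, 1))
            break;
    }
}
```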

Add helpers used to diagnose the ets:insert deep-copy crash and the empty-tuple mismatch that caused it.
- erl_mmap.h / erl_mmap_record.c: expose erts_mmap_record_arena_contains() and erts_mmap_record_arena_bounds() so any file can classify a pointer as ARENA / LITERAL / HEAP.
- copy.c: add replay_classify_ptr, replay_subtag_name and erts_replay_dump_term_to_stderr, which recursively walks and prints an Erlang term, annotating each slot with its class. Controlled by the ERTS_REPLAY_COPY_DEBUG env var.
- erl_db.c: add an ets_insert_replay_dump helper that calls the above before every ets:insert / ets:insert_new during replay. Activated by the ERTS_REPLAY_ETS_INSERT_DEBUG env var.
- erl_gc.c: add a hook, controlled by the ERTS_REPLAY_GC_PTR_MIN/MAX env vars, for tracing suspicious pointer ranges during GC.
- global.h: declare erts_replay_dump_term_to_stderr and erts_replay_static_nif_phase for cross-file access.

All helpers are compiled-in debug aids; they are gated behind env vars, so they are silent by default in both normal and replay production runs.
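The ARENA / LITERAL / HEAP classification and the env-var gating can be sketched as follows. In the PR the bounds come from helpers such as erts_mmap_record_arena_bounds(); here they are passed in explicitly. toy_classify_ptr, toy_ptr_class_name and toy_debug_knob are hypothetical, and treating unset, empty or "0" as off is an assumed convention, not necessarily how the ERTS_REPLAY_* variables are parsed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef enum { TOY_PTR_ARENA, TOY_PTR_LITERAL, TOY_PTR_HEAP } ToyPtrClass;

typedef struct { uintptr_t start, end; } ToyBounds;   /* half-open */

/* Classify a pointer: arena first, then literal range, anything else is "HEAP". */
static ToyPtrClass toy_classify_ptr(const void *p, ToyBounds arena, ToyBounds literal)
{
    uintptr_t a = (uintptr_t)p;
    if (a >= arena.start && a < arena.end)
        return TOY_PTR_ARENA;
    if (a >= literal.start && a < literal.end)
        return TOY_PTR_LITERAL;
    return TOY_PTR_HEAP;
}

static const char *toy_ptr_class_name(ToyPtrClass c)
{
    switch (c) {
    case TOY_PTR_ARENA:   return "ARENA";
    case TOY_PTR_LITERAL: return "LITERAL";
    default:              return "HEAP";
    }
}

/* Read an env-var knob once and cache it (*cache starts at -1).
 * Unset, empty or "0" means off, so the helpers are silent by default. */
static int toy_debug_knob(const char *name, int *cache)
{
    if (*cache < 0) {
        const char *v = getenv(name);
        *cache = (v && *v && strcmp(v, "0") != 0) ? 1 : 0;
    }
    return *cache;
}
```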

In DEBUG builds, the staged-table template (erl_code_staged.h) and module.c track whether staging is currently in progress using sentinel variables (fun/export _debug_stage_ix == ~0 and dbg_load_code_ix == -1 mean idle). BSS zero-initialises all three to 0, which is not the idle value. In a normal boot they are set to their idle sentinels when the preloaded-module staging cycle runs end_staging; replay skips that cycle entirely, leaving all three at 0. When compile:file later calls erts_start_staging_code_ix, the very first assertion in fun_staged_start_staging fires:

beam/erl_code_staged.h:385: Assertion failed: fun_debug_stage_ix == ~0

Fix: reset each sentinel to its idle value at the end of the replay-specific table init functions:
- fun_debug_stage_ix = ~0 in erts_init_fun_table_replay (erl_fun.c)
- export_debug_stage_ix = ~0 in init_export_table_replay (export.c)
- dbg_load_code_ix = -1 in init_module_table_replay (module.c)

Also move the dbg_load_code_ix static declaration earlier in module.c so it is visible at the call site inside init_module_table_replay. All changes are #ifdef DEBUG / IF_DEBUG guarded, so release builds are unaffected. Verified: compile:file succeeds across 5 deterministic replay runs.
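The sentinel problem in miniature: a debug-only variable whose BSS default (0) is not its idle value, an assertion at staging start, and a replay init that sets it to idle explicitly. TOY_DEBUG, toy_debug_stage_ix and the toy_* functions are hypothetical; TOY_DEBUG is defined inside the sketch only so it is self-contained, standing in for the DEBUG build flag.

```c
#include <assert.h>

/* Stand-in for the DEBUG build flag; defined here so the sketch compiles as-is. */
#define TOY_DEBUG 1

#ifdef TOY_DEBUG
#define TOY_STAGE_IDLE (~0u)
static unsigned toy_debug_stage_ix;   /* BSS: starts at 0, which is NOT idle */
#endif

static void toy_start_staging(unsigned ix)
{
#ifdef TOY_DEBUG
    assert(toy_debug_stage_ix == TOY_STAGE_IDLE);   /* fires if replay skipped the reset */
    toy_debug_stage_ix = ix;
#else
    (void)ix;
#endif
}

static void toy_end_staging(void)
{
#ifdef TOY_DEBUG
    toy_debug_stage_ix = TOY_STAGE_IDLE;
#endif
}

/* Replay-specific init: no preload staging cycle ran, so put the sentinel
 * into its idle state explicitly. */
static void toy_init_fun_table_replay(void)
{
#ifdef TOY_DEBUG
    toy_debug_stage_ix = TOY_STAGE_IDLE;
#endif
}
```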

…rding

ziopio added 23 commits April 20, 2026 17:11

mmap to file and log allocations on disk

5ed8bd3

record and replay mods use single mapped file of 100MB

8cb75d4

Do not use the mapped file for new allocations during replay

65ab72a

Root every allocation to mapped memory except for special handling in…

bfd0022

… sys_drivers

Route binary allocator carriers through mseg during record

5e8f088

Preserve restored BIF exports during replay initialization

3668c94

Always launch sys processes

b9d63ab

Reinitialize static NIF state during replay

bb4d625

REmove debug code

9df1e5a

remove ERTS_REPLAY_ROOT_DEBUG gating and debug code

c89e030

allocator: write roots dumps only in -record mode

ce0ce48

record/replay: remove nonessential trace and debug file dumps

5e3840b

Simplify replay setup by automatically forwarding replay flag to node…

1502d86

… boot arguments

Remove eccessive trace notes

4059cb1

ziopio force-pushed the calzone-dev branch 2 times, most recently from 4cef881 to 4059cb1 April 29, 2026 15:01

ziopio and others added 5 commits April 30, 2026 12:46

erts replay: rebuild index hash buckets for restored tables

e775692

Dockerfile to build custom OTP

0ba9835

bring back rebar3 into dockerfile

8fd0b01

Fix lock checking crash in debug builds

1643c99

ziopio added 2 commits May 6, 2026 13:02

ziopio force-pushed the calzone-dev branch from 4f3f562 to 959dcb3 May 6, 2026 13:08

ziopio and others added 7 commits May 11, 2026 14:33

Add support for record and replay flags in erlexec

d63da2f

Use record and replay path as directory, always dump struct when reco…

858530e

…rding

Update preloaded

fc80ff6

add dockerfile

b03b833

cleanup replay argument in erl_init

f8f5f5e

Move diagnostic prints to debug knobs

51b9acc

Increase arena record size to 256 MB

20e007f

ziopio wants to merge 37 commits into ziopio/OTP-28.4.2 from calzone-dev


2 participants

Uh oh!

Conversation

ziopio commented Apr 29, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants