runtime,cl: panic-site pc snapshots — deferred callers and fault stacks see the panic frames#2026
Open
cpunion wants to merge 62 commits into
Bring over the cross-branch runtime funcinfo benchmark (hot, deep, multipkg, cold, stdlib scenarios) so xgo-dev#2012 can reproduce its own performance numbers. cold.FirstCallersFrames now walks to the first fully symbolized frame, because synthetic runtime frames (LLGo's runtime.Callers placeholder) carry no file/line and the metric was silently skipped on LLGo. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
macOS previously had no entry/stub/pcline site sections, so first-use funcinfo initialization fell back to one dlsym per function and per stub (13ms cold on a small binary, 27ms with LTO), and statement-level pc-line records did not exist at all. Emit the same site records on Mach-O:
- __DATA,__llgo_fie / __llgo_stub / __llgo_pcl sections with the live_support attribute: under ld64/lld -dead_strip a live_support atom survives only if the atom it references (the anchor label inside the function body) is live, which matches the records-follow-function semantics ELF gets from SHF_LINK_ORDER with --gc-sections.
- One lowercase-l linker-private symbol per record so each record is its own atom and dead functions drop exactly their own records.
- Assembler-local (L-prefixed) pc-site labels: Mach-O subsections-via-symbols treats visible labels as atom boundaries, and a visible label in the middle of a function let the linker split and reorder function bodies.
- Boundary symbols via ld64's section$start$/section$end$, emitted with the \x01 verbatim-name prefix so LLVM does not prepend the Mach-O underscore.
- A no_dead_strip zero record per section in the main module keeps the sections (and their boundary symbols) present even when no package contributed records.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
First-use initialization:
- Skip the per-stub dlsym loop when the stub-site section provided the frames; each dlsym is a dynamic-loader query and the loop dominated cold latency.
- Materialize per-function strings and entry PCs once per function and packed file strings once per file ID during pcline table construction instead of once per site.
Cold FuncForPC fast path: before the frame table exists, resolve exact function-value PCs with a bounded linear scan of the raw entry-site and stub-site sections (compile-time data, no loader query), then one dladdr as fallback; both require an entry match within the warm path's slack so stripped-local misattribution is impossible. The path is budgeted: after a handful of cold lookups the sorted table amortizes better, so it is built as usual. cold.FirstFuncForPC drops from 13ms to ~35us on macOS.
Find index: subbucket deltas are now uint16 and the whole-index abandonment on delta overflow is gone. Go stores uint8 deltas because its linker guarantees a 16-byte minimum function size; LLGo indexes call-site records that sit a few bytes apart, and a dense 4KiB bucket silently degraded every lookup in the process to a full binary search. A delta counts deduplicated PCs inside one bucket, so it is bounded by the bucket size and uint16 cannot overflow.
Observability: LLGO_FUNCINFO_DEBUG=1 prints one line per lazily built table (frame/bucket counts, index built or fallback, sites vs dlsym sources) so benchmarks can tell which path they measured.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
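The find-index change above can be illustrated with a minimal sketch. It assumes a Go-style layout of 4096-byte buckets split into sixteen 256-byte subbuckets; the type and function names (`bucket`, `buildIndex`, `lookup`, `denseTable`) are hypothetical and are not taken from the LLGo runtime.

```go
package main

import (
	"fmt"
	"sort"
)

const (
	bucketSize    = 4096 // bytes of text covered by one bucket (assumed)
	subbucketSize = 256  // bytes of text covered by one subbucket (assumed)
	subPerBucket  = bucketSize / subbucketSize
)

// bucket stores the table index of the first record at or after the bucket
// start, plus one delta per subbucket relative to that base.
type bucket struct {
	base   uint32
	deltas [subPerBucket]uint16
}

// lowerBound returns how many pcs are strictly below addr.
func lowerBound(pcs []uint32, addr uint32) int {
	return sort.Search(len(pcs), func(i int) bool { return pcs[i] >= addr })
}

// buildIndex expects pcs sorted, deduplicated and relative to the text start.
func buildIndex(pcs []uint32, textSize uint32) []bucket {
	idx := make([]bucket, (textSize+bucketSize-1)/bucketSize)
	for b := range idx {
		start := uint32(b) * bucketSize
		base := lowerBound(pcs, start)
		idx[b].base = uint32(base)
		for s := 0; s < subPerBucket; s++ {
			lo := lowerBound(pcs, start+uint32(s)*subbucketSize)
			// At most bucketSize distinct pcs fit in a bucket, so this fits in uint16.
			idx[b].deltas[s] = uint16(lo - base)
		}
	}
	return idx
}

// lookup returns the index of the nearest record at or below pc, or -1.
func lookup(idx []bucket, pcs []uint32, pc uint32) int {
	b := idx[pc/bucketSize]
	i := int(b.base) + int(b.deltas[(pc%bucketSize)/subbucketSize])
	for i < len(pcs) && pcs[i] <= pc {
		i++
	}
	return i - 1
}

// denseTable builds one record every 4 bytes, a dense call-site layout.
func denseTable(textSize uint32) []uint32 {
	pcs := make([]uint32, 0, textSize/4)
	for pc := uint32(0); pc < textSize; pc += 4 {
		pcs = append(pcs, pc)
	}
	return pcs
}

func main() {
	pcs := denseTable(8192)
	idx := buildIndex(pcs, 8192)
	fmt.Println(idx[0].deltas[subPerBucket-1]) // 960, which would not fit in a uint8
	fmt.Println(lookup(idx, pcs, 4098))        // 1024, the record at pc 4096
}
```

With records 4 bytes apart, the last subbucket of a bucket already needs a delta of 960, which is why a uint8 delta overflows in this layout while a uint16 delta cannot.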
Every Caller/Callers capture used to intern the frame into the synthetic table: a hash probe plus a full frame comparison per stack slot per call. Memoize the interned PC base in the shadow-stack slot and invalidate it when the recorded line changes (for one entry the instrumented name/file operands are constants, so the line is the only thing that varies between call sites). The three static frames emitted around every Callers walk get per-store memo slots, and the emit loop is unrolled so nothing escapes and skipped frames are never captured. macOS: hot.CallersOnly 182ns -> 125ns (Go 1.26: 118ns); with LTO 96ns. hot.CallersFramesFirst 528ns -> 471ns, 354ns with LTO (Go: 401ns). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…py limit Frames.Next allocated a fresh *Func per symbolized frame; route it through the FuncForPC 4-way cache so repeated CallersFrames walks over the same PCs stop allocating. hot.CallersFramesFirst: macOS 471->456ns (338ns with LTO, Go 1.26: 406ns); Linux LTO reaches parity at 433ns. Also document a pre-existing limitation at the entry-site emitter: the body-embedded inline-asm record is duplicated by LTO inlining into every inline site (~4x section growth on multipkg) and registers host-function PCs under the inlinee's symbol ID. Runtime only consults the table when native symbolization fails, which bounds the impact; the fix (data globals with !associated metadata) needs LLVMGlobalSetMetadata in the llvm binding and lands with the link-phase ftab work. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Record the experiment results at the emitter: !associated only guides linker GC and IR-level GlobalDCE deletes the records; llvm.compiler.used pins dead functions through the records' address initializers; and noduplicate blocks inlining. Section dedup is link-phase work. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Post-link table generation plan: parse the linked binary's metadata sections, dedup LTO inline copies against the symbol table, sort with a sentinel, build Go-layout findfunctab via internal/pclntab, and write back into a reserved section with ASLR-safe anchor offsets. Runtime adopts the prebuilt table when the header validates and keeps first-use construction as fallback. Includes the list of platform facts established in xgo-dev#2012 so implementation does not re-derive them. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Every llgo-linked executable (linux/darwin, sites enabled) now gets the prebuilt ftab/findfunctab automatically: internal/build runs internal/pclnpost.Rewrite after linkMainPkg, and any failure degrades silently to the first-use construction fallback.
Moves the tool core into internal/pclnpost and hardens it:
- Canonical-record detection by FNV: a record survives when its anchor's owning symbol hashes to the record's symbolID (or is the __llgo_stub. wrapper of it). The previous one-per-symbolID rule wrongly collapsed a function with its stub (they share the target's symbolID by design), which broke exact-entry lookups (caught by TestRuntimeLineInfoAndStack on Linux). LTO inline copies are now identified exactly: 8.4k/9.5k copies removed in the LTO probes.
- Mach-O chained-fixups surgery: unlink the rewritten sections' pointer slots from the dyld page chains (repointing predecessors' next links and page_start entries) so dyld neither rebases slots inside the new table nor skips unrelated fixups after the zeroed stub section, then re-sign ad hoc. Without this the table was corrupted at load.
- LTO-safe metadata location: the entry section carries a meta record whose relocations hold the addresses of the symbol-index pointer and count globals; LTO internalization strips those names from the symbol table but relocations always resolve. Runtime skips the meta rows (pc==0 / symbolID==0).
- Idempotence guard (already-rewritten binaries are left alone).
Runtime fixes that surfaced during validation:
- materializePrebuiltEntries is now two-phase so concurrent losers wait for the winner's store instead of reading a nil entries slice.
- pcLineFrameForPC rejects nearest-below sites whose entry is unresolved when the caller knows the function entry, instead of leaking a neighboring function's file/line.
Validation: macOS cl (full) + test/go + LLDB 194/194; Linux test/go TestRuntime suite; probes on both platforms report entries=prebuilt with first-FuncForPC at 7-21µs (Linux) from 13ms on the original baseline, and LTO builds drop 8-9.5k inline copies.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
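The canonical-record rule above can be sketched as follows. This is only an illustration: the commit message says FNV but not which variant or width, so the 64-bit FNV-1a used here is an assumption, and `siteRecord`, `symbolID` and `isCanonical` are hypothetical names rather than the pclnpost API.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"strings"
)

// stubPrefix is the wrapper prefix named in the commit message.
const stubPrefix = "__llgo_stub."

// siteRecord is a hypothetical stand-in for one decoded site record.
type siteRecord struct {
	symbolID uint64 // hash of the function the record was emitted for
	owner    string // symbol that owns the record's anchor address after linking
}

// symbolID hashes a symbol name (FNV-1a 64-bit is an assumption).
func symbolID(name string) uint64 {
	h := fnv.New64a()
	h.Write([]byte(name))
	return h.Sum64()
}

// isCanonical keeps a record when the anchor's owner is the function itself
// or its stub wrapper; an inline copy is owned by some other function.
func isCanonical(r siteRecord) bool {
	return symbolID(strings.TrimPrefix(r.owner, stubPrefix)) == r.symbolID
}

func main() {
	id := symbolID("main.work")
	records := []siteRecord{
		{id, "main.work"},             // the function's own record
		{id, "__llgo_stub.main.work"}, // its stub shares the symbolID by design
		{id, "main.caller"},           // an LTO inline copy living in another function
	}
	for _, r := range records {
		fmt.Println(r.owner, isCanonical(r))
	}
}
```

Keying on the owner of the anchor rather than on one record per symbolID is what lets a function and its stub both survive while inline copies are dropped.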
…table
On Mach-O, pointer slots that name exported functions (every __llgo_stub.* wrapper and any exported Go function) are emitted as chained-fixup BIND nodes, not rebases. The rewriter only decoded rebase nodes, so all stub records (and some entry records) were dropped as unowned and never reached the prebuilt ftab; FuncForPC on function values silently fell back to dladdr (~6µs per fresh pc on darwin).
- Parse the LC_DYLD_CHAINED_FIXUPS imports table and resolve bind ordinals to their in-image definitions.
- Match canonical owners against the record symbolID with underscore normalization (debug/macho's suffix-shared string table can surface one mangling underscore more or less than the source-level name).
- Splice the prebuilt header's base slot back into the fixup chain as a live rebase node: dyld writes the slid text base at load, so the runtime reads a ready runtime PC with no slide arithmetic (non-PIE ELF link-time values already equal runtime addresses).
- LLGO_PCLNPOST=0 escape hatch keeps first-use construction.
Fresh-pc FuncForPC slow path: darwin 6-8µs -> 1.2-1.7µs, linux 6.8µs -> 0.5µs; first-in-process lookup: darwin ~32µs -> ~14µs, linux ~6.8µs -> ~4µs.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Pure-compute probes (recursive fib, JSON round-trip, sort.Ints, map churn) with no runtime introspection, so one harness run covers both the introspection extremes and what the funcinfo machinery costs code that never asks for it. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Go's pclntab pages are touched by its own runtime (traceback, GC) long before user code queries it, so its first FuncForPC never pays page-in. Mirror that: when the prebuilt table is present, init adopts it (zero-copy, sub-µs), touches the pages the lookup path reads (blob, funcinfo records, string offsets, strings), runs one synthetic lookup to warm the code paths, and write-warms the FuncForPC cache pages. First-in-process FuncForPC: darwin ~17µs -> ~2.8µs, linux ~6.6µs -> ~1.0µs. Startup cost is page-count-bound (tens of µs on stdlib-sized tables, invisible next to ~3ms process startup; hello-world medians unchanged). Non-prebuilt binaries stay fully lazy: first-use construction allocates, which has no place in init, and programs that never introspect pay nothing. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
-depths generates deep_<N> scenarios at configurable call depths; -bigsizes generates bigfunc scenarios (funcs x statements) whose large bodies stress statement-level pcline density, mid-function pc symbolization, and ordinary performance of big method bodies. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
- Blob overflow: function-value stubs can double the row count, and at ~9k functions the prebuilt blob no longer fit the entry section, so the rewrite silently fell back to first-use construction (cold.FirstFuncForPC 96x96 non-LTO: 2.4ms). On overflow, retry with function entries only: stub pcs degrade to dladdr, real entries keep the prebuilt table.
- FuncForPC cache thrash: the set-associative pc cache holds 4k entries; batch workloads over 9k+ distinct functions evicted constantly and paid the string-materializing slow path on every call (multipkg.FuncForPCMany 96x96: 8-11ms vs Go 172µs). Add a per-ftab-row *Func cache for exact-entry lookups, so batch lookups are O(binary search) after the first pass at any scale.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…erflow
Function-value stubs can push the row count past what the entry section
holds (~9k functions with taken addresses). Instead of dropping stub
rows, write the full blob into the (larger) stub section and leave a
32-byte redirect header ("LLGOFTB2" + a live-relocation pointer) in the
entry section; the runtime follows it and adopts the same zero-copy
view. Function-value lookups keep the prebuilt table at any scale
instead of degrading to dladdr.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
funcForPCSlow treated any unaligned pc as a shadow-stack synthetic marker. arm64 function entries are always 4-aligned so this never fired, but amd64 function and stub entries need not be: an unaligned function-value pc skipped the prebuilt exact-entry path entirely and fell through to nearest-below symbolization, reporting the previous function's name (test/go TestRuntimeLineInfoAndStack on ubuntu CI, "bad function value func: main.renamedPC"). Hoist the prebuilt exact-entry + per-row-cache lookup ahead of the alignment heuristic; a genuine synthetic pc just misses the cheap search and proceeds as before. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
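The reordering described above can be modelled with a small sketch. It is a simplified illustration, not the runtime's funcForPCSlow: the `entry` type, the helper names and the `"<synthetic>"` placeholder for the synthetic-marker path are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical row of a sorted function-entry table.
type entry struct {
	pc   uintptr
	name string
}

// exactEntry reports the function whose entry pc equals pc, if any.
func exactEntry(table []entry, pc uintptr) (string, bool) {
	i := sort.Search(len(table), func(i int) bool { return table[i].pc >= pc })
	if i < len(table) && table[i].pc == pc {
		return table[i].name, true
	}
	return "", false
}

// nearestBelow reports the function with the largest entry pc <= pc.
func nearestBelow(table []entry, pc uintptr) string {
	i := sort.Search(len(table), func(i int) bool { return table[i].pc > pc })
	if i == 0 {
		return ""
	}
	return table[i-1].name
}

// resolve tries the exact-entry search before the alignment heuristic, so an
// unaligned function entry (possible on amd64) is no longer misclassified.
func resolve(table []entry, pc uintptr) string {
	if name, ok := exactEntry(table, pc); ok {
		return name
	}
	if pc%4 != 0 {
		return "<synthetic>" // placeholder for the synthetic-marker handling
	}
	return nearestBelow(table, pc)
}

func main() {
	table := []entry{{0x1000, "main.first"}, {0x1013, "main.renamedPC"}}
	fmt.Println(resolve(table, 0x1013)) // main.renamedPC
	fmt.Println(resolve(table, 0x1008)) // main.first
	fmt.Println(resolve(table, 0x1001)) // <synthetic>
}
```

A genuinely synthetic pc only pays for one failed binary search before taking the same path as before.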
The overflow fallback dropped stub rows to fit the entry section. That leaves pc ranges the table claims to cover but does not: a function value whose stub falls in a gap resolves nearest-below to the previous function and silently returns the wrong name — exactly what ubuntu CI caught (amd64 --icf=safe layouts overflow by a few hundred bytes, and non-PIE ELF dladdr cannot rescue). If the blob fits neither the entry section nor the (larger) stub section, skip the rewrite entirely: first-use construction is slower but covers every record. Reproduced and verified on linux/amd64 (qemu): the stub pc had no exact row and nearest-below returned the neighbouring function's name. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…rgery Fabricated fixtures make the IO paths testable in-process: a minimal ELF exercises load/Rewrite end to end (in-place, stub-section spill, and the overflow fallback that must leave the binary untouched), and a synthetic Mach-O image drives the chained-fixup chain surgery (remove+splice, empty-page insert, unconsumed-insert error). Package coverage 16% -> 69%. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
A fabricated Mach-O (segments, sections, symtab, chained-fixup imports and an empty page chain) drives load, bind-target resolution, record decoding and both Rewrite outcomes (in-place and stub-section spill) end to end. codesign now runs only when the input carries LC_CODE_SIGNATURE: real lld executables always do, unsigned inputs need no signature and codesign rejects them. Also cover asmQuoteELFSymbol, the empty-table initializers and the Rewrite error paths. Package coverage: pclnpost 69% -> 86%. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Every Go function on supported targets keeps the frame-pointer chain
("frame-pointer"="non-leaf", gated by Program.NeedsFramePointer to
linux/darwin — on embedded targets the unwinder does not exist and the
layout change perturbed the conservative GC on ESP32-C3). runtime.Caller,
Callers, CallersFrames, Stack and the unrecovered-panic dump walk
[fp]/[fp+w] directly and symbolize through the prebuilt ftab and pcline
tables:
- Return addresses resolve at pc-1 (Go's convention); statement labels
can land exactly on a return address, so raw-pc nearest-below reported
the following line. The convention holds with or without the prebuilt
table (text bounds fall back to the first-use frame table — link-phase
overflow layouts otherwise silently disabled it, the root cause of the
amd64 CI failures).
- The walk is bounded to the program's own text: libc frames without FP
discipline decode as wild pcs that nearest-below would attribute to
arbitrary functions.
- Methods and anonymous functions are now trackable (methods had no
pcline labels; closures lost their innermost frame to tail-call
optimization), and mid-function aligned pcs merge statement records
instead of returning declaration lines.
- frameSymbol results are memoized per pc (deep re-walks paid a dladdr
per frame: 32-frame walks 8µs -> 180ns) and the pcline table is built
during the startup pre-warm (lazily building it inside the first
Caller cost ~200µs at scale).
- Shadow-stack instrumentation is no longer emitted; LLGO_SHADOW_STACK=1
keeps the legacy emitters for one release. Tracked functions retain
noinline, no-tail-call and the data-only pcline records.
- libunwind is gone: the clite stacktrace fallback walks the FP chain
with dladdr names (same output format), and linux binaries no longer
link -lunwind.
Semantics are gc ground truth, verified against go: physical stacks show
every real frame; interface-chain Caller marks land at skip 3 and
closure chains at skip 4 (the old expectations encoded shadow-stack
frame loss). Perf (best-of, mac/linux): hot.Caller0 17/37ns (Go
155/241), deep.Direct512 2.8µs (Go 9.7µs; was 87-95µs), bigfunc.Work
18µs (Go 30µs; was 433µs), binary size unchanged or smaller.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
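The pc-1 convention for return addresses mentioned in this commit can be shown with a minimal sketch. The line table and addresses are invented for illustration, and `lineRecord`, `lineFor` and `lineForReturnAddress` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// lineRecord is a hypothetical statement-level record: the statement for
// `line` starts at `pc`.
type lineRecord struct {
	pc   uintptr
	line int
}

// lineFor returns the line of the nearest record at or below pc, or 0.
func lineFor(table []lineRecord, pc uintptr) int {
	i := sort.Search(len(table), func(i int) bool { return table[i].pc > pc })
	if i == 0 {
		return 0
	}
	return table[i-1].line
}

// lineForReturnAddress looks up ra-1, which still lies inside the call
// instruction and therefore inside the calling statement.
func lineForReturnAddress(table []lineRecord, ra uintptr) int {
	return lineFor(table, ra-1)
}

func main() {
	// Line 10 holds a call that ends at 0x108; line 11 starts exactly there.
	table := []lineRecord{{0x100, 10}, {0x108, 11}}
	ra := uintptr(0x108)
	fmt.Println(lineFor(table, ra))              // 11: raw lookup reports the following line
	fmt.Println(lineForReturnAddress(table, ra)) // 10: the line that made the call
}
```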
IR goldens gain the frame-pointer attribute (out.ll files carry no attribute groups and needed no regeneration); the legacy shadow-stack emitter assertions opt into LLGO_SHADOW_STACK; statement-line probes move to gc ground-truth skip counts; NeedsFramePointer target matrix and pclnpost symbolAddr/decodePtr edges covered. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ct log/slog/testing locations An unrecovered panic now prints a Go-style traceback (function names plus file:line per physical frame) through a PanicTraceback hook the public runtime registers; the clite dladdr dump remains the fallback when the FP walk or the tables are unavailable. Caller-frame tracking now applies uniformly: the blanket stdlib exclusion is gone, so the same per-package reaches-runtime.Caller analysis that already covered third-party code tracks log.Output, slog's Logger.log and testing's decorate chains (their thin wrappers were inlined, making fixed Caller depths count past them — log.Lshortfile printed "???:1"). Call sites into caller-pc-consuming functions of other packages get a statement anchor so the attributed frame reports the exact line. The collector also picks up named-type methods declared by the package itself — a type used only concretely never enters RuntimeTypes, which is exactly how slog.(*Logger).Info escaped tracking. hello-world size cost: +368 bytes (the traceback printer). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Four scenarios, every expectation verified against gc: unrecovered panic tracebacks; log.Lshortfile and slog AddSource (text+JSON, package funcs and logger methods); a failing t.Errorf under llgo test; and an introspection grab-bag (goroutine/init/defer callers, FuncForPC names for methods, closures and generics, the errors-with-stack capture idiom). Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
cpunion added a commit to cpunion/llgo that referenced this pull request on Jul 4, 2026
…libunwind + FP chain)
Hardware faults (SIGSEGV/SIGBUS and previously-fatal SIGFPE) now convert to ordinary recoverable Go panics, and the unrecovered traceback shows the fault-site chain: C frames down through the Go callers.
- A SA_SIGINFO handler captures the interrupted context; the handler's own frame-pointer chain dead-ends at the signal trampoline, which is why the ucontext pc/fp is required.
- The capture prefers a dynamically-resolved libunwind (dlopen/dlsym at install time only, no link-time -lunwind; LLGO_DYNUNWIND=0 disables): DWARF/compact-unwind stepping survives C frames compiled without frame pointers, and the nongnu flavor's unw_get_proc_name reads .symtab, naming static C symbols dladdr cannot see (they otherwise display under a neighboring Go function via nearest-below). Where unwind info runs out, the walk resumes along the FP chain from libunwind's final cursor. Flavors: darwin libSystem, linux nongnu (arch-prefixed symbols), linux LLVM (context translated).
- Only man-page async-signal-safe unw_* calls run in the handler (resolution and a lazy-state warm-up happen at install). Fault-context walks probe page readability (msync) before dereferencing: an arithmetic-valid frame pointer can still point into an unmapped hole, and faulting inside the fault path would recurse; a re-entered handler restores the default disposition for one clean core.
- The unrecovered dump goes through a new PanicTraceback hook (gc-style frames via the funcinfo tables; libunwind's name for dot-less C symbols); non-fault panics keep the existing clite dump.
Verified: darwin/arm64 (libSystem flavor); linux/amd64 with a -fomit-frame-pointer C chain, where the FP-only walk recovers 2 frames with a misattributed name and the dynamic path recovers the full chain with correct static names.
Overlaps with the fault half of xgo-dev#2026 (panic-site snapshots); whichever lands second rebases to drop its duplicate.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ks see the panic frames
gc runs deferred functions on top of the panicked stack; LLGo's longjmp unwinding removes those frames physically, so runtime.Caller / CallersFrames / debug.Stack from a deferred function (before or after recover) could not see the panic site. Now:
- Panic() captures the physical pc chain (the existing SavePanicCallerFrames hook, empty since the shadow stack left) into a per-thread snapshot; Recover() marks the recovering frame so the snapshot stays observable exactly while that frame is live.
- Caller-info walks splice the snapshot below the live deferred frames at the defer-owner junction, keeping one panic-machinery frame where gc has runtime.gopanic (fixed Caller depths count it).
- Hardware faults (SIGSEGV/SIGBUS and previously-fatal SIGFPE) install a SA_SIGINFO handler that captures from the interrupted ucontext pc/fp (the handler's own chain dead-ends at the signal trampoline), so fault tracebacks start at the fault site, through C frames into the Go callers. C is compiled with -fno-omit-frame-pointer so x86-64 chains hold.
- Defer execution is attributed to the function's closing brace like gc, and explicit panic statements get their own statement anchor.
Signal-path robustness (the reflectmake flake, ~7% -> 0 over 300 runs):
- The recover mark reads the frame-pointer chain, which after siglongjmp can reach a stale/unmapped slot; the guarded read (msync page probe) lives in the public runtime via a RecoverMark hook, and the core just calls it. An unguarded read self-faulted and corrupted the value the recover was extracting.
- The fault handler does no async-signal-unsafe work: the snapshot buffer is preallocated (no bdwgc malloc in signal context) and the page size is primed at install (no sysconf). SA_NODEFER + an unblock on capture keep a savemask=0 longjmp escape from leaving the fault signal blocked, and a re-entered handler restores the default disposition for one clean core; fault-context walks probe page readability before dereferencing.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Remove issue14646/issue5856/issue33724 xfails; the C-fault regression runs three sequential faults (a handler leaving the signal blocked after the longjmp escape cores on the second) and asserts the fault-site chain. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
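A Go-level analogue of the sequential-fault check can be sketched as below. It is not the C-fault regression test itself: it triggers a nil store and an integer division by zero from Go code, and the helper names (`recoverFault`, `nilStore`, `divide`) are invented for this illustration. The expected messages are the ones gc produces for these runtime errors.

```go
package main

import "fmt"

type node struct{ value int }

// recoverFault runs f and returns the recovered panic's message, or "" if f
// did not panic.
func recoverFault(f func()) (msg string) {
	defer func() {
		if r := recover(); r != nil {
			msg = fmt.Sprint(r)
		}
	}()
	f()
	return ""
}

// nilStore writes through a nil pointer.
func nilStore() {
	var n *node
	n.value = 1
}

// divide panics with a runtime error when b is zero.
func divide(a, b int) int { return a / b }

func main() {
	// Three sequential faults: each one must be recoverable, which fails if a
	// previous recovery left the fault signal blocked.
	for i := 0; i < 3; i++ {
		fmt.Println(recoverFault(nilStore))
	}
	zero := 0
	fmt.Println(recoverFault(func() { _ = divide(1, zero) }))
}
```

Under gc the first three lines read `runtime error: invalid memory address or nil pointer dereference` and the last reads `runtime error: integer divide by zero`.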
Second follow-up planned in #2004 (parallel with #2024): panic-site pc snapshots. gc runs deferred functions on top of the panicked stack, so runtime.Caller, CallersFrames and debug.Stack called from a deferred function (before or after recover()) see the panic frames. LLGo's longjmp unwinding removes them physically; this PR captures them at panic time and splices them back into every caller-info view.

Mechanism

- Panic() fills a per-thread snapshot through the SavePanicCallerFrames hook (empty since the shadow stack was removed). Recover() marks the recovering frame's fp: the snapshot stays observable exactly while that frame is live on the physical chain, matching how long gc keeps the panic frames on the stack, and can never leak into unrelated later walks.
- Caller-info walks splice the snapshot below the live deferred frames at the defer-owner junction, keeping one panic-machinery frame where gc has runtime.gopanic; fixed Caller depths count it (issue5856's Caller(2)). The no-snapshot fast path costs one TLS load.
- Hardware faults install a SA_SIGINFO handler that captures from the interrupted ucontext's pc/fp; the handler's own chain dead-ends at the signal trampoline, which is why fault snapshots were previously impossible. SIGSEGV, SIGBUS and previously-fatal SIGFPE (x86 integer division now recovers with gc's "integer divide by zero") all convert to Go panics, and their tracebacks start at the fault site, through C frames, into the Go callers. C is compiled with -fno-omit-frame-pointer so x86-64 chains hold.
- Defer execution is attributed to the function's closing brace like gc (issue14646); explicit panic statements get their own statement anchor (issue5856); and a foreign (C) function linked between Go functions no longer steals the preceding Go function's name on darwin (dladdr cross-check in refinePCSymbolLine when no statement anchor covers the pc; cold, cache-backed path only).

Example

test/_manualtest/cexcept, NULL store in C called from Go, unrecovered:

goroot conformance

- Xfails removed: issue14646 (deferred Caller location), issue5856 (Caller(2) panic line through gopanic depth), issue33724 (debug.Stack after recover shows the panic-site method, not the inlined-away one).
- bug347/348, issue29504/4562/27201: the panicking statement's line in an untracked function needs statement-level granularity everywhere (P4b prebuilt pcline; statement anchors structurally require noinline under the ELF link-order mechanism, so they cannot be blanket-emitted today).

Known limitations (documented in the manual test)

Validation (both platforms + amd64 before push)

- test/go, goroot B-scope all green; plain/hot paths untouched (snapshot fast path is one TLS load).
- …-fno-omit-frame-pointer).

Based on #2023 (contains #2012/#2016/#2019); independent of #2024.
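The gc behaviour this PR targets can be observed with a short program built only on the documented runtime API. This is an illustrative check, not one of the PR's tests; `explode`, `framesSeenByDefer` and `sawPanicSite` are names made up for the sketch.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// explode is the panic site the deferred function should still be able to see.
//
//go:noinline
func explode() {
	panic("boom")
}

// framesSeenByDefer panics inside explode and returns the function names that
// a deferred function observes through runtime.Callers while the panic is
// still in flight.
func framesSeenByDefer() (names []string) {
	defer func() {
		pcs := make([]uintptr, 32)
		n := runtime.Callers(0, pcs)
		frames := runtime.CallersFrames(pcs[:n])
		for {
			frame, more := frames.Next()
			names = append(names, frame.Function)
			if !more {
				break
			}
		}
		recover() // stop the panic so the caller receives names
	}()
	explode()
	return nil
}

// sawPanicSite reports whether the panicking function appears in names.
func sawPanicSite(names []string) bool {
	for _, name := range names {
		if strings.HasSuffix(name, ".explode") {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(sawPanicSite(framesSeenByDefer()))
}
```

Under gc the walk includes the panicking function and the runtime's panic machinery between it and the deferred closure, which is the view the snapshot splice is intended to reproduce.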
🤖 Generated with Claude Code