You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Proposal: Runtime Funcinfo Metadata and Fast Runtime Lookup
Goal
Give LLGo Go-compatible runtime introspection — runtime.Caller, Callers, CallersFrames, FuncForPC, Func.FileLine, runtime.Stack, panic tracebacks — without DWARF, llvm-symbolizer or libunwind on the primary path, with DCE safety, controlled binary size, and performance at or beyond Go.
Architecture (three independent layers)
Metadata model (runtime: add Go-style funcinfo find index #2012, Stage 1) — Go-1.26-style compact function records + shared string pools + statement-level pcline records, emitted as DCE-safe sections: ELF SHF_LINK_ORDER (honored by --gc-sections), Mach-O live_support + one linker-private atom symbol per record (verified under ld64/lld -dead_strip, incl. LTO). findfunctab geometry follows Go (4KiB buckets × 16 subbuckets) with one deliberate deviation: uint16 subbucket deltas — LLGo has no 16-byte MINFUNC guarantee, so Go's uint8 could overflow; uint16 makes it structurally impossible. Runtime falls back to first-use table construction when the link-phase table is absent.
Link-phase table (runtime: link-phase ftab/findfunctab generation (Stage 2, P1–P3) #2016, Stage 2 P1–P3) — after linking, internal/pclnpost deduplicates records against the symbol table (FNV: a record is canonical iff its anchor's owning symbol hashes to its symbolID, or is that symbol's __llgo_stub. wrapper — LTO inline copies fail exactly), sorts, and rewrites the entry section in place with a prebuilt ftab+findfunctab the runtime adopts zero-copy. Mach-O specifics proved out: chained-fixup chain surgery, BIND-node decoding through the imports table (exported-function anchors are binds, not rebases), the header base slot spliced back as a live rebase so dyld writes the slid address, ad-hoc re-sign only when the input was signed. Overflow spills the blob into the larger stub section behind a LLGOFTB2 redirect — rows are never dropped (a gap-y table silently mis-names function-value pcs). Startup pre-warm (adopt + page-touch + one synthetic lookup) mirrors Go, whose pclntab is warm by main.
Physical unwinder (runtime: LLGo-owned frame-pointer unwinder (Stage 5) #2019, Stage 5) — every Go function on supported targets keeps the frame-pointer chain ("frame-pointer"="non-leaf"); Caller/Callers/CallersFrames/Stack and the unrecovered-panic dump walk [fp]/[fp+w] directly, symbolized through layer 2's tables with Go's pc-1 return-address convention. The walk is bounded to the program's own text (libc frames without FP discipline decode as wild pcs). Shadow-stack instrumentation is no longer emitted (LLGO_SHADOW_STACK=1 keeps the legacy emitters one release); linux binaries no longer link -lunwind (the clite fallback walks FP with dladdr names). Semantics are gc ground truth — physical stacks show every real frame; skip counts verified against go (interface chain MARK at skip 3, closure at 4). Two hardening rules: the compiler records its FP decision in a per-binary __llgo_fp_chain byte the runtime trusts (no inference from table presence), and every lookup path refines function records through one refinePCSymbolLine helper (statement record at pc, then pc-1) so FuncForPC/FileLine/CallersFrames cannot disagree on line attribution — the class of bug behind all six amd64 debug rounds.
Reference bar: Go 1.26 hot.Caller0 155/241ns; per-fresh-pc FuncForPC 40–125ns (zero-copy pclntab views) vs LLGo 0.4–1.7µs (string materialization — the headline P4 item). plain.* (fib/json/sort/map) is identical across all variants: code that never introspects pays nothing. Scale sweeps (48×48/96×96 pkgs×methods, depth 128/512, big methods 32×200/16×2000) hardened #2016 (overflow spill, per-row Func cache — 96×96 batch lookups 8–19ms → 147µs +LTO, beating Go's 179µs) and quantified the shadow-stack tax (~160ns/frame) that #2019 removed.
Platform support (three gates: FP attr → walker → symbolization)
Target
Status
darwin arm64/amd64, linux arm64/amd64
green in CI; AAPCS64 mandates FP, attr forces it on x86-64
embedded (ESP32…) / wasm
attr off by design (NeedsFramePointer): unwinder doesn't exist there (runtime gated !baremetal && !wasm); FP layout change also perturbed the conservative GC on ESP32-C3. Behavior = pre-Stage-5; LLGO_SHADOW_STACK=1 restores caller tracking — on wasm the shadow stack is the only possible mechanism
BSDs / android / ios
same ELF/Mach-O + SysV/AAPCS mechanics; expected to be a GOOS-whitelist change only
windows
needs COFF equivalents of the DCE-safe sections first (bigger than the unwinder); then either clang-kept RBP chains with the same walker, or SEH RtlVirtualUnwind. Currently attr off → graceful degradation
riscv64 / arm32
known gap: frame layout differs (RISC-V: ra at fp−8, prev fp at fp−16), walker offsets are arm64/x86-64-specific; the FP gate should whitelist arm64/amd64 explicitly until per-arch offsets land
Obstacle assessment for full support: BSD/android/ios — mechanical (GOOS whitelist). riscv/arm32 — per-arch walker offsets (~10 lines each; arm32 thumb/arm FP registers unify under clang's attr). Windows — bounded, path known: COFF IMAGE_COMDAT_SELECT_ASSOCIATIVE is the direct link-order analog, PE .reloc rewriting is simpler than the Mach-O chained-fixups we already handle, and clang keeps RBP chains on Win x64 so the same walker works (SEH optional); the real gate is LLGo's Windows runtime base, not funcinfo. wasm — physical unwinding is impossible by spec: full support means shadow stack as the permanent wasm backend (escape hatch exists; flip to default there), and wasm-ld section-GC semantics for the metadata need dedicated verification. Embedded — full support should be opt-in metadata (LLGO_FUNCINFO=0 exists) + P4d shrink given flash budgets; the conservative-GC/FP interaction is gated off today and only a precise-scanning project would lift it.
Fallback ladder guarantees no platform regresses below Stage 1: tables (FuncForPC/FileLine) work everywhere; Callers falls back shadow-stack → clite FP walk.
Follow-ups (priority order)
P4a zero-copy joined names (unsafe.String views over pre-joined strings): removes the 0.4–1.7µs per-function materialization — the only remaining structural gap vs Go on fresh-pc and >4k-target batch lookups; also enables per-funcIndex Func caching. Size cost ~+150–250KB stdlib (still ≪ Go's 1.41MB pclntab).
P4c !pcsections as the site mechanism: removes the ±% inline/layout perturbation of body-embedded site asm.
P4d section shrink: name pool dominates at extreme function counts (96×96: 15.4 vs Go 9 MiB; typical scenarios already ≤ Go).
Stage 4 stub removal (stubs already merged into the prebuilt ftab; mainly deletes generation), Stage 3 LTO policy.
Unwinder arch ports per the platform table; inline-tree (with P4b) to lift noinline from tracked functions (stdlib.Work's residual ~23µs vs Go 6.2µs is noinline + LLGo baseline, not unwind cost).
Benchmarks: benchmark/runtime_funcinfo (in-tree since runtime: link-phase ftab/findfunctab generation (Stage 2, P1–P3) #2016) covers hot/deep/multipkg/cold/stdlib/plain plus -scales/-depths/-bigsizes; codecov ignores benchmark/chore/cltest; coverage uploads from both OSes (single-platform uploads mark the other OS's branches dead).
Proposal: Runtime Funcinfo Metadata and Fast Runtime Lookup
Goal
Give LLGo Go-compatible runtime introspection —
runtime.Caller,Callers,CallersFrames,FuncForPC,Func.FileLine,runtime.Stack, panic tracebacks — without DWARF,llvm-symbolizeror libunwind on the primary path, with DCE safety, controlled binary size, and performance at or beyond Go.Architecture (three independent layers)
SHF_LINK_ORDER(honored by--gc-sections), Mach-Olive_support+ one linker-private atom symbol per record (verified under ld64/lld-dead_strip, incl. LTO). findfunctab geometry follows Go (4KiB buckets × 16 subbuckets) with one deliberate deviation: uint16 subbucket deltas — LLGo has no 16-byte MINFUNC guarantee, so Go's uint8 could overflow; uint16 makes it structurally impossible. Runtime falls back to first-use table construction when the link-phase table is absent.internal/pclnpostdeduplicates records against the symbol table (FNV: a record is canonical iff its anchor's owning symbol hashes to its symbolID, or is that symbol's__llgo_stub.wrapper — LTO inline copies fail exactly), sorts, and rewrites the entry section in place with a prebuiltftab+findfunctab the runtime adopts zero-copy. Mach-O specifics proved out: chained-fixup chain surgery, BIND-node decoding through the imports table (exported-function anchors are binds, not rebases), the header base slot spliced back as a live rebase so dyld writes the slid address, ad-hoc re-sign only when the input was signed. Overflow spills the blob into the larger stub section behind aLLGOFTB2redirect — rows are never dropped (a gap-y table silently mis-names function-value pcs). Startup pre-warm (adopt + page-touch + one synthetic lookup) mirrors Go, whose pclntab is warm bymain."frame-pointer"="non-leaf");Caller/Callers/CallersFrames/Stackand the unrecovered-panic dump walk[fp]/[fp+w]directly, symbolized through layer 2's tables with Go'spc-1return-address convention. The walk is bounded to the program's own text (libc frames without FP discipline decode as wild pcs). Shadow-stack instrumentation is no longer emitted (LLGO_SHADOW_STACK=1keeps the legacy emitters one release); linux binaries no longer link-lunwind(the clite fallback walks FP with dladdr names). Semantics are gc ground truth — physical stacks show every real frame; skip counts verified against go (interface chain MARK at skip 3, closure at 4). Two hardening rules: the compiler records its FP decision in a per-binary__llgo_fp_chainbyte the runtime trusts (no inference from table presence), and every lookup path refines function records through onerefinePCSymbolLinehelper (statement record at pc, then pc-1) soFuncForPC/FileLine/CallersFramescannot disagree on line attribution — the class of bug behind all six amd64 debug rounds.Stage / PR status
main+LTO mac crash worth root-causing there)!pcsections, section shrink, stub removalWhat each PR owns (with final numbers, best-of, mac/linux)
Reference bar: Go 1.26 hot.Caller0 155/241ns; per-fresh-pc FuncForPC 40–125ns (zero-copy pclntab views) vs LLGo 0.4–1.7µs (string materialization — the headline P4 item).
plain.*(fib/json/sort/map) is identical across all variants: code that never introspects pays nothing. Scale sweeps (48×48/96×96 pkgs×methods, depth 128/512, big methods 32×200/16×2000) hardened #2016 (overflow spill, per-row Func cache — 96×96 batch lookups 8–19ms → 147µs +LTO, beating Go's 179µs) and quantified the shadow-stack tax (~160ns/frame) that #2019 removed.Platform support (three gates: FP attr → walker → symbolization)
NeedsFramePointer): unwinder doesn't exist there (runtime gated!baremetal && !wasm); FP layout change also perturbed the conservative GC on ESP32-C3. Behavior = pre-Stage-5;LLGO_SHADOW_STACK=1restores caller tracking — on wasm the shadow stack is the only possible mechanismRtlVirtualUnwind. Currently attr off → graceful degradationObstacle assessment for full support: BSD/android/ios — mechanical (GOOS whitelist). riscv/arm32 — per-arch walker offsets (~10 lines each; arm32 thumb/arm FP registers unify under clang's attr). Windows — bounded, path known: COFF
IMAGE_COMDAT_SELECT_ASSOCIATIVEis the direct link-order analog, PE.relocrewriting is simpler than the Mach-O chained-fixups we already handle, and clang keeps RBP chains on Win x64 so the same walker works (SEH optional); the real gate is LLGo's Windows runtime base, not funcinfo. wasm — physical unwinding is impossible by spec: full support means shadow stack as the permanent wasm backend (escape hatch exists; flip to default there), and wasm-ld section-GC semantics for the metadata need dedicated verification. Embedded — full support should be opt-in metadata (LLGO_FUNCINFO=0exists) + P4d shrink given flash budgets; the conservative-GC/FP interaction is gated off today and only a precise-scanning project would lift it.Fallback ladder guarantees no platform regresses below Stage 1: tables (FuncForPC/FileLine) work everywhere; Callers falls back shadow-stack → clite FP walk.
Follow-ups (priority order)
unsafe.Stringviews over pre-joined strings): removes the 0.4–1.7µs per-function materialization — the only remaining structural gap vs Go on fresh-pc and >4k-target batch lookups; also enables per-funcIndex Func caching. Size cost ~+150–250KB stdlib (still ≪ Go's 1.41MB pclntab).!pcsectionsas the site mechanism: removes the ±% inline/layout perturbation of body-embedded site asm.noinlinefrom tracked functions (stdlib.Work's residual ~23µs vs Go 6.2µs is noinline + LLGo baseline, not unwind cost).PR housekeeping / merge order
benchmark/runtime_funcinfo(in-tree since runtime: link-phase ftab/findfunctab generation (Stage 2, P1–P3) #2016) covers hot/deep/multipkg/cold/stdlib/plain plus-scales/-depths/-bigsizes; codecov ignores benchmark/chore/cltest; coverage uploads from both OSes (single-platform uploads mark the other OS's branches dead).Acceptance criteria