runtime: add statement line caller frames#2002

Closed
cpunion wants to merge 18 commits into
xgo-dev:mainfrom
cpunion:codex/runtime-stmtline-table

@cpunion cpunion commented Jun 29, 2026

Collaborator

Changes:

  • Add compact runtime funcinfo and statement-line PC metadata for runtime.Caller, CallersFrames, FuncForPC, Func.FileLine, and stack paths.
  • Store DCE-safe string/id records instead of function pointers; keep the post-DCE filtering hook available.
  • Compress package/function and root/file strings into shared pools with 16-byte records and uint16 symbol hash buckets.
  • Resolve indirect caller paths, function-value entry PCs, closure stub aliases, and first executable statement lines.
  • Make lazy table initialization thread-safe and keep func-PC fallback initialization low-allocation.
  • Ignore linked-module external declarations when collecting ABI roots.
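To make the statement-line lookup and the lazy, thread-safe initialization concrete, here is a minimal sketch in Go. It is not the PR's implementation: the `stmtLine` and `lineTable` types, their fields, and the sort-on-first-use step are all hypothetical, and the real tables use the compact pooled records described above. The sketch only illustrates the general technique of mapping a PC offset to the last statement that starts at or before it, with `sync.Once` guarding the one-time setup.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// stmtLine is a hypothetical record: the PC offset (from the function entry)
// at which a statement starts, and that statement's source line.
type stmtLine struct {
	pcOff uint32
	line  uint32
}

// lineTable is a hypothetical per-function table. raw stands in for emitted
// metadata; entries is the sorted copy built lazily on first lookup.
type lineTable struct {
	once    sync.Once
	raw     []stmtLine
	entries []stmtLine
}

// ensureSorted builds the sorted table exactly once, even when several
// goroutines perform their first lookup concurrently.
func (t *lineTable) ensureSorted() {
	t.once.Do(func() {
		t.entries = append([]stmtLine(nil), t.raw...)
		sort.Slice(t.entries, func(i, j int) bool {
			return t.entries[i].pcOff < t.entries[j].pcOff
		})
	})
}

// lineFor returns the line of the last statement starting at or before pcOff.
// It reports false when the table is empty or pcOff precedes every record.
func (t *lineTable) lineFor(pcOff uint32) (uint32, bool) {
	t.ensureSorted()
	// Find the first record that starts strictly after pcOff.
	i := sort.Search(len(t.entries), func(i int) bool {
		return t.entries[i].pcOff > pcOff
	})
	if i == 0 {
		return 0, false
	}
	return t.entries[i-1].line, true
}

func main() {
	// Made-up offsets and lines, for illustration only.
	tbl := &lineTable{raw: []stmtLine{{8, 12}, {0, 10}, {20, 15}}}

	var wg sync.WaitGroup
	for _, off := range []uint32{0, 7, 8, 100} {
		wg.Add(1)
		go func(off uint32) {
			defer wg.Done()
			line, ok := tbl.lineFor(off)
			fmt.Printf("pcOff=%d line=%d ok=%v\n", off, line, ok)
		}(off)
	}
	wg.Wait()
}
```

A binary search over sorted records keeps each lookup logarithmic in the number of statements, and `sync.Once` avoids a lock on the steady-state path once the table is built.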

Local validation:

  • macOS: go test ./test/go -run 'TestRuntime(LineInfoAndStack|StatementLineInfo|FuncInfoConcurrentFirstUse)' -count=1
  • macOS Go 1.26: llgo test -c github.com/goplus/llgo/test/go, then ran the full compiled test/go binary: PASS
  • macOS: go test ./internal/build -run 'TestLinkedModuleGlobals|TestFuncInfoTable|TestPrepareFuncInfo|TestIsFuncInfoEnabled' -count=1
  • macOS: go test ./cl -run TestFuncInfoMetadataEmission -count=1; go test ./internal/build/funcinfo -count=1; go test ./ssa -run 'TestClosureFuncDeclValue|TestClosureFuncPtrValue|TestDevLTOGlobalDCEFuncInfo' -count=1
  • Linux arm64 container: full performance probes built and ran in an 8 GiB Colima profile; internal funcinfo/build tests, cl, and ssa target tests passed in the low-memory container.

Benchmark methodology:

  • Performance cells are best / trimmed avg, in ns or us; the trimmed avg is the mean of 11 process runs after dropping the fastest and the slowest run.
  • current is this PR. current+lto is this PR built with full LTO for comparison only; this PR does not change LTO defaults.
  • main does not provide valid funcinfo for entry-PC or synthetic-PC FuncForPC/Func.FileLine, so those cells are n/a.
  • Size cells are MiB.
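The best / trimmed avg cell format can be reproduced with a small helper. This is a sketch of the arithmetic only, not the script used for the PR: the function name `bestAndTrimmedAvg` and the sample timings are invented for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// bestAndTrimmedAvg returns the smallest sample and the mean of the samples
// that remain after dropping one minimum and one maximum.
// It reports false when fewer than three samples are given.
func bestAndTrimmedAvg(samples []float64) (best, trimmed float64, ok bool) {
	if len(samples) < 3 {
		return 0, 0, false
	}
	sorted := append([]float64(nil), samples...) // leave the input untouched
	sort.Float64s(sorted)

	sum := 0.0
	for _, v := range sorted[1 : len(sorted)-1] {
		sum += v
	}
	return sorted[0], sum / float64(len(sorted)-2), true
}

func main() {
	// Eleven hypothetical per-process timings in ns (not measured data).
	runs := []float64{44, 45, 46, 45, 47, 46, 44, 90, 46, 45, 46}
	best, avg, ok := bestAndTrimmedAvg(runs)
	if !ok {
		fmt.Println("not enough samples")
		return
	}
	fmt.Printf("%.0f/%.1fns\n", best, avg)
}
```

Dropping the extremes keeps a single outlier run (the 90 above) from dominating the reported average, while the best value still shows the fastest observed run.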

Local macOS arm64 performance:

| Test | main | go | current | current+lto |
| --- | --- | --- | --- | --- |
| entry FuncForPC only | n/a | 4/4ns | 6/7ns | 4/4.2ns |
| entry Func.Name only | n/a | 10/10ns | 6/7ns | 4/4ns |
| entry FuncForPC + Func.Name | n/a | 13/13.7ns | 7/7.1ns | 5/5ns |
| entry Func.FileLine | n/a | 9/9.9ns | 8/8ns | 5/5ns |
| runtime.Caller(0) | 9.3/9.4us | 153/156.4ns | 44/45.9ns | 36/38.3ns |
| runtime.Caller(1) | 5.9/5.9us | 171/173.6ns | 74/76.4ns | 61/62.7ns |
| runtime.Callers only | 13.6/13.7us | 123/126.9ns | 141/152.9ns | 134/140.3ns |
| CallersFrames.Next first frame | 21.3/21.5us | 267/270.6ns | 351/364.1ns | 306/313.2ns |
| synthetic-PC FuncForPC + Name | n/a | 13/13.3ns | 43/43.4ns | 33/34.4ns |
| synthetic-PC Func.FileLine | n/a | 13/14.4ns | 45/46.2ns | 35/35.8ns |

Local Linux arm64 performance:

| Test | main | go | current | current+lto |
| --- | --- | --- | --- | --- |
| entry FuncForPC only | n/a | 4/4ns | 7/7.8ns | 5/6.1ns |
| entry Func.Name only | n/a | 10/10ns | 6/6.9ns | 5/6ns |
| entry FuncForPC + Func.Name | n/a | 13/13.6ns | 8/8ns | 6/6ns |
| entry Func.FileLine | n/a | 9/10.1ns | 8/8.8ns | 6/7ns |
| runtime.Caller(0) | 1.3/1.3us | 164/168.4ns | 50/54.4ns | 40/47.8ns |
| runtime.Caller(1) | 4.4/4.5us | 178/182.4ns | 90/100ns | 80/86.7ns |
| runtime.Callers only | 8.4/8.5us | 123/127.1ns | 160/182.2ns | 160/175.6ns |
| CallersFrames.Next first frame | 8.8/8.9us | 272/280.4ns | 440/446.7ns | 380/400ns |
| synthetic-PC FuncForPC + Name | n/a | 15/15ns | 52/54.7ns | 46/47.1ns |
| synthetic-PC Func.FileLine | n/a | 14/14.9ns | 54/57.3ns | 48/48.9ns |

Local macOS arm64 binary size:

| Probe | main | main + DWARF | go | current | current+lto |
| --- | --- | --- | --- | --- | --- |
| entry-PC | 1.83 MiB | 2.42 MiB | 2.24 MiB | 2.04 MiB | 1.59 MiB |
| caller/stack | 1.83 MiB | 2.42 MiB | 2.26 MiB | 2.04 MiB | 1.60 MiB |

Local Linux arm64 binary size:

| Probe | main | main + DWARF | go | current | current+lto |
| --- | --- | --- | --- | --- | --- |
| entry-PC | 1.78 MiB | 4.26 MiB | 2.15 MiB | 2.07 MiB | 1.85 MiB |
| caller/stack | 1.78 MiB | 4.27 MiB | 2.15 MiB | 2.07 MiB | 1.86 MiB |

Notes:

  • Caller source locations, entry-PC function names, and file/line results are correct in the local probes.
  • Full LTO improves both size and the remaining lookup costs, but this PR does not change LTO defaults.
  • Follow-ups are tracked in #2004 ("Proposal: runtime funcinfo metadata and fast FuncForPC lookup"): full-LTO policy, exact entry-PC lookup, and a separate CallersFrames/synthetic-PC fast path.
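For readers who want to see what the probes exercise, the following sketch calls the same standard `runtime` entry points: `runtime.Caller`, `runtime.Callers` with `runtime.CallersFrames`, and `runtime.FuncForPC` on a function-value entry PC. It is not the PR's probe code; the helper names `callerInfo`, `stackNames`, and `entryName` are hypothetical, and the exact function-name strings may differ between the Go toolchain and llgo.

```go
package main

import (
	"fmt"
	"reflect"
	"runtime"
)

// callerInfo returns the function name and line number of its caller,
// combining runtime.Caller with runtime.FuncForPC.
func callerInfo() (name string, line int, ok bool) {
	pc, _, ln, found := runtime.Caller(1)
	if !found {
		return "", 0, false
	}
	fn := runtime.FuncForPC(pc)
	if fn == nil {
		return "", ln, false
	}
	return fn.Name(), ln, true
}

// stackNames collects up to max function names from the call stack,
// starting with the caller of stackNames.
func stackNames(max int) []string {
	pcs := make([]uintptr, max)
	n := runtime.Callers(2, pcs) // skip runtime.Callers and stackNames
	if n == 0 {
		return nil
	}
	frames := runtime.CallersFrames(pcs[:n])
	var names []string
	for {
		frame, more := frames.Next()
		names = append(names, frame.Function)
		if !more {
			break
		}
	}
	return names
}

// entryName resolves a function value to its name through its entry PC.
// It returns "" if f is not a function or the PC cannot be resolved.
func entryName(f any) string {
	v := reflect.ValueOf(f)
	if v.Kind() != reflect.Func {
		return ""
	}
	fn := runtime.FuncForPC(v.Pointer())
	if fn == nil {
		return ""
	}
	return fn.Name()
}

func main() {
	name, line, ok := callerInfo()
	fmt.Println("caller:", name, line, ok)
	fmt.Println("stack:", stackNames(8))

	// Entry-PC lookup: name plus the file and line of the function's entry.
	pc := reflect.ValueOf(callerInfo).Pointer()
	if fn := runtime.FuncForPC(pc); fn != nil {
		file, entryLine := fn.FileLine(pc)
		fmt.Println("entry:", fn.Name(), file, entryLine)
	}
}
```

Comparing the printed names and lines against the source is a quick manual check of the kind of correctness the first note refers to.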

codecov Bot commented Jun 29, 2026


Codecov Report

❌ Patch coverage is 99.05808% with 6 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| cl/instr.go | 98.71% | 3 Missing and 3 partials ⚠️ |

@cpunion cpunion force-pushed the codex/runtime-stmtline-table branch from 831894f to 506555b Compare June 29, 2026 16:30
@cpunion cpunion force-pushed the codex/runtime-stmtline-table branch from 506555b to a7f0383 Compare June 29, 2026 23:48
@cpunion cpunion marked this pull request as draft July 1, 2026 15:46
cpunion commented Jul 2, 2026

Collaborator Author

Superseded by #2012: the statement-line/funcinfo baseline was reimplemented there as the full Go 1.26 pclntab-style model (statement-level pcline records included), extended by #2016's link-phase table. Review continues on #2012 + #2016.

@cpunion cpunion closed this Jul 2, 2026