ssa: panic with runtime type assertion errors#1892

Open
cpunion wants to merge 56 commits into
xgo-dev:mainfrom
cpunion:codex/goroot-interface-coverage

Conversation

@cpunion

@cpunion cpunion commented May 22, 2026

Collaborator

Summary

  • lower failed non-comma-ok type assertions through a runtime TypeAssertionError helper instead of panic(string)
  • add stable test/go coverage for recovered runtime.Error values on interface-to-interface and interface-to-concrete assertions
  • remove the GOROOT xfails for fixedbugs/issue16130.go, which now passes; leave the issue32187 comparison xfails untouched
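The behavior the summary targets can be observed from ordinary Go code. The following is a minimal sketch, not taken from the PR's test files: the helper names `recoverAssert` and `describe` and the `stringer` interface are illustrative assumptions. It recovers from failed non-comma-ok assertions and checks that the recovered value is a `runtime.Error`, and more specifically a `*runtime.TypeAssertionError`, rather than a plain string.

```go
package main

import (
	"fmt"
	"runtime"
)

// stringer is a hypothetical interface used for the interface-to-interface case.
type stringer interface{ String() string }

// recoverAssert runs f and returns whatever value f panicked with (nil if none).
func recoverAssert(f func()) (rec any) {
	defer func() { rec = recover() }()
	f()
	return nil
}

// describe reports whether a recovered value is a runtime.Error and whether it
// is specifically a *runtime.TypeAssertionError.
func describe(rec any) (isRuntimeError, isTypeAssertionError bool) {
	_, isRuntimeError = rec.(runtime.Error)
	_, isTypeAssertionError = rec.(*runtime.TypeAssertionError)
	return
}

func main() {
	var v any = 42

	// Interface-to-concrete assertion that fails: v holds an int, not a string.
	fmt.Println(describe(recoverAssert(func() { _ = v.(string) })))

	// Interface-to-interface assertion that fails: int has no String method.
	fmt.Println(describe(recoverAssert(func() { _ = v.(stringer) })))
}
```

With the standard Go toolchain both calls report `true true`; a lowering that panics with a bare string would report `false false` instead.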

Testing

  • go test ./test/go -run 'TestInterfaceAssert|TestNilInterfaceSameTypeAssertPanics|TestInterfaceCompareUncomparable' -count=1
  • go test ./test/go -count=1
  • (cd runtime && go test ./internal/runtime -run 'TestDoesNotExist' -count=1)
  • LLGO_ROOT="$PWD" go run -tags=dev ./cmd/llgo test -run 'TestInterfaceAssert|TestNilInterfaceSameTypeAssertPanics|TestInterfaceCompareUncomparable' ./test/go
  • go test ./ssa -count=1
  • go test ./test/goroot -run TestGoRootRunCases/fixedbugs/issue16130.go -count=1 -args -goroot "$(go env GOROOT)" -dirs fixedbugs -case '^fixedbugs/issue16130\.go$' -xfail /tmp/llgo-no-xfail.yaml
  • go test ./test/goroot -run TestGoRootRunCases/fixedbugs/issue16130.go -count=1 -args -goroot "$(go env GOROOT)" -dirs fixedbugs -case '^fixedbugs/issue16130\.go$'

Notes

GOROOT CI is slow/disabled, so this PR does not depend on automatic full GOROOT runs. The targeted local GOROOT check used the current machine GOROOT (go env GOROOT = /opt/homebrew/Cellar/go@1.24/1.24.11/libexec). Normal PR CI should still run from the fork branch.

A non-targeted local LLGO_ROOT="$PWD" go run -tags=dev ./cmd/llgo test ./test/go run reached test execution but failed on darwin temp-dir cleanup with operation not permitted in existing temp-dir cleanup paths; the focused interface assertion llgo test above passed.

@codecov-commenter

codecov-commenter commented May 22, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.

@cpunion

cpunion commented May 24, 2026

Collaborator Author

Update for latest interface/nointerface coverage (head c9e354c):

Scope:

  • interface/nointerface only; no chan/nil/print/recover/GC/liveness/finalizer/goroutine changes.
  • //go:nointerface methods are recorded during AST directive scan and omitted from runtime ABI method tables, while direct method calls/method expressions remain available.

GOROOT xfails removed after targeted local verification:

  • typeparam/mdempsky/15.go (//go:nointerface, generics, promoted methods)
  • fixedbugs/issue47928.go (promoted //go:nointerface method)
  • typeparam/mdempsky/16.go (interface type assertion panic text; already fixed by this PR's earlier assertion-error work)

Coverage:

  • Added test/go/nointerface_test.go plus llgo/non-llgo expectation files. Host Go's default toolchain does not filter nointerface without the fieldtrack experiment, while llgo should filter under the llgo tag.

Local tests:

  • go test ./test/go -count=1
  • go test ./ssa -count=1
  • go test ./cl -count=1
  • go test ./test/goroot -run TestGoRootRunCases -count=1 -args -goroot "$(go env GOROOT)" -dirs typeparam/mdempsky,fixedbugs -case '^(typeparam/mdempsky/(15|16)|fixedbugs/issue47928)\.go$' -directive-mode runlike -run-timeout 120s

Not covered here:

  • Remaining non-nointerface xfails such as typeparam/nested.go and unrelated chan/nil/print/recover/GC cases are intentionally left for their owning domains.

@cpunion

cpunion commented May 24, 2026

Collaborator Author

Update for commit 64ae33c:

  • Added reflect/interface type identity coverage for reflect.Type.Method and MethodByName: Method.Func.Interface() now asserts to and calls func(*T) / func(T) using the canonical LLGo closure-backed function type.
  • Fixed reflect.Type interface invokes to request method ABI type initialization, included closure ABI symbols for reflect method FuncOf lookup, and boxed Type.Method Func values as LLGo closure values while preserving public reflect Type identity.
  • recover.go classification: without #1882 it still stops at the existing recover-frame failure (spurious recover). With #1882's recover-frame fix layered locally, the previous func(*T1) vs func(*T1) type identity failure is gone; the next observed failure is missing recover 10, which is recover semantics and intentionally not covered here. No recover.go xfail removed.
  • Local tests: go test ./test/go -count=1; go run ./cmd/llgo test -run TestReflectTypeMethodFuncInterfaceTypeIdentity -count=1 ./test/go; go test ./ssa -count=1; go test ./cl -count=1; go test ./internal/build -count=1; focused goroot recover.go with existing xfail passed.
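The reflect identity requirement described above can be illustrated with plain Go. This is a sketch under assumptions rather than the PR's actual TestReflectTypeMethodFuncInterfaceTypeIdentity: the type `T`, its methods, and the helper `callViaReflect` are hypothetical names. It looks up methods through reflect.Type.MethodByName, converts Method.Func back to an interface value, and asserts it to the ordinary function type whose first parameter is the receiver.

```go
package main

import (
	"fmt"
	"reflect"
)

// T is a hypothetical type with one value-receiver and one pointer-receiver method.
type T struct{ n int }

func (t T) Value() int { return t.n }
func (t *T) Ptr() int  { return t.n + 1 }

// callViaReflect fetches both methods through reflect and calls them through
// plain function values obtained from Method.Func.Interface().
func callViaReflect() (int, int) {
	m, _ := reflect.TypeOf(T{}).MethodByName("Value")
	// Method.Func takes the receiver as its first argument, so the dynamic
	// type of the interface value must be identical to func(T) int.
	byValue := m.Func.Interface().(func(T) int)

	pm, _ := reflect.TypeOf(&T{}).MethodByName("Ptr")
	byPtr := pm.Func.Interface().(func(*T) int)

	return byValue(T{n: 1}), byPtr(&T{n: 1})
}

func main() {
	fmt.Println(callViaReflect())
}
```

If the function type produced by reflect were not identical to the compiler's `func(T) int`, the non-comma-ok assertions here would panic instead of returning callable values.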

@cpunion

cpunion commented May 24, 2026

Collaborator Author

Coverage update for fixedbugs/issue26094.go (commit 5733be6aa6aa6f2bb3782a00ae49e7bc8c960989):

  • Classified as having the same root cause as this draft PR (local type identity / interface assertion): types declared in different scopes can print with the same name, but an interface assertion must compare scoped type identity and panic when the scopes differ.
  • Added stable test/go coverage in TestInterfaceAssertRejectsSameNameTypesFromDifferentScopes for local-vs-local and local-vs-package same-name type assertions.
  • Removed only the now-covered fixedbugs/issue26094.go goroot xfail entries for darwin/arm64 and linux/amd64. No nil/chan/print/recover/GC/finalizer/goroutine cases were changed.
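The scoping rule in this update can be sketched in a few lines of Go. The example is illustrative and is not the body of TestInterfaceAssertRejectsSameNameTypesFromDifferentScopes; the function names `makeLocal`, `assertInOtherScope`, and `isPackageT` are assumptions. Three distinct types all named `T` are declared, one at package level and two inside different functions, and comma-ok assertions show that none of them is identical to another.

```go
package main

import "fmt"

// T is the package-level type.
type T struct{ name string }

// makeLocal returns a value of a function-local type that is also named T.
func makeLocal() any {
	type T struct{ name string }
	return T{name: "local"}
}

// assertInOtherScope asserts v against a second, unrelated local type named T.
func assertInOtherScope(v any) bool {
	type T struct{ name string }
	_, ok := v.(T)
	return ok
}

// isPackageT asserts v against the package-level T.
func isPackageT(v any) bool {
	_, ok := v.(T)
	return ok
}

func main() {
	fmt.Println(assertInOtherScope(makeLocal())) // local vs local
	fmt.Println(isPackageT(makeLocal()))         // local vs package
	fmt.Println(isPackageT(T{name: "pkg"}))      // same type
}
```

With the standard Go toolchain this prints `false`, `false`, `true`. The non-comma-ok form of the first two assertions is the case that has to panic rather than succeed on a name match.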

Focused local validation:

  • go test ./test/go -run 'TestInterfaceAssert(RejectsSameNameTypesFromDifferentScopes|To.*Panics)' -count=1
  • go test ./test/goroot -run TestGoRootRunCases/fixedbugs/issue26094.go -count=1 -args -goroot /Users/lijie/sdk/go1.26.0 -dirs fixedbugs -case '^fixedbugs/issue26094\.go$' -progress 30s
  • go test ./test/go -count=1
  • go test ./ssa/... ./cl/... ./runtime/... -count=1 reported ssa and cl OK; the command itself returned FAIL because ./runtime/... is a separate module path from the repo root.
  • (cd runtime && go test ./... -count=1) passed.

@cpunion

cpunion commented Jun 6, 2026

Collaborator Author

Merged latest xgo-dev/main and resolved the remaining #1892 conflicts.

Conflict/coverage updates:

  • Kept the mainline reflect Method.Func closure boxing and combined it with this PR's interface assertion closure matching semantics.
  • Fixed the malformed adjacent xfail entries for typeparam/issue51521.go and typeparam/orderedmap.go while keeping #1892-owned xfails removed.
  • Added package-level coverage for //go:nointerface parsing/filtering and reflect-method closure ABI symbol filtering: cl/import_coverage_test.go, ssa/nointerface_test.go, internal/build/main_module_test.go.

Local verification:

  • go test -timeout 20m ./test/go -run 'Test(InterfaceAssert|NilInterfaceSameTypeAssertPanics|InterfaceCompareUncomparable|NoInterface|ReflectTypeMethodFuncInterfaceTypeIdentity)' -count=1
  • go test -timeout 20m ./ssa ./cl -run 'Test.*(Interface|NoInterface|FuncType|Method|TypeAssert|Import|ABI|Closure)|TestDoesNotExist' -count=1
  • cd runtime && go test -timeout 20m ./internal/runtime ./internal/lib/reflect -count=1
  • go test -timeout 30m ./test/goroot -run TestGoRootRunCases -count=1 -args -goroot /Users/lijie/sdk/go1.26.0 -dirs fixedbugs,typeparam/mdempsky -case '^(fixedbugs/(issue16130|issue26094|issue47928)|typeparam/mdempsky/(15|16))\.go$' -directive-mode runlike -run-timeout 120s
  • go test -timeout 20m ./cl -coverprofile=/tmp/llgo-1892-cl.cover -count=1 (93.8%)
  • go test -timeout 20m ./ssa -coverprofile=/tmp/llgo-1892-ssa.cover -count=1 (87.9%)
  • go test -timeout 20m ./internal/build -coverprofile=/tmp/llgo-1892-build.cover -count=1 (67.4%)
  • LLGO_ROOT="$PWD" go run -tags=dev ./cmd/llgo test -run 'Test(InterfaceAssert|NilInterfaceSameTypeAssertPanics|NoInterface|ReflectTypeMethodFuncInterfaceTypeIdentity)' -count=1 ./test/go
  • git diff --check && git diff --cached --check

@cpunion cpunion force-pushed the codex/goroot-interface-coverage branch 5 times, most recently from 14ffd15 to f0c4fd2 Compare June 10, 2026 11:30
@cpunion cpunion force-pushed the codex/goroot-interface-coverage branch from 71068c1 to 2cf0202 Compare June 20, 2026 02:54
@cpunion cpunion force-pushed the codex/goroot-interface-coverage branch 2 times, most recently from aafb17c to 8412508 Compare June 27, 2026 14:54
cpunion and others added 27 commits July 2, 2026 16:58
Post-link table generation plan: parse the linked binary's metadata
sections, dedup LTO inline copies against the symbol table, sort with a
sentinel, build Go-layout findfunctab via internal/pclntab, and write
back into a reserved section with ASLR-safe anchor offsets. Runtime
adopts the prebuilt table when the header validates and keeps first-use
construction as fallback. Includes the list of platform facts
established in xgo-dev#2012 so implementation does not re-derive them.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The monotonic time source had two problems:

- On Linux, runtimeNano passed clite's CLOCK_MONOTONIC, whose value is
  Darwin's clock id (6). Linux interprets 6 as CLOCK_MONOTONIC_COARSE,
  a millisecond-granularity clock: consecutive time.Now() readings were
  identical 100% of the time and the smallest nonzero delta was 1ms.
- On Darwin, clock_gettime(CLOCK_MONOTONIC) itself only has microsecond
  granularity (96% identical consecutive readings, 1us minimum delta).

Mirror Go's runtime structure with a per-OS nanotime1 in the runtime
package itself, keeping the hot path free of clite indirection and clite
unchanged: Darwin reads CLOCK_UPTIME_RAW through clock_gettime_nsec_np
(the same clock Go's nanotime uses there), Linux uses clock_gettime with
the OS-correct CLOCK_MONOTONIC id as a local constant, and remaining
platforms keep the previous behavior.

Measured with consecutive time.Now() deltas (min nonzero / zero-frac):
- macOS arm64: 1us / 96.5%  ->  41ns / 26%  (Go 1.26: 41ns / 22%)
- Linux arm64: 1ms / 100%   ->  41ns / 21%

time.Sleep, Timer and Ticker behave identically before and after.
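The measurement quoted above (minimum nonzero delta and fraction of identical consecutive readings) can be reproduced with a small probe. This is a sketch of one way to take such a measurement, not the harness used for the numbers in this commit; the function name `measure` and the sample count are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// measure takes n consecutive time.Now() readings after an initial one and
// returns the smallest nonzero delta between neighbors together with the
// fraction of deltas that were exactly zero.
func measure(n int) (minNonzero time.Duration, zeroFrac float64) {
	zeros := 0
	prev := time.Now()
	for i := 0; i < n; i++ {
		now := time.Now()
		// Sub uses the monotonic clock reading carried by both values.
		d := now.Sub(prev)
		prev = now
		if d == 0 {
			zeros++
			continue
		}
		if minNonzero == 0 || d < minNonzero {
			minNonzero = d
		}
	}
	return minNonzero, float64(zeros) / float64(n)
}

func main() {
	minDelta, zeroFrac := measure(1_000_000)
	fmt.Printf("min nonzero delta: %v, zero fraction: %.1f%%\n", minDelta, zeroFrac*100)
}
```

A coarse clock shows up as a large minimum delta combined with a zero fraction near 100%; the exact figures depend on the machine and toolchain.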

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The macOS CI LLDB step caught the funcinfo entry/stub site anchors
shifting instruction/scope layout: with the records emitted at function
entry, LLDB reported variables from an inner lexical block (ScopeIf's
b, c) as in scope before the block began. Debug builds carry full
DWARF, so the funcinfo tables are redundant there; gate the metadata
pipeline on !IsDbgEnabled(). Caller-frame instrumentation is
independent of this switch, so runtime.Caller keeps working in debug
builds. _lldb/runtest.sh: 194/194 pass.

This also covers Linux, where the same interference existed since the
sites were introduced but the LLDB suite only runs on the macOS jobs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Refine the previous commit: instead of disabling the whole funcinfo
metadata pipeline under LLGO_DEBUG/LLGO_DEBUG_SYMBOLS, add a separate
Program.EnableFuncInfoSites switch and turn off just the body-embedded
site records (entry/stub anchors and pc-line labels) — they are what
shifts instruction/scope layout and confused LLDB. The funcinfo tables
are plain data globals and stay enabled, so runtime.FuncForPC keeps its
normalized name and Func.FileLine keeps file/line in debug builds (via
the dlsym fallback path); runtime.Caller/Callers were never affected
because caller-frame instrumentation is independent of both switches.

Debug builds lose only the section fast paths (first-use latency) and
statement-level pc-line granularity, both redundant next to full DWARF.
_lldb/runtest.sh: 194/194; cl and test/go suites pass.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
frameFuncForPC could cache a Func built from a pcline frame whose entry
resolution failed (entry == 0); a later FuncForPC on the same PC would
then observe Entry() == 0 where its own constructor falls back to pc.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
LLGO_FUNCINFO_SITES=0 keeps the funcinfo metadata tables but drops the
body-embedded entry/stub/pc-line inline-asm sites. This is the narrow
A/B needed to isolate codegen perturbation caused by the in-body asm
anchors: with sites off, plain-code benchmarks match the no-funcinfo
baseline within noise, while sites on shifts hot runtime-internal
loops by -30%..+6% through inline/layout decisions.

Semantics with sites off: FuncForPC(entry) and Func.FileLine(entry)
keep working through the dlsym fallback path; statement/call-site
granularity PC line lookup is disabled, and first-use table
construction loses the section fast path.

Tests assert the split: tables still materialize while entry/stub
section asm, boundary symbols, and pc-line site labels are all absent.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
First stage of doc/design/pclntab-linkphase.md: parse a linked binary's
funcinfo entry/stub sections (Mach-O and ELF), deduplicate LTO inline
copies against the symbol table's text ranges, sort with a Go-style
sentinel, and build findfunctab through internal/pclntab — the faithful
port that has been waiting for exactly this caller. Read-only: prints
what the P2 build integration would write back.

Measured on the 576-target multipkg binaries:
- non-LTO: 9319 records -> ftab 3161 + 207 buckets; lookup self-check
  3160/3160; site sections 149KB -> 29KB (5.1x)
- LTO: 15371 entry records -> 13857 inline copies dropped, 4144 kept;
  self-check 3045/3045; 299KB -> 28.5KB (10.5x)

Findings for P2: on-disk Mach-O pointer slots hold dyld chained-fixup
encodings (low 36 bits are the target; decoded here; the write-back
design stores anchor-relative offsets and avoids pointers entirely),
and some non-LTO stub symbols are absent from the symbol table
(records conservatively dropped; needs tightening).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…adoption

pclnpost -write rewrites the entry-site section in place with the
prebuilt table (header + ftab {entryOff,funcIndex} + runtime-layout
findfunctab buckets), resolving funcinfo indexes through the binary's
symbol-index section, and voids the stub section (its records are
merged into the table). ASLR is handled by anchoring on the section's
own link-time address; entries are normalized to true symbol starts,
which retires the entry-PC slack on this path. macOS re-signs with an
ad-hoc codesign after rewriting.

The runtime adopts the table zero-copy when the magic header validates:
lookups binary-search the on-disk ftab directly through the shared
bucket index, nothing is materialized on first use (the funcIndex ->
entry map is built lazily and only for the pcline initializer), and the
cold scan/dladdr path is skipped since adoption is cheap. First-use
construction remains the fallback whenever the header is absent.
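The lookup over a sorted ftab with a trailing sentinel can be sketched as follows. This is a simplified illustration, not the runtime's code: the `ftabEntry` type and `lookupFunc` function are hypothetical, only the field names `entryOff` and `funcIndex` come from the description above, and the findfunctab bucket index is omitted in favor of a plain binary search over the whole table.

```go
package main

import (
	"fmt"
	"sort"
)

// ftabEntry mirrors the {entryOff, funcIndex} pair described above. Entries
// are sorted by entryOff and the last one is a sentinel marking the end of
// the covered text range.
type ftabEntry struct {
	entryOff  uint32
	funcIndex uint32
}

// lookupFunc returns the funcIndex of the function containing pcOff, or false
// when pcOff lies outside the range covered by the table.
func lookupFunc(ftab []ftabEntry, pcOff uint32) (uint32, bool) {
	n := len(ftab)
	if n < 2 || pcOff < ftab[0].entryOff || pcOff >= ftab[n-1].entryOff {
		return 0, false
	}
	// Find the first entry that starts after pcOff; the function that
	// contains pcOff is the entry just before it.
	i := sort.Search(n, func(i int) bool { return ftab[i].entryOff > pcOff })
	return ftab[i-1].funcIndex, true
}

func main() {
	// Three functions starting at offsets 0, 16 and 48; the sentinel is at 64.
	ftab := []ftabEntry{{0, 7}, {16, 3}, {48, 9}, {64, 0}}
	fmt.Println(lookupFunc(ftab, 20))
	fmt.Println(lookupFunc(ftab, 64))
}
```

The sentinel lets the last real function be bounded the same way as every other one, so the search needs no special case for the end of the table.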

Linux end-to-end: entries=prebuilt, FuncForPC/FileLine correct,
first-FuncForPC 110µs (materializing) -> 6-8µs (zero-copy); 13ms on the
original macOS baseline. Known gap: on macOS the on-disk rewrite is
corrupted at load time because dyld still walks the stale chained-fixup
chain over the section; fix (unlinking the section's nodes from the
page chains in LC_DYLD_CHAINED_FIXUPS) is identified and next.
Non-prebuilt paths verified regression-free: cl + test/go suites pass,
smoke behavior unchanged.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Every llgo-linked executable (linux/darwin, sites enabled) now gets the
prebuilt ftab/findfunctab automatically: internal/build runs
internal/pclnpost.Rewrite after linkMainPkg, and any failure degrades
silently to the first-use construction fallback.

Moves the tool core into internal/pclnpost and hardens it:

- Canonical-record detection by FNV: a record survives when its anchor's
  owning symbol hashes to the record's symbolID (or is the __llgo_stub.
  wrapper of it). The previous one-per-symbolID rule wrongly collapsed a
  function with its stub — they share the target's symbolID by design —
  which broke exact-entry lookups (caught by TestRuntimeLineInfoAndStack
  on Linux). LTO inline copies are now identified exactly: 8.4k/9.5k
  copies removed in the LTO probes.
- Mach-O chained-fixups surgery: unlink the rewritten sections' pointer
  slots from the dyld page chains (repointing predecessors' next links
  and page_start entries) so dyld neither rebases slots inside the new
  table nor skips unrelated fixups after the zeroed stub section, then
  re-sign ad hoc. Without this the table was corrupted at load.
- LTO-safe metadata location: the entry section carries a meta record
  whose relocations hold the addresses of the symbol-index pointer and
  count globals; LTO internalization strips those names from the symbol
  table but relocations always resolve. Runtime skips the meta rows
  (pc==0 / symbolID==0).
- Idempotence guard (already-rewritten binaries are left alone).

Runtime fixes that surfaced during validation:

- materializePrebuiltEntries is now two-phase so concurrent losers wait
  for the winner's store instead of reading a nil entries slice.
- pcLineFrameForPC rejects nearest-below sites whose entry is
  unresolved when the caller knows the function entry, instead of
  leaking a neighboring function's file/line.

Validation: macOS cl (full) + test/go + LLDB 194/194; Linux test/go
TestRuntime suite; probes on both platforms report entries=prebuilt
with first-FuncForPC at 7-21µs (Linux) from 13ms on the original
baseline, and LTO builds drop 8-9.5k inline copies.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…table

On Mach-O, pointer slots that name exported functions — every
__llgo_stub.* wrapper and any exported Go function — are emitted as
chained-fixup BIND nodes, not rebases. The rewriter only decoded rebase
nodes, so all stub records (and some entry records) were dropped as
unowned and never reached the prebuilt ftab; FuncForPC on function
values silently fell back to dladdr (~6µs per fresh pc on darwin).

- Parse the LC_DYLD_CHAINED_FIXUPS imports table and resolve bind
  ordinals to their in-image definitions.
- Match canonical owners against the record symbolID with underscore
  normalization (debug/macho's suffix-shared string table can surface
  one mangling underscore more or less than the source-level name).
- Splice the prebuilt header's base slot back into the fixup chain as a
  live rebase node: dyld writes the slid text base at load, so the
  runtime reads a ready runtime PC with no slide arithmetic (non-PIE
  ELF link-time values already equal runtime addresses).
- LLGO_PCLNPOST=0 escape hatch keeps first-use construction.

Fresh-pc FuncForPC slow path: darwin 6-8µs -> 1.2-1.7µs, linux
6.8µs -> 0.5µs; first-in-process lookup: darwin ~32µs -> ~14µs,
linux ~6.8µs -> ~4µs.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Pure-compute probes (recursive fib, JSON round-trip, sort.Ints, map
churn) with no runtime introspection, so one harness run covers both
the introspection extremes and what the funcinfo machinery costs code
that never asks for it.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Go's pclntab pages are touched by its own runtime (traceback, GC) long
before user code queries it, so its first FuncForPC never pays page-in.
Mirror that: when the prebuilt table is present, init adopts it
(zero-copy, sub-µs), touches the pages the lookup path reads (blob,
funcinfo records, string offsets, strings), runs one synthetic lookup
to warm the code paths, and write-warms the FuncForPC cache pages.

First-in-process FuncForPC: darwin ~17µs -> ~2.8µs, linux ~6.6µs ->
~1.0µs. Startup cost is page-count-bound (tens of µs on stdlib-sized
tables, invisible next to ~3ms process startup; hello-world medians
unchanged). Non-prebuilt binaries stay fully lazy: first-use
construction allocates, which has no place in init, and programs that
never introspect pay nothing.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
-depths generates deep_<N> scenarios at configurable call depths;
-bigsizes generates bigfunc scenarios (funcs x statements) whose large
bodies stress statement-level pcline density, mid-function pc
symbolization, and ordinary performance of big method bodies.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@cpunion cpunion force-pushed the codex/goroot-interface-coverage branch from 8412508 to aacb3cc Compare July 2, 2026 15:03
@cpunion

cpunion commented Jul 2, 2026

Collaborator Author

Rebased onto #2016 (codex/pclntab-linkphase-p1, which includes #2012) per the review-order plan: #2012 → #2016 → this line of semantics fixes. Conflicts resolved were additive (context fields, the noinline condition set, and runtime.Panic now calls SavePanicCallerFrames() before the panic-node bookkeeping). Note the PR base is still main, so the diff shows #2012/#2016 commits until those merge.
