fix(codegen): stop emitting GC releases for globals through uninitialized frame offsets #86
Merged
Root cause of the long-standing intermittent exit-time crash in downstream
rcce2 CI (ItemsTest / OnlinePlayerChainTest, ~2-12% of runs, historically
mislabeled "Stack overflow!").
node.cpp deleteVars() emitted the GC release
__bbRelease( [ebp + Decl::offset], "type" )
for every struct/blitz-typed decl in the environ -- missing the d->kind
gate every sibling branch has. main()'s environ includes the program
GLOBALS, whose Decl::offset is never assigned by frame layout and was
omitted from the Decl constructor's initializer list: uninitialized
memory. When the stale value was 0 (most runs), the instruction read the
saved-EBP slot and the _bbRelease lookup missed -- harmless. When it was
heap garbage, main's epilogue performed a wild [ebp+garbage] read and
raised an access violation. GC on/off is irrelevant: the wild read
precedes the release, which is why the non-GC chain test crashed
identically.
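The two outcomes can also be told apart in the machine code, and the PR description's evidence trail relies on that difference. Below is a sketch of how a typical x86 assembler encodes mov ebx,[ebp+disp] (opcode 8B plus a ModRM byte). A displacement that fits in a signed byte gets the short disp8 form, and anything larger gets the 4-byte disp32 form. The assumption that BlitzForge's assembler always picks the shortest form is only inferred from the 8B 5D 00 bytes reported for the healthy build. encodeMovEbxEbpDisp is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch (hypothetical helper): encode `mov ebx, [ebp+disp]` (opcode 8B /r),
// choosing the shortest displacement form, as a typical assembler would.
// ModRM = mod (2 bits) | reg (3 bits) | rm (3 bits), with reg = 3 (EBX), rm = 5 (EBP).
//   mod = 01 -> signed 8-bit displacement follows
//   mod = 10 -> 32-bit displacement follows
// mod = 00 with rm = 5 means "disp32, no base register", so even [ebp+0]
// needs mod = 01 and an explicit disp8 of 0.
std::vector<std::uint8_t> encodeMovEbxEbpDisp(std::int32_t disp) {
    const int kReg = 3;  // EBX
    const int kRm = 5;   // EBP
    std::vector<std::uint8_t> bytes{0x8B};
    if (disp >= -128 && disp <= 127) {
        bytes.push_back(static_cast<std::uint8_t>((0x1 << 6) | (kReg << 3) | kRm));  // 0x5D
        bytes.push_back(static_cast<std::uint8_t>(static_cast<std::int8_t>(disp)));
    } else {
        bytes.push_back(static_cast<std::uint8_t>((0x2 << 6) | (kReg << 3) | kRm));  // 0x9D
        const std::uint32_t u = static_cast<std::uint32_t>(disp);
        for (int i = 0; i < 4; ++i) {  // displacement is stored little-endian
            bytes.push_back(static_cast<std::uint8_t>(u >> (8 * i)));
        }
    }
    return bytes;
}
```

encodeMovEbxEbpDisp(0) yields 8B 5D 00. encodeMovEbxEbpDisp(0x006D766B) yields 8B 9D 6B 76 6D 00, and its little-endian displacement bytes spell "kvm" followed by a NUL, the kind of ASCII fragment that showed up in the crash dumps.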
Fix: gate the emission on frame-resident kinds (DECL_LOCAL ||
DECL_PARAM), and zero-init Decl::offset as defense-in-depth. Params
must still be released: the call protocol references args on the way
in and the callee releases them, so a LOCAL-only gate fails
GarbageCollectionTest.
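Here is a minimal sketch of the gate from the Fix paragraph, built on a hypothetical, simplified model. The DeclKind values, the Decl layout, ebpOperand() and emitReleases() are illustrative stand-ins, not BlitzForge's real node.cpp or decl definitions. The sketch also leaves out the type-based choice among __bbRelease, __bbStrRelease, __bbObjRelease and __bbVecFree. Passing gated = false reproduces the pre-fix emission for every decl, globals included. Passing gated = true applies the DECL_LOCAL || DECL_PARAM check. The default member initializer on offset plays the part of the zero-init half of the fix.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model: flag values and the Decl layout are
// placeholders, not the actual BlitzForge definitions.
enum DeclKind { DECL_GLOBAL = 1, DECL_LOCAL = 2, DECL_PARAM = 4 };

struct Decl {
    std::string name;
    int kind;
    // Default member initializer: applies to every constructor that does not
    // set `offset` itself, so a non-frame decl reads a deterministic 0.
    int offset = 0;

    Decl(std::string n, int k) : name(std::move(n)), kind(k) {}
};

// Only frame-resident decls have a meaningful [ebp + offset] slot.
bool isFrameResident(const Decl& d) {
    return d.kind == DECL_LOCAL || d.kind == DECL_PARAM;
}

// Formats an EBP-relative operand, e.g. [ebp-4] or [ebp+8].
std::string ebpOperand(int offset) {
    const long long off = offset;  // widen so negating INT_MIN cannot overflow
    return off < 0 ? "[ebp-" + std::to_string(-off) + "]"
                   : "[ebp+" + std::to_string(off) + "]";
}

// Lists the releases emitted for the struct/blitz-typed decls of an environ.
// gated == false mimics the pre-fix behavior (every decl, globals included);
// gated == true applies the DECL_LOCAL || DECL_PARAM check.
std::vector<std::string> emitReleases(const std::vector<Decl>& decls, bool gated) {
    std::vector<std::string> out;
    for (const Decl& d : decls) {
        if (gated && !isFrameResident(d)) continue;
        out.push_back("__bbRelease(" + ebpOperand(d.offset) + ")");
    }
    return out;
}
```

Consider a global whose offset holds 0x006D766B, a local at -4 and a param at 8. The ungated version emits three releases, including __bbRelease([ebp+7173739]). The gated version emits only __bbRelease([ebp-4]) and __bbRelease([ebp+8]). The sketch uses a default member initializer rather than editing each constructor's initializer list, because the initializer also covers constructors added later.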
Also extends the crash diagnostics that produced the evidence: registers,
a hex dump of the faulting anonymous executable allocation (dynamically
emitted Blitz code has no symbols), and a dbghelp-symbolized EBP-chain
stack walk in seTranslator.
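An EBP-chain walk relies on the standard x86 frame layout, where [ebp] holds the caller's saved EBP and [ebp+4] holds the return address. Below is a sketch of that walk over simulated memory. The Memory map and walkEbpChain() are hypothetical and are not seTranslator's code. A real walker reads live process memory and has to guard every read. It would then hand each return address to dbghelp for symbolization, for example via SymFromAddr.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical simulated 32-bit stack: address -> 4-byte word. A missing
// address stands in for an unreadable page.
using Memory = std::map<std::uint32_t, std::uint32_t>;

// Walks the EBP chain and collects return addresses, innermost frame first.
// In a standard frame, [ebp] = caller's saved EBP and [ebp+4] = return address.
std::vector<std::uint32_t> walkEbpChain(const Memory& mem, std::uint32_t ebp,
                                        std::size_t maxFrames = 64) {
    std::vector<std::uint32_t> returnAddresses;
    while (ebp != 0 && returnAddresses.size() < maxFrames) {
        auto savedEbp = mem.find(ebp);
        auto retAddr = mem.find(ebp + 4);
        if (savedEbp == mem.end() || retAddr == mem.end()) break;  // unreadable
        returnAddresses.push_back(retAddr->second);
        // The stack grows downward, so each caller's frame must sit at a
        // higher address; anything else means a corrupt or looping chain.
        if (savedEbp->second <= ebp) break;
        ebp = savedEbp->second;
    }
    return returnAddresses;
}
```

The monotonicity check and the frame cap keep the walk bounded even when the chain itself is corrupt, which matters inside a crash handler.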
Verification: pre-fix baseline failures were ItemsTest 2/100 + 12/300
and chain test 12/100 + 27/300; post-fix, ItemsTest 0/300 and chain
test 0/300 (expected ~6-12 and ~27-36 at baseline rates; p < 1e-8).
Full test.bat green.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
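The p < 1e-8 figure can be sanity-checked with a simple binomial model. The model assumes runs are independent. It also conservatively uses the lowest observed baseline rates: 2% for ItemsTest (2/100) and 9% for the chain test (27/300). This is a back-of-envelope check and not necessarily how the bound was computed. probZeroFailures and jointProbZero are illustrative names.

```cpp
#include <cassert>
#include <cmath>

// Probability of zero failures in `runs` independent runs when each run
// fails with probability `rate`: the binomial k = 0 term, (1 - rate)^runs.
double probZeroFailures(double rate, int runs) {
    return std::pow(1.0 - rate, runs);
}

// Joint probability that two suites both show zero failures, assuming
// the suites are independent of each other.
double jointProbZero(double rateA, double rateB, int runs) {
    return probZeroFailures(rateA, runs) * probZeroFailures(rateB, runs);
}
```

At those rates, ItemsTest alone gives (0.98)^300 ≈ 0.0023. Most of the statistical weight therefore comes from the chain test, whose (0.91)^300 ≈ 5e-13 is already far below 1e-8.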
CoreyRDean added a commit that referenced this pull request on Jun 9, 2026
Summary
Root cause of the rcce2 CI flake, found and statistically killed. Follow-up to #85, which fixed two real Delete Each defects; the flake nonetheless persisted at the baseline rate (see the honest-update comment there).
The mechanism
node.cpp deleteVars() emitted the GC release __bbRelease( [ebp + Decl::offset], "type" ) for every struct/blitz-typed decl in the function's environ. That branch lacked the d->kind gate that every sibling branch (__bbStrRelease, __bbObjRelease, __bbVecFree) has. For main(), that environ includes the program globals, whose Decl::offset is never assigned by frame layout and was omitted from the Decl constructor's initializer list. The offset was therefore uninitialized memory:
- When it held 0 (most runs): mov ebx,[ebp+0] reads the saved-EBP slot, the _bbRelease lookup misses, and the run passes.
- When it held heap garbage: a mov ebx,[ebp+0x006D766B]-style wild read in main's epilogue raises 0xC0000005 at a stable program-relative offset, after the last test, with counters already at 0.
GC on/off is irrelevant: the wild read precedes the no-op release, which is why the non-GC OnlinePlayerChainTest crashed identically to the GC ItemsTest.
Evidence trail
- 0xC0000005 at stable offsets (…0207/…0231), reading run-varying garbage.
- The faulting bytes were 8B 9D <4 run-varying ASCII-fragment bytes> (mov ebx,[ebp+disp32]), while the same instruction in an -o exe's embedded object is 8B 5D 00 ([ebp+0], disp8). Different encodings of the same logical instruction ⇒ the displacement differed at assembly time ⇒ codegen consumed run-varying uninitialized state.
- deleteVars + uninitialized Decl::offset is that state.
The fix
- Gate the __bbRelease emission on frame-resident kinds: DECL_LOCAL || DECL_PARAM. Params must stay, because the call protocol references args on the way in and the callee releases them here. A LOCAL-only first attempt failed GarbageCollectionTest (param refs leaked), which pins the contract.
- Zero-initialize Decl::offset in the constructor (defense-in-depth: any future read of a non-frame decl's offset is a deterministic 0, not heap noise).
- Extend the crash diagnostics in seTranslator (registers, anonymous-allocation image dump, dbghelp-symbolized stack walk). They are what made this findable and will make the next one cheaper.
Verification
At baseline rates, ~6-12 ItemsTest and ~27-36 OnlinePlayerChainTest failures were expected in 300 runs; observing 0 in both has p < 1e-8. Full BlitzForge test.bat is green (including GarbageCollectionTest, which caught the over-narrow first gate). An rcce2 submodule-bump PR follows; the "known intermittent flake" workaround in rcce2's CLAUDE.md gets retired there.
Risk
The codegen change is a strict narrowing (it stops emitting one bogus instruction sequence for non-frame decls); param/local release behavior is pinned by the existing GC suite. The diagnostics run only on the crash path.
🤖 Generated with Claude Code