fix(cpu): deliver #GP instead of panicking on malformed IDT entries#1552
Open
ajmeese7 wants to merge 6 commits into
Open
fix(cpu): deliver #GP instead of panicking on malformed IDT entries#1552ajmeese7 wants to merge 6 commits into
ajmeese7 wants to merge 6 commits into
Conversation
Closes copy#636. call_interrupt_vector previously hit `panic!("Unimplemented: #GP handler")` at five sites when the IDT entry being dispatched had a bad gate type, a reserved-zeros violation, a null or out-of-table CS selector, a non-executable CS, or fell through the inter-privilege-level check. A real CPU raises #GP in those cases, escalates to #DF on a second nested fault, and shuts down (resets, on a typical PC) on a third. v86 was giving up at the first malformed-descriptor encounter instead of letting the guest see its own #GP. Each panic site now calls trigger_gp(error_code); return; matching the existing pattern other (non-panic) branches in the same function already use for software-int dpl-cpl mismatches and #NP delivery. Error codes follow Intel SDM 6.13: vector-index | IDT-bit for IDT-related faults, and stripped selector for CS-related faults. To bound the recursion, a new interrupt_recursion_depth counter is incremented on entry to call_interrupt_vector and decremented on every return path via Drop. When the depth exceeds INTERRUPT_RECURSION_LIMIT (8 -- well above the original-fault -> #GP -> #DF chain a real CPU goes through before triple-faulting), reset_cpu is called as the triple-fault analog. reset_cpu also zeroes the counter so a guest that triple-faulted into reset starts the next boot clean. Tests: - Added five host-side unit tests covering the new logic: * descriptor_validates_reserved_zeros - flags bit 4 of the access byte as a reserved-zeros violation * descriptor_recognizes_invalid_gate_type - rejects non-{trap,int, task} gate types * recursion_guard_decrements_on_drop - normal-return Drop path * recursion_guard_decrements_on_panic_unwind - unwind-return Drop path so early returns can never leak the counter * recursion_limit_is_above_typical_fault_chain - guards against a future tweak accidentally cutting the limit below 2 cargo test passes 7/7 (2 existing + 5 new). - tests/api/{reset,reboot,clean-shutdown,state}.js still pass on the patched build, confirming no regression on the existing reboot/reset paths. Outstanding validation: the original buildroot-bzimage reboot repro from the issue description does not currently reach a kernel-level reboot under either the stock or patched wasm with the buildroot images currently published at i.copy.sh -- busybox `reboot` exits silently without invoking the syscall on this configuration. The patched code is structurally what the panic sites should always have done; a maintainer with the original repro environment can re-enable reboot-buildroot.js (currently commented out in the api-tests target specifically with a copy#636 reference) to confirm.
… underflow Follow-up review fixes for the malformed-IDT #GP delivery path: - Triple-fault backstop in call_interrupt_vector zeroed the depth counter and then implicitly Drop'd the guard on return, leaving the counter at -1 and miscounting subsequent interrupts. Drop the guard explicitly before reset_cpu(). - Add gp_error_code_for_idt(vector, is_software_int) that sets the EXT bit (Intel SDM Vol 3 Section 6.13) for hardware-originated faults rather than always emitting `vector << 3 | 2`. Route all five IDT-related #GP sites in call_interrupt_vector through it. - Rename INTERRUPT_RECURSION_LIMIT to INTERRUPT_RUNAWAY_LIMIT and lower to 5; rewrite the comment to be honest that it bounds runaway recursion rather than emulating triple-fault semantics. - Drop pub from interrupt_recursion_depth (only the test module reads it, via super::). - Replace the host-only catch_unwind test (production uses panic = "abort") with a regression test for the Drop-underflow fix and a unit test for the new error-code helper. - Use std::ptr::read on the depth static in tests to silence static_mut_refs lints.
Hand-crafted boot sector that switches to 32-bit protected mode, builds
an IDT with vector 0x80 deliberately malformed (access byte 0x9E, the
reserved-zeros violation that previously hit panic!("Unimplemented:
#GP handler") in cpu.rs), then INT 0x80s into it.
The expected outcome is that call_interrupt_vector rejects the
malformed entry and delivers #GP to the in-guest vector-13 handler with
the error code prescribed by Intel SDM Vol 3 Section 6.13:
(vector << 3) | IDT(2) | EXT(0) = (0x80 << 3) | 2 = 0x402
The handler echoes "gp=00000402 OK" over COM1; the Node harness asserts
that exact string and treats a wasm panic during delivery as a hard
test failure.
Wires:
- tests/api/repro-636-malformed-idt.asm: the boot sector source.
- tests/api/repro-636-malformed-idt.js: the Node test harness, same
src/main.js loading pattern as the rest of tests/api/.
- Makefile: a generic tests/api/%.bin: tests/api/%.asm rule using
nasm -f bin, and an api-tests dependency on the new .bin plus a
final invocation of the harness.
A companion runaway-recursion test (malformed #GP IDT entry too) was
prototyped but dropped: its assertion ("no panic in 5 seconds") was
weak compared to what the host-side unit tests (recursion_guard_*,
explicit_drop_before_reset_avoids_underflow) already cover, and the
guard's correctness is locked in there.
Owner
|
Thanks for the contribution. Two notes:
|
Author
|
@copy I used Claude to write code and PR description. And yeah you're correct on the second point, sorry for the misrepresentation of the scope. I updated the description to note that this hits on the symptoms but doesn't address the root cause. |
The Makefile's api-tests target invokes test scripts directly (./tests/api/foo.js), which requires the executable bit. The new repro-636-malformed-idt.js was committed as 100644, causing CI to fail with "Permission denied" (exit 127). Set mode to 100755 to match the other api test scripts.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
call_interrupt_vectorhad six call sites that hitpanic!("Unimplemented: #GP handler")when the guest's IDT contained a malformed entry (bad gate type, reserved-bits violation, oversized vector, null/out-of-table CS selector). On real x86 hardware these conditions deliver#GPto the guest. In v86 they aborted the wasm module and surfaced to JS as a process-fatal panic.This PR replaces those panics with
trigger_gp(...)calls, adds a runaway-recursion backstop so a guest with a broken#GPhandler can't infinite-loop the emulator, and adds a small helper to encode the IDT error code per Intel SDM Vol 3 §6.13 (EXT bit set when the originating event was external to the program).Not a fix for #636. The bzImage-reboot bug is upstream of this panic, in the BIOS /
load_kernelinteraction (load_kernelonly runs atCPU.init,reboot_internaldoesn't reconstruct bzImage state). After this PR, that scenario will stop crashing the wasm module but still won't produce a working guest.Changes
Behavioral
panic!("Unimplemented: #GP handler")sites incall_interrupt_vectornow deliver#GPwith a properly formed error code:idtr_sizedpl < cpl(already non-panic, retained for consistency through the helper)gp_error_code_for_idt(vector, is_software_int)helper encodes the error code as(vector << 3) | IDT | EXT, whereEXT = !is_software_int. Previously the code unconditionally used| 2, dropping the EXT bit for hardware-originated faults.interrupt_recursion_depthcounter andInterruptRecursionGuard(Drop-based decrement) at the entry tocall_interrupt_vector. If recursion exceedsINTERRUPT_RUNAWAY_LIMIT(5), log it andreset_cpu()rather than stack-overflow the wasm interpreter. The guard isdrop'd explicitly before the reset to avoid an off-by-one underflow on the next interrupt.reset_cpu()zeros the counter on all reset paths.Tests
Host-side unit tests (
#[cfg(test)] mod testsincpu.rs)descriptor_validates_reserved_zeros—InterruptDescriptorflags bit-4-set as a reserved-zeros violation.descriptor_recognizes_invalid_gate_type— non-task/trap/interrupt gate types are rejected.recursion_guard_decrements_on_drop— Drop-based counter accounting works on normal exit.explicit_drop_before_reset_avoids_underflow— regression test that the runaway branch leaves the counter at 0, not -1.runaway_limit_allows_real_hardware_fault_chain— limit must allow the legitimate original → #GP → #DF chain.gp_error_code_sets_idt_bit_and_ext_per_origin— error-code helper sets IDT bit always and EXT bit only for hardware origin.Tests serialize on a local
Mutexbecause they touch astatic mut. Aread_depth()helper usesstd::ptr::read(&raw const ...)to avoidstatic_mut_refslints inassert_eq!macros.End-to-end api test
tests/api/repro-636-malformed-idt.{asm,js}— hand-crafted boot sector that switches to 32-bit protected mode, builds an IDT with vector 0x80 deliberately malformed (access byte 0x9E, the reserved-zeros violation), thenINT 0x80s into it. The in-guest #GP handler echoesgp=00000402 OKover COM1; the harness asserts that exact string and treats a wasm panic during delivery as a hard test failure.tests/api/%.bin: tests/api/%.asmrule inMakefileassembles the boot sector vianasm -f bin. The new test is wired into theapi-teststarget afterpic.js.Test plan
cargo test --lib cpu::tests::passes (6/6 new tests).cargo check --lib --testsclean (0 warnings on changed code).INT ns into it, and the guest's #GP handler receives the expected error code. Codified astests/api/repro-636-malformed-idt.{asm,js}and run as part ofmake api-tests.make api-testscovered the subset that runs in Node.)Notes for reviewers
selector & !3sites (lines ~943, ~952, etc., for CS-selector errors) use a different error-code structure (no IDT bit) and are out of scope; their pre-existing convention of dropping the EXT bit is unchanged.panic = "abort"is set for bothdevandreleaseprofiles, so theDropguard only matters on normalreturnpaths, which is what the existing early-exit branches use (return_on_pagefault!, explicitreturnaftertrigger_gp).panic!("Unimplemented: ...")siblings remain incpu.rs(VME, non-zero CS DPL from VM86 mode, etc.). Left as follow-up.