Skip to content

fix(cpu): deliver #GP instead of panicking on malformed IDT entries#1552

Open
ajmeese7 wants to merge 6 commits into
copy:masterfrom
ajmeese7:fix-636-gp-handler-panic
Open

fix(cpu): deliver #GP instead of panicking on malformed IDT entries#1552
ajmeese7 wants to merge 6 commits into
copy:masterfrom
ajmeese7:fix-636-gp-handler-panic

Conversation

@ajmeese7
Copy link
Copy Markdown

@ajmeese7 ajmeese7 commented May 3, 2026

Summary

call_interrupt_vector had six call sites that hit panic!("Unimplemented: #GP handler") when the guest's IDT contained a malformed entry (bad gate type, reserved-bits violation, oversized vector, null/out-of-table CS selector). On real x86 hardware these conditions deliver #GP to the guest. In v86 they aborted the wasm module and surfaced to JS as a process-fatal panic.

This PR replaces those panics with trigger_gp(...) calls, adds a runaway-recursion backstop so a guest with a broken #GP handler can't infinite-loop the emulator, and adds a small helper to encode the IDT error code per Intel SDM Vol 3 §6.13 (EXT bit set when the originating event was external to the program).

Not a fix for #636. The bzImage-reboot bug is upstream of this panic, in the BIOS / load_kernel interaction (load_kernel only runs at CPU.init, reboot_internal doesn't reconstruct bzImage state). After this PR, that scenario will stop crashing the wasm module but still won't produce a working guest.

Changes

Behavioral

  • Six panic!("Unimplemented: #GP handler") sites in call_interrupt_vector now deliver #GP with a properly formed error code:
    • vector out of idtr_size
    • software interrupt with dpl < cpl (already non-panic, retained for consistency through the helper)
    • invalid gate type
    • reserved-zeros violation in the descriptor
    • null CS selector in the gate
    • CS selector outside the GDT/LDT limit
  • New gp_error_code_for_idt(vector, is_software_int) helper encodes the error code as (vector << 3) | IDT | EXT, where EXT = !is_software_int. Previously the code unconditionally used | 2, dropping the EXT bit for hardware-originated faults.
  • Added interrupt_recursion_depth counter and InterruptRecursionGuard (Drop-based decrement) at the entry to call_interrupt_vector. If recursion exceeds INTERRUPT_RUNAWAY_LIMIT (5), log it and reset_cpu() rather than stack-overflow the wasm interpreter. The guard is drop'd explicitly before the reset to avoid an off-by-one underflow on the next interrupt. reset_cpu() zeros the counter on all reset paths.

Tests

Host-side unit tests (#[cfg(test)] mod tests in cpu.rs)

  • descriptor_validates_reserved_zerosInterruptDescriptor flags bit-4-set as a reserved-zeros violation.
  • descriptor_recognizes_invalid_gate_type — non-task/trap/interrupt gate types are rejected.
  • recursion_guard_decrements_on_drop — Drop-based counter accounting works on normal exit.
  • explicit_drop_before_reset_avoids_underflow — regression test that the runaway branch leaves the counter at 0, not -1.
  • runaway_limit_allows_real_hardware_fault_chain — limit must allow the legitimate original → #GP → #DF chain.
  • gp_error_code_sets_idt_bit_and_ext_per_origin — error-code helper sets IDT bit always and EXT bit only for hardware origin.

Tests serialize on a local Mutex because they touch a static mut. A read_depth() helper uses std::ptr::read(&raw const ...) to avoid static_mut_refs lints in assert_eq! macros.

End-to-end api test

  • tests/api/repro-636-malformed-idt.{asm,js} — hand-crafted boot sector that switches to 32-bit protected mode, builds an IDT with vector 0x80 deliberately malformed (access byte 0x9E, the reserved-zeros violation), then INT 0x80s into it. The in-guest #GP handler echoes gp=00000402 OK over COM1; the harness asserts that exact string and treats a wasm panic during delivery as a hard test failure.
  • A generic tests/api/%.bin: tests/api/%.asm rule in Makefile assembles the boot sector via nasm -f bin. The new test is wired into the api-tests target after pic.js.

Test plan

  • cargo test --lib cpu::tests:: passes (6/6 new tests).
  • cargo check --lib --tests clean (0 warnings on changed code).
  • Hand-crafted boot sector installs a malformed IDT entry, INT ns into it, and the guest's #GP handler receives the expected error code. Codified as tests/api/repro-636-malformed-idt.{asm,js} and run as part of make api-tests.
  • Sanity-check existing OS boots (Linux, ReactOS, etc.) for any regression in normal interrupt delivery. (Maintainer-run; make api-tests covered the subset that runs in Node.)

Notes for reviewers

  • The error-code helper is intentionally scoped to the IDT-related #GP sites. The selector & !3 sites (lines ~943, ~952, etc., for CS-selector errors) use a different error-code structure (no IDT bit) and are out of scope; their pre-existing convention of dropping the EXT bit is unchanged.
  • panic = "abort" is set for both dev and release profiles, so the Drop guard only matters on normal return paths, which is what the existing early-exit branches use (return_on_pagefault!, explicit return after trigger_gp).
  • A handful of pre-existing panic!("Unimplemented: ...") siblings remain in cpu.rs (VME, non-zero CS DPL from VM86 mode, etc.). Left as follow-up.

ajmeese7 and others added 4 commits May 2, 2026 21:52
Closes copy#636.

call_interrupt_vector previously hit `panic!("Unimplemented: #GP handler")`
at five sites when the IDT entry being dispatched had a bad gate type, a
reserved-zeros violation, a null or out-of-table CS selector, a
non-executable CS, or fell through the inter-privilege-level check. A
real CPU raises #GP in those cases, escalates to #DF on a second nested
fault, and shuts down (resets, on a typical PC) on a third. v86 was
giving up at the first malformed-descriptor encounter instead of letting
the guest see its own #GP.

Each panic site now calls trigger_gp(error_code); return; matching the
existing pattern other (non-panic) branches in the same function already
use for software-int dpl-cpl mismatches and #NP delivery. Error codes
follow Intel SDM 6.13: vector-index | IDT-bit for IDT-related faults,
and stripped selector for CS-related faults.

To bound the recursion, a new interrupt_recursion_depth counter is
incremented on entry to call_interrupt_vector and decremented on every
return path via Drop. When the depth exceeds INTERRUPT_RECURSION_LIMIT
(8 -- well above the original-fault -> #GP -> #DF chain a real CPU
goes through before triple-faulting), reset_cpu is called as the
triple-fault analog. reset_cpu also zeroes the counter so a guest that
triple-faulted into reset starts the next boot clean.

Tests:

  - Added five host-side unit tests covering the new logic:
      * descriptor_validates_reserved_zeros - flags bit 4 of the access
        byte as a reserved-zeros violation
      * descriptor_recognizes_invalid_gate_type - rejects non-{trap,int,
        task} gate types
      * recursion_guard_decrements_on_drop - normal-return Drop path
      * recursion_guard_decrements_on_panic_unwind - unwind-return Drop
        path so early returns can never leak the counter
      * recursion_limit_is_above_typical_fault_chain - guards against a
        future tweak accidentally cutting the limit below 2
    cargo test passes 7/7 (2 existing + 5 new).

  - tests/api/{reset,reboot,clean-shutdown,state}.js still pass on the
    patched build, confirming no regression on the existing reboot/reset
    paths.

Outstanding validation: the original buildroot-bzimage reboot repro from
the issue description does not currently reach a kernel-level reboot
under either the stock or patched wasm with the buildroot images
currently published at i.copy.sh -- busybox `reboot` exits silently
without invoking the syscall on this configuration. The patched code
is structurally what the panic sites should always have done; a
maintainer with the original repro environment can re-enable
reboot-buildroot.js (currently commented out in the api-tests target
specifically with a copy#636 reference) to confirm.
… underflow

Follow-up review fixes for the malformed-IDT #GP delivery path:

- Triple-fault backstop in call_interrupt_vector zeroed the depth counter
  and then implicitly Drop'd the guard on return, leaving the counter at
  -1 and miscounting subsequent interrupts. Drop the guard explicitly
  before reset_cpu().
- Add gp_error_code_for_idt(vector, is_software_int) that sets the EXT
  bit (Intel SDM Vol 3 Section 6.13) for hardware-originated faults
  rather than always emitting `vector << 3 | 2`. Route all five
  IDT-related #GP sites in call_interrupt_vector through it.
- Rename INTERRUPT_RECURSION_LIMIT to INTERRUPT_RUNAWAY_LIMIT and lower
  to 5; rewrite the comment to be honest that it bounds runaway recursion
  rather than emulating triple-fault semantics.
- Drop pub from interrupt_recursion_depth (only the test module reads
  it, via super::).
- Replace the host-only catch_unwind test (production uses
  panic = "abort") with a regression test for the Drop-underflow fix and
  a unit test for the new error-code helper.
- Use std::ptr::read on the depth static in tests to silence
  static_mut_refs lints.
Hand-crafted boot sector that switches to 32-bit protected mode, builds
an IDT with vector 0x80 deliberately malformed (access byte 0x9E, the
reserved-zeros violation that previously hit panic!("Unimplemented:
#GP handler") in cpu.rs), then INT 0x80s into it.

The expected outcome is that call_interrupt_vector rejects the
malformed entry and delivers #GP to the in-guest vector-13 handler with
the error code prescribed by Intel SDM Vol 3 Section 6.13:
  (vector << 3) | IDT(2) | EXT(0) = (0x80 << 3) | 2 = 0x402
The handler echoes "gp=00000402 OK" over COM1; the Node harness asserts
that exact string and treats a wasm panic during delivery as a hard
test failure.

Wires:
  - tests/api/repro-636-malformed-idt.asm: the boot sector source.
  - tests/api/repro-636-malformed-idt.js: the Node test harness, same
    src/main.js loading pattern as the rest of tests/api/.
  - Makefile: a generic tests/api/%.bin: tests/api/%.asm rule using
    nasm -f bin, and an api-tests dependency on the new .bin plus a
    final invocation of the harness.

A companion runaway-recursion test (malformed #GP IDT entry too) was
prototyped but dropped: its assertion ("no panic in 5 seconds") was
weak compared to what the host-side unit tests (recursion_guard_*,
explicit_drop_before_reset_avoids_underflow) already cover, and the
guard's correctness is locked in there.
@copy
Copy link
Copy Markdown
Owner

copy commented May 3, 2026

Thanks for the contribution. Two notes:

  • Could you please disclose if LLMs were used in this PR and if so, in what capacity? (e.g., to write code, only for reviews, etc.)
  • Even though these changes are fine, I don't think they fix Guest reboots fail when using bzImage #636. Last time I checked, it was caused by some interaction with the bios and load_kernel, where load_kernel is only called on boot, and later the bios doesn't properly put the bzimage into memory (most of the work of a reboot is done by the bios)

@ajmeese7
Copy link
Copy Markdown
Author

ajmeese7 commented May 5, 2026

@copy I used Claude to write code and PR description. And yeah you're correct on the second point, sorry for the misrepresentation of the scope. I updated the description to note that this hits on the symptoms but doesn't address the root cause.

The Makefile's api-tests target invokes test scripts directly
(./tests/api/foo.js), which requires the executable bit. The new
repro-636-malformed-idt.js was committed as 100644, causing CI to fail
with "Permission denied" (exit 127). Set mode to 100755 to match the
other api test scripts.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants