
fix: propagate a raising key __eq__ instead of masking it as SystemError (#36) #79

Merged
toloco merged 2 commits into master from fix/36-richcompare-error
Jun 18, 2026

toloco commented Jun 18, 2026

Closes #36

What & why

CacheKey::eq and BorrowedArgs::equivalent compared PyObject_RichCompareBool(...) == 1. RichCompareBool returns -1 with a Python exception set when a key's __eq__ raises — and == 1 maps that to "not equal", leaving the exception pending on the thread. The lookup reports a spurious miss; the call path recomputes / returns Ok while an exception is set; PyO3 then raises a confusing SystemError: ... returned a result with an exception set, masking the user's real error (e.g. RuntimeError). functools.lru_cache checks the -1 sentinel; we didn't.

Reachable purely from safe Python: cache a key whose __hash__ collides with a stored key and whose __eq__ raises (e.g. f(Bad(42)); f(Bad(42))).
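A minimal sketch of that trigger, run against `functools.lru_cache` as the reference behaviour the description mentions rather than against this project's cache. The `Bad` class and the function `f` are hypothetical stand-ins for the shapes named above; the second call hashes to the same slot, so the lookup has to call `__eq__`, and the original `RuntimeError` is what reaches the caller.

```python
from functools import lru_cache


class Bad:
    """Hypothetical key: equal values hash alike, but __eq__ always raises."""

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        raise RuntimeError("boom from __eq__")


@lru_cache(maxsize=None)
def f(key):
    return key.value


# First call: the cache is empty, so no key comparison runs.
first = f(Bad(42))

# Second call: same hash, different object, so the lookup calls __eq__.
try:
    f(Bad(42))
except RuntimeError as exc:
    raised = exc
else:
    raised = None
```

With `lru_cache`, `raised` holds the user's `RuntimeError`, not a `SystemError`; that is the behaviour this PR aims to match.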

The fix

Both comparisons now go through a shared key.rs::rich_compare_eq helper:

  • returns false ("not equal") for both 0 and -1, but on -1 leaves the exception set for the caller to fetch;
  • bails early (returns false) if an exception is already pending, so a multi-key probe never re-enters the interpreter with a live exception and clobbers the original error.

Each memory-backend lookup site then calls PyErr::take(py) after the lookup and returns the error instead of proceeding:

  • __call__ — read path (before recomputing) and the write-path double-check (before inserting),
  • get, _probe.
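The helper contract and the post-lookup check can be modelled in plain Python. This is only an illustrative sketch, not the Rust code in `src/key.rs`: `PendingError` is a hypothetical stand-in for the interpreter's per-thread pending-exception slot, its `take` method loosely mirrors `PyErr::take(py)`, and `lookup` is an assumed shape for a probe over several hash-colliding candidates.

```python
class PendingError:
    """Hypothetical stand-in for the per-thread pending-exception slot."""

    def __init__(self):
        self.exc = None

    def take(self):
        # Rough analogue of PyErr::take(py): return the error, clear the slot.
        exc, self.exc = self.exc, None
        return exc


def rich_compare_eq(pending, a, b):
    """Model of the helper: True only for a clean 'equal' result."""
    if pending.exc is not None:
        return False  # bail early: never call __eq__ with an error pending
    if a is b:
        return True  # identity fast path, as in PyObject_RichCompareBool
    try:
        return bool(a == b)  # the 1 / 0 outcomes
    except Exception as exc:
        pending.exc = exc  # the -1 outcome: "not equal", error left set
        return False


def lookup(pending, bucket, key):
    """Probe each candidate, then surface any error before trusting a miss."""
    hit = None
    for stored_key, value in bucket:
        if rich_compare_eq(pending, stored_key, key):
            hit = value
            break
    exc = pending.take()
    if exc is not None:
        raise exc  # propagate the original error instead of reporting a miss
    return hit


class Raises:
    def __eq__(self, other):
        raise RuntimeError("boom from __eq__")


class Counting:
    calls = 0

    def __eq__(self, other):
        Counting.calls += 1
        return False


pending = PendingError()
try:
    lookup(pending, [(Raises(), "a"), (Counting(), "b")], object())
except RuntimeError as exc:
    raised = exc
else:
    raised = None

after_error = lookup(pending, [(5, "five")], 5)
clean_miss = lookup(pending, [(5, "five")], 6)
```

In this model the second candidate's `__eq__` is never called once the first one has raised, and the slot is empty again after `take`, so later lookups behave normally.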

evict_one and the write-path remove/insert compare a key against itself (same PyObject*), so RichCompareBool short-circuits to 1 via the identity fast path and never runs __eq__ — no check needed there.
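The identity fast path can be observed directly from Python through `ctypes`. This sketch assumes a CPython interpreter, where `ctypes.pythonapi` exposes `PyObject_RichCompareBool`; the `Raising` class is hypothetical. Note that `ctypes.pythonapi` checks the error indicator after each call, so the `-1` case shows up here as a raised exception rather than as a return value.

```python
import ctypes

Py_EQ = 2  # value of the Py_EQ comparison opcode in CPython's headers

rich_compare_bool = ctypes.pythonapi.PyObject_RichCompareBool
rich_compare_bool.argtypes = [ctypes.py_object, ctypes.py_object, ctypes.c_int]
rich_compare_bool.restype = ctypes.c_int


class Raising:
    """Hypothetical key whose __eq__ counts calls and then raises."""

    calls = 0

    def __eq__(self, other):
        Raising.calls += 1
        raise RuntimeError("boom from __eq__")


key = Raising()

# Same object on both sides: answered by identity, __eq__ is not called.
same = rich_compare_bool(key, key, Py_EQ)
calls_after_identity = Raising.calls

# Distinct objects: __eq__ runs and its exception surfaces.
try:
    rich_compare_bool(key, Raising(), Py_EQ)
except RuntimeError:
    distinct_raised = True
else:
    distinct_raised = False
```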

Test (tests/test_raising_eq.py)

A key whose __eq__ raises now propagates the original exception through __call__, get, and _probe; the cache stays usable afterward; and hash-colliding keys that compare cleanly still cache independently (guards against over-eager error detection).
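The last guard, that hash-colliding keys with a well-behaved `__eq__` still get separate entries, might look roughly like the sketch below. It uses `functools.lru_cache` as a stand-in for the cache under test, since the project's own decorator is not shown here; `Colliding` and `g` are hypothetical names and this is not the content of `tests/test_raising_eq.py`.

```python
from functools import lru_cache


class Colliding:
    """Hypothetical key: every instance hashes alike, equality is by value."""

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 0

    def __eq__(self, other):
        return isinstance(other, Colliding) and self.value == other.value


computed = []  # records each real computation, i.e. each cache miss


@lru_cache(maxsize=None)
def g(key):
    computed.append(key.value)
    return key.value * 10


# Two distinct keys share a hash; each should be computed exactly once.
results = [g(Colliding(1)), g(Colliding(2)), g(Colliding(1)), g(Colliding(2))]
info = g.cache_info()
```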

Verified fail-before → pass-after by stashing just the fix: the three read-path tests + the resilience test raised SystemError: ... returned a result with an exception set; with the fix all five pass.

Gates run (risky change — src/key.rs key equality / FFI boundary)

  • make fmt / make lint (ruff, ty, clippy -D warnings) ✓
  • make test: cargo test (11) + pytest (98, +5 new) ✓
  • make test-matrix — Python 3.10–3.13 ✓, new tests pass on each (3.14 skipped locally via the documented uv-resolves-stale-alpha guard; CI covers 3.14 final)
  • make bench — no regression: the added per-comparison PyErr_Occurred() is a TLS read; memory backend ~18.7M ops/s, single-thread 13.6–22.2M, within run-to-run variance

🤖 Generated with Claude Code

toloco and others added 2 commits June 18, 2026 09:43
CacheKey::eq and BorrowedArgs::equivalent compared
`PyObject_RichCompareBool(...) == 1`. RichCompareBool returns -1 with a
Python exception set when a key's __eq__ raises; `== 1` maps that to
"not equal" and the exception is left pending on the thread. The lookup
reports a (spurious) miss, the call path recomputes / returns Ok, and PyO3
then surfaces a confusing `SystemError: ... returned a result with an
exception set`, masking the user's real error (e.g. RuntimeError).

Route both comparisons through a shared `rich_compare_eq` helper that:
- returns false (not equal) for both 0 and -1, but on -1 leaves the
  exception set for the caller to fetch; and
- bails out early (returns false) if an exception is already pending, so a
  multi-key probe never re-enters Python with a live exception and clobbers
  the original error.

Memory-backend lookup sites then call `PyErr::take(py)` after the lookup and
return the error: `__call__` (read path, before recomputing; and the
write-path double-check, before inserting), `get`, and `_probe`. evict_one
and the write-path remove/insert compare a key against itself (identity), so
RichCompareBool short-circuits to 1 and never runs __eq__ — no check needed.

Add tests (tests/test_raising_eq.py): a key whose __eq__ raises now
propagates the original exception through __call__, get, and _probe; the
cache stays usable afterward; and hash-colliding keys that compare cleanly
still cache independently. Verified fail-before (SystemError) -> pass-after.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
toloco merged commit 0853ba1 into master Jun 18, 2026
3 checks passed
toloco deleted the fix/36-richcompare-error branch June 18, 2026 09:07
Development

Successfully merging this pull request may close these issues.

[medium] PyObject_RichCompareBool error (-1) is silently treated as not-equal, leaving a dangling Python exception
