From 45164031f851ddf4ea9bde3af8208d899e69a088 Mon Sep 17 00:00:00 2001 From: Tolo Palmer Date: Thu, 18 Jun 2026 10:17:04 +0100 Subject: [PATCH] fix: validate ttl_nanos when reusing a shared region across processes (#42) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit create_or_open's reuse check compared version, capacity, max_key_size, and max_value_size but not ttl_nanos. A process opening an existing shared cache with a different TTL (e.g. ttl=None when the creator used ttl=10) silently reused the creator's region and inherited its TTL from the shared header — expiry is read from h.ttl_nanos at lookup time, so the second process's requested TTL was ignored. Same decorator params produced different effective behavior depending on which process won the create race, and it was inconsistent with the other config params (which force a recreate on mismatch). Add `header.ttl_nanos == ttl_nanos` to the reuse condition so a TTL mismatch recreates the region, exactly like a capacity/key/value-size mismatch (last-writer-wins). Add a regression test: a region created with ttl=0.1 then opened with ttl=None must honor ttl=None (entry survives past 0.1s) — before the fix it expired per the creator's TTL. The bug is in the reuse decision, exercised on the second construction, so a single-process test covers it deterministically. Also repair a merge artifact in ARCHITECTURE.md: the issue #37 invariant text had been stranded as a duplicated fragment after the #40 bullet (from #76 and #77 landing together); fold it back into the #[repr(C)] bullet. Document the config-gates-reuse invariant. Co-Authored-By: Claude Opus 4.8 (1M context) --- docs/ARCHITECTURE.md | 17 ++++++++++------ src/shm/region.rs | 8 +++++++- tests/test_shared_basic.py | 40 ++++++++++++++++++++++++++++++++++++++ 3 files changed, 58 insertions(+), 7 deletions(-) diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md index 8f80fd3..5def952 100644 --- a/docs/ARCHITECTURE.md +++ b/docs/ARCHITECTURE.md @@ -67,7 +67,11 @@ For day-to-day workflow and commands, see [`CLAUDE.md`](../CLAUDE.md) and - **Hash table capacity must be power-of-2** — bitmask probing uses `hash & (capacity - 1)`. Always use `.next_power_of_two()`. - **`#[repr(C)]` struct field ordering** — place u64 fields before u32 to avoid implicit - alignment padding; affects `size_of` assertions in `layout.rs`. + alignment padding; affects `size_of` assertions in `layout.rs`. Any field the lock-free + read path *writes* (currently only `SlotHeader.visited`) must be an atomic type accessed + with `Relaxed` — readers touch it without the write lock while writers reuse the slot, so a + plain field is a data race (issue #37). `AtomicU64` is layout-identical to `u64`, so this + doesn't change the cross-process layout. - **Seqlock writer ordering (issue #40).** The writer must publish the odd ("writer active") sequence number *before* its data mutations become visible: `write_lock` does the odd store then an `atomic::fence(Release)` (a Release store alone orders only *prior* ops). This pairs @@ -75,16 +79,17 @@ For day-to-day workflow and commands, see [`CLAUDE.md`](../CLAUDE.md) and weak-memory hardware a data write can float ahead of the odd publish and a reader can validate a torn read against a stale even seq. The ordering is model-checked under `loom` (run `RUSTFLAGS="--cfg loom" cargo test --lib seqlock_ordering`; loom is a `cfg(loom)`-only dep). - alignment padding; affects `size_of` assertions in `layout.rs`. Any field the lock-free - read path *writes* (currently only `SlotHeader.visited`) must be an atomic type accessed - with `Relaxed` — readers touch it without the write lock while writers reuse the slot, so a - plain field is a data race (issue #37). `AtomicU64` is layout-identical to `u64`, so this - doesn't change the cross-process layout. - **Cross-process timestamps must use a system-wide clock (issue #32).** `created_at_nanos` is written into shared memory by one process and compared against `now` in another, so `shm::current_time_nanos` uses `CLOCK_MONOTONIC` (process-independent on Linux, macOS, and the BSDs). Never use `std::time::Instant` for shm timestamps — its epoch is per-process, so the two bases are unrelated and TTL silently breaks across processes (the original macOS bug). +- **All behavior-affecting header config gates region reuse (issue #42).** When a process opens + an existing shm region, `region.rs::create_or_open` reuses it only if `version`, `capacity`, + `max_key_size`, `max_value_size`, **and `ttl_nanos`** all match; any mismatch recreates the + region. TTL lives in the shared header and governs expiry for every reader, so a new header + field that changes behavior must be added to this reuse check too — otherwise a process opening + with a different value silently inherits the creator's, producing config-dependent behavior. - **No second shard guard while one is live (reentrancy, issue #30).** A memory-backend lookup runs arbitrary Python `__eq__` (via `PyObject_RichCompareBool`) *while a shard guard is held*. That `__eq__` can re-enter the same `CachedFunction` (or, on GIL builds, hand off diff --git a/src/shm/region.rs b/src/shm/region.rs index 9e46176..f21e7eb 100644 --- a/src/shm/region.rs +++ b/src/shm/region.rs @@ -214,12 +214,18 @@ impl ShmRegion { if data_path.exists() && lock_path.exists() { match Self::open_paths(&data_path, &lock_path) { Ok(region) => { - // Validate parameters match + // Validate parameters match. ttl_nanos is included (#42): TTL + // lives in the shared header and governs expiry for every process + // (shm/mod.rs reads h.ttl_nanos at lookup time), so a process + // opening with a different TTL must recreate rather than silently + // inherit the creator's TTL — same last-writer-wins recreate as the + // other config params. let header = region.header(); if header.version == VERSION && header.capacity == capacity && header.max_key_size == max_key_size && header.max_value_size == max_value_size + && header.ttl_nanos == ttl_nanos { return Ok(region); } diff --git a/tests/test_shared_basic.py b/tests/test_shared_basic.py index c26fe44..ad3f2e2 100644 --- a/tests/test_shared_basic.py +++ b/tests/test_shared_basic.py @@ -303,6 +303,46 @@ def fn(x): assert fn.cache_info().hits == 1 +class TestSharedTTLConfigMismatch: + """Regression for #42: opening an existing shm region with a different TTL + used to silently reuse the creator's region (and its TTL stored in the + header), ignoring the second caller's requested TTL. A TTL mismatch must now + recreate the region, just like a capacity/key/value-size mismatch.""" + + def setup_method(self): + _cleanup_shm() + + def teardown_method(self): + _cleanup_shm() + + def test_ttl_mismatch_recreates_region(self): + import time + + from warp_cache._warp_cache_rs import SharedCachedFunction + + shm_name = "test_ttl_mismatch_42" + + # First constructor fixes the region's TTL to 0.1s. + SharedCachedFunction( + lambda x: x, 16, ttl=0.1, max_key_size=512, max_value_size=4096, shm_name=shm_name + ) + + # Second constructor requests ttl=None on the same region. After the fix + # the TTL mismatch recreates the region with no TTL; before the fix it + # silently reused the first region and honored ttl=0.1. + fn_b = SharedCachedFunction( + lambda x: x, 16, ttl=None, max_key_size=512, max_value_size=4096, shm_name=shm_name + ) + + assert fn_b(1) == 1 # miss -> store + time.sleep(0.3) # well past the first constructor's 0.1s TTL + assert fn_b(1) == 1 # ttl=None -> still a hit; ttl=0.1 (bug) -> expired miss + assert fn_b.cache_info().hits == 1, ( + "second constructor's ttl=None was ignored — entry expired per the " + "first constructor's ttl=0.1 (region was reused, not recreated)" + ) + + class TestSharedMemoryBackend: """Test backend='memory' vs backend='shared' routing."""