diff --git a/docs/ARCHITECTURE.md b/docs/ARCHITECTURE.md index 8f80fd3..5def952 100644 --- a/docs/ARCHITECTURE.md +++ b/docs/ARCHITECTURE.md @@ -67,7 +67,11 @@ For day-to-day workflow and commands, see [`CLAUDE.md`](../CLAUDE.md) and - **Hash table capacity must be power-of-2** — bitmask probing uses `hash & (capacity - 1)`. Always use `.next_power_of_two()`. - **`#[repr(C)]` struct field ordering** — place u64 fields before u32 to avoid implicit - alignment padding; affects `size_of` assertions in `layout.rs`. + alignment padding; affects `size_of` assertions in `layout.rs`. Any field the lock-free + read path *writes* (currently only `SlotHeader.visited`) must be an atomic type accessed + with `Relaxed` — readers touch it without the write lock while writers reuse the slot, so a + plain field is a data race (issue #37). `AtomicU64` is layout-identical to `u64`, so this + doesn't change the cross-process layout. - **Seqlock writer ordering (issue #40).** The writer must publish the odd ("writer active") sequence number *before* its data mutations become visible: `write_lock` does the odd store then an `atomic::fence(Release)` (a Release store alone orders only *prior* ops). This pairs @@ -75,16 +79,17 @@ For day-to-day workflow and commands, see [`CLAUDE.md`](../CLAUDE.md) and weak-memory hardware a data write can float ahead of the odd publish and a reader can validate a torn read against a stale even seq. The ordering is model-checked under `loom` (run `RUSTFLAGS="--cfg loom" cargo test --lib seqlock_ordering`; loom is a `cfg(loom)`-only dep). - alignment padding; affects `size_of` assertions in `layout.rs`. Any field the lock-free - read path *writes* (currently only `SlotHeader.visited`) must be an atomic type accessed - with `Relaxed` — readers touch it without the write lock while writers reuse the slot, so a - plain field is a data race (issue #37). `AtomicU64` is layout-identical to `u64`, so this - doesn't change the cross-process layout. - **Cross-process timestamps must use a system-wide clock (issue #32).** `created_at_nanos` is written into shared memory by one process and compared against `now` in another, so `shm::current_time_nanos` uses `CLOCK_MONOTONIC` (process-independent on Linux, macOS, and the BSDs). Never use `std::time::Instant` for shm timestamps — its epoch is per-process, so the two bases are unrelated and TTL silently breaks across processes (the original macOS bug). +- **All behavior-affecting header config gates region reuse (issue #42).** When a process opens + an existing shm region, `region.rs::create_or_open` reuses it only if `version`, `capacity`, + `max_key_size`, `max_value_size`, **and `ttl_nanos`** all match; any mismatch recreates the + region. TTL lives in the shared header and governs expiry for every reader, so a new header + field that changes behavior must be added to this reuse check too — otherwise a process opening + with a different value silently inherits the creator's, producing config-dependent behavior. - **No second shard guard while one is live (reentrancy, issue #30).** A memory-backend lookup runs arbitrary Python `__eq__` (via `PyObject_RichCompareBool`) *while a shard guard is held*. That `__eq__` can re-enter the same `CachedFunction` (or, on GIL builds, hand off diff --git a/src/shm/region.rs b/src/shm/region.rs index 9e46176..f21e7eb 100644 --- a/src/shm/region.rs +++ b/src/shm/region.rs @@ -214,12 +214,18 @@ impl ShmRegion { if data_path.exists() && lock_path.exists() { match Self::open_paths(&data_path, &lock_path) { Ok(region) => { - // Validate parameters match + // Validate parameters match. ttl_nanos is included (#42): TTL + // lives in the shared header and governs expiry for every process + // (shm/mod.rs reads h.ttl_nanos at lookup time), so a process + // opening with a different TTL must recreate rather than silently + // inherit the creator's TTL — same last-writer-wins recreate as the + // other config params. let header = region.header(); if header.version == VERSION && header.capacity == capacity && header.max_key_size == max_key_size && header.max_value_size == max_value_size + && header.ttl_nanos == ttl_nanos { return Ok(region); } diff --git a/tests/test_shared_basic.py b/tests/test_shared_basic.py index c26fe44..ad3f2e2 100644 --- a/tests/test_shared_basic.py +++ b/tests/test_shared_basic.py @@ -303,6 +303,46 @@ def fn(x): assert fn.cache_info().hits == 1 +class TestSharedTTLConfigMismatch: + """Regression for #42: opening an existing shm region with a different TTL + used to silently reuse the creator's region (and its TTL stored in the + header), ignoring the second caller's requested TTL. A TTL mismatch must now + recreate the region, just like a capacity/key/value-size mismatch.""" + + def setup_method(self): + _cleanup_shm() + + def teardown_method(self): + _cleanup_shm() + + def test_ttl_mismatch_recreates_region(self): + import time + + from warp_cache._warp_cache_rs import SharedCachedFunction + + shm_name = "test_ttl_mismatch_42" + + # First constructor fixes the region's TTL to 0.1s. + SharedCachedFunction( + lambda x: x, 16, ttl=0.1, max_key_size=512, max_value_size=4096, shm_name=shm_name + ) + + # Second constructor requests ttl=None on the same region. After the fix + # the TTL mismatch recreates the region with no TTL; before the fix it + # silently reused the first region and honored ttl=0.1. + fn_b = SharedCachedFunction( + lambda x: x, 16, ttl=None, max_key_size=512, max_value_size=4096, shm_name=shm_name + ) + + assert fn_b(1) == 1 # miss -> store + time.sleep(0.3) # well past the first constructor's 0.1s TTL + assert fn_b(1) == 1 # ttl=None -> still a hit; ttl=0.1 (bug) -> expired miss + assert fn_b.cache_info().hits == 1, ( + "second constructor's ttl=None was ignored — entry expired per the " + "first constructor's ttl=0.1 (region was reused, not recreated)" + ) + + class TestSharedMemoryBackend: """Test backend='memory' vs backend='shared' routing."""