[medium] TTAS spinlock has no death-handling: a process killed while holding write_lock deadlocks all others forever #38

@toloco

Description

Severity: 🟡 medium • Category: concurrency
Location: src/shm/lock.rs : 80-114

What's wrong

write_lock() is a plain TTAS (test-and-test-and-set) spinlock stored in shared memory, with no robustness against the death of its owner. If a process dies (SIGKILL, crash, panic across the FFI boundary, OOM kill) between write_lock() and write_unlock(), the write_lock flag stays 1 and seq stays odd forever. Every other process then hangs: write_lock() spins forever on the flag, and read_begin() spins forever because it waits for seq to become even (seq & 1 == 0). There is no timeout, no owner PID, no robust-mutex/EOWNERDEAD recovery, and no lock generation, so the entire cross-process cache is permanently wedged, with no recovery short of deleting the shm file.

insert() and clear() are not panic-safe either: there is no RAII guard, so any panic between write_lock() and write_unlock() (e.g. a bug in serde or pointer math) leaks the lock.
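To make the failure mode concrete, here is a minimal single-process model of the behaviour described above. It is a sketch, not the code in src/shm/lock.rs: the type SeqLockModel, the helper simulate_killed_writer, and the chosen memory orderings are all assumptions made for illustration, and the lock words live in ordinary memory rather than in an mmap'd region.

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};

/// Hypothetical stand-in for the two lock words kept in shared memory.
pub struct SeqLockModel {
    write_lock: AtomicU32, // 0 = free, 1 = held
    seq: AtomicU64,        // odd while a write is in progress
}

impl SeqLockModel {
    pub const fn new() -> Self {
        Self {
            write_lock: AtomicU32::new(0),
            seq: AtomicU64::new(0),
        }
    }

    /// A single TTAS attempt: test with a plain load first, then try the CAS.
    /// On success, seq is bumped to an odd value.
    pub fn try_write_lock(&self) -> bool {
        if self.write_lock.load(Ordering::Relaxed) != 0 {
            return false;
        }
        if self
            .write_lock
            .compare_exchange(0, 1, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            return false;
        }
        self.seq.fetch_add(1, Ordering::AcqRel);
        true
    }

    /// Bump seq back to even, then release the flag.
    pub fn write_unlock(&self) {
        self.seq.fetch_add(1, Ordering::Release);
        self.write_lock.store(0, Ordering::Release);
    }

    /// The condition a reader waits for before it may start reading.
    pub fn reader_may_proceed(&self) -> bool {
        self.seq.load(Ordering::Acquire) & 1 == 0
    }
}

/// Models a writer that is killed inside the critical section:
/// it acquires the lock and never reaches write_unlock().
pub fn simulate_killed_writer(lock: &SeqLockModel) {
    let _ = lock.try_write_lock();
    // The process "dies" here; nothing ever resets write_lock or seq.
}
```

After simulate_killed_writer runs, try_write_lock() keeps returning false and reader_may_proceed() keeps returning false. Nothing in the state records who held the lock, so a surviving process cannot tell a dead owner from a slow one.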

Trigger

Send SIGKILL (kill -9) to a process after it acquires the write lock and before it releases it, e.g. during insert(). All other processes then hang indefinitely in read_begin() or write_lock().

Suggested fix

Use a robust mechanism. Options: (1) store the owner PID plus a lock generation, detect a dead owner (kill(pid, 0) fails with errno ESRCH), and steal or recover the lock; (2) bound the spin with a deadline that triggers recovery; or (3) use pthread robust mutexes (PTHREAD_MUTEX_ROBUST / EOWNERDEAD). In addition, wrap the write section in an RAII guard so that a Rust panic still releases the lock and restores seq parity.
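The RAII part of this suggestion could look roughly like the sketch below. The names SeqLock, WriteGuard, write, is_wedged, and insert_that_panics are hypothetical and do not come from the repository; the point is only that the Drop impl runs during unwinding, so a panic inside the critical section still restores seq parity and clears the flag.

```rust
use std::sync::atomic::{AtomicU32, AtomicU64, Ordering};

/// Hypothetical lock words; in the real crate these would sit in shared memory.
pub struct SeqLock {
    write_lock: AtomicU32, // 0 = free, 1 = held
    seq: AtomicU64,        // odd while a write is in progress
}

/// Guard returned by `write()`; releasing happens in `Drop`.
pub struct WriteGuard<'a> {
    lock: &'a SeqLock,
}

impl SeqLock {
    pub const fn new() -> Self {
        Self {
            write_lock: AtomicU32::new(0),
            seq: AtomicU64::new(0),
        }
    }

    /// TTAS acquire that hands back a guard instead of relying on a
    /// matching manual unlock call.
    pub fn write(&self) -> WriteGuard<'_> {
        loop {
            // "Test" phase: spin on a plain load until the flag looks free.
            while self.write_lock.load(Ordering::Relaxed) != 0 {
                std::hint::spin_loop();
            }
            // "Test-and-set" phase: try to claim the flag.
            if self
                .write_lock
                .compare_exchange_weak(0, 1, Ordering::Acquire, Ordering::Relaxed)
                .is_ok()
            {
                break;
            }
        }
        self.seq.fetch_add(1, Ordering::AcqRel); // seq is now odd
        WriteGuard { lock: self }
    }

    /// True if the flag is still set or seq is still odd.
    pub fn is_wedged(&self) -> bool {
        self.write_lock.load(Ordering::Acquire) != 0
            || self.seq.load(Ordering::Acquire) & 1 != 0
    }
}

impl Drop for WriteGuard<'_> {
    fn drop(&mut self) {
        // Runs on normal scope exit and during panic unwinding.
        self.lock.seq.fetch_add(1, Ordering::Release); // back to even
        self.lock.write_lock.store(0, Ordering::Release);
    }
}

/// Stand-in for a write path that hits a bug inside the critical section.
pub fn insert_that_panics(lock: &SeqLock) {
    let _guard = lock.write();
    panic!("simulated bug inside the critical section");
}
```

A guard like this only covers unwinding panics. It does nothing for SIGKILL, an abort, or a build with panic = "abort", because Drop never runs in those cases, so it complements the dead-owner recovery options and does not replace them.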

Adversarial verification note

Confirmed in the real code. src/shm/lock.rs:82-102 implements write_lock() as a bare TTAS spinlock (compare_exchange_weak on write_lock_ptr) followed by bumping seq to odd; write_unlock() (106-114) bumps seq back to even and stores 0. There is no owner PID, generation, robust mutex (EOWNERDEAD), timeout, or RAII guard, and these atomics live in mmap'd shared memory (region.rs:207-208 hands out ShmSeqLock from the lock mmap). read_begin() (60-66) spins forever while seq & 1 != 0 with std::hint::spin_loop() and no exit; write_lock() likewise spins indefinitely.

So if a process is SIGKILLed or crashes between write_lock() and write_unlock(), the write_lock flag stays 1 and seq stays odd in shared memory permanently: every other process hangs forever in read_begin()/write_lock(). insert() (mod.rs:313-315) and clear() (mod.rs:462-464) call write_lock()/write_unlock() directly with no guard, so a panic in insert_inner (e.g. a serde or pointer-math bug) also leaks the lock with no parity restoration.

grep over src/shm finds no relevant occurrences of getpid/kill/ESRCH/robust/EOWNERDEAD/generation/owner/deadline/timeout/recover; the only 'recover' hit, in region.rs:188, handles open errors or parameter mismatch, not a wedged lock. The evidence snippet in the finding accurately paraphrases the code.

The bug is genuine. Severity 'medium' is fair: it is a real cross-process liveness defect with no recovery short of deleting the shm file, but it requires a process death within a narrow critical-section window (uncommon under normal operation), and normal steady-state usage is unaffected.
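As an illustration of the "no exit" point in the note above, the reader-side spin could be bounded with a deadline along these lines. This is a sketch under assumptions: read_begin_bounded, the Wedged error type, and the timeout parameter are hypothetical names, and what the caller does on Err (recover, steal, or report) is deliberately left open.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::time::{Duration, Instant};

/// Hypothetical error: seq stayed odd for the whole timeout.
#[derive(Debug, PartialEq, Eq)]
pub struct Wedged;

/// Waits for `seq` to become even, but gives up once `timeout` has elapsed
/// instead of spinning forever. Returns the even sequence value on success.
pub fn read_begin_bounded(seq: &AtomicU64, timeout: Duration) -> Result<u64, Wedged> {
    let deadline = Instant::now() + timeout;
    loop {
        let s = seq.load(Ordering::Acquire);
        if s & 1 == 0 {
            return Ok(s);
        }
        if Instant::now() >= deadline {
            return Err(Wedged);
        }
        std::hint::spin_loop();
    }
}
```

A timeout alone cannot distinguish a dead writer from a slow or descheduled one, so an Err here is only a hint that recovery may be needed. Pairing it with an owner PID or lock generation, as the suggested fix proposes, would let the caller confirm the owner is gone before stealing the lock.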


Filed from a multi-agent code review (finder → adversarial verification → synthesis). Confirmed real after a skeptic re-read the code.
