
Reimplement writer-priority rwlock#3286

Open
chenBright wants to merge 1 commit into apache:master from chenBright:rwlock_

Conversation

@chenBright (Contributor) commented May 8, 2026

What problem does this PR solve?

Issue Number: resolve #3051

Problem Summary:

The current bthread RWLock is implemented with a Go-like state machine
where the reader count is bumped first and rolled back only if everything
later succeeds. As pointed out in #3051, this state is not safely
reversible on a partial failure:

  • When a read-lock attempt times out while a writer holds the lock, the
    pre-incremented reader counter is left dangling. Subsequent writers can
    no longer acquire the lock, even after every legitimate reader has left.
  • For the same structural reason, bthread rwlock cannot offer correct
    try_* / timed_* APIs at all -- any cancellable wait risks corrupting
    the lock's accounting.
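
For illustration only, the dangling counter can be reproduced with a single-threaded toy model of a Go-style reader count. The names (`GoLikeState`, `kMaxReaders`, `dangling_reader_count_after_timeout`) are hypothetical and this is not the code in src/bthread/rwlock.cpp; the sketch assumes the writer marks ownership by subtracting a large constant, as Go's `sync.RWMutex` does.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Toy model (assumption): the writer flips the counter negative by
// subtracting kMaxReaders, as in Go's sync.RWMutex.
constexpr int32_t kMaxReaders = 1 << 30;

struct GoLikeState {
    std::atomic<int32_t> reader_count{0};
};

// Replays the sequence from the problem summary in one thread and returns
// the reader count that is left once nobody holds the lock any more.
inline int32_t dangling_reader_count_after_timeout() {
    GoLikeState s;
    s.reader_count.fetch_sub(kMaxReaders);  // writer acquires the lock
    s.reader_count.fetch_add(1);            // reader pre-increments, then waits
    // ... the reader's wait times out here and it returns without undoing
    //     the increment ...
    s.reader_count.fetch_add(kMaxReaders);  // writer releases the lock
    return s.reader_count.load();           // phantom reader remains
}
```

In this model the counter ends at 1 with no reader inside, so the next writer would wait for a reader that never leaves.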

PR #1031 already proposed a writer-priority rwlock that fits this requirement;
this PR is built on top of that proposal.

What is changed and the side effects?

Changed:

Reimplement bthread_rwlock_t on top of a writer-priority algorithm
derived from PR #1031.
The lock state is split into:

  • lock_word (butex): bit 31 = writer held, bits 0..30 = reader count;
  • writer_wait_count (butex): in-flight writer counter used by readers
    to honor writer-priority;
  • writer_queue_mutex: serializes writers competing for lock_word.
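
A minimal sketch of the lock_word layout described above, assuming a 32-bit unsigned word. The helper names (`kWriterBit`, `kReaderMask`, `writer_held`, `reader_count`) are made up for illustration and may not match the identifiers in the patch.

```cpp
#include <cassert>
#include <cstdint>

// Bit 31 marks "writer held"; bits 0..30 carry the reader count.
constexpr uint32_t kWriterBit = 1u << 31;
constexpr uint32_t kReaderMask = kWriterBit - 1;

constexpr bool writer_held(uint32_t word) { return (word & kWriterBit) != 0; }
constexpr uint32_t reader_count(uint32_t word) { return word & kReaderMask; }

// Compile-time checks of the layout.
static_assert(writer_held(0x80000000u), "bit 31 set means a writer holds the lock");
static_assert(reader_count(0x80000000u) == 0, "writer bit is not counted as a reader");
static_assert(reader_count(3u) == 3 && !writer_held(3u), "three readers, no writer");
```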

Every wait/park step is fully reversible, so the lock now correctly
supports try_* / timed_* operations:

  • bthread_rwlock_tryrdlock / bthread_rwlock_timedrdlock
  • bthread_rwlock_trywrlock / bthread_rwlock_timedwrlock
  • bthread_rwlock_unlock (auto-dispatches to read/write unlock by
    inspecting lock_word).
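
To show why reversible steps make try-style operations straightforward, here is a simplified model built on `std::atomic`. It is a sketch under assumptions, not the patch: `ToyRWLock` and the `toy_*` functions are hypothetical, there is no parking or `writer_queue_mutex`, and the real functions operate on `bthread_rwlock_t`.

```cpp
#include <atomic>
#include <cassert>
#include <cerrno>
#include <cstdint>

struct ToyRWLock {
    std::atomic<uint32_t> lock_word{0};          // bit 31 writer, bits 0..30 readers
    std::atomic<uint32_t> writer_wait_count{0};  // in-flight writers
};
constexpr uint32_t kToyWriterBit = 1u << 31;

// Readers back off when any writer is in flight (writer priority).
int toy_tryrdlock(ToyRWLock& l) {
    if (l.writer_wait_count.load(std::memory_order_acquire) != 0) return EBUSY;
    uint32_t w = l.lock_word.load(std::memory_order_relaxed);
    while ((w & kToyWriterBit) == 0) {
        // Nothing was modified before this CAS, so failure needs no rollback.
        if (l.lock_word.compare_exchange_weak(w, w + 1, std::memory_order_acquire,
                                              std::memory_order_relaxed)) {
            return 0;
        }
    }
    return EBUSY;
}

// Writers announce themselves first and undo the announcement on failure.
int toy_trywrlock(ToyRWLock& l) {
    l.writer_wait_count.fetch_add(1, std::memory_order_acq_rel);
    uint32_t expected = 0;
    if (l.lock_word.compare_exchange_strong(expected, kToyWriterBit,
                                            std::memory_order_acquire,
                                            std::memory_order_relaxed)) {
        return 0;
    }
    l.writer_wait_count.fetch_sub(1, std::memory_order_release);  // roll back
    return EBUSY;
}

// Dispatches to write or read unlock by inspecting lock_word.
int toy_unlock(ToyRWLock& l) {
    uint32_t w = l.lock_word.load(std::memory_order_relaxed);
    if (w == 0) return EPERM;  // not locked at all
    if (w & kToyWriterBit) {
        l.lock_word.fetch_and(~kToyWriterBit, std::memory_order_release);
        l.writer_wait_count.fetch_sub(1, std::memory_order_release);
    } else {
        l.lock_word.fetch_sub(1, std::memory_order_release);
    }
    return 0;
}
```

A failed `toy_trywrlock` leaves both words exactly as it found them, which is the property the old counter-first design lacked.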

On any failure path (EBUSY, ETIMEDOUT, EINTR-leading-to-fail) the
side effects are rolled back via rwlock_wrlock_cleanup (release the inner
mutex if held; decrement writer_wait_count; wake parked readers if we
were the last in-flight writer).
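
The three cleanup steps listed above can be sketched as follows. This is a hedged stand-in, not the body of rwlock_wrlock_cleanup: `ToyWriterState` and `toy_wrlock_cleanup` are hypothetical, `std::mutex` replaces the bthread mutex, and a plain counter replaces the butex wake call.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <mutex>

struct ToyWriterState {
    std::mutex writer_queue_mutex;               // stand-in for the inner mutex
    std::atomic<uint32_t> writer_wait_count{0};  // in-flight writers
    int reader_wakeups = 0;                      // stand-in for waking parked readers
};

// Undo the side effects of a failed write-lock attempt.
// `mutex_held` says whether the caller got as far as taking the inner mutex.
void toy_wrlock_cleanup(ToyWriterState& s, bool mutex_held) {
    if (mutex_held) {
        s.writer_queue_mutex.unlock();           // 1. release the inner mutex
    }
    // 2. leave the in-flight set; fetch_sub returns the previous value.
    if (s.writer_wait_count.fetch_sub(1, std::memory_order_release) == 1) {
        ++s.reader_wakeups;                      // 3. last writer wakes readers
    }
}
```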

Replace the unconditional broadcast in unlock with selective wakeup:

  • unrdlock: only the last reader (lock_word 1 -> 0) does
    butex_wake(lock_word) for the at-most-one writer parked on it
    (writers are serialized by writer_queue_mutex, so a single wake
    suffices). Earlier readers do not wake anyone.
  • unwrlock: deliberately unlocks writer_queue_mutex first (so the
    next queued writer -- which has already self-accounted in
    writer_wait_count -- gets a chance), and only does
    butex_wake_all(writer_wait_count) when we were the last in-flight
    writer (fetch_sub returned 1). The intentional ordering preserves
    strict writer-priority and is documented in detail in the source.

This eliminates the previous "wake everybody, let them re-contend"
pattern for both readers and writers, which removes a stampede on
contended workloads.
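
The wake decisions above can be traced with a single-threaded model that records which call would be issued. All names here (`ToyWakeLock`, `toy_unrdlock`, `toy_unwrlock`, the event strings) are hypothetical, and the point at which the writer bit is cleared is an assumption of this sketch rather than something stated in the description.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct ToyWakeLock {
    std::atomic<uint32_t> lock_word{0};
    std::atomic<uint32_t> writer_wait_count{0};
    std::vector<std::string> events;  // calls a real lock would make, in order
};
constexpr uint32_t kWakeToyWriterBit = 1u << 31;

// Only the reader that takes lock_word from 1 to 0 wakes the parked writer.
void toy_unrdlock(ToyWakeLock& l) {
    if (l.lock_word.fetch_sub(1, std::memory_order_release) == 1) {
        l.events.push_back("wake(lock_word)");
    }
}

// The inner mutex is released first so the next queued writer can proceed;
// readers are woken only when no writer is in flight any more.
void toy_unwrlock(ToyWakeLock& l) {
    l.lock_word.fetch_and(~kWakeToyWriterBit, std::memory_order_release);  // assumed step
    l.events.push_back("unlock(writer_queue_mutex)");
    if (l.writer_wait_count.fetch_sub(1, std::memory_order_release) == 1) {
        l.events.push_back("wake_all(writer_wait_count)");
    }
}
```

With three readers, the first two calls to `toy_unrdlock` record nothing and only the third records a wake; with two in-flight writers, only the second `toy_unwrlock` records the broadcast to readers.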

Co-authored-by: @hairet

Side effects:

  • Performance effects:

  • Breaking backward compatibility:


Check List:

@chenBright changed the title from "Reimplement writer-priority RWLock" to "Reimplement writer-priority rwlock" on May 8, 2026
@chenBright requested a review from Copilot on May 9, 2026 06:47

Copilot AI left a comment

Pull request overview

This PR reimplements bthread_rwlock_t with a writer-priority algorithm to fix incorrect accounting in cancellable (try_* / timed_*) RWLock operations (issue #3051), and adds unit tests covering writer-priority semantics, cleanup correctness, and memory-ordering expectations.

Changes:

  • Replaced the previous Go-like RWLock state machine with a writer-priority design using lock_word (butex), writer_wait_count (butex), and writer_queue_mutex.
  • Updated the public bthread_rwlock_t struct layout to match the new implementation.
  • Expanded RWLock unit tests to validate writer-priority behavior, cleanup after timed-out writers, and data consistency; minor perf-test logging tweaks.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 9 comments.

| File | Description |
| --- | --- |
| src/bthread/rwlock.cpp | New writer-priority RWLock implementation, new contention sampling flow, new init/destroy paths. |
| src/bthread/types.h | Updates bthread_rwlock_t fields to match the new implementation (butex pointers + mutex). |
| test/bthread_rwlock_unittest.cpp | Adds new behavioral tests for writer-priority and cleanup; tweaks perf output formatting and concurrency setting. |


Comment thread src/bthread/rwlock.cpp Outdated
Comment thread src/bthread/rwlock.cpp Outdated
Comment thread src/bthread/rwlock.cpp Outdated
Comment thread src/bthread/rwlock.cpp Outdated
Comment thread src/bthread/rwlock.cpp Outdated
Comment thread src/bthread/rwlock.cpp
Comment thread src/bthread/types.h Outdated
Comment thread src/bthread/types.h
Comment thread test/bthread_rwlock_unittest.cpp
The previous Go-like RWLock bumps the reader count first and rolls it
back only on the full success path. The state is not reversible on a
partial failure, so a read timeout while a writer holds the lock leaves a
dangling reader credit and permanently blocks future writers (issue apache#3051).
For the same reason the old implementation could not offer correct
try_* / timed_* APIs at all.

Co-authored-by: hairet

Development

Successfully merging this pull request may close these issues.

bthread::RWLock: when a reader's try_lock_for times out while a writer holds the lock, the reader count becomes incorrect

2 participants