Lock-free scheduler with per-priority queues by TrentHouliston · Pull Request #193 · Fastcode/NUClear

TrentHouliston · 2026-05-27T03:57:30Z

Summary

Rework Scheduler / Pool / Group to remove the global mutex contention that showed up in the existing implementation under load:

Per-priority lock-free task queues. Each Pool now owns five independent MPMC TaskQueues (one per priority bucket) instead of a single mutex-protected priority queue. Submitting and dequeuing on different priorities no longer contend at all, and same-priority submits use a block-based lock-free FIFO with a graveyard-style block reclamation scheme to avoid ABA.
MPSC specialisation for single-consumer pools. Pools with concurrency == 1 (e.g. MainThread, TraceController) now construct MPSCQueue<Task> instead of TaskQueue<Task>, giving a non-atomic consumer side and removing the dequeue CAS entirely for those pools. Both queues sit behind a Queue<T> polymorphic interface so Pool can hold a std::array<std::unique_ptr<Queue<Task>>> of buckets and dispatch the right type at construction.
Counting-semaphore Group fast path. Single-Sync groups now acquire via std::atomic<int> tokens + a lock-free TaskQueue<WaitEntry> per priority for waiters, falling back to the mutex-backed GroupLock only for multi-group submissions. Sync ordering inside a group is strict; equal-priority tasks across pools see relaxed global FIFO.
Idle bookkeeping off the pool mutex. Pool idle tracking moved off the main pool mutex so wake-ups no longer block submitters.
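
The bucket layout described in the first two points can be pictured roughly as follows. This is a minimal sketch and not the PR's code: `SimpleQueue`, `BucketedPool`, and `kBuckets` are hypothetical names, the stand-in queue is a mutex-guarded `std::deque` rather than a lock-free structure, tasks are plain `int`s, and bucket 0 is assumed to be the highest priority. It only shows the shape: one queue per priority behind a polymorphic interface, and a dequeue that scans buckets from highest to lowest priority.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for the Queue<T> interface named in the summary.
template <typename T>
struct Queue {
    virtual ~Queue() = default;
    virtual void enqueue(T value) = 0;
    virtual bool try_dequeue(T& out) = 0;
};

// Placeholder implementation: mutex-guarded FIFO, NOT lock-free.
template <typename T>
class SimpleQueue : public Queue<T> {
public:
    void enqueue(T value) override {
        const std::lock_guard<std::mutex> lock(mutex);
        items.push_back(std::move(value));
    }
    bool try_dequeue(T& out) override {
        const std::lock_guard<std::mutex> lock(mutex);
        if (items.empty()) {
            return false;
        }
        out = std::move(items.front());
        items.pop_front();
        return true;
    }

private:
    std::mutex mutex;
    std::deque<T> items;
};

constexpr std::size_t kBuckets = 5;  // one bucket per priority level

class BucketedPool {
public:
    BucketedPool() {
        // A real pool could choose a different queue type per pool here.
        for (auto& bucket : buckets) {
            bucket = std::make_unique<SimpleQueue<int>>();
        }
    }
    // Submissions to different buckets touch different queues.
    void submit(std::size_t bucket, int task) {
        assert(bucket < kBuckets);
        buckets[bucket]->enqueue(task);
    }
    // Scan from bucket 0 (assumed highest priority) downwards.
    bool try_get(int& out) {
        for (auto& bucket : buckets) {
            if (bucket->try_dequeue(out)) {
                return true;
            }
        }
        return false;
    }

private:
    std::array<std::unique_ptr<Queue<int>>, kBuckets> buckets;
};
```

In this sketch, tasks within a bucket come out in FIFO order and a higher-priority bucket is always drained before a lower one, regardless of submission order.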

New lock-free primitives (TaskQueue, MPSCQueue, Semaphore) and Group::CountingLock are covered by Catch2 BDD-style tests, plus a scheduler microbenchmark (tests/tests/Benchmark.cpp).

TSAN race fixes (pre-existing)

While validating the changes under TSAN on macOS clang and Linux gcc 13, three pre-existing data races surfaced; all three are fixed in this PR:

IOController_Posix.ipp — pollfd::events. The IOFinished handler mutated watches[].events while only holding tasks_mutex, and the poll thread read the same field from inside ::poll() while only holding notifier.mutex. bump() woke poll but released notifier.mutex before the mutation, leaving the race window open. The handler now writes the wake byte inline and holds notifier.mutex across the watches update and the follow-up fire_event call (which can also touch watches[].events).
Scheduler::submit — Reaction::scheduler_data. The cached pool shared_ptr<void> was read/written from any submitting thread without synchronisation. Switched to std::atomic_load_explicit / std::atomic_store_explicit (acquire/release). All racers compute the same pool, so last-writer-wins is benign.
Watchdog data store. The service time_point was read by the chrono controller while being mutated by user threads emitting a service event, and the void specialisation returned a reference through a temporary shared_ptr (latent dangling reference). Centralised reads/writes through a per-(WatchdogGroup, RuntimeType) std::mutex, made get return by value, and routed WatchdogServicer::service through WatchdogDataStore::service so writes share the read mutex.
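
The shape of the Watchdog fix (one mutex shared by readers and writers, with reads returning a copy instead of a reference) might look something like the sketch below. `ServiceTimeStore` and its `expired` helper are hypothetical; the real `WatchdogDataStore` is keyed per (WatchdogGroup, RuntimeType) and its exact interface is not shown in this PR description.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>

// Hypothetical, simplified store for a single watchdog's last service time.
class ServiceTimeStore {
public:
    using clock      = std::chrono::steady_clock;
    using time_point = clock::time_point;
    using duration   = clock::duration;

    // Return by value: the copy is taken while the mutex is held, so the
    // caller never holds a reference into state another thread may mutate.
    time_point get() const {
        const std::lock_guard<std::mutex> lock(mutex);
        return last_service;
    }

    // Writers take the same mutex that readers use.
    void service(time_point now) {
        const std::lock_guard<std::mutex> lock(mutex);
        last_service = now;
    }

    // Example reader: has the watchdog gone unserviced for longer than timeout?
    bool expired(time_point now, duration timeout) const {
        assert(timeout >= duration::zero());
        return now - get() > timeout;
    }

private:
    mutable std::mutex mutex;
    time_point last_service{};
};
```

Returning the `time_point` by value costs one small copy, and it removes both the data race and the dangling-reference hazard that a returned reference would carry.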

Other

.gitignore now matches build-*/ so out-of-tree TSAN/ASAN/Release build directories don't appear in git status.

Test plan

macOS clang TSAN: dsl/IO, dsl/Inline, dsl/Watchdog 30/30 clean each; full suite 63/63 ✓
macOS clang Release: full suite 64/64 ✓
Linux gcc 13 TSAN (Docker): same three tests 30/30 clean; 16 hot-path tests × 3 runs with no TSAN warnings; full ctest 63/64 (api/TimeTravel flaked once under parallel TSAN load, 10/10 passes in isolation — pre-existing timing flakiness, no race)
Linux gcc 9 Release (Docker): full suite ✓
Lock-free primitives have BDD-style unit tests (tests/threading/{TaskQueue,MPSCQueue,Semaphore,CountingLock,Group})
Scheduler microbenchmark exercises submit / dequeue / group acquire under contention

Made with Cursor

…ce fixes

Rework Scheduler/Pool/Group to avoid the global mutex contention that showed up under load. Tasks are now placed into one of five lock-free queues (one per priority bucket) so producers and consumers no longer serialise on a single lock, and Group acquisition uses a counting-semaphore fast path with a lock-free waiter queue per priority. Pools whose concurrency is one (MainThread, TraceController, etc.) now use a specialised MPSC queue with a non-atomic consumer side. Tasks within a Sync group retain strict in-order delivery; equal-priority tasks across pools see relaxed global FIFO.

The new lock-free primitives (TaskQueue, MPSCQueue, Semaphore) and the Group counting-lock are covered by Catch2 BDD-style tests plus a scheduler benchmark.

During TSAN validation across macOS clang and Linux gcc 13 three pre-existing data races were uncovered and fixed:

* IOController (POSIX): the IOFinished handler mutated watches[].events under tasks_mutex while the poll thread read the same field from inside ::poll() under notifier.mutex. The bump() call closed the wake window but released notifier.mutex before the mutation, leaving the race. Inline the wake and hold notifier.mutex across the watches mutation and follow-up fire_event.
* Scheduler::submit: the cached Pool pointer on Reaction (scheduler_data) was read/written from any submitting thread without synchronisation. Switch the cache to std::atomic_load/store on the shared_ptr; the worst case is two submitters racing both compute the identical pool and last-writer-wins.
* Watchdog data store: the service time_point was read by the chrono controller while being mutated by user threads emitting a service event, and the void specialisation returned a reference through a temporary shared_ptr. Centralise reads/writes through a per-(WatchdogGroup, RuntimeType) std::mutex, return the time_point by value, and route WatchdogServicer::service through WatchdogDataStore::service so writes share the read mutex.

Validation:

* macOS clang TSAN: dsl/IO, dsl/Inline, dsl/Watchdog 30/30 clean each; full suite 63/63.
* Linux gcc 13 TSAN: same three tests 30/30 clean; 16 hot-path tests x 3 runs serially with no TSAN warnings.
* macOS Release: 64/64.

Also ignore build-*/ directories so out-of-tree build folders don't show up in git status.

Co-authored-by: Cursor <cursoragent@cursor.com>

The Reaction::scheduler_data cache previously held an std::shared_ptr<void> read/written via std::atomic_load/atomic_store. On libstdc++ those fall back to a small global pool of mutexes (selected by pointer hash), which becomes a contention point on hot submission paths.

Change scheduler_data to std::atomic<void*>{nullptr}. Pools live for the lifetime of the Scheduler and the PowerPlant tears reactors down before the scheduler, so a non-owning raw pointer is safe. Group::try_submit and WaitEntry are switched to Pool* accordingly, and the Scheduler field declaration order is changed so that pools outlive groups on destruction.

Also fix the clang-tidy errors that were blocking the lint job: switch the queue Slot/Block backing storage to std::array (avoid-c-arrays, member init), explicit-base BLOCK_SIZE, do-while -> while, use auto with new, RunningLock special members, Semaphore destructor, missing direct includes, unused-and-kept includes, and a couple of small test cleanups (reserve before emplace, explicit lvalue MPSCQueue enqueue overload to work around an MSVC overload-resolution quirk on int(i)).

Co-authored-by: Cursor <cursoragent@cursor.com>
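
A raw-pointer cache of the kind this commit describes could be sketched as below. The `Pool` and `Reaction` structs here are stripped-down placeholders, and `cached_pool` with its `resolve` callback is a hypothetical helper; the real lookup inside `Scheduler::submit` is not shown on this page. The sketch relies on the same assumption the commit states: pools outlive every reaction that caches a pointer to one.

```cpp
#include <atomic>
#include <cassert>

// Placeholder types for illustration only.
struct Pool {
    int id;
};

struct Reaction {
    // Non-owning cache; nullptr means "not resolved yet".
    std::atomic<void*> scheduler_data{nullptr};
};

// Hypothetical helper: return the cached pool, resolving it on first use.
// Two threads may both miss and both call resolve(); that is harmless as long
// as resolve() yields the same pool for the same reaction, since the last
// store simply rewrites an identical pointer.
template <typename Resolve>
Pool* cached_pool(Reaction& reaction, Resolve&& resolve) {
    void* cached = reaction.scheduler_data.load(std::memory_order_acquire);
    if (cached == nullptr) {
        Pool* pool = resolve();
        assert(pool != nullptr);
        reaction.scheduler_data.store(pool, std::memory_order_release);
        return pool;
    }
    return static_cast<Pool*>(cached);
}
```

On mainstream platforms `std::atomic<void*>` is lock-free, so the hit path is a single atomic load with no hidden mutex.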

Copilot

Pull request overview

Reworks the Scheduler / Pool / Group machinery to remove global mutex contention by introducing per-priority lock-free task queues, an MPSC specialisation for single-consumer pools, and a counting-semaphore fast path for single-Sync groups. The change set also fixes three pre-existing TSAN races (IO poll/pollfd::events, Reaction::scheduler_data cache, and Watchdog data store) and adds BDD-style unit tests plus a scheduler microbenchmark.

Changes:

New lock-free primitives (Queue/TaskQueue/MPSCQueue/Semaphore/Priority) and a Pool that owns five priority-bucketed queues with off-mutex idle bookkeeping.
Group gains a lock-free fast path (signed tokens + per-priority wait_buckets of WaitEntry) and a RunningLock, with the existing mutex-based path retained for multi-group submissions.
TSAN race fixes in IOController_Posix.ipp, Scheduler::submit (atomic scheduler_data raw pointer), and WatchdogDataStore (mutex-protected reads/writes, get returns by value, new service entry point).

Reviewed changes

Copilot reviewed 21 out of 22 changed files in this pull request and generated 4 comments.

Show a summary per file

| File | Description |
| --- | --- |
| .gitignore | Ignore `build-*/` out-of-tree build directories. |
| src/extension/IOController_Posix.ipp | Hold `notifier.mutex` across `watches[].events` mutation to close TSAN race with the poll thread. |
| src/dsl/word/Watchdog.hpp | Add per-store mutex; `get` returns by value; new `service` static. |
| src/dsl/word/emit/Watchdog.hpp | Delegate `WatchdogServicer::service` to the mutex-protected `WatchdogDataStore::service`. |
| src/threading/Reaction.hpp | Change `scheduler_data` to `std::atomic<void*>` to avoid shared_ptr atomic-load contention. |
| src/threading/scheduler/Scheduler.{hpp,cpp} | Switch cached pool to raw pointer; reorder members so groups destroy before pools; fast-path single-group submit. |
| src/threading/scheduler/Pool.{hpp,cpp} | Replace single mutex-protected queue with per-priority buckets (MPMC or MPSC), atomic `accept`/`pending_tasks`/`external_waiters`, external waiter tracking. |
| src/threading/scheduler/Group.{hpp,cpp} | Add `try_submit`/`try_acquire_running_lock` lock-free fast path with signed `tokens`, `slow_pending`, `wait_buckets`, and `RunningLock`. |
| src/threading/scheduler/queue/{Queue,TaskQueue,MPSCQueue,Semaphore,Priority}.hpp | New lock-free queue interface and concrete MPMC/MPSC implementations, counting semaphore, priority bucket helper. |
| tests/tests/threading/{TaskQueue,MPSCQueue,Semaphore,Group}.cpp | BDD-style unit tests for the new primitives; Group test gains a forward-declaration include. |
| tests/tests/Benchmark.cpp | New scheduler microbenchmark over no-sync / single-sync / two-sync configurations. |
| tests/tests/util/serialise/xxhash.cpp | Add missing `<cstring>` include. |

+                bool link_next_block(Block* block) {
+                    Block* expected = nullptr;
+                    if (block->next.compare_exchange_strong(expected,
+                                                            allocate_block(),
+                                                            std::memory_order_acq_rel)) {
+                        return true;
+                    }
+                    return expected != nullptr;
+                }
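
One thing worth checking in the snippet above: `allocate_block()` is evaluated before the compare-exchange, so a thread that loses the race is left holding a block that was never linked. Whether the PR's allocator or graveyard scheme reclaims it is not visible in this excerpt. As a minimal sketch of one way to handle the losing side, assuming plain `new`/`delete` and a simplified `Block` with only a `next` pointer (this variant also returns the linked block instead of a `bool`):

```cpp
#include <atomic>
#include <cassert>

// Simplified block for illustration: only the link field is modelled.
struct Block {
    std::atomic<Block*> next{nullptr};
};

// Ensure `block` has a successor and return it. Exactly one caller's
// allocation is linked; every other caller frees its own unused block.
inline Block* link_next_block(Block* block) {
    assert(block != nullptr);
    Block* expected = nullptr;
    Block* fresh    = new Block();
    if (block->next.compare_exchange_strong(expected,
                                            fresh,
                                            std::memory_order_acq_rel,
                                            std::memory_order_acquire)) {
        return fresh;  // we won the race and linked our block
    }
    // Lost the race: `expected` now holds the winner's block, and `fresh`
    // was never published, so deleting it here is safe.
    delete fresh;
    return expected;
}
```

Because the losing block was never visible to other threads, it can be freed immediately without any deferred-reclamation step.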


+            /**
+             * Try to submit a task through the lock-free fast path.
+             *
+             * If a group token is available the task is submitted to the pool immediately.
+             * Otherwise the task is queued until a token is released.
+             *
+             * @param task       the reaction task to submit
+             * @param pool       the pool to submit to when runnable
+             * @param clear_idle if true, clear idle state on submission
+             *
+             * @return true if the task was submitted immediately
+             */
+            /**
+             * Try to acquire a token for inline execution without submitting to a pool.
+             *
+             * @return an RAII lock if a token was acquired, otherwise nullptr
+             */
+            std::unique_ptr<Lock> try_acquire_running_lock();


+            const std::size_t bucket = queue::priority_index(task->priority);
+            pool->register_external_waiter();
+            wait_buckets[bucket].enqueue(WaitEntry{std::move(task), pool, clear_idle});
+
+            // Reserve a slot in the signed counter; if a token was still available, run a waiter now.
+            const int prev = tokens.fetch_sub(1, std::memory_order_acq_rel);
+            if (prev > 0) {
+                if (slow_pending.load(std::memory_order_acquire) > 0) {
+                    // Hand the token back so the slow path can pick it up.
+                    release_token();
+                }
+                else {
+                    drain_one_to_pool();
+                }
+            }
+
+            return false;
+        }
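
The signed counter used above follows a familiar semaphore pattern: a positive value counts free tokens and a negative value counts parked waiters. The sketch below shows only that arithmetic. `TokenCounter` and its method names are hypothetical, and it leaves out the waiter queue, the `slow_pending` hand-back, and the pool interaction that the real `Group` code has.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical signed token counter:
//   value > 0  -> that many tokens are free
//   value < 0  -> that many callers are waiting for a token
class TokenCounter {
public:
    explicit TokenCounter(int concurrency) : tokens(concurrency) {
        assert(concurrency > 0);
    }

    // Reserve a slot. Returns true if a token was free and is now owned by
    // the caller; false means the caller has been counted as a waiter.
    bool acquire_or_wait() {
        return tokens.fetch_sub(1, std::memory_order_acq_rel) > 0;
    }

    // Give a token back. Returns true if a waiter was recorded, in which
    // case the releaser should hand the token to one queued waiter.
    bool release() {
        return tokens.fetch_add(1, std::memory_order_acq_rel) < 0;
    }

    int value() const {
        return tokens.load(std::memory_order_acquire);
    }

private:
    std::atomic<int> tokens;
};
```

With a concurrency of one, the first `acquire_or_wait()` succeeds, the second leaves the counter at -1, and the matching `release()` reports that a waiter should be run next.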


TrentHouliston and others added 3 commits May 27, 2026 13:56

Formatting

c03ffd2

TrentHouliston requested a review from Copilot May 27, 2026 05:35

Copilot started reviewing on behalf of TrentHouliston May 27, 2026 05:35

Copilot AI reviewed May 27, 2026

TrentHouliston wants to merge 3 commits into main from houliston/scheduler