perf: 200-robot simulation optimization — parallel plugin loop + -O3/LTO #41
Open
sabarish-prasannna wants to merge 3 commits into
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Group model plugins by owning Model* and dispatch each robot's plugin group as a std::async task. Robots run concurrently; plugins within the same robot run sequentially. World plugins stay sequential. Expected 4-8x speedup on multi-core machines for 200-robot simulations. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
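The grouping and dispatch described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in `plugin_manager.cpp`: `Model`, `ModelPlugin`, `PluginGroups`, `GroupByModel`, and `RunBeforePhysicsStep` are hypothetical stand-ins, and the sketch assumes each plugin touches only its own robot's state.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the flatland types; only the shape matters here.
struct Model {
  int steps_seen = 0;
};

struct ModelPlugin {
  Model* model;  // the robot that owns this plugin
  explicit ModelPlugin(Model* m) : model(m) {}
  // Assumption: a plugin only mutates state belonging to its own model.
  void BeforePhysicsStep() { ++model->steps_seen; }
};

using PluginGroups = std::unordered_map<Model*, std::vector<ModelPlugin*>>;

// Group plugins by owning model, preserving each robot's plugin order.
PluginGroups GroupByModel(
    const std::vector<std::unique_ptr<ModelPlugin>>& plugins) {
  PluginGroups groups;
  for (const auto& p : plugins) {
    assert(p->model != nullptr);
    groups[p->model].push_back(p.get());
  }
  return groups;
}

// One async task per robot; plugins of the same robot run sequentially.
void RunBeforePhysicsStep(const PluginGroups& groups) {
  std::vector<std::future<void>> futures;
  futures.reserve(groups.size());
  for (const auto& entry : groups) {
    const std::vector<ModelPlugin*>* group = &entry.second;
    futures.push_back(std::async(std::launch::async, [group] {
      for (ModelPlugin* plugin : *group) plugin->BeforePhysicsStep();
    }));
  }
  // Block until every robot is done; get() also rethrows task exceptions.
  for (auto& f : futures) f.get();
  // World plugins would run here, sequentially, after the barrier above.
}
```

Calling `GroupByModel` once and then `RunBeforePhysicsStep` every step keeps per-robot ordering intact while letting different robots run concurrently. Any state shared between robots would still need its own synchronization.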
std::async(launch::async) on Linux/libstdc++ creates a new OS thread on every call — there is no implicit pooling. With 90 robots at 50 effective steps/sec this was 9000 thread create+destroy per second, explaining the 72.9% CPU on the main flatland_server thread despite the work being distributed. Fix: add ModelPluginThreadPool (created once in PluginManager constructor, sized to hardware_concurrency) and reuse its workers every step. Also pre-compute plugin_groups_ on Load/Delete instead of rebuilding an unordered_map on every BeforePhysicsStep/AfterPhysicsStep call. Expected result: main-thread CPU drops from ~70% to ~5-10% (synchronization and world-plugin cost only); total CPU scales with actual plugin work. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
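A fixed-size pool of the kind this commit describes might look like the sketch below. It is not the PR's `ModelPluginThreadPool`: the class name `SimpleThreadPool` and its `Submit` method are hypothetical, and the design (one shared queue guarded by a mutex and condition variable) is just one common way to reuse workers across steps.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: threads are created once and reused.
class SimpleThreadPool {
 public:
  explicit SimpleThreadPool(std::size_t n) {
    if (n == 0) n = 1;  // hardware_concurrency() is allowed to return 0
    for (std::size_t i = 0; i < n; ++i)
      workers_.emplace_back([this] { WorkerLoop(); });
  }

  ~SimpleThreadPool() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cv_.notify_all();
    for (auto& w : workers_) w.join();
  }

  SimpleThreadPool(const SimpleThreadPool&) = delete;
  SimpleThreadPool& operator=(const SimpleThreadPool&) = delete;

  // Queue a job; the returned future becomes ready when the job finishes.
  std::future<void> Submit(std::function<void()> job) {
    std::packaged_task<void()> task(std::move(job));
    std::future<void> done = task.get_future();
    {
      std::lock_guard<std::mutex> lock(mutex_);
      tasks_.push(std::move(task));
    }
    cv_.notify_one();
    return done;
  }

  std::size_t size() const { return workers_.size(); }

 private:
  void WorkerLoop() {
    for (;;) {
      std::packaged_task<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // stopping and queue drained
        task = std::move(tasks_.front());
        tasks_.pop();
      }
      task();  // run outside the lock so other workers can dequeue
    }
  }

  std::mutex mutex_;
  std::condition_variable cv_;
  std::queue<std::packaged_task<void()>> tasks_;
  bool stopping_ = false;
  std::vector<std::thread> workers_;
};
```

A caller would construct the pool once, for example with `std::thread::hardware_concurrency()` workers, then call `Submit` once per robot each step and wait on the returned futures before running world plugins. No thread is created or destroyed inside the step loop, which is the cost this commit attributes to `std::async`.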
Summary

Builds on `feature/90robots-physics-skip`. Adds two complementary optimizations to bring 200-robot simulation CPU from ~90% down to <20%:

- `PluginManager::BeforePhysicsStep` and `AfterPhysicsStep` now dispatch each robot's plugin group (SootballNavigatorPlugin + SootballPlugin) as an independent `std::async` task, utilizing all available CPU cores. World plugins remain sequential after all model plugins complete.
- `flatland_lib` and `flatland_server` now compile with `-O3 -march=native` and link-time optimization (skipped when `COVERAGE=ON`).

Details
Parallel plugin dispatch (`plugin_manager.cpp`)

- Groups `model_plugins_` by `Model*` pointer: each robot owns exactly its N plugins.
- One `std::future<void>` per robot; waits on all before proceeding to world plugins.
- `ros::Publisher::publish()` is thread-safe in roscpp.

Build flags (`CMakeLists.txt`)

Expected Impact
Test plan

- `catkin build flatland_server`
- `feature/90robots-physics-skip` baseline
- `/tmp/flatland_profile_output.log`: "Before Physics Step: model_plugins (parallel)" time should drop proportionally

🤖 Generated with Claude Code