perf: 200-robot simulation optimization — parallel plugin loop + -O3/LTO #41

Open
sabarish-prasannna wants to merge 3 commits into feature/90robots-physics-skip from kaoiwt001_simulation_200robots_optimization

Summary

Builds on feature/90robots-physics-skip. Adds two complementary optimizations intended to reduce the CPU usage of a 200-robot simulation from ~90% to below 20%:

  • Parallel plugin loop: PluginManager::BeforePhysicsStep and AfterPhysicsStep now dispatch each robot's plugin group (SootballNavigatorPlugin + SootballPlugin) as an independent task, so robots are processed concurrently across the available CPU cores. The initial std::async version was replaced by a reusable thread pool (ModelPluginThreadPool) in a later commit. World plugins still run sequentially after all model plugins have completed.
  • -O3 + LTO: flatland_lib and flatland_server now compile with -O3 -march=native and link-time optimization (skipped when COVERAGE=ON).

Details

Parallel plugin dispatch (plugin_manager.cpp)

  • Groups model_plugins_ by their owning Model* pointer, so each robot's group contains exactly that robot's plugins.
  • Launches one task (std::future<void>) per robot and waits on all of them before proceeding to the world plugins.
  • Thread safety: reading Box2D bodies outside the physics step is safe, each robot's plugins write only to that robot's own bodies, and ros::Publisher::publish() is thread-safe in roscpp.
  • PROFILER macros are kept outside the lambdas to avoid concurrent access to the profiler's map.
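A minimal sketch of the dispatch described above, as it stood before the thread-pool commit (one std::async task per robot). Model, ModelPlugin, WorldPlugin, and BeforePhysicsStepParallel are simplified hypothetical stand-ins, not flatland's real classes (which take a timekeeper and live inside PluginManager); AfterPhysicsStep would follow the same shape.

```cpp
#include <future>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for flatland's types, reduced to what the dispatch needs.
struct Model {};

struct ModelPlugin {
  explicit ModelPlugin(Model* m) : model(m) {}
  Model* model;       // owning robot
  int steps_run = 0;  // demo counter, only touched by this robot's task
  void BeforePhysicsStep() { ++steps_run; }
};

struct WorldPlugin {
  int steps_run = 0;
  void BeforePhysicsStep() { ++steps_run; }
};

// One task per robot; plugins of the same robot run in order inside that
// task; world plugins run only after every robot's task has finished.
void BeforePhysicsStepParallel(std::vector<ModelPlugin*>& model_plugins,
                               std::vector<WorldPlugin*>& world_plugins) {
  // Group plugins by owning Model* (rebuilt on every call in this version).
  std::unordered_map<Model*, std::vector<ModelPlugin*>> groups;
  for (ModelPlugin* p : model_plugins) groups[p->model].push_back(p);

  std::vector<std::future<void>> futures;
  futures.reserve(groups.size());
  for (auto& entry : groups) {
    std::vector<ModelPlugin*>* group = &entry.second;
    futures.push_back(std::async(std::launch::async, [group] {
      for (ModelPlugin* p : *group) p->BeforePhysicsStep();
    }));
  }
  // Wait for every robot; get() also rethrows an exception from a task.
  for (auto& f : futures) f.get();

  // World plugins stay sequential, after all model plugins.
  for (WorldPlugin* w : world_plugins) w->BeforePhysicsStep();
}
```

Grouping by owner is what keeps each robot's plugins in their original order while letting different robots overlap.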

Build flags (CMakeLists.txt)

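```cmake
# Aggressive optimization is skipped for coverage builds (COVERAGE=ON).
# -march=native tunes the binary for the build machine's CPU.
if(NOT "${COVERAGE}" STREQUAL "ON")
    target_compile_options(flatland_lib PRIVATE -O3 -march=native)
    target_compile_options(flatland_server PRIVATE -O3 -march=native)
    if(CMAKE_VERSION VERSION_GREATER_EQUAL 3.9)
        # GCC/Clang only honor this property when policy CMP0069 is NEW
        # (cmake_minimum_required >= 3.9, or cmake_policy(SET CMP0069 NEW)
        # before the targets are defined); otherwise CMake ignores it.
        set_property(TARGET flatland_lib PROPERTY INTERPROCEDURAL_OPTIMIZATION TRUE)
        set_property(TARGET flatland_server PROPERTY INTERPROCEDURAL_OPTIMIZATION TRUE)
    endif()
endif()
```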

Expected Impact

| Change | Expected gain |
|---|---|
| Parallel plugin loop | 4–8× on an 8–16 core machine |
| -O3 + LTO | 1.2–1.4× raw throughput |
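As a rough consistency check between these estimates and the <20% target, assume (neither point is measured in this PR) that the two gains compound multiplicatively and that the ~90% CPU figure shrinks in proportion to the combined speedup:

$$
4 \times 1.2 = 4.8, \qquad 8 \times 1.4 = 11.2
$$

$$
\frac{90\%}{4.8} \approx 18.8\%, \qquad \frac{90\%}{11.2} \approx 8.0\%
$$

Under those assumptions, the low end of the range only just clears the target, so the 200-robot test below is the deciding measurement.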

Test plan

  • Build: catkin build flatland_server
  • Run the 90-robot fast sim and confirm CPU usage drops relative to the feature/90robots-physics-skip baseline
  • Run the 200-robot fast sim and confirm CPU usage stays below 20%
  • Confirm robots still navigate correctly (no stuck robots, task completion rate unchanged)
  • Check /tmp/flatland_profile_output.log: the "Before Physics Step: model_plugins (parallel)" time should drop proportionally

🤖 Generated with Claude Code

deeparaj24 and others added 3 commits June 25, 2026 11:57
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Group model plugins by owning Model* and dispatch each robot's plugin
group as a std::async task. Robots run concurrently; plugins within the
same robot run sequentially. World plugins stay sequential. Expected
4-8x speedup on multi-core machines for 200-robot simulations.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
std::async(std::launch::async) on Linux/libstdc++ creates a new OS thread on
every call; there is no implicit pooling. With 90 robots, two parallel
dispatches per step (BeforePhysicsStep and AfterPhysicsStep), and 50 effective
steps/sec, this was 90 × 2 × 50 = 9000 thread creations and destructions per
second, explaining the 72.9% CPU on the main flatland_server thread despite
the work being distributed.

Fix: add ModelPluginThreadPool (created once in PluginManager constructor,
sized to hardware_concurrency) and reuse its workers every step. Also
pre-compute plugin_groups_ on Load/Delete instead of rebuilding an
unordered_map on every BeforePhysicsStep/AfterPhysicsStep call.

Expected result: main-thread CPU drops from ~70% to ~5-10% (synchronization
and world-plugin cost only); total CPU scales with actual plugin work.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
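The fix in the commit above can be sketched as a fixed-size worker pool that is created once and handed one batch of per-robot tasks per step. This is a minimal sketch, not the PR's ModelPluginThreadPool: the class name FixedThreadPool and its RunAll method are hypothetical, and the real pool's interface is not shown in this PR description.

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical fixed-size pool: workers are created once and reused every
// step, instead of one new OS thread per task as with std::async.
class FixedThreadPool {
 public:
  explicit FixedThreadPool(
      unsigned n = std::max(1u, std::thread::hardware_concurrency())) {
    for (unsigned i = 0; i < n; ++i)
      workers_.emplace_back([this] { WorkerLoop(); });
  }

  ~FixedThreadPool() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    work_cv_.notify_all();
    for (std::thread& t : workers_) t.join();
  }

  // Queues every task and blocks until all of them have finished.
  // Intended to be called from a single thread (the simulation loop).
  void RunAll(const std::vector<std::function<void()>>& tasks) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      for (const auto& task : tasks) queue_.push(task);
      pending_ += tasks.size();
    }
    work_cv_.notify_all();
    std::unique_lock<std::mutex> lock(mutex_);
    done_cv_.wait(lock, [this] { return pending_ == 0; });
  }

 private:
  void WorkerLoop() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mutex_);
        work_cv_.wait(lock, [this] { return stopping_ || !queue_.empty(); });
        if (stopping_ && queue_.empty()) return;
        task = std::move(queue_.front());
        queue_.pop();
      }
      // Run outside the lock so robots execute concurrently. Tasks must not
      // throw: an exception escaping here would call std::terminate.
      task();
      {
        std::lock_guard<std::mutex> lock(mutex_);
        if (--pending_ == 0) done_cv_.notify_all();
      }
    }
  }

  std::vector<std::thread> workers_;
  std::queue<std::function<void()>> queue_;
  std::mutex mutex_;
  std::condition_variable work_cv_, done_cv_;
  std::size_t pending_ = 0;
  bool stopping_ = false;
};
```

In PluginManager terms, each per-step task would iterate one entry of the cached plugin_groups_, which the commit rebuilds only on Load/Delete. A single mutex-protected queue is the simplest design; with one task per robot its contention is likely small next to the plugin work, though that is an untested assumption.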