Skip to content

Result should only show when we FAIL#10

Open
zeerekahmad wants to merge 7 commits into
mainfrom
zeerek/bugfix-axiomatic-flips-error-logic
Open

Result should only show when we FAIL#10
zeerekahmad wants to merge 7 commits into
mainfrom
zeerek/bugfix-axiomatic-flips-error-logic

Conversation

@zeerekahmad
Copy link
Copy Markdown
Contributor

@zeerekahmad zeerekahmad commented May 13, 2026

Merge from zeerek/bugfix-axiomatic-flips-error-logic to main

Description

ECU flashing through the Axiomatic CAN-to-Ethernet adapter has been failing intermittently, while the equivalent flash via Kvaser SocketCAN is reliable. A code audit traced the asymmetry to the TCP transport layer: the adapter was treating the byte stream as one-CAN-frame-per-read and only understood the deprecated "CAN Stream" message type, so under bursty flash traffic frames were being lost three different ways (coalesced reads, heartbeat-induced desync, packed multi-frame messages). This MR closes all eight audit findings across three phases — a quick correctness pass, a protocol-correct streaming parser rewrite, and a concurrency/throughput refactor onto a persistent io_context with a strand. Public API is unchanged for existing consumers (the new verbose argument on AxiomaticAdapter is defaulted). See CLAUDE_PLAN_AND_RESOLUTIONS.md for the full design doc and finding-by-finding status.

Highlighted Changes

axiomatic_adapter/include/axiomatic_adapter/axiomatic_frame_parser.hpp

  • New stateful parser class for the Axiomatic byte stream — owns a persistent buffer plus a pending-frames queue.
  • 6-byte SYNC_PREFIX (was a wrong 7-byte "header" that included Message ID), plus explicit MSG_ID_* constants for the four known message types.
  • Diagnostic counters (heartbeats_seen, notification_frames_skipped, dropped_bytes, etc.) and a set_verbose() switch.

axiomatic_adapter/src/axiomatic_frame_parser.cpp

  • tryParseFrame() walks the protocol envelope using the Message Data Length at bytes 9-10 — the canonical framing field — and dispatches on Message ID; heartbeats / status / FD / unknown are consumed and counted, never raised as errors.
  • decodeCanStreamBody() iterates packed CAN frames and skips Notification frames inside a single CAN Stream message, draining everything into the internal queue so the existing one-frame-per-call API still works.
  • Verbose mode emits a one-line std::cout per non-CAN event.

axiomatic_adapter/src/axiomatic_adapter.cpp

  • send() now writes exactly frame_data_length payload bytes — declared message_length and the actual wire payload finally agree for any DLC.
  • Persistent io_context with executor_work_guard runs on a dedicated worker thread for the adapter's lifetime; per-call restart()/run() is gone.
  • boost::asio::strand serializes every socket op; send() and receive() post async work and block on std::promise/std::future — public sync signatures unchanged.
  • Reception is a self-rescheduling async_receive chain posted onto the strand; startReceptionThread() no longer spawns a thread, joinReceptionThread() posts tcp_socket_.cancel() and waits for the chain to drain.
  • TCP_NODELAY set immediately after a successful connect.
  • Parser access mutex-guarded between the io thread and any sync receive() caller.

axiomatic_adapter/include/axiomatic_adapter/axiomatic_adapter.hpp

  • DEFAULT_SOCKET_RECEIVE_TIMEOUT_MS lowered from 100 ms → 20 ms now that the receive loop is async and the timeout only bounds the sync polling API.
  • New defaulted bool verbose = false constructor argument; no break for existing callers.

axiomatic_adapter/src/axiomatic_socketcan_bridge.cpp

  • Forwards the bridge's -v flag into the AxiomaticAdapter constructor so ros2 run axiomatic_adapter axiomatic_socketcan_bridge ... -v flips parser verbosity too.

axiomatic_adapter/test/axiomatic_frame_parser_test.cpp

  • New file: 19 unit tests covering single/coalesced/split CAN frames, packed multi-frame CAN Stream messages, mixed CAN+Notification bodies, heartbeats alone and coalesced with CAN traffic, status responses, CAN FD frames, unknown message IDs, garbage-prefix resync, header straddle, DLC 0/8 edges, and a 64-message stress.

axiomatic_adapter/test/axiomatic_send_wire_format_test.cpp

  • New file: 4 loopback tests asserting send() produces a wire-correct CAN Stream message for DLC 0 / 4 / 8 and round-trips cleanly through the parser. No hardware required.

axiomatic_adapter/test/axiomatic_concurrent_io_test.cpp

  • New file: localhost peer that simultaneously consumes our outbound traffic and emits inbound CAN Stream messages while one thread hammers send() and the reception chain delivers frames via callback. Validates the strand serialization end-to-end.

axiomatic_adapter/CMakeLists.txt

  • Adds axiomatic_frame_parser.cpp to the shared library, plus three no-hardware test executables.
  • Each test compiles the adapter/parser sources directly so it does not depend on libaxiomatic_adapter.so at runtime — avoids the symbol-lookup race against an older /opt/polymath/axiomatic_adapter install on LD_LIBRARY_PATH.

axiomatic_adapter/test/axiomatic_adapter_test.cpp

  • Removed the stale // NOTE: the first run may fail comment — that symptom was caused by the absent TCP stream framer and is fixed by this MR.

CLAUDE_PLAN_AND_RESOLUTIONS.md

  • New file: full design doc, finding-by-finding resolution table, and verification status.

Test Plan

Automated (added in this MR, no hardware required): 25 Catch2 cases / 273 assertions across test_axiomatic_frame_parser, test_axiomatic_send_wire_format, and test_axiomatic_concurrent_io. Build with colcon build --merge-install --packages-select axiomatic_adapter --cmake-args -DBUILD_TESTING=ON and run with colcon test --merge-install --packages-select axiomatic_adapter. All passing locally.

Manual (still owed before sign-off):

  • Re-run the existing hardware Catch2 suite against the Axiomatic device at 192.168.0.34:4000 and confirm the previously flaky "first read" no longer fails.
  • Capture a raw TCP stream from the converter for ~30 s while idle and feed those bytes through the parser via the unit-test harness — expect zero CAN frames returned, heartbeats_seen() ≈ 30, dropped_bytes() == 0.
  • End-to-end ECU flash through the bridge — should succeed without retries.
  • Latency probe: a single send()receive() round trip should drop from the previous ~40 ms (first-frame Nagle) to sub-millisecond.
  • Optional but recommended: rebuild with -fsanitize=thread and run test_axiomatic_concurrent_io for the formal TSAN sign-off on the Phase 3 strand refactor.

Pre-Merge Tasks

  • Has the Merge Description been filled out accurately?
  • Have you written tests?
  • Have you created issues from any TODOS or comments?
  • Have you posted in #code_reviews

Changes

  • 283ef78 Result should only show when we FAIL
  • 311afd6 Add a parser that handles incomplete data
  • ea9ae85 Fix shared test weirdness. We will clean up later
  • bc49dcc Claude did a number of fixes in order to try to synchronize tcp ip better
  • ad5ca44 Add in last group of changes to use a strand and make us more reliable hopefully

Copy link
Copy Markdown
Contributor

@davidt315 davidt315 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ty

@troygibb
Copy link
Copy Markdown

ty

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants