Skip to content

trialParams(waitForEnd=false): stale end-of-trial response can be matched as next ACK #25

@mbreiser

Description

@mbreiser

Summary

When PanelsController.trialParams(..., waitForEnd=false) is used, the end-of-trial completion message sent by the firmware is never consumed by the host. It sits in the receive buffer until a subsequent expectResponse call finds and matches it — at which point it can be returned as the ACK for an unrelated command and silently mask a real failure.

Where in code

Repro (consecutive trialParams)

pc = PanelsController('<arena-ip>'); pc.open(false);
pc.trialParams(2, 1, 10, 1, 0, 50, false);    % first trial; ACK consumed, completion stays in buffer for 5 s
pause(6);                                       % completion message arrives, sits in iBuf
pc.trialParams(2, 2, 10, 1, 0, 50, false);    % second trial; expectResponse([0 1], 0x08, [], 0.1) matches the OLD completion, returns true instantly even if the new ACK never arrived
pc.close();

The second call returns true from a stale message rather than from the actual ACK to the second command. With many consecutive trials, this cascades — host and firmware are off by one trial in their bookkeeping.

Why this surfaced now

Independently, floesche/LED-Display_G4.1_ArenaController_Slim#1 is fixing the firmware's end-of-trial message to be \"Sequence completed in N ms\" (it was previously empty \"\"). Both versions trigger this latent host bug — old empty payload was 3 bytes, new payload is 32 bytes; matching behavior is identical because both have cmd_echo=0x08. So this issue exists today regardless of which firmware version is deployed; the upcoming firmware change just makes the stale data more visible if it leaks somewhere.

Suggested fix options

  1. Drain in trialParams after non-blocking start. When waitForEnd=false, schedule a background read of the eventual completion message (expectResponse(0, 0x08, \"Sequence completed\", deciSeconds*0.1 + 1)), and discard it. Cleanest semantics — caller gets immediate return AND host stays sync'd.
  2. Drain in CommandExecutor. After each trialParams call in the YAML runner, pause(dur) then call a new pc.flushExpired() or similar that reads any orphaned cmd_echo=0x08 responses.
  3. Cmd-correlation tag. Send a per-command sequence number with each request and have the firmware echo it; host's expectResponse filters by sequence. Bigger protocol change — overkill if (1) suffices.

Recommend (1) — minimal blast radius, no protocol change, contained to PanelsController.

Severity

Latent today (the empty stale message rarely caused visible problems because most experiments don't loop trialParams without other commands in between, and setControlMode/setPatternID lookups skip the stale 0x08). Could become more visible after floesche#1 lands and the stale payload gets longer / more matchable.

🤖 Filed via Claude Code based on session analysis 2026-04-28.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions