Surfaced during the PR #20 review (timeout/reconcile analysis).
Problem
The commissioning_timeout path deliberately skips remove_node (commission_jobs.py:386-398) so that a background join that is about to succeed isn't torn down. That is the right trade-off for the common case (#16: node observed joining ~64s after the RPC gave up).
But if the node never arrives, nothing cleans up matter-server's side: whatever partial commissioning/fabric state it accumulated for that attempt stays in place indefinitely. Repeated failed attempts could accumulate half-open state in matter-server with no plugin-side bookkeeping or cleanup.
Recommendation
When RECONCILE_WINDOW (commission_jobs.py:63, 5 min) expires for a commissioning_timeout job that no node_added has claimed, schedule a deferred best-effort cleanup: for example, ask matter-server for its node list and call remove_node / cancel-commissioning for anything matching the failed attempt, or at least log that the join definitively never happened. This pairs naturally with the window-expiry logging added in PR #20 (reconcile_node_added now warns when a node arrives outside the window), and with #23 (a late failure response would tell us that cleanup is safe immediately).
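A minimal sketch of what such a deferred sweep could look like, to make the proposal concrete. Everything here apart from the RECONCILE_WINDOW name and its 5-minute value is hypothetical: TimedOutJob, sweep_expired, and the list_node_ids / remove_node callables are illustrative stand-ins, not the plugin's or matter-server's actual API. "Matching the failed attempt" is approximated, as an assumption, by "node ids that were not present in a snapshot taken before the attempt"; a real implementation would also need to exclude nodes claimed by other jobs.

```python
from dataclasses import dataclass

RECONCILE_WINDOW = 300.0  # seconds; the issue cites 5 min


@dataclass
class TimedOutJob:
    """Hypothetical record kept for a job that ended in commissioning_timeout."""
    job_id: str
    timed_out_at: float        # timestamp at which the RPC gave up
    nodes_before: frozenset    # node ids known before the attempt (assumed snapshot)
    claimed: bool = False      # set once a node_added event is matched to this job
    swept: bool = False        # set once the deferred cleanup has run


def sweep_expired(jobs, list_node_ids, remove_node, now, log=print):
    """Best-effort cleanup for unclaimed jobs whose reconcile window has expired.

    list_node_ids and remove_node are injected callables standing in for
    whatever the server client really exposes. Returns (job_id, node_id)
    pairs for every node that was removed.
    """
    cleaned = []
    for job in jobs:
        if job.claimed or job.swept:
            continue
        if now - job.timed_out_at < RECONCILE_WINDOW:
            continue  # a late join may still arrive; leave it alone
        job.swept = True  # never retry the sweep for the same job
        log(f"job {job.job_id}: no node_added claimed it within the window")
        try:
            strays = sorted(set(list_node_ids()) - job.nodes_before)
        except Exception as exc:  # best effort: log and move on
            log(f"job {job.job_id}: could not list nodes: {exc}")
            continue
        for node_id in strays:
            try:
                remove_node(node_id)
                cleaned.append((job.job_id, node_id))
            except Exception as exc:
                log(f"job {job.job_id}: remove_node({node_id}) failed: {exc}")
    return cleaned
```

The sweep only logs and marks the job when it finds nothing to remove, which covers the "at least log that the join never happened" fallback. Passing the clock and the server calls in as arguments keeps the window logic testable without a live matter-server.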
Origin: analysis in #20
🤖 Generated with Claude Code