
Half-open matter-server commissioning state can rot when a timed-out node never joins #24

Description

@simons-plugins

Surfaced during the PR #20 review (timeout/reconcile analysis).

Problem

The commissioning_timeout path deliberately skips remove_node (commission_jobs.py:386-398) so an about-to-succeed background join isn't torn down — that's the right trade for the common case (#16: node observed joining ~64s after the RPC gave up).
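To make the trade-off concrete, the failure-path decision can be sketched roughly as below. This is not the code at commission_jobs.py:386-398. The function name, the `remove_node` callback shape, and treating commissioning_timeout as a plain string error code are all assumptions made for illustration.

```python
import asyncio
from typing import Awaitable, Callable

# Assumption: the timeout surfaces as a string error code with this name.
COMMISSIONING_TIMEOUT = "commissioning_timeout"


async def handle_commission_failure(
    error_code: str,
    node_id: int,
    remove_node: Callable[[int], Awaitable[None]],  # hypothetical teardown hook
) -> bool:
    """Return True if teardown was attempted, False if it was skipped."""
    if error_code == COMMISSIONING_TIMEOUT:
        # The RPC gave up, but the join may still complete in the background
        # (as in #16), so matter-server state is deliberately left alone.
        return False
    # Any other failure is treated as final, so teardown runs right away.
    await remove_node(node_id)
    return True
```

The skipped branch is exactly where half-open state can be left behind if the node never shows up.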

But if the node never arrives, nothing ever cleans up matter-server's side: whatever partial commissioning/fabric state it accumulated for that attempt is left to rot. Repeated failed attempts could pile up half-open state in matter-server with no plugin-side bookkeeping or cleanup.

Recommendation

When RECONCILE_WINDOW (commission_jobs.py:63, 5 min) expires for a commissioning_timeout job that no node_added event has claimed, schedule a deferred best-effort cleanup. For example, ask matter-server for its node list and call remove_node / cancel-commissioning for anything matching the failed attempt, or at minimum log that the join definitively never happened. This pairs naturally with the window-expiry logging added in PR #20 (reconcile_node_added now warns when a node arrives outside the window) and with #23 (a late failure response would tell us cleanup is safe immediately).
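A minimal sketch of what that deferred cleanup might look like, under stated assumptions: `TimedOutJob`, `NodeServer`, `cleanup_expired_job`, and the `owned` parameter are hypothetical names, not existing plugin or matter-server APIs. "Matching the failed attempt" is approximated here as "node ids present now that were not present before the attempt and are not claimed by another job", which is only one possible heuristic.

```python
import asyncio
import logging
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Protocol, Set

log = logging.getLogger(__name__)

RECONCILE_WINDOW = 300.0  # seconds; the 5 min value cited at commission_jobs.py:63


class NodeServer(Protocol):
    """Hypothetical minimal view of a matter-server client."""

    async def node_ids(self) -> Set[int]: ...
    async def remove_node(self, node_id: int) -> None: ...


@dataclass
class TimedOutJob:
    """Hypothetical bookkeeping for one commissioning_timeout attempt."""

    job_id: str
    timed_out_at: float              # timestamp of the RPC timeout, in seconds
    nodes_before: Set[int]           # node ids known before the attempt started
    claimed_by: Optional[int] = None  # set once a node_added claims the job


def window_expired(job: TimedOutJob, now: float) -> bool:
    """True when the job is still unclaimed and the reconcile window has passed."""
    return job.claimed_by is None and now - job.timed_out_at >= RECONCILE_WINDOW


async def cleanup_expired_job(
    job: TimedOutJob,
    server: NodeServer,
    now: float,
    owned: FrozenSet[int] = frozenset(),  # node ids claimed by other jobs
) -> List[int]:
    """Best-effort cleanup; returns the node ids that were actually removed."""
    if not window_expired(job, now):
        return []
    try:
        current = await server.node_ids()
    except Exception:
        log.warning("job %s: could not list nodes, skipping cleanup", job.job_id)
        return []
    strays = sorted(current - job.nodes_before - owned)
    if not strays:
        log.info("job %s: join never happened, nothing to remove", job.job_id)
    removed: List[int] = []
    for node_id in strays:
        try:
            await server.remove_node(node_id)
            removed.append(node_id)
        except Exception:
            # Best effort: one failed removal should not block the others.
            log.warning("job %s: remove_node(%s) failed", job.job_id, node_id)
    return removed
```

Even when nothing is removed, the log line records that the join definitively never happened, which covers the fallback option above. A real implementation would also need to decide how to handle attempts that overlap in time, which this sketch only approximates through `owned`.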

Origin: analysis in #20

🤖 Generated with Claude Code

Metadata

Labels: enhancement (New feature or request)
