Surfaced during the PR #20 review (timeout/reconcile analysis).
Scenario
- Job A commissions, times out → commissioning_timeout (deliberately no remove_node, per commission_jobs.py:383-398).
- User retries with the same/new setup code → job B starts.
- The original node finally joins → node_added fires → A's reconcile claims it (reconcile_node_added, commission_jobs.py:226) and creates Indigo devices.
- If job B meanwhile obtained a node_id from its own commission_with_code and then fails a later step (descriptor read, device create), B's _fail (commission_jobs.py:409-415) calls remove_node(node_id), potentially tearing a live, just-claimed node (with Indigo devices) off the fabric.
Whether B's node_id can equal A's node depends on matter-server's behaviour when re-commissioning an already-joining device, but nothing in the job table prevents the destructive overlap.
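The interleaving can be replayed against a toy model to make the ordering concrete. This is only a sketch: `Job`, `ToyFabric`, `run_interleaving`, the state strings other than commissioning_timeout, and node id 7 are hypothetical stand-ins, not the real job table or matter-server client. It also assumes the unconfirmed case where B's node_id equals the node that A's reconcile claims.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Job:
    """Hypothetical stand-in for one row of the commission job table."""
    name: str
    state: str = "running"
    node_id: Optional[int] = None


@dataclass
class ToyFabric:
    """Hypothetical stand-in for matter-server plus the Indigo device list."""
    nodes: set = field(default_factory=set)
    indigo_devices: dict = field(default_factory=dict)

    def remove_node(self, node_id: int) -> None:
        # Takes the node off the fabric; Indigo devices are left untouched.
        self.nodes.discard(node_id)


def run_interleaving():
    fabric = ToyFabric()
    job_a, job_b = Job("A"), Job("B")

    # 1. Job A times out; by design it does not call remove_node.
    job_a.state = "commissioning_timeout"

    # 2. The user retries; job B's commission_with_code yields a node_id.
    #    ASSUMPTION: it is the same node (7) that A's device ends up as.
    job_b.node_id = 7
    fabric.nodes.add(7)

    # 3. node_added fires; A's reconcile claims the node and creates devices.
    job_a.node_id = 7
    job_a.state = "reconciled"
    fabric.indigo_devices[7] = "Indigo device for node 7"

    # 4. B fails a later step; an unguarded _fail removes its node_id.
    job_b.state = "failed"
    fabric.remove_node(job_b.node_id)

    return fabric, job_a, job_b
```

After step 4 the model holds an Indigo device for a node that is no longer on the fabric, while job A still reports the node as reconciled.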
Suggested work
- At minimum a characterization test pinning current behaviour for the interleaving.
- Possible fix: _fail should skip remove_node if any other job (terminal-success or reconciled) owns that node_id, or if Indigo devices already exist for it (device_sync lookup).
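One possible shape for the guard, written as a pure predicate so the same function can back both the fix and a characterization test. Everything here is an assumption for illustration: `JobRecord`, `OWNING_STATES`, the state strings, and the `has_indigo_devices` callback (standing in for the device_sync lookup) are hypothetical names, not the existing code in commission_jobs.py.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# Hypothetical names for the "terminal-success or reconciled" states.
OWNING_STATES = {"success", "reconciled"}


@dataclass
class JobRecord:
    """Hypothetical minimal view of a job table row."""
    job_id: str
    state: str
    node_id: Optional[int] = None


def should_remove_node(
    failing_job: JobRecord,
    all_jobs: Iterable[JobRecord],
    has_indigo_devices: Callable[[int], bool],
) -> bool:
    """Return True only if it looks safe for _fail to call remove_node."""
    node_id = failing_job.node_id
    if node_id is None:
        return False  # nothing was commissioned, so nothing to remove

    # Skip removal if any other job already owns this node.
    for other in all_jobs:
        if other.job_id == failing_job.job_id:
            continue
        if other.node_id == node_id and other.state in OWNING_STATES:
            return False

    # Skip removal if Indigo devices already exist for the node.
    if has_indigo_devices(node_id):
        return False

    return True
```

A caller in _fail might then run `if should_remove_node(job, jobs, lookup): remove_node(job.node_id)`. Keeping the decision free of I/O means the interleaving can be pinned in a test without a live matter-server.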
Origin: analysis in #20
🤖 Generated with Claude Code