Skip to content

refactor: add mock-bus test harness, fix potential data race for gripper connection close#11

Draft
HipsterBrown wants to merge 15 commits into
mainfrom
nhehr/pr1-lifecycle-correctness
Draft

refactor: add mock-bus test harness, fix potential data race for gripper connection close#11
HipsterBrown wants to merge 15 commits into
mainfrom
nhehr/pr1-lifecycle-correctness

Conversation

@HipsterBrown

Copy link
Copy Markdown
Contributor
  • Delete runtime.Caller/callerPorts machinery in registry.go:332-360. Replace with explicit port tracking on each component struct (s.controllerPort = config.Port); Close calls globalRegistry.ReleaseController(s.controllerPort) directly.
  • Stop reusing s.initCtx (arm.go:111). Pass per-call ctx through to doServoInitialization, diagnoseConnection, verifyServoConfig. Remove the initCtx field.
  • Fix gripper-closes-first race: in getExistingController, cache one *SafeSoArmController per port in ControllerEntry and return that exact pointer (registry.go:98-104). Add closed atomic.Bool to SafeSoArmController; method calls return a sentinel ErrControllerClosed when set. ReleaseController at refcount zero sets the flag and closes the bus.
  • Tests: introduce mock-bus harness based on feetech/_examples/mock_transport. Cover refcount-0 close, double-acquire returns same pointer, post-close method call returns sentinel error.

HipsterBrown and others added 15 commits April 27, 2026 15:48
8-PR sequence prioritizing arm/gripper correctness fixes (lifecycle,
speed/accel, opMgr) before calibration sensor and discovery work.
Defers the singleton-vs-arm-as-core architecture decision to a
post-remediation revisit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Move commented radians-block deletion fully into PR3 (no overlap with PR5)
- Commit PR6 to delegating multi-baudrate sweep to feetech.Bus.Discover
- Narrow PR8 cmd/cli triage to keeping only debug_cli; delete the other 8

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Includes mock-bus harness scaffolding for reuse across PR2-PR5.
Reviewer feedback applied: drop unused atomic import in test, keep
strings import in registry.go, scope verification to package path
until cmd/cli is fixed in PR8, comment harness design choice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code-review follow-up to 5ee8cf2. Returning (0, nil) caused
feetech.Bus.readRawBytesLocked to busy-spin under the bus mutex
until the deadline; io.EOF triggers its 1ms-sleep retry path
instead. Important for PR5's planned 100Hz calibration reader.
Also documents Flush no-op semantics.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code-review follow-up to 3eebff6. Covers every gated method so a
silently-added 10th method without checkClosed gating is a visible
omission. Pattern reused by Task 3-5 tests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GetController now returns the cached *SafeSoArmController pointer instead
of building a fresh struct copy per caller. ReleaseController and
ForceCloseController both set closed=true on the cached controller before
closing the bus, so all consumers (arm + gripper sharing a port) observe
ErrControllerClosed atomically on next call.

Also bundles two Task 2 follow-ups flagged by the code-quality review:
- Document the checkClosed race window: it is best-effort and callers
  must hold a registry refcount for the duration of any controller call.
- Make Close idempotent: set s.closed.Store(true) at the top so direct
  callers (tests bypassing the registry) get the same close contract as
  registry-driven release.

The trackCaller function definition is left in place for Task 4 to
delete; only the two call sites are removed here.
Code-review follow-up to 1614037. ReleaseController nils entry.config
at refcount zero; a concurrent GetController that blocked on entry.mu
during release would observe a nil config and panic on the error
formatting path. Prefer the caller's config.Port as fallback.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ndles

Components now store their controllerPort and call
globalRegistry.ReleaseController(port) directly, replacing the brittle
runtime.Caller(2)/Caller(3) PC-based release path. Removes callerPorts,
callerMu, trackCaller, releaseFromCaller from registry.go and the
ReleaseSharedController + GetCurrentCalibration shims from manager.go.
Updates TestRegistryCreation to no longer reference the deleted
callerPorts field. Adds TestRegistry_ExplicitPortReleaseDecrementsRefcount
to lock in the new contract.

Also removes the so101.initCtx field; the three call sites that read it
(doServoInitialization, diagnoseConnection, verifyServoConfig) now use
s.cancelCtx as a temporary placeholder. Task 5 will thread per-call ctx
properly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code-review follow-up to 40e7c28. Three error returns between
controller acquisition and arm-struct construction were missing
the explicit ReleaseController. Pre-existing leak that became
visible once Task 4 made every other release path explicit.

Also strengthens placeholder comments at the three doServoInit/
diagnose/verify sites to document the cancellation regression
Task 5 will close, and adds a regression test that releasing a
never-registered port is a safe no-op.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…le-scope assertion

Code-review follow-up to 2803bc7. The cancelCtx field was assigned
but never read once Task 5 threaded per-call ctx through the helpers;
only cancelFunc is consumed by Close.

Replaces TestArmHelpers_AcceptContext (a runtime-skipped
documentation-only test that the linter correctly flagged as
tautological) with file-scope signature assertions that fail to
compile if any helper drops its ctx parameter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Final-review follow-up to bc795be. Once cancelCtx was removed, the
context returned by WithCancel was being discarded into _, leaving
cancelFunc canceling a context nobody observed. PR4's opMgr work
will introduce proper cancellation; until then there's no reason to
keep the dead lifecycle field.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant