Summary
The test suite hard-deadlocks on several suites that spawn real pgrep/process subprocesses, making swift test unable to complete locally. The deadlock signature is a process parked at ~0.2% CPU in state S with no KeyPath code on any thread — the main thread sits inside XCTest's observation block waiting on a continuation that never resolves. Normal suite runtime is ~60s; a hung run sits forever until killed.
This was hit repeatedly while validating PR #697 (Tier-2 cleanup). The deadlock is unrelated to those changes (pure SwiftUI view edits) — it reproduces on its own.
Confirmed deadlocking suites/tests
- ErrorHandlingTests (e.g. testResetToDefaultConfig / concurrent + reset config ops)
- KanataManagerResetTests
- KeyPathTests.testResetToDefaultConfig
There are likely more — these are just the ones reached before each run was killed. They share a pattern: they instantiate RuntimeCoordinator / call real saveConfiguration, resetToDefaultConfig, updateStatus(), cleanup().
Reproduces both in parallel and with --no-parallel, so it is intrinsic to these suites, not solely a parallelism artifact (though parallelism makes it more likely per the KeyPathTestCase docs).
Root cause
VHIDDeviceManager.detectConnectionHealth() (reached via RuntimeCoordinator) spawns pgrep subprocesses with 3s timeouts. Under repeated/parallel invocation these subprocesses deadlock. The project already has the fix — KeyPathTestCase base class — which installs the seam:
VHIDDeviceManager.testPIDProvider = { [] } // no real pgrep ever spawns
…plus WizardDependencies injection and singleton reset. Its header comment documents this exact hazard.
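A minimal sketch of how a seam of this shape typically works. Only the name testPIDProvider and the `{ [] }` stub come from this issue; the PIDLookup type, the [Int32] return type, the pgrep arguments, and the process name "example-daemon" are assumptions for illustration, not the project's actual code.

```swift
import Foundation

/// Hypothetical stand-in for the production type that owns the seam.
enum PIDLookup {
    /// When set, replaces the real subprocess call. A test base class installs this.
    static var testPIDProvider: (() -> [Int32])?

    /// Returns the PIDs of processes whose name matches exactly.
    static func pids(named name: String) -> [Int32] {
        if let provider = testPIDProvider {
            return provider()        // test path: no subprocess is spawned
        }
        return spawnPgrep(name)      // production path
    }

    private static func spawnPgrep(_ name: String) -> [Int32] {
        let process = Process()
        process.executableURL = URL(fileURLWithPath: "/usr/bin/pgrep")
        process.arguments = ["-x", name]
        let pipe = Pipe()
        process.standardOutput = pipe
        do { try process.run() } catch { return [] }
        // Drain the pipe before waiting so a full pipe buffer cannot block the child.
        let data = pipe.fileHandleForReading.readDataToEndOfFile()
        process.waitUntilExit()
        let text = String(decoding: data, as: UTF8.self)
        return text.split(whereSeparator: \.isNewline).compactMap { Int32($0) }
    }
}

// What a base class would do around each test: install the stub, then restore it.
PIDLookup.testPIDProvider = { [] }
let stubbed = PIDLookup.pids(named: "example-daemon")
PIDLookup.testPIDProvider = nil
```

Because the stub is checked before any Process is created, a suite that installs it cannot reach the subprocess path at all, which is why adoption of the base class matters more than any change to the production code.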
The gap is adoption, not design:
| Base class | Suite count |
| --- | --- |
| XCTestCase (direct) | 249 |
| KeyPathTestCase | 12 |
The 249 direct-XCTestCase suites are mostly fine (pure logic), but any that construct RuntimeCoordinator / InstallerEngine / SystemValidator / VHIDDeviceManager are latent deadlocks. ErrorHandlingTests is a concrete example: it does lazy var manager: RuntimeCoordinator = .init() while extending XCTestCase directly.
Secondary symptom (the documented 18-failure baseline)
Separately, RuleCollectionsManagerTests (8), PackInstallIntegrationTests, and MapperSaveIntegrationTests fail (~18 total) with CustomRulesStore "dataCorrupted/Unexpected character 'o'" + temp-dir write failures — these look environmental. They run alphabetically after K, so locally a K-suite deadlock prevents ever reaching them. Worth confirming whether these are test-isolation/temp-dir issues vs. real bugs.
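If the baseline failures do turn out to be temp-dir isolation problems, one common remedy is to give every test its own scratch directory. The sketch below assumes that diagnosis, which this issue has not confirmed. The helper name, the directory prefix, and the Rule payload are hypothetical; the real CustomRulesStore schema is not shown here.

```swift
import Foundation

/// Creates a unique scratch directory, runs `body`, and always removes the directory afterwards.
func withIsolatedTempDirectory<T>(_ body: (URL) throws -> T) throws -> T {
    let dir = FileManager.default.temporaryDirectory
        .appendingPathComponent("keypath-tests-\(UUID().uuidString)", isDirectory: true)
    try FileManager.default.createDirectory(at: dir, withIntermediateDirectories: true)
    defer { try? FileManager.default.removeItem(at: dir) }
    return try body(dir)
}

// Hypothetical payload used only to exercise a write/read round trip.
struct Rule: Codable, Equatable {
    let input: String
    let output: String
}

let roundTripped = try withIsolatedTempDirectory { dir -> [Rule] in
    let file = dir.appendingPathComponent("rules.json")
    let rules = [Rule(input: "caps", output: "esc")]
    try JSONEncoder().encode(rules).write(to: file, options: .atomic)
    return try JSONDecoder().decode([Rule].self, from: Data(contentsOf: file))
}
```

With a UUID in the path, two suites running in parallel cannot read or overwrite each other's files, so a decode error that persists under this helper would point at a real bug instead of the environment.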
Proposed fixes
Reliability
1. Migrate offending suites to KeyPathTestCase. Start with the confirmed three, then grep for every test file referencing RuntimeCoordinator|InstallerEngine|SystemValidator|VHIDDeviceManager and convert any that extend XCTestCase directly.
2. Enforce mechanically: add a SwiftLint custom rule or CI grep guard requiring that any test file referencing those four types subclass KeyPathTestCase. This is the single highest-leverage guard against regression.
3. Investigate the 18 baseline failures (temp-dir isolation in CustomRulesStore/PackInstall/MapperSave).
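The mechanical guard could be sketched as a small Swift check. This is an illustration, not an existing project script: the function names and the regular expression are assumptions, and a text match like this will also flag comments or strings that mention the types.

```swift
import Foundation

/// Type names this issue identifies as hazardous to construct in a plain XCTestCase suite.
let hazardTypes = ["RuntimeCoordinator", "InstallerEngine", "SystemValidator", "VHIDDeviceManager"]

/// True when `source` mentions a hazard type and declares a class inheriting directly from XCTestCase.
func violatesGuard(_ source: String) -> Bool {
    let mentionsHazard = hazardTypes.contains { source.contains($0) }
    let directSubclass = source.range(
        of: #"class\s+\w+\s*:\s*XCTestCase\b"#,
        options: .regularExpression
    ) != nil
    return mentionsHazard && directSubclass
}

/// Walks a tests directory and returns the paths of offending .swift files.
func offenders(in testsDirectory: URL) -> [String] {
    guard let walker = FileManager.default.enumerator(
        at: testsDirectory, includingPropertiesForKeys: nil
    ) else { return [] }
    var found: [String] = []
    for case let file as URL in walker where file.pathExtension == "swift" {
        if let source = try? String(contentsOf: file, encoding: .utf8), violatesGuard(source) {
            found.append(file.path)
        }
    }
    return found.sorted()
}

// In-memory examples mirroring the ErrorHandlingTests case described above.
let direct = "final class ErrorHandlingTests: XCTestCase { lazy var manager: RuntimeCoordinator = .init() }"
let migrated = "final class ErrorHandlingTests: KeyPathTestCase { lazy var manager: RuntimeCoordinator = .init() }"
let pureLogic = "final class PureLogicTests: XCTestCase { }"
```

A CI step would call offenders(in:) on the tests directory and exit non-zero when the result is non-empty. The file that defines KeyPathTestCase itself would presumably need an allowlist entry, since it has to reference the seam while extending XCTestCase.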
Speed / fail-fast
4. Add a per-test or per-suite timeout in CI (watchdog or --xctest-timeout) so a deadlock fails loudly in ~30s instead of hanging the CI slot silently. A deadlock that fails is far more useful than one that hangs.
5. Preserve the <5s / ~530-test target by keeping the seams (no disk/process/network in unit tests) — e.g. ErrorHandlingTests.testConcurrentConfigurationOperations (5 concurrent real saveConfiguration calls) should drive a fake store, not the filesystem.
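Items 4 and 5 can be sketched together: concurrent saves driven against an in-memory fake, wrapped in a deadline so a hang becomes a failure. ConfigStore, InMemoryConfigStore, and finishes(within:) are hypothetical names; the project's real configuration API is not shown in this issue, and this helper does not depend on any particular test-runner flag.

```swift
import Foundation

/// Hypothetical storage seam that a coordinator could depend on instead of the filesystem.
protocol ConfigStore: AnyObject {
    func save(_ contents: String, named name: String)
    func load(named name: String) -> String?
}

/// In-memory fake: no disk and no subprocess; a lock makes it safe to call from several threads.
final class InMemoryConfigStore: ConfigStore, @unchecked Sendable {
    private let lock = NSLock()
    private var files: [String: String] = [:]

    func save(_ contents: String, named name: String) {
        lock.lock(); defer { lock.unlock() }
        files[name] = contents
    }

    func load(named name: String) -> String? {
        lock.lock(); defer { lock.unlock() }
        return files[name]
    }

    var count: Int {
        lock.lock(); defer { lock.unlock() }
        return files.count
    }
}

/// Runs `work` on a background queue and reports whether it finished before the deadline.
/// A hung `work` keeps its thread, but the caller regains control and can fail the test.
func finishes(within seconds: TimeInterval, _ work: @escaping @Sendable () -> Void) -> Bool {
    let done = DispatchSemaphore(value: 0)
    DispatchQueue.global().async {
        work()
        done.signal()
    }
    return done.wait(timeout: .now() + seconds) == .success
}

// Five concurrent saves against the fake, guarded by a 5-second deadline.
let store = InMemoryConfigStore()
let completed = finishes(within: 5) {
    DispatchQueue.concurrentPerform(iterations: 5) { index in
        store.save("config \(index)", named: "config-\(index).txt")
    }
}
```

The deadline uses a semaphore wait on the calling thread instead of structured concurrency on purpose: a task group waits for all of its children before returning, so it would hang along with a child stuck on a continuation that never resumes.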
Acceptance criteria
- swift test completes locally (parallel) without hanging
- […] pgrep/launchctl (all coordinator-touching suites on KeyPathTestCase)
- […] XCTestCase suites from touching the four hazard types
Filed from PR #697 validation; see CLAUDE.md test rules and Tests/KeyPathTests/KeyPathTestCase.swift.