fix(fhir): no-issue: stop FHIR resolver job stalling forever on lock waits #10088
passcod wants to merge 2 commits into
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
Reviewed by Cursor Bugbot for commit b5142f2.
🦸 Review Hero Summary Below consensus threshold (7 unique issues not confirmed by majority)
b5142f2 to add69fe
…waits

The fhir.resolver job could sit in 'Started' indefinitely when its rematerialisation blocked on a lock held elsewhere (e.g. a long sync session or a bulk update). The worker kept heartbeating, so the dead-worker reclaim path never fired and the job never errored, which wedged the singleton resolver queue and tripped the long-running-job healthcheck.

Bound each record's lock waits with a configurable lock_timeout (fhir.worker.resolverLockTimeout, default 2 minutes) so a blocked resolve errors and retries on the same worker instead of hanging. Also move the transaction boundary down to per-record, so lock holds stay short and progress is preserved if a later record fails.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
add69fe to 90a3700

Changes
🤖 The `fhir.resolver` job could sit in `Started` forever when its rematerialisation blocked on a lock held by another transaction (e.g. a long sync session or a bulk update). Because the owning worker kept heartbeating, the dead-worker reclaim path never fired and the job never errored. That wedged the singleton resolver queue (the job uses a fixed discriminant, so no new resolver can be enqueued while one is stuck) and tripped the long-running-job healthcheck.

This bounds each record's lock waits with a configurable `lock_timeout` (new `fhir.worker.resolverLockTimeout` setting, default 2 minutes), set via `set_config` inside each resolve transaction, so a blocked resolve errors and retries on the same worker rather than hanging. This keeps the "one at a time" invariant intact (no cross-worker stealing). It also moves the transaction boundary down from one transaction per run to one per record, so lock holds stay short and progress is preserved if a later record fails.

Added an integration test that holds a row lock from a separate transaction and asserts the resolve fails fast instead of stalling.
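
For readers unfamiliar with the pattern, here is a minimal sketch of a per-record transaction with a transaction-local `lock_timeout`. It is not the code in this PR: the `Tx` and `RunInTransaction` types, the `resolveRecords` and `resolveOne` names, and the `$1` placeholder style are all hypothetical and stand in for whatever database layer the worker really uses. The only PostgreSQL facts relied on are that `set_config(name, value, is_local)` with `is_local = true` applies the setting to the current transaction only, and that `lock_timeout` accepts values such as `'2min'`.

```typescript
// Hypothetical minimal interfaces for illustration; not the project's real types.
interface Tx {
  query(sql: string, params?: unknown[]): Promise<void>;
}

// Assumed helper: commits if `work` resolves, rolls back and rethrows if it rejects.
type RunInTransaction = (work: (tx: Tx) => Promise<void>) => Promise<void>;

// Resolve each record in its own short transaction.
// Returns the ids committed so far; rethrows the first failure so the
// calling job errors (and can be retried) instead of hanging.
async function resolveRecords(
  ids: string[],
  runInTransaction: RunInTransaction,
  resolveOne: (tx: Tx, id: string) => Promise<void>,
  lockTimeout: string = '2min', // mirrors the default described above
): Promise<string[]> {
  const committed: string[] = [];
  for (const id of ids) {
    await runInTransaction(async (tx) => {
      // is_local = true: the setting is dropped when this transaction ends,
      // so it cannot leak into other work on the same connection.
      await tx.query("SELECT set_config('lock_timeout', $1, true)", [lockTimeout]);
      await resolveOne(tx, id);
    });
    // Reached only after the transaction for `id` has committed.
    committed.push(id);
  }
  return committed;
}
```

In this sketch, a lock wait longer than the timeout makes the blocked statement fail inside `resolveOne`, the transaction for that record rolls back, and the error propagates to the caller. Records processed before the failure stay committed because each had its own transaction.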
Auto-Deploy
Options
Tests
Review Hero
.github/review-hero/suppressions.yml. Also runs automatically at the end of any auto-fix run. Remember to...