Fix attribution backend timeout handling #347
Greptile Summary
This PR fixes lib-backend timeout handling in the attribution service by offloading blocking
Confidence Score: 5/5
Safe to merge: the async offloading and lock lifecycle are correct, and the restriction of launcher-managed attribution to the mcp backend is enforced consistently at both the CLI and dataclass-construction layers. The asyncio.shield + asyncio.to_thread pattern gives the coalescer a real await point to cancel on while keeping the singleton analyzer serialized. The done callback is registered before the await, the lock is always released by the callback (not the caller), and waiter_cancelled is read only after all cancellation handling has run on the single event-loop thread. No logic bugs or data-loss paths were found. No files require special attention.
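The "real await point" claim can be illustrated with a minimal sketch. This is not the service's code: blocking_analysis is a hypothetical stand-in for the synchronous analyzer, and the timings are arbitrary. Running the call through asyncio.to_thread lets asyncio.wait_for time out the caller after 0.1 s, while asyncio.shield keeps that timeout from cancelling the worker task, which still finishes with its result. Had blocking_analysis been called directly inside a coroutine, the event loop would have been blocked and the timeout could not take effect until the call returned.

```python
import asyncio
import time
from typing import Tuple


def blocking_analysis(seconds: float) -> str:
    # Hypothetical stand-in for a synchronous analyzer call.
    time.sleep(seconds)
    return "done"


async def main() -> Tuple[bool, float, str]:
    # The blocking call runs in a worker thread; the event loop stays free.
    worker = asyncio.create_task(asyncio.to_thread(blocking_analysis, 0.5))
    start = time.monotonic()
    try:
        # shield: the timeout cancels only this await, not the worker task.
        await asyncio.wait_for(asyncio.shield(worker), timeout=0.1)
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True
    waited = time.monotonic() - start
    # The worker kept running and still produces its result.
    result = await worker
    return timed_out, waited, result


timed_out, waited, result = asyncio.run(main())
print(timed_out, round(waited, 2), result)
```

Note that cancelling the task returned by asyncio.to_thread never stops the thread itself; the thread always runs to completion, which is why a lock guarding a singleton analyzer has to stay held until the worker actually finishes.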
Sequence Diagram
sequenceDiagram
    participant C as Coalescer
    participant F as _fetch_log_result_lib
    participant S as _run_lib_log_analyzer_serialized
    participant T as asyncio.to_thread (worker task)
    participant Th as Thread (asyncio.run)
    C->>F: "get_or_compute(..., compute_timeout=N)"
    F->>S: await _run_lib_log_analyzer_serialized(analyzer, kwargs)
    S->>S: await _log_analysis_lock.acquire()
    S->>T: asyncio.create_task(asyncio.to_thread(...))
    T->>Th: spawn thread → asyncio.run(analyzer.run(...))
    S->>T: add_done_callback(_finish_lib_run)
    S->>T: await asyncio.shield(worker_task)
    alt Normal completion
        Th-->>T: result
        T->>S: _finish_lib_run: release lock, discard task
        T-->>S: shield resolves → return result
        S-->>F: result
        F-->>C: log_result dict
    else Coalescer timeout (CancelledError)
        C->>S: cancel outer task (via wait_for)
        S->>S: "CancelledError → waiter_cancelled=True, re-raise"
        S-->>C: CancelledError propagates
        C-->>C: "return {state: timeout}"
        Note over T,Th: worker_task continues running in thread
        Th-->>T: completes (success or error)
        T->>T: _finish_lib_run: release lock, log warning if error
    end
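The flow in the diagram can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: SerializedLibRunner, run_serialized, and flaky_analyzer are hypothetical names, only the attribute names _log_analysis_lock and _lib_log_analysis_tasks are borrowed from the quoted snippet, and this callback logs every worker failure rather than only failures that happen after a timeout. The lock is taken before the worker starts and released only by the done callback; the caller awaits the worker through asyncio.shield, so a wait_for timeout cancels the caller while the worker keeps running and keeps the lock held until it ends.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable, Dict, Set, Tuple

logger = logging.getLogger("attrsvc.sketch")  # hypothetical logger name


class SerializedLibRunner:
    """Hypothetical stand-in for the service object; not the PR's class."""

    def __init__(self) -> None:
        self._log_analysis_lock = asyncio.Lock()
        self._lib_log_analysis_tasks: Set["asyncio.Task[Any]"] = set()

    async def run_serialized(
        self, analyzer: Callable[..., Awaitable[Any]], kwargs: Dict[str, Any]
    ) -> Any:
        # Serialize runs; a caller cancelled while waiting here never holds the lock.
        await self._log_analysis_lock.acquire()
        # The worker thread drives the async analyzer on its own event loop.
        worker = asyncio.create_task(
            asyncio.to_thread(lambda: asyncio.run(analyzer(**kwargs)))
        )
        self._lib_log_analysis_tasks.add(worker)  # keep a strong reference

        def _finish(task: "asyncio.Task[Any]") -> None:
            # The callback, not the caller, releases the lock when the worker ends.
            self._lib_log_analysis_tasks.discard(task)
            self._log_analysis_lock.release()
            if not task.cancelled() and task.exception() is not None:
                logger.warning("lib analyzer run failed: %r", task.exception())

        # Registered before awaiting, so a cancelled caller cannot leak the lock.
        worker.add_done_callback(_finish)
        # Cancelling this await (e.g. via wait_for) does not cancel the worker.
        return await asyncio.shield(worker)


async def flaky_analyzer(delay: float, fail: bool) -> str:
    """Hypothetical async analyzer used only for the demo."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError("analysis failed")
    return "ok"


async def demo() -> Tuple[str, bool, bool, bool]:
    runner = SerializedLibRunner()
    first = await runner.run_serialized(flaky_analyzer, {"delay": 0.01, "fail": False})
    try:
        await asyncio.wait_for(
            runner.run_serialized(flaky_analyzer, {"delay": 0.3, "fail": True}),
            timeout=0.05,
        )
        timed_out = False
    except asyncio.TimeoutError:
        timed_out = True
    held = runner._log_analysis_lock.locked()  # worker is still running
    # Wait for the orphaned worker; its done callback runs before we resume.
    await asyncio.gather(*runner._lib_log_analysis_tasks, return_exceptions=True)
    released = not runner._log_analysis_lock.locked()
    return first, timed_out, held, released


first, timed_out, held, released = asyncio.run(demo())
print(first, timed_out, held, released)
```

In the demo, the second call gives up after 50 ms while the lock is still held; when the 300 ms worker later fails, the done callback releases the lock and logs a warning even though no caller is waiting any more.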
Reviews (3): Last reviewed commit: "Fix attribution backend timeout handling"
def _finish_lib_run(task: "asyncio.Task[Any]") -> None:
    self._lib_log_analysis_tasks.discard(task)
    if self._log_analysis_lock.locked():
        self._log_analysis_lock.release()
    try:
        task.exception()
    except asyncio.CancelledError:
        pass
Worker exceptions silently dropped after caller timeout
When the coalescer times out and cancels _fetch_log_result_lib, asyncio.shield keeps the cancellation from reaching the worker task, which keeps running. If that worker later fails, task.exception() returns the exception object, but the callback neither logs nor re-raises it. Because calling task.exception() also marks the exception as retrieved, asyncio will not emit its usual "Task exception was never retrieved" message either, so the failure is silently dropped. The caller has already received CancelledError, so no other path will surface this error. Logging the non-None return value of task.exception() with logger.warning (or logger.debug) would make post-timeout failures observable in service logs.
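A minimal sketch of the suggested fix, assuming the lock and task set are passed in rather than read from self; make_finish_callback, ListHandler, and the logger name are hypothetical. The callback checks task.cancelled() first, because task.exception() raises CancelledError for a cancelled task, and logs any other failure. The demo uses a plain coroutine as a stand-in for the asyncio.to_thread worker: the caller times out through asyncio.shield, the worker fails afterwards, and the warning still reaches the log.

```python
import asyncio
import logging
from typing import Any, Callable, List, Set, Tuple

logger = logging.getLogger("attrsvc.sketch")  # hypothetical logger name


def make_finish_callback(
    lock: asyncio.Lock, tasks: "Set[asyncio.Task[Any]]"
) -> Callable[["asyncio.Task[Any]"], None]:
    """Hypothetical factory: the quoted _finish_lib_run plus the suggested logging."""

    def _finish_lib_run(task: "asyncio.Task[Any]") -> None:
        tasks.discard(task)
        if lock.locked():
            lock.release()
        if task.cancelled():
            return  # task.exception() would raise CancelledError here
        exc = task.exception()  # also marks the exception as retrieved
        if exc is not None:
            logger.warning("lib log analysis worker failed: %r", exc)

    return _finish_lib_run


class ListHandler(logging.Handler):
    """Collects records so the demo can inspect what was logged."""

    def __init__(self) -> None:
        super().__init__()
        self.records: List[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


async def demo() -> Tuple[List[str], bool, int]:
    handler = ListHandler()
    logger.addHandler(handler)
    lock = asyncio.Lock()
    tasks: "Set[asyncio.Task[Any]]" = set()
    await lock.acquire()

    async def failing_worker() -> None:
        await asyncio.sleep(0.05)
        raise RuntimeError("analyzer crashed")

    worker = asyncio.create_task(failing_worker())
    tasks.add(worker)
    worker.add_done_callback(make_finish_callback(lock, tasks))
    try:
        await asyncio.wait_for(asyncio.shield(worker), timeout=0.01)
    except asyncio.TimeoutError:
        pass  # the caller gives up; shield leaves the worker running
    await asyncio.sleep(0.1)  # meanwhile the worker fails and the callback runs
    logger.removeHandler(handler)
    messages = [r.getMessage() for r in handler.records]
    return messages, lock.locked(), len(tasks)


messages, still_locked, pending = asyncio.run(demo())
print(messages, still_locked, pending)
```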
1134c08 to 53b6565
53b6565 to 90947cd
nvrx-attrsvc able to use mcp or lib.