feat(telemetry): log and report lock-acquisition contention at info #152
Open
swarit-stepsecurity wants to merge 2 commits into
Surface lock-acquisition failures (another instance already running) at info level instead of Debug so the contention is visible in agent.log, and report the failed run immediately at the failure site so the backend records that a second invocation contended for the lock. reportFailedOnce is idempotent, so the deferred handler firing on the error return is a no-op.
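The commit relies on reportFailedOnce being idempotent, so that reporting at the failure site and again from the deferred handler yields a single backend report. The sketch below shows one way such a guard could work, assuming a sync.Once-based implementation. The names reporter, acquireLock, errLockHeld and run are hypothetical, and the standard library's log.Printf stands in for the agent's log.Progress.

```go
package main

import (
	"errors"
	"fmt"
	"log"
	"os"
	"sync"
)

// reporter is a hypothetical stand-in for the component that sends run
// status to the backend; sent counts reports that actually went out.
type reporter struct {
	once sync.Once
	sent int
}

// reportFailedOnce sends a failure report at most once, however many call
// sites invoke it (hypothetical signature; the real one is not shown).
func (r *reporter) reportFailedOnce(cause error) {
	r.once.Do(func() {
		r.sent++
		log.Printf("reporting failed run to backend: %v", cause)
	})
}

// errLockHeld is a hypothetical error for "lock already taken".
var errLockHeld = errors.New("lock held by another process")

// acquireLock is a hypothetical stand-in for lock.Acquire.
func acquireLock(held bool) error {
	if held {
		return errLockHeld
	}
	return nil
}

func run(r *reporter, held bool) (err error) {
	// Deferred handler: reports any error that run returns.
	defer func() {
		if err != nil {
			r.reportFailedOnce(err)
		}
	}()

	if lockErr := acquireLock(held); lockErr != nil {
		// log.Printf stands in for the agent's info-level log.Progress.
		log.Printf("Lock acquisition failed (PID %d): %v", os.Getpid(), lockErr)
		// Report at the failure site; the deferred call becomes a no-op.
		r.reportFailedOnce(lockErr)
		return fmt.Errorf("acquire lock: %w", lockErr)
	}
	return nil
}

func main() {
	r := &reporter{}
	err := run(r, true)
	fmt.Println(err != nil, r.sent) // prints "true 1"
}
```

Because sync.Once runs its function at most once, the second call from the deferred handler returns without sending anything.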
Pull request overview
This PR aims to make lock-acquisition contention visible in normal (info-level) logs and ensure the backend records a failed run immediately when the agent can’t acquire its instance lock.
Changes:
- Promotes lock-acquisition failure logging from debug to info (`log.Progress`) so it appears in `agent.log`.
- Triggers an immediate `reportFailedOnce(...)` on lock-acquisition failure (in addition to the deferred error handler, which becomes a no-op due to idempotency).
Comment on lines +366 to +371

    // Another instance already holds the lock. Surface at info level (not
    // Debug) so the contention is visible in agent.log, and report the
    // failed run right here — don't wait for the deferred handler — so the
    // backend records that this invocation contended for the lock while one
    // was already in flight. reportFailedOnce is idempotent, so the deferred
    // handler that also fires on the error return is a no-op.
    log.Progress("Lock acquisition failed (PID %d): %v — another instance is already running, exiting", os.Getpid(), err)
lock.Acquire can also fail on permission or I/O errors when creating the lock file; the underlying error carries the specific cause. Reword the log line and comment so they no longer hard-code the "another instance is running" assumption.
Reply from the PR author (Member):
Addressed in 1abf3c0. Reworded both the log line and the comment so they no longer assume contention is the only cause.
No description provided.