Skip to content

fix(post-review): H1-H4 + C1 from deep code review#56

Merged
aksOps merged 1 commit into
mainfrom
fix/post-review-bugs
Apr 28, 2026
Merged

fix(post-review): H1-H4 + C1 from deep code review#56
aksOps merged 1 commit into
mainfrom
fix/post-review-bugs

Conversation

@aksOps

@aksOps aksOps commented Apr 28, 2026

Copy link
Copy Markdown
Contributor

Summary

Addresses real bugs surfaced by the parallel deep code review (superpowers + Codex) of PRs #49-#55.

Severity File Fix
C1 internal/mcp/server.go Refactor tools/call concurrency gate; replace dead break inside select{default} with an explicit acquired flag
H1 internal/mcp/server.go Slot held until inner goroutine finishes, not when the request thread gives up on timeout. Defends concurrency cap from slow handlers
H2 internal/tsdb/aggregator.go droppedBatchesatomic.Int64 (was plain int64 racing between flush() and DroppedBatches() reader)
H3 internal/storage/{repository.go,log_repo.go} FTS5→LIKE fallback now logs slog.Warn so operators see index-layer issues instead of silently degrading to LIKE
H4 internal/storage/partitions_scheduler.go Documented one-shot lifecycle. Start-Stop-Start is not supported (closes done once)

Reverted

  • H5 (Pipeline SoftThreshold == 0 clobbered): conflicts with TestPipeline_DefaultsApplied which enforces zero-value → default. The internal-only field has no env-var surface; "operator deliberately set 0" is not a real path.
  • H6 (Pipeline log loss when spans fail): conflicts with TestPipeline_FailedSpansSkipsLogs which enforces "orphan logs are not persisted without their spans" as the design contract. Added an explanatory comment so future reviewers don't re-flag.

Test plan

  • go test ./... -race -count=1 — full suite green
  • New TestRobustness_TimeoutHoldsSemaphoreSlot locks in the H1 ordering contract
  • go vet ./... — clean
  • golangci-lint run --new-from-rev=origin/main — clean

M/L findings (Submit TOCTOU, aggregator unsync field reads, callback under mutex, isQueueFull scope, pgLogsRelkind errors.Is, MCP config clamp, logsPartitioned atomic.Bool, FTS5 trigger perf doc) deferred to a follow-up cleanup PR.

🤖 Generated with Claude Code

- C1 / H1 (mcp/server.go): tools/call concurrency gate refactored to use
  an explicit `acquired` flag so the dead `break` inside select{default}
  is gone. The semaphore slot is now held until the inner goroutine
  finishes — not when the request thread gives up on timeout. Without
  this, slow handlers could keep accumulating goroutines past the cap.
  Added TestRobustness_TimeoutHoldsSemaphoreSlot to lock in the new
  ordering contract.

- H2 (tsdb/aggregator.go): droppedBatches changed to atomic.Int64.
  Previously a plain int64 was incremented in flush() (after the unlock)
  and read concurrently by DroppedBatches() from telemetry scrape paths.
  No test surfaced it because no scrape ran during a flush; live
  Prometheus would have.

- H3 (storage): FTS5→LIKE fallback now logs slog.Warn with tenant +
  query + error so the operator notices instead of leaving the seatbelt
  on. Applied to both SearchLogs and GetLogsV2.

- H4 (storage/partitions_scheduler.go): documented the one-shot
  lifecycle. Start-Stop-Start is NOT supported because Stop closes the
  internal `done` channel; restarts must construct a fresh scheduler.

Reverted: H5 and H6 from the review conflict with existing test contracts
(TestPipeline_DefaultsApplied requires SoftThreshold zero-value to be
treated as "use default"; TestPipeline_FailedSpansSkipsLogs enforces
"orphan logs are not persisted without their spans"). Kept the existing
behaviour and added comments explaining it so future reviewers don't
re-flag the same items.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@sonarqubecloud

Copy link
Copy Markdown

@aksOps aksOps merged commit b595b82 into main Apr 28, 2026
17 checks passed
@aksOps aksOps deleted the fix/post-review-bugs branch April 28, 2026 00:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant