fix(storage): constants + gossip tuning + Delete error surface (#495, #498, #500, #501, #502)#18
fix(storage): constants + gossip tuning + Delete error surface (#495, #498, #500, #501, #502)#18jackthepunished wants to merge 4 commits into
Conversation
…lete errors (poyrazK#495, poyrazK#498, poyrazK#500, poyrazK#501, poyrazK#502) * internal/storage/storageconst: new package that owns the previously duplicated MaxObjectSize and ChunkSize values; coordinator and node packages now alias from there so they cannot drift. (poyrazK#500, poyrazK#502) * gossip.go: replace hardcoded 2s / 5s timeouts with named constants (defaultGossipRPCTimeout, defaultFailureDetectionTimeout, defaultFailCheckInterval) and add Set* overrides so embedders and tests can tune for higher-latency networks without recompiling. The alive→suspect threshold no longer assumes LAN. (poyrazK#500, poyrazK#501) * gossip.go.sendGossip: tighten the double-checked locking note — the read-lock-release / write-lock-reacquire pattern already covers the documented race because the second hold re-checks both peers and members maps, but the new comments make that explicit and link it to the closePeerLocked invariant. (poyrazK#495) * node/store.go.Delete: propagate os.Remove errors instead of dropping the meta-file removal error on the floor. Both failures are returned together so the operator can distinguish data-vs-meta partial failures. (poyrazK#498)
|
Warning Rate limit exceeded
You’ve run out of usage credits. Purchase more in the billing tab. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis PR consolidates hardcoded storage size limits and chunk sizes into a shared ChangesStorage Constants and Gossip Tuning
🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Use %w for the primary error and %s for the secondary so errorlint's no-non-wrapping-format-verb rule is happy without losing the meta error context.
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@internal/storage/node/gossip.go`:
- Around line 18-22: The comment wrongly states the gossip tuning constants are
"exported" while they are implemented as unexported (lowercase) identifiers;
update the comment in internal/storage/node/gossip.go to accurately describe
them as package-private (unexported) lowercase constants or alternatively export
the identifiers if intended—i.e., either change the wording to say "These
identifiers are package-private (unexported) so tests, embedders, and ops
tooling must override them within the package or via explicit export" or rename
the constants to exported names (capitalize) so the original text matches the
code; ensure the comment references the same "gossip protocol tuning constants"
block.
- Around line 75-97: The three setters
(GossipProtocol.SetFailureDetectionTimeout, SetGossipRPCTimeout,
SetFailCheckInterval) must guard against non-positive durations to avoid
panics/degenerate timers; add an early check like "if d <= 0 { return }" at the
start of each method before acquiring g.mu and assigning
g.failureDetectionTimeout, g.gossipRPCTimeout or g.failCheckInterval so invalid
inputs are ignored and existing routines using time.NewTicker or timeout logic
won't receive zero/negative durations.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 7ddeee1a-2227-4f71-8c5a-f08501c36453
📒 Files selected for processing (5)
internal/storage/coordinator/service.gointernal/storage/node/gossip.gointernal/storage/node/rpc.gointernal/storage/node/store.gointernal/storage/storageconst/constants.go
| // Gossip protocol tuning constants. These are deliberately exported so that | ||
| // tests, embedders, and ops tooling can override them in lock-step without | ||
| // editing source. The defaults are conservative for low-latency LAN clusters | ||
| // and may need to be raised for WAN/high-latency deployments to avoid false | ||
| // positives. |
There was a problem hiding this comment.
Comment says constants are “exported”, but they are not.
These identifiers are lowercase (package-private), so the wording is inaccurate and can mislead embedders.
Suggested wording tweak
-// Gossip protocol tuning constants. These are deliberately exported so that
+// Gossip protocol tuning defaults. These are package-level defaults so that
// tests, embedders, and ops tooling can override them in lock-step without
// editing source. The defaults are conservative for low-latency LAN clusters🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/storage/node/gossip.go` around lines 18 - 22, The comment wrongly
states the gossip tuning constants are "exported" while they are implemented as
unexported (lowercase) identifiers; update the comment in
internal/storage/node/gossip.go to accurately describe them as package-private
(unexported) lowercase constants or alternatively export the identifiers if
intended—i.e., either change the wording to say "These identifiers are
package-private (unexported) so tests, embedders, and ops tooling must override
them within the package or via explicit export" or rename the constants to
exported names (capitalize) so the original text matches the code; ensure the
comment references the same "gossip protocol tuning constants" block.
| // SetFailureDetectionTimeout overrides the alive→suspect promotion threshold. | ||
| // Must be called before Start. | ||
| func (g *GossipProtocol) SetFailureDetectionTimeout(d time.Duration) { | ||
| g.mu.Lock() | ||
| defer g.mu.Unlock() | ||
| g.failureDetectionTimeout = d | ||
| } | ||
|
|
||
| // SetGossipRPCTimeout overrides the per-RPC Gossip timeout. | ||
| // Must be called before Start. | ||
| func (g *GossipProtocol) SetGossipRPCTimeout(d time.Duration) { | ||
| g.mu.Lock() | ||
| defer g.mu.Unlock() | ||
| g.gossipRPCTimeout = d | ||
| } | ||
|
|
||
| // SetFailCheckInterval overrides how often detectFailures runs. | ||
| // Must be called before Start. | ||
| func (g *GossipProtocol) SetFailCheckInterval(d time.Duration) { | ||
| g.mu.Lock() | ||
| defer g.mu.Unlock() | ||
| g.failCheckInterval = d | ||
| } |
There was a problem hiding this comment.
Validate non-positive duration overrides in setter APIs.
Set* currently accepts <= 0 durations. That can panic time.NewTicker at Line 141 and cause immediate/degenerate timeouts at Line 163 and Line 320.
Suggested guard pattern
func (g *GossipProtocol) SetFailureDetectionTimeout(d time.Duration) {
g.mu.Lock()
defer g.mu.Unlock()
+ if d <= 0 {
+ g.logger.Warn("ignoring non-positive failure detection timeout", "value", d, "using_default", defaultFailureDetectionTimeout)
+ d = defaultFailureDetectionTimeout
+ }
g.failureDetectionTimeout = d
}
func (g *GossipProtocol) SetGossipRPCTimeout(d time.Duration) {
g.mu.Lock()
defer g.mu.Unlock()
+ if d <= 0 {
+ g.logger.Warn("ignoring non-positive gossip RPC timeout", "value", d, "using_default", defaultGossipRPCTimeout)
+ d = defaultGossipRPCTimeout
+ }
g.gossipRPCTimeout = d
}
func (g *GossipProtocol) SetFailCheckInterval(d time.Duration) {
g.mu.Lock()
defer g.mu.Unlock()
+ if d <= 0 {
+ g.logger.Warn("ignoring non-positive fail-check interval", "value", d, "using_default", defaultFailCheckInterval)
+ d = defaultFailCheckInterval
+ }
g.failCheckInterval = d
}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@internal/storage/node/gossip.go` around lines 75 - 97, The three setters
(GossipProtocol.SetFailureDetectionTimeout, SetGossipRPCTimeout,
SetFailCheckInterval) must guard against non-positive durations to avoid
panics/degenerate timers; add an early check like "if d <= 0 { return }" at the
start of each method before acquiring g.mu and assigning
g.failureDetectionTimeout, g.gossipRPCTimeout or g.failCheckInterval so invalid
inputs are ignored and existing routines using time.NewTicker or timeout logic
won't receive zero/negative durations.
Setters no-op after Start() is called to prevent confusing race conditions where a goroutine reads a new value while the old value is already live in the tickers. A started flag is set under mutex protection in Start().
Summary
Targets upstream: poyrazK#495, poyrazK#498, poyrazK#500, poyrazK#501, poyrazK#502
Test plan
Summary by CodeRabbit
Bug Fixes
Improvements