feat(graphrag): retune ingest defaults for 100-200 services by aksOps · Pull Request #49 · RandomCodeSpace/otelcontext

aksOps · 2026-04-27T15:51:02Z

Summary

- Raise the GRAPHRAG_WORKER_COUNT default 4→16 and the GRAPHRAG_EVENT_QUEUE_SIZE default 10k→100k. Both are sized for the documented 100–200 service operational target.
- Add a new "Edge Pre-processing (OTel Collector)" section to docs/OPERATIONS.md with a tail-sampling pipeline recipe and a double-sampling caveat.
- Make TestOnSpanIngested_DropsIncrementMetric use defaultChannelSize+1000 instead of a hardcoded 11000 so future retuning doesn't silently invalidate it.

This is Phase 0 of a multi-phase robustness push for 150–200 component scale. Subsequent phases (already brainstormed and approved): async ingest pipeline with hybrid backpressure, per-tenant cardinality fairness, SQLite FTS5+BM25 for log search, Postgres partitioning as opt-in adapter, wire-level RESOURCE_EXHAUSTED/429 backpressure, DROP-PARTITION retention.

Test plan

- go build ./... clean
- go vet ./... clean
- go test ./...: all 12 packages pass
- TestOnSpanIngested_DropsIncrementMetric passes against the new 100k buffer (validates that the test is not a silent no-op)
- CLAUDE.md and OPERATIONS.md doc lines reflect the new defaults
- Memory impact of bumped defaults: ~5MB channel + ~50KB goroutine stacks (verified by inspection of the event struct shape)

🤖 Generated with Claude Code

Raise GraphRAG worker pool from 4 to 16 and event channel buffer from 10k to 100k slots. The previous defaults were sized for a handful of services; at the 100-200 service scale (the documented operational target), loud services would saturate the buffer and trigger `graphrag_events_dropped_total` increments under steady-state load. Memory cost of the new defaults is ~5MB extra channel capacity plus ~50KB extra goroutine stacks, which is negligible at the deployment scale where this matters.

Also adds an "Edge Pre-processing (OTel Collector)" section to docs/OPERATIONS.md with a recommended Collector pipeline (memory_limiter + tail_sampling + batch) and notes on avoiding double-sampling between the edge Collector and OtelContext's internal sampler.

TestOnSpanIngested_DropsIncrementMetric now uses defaultChannelSize+1000 instead of a hardcoded 11000 so it stays valid through future retuning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
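To make the saturate-and-drop behaviour concrete, the following is a minimal sketch of a bounded event queue with a non-blocking enqueue and a drop counter. It is not the project's implementation: `ingestQueue`, `offer`, and `spanEvent` are hypothetical names, and the atomic counter merely stands in for the `graphrag_events_dropped_total` metric. The loop in `main` mirrors the idea behind the updated test, overfilling by defaultChannelSize+1000 so that the overflow is derived from the constant rather than hardcoded.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// defaultChannelSize mirrors the new 100k default named in the PR.
const defaultChannelSize = 100_000

// spanEvent is a hypothetical, minimal event payload.
type spanEvent struct{ service string }

// ingestQueue is a hypothetical bounded queue. dropped stands in for the
// graphrag_events_dropped_total counter.
type ingestQueue struct {
	events  chan spanEvent
	dropped atomic.Int64
}

func newIngestQueue(size int) *ingestQueue {
	return &ingestQueue{events: make(chan spanEvent, size)}
}

// offer enqueues ev without blocking. When the buffer is full it counts a
// drop and returns false instead of stalling the caller.
func (q *ingestQueue) offer(ev spanEvent) bool {
	select {
	case q.events <- ev:
		return true
	default:
		q.dropped.Add(1)
		return false
	}
}

func main() {
	q := newIngestQueue(defaultChannelSize)
	// No workers are draining the queue, so everything past the buffer
	// capacity is dropped.
	for i := 0; i < defaultChannelSize+1000; i++ {
		q.offer(spanEvent{service: "checkout"})
	}
	fmt.Println("dropped:", q.dropped.Load()) // prints "dropped: 1000"
}
```

Because the overflow count is expressed relative to the buffer size, changing the default again still forces exactly 1000 drops, whereas a hardcoded 11000 would fit entirely inside a 100k buffer and never exercise the drop path.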

sonarqubecloud · 2026-04-27T15:51:32Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

Five small follow-ups from the second-pass review of PRs #49–#55:

- tsdb: fire cardinality-overflow callback AFTER releasing the Aggregator mutex. The callback is currently a Prometheus increment (atomic) but holding mu across an external function call is a footgun for any future hook. Capture the tenant under lock; invoke after Unlock.
- storage: use errors.Is(err, sql.ErrNoRows) in pgLogsRelkind instead of strings.Contains(err.Error(), "no rows"). Robust against driver wrapping.
- storage: convert Repository.logsPartitioned from plain bool to atomic.Bool. Removes the memory-model fragility of "the writer ran first"; the field is read by retention.go from a separate goroutine.
- config: reject negative MCP_MAX_CONCURRENT / MCP_CALL_TIMEOUT_MS / MCP_CACHE_TTL_MS at Validate(). 0 stays the documented "disable" sentinel; negatives are typos that should fail loud.
- mcp: upgrade SetCallLimit doc to flag it startup-only, since a runtime resize leaks a slot in the old channel.

Skipped (with rationale, not silently dropped):

- M1 Submit TOCTOU on closed pipeline: cosmetic only, current ordering is documented.
- M2 ring/onIngest setter races: would require API change to fix properly; benign during normal startup-only usage.
- M4 FTS5 trigger throughput: needs a bulk-rebuild path, not a one-line tweak.
- M5 isQueueFull scope: hypothetical concern with no observed symptom; revisit only if metrics show drift.

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

aksOps merged commit acf904d into main Apr 27, 2026
17 checks passed

aksOps deleted the chore/ingest-pipeline-phase0-defaults branch April 27, 2026 15:53

This was referenced Apr 28, 2026

fix(post-review): H1-H4 + C1 from deep code review #56

Merged

fix(post-review): M/L cleanup from deep code review #57

Merged

aksOps merged 1 commit into main from chore/ingest-pipeline-phase0-defaults
