feat(confgen): parallelize per-sheet row parsing with bounded worker pool#411
Open
Kybxd wants to merge 1 commit into
Open
feat(confgen): parallelize per-sheet row parsing with bounded worker pool#411Kybxd wants to merge 1 commit into
Kybxd wants to merge 1 commit into
Conversation
…pool Add an optional intra-sheet parallel parsing path for vertical map/list/struct sheets, gated by a process-wide semaphore so total live workers stay at GOMAXPROCS even when many sheets parse concurrently. - internal/confgen/parallel.go: planShards (row-range / key-hash strategies), worker fan-out, merge step, eligibility walk, and bucketForKey with documented string-vs-typed-key risk and follow-up options (typed-key hashing / merge-time defensive check). - internal/confgen/parallel_test.go: planner, bucketForKey, and end-to-end coverage. - internal/x/xpool/sem.go,sem_test.go: tiny context-aware counting semaphore (NewSemaphore / NewCPUSemaphore) used to cap leaf row-parsing workers; intentionally NOT shared with book/sheet level fan-out to avoid cross-level deadlock. - internal/confgen/parser.go: add parse/export sum-time logging in ScatterAndExport for perf diagnostics. - internal/confgen/table_parser.go, internal/importer/book/tableparser/parser.go: expose ScanColumn and rangeDataRowsAtIndices/InRange so the parallel planner can pre-scan the outer-key column and workers can iterate non-contiguous row sets. - test/functest: ParallelConf functional fixture (sheet, proto, golden JSON). Empirical wall-clock on zonesvr (135 workbooks, 351 sheets, N=5): cap=GOMAXPROCS median 13501ms (min 13237, max 13684, stdev 165) cap=1<<20 (~no cap) median 13739ms (min 13657, max 13847, stdev 79) -> ~1.76% wall improvement plus per-sheet timing stability (large sheets no longer regress vs isolated runs under concurrent load).
|
The latest Buf updates on your PR. Results from workflow Buf CI / buf (pull_request).
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #411 +/- ##
==========================================
+ Coverage 74.87% 75.07% +0.20%
==========================================
Files 88 90 +2
Lines 9384 9688 +304
==========================================
+ Hits 7026 7273 +247
- Misses 1785 1818 +33
- Partials 573 597 +24 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Add an optional intra-sheet parallel parsing path for vertical map / list /
struct sheets. Total concurrently running row-parsing workers across all
sheets are capped at
GOMAXPROCSvia a process-wide counting semaphore, sothe existing book/sheet level fan-out can stay unbounded without inducing
scheduler thrash.
Motivation
Profiles on large workbooks (e.g. zonesvr's 580k-row
ActorConf) showedsingle-sheet row parsing dominating the critical path. Naively spawning
workers per sheet would push live goroutine count to
N_concurrent_sheets * GOMAXPROCS, regressing per-sheet timings vs anisolated single-sheet run. We need worker-level parallelism that does NOT
oversubscribe the scheduler.
Design
parallel.go: parallelEligibility): walks thedescriptor to decide whether a sheet's layout is safe to parallelize
(vertical map/list/struct, no transpose, etc.). Cached only per call,
not globally, because it depends on per-sheet
WorksheetOptions.planShards): two strategies.strategyRowRange: even row blocks, used when row order is the onlyinvariant.
strategyKeyHash: pre-scan the outer-key column via the newtableparser.ScanColumn, FNV-1a-hash each cell to a bucket so allrows sharing an outer key go to the same worker (preserves
last-write-wins / append-order / E2005 duplicate-key detection).
sheetParserwith a privatesheet-collector child and fresh cards map, parsing into per-shard
partial messages that the main goroutine merges after
g.Wait.xpool.NewCPUSemaphore): leaf workers acquire atoken before the heavy parse phase. The semaphore is intentionally
not shared with
gen.convert(book/sheet level): an outergoroutine holding a token would deadlock waiting for its own inner
workers.
Files
internal/confgen/parallel.go(+parallel_test.go): planner,worker fan-out, merge, eligibility walk,
bucketForKey. The lattercarries a documented risk note: it hashes raw cell strings, so
variants of the same logical key (
"1"vs"01","true"vs"TRUE", etc.) can route to different buckets. Two follow-ups arespelled out in the comment for when this actually surfaces:
defensive backstop.
internal/x/xpool/sem.go(+sem_test.go): a tiny context-awarecounting semaphore (
NewSemaphore,NewCPUSemaphore).internal/confgen/parser.go: parse/export sum-time logging inScatterAndExportfor perf diagnostics (CPU-style sums, notwall-clock).
internal/confgen/table_parser.go,internal/importer/book/tableparser/parser.go: exposeScanColumnand
rangeDataRowsAtIndices/rangeDataRowsInRangeso the plannercan pre-scan the outer-key column and workers can iterate
non-contiguous row sets.
test/functest/...ParallelConf...: functional fixture (sheet, proto,golden JSON).
Performance
zonesvr (135 workbooks, 351 sheets), N=5 full runs, identical binary,
only
workerSemcapacity differs:~1.76% wall improvement; more importantly, per-sheet timings of
critical-path large sheets under concurrent load are now on par with
their isolated single-sheet runs, instead of being amplified by
scheduler contention.
Tests
xpoolsemaphore andbucketForKey/planShards.ParallelConfexercising the parallel pathend-to-end.
go test ./...green locally.Known follow-ups
bucketForKeystring-vs-typed-key equivalence (documented inline,two fix paths listed; deferred until observed in real configs).
ScatterAndExportare perf-diagnostic;consider gating behind a debug flag in a follow-up if too noisy.