feat(benchmarks): add --config to tpch datafusion + contributor guide #1631
Draft
andygrove wants to merge 4 commits into apache:main
Mirrors the existing flag on the ballista subcommand and lets contributors override DataFusion session config keys on the command line. Document the flag and the commonly tuned keys in the new contributor benchmarking guide.
Which issue does this PR close?
Closes #.
Rationale for this change
Two related gaps for contributors who want to benchmark Ballista:
1. The `tpch benchmark datafusion` subcommand has no way to override DataFusion session config keys. The `tpch benchmark ballista` subcommand has had a `-c key=value` flag for this since the #1486 era (chore(deps): update to datafusion v.53), but the in-process DataFusion path requires editing source to change a single config. That makes apples-to-apples comparisons between the two modes hard.
2. The contributors guide has a `Benchmarking` section that links out to `benchmarks/README.md`. The README covers cluster setup but not the day-to-day contributor workflow: which subcommand to use, how to override configs, how to read the metrics, and how to capture a flame graph when something is unexpectedly slow.
What changes are included in this PR?
- `benchmarks/src/bin/tpch.rs` adds a repeatable `-c`/`--config key=value` flag to the `datafusion` subcommand, mirroring the one already on `ballista`. Overrides are applied to the `SessionConfig` before constructing the `SessionContext`. Invalid keys log a warning and are skipped, matching the existing behavior on the ballista path.
- `docs/source/contributors-guide/benchmarking.md` covering:
  - `tpchgen-rs`
  - the `tpch` binary in `benchmark datafusion` (in-process) and `benchmark ballista` (against a cluster) modes, with notes on picking a representative single query while iterating
  - `-c key=value`, with a table of commonly tuned `datafusion.*` and `ballista.*` keys
  - `cargo flamegraph` and `samply`
- `Benchmarking` section in `contributors-guide/development.md` now points at the new page and keeps the link to `benchmarks/README.md`
- `index.rst` toctree under Contributors Guide
Are there any user-facing changes?
`tpch benchmark datafusion` accepts a new `-c`/`--config` flag. No existing flags or defaults change, so prior invocations behave the same.
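
For reviewers who want a mental model of the override behavior described above (repeatable `key=value` pairs, applied in order, with invalid entries warned about and skipped), here is a minimal std-only sketch. It is not the code in `benchmarks/src/bin/tpch.rs`: `SketchConfig`, `parse_override`, and `apply_overrides` are hypothetical names, `SketchConfig` is a stand-in for DataFusion's `SessionConfig`, and the default values shown are illustrative only.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a session config: a fixed set of known keys.
/// The real flag targets DataFusion's `SessionConfig`; values here are illustrative.
struct SketchConfig {
    values: BTreeMap<String, String>,
}

impl SketchConfig {
    fn new() -> Self {
        let mut values = BTreeMap::new();
        values.insert("datafusion.execution.batch_size".to_string(), "8192".to_string());
        values.insert("datafusion.execution.target_partitions".to_string(), "8".to_string());
        Self { values }
    }

    /// Set a known key; unknown keys are rejected with an error message.
    fn set(&mut self, key: &str, value: &str) -> Result<(), String> {
        match self.values.get_mut(key) {
            Some(slot) => {
                *slot = value.to_string();
                Ok(())
            }
            None => Err(format!("unknown config key: {key}")),
        }
    }
}

/// Split a raw `key=value` argument at the first `=`.
/// Returns `None` when there is no `=` or the key is empty.
fn parse_override(raw: &str) -> Option<(&str, &str)> {
    let (key, value) = raw.split_once('=')?;
    let key = key.trim();
    if key.is_empty() {
        None
    } else {
        Some((key, value.trim()))
    }
}

/// Apply overrides in order. Entries that fail to parse or name an unknown key
/// produce a warning on stderr and are skipped; the skipped entries are returned.
fn apply_overrides(config: &mut SketchConfig, overrides: &[&str]) -> Vec<String> {
    let mut skipped = Vec::new();
    for raw in overrides {
        let applied = match parse_override(raw) {
            Some((key, value)) => config.set(key, value).is_ok(),
            None => false,
        };
        if !applied {
            eprintln!("warning: skipping invalid config override '{raw}'");
            skipped.push(raw.to_string());
        }
    }
    skipped
}

fn main() {
    let mut config = SketchConfig::new();
    let skipped = apply_overrides(
        &mut config,
        &["datafusion.execution.batch_size=4096", "not.a.real.key=1"],
    );
    for (key, value) in &config.values {
        println!("{key} = {value}");
    }
    println!("skipped: {skipped:?}");
}
```

In this sketch the config is fully populated before anything else would be built from it, which mirrors the ordering stated in the description: overrides land on the config first, and the context is constructed afterwards.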