feat(benchmarks): add --config to tpch datafusion + contributor guide #1631

Draft
andygrove wants to merge 4 commits into apache:main from andygrove:docs/contributor-guide-benchmarks


@andygrove andygrove commented Apr 30, 2026

Which issue does this PR close?

Closes #.

Rationale for this change

Two related gaps for contributors who want to benchmark Ballista:

  1. The tpch benchmark datafusion subcommand has no way to override DataFusion session config keys. The tpch benchmark ballista subcommand has had a -c key=value flag for this since around #1486 (chore(deps): update to datafusion v.53), but the in-process DataFusion path requires editing source to change a single config. That makes apples-to-apples comparisons between the two modes hard.
  2. The contributor development guide currently has a one-paragraph Benchmarking section that links out to benchmarks/README.md. The README covers cluster setup but not the day-to-day contributor workflow: which subcommand to use, how to override configs, how to read the metrics, and how to capture a flame graph when something is unexpectedly slow.

What changes are included in this PR?

  • Code: benchmarks/src/bin/tpch.rs adds a repeatable -c/--config key=value flag to the datafusion subcommand, mirroring the one already on ballista. Overrides are applied to the SessionConfig before constructing the SessionContext. Invalid keys log a warning and are skipped, matching the existing behavior on the ballista path.
  • Docs: New page docs/source/contributors-guide/benchmarking.md covering:
    • Generating TPC-H input with tpchgen-rs
    • Running the tpch binary in benchmark datafusion (in-process) and benchmark ballista (against a cluster) modes, with notes on picking a representative single query while iterating
    • Setting session configs via -c key=value, with a table of commonly tuned datafusion.* and ballista.* keys
    • Reading metrics and a pointer to the existing Metrics user guide
    • Profiling with cargo flamegraph and samply
  • The Benchmarking section in contributors-guide/development.md now points at the new page and keeps the link to benchmarks/README.md
  • The new page is registered in the index.rst toctree under Contributors Guide
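
As a rough illustration of the override behavior described in the Code bullet above (repeatable key=value pairs, applied in order, invalid keys warned about and skipped), here is a minimal standalone sketch. It does not use DataFusion: ToyConfig, parse_override, and apply_overrides are hypothetical names invented for this example, the stored values are placeholders, and skipping arguments that lack an = is an assumption rather than something this PR states. The actual change applies overrides to the SessionConfig in benchmarks/src/bin/tpch.rs.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a session config with a fixed set of known keys.
/// The real code applies overrides to DataFusion's `SessionConfig` instead.
#[derive(Debug)]
struct ToyConfig {
    values: BTreeMap<String, String>,
}

impl ToyConfig {
    fn new() -> Self {
        let mut values = BTreeMap::new();
        // Placeholder values, chosen only for the example.
        values.insert("datafusion.execution.batch_size".to_string(), "8192".to_string());
        values.insert("datafusion.execution.target_partitions".to_string(), "8".to_string());
        Self { values }
    }

    /// Set a known key; reject unknown keys so the caller can warn and skip.
    fn set(&mut self, key: &str, value: &str) -> Result<(), String> {
        match self.values.get_mut(key) {
            Some(slot) => {
                *slot = value.to_string();
                Ok(())
            }
            None => Err(format!("unknown config key: {key}")),
        }
    }
}

/// Split one `key=value` argument at the first `=`.
/// Returns `None` when there is no `=` or the key is empty.
fn parse_override(arg: &str) -> Option<(&str, &str)> {
    let (key, value) = arg.split_once('=')?;
    let key = key.trim();
    if key.is_empty() {
        None
    } else {
        Some((key, value.trim()))
    }
}

/// Apply overrides in order and return the arguments that were skipped.
/// Skipped arguments produce a warning on stderr instead of aborting.
fn apply_overrides(config: &mut ToyConfig, args: &[&str]) -> Vec<String> {
    let mut skipped = Vec::new();
    for arg in args {
        let applied = match parse_override(arg) {
            Some((key, value)) => config.set(key, value).is_ok(),
            None => false, // assumption: malformed arguments are skipped too
        };
        if !applied {
            eprintln!("warning: ignoring config override '{arg}'");
            skipped.push(arg.to_string());
        }
    }
    skipped
}
```

In this sketch, calling apply_overrides with a valid batch-size override plus an unknown key updates the first value and returns the unknown key in the skipped list, so a typo in one override does not abort the whole benchmark run.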

Are there any user-facing changes?

tpch benchmark datafusion accepts a new -c/--config flag. No existing flags or defaults change, so prior invocations behave the same.

@github-actions github-actions Bot added the documentation Improvements or additions to documentation label Apr 30, 2026
Mirrors the existing flag on the ballista subcommand and lets contributors
override DataFusion session config keys on the command line. Document the
flag and the commonly tuned keys in the new contributor benchmarking guide.
@andygrove andygrove changed the title docs: add benchmarking guide for contributors feat(benchmarks): add --config to tpch datafusion + contributor guide Apr 30, 2026