Docs: Document adaptive split sizing configurations#16557

Open
pratham76 wants to merge 1 commit into apache:main from pratham76:issue-16556

Conversation

@pratham76

This PR documents the adaptive split sizing configurations that were added in #16088 but were left undocumented.

Closes #16556

@github-actions github-actions Bot added the docs label May 24, 2026
@pratham76
Author

Local Rendering:
Screenshot 2026-05-24 at 3 18 40 PM

@pratham76
Author

@RussellSpitzer @karuppayya Could you PTAL? Thanks!

@pratham76
Author

@kevinjqliu @huaxingao Could you take a look? Thanks!

Comment thread docs/docs/spark-configuration.md Outdated
| spark.sql.iceberg.merge-schema | false | Enables modifying the table schema to match the write schema. Only adds columns missing columns |
| spark.sql.iceberg.report-column-stats | true | Report Puffin Table Statistics if available to Spark's Cost Based Optimizer. CBO must be enabled for this to be effective |
| spark.sql.iceberg.read.adaptive-split-size.enabled | Table default | Enables adaptive split sizing for read operations. When enabled, split size is automatically adjusted based on scan size and parallelism |
| spark.sql.iceberg.read.adaptive-split-size.parallelism | max(spark.default.parallelism, spark.sql.shuffle.partitions) | Overrides the parallelism used for adaptive split sizing. Must be greater than 0 |
Member

Rather than documenting the Spark configs, I would just say "Spark's default parallelism".

Author

Thanks @RussellSpitzer, I have modified it.
Updated local rendering:
Screenshot 2026-05-27 at 2 54 52 AM

Author

@pratham76 pratham76 May 26, 2026

On second thought, the default value does not exactly correspond to Spark's default parallelism in this case, since it is the maximum of spark.default.parallelism and spark.sql.shuffle.partitions, so I thought of documenting it explicitly. Please share your thoughts on this. Thanks!
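
To illustrate the default discussed in this thread, here is a rough sketch of how the parallelism might be resolved. It is not the Iceberg implementation: the function name `resolve_parallelism`, the plain-dict config, and raising `ValueError` for a non-positive override are assumptions made for the example. The fallback of 200 is Spark's documented default for spark.sql.shuffle.partitions.

```python
# Hypothetical sketch, not Iceberg's actual code.
PARALLELISM_KEY = "spark.sql.iceberg.read.adaptive-split-size.parallelism"
SHUFFLE_PARTITIONS_KEY = "spark.sql.shuffle.partitions"


def resolve_parallelism(conf: dict, default_parallelism: int) -> int:
    """Return the parallelism used for adaptive split sizing.

    conf: stand-in for the session config (key -> string value).
    default_parallelism: stand-in for spark.default.parallelism.
    """
    override = conf.get(PARALLELISM_KEY)
    if override is not None:
        value = int(override)
        # The docs row says the override "Must be greater than 0";
        # rejecting it with ValueError is an assumption of this sketch.
        if value <= 0:
            raise ValueError(f"{PARALLELISM_KEY} must be greater than 0")
        return value

    # No override: take the larger of the two Spark settings.
    shuffle_partitions = int(conf.get(SHUFFLE_PARTITIONS_KEY, "200"))
    return max(default_parallelism, shuffle_partitions)
```

With a default parallelism of 8 and shuffle partitions left unset, the sketch returns 200 rather than 8, which is the gap between "Spark's default parallelism" and the documented maximum of the two values.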

@pratham76 pratham76 force-pushed the issue-16556 branch 3 times, most recently from 7aeaca5 to 61b4540 Compare May 26, 2026 21:04
Development

Successfully merging this pull request may close these issues.

Require Documentation for adaptive split sizing configurations

3 participants