feat: add Spark commit audit process #4206
Draft
andygrove wants to merge 9 commits into apache:main
Which issue does this PR close?
Closes #4188.
Rationale for this change
Comet emulates Spark behavior across many subsystems: expressions, the optimizer, Parquet read and write, shuffle, joins, aggregates, and more. When Spark changes behavior on `master`, Comet may need to follow. Today there is no documented, repeatable process for the community to notice those changes commit by commit. This PR introduces that process so the project can stay aware of upstream Spark changes since `branch-4.2` was cut and not silently diverge.
The work was scaffolded with the project `superpowers:brainstorming` skill, with the spec and plan kept on disk only.
What changes are included in this PR?
- `docs/source/contributor-guide/spark_commit_audit.md`: human-facing process page with rubric, scope, states, and workflow. Linked from the contributor guide index.
- `dev/spark-commit-audit.md`: the audit log itself, populated with the 2 in-scope `sql/` commits on `apache/spark` `master` since `branch-4.2` was cut. Each line carries a short hash, date, state, and subject.
- `dev/regenerate-spark-audit.py`: bootstrap and incremental update script. Idempotent; preserves existing verdicts and prose notes by short hash. Reuses the existing `dev/release/venv` (PyGithub).
- `dev/test_regenerate_spark_audit.py`: 15 unit tests over the script's pure helpers (`parse_existing_block`, `format_new_line`, `is_in_scope`, `merge_lines`, `replace_block`).
- `.claude/skills/audit-spark-commit/SKILL.md`: thin Claude skill that audits one commit per invocation, reads the contributor guide for the rubric, proposes a verdict, and updates the audit log line in place after user review.
How are these changes tested?
- `python3 dev/test_regenerate_spark_audit.py`: 15 unit tests over the script's pure functions, all pass.
- `python dev/regenerate-spark-audit.py --dry-run --limit 5`, then a full bootstrap.
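For reviewers who want a feel for the idempotent, hash-keyed merge that `dev/regenerate-spark-audit.py` is described as performing, here is a minimal sketch. It is not the script's code. The line layout (`- <short-hash> <date> [<state>] <subject>`), the state names, the commit hashes, and the function names `parse_line`, `format_line`, `is_in_scope_path`, and `merge_by_hash` are all assumptions made for illustration, and multi-line prose notes are not handled.

```python
import re
from typing import Dict, List, NamedTuple, Optional

# ASSUMPTION: hypothetical audit-log line layout, not taken from the PR:
#   - <short-hash> <YYYY-MM-DD> [<state>] <subject>
LINE_RE = re.compile(
    r"^- (?P<sha>[0-9a-f]{7,12}) (?P<date>\d{4}-\d{2}-\d{2}) "
    r"\[(?P<state>[^\]]+)\] (?P<subject>.*)$"
)


class AuditLine(NamedTuple):
    sha: str      # short commit hash, used as the merge key
    date: str     # commit date
    state: str    # audit state (state names here are hypothetical)
    subject: str  # commit subject line


def parse_line(text: str) -> Optional[AuditLine]:
    """Parse one log line; return None for anything that is not an entry."""
    m = LINE_RE.match(text)
    return AuditLine(**m.groupdict()) if m else None


def format_line(line: AuditLine) -> str:
    """Render an entry back into the assumed layout."""
    return f"- {line.sha} {line.date} [{line.state}] {line.subject}"


def is_in_scope_path(paths: List[str], prefix: str = "sql/") -> bool:
    """ASSUMPTION: a commit is in scope if it touches any path under `prefix`."""
    return any(p.startswith(prefix) for p in paths)


def merge_by_hash(existing: List[str], fetched: List[AuditLine]) -> List[str]:
    """Keep the existing text for known hashes; format fresh lines for new ones."""
    known: Dict[str, str] = {}
    for text in existing:
        parsed = parse_line(text)
        if parsed is not None:
            known[parsed.sha] = text  # preserve the reviewed line verbatim
    return [known.get(c.sha, format_line(c)) for c in fetched]


if __name__ == "__main__":
    existing = ["- abc1234 2025-01-02 [no-change-needed] Fix typo in docs"]
    fetched = [
        AuditLine("abc1234", "2025-01-02", "unaudited", "Fix typo in docs"),
        AuditLine("def5678", "2025-01-03", "unaudited", "Change cast semantics"),
    ]
    merged = merge_by_hash(existing, fetched)
    print("\n".join(merged))
    # Re-running with the same input changes nothing.
    assert merge_by_hash(merged, fetched) == merged
```

Keying on the short hash and reusing the existing line verbatim is what makes a rerun safe in this sketch: an already-reviewed verdict is never overwritten by the default state of a freshly fetched commit.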