ci(boilerplate): enhance copyright boilerplate enforcement#3665
ci(boilerplate): enhance copyright boilerplate enforcement#3665tariq-hasan wants to merge 2 commits into
Conversation
|
🎉 Welcome to the Kubeflow Trainer! 🎉 Thanks for opening your first PR! We're happy to have you as part of our community 🚀 Here's what happens next:
Join the community:
Feel free to ask questions in the comments if you need any help or clarification! |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
08cfd88 to
847b79a
Compare
Signed-off-by: tariq-hasan <mmtariquehsn@gmail.com>
847b79a to
cfecaa8
Compare
Signed-off-by: tariq-hasan <mmtariquehsn@gmail.com>
|
/assign |
andreyvelich
left a comment
There was a problem hiding this comment.
Thanks for the work on tightening up the boilerplate enforcement, @tariq-hasan! I left a few inline comments. This review was performed with the help of AI tools.
| """Return the merge-base of master and HEAD, or None if unavailable.""" | ||
| try: | ||
| result = subprocess.run( | ||
| ["git", "-C", rootdir, "merge-base", "master", "HEAD"], |
There was a problem hiding this comment.
merge-base master HEAD uses the local master branch, which doesn't exist in CI's detached PR checkout — so this fails and the whole check becomes a no-op. Confirmed in this PR's own CI log: "could not find a merge-base with master; no files were checked" → passed. Suggest comparing against the PR base ref / origin/master (the helm-lint step already threads TARGET_BRANCH for exactly this).
| else: | ||
| # Default: decide per file changed relative to master. | ||
| base = merge_base_with_master(rootdir) | ||
| if base is None: |
There was a problem hiding this comment.
When base is None, this falls through to return 0, so a guard that couldn't establish its baseline reports success. Suggest return 1 here — an unobtainable baseline should fail CI, not pass silently.
| check=True, | ||
| ) | ||
| except (subprocess.CalledProcessError, FileNotFoundError) as e: | ||
| print(f"WARNING: could not diff against {base}: {e}", file=sys.stderr) |
There was a problem hiding this comment.
A git diff failure returns [] → "no files checked" → green. Suggest propagating this as a hard failure instead of an empty result, so a broken diff can't pass vacuously.
| for status, path, old_path in changed_files(base, rootdir) if base else []: | ||
| if template_stem_for(path) is None: | ||
| continue # not a checkable file type | ||
| if status != "D" and not os.path.isfile(path): |
There was a problem hiding this comment.
path is repo-relative (from git diff) but is passed straight to os.path.isfile/file_passes — this only works when cwd is the repo root. Suggest os.path.join(rootdir, path) for disk access (keep path for reporting), matching the explicit-filename branch above.
| text=True, | ||
| check=True, | ||
| ) | ||
| except (subprocess.CalledProcessError, FileNotFoundError): |
There was a problem hiding this comment.
None here means both "absent at ref" (benign) and "git errored" (a real problem); had_copyright_at collapses both to "no header," silently changing which policy branch runs. Consider distinguishing genuine absence from a git error.
| """Files changed since base in the working tree (committed/staged/unstaged). | ||
|
|
||
| Returns (status, path, old_path) tuples; old_path is set only for renames. | ||
| Rename detection is enabled (-M). |
There was a problem hiding this comment.
Docstring says -M, but the code uses -M90%. Worth noting the 90% threshold here: rename+edits below it are reported as add/delete and treated as new files requiring the year-less header.
| # Maps file extension (or special basename) to the boilerplate template | ||
| # stem on disk (boilerplate.<stem>.txt). One template is reused across | ||
| # all hash-comment languages. | ||
| # all hash-comment languages (sh, bash, py, Dockerfile, yaml, yml). |
There was a problem hiding this comment.
This comment lists only the hash-comment languages, but the dict below also maps helm/gotmpl to a different template (boilerplate.helm.txt). Suggest scoping the sentence to the sh entries and mentioning the helm stem.
@tariq-hasan Do you see any problems to just always verify all files on every PR submitted? |
We do not maintain an explicit list of which year maps to which file so for all files (old or new) we can strip the year from the copyright header and match the normalized header against the boilerplate template. This check we already currently enforce in the CI. And for new files specifically we can enforce an additional check to enforce year-less copyright header - this second check would require a diff against master. This will be the new enforcement in the code. I can update the logic in the implementation if this would be a better approach. |
What this PR does / why we need it:
The boilerplate verifier mapped only Go, Rust, shell, Python and Dockerfiles to a template. This PR extends boilerplate support for YAML and YML. The file types supported include Helm templates and controller-gen generated manifests.
In addition this PR updates the decision logic for copyright enforcement. The following is the proposed decision logic. This enforcement only applies for changed/staged/committed files in working branch:
Which issue(s) this PR fixes (optional, in
Fixes #<issue number>, #<issue number>, ...format, will close the issue(s) when PR gets merged):Fixes #
Checklist: