Skip to content

ci(boilerplate): enhance copyright boilerplate enforcement#3665

Draft
tariq-hasan wants to merge 2 commits into
kubeflow:masterfrom
tariq-hasan:ci/enhance-boilerplate-enforcement
Draft

ci(boilerplate): enhance copyright boilerplate enforcement#3665
tariq-hasan wants to merge 2 commits into
kubeflow:masterfrom
tariq-hasan:ci/enhance-boilerplate-enforcement

Conversation

@tariq-hasan

@tariq-hasan tariq-hasan commented Jun 30, 2026

Copy link
Copy Markdown
Member

What this PR does / why we need it:

The boilerplate verifier mapped only Go, Rust, shell, Python and Dockerfiles to a template. This PR extends boilerplate support for YAML and YML. The file types supported include Helm templates and controller-gen generated manifests.

In addition this PR updates the decision logic for copyright enforcement. The following is the proposed decision logic. This enforcement only applies for changed/staged/committed files in working branch:

  • not in master → enforce full check (year-less required)
  • in master, no header → enforce full check
  • in master, has header → enforce header still present
  • deleted / relocated-with-header → skip

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #

Checklist:

  • Docs included if any changes are user facing

@github-actions

Copy link
Copy Markdown

🎉 Welcome to the Kubeflow Trainer! 🎉

Thanks for opening your first PR! We're happy to have you as part of our community 🚀

Here's what happens next:

  • If you haven't already, please check out our Contributing Guide for repo-specific guidelines and the Kubeflow Contributor Guide for general community standards.
  • Our team will review your PR soon! cc @kubeflow/kubeflow-trainer-team

Join the community:

Feel free to ask questions in the comments if you need any help or clarification!
Thanks again for contributing to Kubeflow! 🙏

@google-oss-prow

Copy link
Copy Markdown

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign gaocegege for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@tariq-hasan tariq-hasan force-pushed the ci/enhance-boilerplate-enforcement branch 2 times, most recently from 08cfd88 to 847b79a Compare June 30, 2026 02:45
Signed-off-by: tariq-hasan <mmtariquehsn@gmail.com>
@tariq-hasan tariq-hasan force-pushed the ci/enhance-boilerplate-enforcement branch from 847b79a to cfecaa8 Compare June 30, 2026 02:58
@tariq-hasan tariq-hasan changed the title ci: enhance boilerplate enforcement ci(boilerplate): enforce copyright headers on YAML files Jun 30, 2026
@tariq-hasan tariq-hasan changed the title ci(boilerplate): enforce copyright headers on YAML files ci(boilerplate): enhance copyright boilerplate enforcement Jun 30, 2026
Signed-off-by: tariq-hasan <mmtariquehsn@gmail.com>
@Sridhar1030

Copy link
Copy Markdown
Member

/assign

@andreyvelich andreyvelich left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the work on tightening up the boilerplate enforcement, @tariq-hasan! I left a few inline comments. This review was performed with the help of AI tools.

"""Return the merge-base of master and HEAD, or None if unavailable."""
try:
result = subprocess.run(
["git", "-C", rootdir, "merge-base", "master", "HEAD"],

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge-base master HEAD uses the local master branch, which doesn't exist in CI's detached PR checkout — so this fails and the whole check becomes a no-op. Confirmed in this PR's own CI log: "could not find a merge-base with master; no files were checked" → passed. Suggest comparing against the PR base ref / origin/master (the helm-lint step already threads TARGET_BRANCH for exactly this).

else:
# Default: decide per file changed relative to master.
base = merge_base_with_master(rootdir)
if base is None:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When base is None, this falls through to return 0, so a guard that couldn't establish its baseline reports success. Suggest return 1 here — an unobtainable baseline should fail CI, not pass silently.

check=True,
)
except (subprocess.CalledProcessError, FileNotFoundError) as e:
print(f"WARNING: could not diff against {base}: {e}", file=sys.stderr)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A git diff failure returns [] → "no files checked" → green. Suggest propagating this as a hard failure instead of an empty result, so a broken diff can't pass vacuously.

for status, path, old_path in changed_files(base, rootdir) if base else []:
if template_stem_for(path) is None:
continue # not a checkable file type
if status != "D" and not os.path.isfile(path):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

path is repo-relative (from git diff) but is passed straight to os.path.isfile/file_passes — this only works when cwd is the repo root. Suggest os.path.join(rootdir, path) for disk access (keep path for reporting), matching the explicit-filename branch above.

text=True,
check=True,
)
except (subprocess.CalledProcessError, FileNotFoundError):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None here means both "absent at ref" (benign) and "git errored" (a real problem); had_copyright_at collapses both to "no header," silently changing which policy branch runs. Consider distinguishing genuine absence from a git error.

"""Files changed since base in the working tree (committed/staged/unstaged).

Returns (status, path, old_path) tuples; old_path is set only for renames.
Rename detection is enabled (-M).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docstring says -M, but the code uses -M90%. Worth noting the 90% threshold here: rename+edits below it are reported as add/delete and treated as new files requiring the year-less header.

# Maps file extension (or special basename) to the boilerplate template
# stem on disk (boilerplate.<stem>.txt). One template is reused across
# all hash-comment languages.
# all hash-comment languages (sh, bash, py, Dockerfile, yaml, yml).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment lists only the hash-comment languages, but the dict below also maps helm/gotmpl to a different template (boilerplate.helm.txt). Suggest scoping the sentence to the sh entries and mentioning the helm stem.

@andreyvelich

Copy link
Copy Markdown
Member

In addition this PR updates the decision logic for copyright enforcement. The following is the proposed decision logic. This enforcement only applies for changed/staged/committed files in working branch:

@tariq-hasan Do you see any problems to just always verify all files on every PR submitted?

@tariq-hasan

tariq-hasan commented Jun 30, 2026

Copy link
Copy Markdown
Member Author

In addition this PR updates the decision logic for copyright enforcement. The following is the proposed decision logic. This enforcement only applies for changed/staged/committed files in working branch:

@tariq-hasan Do you see any problems to just always verify all files on every PR submitted?

We do not maintain an explicit list of which year maps to which file so for all files (old or new) we can strip the year from the copyright header and match the normalized header against the boilerplate template. This check we already currently enforce in the CI.

And for new files specifically we can enforce an additional check to enforce year-less copyright header - this second check would require a diff against master. This will be the new enforcement in the code.

I can update the logic in the implementation if this would be a better approach.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants