OAPE-696: Add E2E coverage reporting with Codecov integration#111
OAPE-696: Add E2E coverage reporting with Codecov integration#111PillaiManish wants to merge 2 commits into
Conversation
Add coverage-instrumented build and collection scripts for E2E test coverage reporting. Uses a no-PVC approach: SIGUSR1 flushes coverage data from the running operator pod, which is then copied out via oc cp and uploaded to Codecov. Co-authored-by: Cursor <cursoragent@cursor.com>
WalkthroughAdded E2E coverage instrumentation infrastructure for the secrets-store-csi-driver-operator. Includes a coverage-instrumented container image build via a new Dockerfile, Makefile targets for building/pushing the image and collecting coverage artifacts, and a lifecycle script that patches the operator's CSV, orchestrates pod rollouts, flushes coverage, and optionally uploads coverage reports to Codecov. ChangesE2E Coverage Instrumentation and Collection
Sequence DiagramsequenceDiagram
participant User as User/CI
participant K8s as Kubernetes<br/>(CSV, Deployment)
participant Pod as Operator Pod<br/>(coverage-instrumented)
participant LocalFS as Local Filesystem<br/>(artifacts)
participant Codecov as Codecov Service
User->>K8s: setup: patch CSV<br/>replace image, set GOCOVERDIR
K8s->>Pod: rollout deployment<br/>new pod with coverage binary
Pod->>Pod: run operator<br/>collect coverage data
User->>Pod: collect: send SIGTERM<br/>flush coverage to disk
Pod->>Pod: restart (ready again)
Pod->>LocalFS: copy coverage artifacts<br/>from /tmp/e2e-cover
LocalFS->>LocalFS: convert via go tool covdata<br/>generate coverage-e2e.out
alt CODECOV_TOKEN present
LocalFS->>Codecov: download uploader<br/>(with SHA256/PGP verify)
LocalFS->>Codecov: upload coverage report<br/>(non-fatal on failure)
Codecov-->>LocalFS: report acknowledged
end
LocalFS->>User: coverage report ready<br/>in ARTIFACT_DIR
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 11 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (11 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: PillaiManish The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
@PillaiManish: This pull request references OAPE-696 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the sub-task to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@cmd/secrets-store-csi-driver-operator/coverage_flush.go`:
- Around line 14-16: The GOCOVERDIR creation currently uses os.MkdirAll(dir,
0o777) and ignores errors; change this to create the directory with restrictive
permissions (e.g., 0o700 or 0o755 as appropriate) and handle the returned error
from os.MkdirAll instead of discarding it: call os.MkdirAll(dir, 0o700), check
the error, and propagate or log/fail fast (using the existing operator logger or
by returning the error) so coverage flush code won’t proceed when directory
creation fails; update the block that reads the GOCOVERDIR env var and the
os.MkdirAll call accordingly.
In `@hack/e2e-coverage.sh`:
- Around line 93-95: The current send of SIGUSR1 uses a hardcoded PID 1 (the
line invoking oc exec ... 'kill -USR1 1'), which can be wrong if the operator
isn't PID 1; update the script to locate the actual operator process inside the
container (using pidof or pgrep) and send SIGUSR1 to that PID, falling back to
PID 1 if no process is found. Inside the block that uses "${NAMESPACE}" and
"${pod}", run something like pidof <operator-binary-name> || pgrep -f
<operator-binary-name> to capture the PID(s) and then call kill -USR1 "$PID" (or
kill -USR1 1 if the lookup returns empty) so the script remains compatible with
single-binary containers while handling wrapper/init cases. Ensure the lookup
happens inside the oc exec command so the PID refers to the container namespace.
In `@Makefile`:
- Around line 80-85: The build-coverage target is generating invalid Go flag
syntax by naively concatenating "$(GO_BUILD_FLAGS),e2ecoverage" and the
suggested "-tags e2ecoverage" breaks FIPS by producing duplicate -tags; fix by
producing a single -tags value for the coverage build: detect if GO_BUILD_FLAGS
already contains a -tags clause and if so append ",e2ecoverage" to that existing
tag list, otherwise add a new "-tags e2ecoverage" flag; implement this logic in
the Makefile when constructing flags for the build-coverage target (use
GO_BUILD_FLAGS and create/compute a temporary coverage tags variable or a
modified FLAGS variable) so the final go build invocation has at most one -tags
flag and includes e2ecoverage.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f4068362-35f1-4771-9630-6d924234b1da
📒 Files selected for processing (4)
Dockerfile.coverageMakefilecmd/secrets-store-csi-driver-operator/coverage_flush.gohack/e2e-coverage.sh
Remove coverage_flush.go and use SIGTERM to flush coverage data instead of SIGUSR1. Container restarts after SIGTERM, emptyDir preserves coverage files, and oc cp runs from the restarted container. Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
Dockerfile.coverage (1)
6-9: ⚡ Quick winDrop root default in the coverage runtime image.
The runtime stage has no explicit
USER, so it defaults to root. Please set a non-root user for defense-in-depth.Proposed hardening
FROM registry.ci.openshift.org/ocp/4.22:base-rhel9 COPY --from=builder /go/src/github.com/openshift/secrets-store-csi-driver-operator/secrets-store-csi-driver-operator /usr/bin/ ENV GOCOVERDIR=/tmp/e2e-cover +USER 65532 ENTRYPOINT ["/bin/sh", "-c", "mkdir -p /tmp/e2e-cover && exec /usr/bin/secrets-store-csi-driver-operator \"$@\"", "--"]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Dockerfile.coverage` around lines 6 - 9, The runtime stage in Dockerfile.coverage defaults to root; add a non-root user and switch to it before ENTRYPOINT to improve hardening: create a dedicated user/group (or use an existing non-root UID), ensure ownership/permissions of /tmp/e2e-cover and the binary at /usr/bin/secrets-store-csi-driver-operator are adjusted (chown/chmod) so the non-root user can write to GOCOVERDIR and execute the binary, then add a USER instruction before ENTRYPOINT to run the container as that non-root user.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@Dockerfile.coverage`:
- Around line 6-9: The runtime stage in Dockerfile.coverage defaults to root;
add a non-root user and switch to it before ENTRYPOINT to improve hardening:
create a dedicated user/group (or use an existing non-root UID), ensure
ownership/permissions of /tmp/e2e-cover and the binary at
/usr/bin/secrets-store-csi-driver-operator are adjusted (chown/chmod) so the
non-root user can write to GOCOVERDIR and execute the binary, then add a USER
instruction before ENTRYPOINT to run the container as that non-root user.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 67bd0df7-10b1-406d-8f94-fae8b631578c
📒 Files selected for processing (3)
Dockerfile.coverageMakefilehack/e2e-coverage.sh
🚧 Files skipped from review as they are similar to previous changes (1)
- hack/e2e-coverage.sh
|
/retest |
|
/label tide/merge-method-squash |
|
/retest |
2 similar comments
|
/retest |
|
/retest |
chiragkyal
left a comment
There was a problem hiding this comment.
LGTM, with a few questions.
/cc @mytreya-rh
| fi | ||
| echo "Found CSV: ${csv}" | ||
|
|
||
| echo "Patching CSV with coverage image..." |
There was a problem hiding this comment.
Which image is it exactly replacing?
There was a problem hiding this comment.
It replaces the standard secrets-store-csi-driver-operator image (built from Dockerfile.openshift via make → normal go build) with the coverage-instrumented image (built from Dockerfile.coverage via make build-coverage → go build -cover -covermode=atomic -coverpkg=./...).
Both produce the same binary at /usr/bin/secrets-store-csi-driver-operator, but the coverage version is compiled with Go's coverage instrumentation. The JSON patch targets /spec/install/spec/deployments[0]/spec/template/spec/containers[0]/image — the first container of the first deployment in the CSV (which is the operator container).
The coverage image additionally sets GOCOVERDIR=/tmp/e2e-cover and wraps the entrypoint with mkdir -p /tmp/e2e-cover so coverage data is written to disk during execution.
| -o jsonpath='{.spec.install.spec.deployments[0].spec.template.spec.containers[0].env[?(@.name=="GOCOVERDIR")].name}' 2>/dev/null) | ||
| if [[ -z "${has_gocoverdir}" ]]; then | ||
| echo "Adding GOCOVERDIR env var to CSV..." | ||
| oc patch csv "${csv}" -n "${NAMESPACE}" --type=json -p "[ |
There was a problem hiding this comment.
I believe CSV patch won't be overwritten by OLM.
There was a problem hiding this comment.
Correct — this is safe. OLM does not reconcile/overwrite the CSV's spec.install.spec.deployments once the CSV is in the Succeeded phase. OLM creates the deployment from the CSV spec, and the CSV is essentially static after that. The JSON patch triggers OLM to update the deployment with the new image and env var, which is exactly what we want.
The only risk would be if OLM reinstalled/upgraded the operator during the test, but in a CI e2e run that doesn't happen.
| local job_type="${JOB_TYPE:-local}" | ||
| if [[ "${job_type}" == "presubmit" ]]; then | ||
| echo "Detected presubmit (PR #${PULL_NUMBER:-unknown})" | ||
| [[ -n "${PULL_NUMBER:-}" ]] && codecov_args+=(--pr "${PULL_NUMBER}") | ||
| [[ -n "${PULL_PULL_SHA:-}" ]] && codecov_args+=(--sha "${PULL_PULL_SHA}") | ||
| [[ -n "${PULL_BASE_REF:-}" ]] && codecov_args+=(--branch "${PULL_BASE_REF}") | ||
| [[ -n "${REPO_OWNER:-}" && -n "${REPO_NAME:-}" ]] && codecov_args+=(--slug "${REPO_OWNER}/${REPO_NAME}") |
There was a problem hiding this comment.
How are we defining these env vars?
There was a problem hiding this comment.
These are not defined by us — they are standard Prow job environment variables automatically injected into every CI job container by Prow:
| Variable | Description |
|---|---|
JOB_TYPE |
presubmit, postsubmit, or periodic |
PULL_NUMBER |
PR number (presubmit only) |
PULL_PULL_SHA |
HEAD commit SHA of the PR |
PULL_BASE_REF |
Base branch name (e.g., main) |
PULL_BASE_SHA |
Base branch commit SHA |
REPO_OWNER |
GitHub org (e.g., openshift) |
REPO_NAME |
GitHub repo name |
The script uses them to provide correct PR/commit context to the Codecov uploader so coverage reports show up on the right PR. When running locally (JOB_TYPE is unset, defaults to local), these are all skipped and Codecov auto-detects from git.
There was a problem hiding this comment.
Did we check in any rehearsal whether they are getting populated correctly?
There was a problem hiding this comment.
Yes — these Prow env vars are standard and well-established. They're populated for every CI job automatically. You can verify in any Prow job log by looking at the environment. For example, in the rehearsal run from the release repo PR, the Prow container env includes JOB_TYPE=presubmit, PULL_NUMBER, REPO_OWNER=openshift, etc.
That said, the Codecov upload only happens if CODECOV_TOKEN is set (from the mounted secret). In the rehearsal, the upload was skipped since the secret wasn't available in the rehearsal namespace — which is expected and harmless. The env vars themselves are always present; they're part of Prow's core infrastructure, documented at https://docs.prow.k8s.io/docs/jobs/#job-environment-variables.
There was a problem hiding this comment.
Are these Make commands meant to be used locally?
There was a problem hiding this comment.
Yes, the Makefile targets support both CI and local use:
make build-coverage— builds the coverage-instrumented binary. Used byDockerfile.coveragein CI, but can also be run locally.make docker-build-coverage— builds the coverage Docker image locally (convenience for dev testing).make docker-push-coverage— pushes the coverage image to a registry (local workflow only).make e2e-coverage-collect— runshack/e2e-coverage.sh collectlocally, for collecting coverage after you've run e2e tests against a dev cluster with the coverage image deployed.
In CI, only build-coverage is used directly (inside Dockerfile.coverage). The other targets are convenience wrappers for developers who want to test the coverage flow against a dev cluster without going through the full CI pipeline.
There was a problem hiding this comment.
Except make build-coverage, others are not used in CI. If they are for local testing, I think some documentation on how to run and verify coverage would help.
There was a problem hiding this comment.
Good point. I'll add a section to the README documenting how to run and verify coverage locally. Something like:
## E2E Coverage (Local)
1. Build and push the coverage image:
make docker-build-coverage docker-push-coverage
2. Deploy the operator with the coverage image on your dev cluster:
hack/e2e-coverage.sh setup
3. Run e2e tests:
make test-e2e
4. Collect coverage:
make e2e-coverage-collect
Will add this in the next push.
|
/retest |
|
/retest-required |
|
/retest |
|
/test operator-e2e-fips |
|
/retest-required |
|
@PillaiManish: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
Dockerfile.coverage) that builds the operator with Go's-coverflags and FIPS compliancehack/e2e-coverage.shwithsetupandcollectsubcommands for CI and local usebuild-coverage,docker-build-coverage,docker-push-coverage,e2e-coverage-collect)How it works
Dockerfile.coverage(usesmake build-coveragewhich adds-cover -covermode=atomic -coverpkg=./...)hack/e2e-coverage.sh setuppatches the live CSV to swap in the coverage image and setGOCOVERDIR=/tmp/e2e-coverhack/e2e-coverage.sh collectsends SIGTERM to flush coverage data (Go runtime writes on clean exit), waits for container restart, copies data viaoc cp, converts to Go profile, and uploads to CodecovCoverage data survives the container restart because the operator's CSV already defines an emptyDir volume at
/tmp(medium: Memory), which persists across container restarts within the same pod. No PVC or extractor pod is needed.Files
Dockerfile.coverageDockerfile.openshiftbut builds with coverage flagshack/e2e-coverage.shMakefile(appended)Tested
-cover -covermode=atomic -coverpkg=./...pkg/operator64.5%,assets100%,pkg/version100%Follow-up (separate PR)
sscsi-ci-secrets/codecov-token