From c3d77697ffb3f37cfca6c2c196b48aa5518eafa7 Mon Sep 17 00:00:00 2001 From: siddhi bhor Date: Mon, 8 Jun 2026 12:47:14 +0530 Subject: [PATCH 1/3] Adds agents.md --- AGENTS.md | 204 ++++++++++++++++++++++++++++++ CLAUDE.md | 51 ++++++++ docs/api-contracts-guidelines.md | 157 +++++++++++++++++++++++ docs/error-handling-guidelines.md | 137 ++++++++++++++++++++ docs/integration-guidelines.md | 95 ++++++++++++++ docs/performance-guidelines.md | 92 ++++++++++++++ docs/security-guidelines.md | 125 ++++++++++++++++++ docs/testing-guidelines.md | 124 ++++++++++++++++++ 8 files changed, 985 insertions(+) create mode 100644 AGENTS.md create mode 100644 CLAUDE.md create mode 100644 docs/api-contracts-guidelines.md create mode 100644 docs/error-handling-guidelines.md create mode 100644 docs/integration-guidelines.md create mode 100644 docs/performance-guidelines.md create mode 100644 docs/security-guidelines.md create mode 100644 docs/testing-guidelines.md diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 000000000..c1c183842 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,204 @@ +# AGENTS.md + +## Docs Index + +Detailed domain-specific guidelines are in these files -- read them before working in the corresponding area: + +- [docs/security-guidelines.md](docs/security-guidelines.md) -- Container security, RBAC, TLS, network policies, webhook security +- [docs/performance-guidelines.md](docs/performance-guidelines.md) -- Cache architecture, watch predicates, reconciliation patterns, drift detection +- [docs/error-handling-guidelines.md](docs/error-handling-guidelines.md) -- ReconcileError types, retry logic, status conditions, logging +- [docs/api-contracts-guidelines.md](docs/api-contracts-guidelines.md) -- CRD types, kubebuilder markers, CEL validation, code generation +- [docs/testing-guidelines.md](docs/testing-guidelines.md) -- Unit/API/E2E test tiers, frameworks, test helpers, CI integration +- [docs/integration-guidelines.md](docs/integration-guidelines.md) -- Bindata pattern, cert-manager, OpenShift platform, Bitwarden plugin + +## Project Overview + +This is a Kubernetes operator (built with kubebuilder/controller-runtime) that deploys and manages the upstream [external-secrets](https://github.com/openshift/external-secrets) application on OpenShift clusters. The operator does not embed upstream code -- it manages upstream resources as static YAML manifests (bindata) applied imperatively. + +Three controllers run in a single binary: + +| Controller | Package | Watches | Purpose | +|---|---|---|---| +| `external-secrets-controller` | `pkg/controller/external_secrets/` | `ExternalSecretsConfig` CR + all managed resources | Installs/reconciles operand deployments, RBAC, services, webhooks, network policies | +| `external-secrets-manager` | `pkg/controller/external_secrets_manager/` | `ExternalSecretsManager` CR + `ExternalSecretsConfig` status | Aggregates controller statuses into a global status CR | +| `crd-annotator` | `pkg/controller/crd_annotator/` | ESO CRDs + `ExternalSecretsConfig` | Adds cert-manager CA injection annotations to operand CRDs (conditional) | + +## Project Structure + +``` +api/v1alpha1/ CRD type definitions, conditions, shared types, deepcopy +api/v1alpha1/tests/ Declarative YAML test suites for CRD validation +bindata/ Static operand YAML manifests (compiled into Go via go-bindata) +cmd/ Operator entrypoint (main.go) +config/ Kustomize manifests (CRDs, RBAC, manager, samples, bundle) +hack/ Shell scripts (codegen, verification, CI helpers) +images/ci/ CI Dockerfiles (coverage-instrumented builds) +pkg/controller/ Controller implementations + client/ CtrlClient interface + counterfeiter fakes + common/ Shared utilities (errors, constants, decode helpers, drift detection) + commontest/ Shared test fixtures (TestExternalSecretsConfig, TestExternalSecretsManager) + crd_annotator/ CRD annotation controller + external_secrets/ Main operand reconciliation controller + external_secrets_manager/ Status aggregation controller +pkg/operator/ Manager setup, controller registration +pkg/version/ Build-time version info (ldflags) +test/apis/ API integration tests (Ginkgo + envtest) +test/e2e/ End-to-end tests (Ginkgo + live cluster) +test/utils/ E2E test helpers +tools/ Go module for build-time tool dependencies +``` + +## Go Workspace and Module Layout + +The repo uses `go.work` with four modules: `.`, `./cmd/external-secrets-operator`, `./test`, `./tools`. This means: + +- `GOFLAGS` is cleared for test and fmt targets to avoid `-mod=vendor` conflicting with `go.work`. +- When adding a dependency, use `make update-dep PKG=pkg@version` to update across all modules, then `make update-vendor`. +- Vendoring is workspace-level (`go work vendor`), not per-module. +- All build-time tools (controller-gen, golangci-lint, ginkgo, etc.) are vendored and built from source via `go build -mod=vendor`. + +## Build System (Key Makefile Targets) + +| Target | What it does | +|---|---| +| `make build` | Full build: generate + manifests + fmt + vet + compile binary | +| `make build-operator` | Compile binary only (no codegen, fastest iteration) | +| `make test` | Run unit + API integration tests (no cluster needed) | +| `make test-unit` | Unit tests only | +| `make test-apis` | API validation tests via envtest | +| `make test-e2e` | E2E tests against a live cluster | +| `make lint` | Run golangci-lint with all configured linters | +| `make lint-fix` | Run golangci-lint with auto-fix | +| `make update` | Full regeneration: codegen + manifests + operand manifests + bindata + bundle + docs | +| `make verify` | Run vet + fmt + verify-deps + verify-bindata + verify-generated + govulncheck + check-git-diff | +| `make update-vendor` | Update vendor directory across all workspace modules | +| `make update-dep PKG=x@v` | Update a single dependency across all modules | +| `make manifests` | Regenerate CRD YAML, RBAC, webhook configs from kubebuilder markers | +| `make generate` | Regenerate DeepCopy methods | +| `make docs` | Regenerate API reference docs | +| `make clean` | Remove bin/, _output/, cover.out | + +After any code change, run `make update && make verify` to ensure all generated files are consistent. CI runs `check-git-diff` which fails if generated files are stale. + +## Code Style and Formatting + +### Import Order + +Imports must follow the order enforced by `gci` in `.golangci.yml`: + +1. Standard library +2. Third-party packages +3. `github.com/openshift/external-secrets-operator` (project-local) +4. Blank imports, dot imports, aliases, local module + +### Linting + +The repo uses golangci-lint v2 with a comprehensive set of linters (see `.golangci.yml`). Key rules: + +- **depguard** blocks `math/rand` (use `math/rand/v2`), `github.com/sirupsen/logrus`, and `github.com/pkg/errors` (use `errors`/`fmt`). +- **kubeapilinter** runs only on `api/v1alpha1/*` files. Use `//nolint:kubeapilinter` with a comment for intentional deviations. +- **golines** max line length is 200 characters. +- **gofmt** rewrites `interface{}` to `any` and `a[b:len(a)]` to `a[b:]`. +- Generated files are excluded in `lax` mode. +- Test files have relaxed rules for `gocyclo`, `errcheck`, `gosec`, `forcetypeassert`, and others. + +### File Headers + +All `.go` files must include the Apache 2.0 license header from `hack/boilerplate.go.txt` (year 2025). + +### FIPS Build + +Production builds use `hack/go-fips.sh` which enables `GOEXPERIMENT=strictfipsruntime` and build tags `strictfipsruntime,openssl` when the Go compiler supports it. Local dev builds without FIPS work but cannot be used in CI/production. + +## Naming Conventions + +### Go Packages + +Controller packages use `snake_case`: `external_secrets`, `external_secrets_manager`, `crd_annotator`. This matches kubebuilder conventions. + +### Constants + +- Controller names: `"external-secrets-controller"`, `"external-secrets-manager"`, `"crd-annotator"` (kebab-case in strings). +- Finalizer format: `./` (e.g., `externalsecretsconfigs.operator.openshift.io/external-secrets-controller`). +- Asset name constants: `_AssetName` in camelCase (e.g., `controllerDeploymentAssetName`). +- Environment variable constants: all-caps with suffix `EnvVarName` (e.g., `externalSecretsImageEnvVarName`). + +### Bindata YAML Files + +Files in `bindata/external-secrets/resources/` follow the pattern: `_.yml` (e.g., `deployment_external-secrets.yml`, `clusterrole_external-secrets-controller.yml`). Network policies in `bindata/external-secrets/` use `.yaml` extension. + +### CRD Object Names + +Both `ExternalSecretsConfig` and `ExternalSecretsManager` are singletons named `"cluster"` (enforced by CEL). Constants `ExternalSecretsConfigObjectName` and `ExternalSecretsManagerObjectName` in `pkg/controller/common/constants.go` hold this value. + +## Architectural Patterns + +### Reconciler Structure + +Every controller follows the same pattern: + +1. A `Reconciler` struct embedding `operatorclient.CtrlClient`. +2. A `New(ctx, mgr)` constructor that builds the reconciler and client(s). +3. A `SetupWithManager(mgr)` method that wires watches, predicates, and map functions. +4. A `Reconcile(ctx, req)` method that fetches the primary CR and delegates to `processReconcileRequest`. +5. An `updateStatus(ctx, obj)` method using `retry.RetryOnConflict`. + +### CtrlClient Interface + +All controllers interact with Kubernetes through `pkg/controller/client.CtrlClient`, not the raw `controller-runtime client.Client`. This interface adds `UpdateWithRetry`, `StatusUpdate`, and `Exists` methods. Unit tests use counterfeiter-generated fakes (`pkg/controller/client/fakes/`). + +To regenerate fakes after changing the interface: `go generate ./pkg/controller/client/...` + +### Resource Reconciliation Pattern + +For each resource type (deployments, services, RBAC, etc.), the same flow is used: + +1. Decode static YAML from bindata (`common.Decode*ObjBytes(assets.MustAsset(...))`). +2. Mutate the decoded object (set namespace, labels, annotations, images, env vars, security context). +3. Check existence with `r.Exists()`. +4. If new: `r.Create()`. If existing: compare with `common.HasObjectChanged()`, then `r.UpdateWithRetry()` if drifted. +5. Record Kubernetes events for create/update operations. +6. Wrap errors with `common.FromClientError()`. + +### Conditional Resources + +Resources that depend on CR configuration use a slice of `{assetName string, condition bool}` structs. Only assets with `condition: true` are applied. Follow this pattern when adding new conditionally-created resources. + +## Common Pitfalls + +1. **Never return both `RequeueAfter` and a non-nil error** from `Reconcile`. Return one or the other. +2. **Never edit generated files by hand**: `zz_generated.deepcopy.go`, `config/crd/bases/*.yaml`, `pkg/operator/assets/bindata.go`, `config/rbac/role.yaml`, `docs/api_reference.md`. Always use `make update`. +3. **Always run `make update && make verify`** after any code change. CI will reject PRs with stale generated files. +4. **Use the cached client for managed resources** (those with `app: external-secrets` label). Use the uncached client only for external resources like cert-manager Issuers or user-provided Secrets. +5. **Add new watched resources to both `controllerManagedResources` and `buildCacheObjectList()`** in `pkg/controller/external_secrets/controller.go`. +6. **Add new resource types to `HasObjectChanged`** in `pkg/controller/common/utils.go` with field-level comparison, not full `DeepEqual`. +7. **Decode helpers panic on failure** -- this is intentional for build-time-constant assets. Do not add error handling around `Decode*ObjBytes` calls. +8. **RBAC markers** (`+kubebuilder:rbac`) go in `pkg/controller/external_secrets/controller.go` for the operator's own permissions. Operand RBAC is in static YAML under `bindata/`. +9. **The `go.work` workspace** means you cannot use `-mod=vendor` for `go test` or `go fmt`. The Makefile already clears `GOFLAGS` for affected targets. +10. **Container tool defaults to `podman`** (`CONTAINER_TOOL ?= podman`), not Docker. Override with `CONTAINER_TOOL=docker` if needed. + +## PR and Contribution Expectations + +- Run `make lint` and `make test` locally before submitting. +- Run `make verify` to ensure generated files are in sync. +- Add unit tests for new reconciliation logic using table-driven tests and `FakeCtrlClient`. +- Add API test cases in `api/v1alpha1/tests//` for any new CRD field or validation rule. +- Add E2E test cases with appropriate Ginkgo labels for platform-specific tests. +- Follow existing error wrapping patterns: `common.FromClientError` for API calls, `common.NewIrrecoverableError` for config validation failures. +- Commit messages should reference the relevant Jira ticket (e.g., `OCPBUGS-12345: description`). +- PR reviewers/approvers are listed in `OWNERS`. + +## Environment Variables + +The operator reads these at runtime (typically set by OLM/CSV): + +| Variable | Purpose | +|---|---| +| `RELATED_IMAGE_EXTERNAL_SECRETS` | Container image for the external-secrets operand | +| `RELATED_IMAGE_BITWARDEN_SDK_SERVER` | Container image for the Bitwarden SDK server | +| `OPERAND_EXTERNAL_SECRETS_IMAGE_VERSION` | Version label for operand resources | +| `BITWARDEN_SDK_SERVER_IMAGE_VERSION` | Version label for Bitwarden resources | +| `OPERATOR_IMAGE_VERSION` | Operator version string | +| `HTTP_PROXY`, `HTTPS_PROXY`, `NO_PROXY` | Proxy fallback from OLM environment | + +For local development, set at minimum `RELATED_IMAGE_EXTERNAL_SECRETS` via `t.Setenv()` in tests or shell export for `make run`. diff --git a/CLAUDE.md b/CLAUDE.md new file mode 100644 index 000000000..588f94e42 --- /dev/null +++ b/CLAUDE.md @@ -0,0 +1,51 @@ +# CLAUDE.md + +@AGENTS.md + +## Build and Test Commands + +When working in this repository, use these commands: + +### Before Committing +```bash +make update && make verify +``` +This regenerates all code, manifests, and bindata, then runs verification checks. CI will reject PRs with stale generated files. + +### Development Workflow +```bash +make build # Full build with codegen +make build-operator # Fast rebuild (binary only, no codegen) +make test # Run unit + API tests (no cluster needed) +make lint # Run golangci-lint +make lint-fix # Auto-fix linting issues +``` + +### Testing +```bash +make test-unit # Unit tests only +make test-apis # API validation tests (envtest) +make test-e2e # E2E tests (requires cluster) +``` + +### Dependency Management +```bash +make update-vendor # Sync vendor across all workspace modules +make update-dep PKG=package@version # Update a dependency in all modules +``` + +## Claude Code Preferences + +- Always run `make update && make verify` after code changes that affect: + - CRD definitions (`api/v1alpha1/`) + - Kubebuilder markers (`+kubebuilder:*`) + - Bindata YAML (`bindata/`) + - Generated code triggers + +- Use `make lint-fix` to automatically fix formatting and linting issues before suggesting manual fixes + +- The repository uses a Go workspace (`go.work`) with 4 modules. Never manually edit `GOFLAGS` or use `-mod=vendor` in test/fmt commands — the Makefile handles this + +- Container builds default to `podman`. Override with `CONTAINER_TOOL=docker` if needed + +- All build-time tools are vendored. Do not suggest installing tools globally diff --git a/docs/api-contracts-guidelines.md b/docs/api-contracts-guidelines.md new file mode 100644 index 000000000..708535ca1 --- /dev/null +++ b/docs/api-contracts-guidelines.md @@ -0,0 +1,157 @@ +# API Contracts Guidelines + +## API Group and Versioning + +- Group: `operator.openshift.io`, Version: `v1alpha1`. All types live in `api/v1alpha1/`. +- Only one version exists. The package-level marker `+groupName=operator.openshift.io` in `groupversion_info.go` sets the group. +- Every new type must be registered via `SchemeBuilder.Register()` in its `init()` function within the types file. + +## CRD Resources + +Two cluster-scoped singletons exist: + +| Kind | Plural | Short Names | Purpose | +|---|---|---|---| +| ExternalSecretsConfig | externalsecretsconfigs | esc, externalsecretsconfig, esconfig | Configures the external-secrets operand | +| ExternalSecretsManager | externalsecretsmanagers | esm, externalsecretsmanager, esmanager | Global operator-level config, auto-created at install | + +Both are enforced singletons via CEL: `self.metadata.name == 'cluster'`. Both use `+genclient:nonNamespaced` and `scope=Cluster`. + +## Required Kubebuilder Markers on CRD Types + +Every root CRD type must carry these markers: +``` ++genclient ++genclient:nonNamespaced ++k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object ++kubebuilder:object:root=true ++kubebuilder:subresource:status ++kubebuilder:resource:path=,scope=Cluster,categories={external-secrets-operator, external-secrets},shortName= ++kubebuilder:metadata:labels={"app.kubernetes.io/name=", "app.kubernetes.io/part-of=external-secrets-operator"} ++operator-sdk:csv:customresourcedefinitions:displayName="" +``` + +List types need `+kubebuilder:object:root=true` and `+k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object`. + +## Field-Level Validation Conventions + +### String Fields +- Set `+kubebuilder:validation:MinLength` and `+kubebuilder:validation:MaxLength` on spec string fields (e.g., 1-253 for names, 0-2048 for URLs, 0-4096 for noProxy). +- Status fields (e.g., `ExternalSecretsImage`, `BitwardenSDKServerImage`) and custom condition message fields typically omit validation markers. +- Use `+kubebuilder:validation:Pattern` for key-format constraints (e.g., ConfigMap keys: `^[-._a-zA-Z0-9]+$`). + +### Map Fields +- Set `+kubebuilder:validation:MinProperties` and `+kubebuilder:validation:MaxProperties` on spec maps (typical max: 20 for labels/annotations, 50 for nodeSelector). +- Set `+mapType=granular` for strategic-merge-patchable maps; `+mapType=atomic` for replace-on-update maps. + +### List Fields +- Set `+kubebuilder:validation:MinItems` and `+kubebuilder:validation:MaxItems` on spec lists. +- Status lists (e.g., `Conditions`) typically omit MinItems/MaxItems validation. +- For merge-patchable lists: use `+listType=map` with `+listMapKey=`. Composite keys are supported (e.g., networkPolicies uses both `name` and `componentName`). +- For replace-on-update lists: use `+listType=atomic`. +- Status subresource lists use `+patchMergeKey=` and `+patchStrategy=merge` in addition to listType markers for proper merge behavior. + +### Numeric Fields +- Use `+kubebuilder:validation:Minimum` and `+kubebuilder:validation:Maximum` (e.g., logLevel: 1-5, revisionHistoryLimit: 1-50). + +### Enum Fields +- Use `+kubebuilder:validation:Enum` for closed sets. Define a named Go type with exported constants (e.g., `Mode` with `Enabled`/`Disabled`, `ManagementState` with `Managed`/`Unmanaged`, `ComponentName` with four values). + +## CEL Validation Rules (XValidation) + +CEL rules via `+kubebuilder:validation:XValidation` are the primary mechanism for cross-field and immutability validation. Patterns used: + +- **Singleton enforcement**: `rule="self.metadata.name == 'cluster'"` on the root type. +- **Immutability**: `rule="self == oldSelf"` on individual fields (e.g., certManager `mode`, `issuerRef`, `injectAnnotations`). +- **List key immutability**: `rule="oldSelf.all(op, self.exists(p, p.name == op.name && p.componentName == op.componentName))"` to prevent renaming list entries. +- **Cross-field dependencies**: Complex rules on spec structs to enforce "if X is enabled, then Y or Z must be configured." +- **Annotation domain blocklists**: Regex-based `self.all(key, !key.matches(...))` rules blocking `kubernetes.io/`, `openshift.io/`, `k8s.io/`, `cert-manager.io/` prefixes. +- **Reserved env var names**: `self.all(e, !['PREFIX_'].exists(p, e.name.startsWith(p)) && e.name != 'EXACT_NAME')`. + +Always provide a `message` on every XValidation rule. + +## kubeapilinter Nolint Directives + +The repo uses `//nolint:kubeapilinter` comments for intentional deviations. Document the reason inline. Known patterns: +- `listMapKey` fields must NOT have `omitempty` (for proper patch merge identification). Apply `//nolint:kubeapilinter // is a listMapKey and must not have omitempty`. +- Custom `Condition` type (on ExternalSecretsManager) that intentionally omits some `metav1.Condition` fields. +- `metav1.Duration` fields retained despite linter preference, annotated with `// Duration type retained to avoid breaking API change`. + +## Status Subresource Patterns + +### ExternalSecretsConfig Status +- Embeds `ConditionalStatus` (from `meta.go`) which holds `[]metav1.Condition` with standard merge markers (`+listType=map`, `+listMapKey=type`, `+patchMergeKey=type`, `+patchStrategy=merge`). +- Uses standard `metav1.Condition` with `ObservedGeneration` set to `esc.GetGeneration()`. +- Condition types: `Ready` and `Degraded` (defined as constants in `conditions.go`). +- Reasons: `Failed`, `Ready`, `Progressing`, `Completed` (also constants, with `Progressing` defined as constant `ReasonInProgress`). +- Both conditions are set atomically before a single status update call. +- Additional status fields: `externalSecretsImage`, `bitwardenSDKServerImage`. + +### ExternalSecretsManager Status +- Uses a custom `ControllerStatus` list (`+listType=map`, `+listMapKey=name`) tracking per-controller conditions. +- Each `ControllerStatus` contains a custom `[]Condition` type (Type, Status, Message -- no Reason or LastTransitionTime) and `ObservedGeneration`. +- Top-level `lastTransitionTime` is set on every condition change. + +### Status Update Pattern +All controllers follow the same retry-on-conflict pattern: +1. Fetch current object from API server. +2. DeepCopy the changed status into the fetched object. +3. Call `StatusUpdate` (status subresource update). +4. Wrap in `retry.RetryOnConflict(retry.DefaultRetry, ...)`. + +## Defaults + +Use `+kubebuilder:default` for server-side defaulting: +- `logLevel`: 1 +- `mode`: `Disabled` +- `injectAnnotations`: `"false"` +- `certificateCheckInterval`: `"5m"` +- `certificateDuration`: `"8760h"` +- `certificateRenewBefore`: `"30m"` +- `revisionHistoryLimit`: 10 +- `networkPolicyProvisioning`: `Managed` +- `key` (ConfigMapKeyReference): `"ca-bundle.crt"` + +## Shared Types and Composition + +- `CommonConfigs` (in `meta.go`) is embedded via `json:",inline"` in both `ApplicationConfig` and `GlobalConfig`. It provides `logLevel`, `resources`, `affinity`, `tolerations`, `nodeSelector`, `proxy`. +- `ObjectReference`, `SecretReference`, `ConfigMapKeyReference`, `ConditionalStatus` are reusable building blocks in `meta.go`. +- Named types (`Mode`, `ManagementState`, `ComponentName`) with constants must be used instead of raw strings for enum fields. + +## Code Generation Pipeline + +After any API type change, run: +1. `make generate` -- regenerates `zz_generated.deepcopy.go` via controller-gen. +2. `make manifests` -- regenerates CRD YAMLs in `config/crd/bases/`, RBAC, and webhook configs. +3. `make update` -- full pipeline: generate + manifests + operand manifests + bindata + bundle + docs. +4. `make docs` -- regenerates `docs/api_reference.md` from `api/v1alpha1/` using `crd-ref-docs`. +5. `make verify` -- runs `check-git-diff` to ensure all generated files are committed. + +Never edit `zz_generated.deepcopy.go` or CRD YAML files by hand. + +## API Integration Tests + +API validation is tested via declarative YAML test suites in `api/v1alpha1/tests//`. Each `.testsuite.yaml` file defines `onCreate` and `onUpdate` test cases specifying `initial`, `expected`, and `expectedError` YAML. The test framework (in `test/apis/`) installs CRDs into a real envtest API server (requires Kube >= 1.25 for CEL). + +When adding a new CEL validation rule or field constraint, add corresponding test cases to the relevant `.testsuite.yaml` covering: +- Valid creation (initial + expected) +- Invalid creation (initial + expectedError with substring match) +- Immutability on update (initial + updated + expectedError) +- Boundary values (min/max lengths, min/max items) + +Run API tests with `make test-apis` or via the general `make test`. + +## Webhook Architecture + +This operator does NOT use kubebuilder-generated admission webhooks for its own CRDs. CEL-based CRD validation handles all admission validation. The webhook code in `pkg/controller/external_secrets/` manages the upstream external-secrets operand's `ValidatingWebhookConfiguration` resources (not the operator's own API validation). + +## Adding New API Fields Checklist + +1. Add the Go field to the appropriate struct in `api/v1alpha1/`. +2. Add validation markers (min/max, enum, XValidation) for spec fields; status fields typically omit validation. +3. Use pointer types for optional sub-structs; value types for required scalars with defaults. +4. Add `+optional` or `+required` marker. Use `json:"fieldName,omitempty"` for optional fields. +5. Exception: listMapKey fields must NOT have `omitempty`; add `//nolint:kubeapilinter` with reason. +6. Write test cases in the corresponding `.testsuite.yaml` for both valid and invalid inputs. +7. Run `make update && make verify` to regenerate and validate all artifacts. +8. If the field adds a new condition type or reason, add constants in `conditions.go`. diff --git a/docs/error-handling-guidelines.md b/docs/error-handling-guidelines.md new file mode 100644 index 000000000..2eb97326c --- /dev/null +++ b/docs/error-handling-guidelines.md @@ -0,0 +1,137 @@ +# Error Handling Guidelines + +## Custom Error Type: `ReconcileError` + +Resource reconciliation errors flow through the `ReconcileError` type defined in `pkg/controller/common/errors.go`. It carries a `Reason` (`ErrorReason` string) that determines requeue behavior, a human-readable `Message`, and the underlying `Err`. Setup errors (finalizers, client construction, CR fetching) use plain `fmt.Errorf` wrapping as described below. + +### Two Error Reasons + +| Reason | Constant | Requeue? | When to use | +|---|---|---|---| +| `IrrecoverableError` | `common.IrrecoverableError` | No | Invalid config, missing env vars, permission errors, bad requests | +| `RetryRequiredError` | `common.RetryRequiredError` | Yes (30s) | Transient network issues, resource conflicts, timeouts, not-found | + +### Constructor Functions + +- `common.NewIrrecoverableError(err, messageFmt, args...)` -- for errors that cannot be fixed by retrying. +- `common.NewRetryRequiredError(err, messageFmt, args...)` -- for transient errors worth retrying. +- `common.FromClientError(err, messageFmt, args...)` -- auto-classifies Kubernetes API errors: `Unauthorized`, `Forbidden`, `Invalid`, and `BadRequest` become irrecoverable; everything else becomes retry-required. +- All three return `nil` when the input `err` is `nil`. Always check this at call sites when the constructor is the last expression before return. + +### Checking Error Type + +Use `common.IsIrrecoverableError(err)` to check whether an error is irrecoverable. It uses `errors.As` to traverse wrapped error chains. There is no corresponding `IsRetryRequiredError` -- the convention is: if it is not irrecoverable and is a `ReconcileError`, it is retryable. + +## Error-to-Reconcile-Result Mapping + +The main reconcile dispatcher in `processReconcileRequest` (in `pkg/controller/external_secrets/controller.go`) maps errors to `ctrl.Result` as follows: + +``` +err == nil -> Result{}, nil (success, no requeue) +IsIrrecoverableError -> Result{}, errUpdate (no requeue; only status update error propagates) +retryable error -> Result{RequeueAfter: 30s}, nil (manual requeue, no error to controller-runtime) +status update failure -> Result{}, errUpdate (let controller-runtime handle backoff) +NotFound on primary CR -> Result{}, nil (skip reconciliation silently) +``` + +Key rule: never return both `RequeueAfter` and a non-nil error. For recoverable errors, return `RequeueAfter` with `nil` error. For irrecoverable errors, return empty `Result` so no requeue happens. + +The default requeue interval is `common.DefaultRequeueTime` (30 seconds), defined in `pkg/controller/common/constants.go`. + +## Wrapping Errors from Kubernetes API Calls + +For any Kubernetes client operation (Get, Create, Update, Delete, Patch, Exists), wrap the error with `common.FromClientError`: + +```go +if err := r.Create(r.ctx, obj); err != nil { + return common.FromClientError(err, "failed to create %s deployment resource", name) +} +``` + +This is the standard pattern across all resource reconciliation files (rbacs.go, deployments.go, services.go, secret.go, certificate.go, configmap.go, networkpolicy.go, validatingwebhook.go, serviceaccounts.go). `FromClientError` auto-classifies the API error so callers do not manually pick irrecoverable vs retryable. + +For non-API errors that are definitively unrecoverable (e.g., missing environment variable, invalid configuration, failed validation), use `common.NewIrrecoverableError` directly: + +```go +return common.NewIrrecoverableError( + fmt.Errorf("ENV_VAR not set"), + "failed to update image in %s deployment object", name, +) +``` + +## `fmt.Errorf` with `%w` for Non-Reconcile Errors + +For errors outside the reconcile-result classification path (setup, client construction, utility functions), use standard `fmt.Errorf` with `%w` wrapping: + +```go +return fmt.Errorf("failed to create uncached client: %w", err) +``` + +Do not wrap these in `ReconcileError` -- they propagate as plain errors up through controller setup or `Reconcile` return. + +## Retry Logic + +### `UpdateWithRetry` (Client-Level Retry) + +All resource updates that may hit conflicts use `r.UpdateWithRetry(ctx, obj)` instead of `r.Update(ctx, obj)`. This method (in `pkg/controller/client/client.go`) wraps `retry.RetryOnConflict(retry.DefaultRetry, ...)`, re-fetching the latest resource version before each attempt. + +Use `UpdateWithRetry` for updating managed resources (Deployments, RBAC, Services, etc.) and for finalizer updates. + +### `retry.RetryOnConflict` (Status Updates) + +Status subresource updates use `retry.RetryOnConflict(retry.DefaultRetry, ...)` directly. The pattern is: fetch latest, deep-copy desired status into it, then call `StatusUpdate`. This appears identically in all three controllers. + +### `retry.OnError` with Custom Predicate + +The ESM default resource creation (`pkg/controller/external_secrets_manager/externalsecretsmanager.go`) uses `retry.OnError` with a custom `shouldRetryOnError` predicate that stops retrying on `AlreadyExists`, `Conflict`, `Invalid`, `BadRequest`, `Unauthorized`, `Forbidden`, and `TooManyRequests`. + +## Status Condition Updates + +### Condition Types + +| Type | Defined In | Used By | +|---|---|---| +| `Degraded` | `api/v1alpha1/conditions.go` | external_secrets controller | +| `Ready` | `api/v1alpha1/conditions.go` | external_secrets controller | +| `UpdateAnnotation` | `api/v1alpha1/conditions.go` | crd_annotator controller | + +### Condition Update Rules + +1. On irrecoverable error: set `Degraded=True/Failed` and `Ready=False/Ready`. +2. On retryable error: set `Degraded=False/Ready` and `Ready=False/Progressing` with the error message. +3. On success: set `Degraded=False/Ready` and `Ready=True/Ready`. +4. Set both conditions atomically via `apimeta.SetStatusCondition` before calling `updateStatus`. +5. Only call `updateStatus` when `SetStatusCondition` returns `true` (condition actually changed). +6. Always include `ObservedGeneration` from the CR's current `.metadata.generation`. + +### Error Aggregation on Status Update Failure + +When the status update itself fails alongside a reconciliation error, aggregate both using `utilerrors.NewAggregate([]error{err, errUpdate})`. This pattern exists in both the external_secrets and crd_annotator controllers. + +## Kubernetes Event Recording + +- Use `corev1.EventTypeNormal` + reason `"Reconciled"` for successful create/update of resources. +- Use `corev1.EventTypeWarning` + reason `"ResourceAlreadyExists"` when a resource exists during initial creation reconciliation. +- Use `corev1.EventTypeWarning` + reason `"RemoveDeployment"` on CR deletion. +- Use `corev1.EventTypeWarning` + reason `"Read"` on fetch failures (ESM controller). +- Event messages must include the resource name (namespace/name format). + +## Logging Conventions + +- `r.log.V(1).Info(...)` -- operational events (reconcile start, resource modifications, label/annotation changes). +- `r.log.V(4).Info(...)` -- detailed debug (resource already in expected state, status update attempts, metadata builds). +- `r.log.Error(err, "message")` -- always log the error at the point it occurs in `reconcileExternalSecretsDeployment`, before returning it upward. The top-level `processReconcileRequest` also logs the error again with `r.log.Error(err, "failed to reconcile external-secrets deployment")`. +- `ctrl.Log.V(1).WithName("subsystem")` -- for setup-time logging outside reconciler instances (cache setup, CRD discovery). +- Use structured key-value pairs (`"name"`, `"request"`, `"key"`, `"namespace"`, `"error"`, `"installed"`), never string interpolation in log messages. + +## `Exists` Helper + +The `CtrlClient.Exists(ctx, key, obj)` method converts `NotFound` errors to `(false, nil)`, passing all other errors through. Use this instead of raw `Get` + manual `IsNotFound` checks when you only need existence. + +## Panics (Decode Helpers Only) + +The `Decode*ObjBytes` functions in `pkg/controller/common/utils.go` panic on decode failure or type assertion failure. These are called only with compile-time-known static assets (`assets.MustAsset`), so panics indicate a build-time bug, not a runtime error. Never use panic for runtime error handling elsewhere. + +## NotFound Handling in Reconcile + +When the primary CR (`ExternalSecretsConfig` or `ExternalSecretsManager`) is not found, return `ctrl.Result{}, nil` -- do not requeue. Log at V(1) and skip reconciliation. When a secondary/dependent CR is not found, either skip gracefully (ESM looking for ESC) or wrap in `fmt.Errorf` and return for requeue. diff --git a/docs/integration-guidelines.md b/docs/integration-guidelines.md new file mode 100644 index 000000000..2f9692502 --- /dev/null +++ b/docs/integration-guidelines.md @@ -0,0 +1,95 @@ +# Integration Guidelines + +## Architecture Overview + +This operator does NOT embed or fork the upstream external-secrets project. It manages the upstream operand as a set of static YAML manifests (bindata) that are decoded at runtime, mutated with operator-controlled labels/annotations/env/images, and applied to the cluster. There is no Helm chart integration; resource creation is fully imperative via controller-runtime `Create`/`Update`. + +## CRD Hierarchy and Singleton Pattern + +- **ExternalSecretsConfig** (`externalsecretsconfigs.operator.openshift.io`): primary CR, singleton named `cluster`. Controls operand installation, cert-manager toggle, Bitwarden plugin, proxy, network policies, and per-component overrides. +- **ExternalSecretsManager** (`externalsecretsmanagers.operator.openshift.io`): global config CR, also singleton named `cluster`. Provides lower-priority defaults (labels, resources, proxy, tolerations, affinity, nodeSelector, logLevel) that ExternalSecretsConfig overrides. +- Precedence chain for shared fields: `ExternalSecretsConfig > ExternalSecretsManager > OLM environment variables` (proxy only). + +## Static Manifest (Bindata) Pattern + +All operand Kubernetes resources live as YAML in `bindata/external-secrets/` and `bindata/external-secrets/resources/`. They are compiled into Go via `pkg/operator/assets/bindata.go` (generated with `go-bindata`). At reconcile time: +1. Decode bytes with typed decoders in `pkg/controller/common/utils.go` (e.g., `DecodeDeploymentObjBytes`, `DecodeServiceObjBytes`). +2. Mutate the decoded object: set namespace, apply labels/annotations via `common.ApplyResourceMetadata`, update container images/args, inject proxy env vars, mount CA bundles. +3. Check existence with `r.Exists()`, compare with `common.HasObjectChanged()`, then `Create` or `UpdateWithRetry`. + +When adding a new resource: add the YAML to `bindata/`, add a constant in `constants.go` mapping the asset path, regenerate bindata, and follow the existing `createOrApply*` pattern. + +## Reconciliation Order + +Resources are created in strict dependency order within `reconcileExternalSecretsDeployment`: +Namespace -> NetworkPolicies -> ServiceAccounts -> Certificates -> Secrets -> TrustedCA ConfigMap -> RBAC -> Services -> Deployments -> ValidatingWebhooks -> CR annotation tracking. + +Never reorder. CR annotation tracking (managed-annotations) is patched last to ensure obsolete annotations are removed from resources before tracking is updated. + +## Conditional Resource Creation + +Many resources are conditionally created based on CR spec. The pattern uses a slice of `{assetName, condition}` structs: +- **cert-controller deployment/service**: created when cert-manager is NOT enabled (`!isCertManagerConfigEnabled(esc)`) +- **bitwarden deployment/service/certificate/network-policy**: created when Bitwarden IS enabled (`isBitwardenConfigEnabled(esc)`) +- **webhook TLS secret**: created only when cert-manager is NOT enabled (cert-manager manages the secret otherwise, named `external-secrets-webhook-cm` to avoid collision) + +## cert-manager Integration + +- cert-manager is an optional dependency detected at startup via CRD discovery (`isCRDInstalled` checks `cert-manager.io/v1` `certificates` resource). +- When detected, the Certificate CRD informer is registered with the manager cache. When not detected, the cache omits it entirely to prevent startup failures. +- The `CertManagerConfig` in ExternalSecretsConfig spec requires an `issuerRef` (Issuer or ClusterIssuer). The operator validates the issuer exists via `assertIssuerRefExists` using the uncached client before creating Certificate resources. +- When cert-manager is enabled with `injectAnnotations: "true"`, the `crd-annotator` controller patches all ESO CRDs with `cert-manager.io/inject-ca-from: external-secrets/external-secrets-webhook`. +- Webhook ValidatingWebhookConfigurations get the `cert-manager.io/inject-ca-from` annotation via `withCertManagerAnnotation()`, tracked in managed-annotations for proper lifecycle. + +## OpenShift Platform Integrations + +- **Trusted CA bundle**: when proxy is configured, a ConfigMap `external-secrets-trusted-ca-bundle` is created with label `config.openshift.io/inject-trusted-cabundle: "true"`. OpenShift's Cluster Network Operator (CNO) injects cluster CA certs into it. The ConfigMap data is owned by CNO; the operator only manages labels/annotations. Mounted at `/etc/pki/tls/certs` in all containers. +- **Proxy**: both uppercase and lowercase variants (`HTTP_PROXY`/`http_proxy`, etc.) are set on all containers and init containers. Proxy is removed cleanly when config is cleared. +- **Network policies**: default deny-all is always applied. Static allow policies for API server egress, webhook, DNS, and optionally cert-controller/bitwarden are applied from bindata. Users add custom egress-only policies via `controllerConfig.networkPolicies` with component targeting. +- **Security context**: all containers get hardened `SecurityContext` (non-root, read-only root FS, drop ALL capabilities, seccomp RuntimeDefault). +- **Console capability annotation**: ConsoleYAMLSample resources require `capability.openshift.io/name: Console` annotation. + +## Bitwarden Plugin Integration + +- Bitwarden is the only currently supported plugin. Enabled via `spec.plugins.bitwardenSecretManagerProvider.mode: Enabled`. +- Requires either a `secretRef` (pre-existing TLS secret with keys `tls.crt`, `tls.key`, `ca.crt`) or cert-manager configuration for automated TLS. +- The bitwarden-sdk-server runs as a separate deployment in the operand namespace. Its image comes from env var `RELATED_IMAGE_BITWARDEN_SDK_SERVER`. +- Network policies must explicitly allow egress from the core controller to bitwarden-sdk-server on port 9998. +- ClusterSecretStore for Bitwarden requires: `bitwardenServerSDKURL`, `caBundle` (base64 CA cert), `organizationID`, `projectID`, and auth `secretRef.credentials` pointing to an access token secret. + +## Image Management + +Container images in static YAML manifests (`bindata/`) have placeholder values that are replaced at runtime. The reconciler reads images from environment variables set on the operator pod (typically by OLM/CSV): +- `RELATED_IMAGE_EXTERNAL_SECRETS`: external-secrets operand image +- `RELATED_IMAGE_BITWARDEN_SDK_SERVER`: bitwarden-sdk-server image +- `OPERAND_EXTERNAL_SECRETS_IMAGE_VERSION`: version label for operand resources +- `BITWARDEN_SDK_SERVER_IMAGE_VERSION`: version label for bitwarden resources +- `OPERATOR_IMAGE_VERSION`: operator version + +The reconciler returns an irrecoverable error if image env vars are empty. + +## Client Architecture + +- **Cached client** (`r.CtrlClient`): reads from manager cache filtered by `app=external-secrets` label selector. Used for all managed resources. +- **Uncached client** (`r.UncachedClient`): reads directly from API server. Used for objects NOT managed by the controller (issuer validation, secret ref existence checks). +- **UpdateWithRetry**: all mutations use `UpdateWithRetry` which fetches latest resourceVersion before update, wrapped in `retry.RetryOnConflict`. + +## Label and Annotation Conventions + +- All managed resources carry `app: external-secrets` label (cache filter key). +- Default labels: `app`, `app.kubernetes.io/version`, `app.kubernetes.io/managed-by: external-secrets-operator`, `app.kubernetes.io/part-of: external-secrets-operator`. +- Disallowed label prefixes (rejected silently): `app.kubernetes.io/`, `external-secrets.io/`, `rbac.authorization.k8s.io/`, `servicebinding.io/controller`, `app`. +- Disallowed annotation domain prefixes (rejected by CRD CEL validation): `kubernetes.io/`, `k8s.io/`, `openshift.io/`, `cert-manager.io/`. +- Managed annotation keys are tracked in a base64-encoded JSON array on the CR annotation `externalsecretsconfig.operator.openshift.io/managed-annotations`. + +## Adding a New Provider Plugin + +Follow the Bitwarden pattern: +1. Add new `Mode`-gated fields to `PluginsConfig` in `api/v1alpha1/external_secrets_config_types.go`. +2. Add static YAML manifests (deployment, service, service account, network policy) to `bindata/external-secrets/resources/`. +3. Add asset name constants to `constants.go`. +4. Add conditional entries to `createOrApplyDeployments`, `createOrApplyServices`, `createOrApplyServiceAccounts`, `createOrApplyNetworkPolicies` using the `{assetName, condition}` struct pattern. +5. Add image env var constants and wire them into `getDeploymentObject`. +6. Register the new component name in the `ComponentName` enum for network policy targeting. +7. Add the container name mapping in `getComponentNameFromAsset`. +8. Add e2e tests with appropriate Ginkgo labels and credential secret conventions. diff --git a/docs/performance-guidelines.md b/docs/performance-guidelines.md new file mode 100644 index 000000000..cbd4c439c --- /dev/null +++ b/docs/performance-guidelines.md @@ -0,0 +1,92 @@ +# Performance Guidelines + +## Cache Architecture + +### Manager-Level Label-Filtered Cache +The operator uses a single manager-level cache with per-object label selectors configured via `NewCacheBuilder()` in `pkg/controller/external_secrets/controller.go`. All controller-managed resources (Deployments, Services, RBAC, etc.) are filtered by `app=external-secrets` label. Own CRs (`ExternalSecretsConfig`, `ExternalSecretsManager`) are cached without label filters. + +- When adding a new watched resource type, register it in both `controllerManagedResources` (line ~68) and `buildCacheObjectList()` with the appropriate label selector. +- Never create a second manager-level cache for the main controller. The `crd_annotator` controller is the only one that builds its own custom cache (`BuildCustomClient`), because it watches a disjoint resource set (`CustomResourceDefinition` with different label selectors). +- Set `ReaderFailOnMissingInformer: true` on custom caches (as `crd_annotator` does) to fail fast on missing informers rather than silently returning empty results. + +### Uncached Client Usage +The operator maintains a separate uncached `CtrlClient` (`r.UncachedClient`) for objects not tracked by the cache, specifically cert-manager Issuers/ClusterIssuers and user-provided Secrets (Bitwarden secretRef). Use uncached reads only for: +- Validating existence of external resources (e.g., `assertIssuerRefExists`, `assertSecretRefExists`) +- One-time bootstrap operations (e.g., `CreateDefaultESMResource` in `setup_manager.go`) + +Never use the uncached client for objects in `controllerManagedResources` -- the cache is designed for those. + +### Conditional CRD Cache Registration +Before creating the cache, `NewCacheBuilder` checks whether the cert-manager CRD exists via `isCRDInstalled`. If absent, `certmanagerv1.Certificate` is excluded from the cache object list to prevent startup failure. When the CRD exists, an informer is explicitly registered via `mgr.GetCache().GetInformer()`. Follow this pattern for any future optional CRD dependencies. + +## Watch and Predicate Patterns + +### Predicate Strategy by Resource Type +The controller uses different predicate combinations per resource type to minimize reconciliation: + +| Resource | Watch Method | Predicates | Rationale | +|---|---|---|---| +| ExternalSecretsConfig (primary) | `For()` | `GenerationChangedPredicate` | Ignore status-only updates | +| Deployments | `Watches()` | `GenerationChangedPredicate` + `managedResources` | Skip status/replica updates | +| Secrets | `WatchesMetadata()` | `LabelChangedPredicate` | Avoid caching Secret data in memory | +| Other managed resources | `Watches()` | `managedResources` (label filter) | Only watch operator-owned objects | +| ExternalSecretsManager | `Watches()` | `GenerationChangedPredicate` + `managedResources` | Skip status-only changes | +| CRDs (crd_annotator) | `WatchesMetadata()` | `AnnotationChangedPredicate` + label filter | Only metadata matters | + +When adding new watched resources, always apply at minimum the `managedResources` predicate to avoid reconciling unrelated objects. Use `WatchesMetadata()` when only labels/annotations matter -- it avoids fetching full object bodies. + +### Map Function Convention +All controllers map events to a single reconcile key: the CR name (`common.ExternalSecretsConfigObjectName` = "cluster"). The map function checks `obj.GetLabels()[requestEnqueueLabelKey] == requestEnqueueLabelValue` and returns an empty slice for non-matching objects. Never enqueue multiple requests from a single event. + +## Reconciliation Patterns + +### Error Classification and Requeue Strategy +The operator classifies errors into two categories via `common.ReconcileError`: + +- **IrrecoverableError**: Config validation failures, permission errors, bad requests. Returns `ctrl.Result{}` with no requeue. Examples: missing CRD, invalid cert-manager config, missing env var. +- **RetryRequiredError**: Transient API failures, conflicts. Requeues with `DefaultRequeueTime` (30s). The helper `common.FromClientError()` auto-classifies based on API error type. + +Convention: never return both a `RequeueAfter` result and an error simultaneously. Return the error alone and let controller-runtime handle backoff, or return `RequeueAfter` with nil error. + +### Condition Update Optimization +Status conditions are updated only when they actually change. The pattern uses `apimeta.SetStatusCondition()` which returns a boolean, and the controller checks both `degradedChanged` and `readyChanged` before issuing a status update call. Follow this pattern to avoid unnecessary API writes. + +### CR Annotation Patching +After all resources are reconciled, CR annotations (managed-annotations tracking + processed annotation) are updated via a single `MergePatch` rather than a full object update. This reduces conflict risk and avoids overwriting spec changes made concurrently. + +## Retry and Conflict Handling + +### UpdateWithRetry Pattern +Regular object updates (non-status) use `retry.RetryOnConflict(retry.DefaultRetry, ...)`. The `UpdateWithRetry` method on `CtrlClient` follows the read-modify-write pattern: it fetches the latest `ResourceVersion`, sets it on the object, then updates. Use `UpdateWithRetry` instead of bare `Update` for any object that might be modified concurrently. + +### Status Update Pattern +Status subresource updates use `retry.RetryOnConflict(retry.DefaultRetry, ...)` directly. The pattern is: fetch latest, deep-copy desired status into it, then call `StatusUpdate`. This appears identically in all three controllers. + +### Bootstrap Resource Creation +`CreateDefaultESMResource` uses `retry.OnError` with a custom `shouldRetryOnError` function that stops on permanent errors (AlreadyExists, Conflict, Invalid, BadRequest, Unauthorized, Forbidden, TooManyRequests) and retries on transient ones. Follow this pattern for one-time resource creation at startup. + +## Concurrency Primitives + +### Resettable sync.Once (`common.Now`) +The `Now` type in `pkg/controller/common/utils.go` extends `sync.Once` with a `Reset()` method using `sync.Mutex` + `atomic.Uint32` with double-checked locking. It is used in the `external_secrets_manager` controller to emit a warning event only once per error cycle, resetting when the error resolves. Use this type when you need one-shot behavior that can be re-armed. + +No goroutines are spawned directly by controller code -- all concurrency is handled by the controller-runtime framework. Do not introduce raw goroutines in reconcile loops. + +## Drift Detection + +### HasObjectChanged Pattern +The `common.HasObjectChanged()` function uses type-specific field comparison (not full `reflect.DeepEqual` on the entire object) to detect drift. Each resource type has a dedicated comparison: +- Deployments: compares replicas, containers, volumes, affinity, tolerations, node selector, env vars individually +- RBAC: compares only Rules (Roles) or RoleRef+Subjects (Bindings) +- Services: compares Type, Ports, Selector +- Webhooks: compares individual webhook entries by name + +Annotations are compared using managed-key tracking (`annotationMapsModified`) which only checks keys the operator manages, ignoring annotations set by external controllers (e.g., `deployment.kubernetes.io/revision`). This prevents infinite reconcile loops. + +When adding a new managed resource type, add a case to the `HasObjectChanged` switch and implement field-level comparison rather than using full `DeepEqual` on the entire spec. + +## Asset Decoding +Static manifests are embedded via go-bindata (`pkg/operator/assets/bindata.go`) and decoded at reconcile time using typed `Decode*ObjBytes` helpers. Each decode call allocates a new object. These helpers `panic` on decode failure since the assets are build-time constants. Do not add error handling around these -- a panic here indicates a build or manifest corruption problem. + +## E2E Test Polling +E2E tests use `wait.PollUntilContextTimeout` with 5-second intervals and 2-minute default timeouts. Bitwarden-related tests use 4-minute timeouts due to SDK server startup latency. Follow these conventions for new e2e tests rather than using arbitrary sleep durations. diff --git a/docs/security-guidelines.md b/docs/security-guidelines.md new file mode 100644 index 000000000..af391d8a1 --- /dev/null +++ b/docs/security-guidelines.md @@ -0,0 +1,125 @@ +# Security Guidelines + +## Container Security Context (Hardened Defaults) + +All operand containers are hardened programmatically via `updateContainerSecurityContext()` in `pkg/controller/external_secrets/deployments.go`. This function is called for every container (controller, webhook, cert-controller, bitwarden-sdk-server) during deployment reconciliation. The enforced settings are: + +- `AllowPrivilegeEscalation: false` +- `Capabilities.Drop: ["ALL"]` +- `ReadOnlyRootFilesystem: true` +- `RunAsNonRoot: true` +- `RunAsUser: nil` (defers to pod-level or image default; static manifests use `1000`) +- `SeccompProfile.Type: RuntimeDefault` + +When adding a new container or deployment, always call `updateContainerSecurityContext(&container)` on the container spec. Never set `RunAsUser` to `0` or add capabilities. The reconciler drift-detects security context changes and reverts them. + +Static deployment manifests in `bindata/external-secrets/resources/` also set `hostNetwork: false` explicitly. Never change this. + +## Container Image (Non-root) + +All Dockerfiles (`Dockerfile`, `images/ci/Dockerfile`, `images/ci/operand.Dockerfile`) run as `USER 65534:65534` (nobody). Never change the USER to root or remove this line. The base image is `ubi9-minimal`. + +## HTTP/2 Disabled by Default + +In `cmd/external-secrets-operator/main.go`, HTTP/2 is disabled for both metrics and webhook servers to mitigate known HTTP/2 vulnerabilities. The `--enable-http2` flag defaults to `false`. The implementation sets `c.NextProtos = []string{"http/1.1"}` on the TLS config. Do not change this default. + +## Metrics Endpoint Security + +The metrics server uses `filters.WithAuthenticationAndAuthorization` as `FilterProvider`, enforcing authn/authz on the metrics endpoint. Secure metrics serving (HTTPS on `:8443`) is enabled by default (`--metrics-secure=true`). The OpenShift service CA (`/var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt`) is loaded for client verification. When modifying metrics configuration, preserve these protections. + +## RBAC Architecture (Three Tiers) + +The repo manages RBAC at three distinct levels. Changing one level does not affect the others. + +**1. Operator's own ClusterRole** (`config/rbac/role.yaml`): Grants permissions the operator process needs. Generated from `+kubebuilder:rbac` markers in `pkg/controller/external_secrets/controller.go`. When adding new API interactions, add markers there and run `make manifests`. + +**2. Operand ClusterRoles** (static YAML in `bindata/external-secrets/resources/`): +- `clusterrole_external-secrets-controller.yml` -- main controller (secrets CRUD, ESO CRDs) +- `clusterrole_external-secrets-cert-controller.yml` -- cert-controller (webhook configs, secrets for TLS, CRDs) +- `clusterrole_external-secrets-view.yml` -- aggregated to `view`/`edit`/`admin` (read-only ESO resources) +- `clusterrole_external-secrets-edit.yml` -- aggregated to `edit`/`admin` (mutate ESO resources) +- `clusterrole_external-secrets-servicebindings.yml` -- service bindings integration (read-only ESO resources with servicebinding.io/controller label) + +The cert-controller role uses `resourceNames` restrictions on webhook configurations (`secretstore-validate`, `externalsecret-validate`) for its update permissions. Follow this pattern of scoping when adding new permissions. + +**3. Namespace-scoped roles**: `role_external-secrets-leaderelection.yml` restricts leader election to the operand namespace. + +Key convention: The controller ClusterRole grants `secrets` verbs `get/list/watch/create/update/delete/patch`. This is inherently powerful. Never broaden this (e.g., never add `*` verbs or `*` resources). + +## RBAC Label Protection + +The regex `disallowedLabelMatcher` in `install_external_secrets.go` prevents users from overriding internal labels via `ControllerConfig.Labels`: +``` +^app.kubernetes.io\/|^external-secrets.io\/|^rbac.authorization.k8s.io\/|^servicebinding.io\/controller$|^app$ +``` +When adding new internal labels, add them to this regex. + +## Annotation Domain Restrictions + +The CRD validation (CEL rules in `external_secrets_config_types.go`) blocks user annotations with reserved domain prefixes: `kubernetes.io/`, `openshift.io/`, `k8s.io/`, `cert-manager.io/`. These rules are enforced at admission time. Do not weaken them. When the operator adds its own annotations (e.g., `cert-manager.io/inject-ca-from`), it does so programmatically, bypassing user-facing validation. + +## Environment Variable Reservation + +The `ComponentConfig.OverrideEnv` field uses CEL validation to block reserved env var names/prefixes: `KUBERNETES_*`, `EXTERNAL_SECRETS_*`, `HOSTNAME`, `SSL_CERT_DIR`, `SSL_CERT_FILE`. When adding new operator-managed env vars that users should not override, add them to the CEL rule in the CRD type definition. + +## TLS Certificate Management (Dual Mode) + +Two certificate provisioning paths exist, controlled by `spec.controllerConfig.certProvider.certManager.mode`: + +**Built-in cert-controller** (default): A separate deployment (`external-secrets-cert-controller`) manages TLS certificates for the webhook. The cert-controller has its own RBAC and creates/updates the `external-secrets-webhook` Secret. The webhook reads certs from `/tmp/certs` volume mount. + +**cert-manager integration**: When enabled, `Certificate` resources are created from templates in `bindata/`. The `issuerRef` is validated to exist before creating the Certificate (via `assertIssuerRefExists`). The webhook secret name changes to `external-secrets-webhook-cm` to avoid clashing with the cert-controller secret. The `cert-manager.io/inject-ca-from` annotation is conditionally added to webhook configurations. + +Key constraint: `certManager.mode` and `issuerRef` are immutable once set (enforced via `XValidation:rule="self == oldSelf"`). + +## Bitwarden TLS Requirements + +When `BitwardenSecretManagerProvider` is enabled, TLS certificates are mandatory. Users must provide either: +- A `secretRef` pointing to a K8s Secret with keys `tls.crt`, `tls.key`, `ca.crt` (validated via `assertSecretRefExists` using an uncached client) +- cert-manager configuration to auto-generate certificates + +This is enforced at the CRD level via CEL and in controller code. The bitwarden deployment mounts certs at `/certs` and probes use `scheme: HTTPS`. + +## Network Policy Architecture (Default-Deny) + +The operator deploys a **deny-all** base NetworkPolicy (`networkpolicy_deny-all.yaml`) that blocks all ingress and egress for all pods in the operand namespace. Specific allow-policies are then layered: + +- `allow-api-server-egress-for-main-controller-traffic` -- egress to API server (port 6443) +- `allow-api-server-and-webhook-traffic` -- egress to API server + ingress on webhook port 10250 and metrics port 8080 (from monitoring namespace only) +- `allow-api-server-egress-for-cert-controller-traffic` -- only when cert-controller is active +- `allow-api-server-egress-for-bitwarden-sever` -- only when Bitwarden is enabled (note: file has typo "sever" instead of "server") +- `allow-dns` -- DNS egress + +When adding new components or network requirements, follow this pattern: add a conditional static policy in `bindata/` and register it in `createOrApplyStaticNetworkPolicies()` with the appropriate condition. + +User-defined network policies via `spec.controllerConfig.networkPolicies` are prefixed with `eso-user-` and restricted to `Egress` policy type only. The operator auto-selects pods via component label. + +## Webhook Security Configuration + +Validating webhooks use: +- `failurePolicy: Fail` (on the ExternalSecret webhook) -- rejecting requests if the webhook is unavailable +- `sideEffects: None` +- `timeoutSeconds: 5` +- Webhook listens on port `10250` (not the default 443) + +The SecretStore webhook does not explicitly set `failurePolicy`, which defaults to `Fail` in Kubernetes. Maintain `Fail` for security-critical validations. + +## Singleton Pattern (Cluster-Scoped CRs) + +Both `ExternalSecretsConfig` and `ExternalSecretsManager` are singletons enforced via CEL: `self.metadata.name == 'cluster'`. This prevents privilege escalation through multiple conflicting configurations. Never remove this validation. + +## Trusted CA Bundle Handling + +When proxy configuration is present, a ConfigMap labeled `config.openshift.io/inject-trusted-cabundle: "true"` is created. The Cluster Network Operator (CNO) populates it with cluster-wide CA certificates. The operator mounts this at `/etc/pki/tls/certs` (Go's default cert path) as a read-only volume. The operator never writes CA data directly -- it only manages the label and lets CNO handle the content. + +## Sensitive Data in Tests + +E2E tests reference AWS credentials via a well-known Secret (`aws-creds` in `kube-system`). Credential keys are `aws_access_key_id` and `aws_secret_access_key`. These are read from the cluster, never hardcoded. When adding e2e tests for new providers, follow this pattern: reference credentials from pre-existing cluster Secrets, never embed them in test code or YAML fixtures. + +## Reconciler Drift Detection + +The operator reconciles all managed resources back to desired state. RBAC rules, deployments, webhook configurations, and network policies are compared field-by-field (via `HasObjectChanged` in `pkg/controller/common/utils.go`). If any resource is externally modified (e.g., someone manually adds permissions to a ClusterRole), the operator detects the drift and reverts it. This is a critical security property -- do not disable drift detection for security-sensitive resources. + +## Error Classification for Security + +`FromClientError` in `pkg/controller/common/errors.go` classifies `Unauthorized` and `Forbidden` API errors as `IrrecoverableError`, meaning the operator will not retry. This prevents retry storms against permission boundaries. Maintain this classification when adding new API interactions. diff --git a/docs/testing-guidelines.md b/docs/testing-guidelines.md new file mode 100644 index 000000000..270acef99 --- /dev/null +++ b/docs/testing-guidelines.md @@ -0,0 +1,124 @@ +# Testing Guidelines + +## Test Architecture + +This repo has three test tiers, each in a separate Go module boundary (the repo uses `go.work` with modules at `.`, `./test`, `./cmd/external-secrets-operator`, `./tools`): + +| Tier | Location | Make Target | Framework | Cluster Required | +|------|----------|-------------|-----------|-----------------| +| Unit | `pkg/**/*_test.go` | `make test-unit` | stdlib `testing` + table-driven | No | +| API Integration | `test/apis/` | `make test-apis` | Ginkgo/Gomega + envtest | No (envtest) | +| E2E | `test/e2e/` | `make test-e2e` | Ginkgo/Gomega + live cluster | Yes | + +`make test` runs both `test-apis` and `test-unit` (no cluster needed). + +## Unit Tests (`pkg/`) + +### Conventions +- Files live alongside source in the same package (e.g., `deployments.go` / `deployments_test.go`). +- Use stdlib `testing` package exclusively; no Ginkgo. +- Follow table-driven test pattern: define a `tests` slice of structs with `name`, `preReq`, `wantErr`, and optional validation funcs, then iterate with `t.Run`. +- No build tags; unit tests compile under normal `go test`. + +### Fake Client Pattern +- Use `pkg/controller/client/fakes/FakeCtrlClient` (generated by counterfeiter) to mock the controller-runtime client. +- Set up behavior per test via `mock.ExistsCalls(...)`, `mock.CreateCalls(...)`, `mock.UpdateWithRetryCalls(...)`, etc. +- Assign mock to `r.CtrlClient = mock` on the test reconciler. + +### Test Helpers +- `pkg/controller/external_secrets/test_utils.go` provides `testReconciler(t)`, `testDeployment(name)`, `testService(name)`, `testResourceMetadata(esc)`, and typed decode helpers (`testClusterRole`, `testSecret`, etc.). +- `pkg/controller/commontest/utils.go` provides `TestExternalSecretsConfig()`, `TestExternalSecretsManager()`, `ErrTestClient`, and standard test constants (`TestExternalSecretsImageName`, `TestBitwardenImageName`, `TestExternalSecretsNamespace`). + +### Error Assertion +- Compare exact error strings: `if err == nil || err.Error() != tt.wantErr`. +- Use `t.Setenv()` for environment variables needed by the code under test (e.g., `RELATED_IMAGE_EXTERNAL_SECRETS`). + +### Adding a New Unit Test +1. Create or edit `*_test.go` next to the source file, same package. +2. Define a table-driven test with `name`, mock setup (`preReq`), optional input mutation (`updateExternalSecretsConfig`), and `wantErr` string. +3. Use `testReconciler(t)` for the reconciler, `fakes.FakeCtrlClient{}` for the client. +4. Assert with `t.Errorf`; avoid external assertion libraries in unit tests. + +## API Integration Tests (`test/apis/`) + +### Data-Driven Test Suites +- Tests are defined in YAML files at `api/v1alpha1/tests//.testsuite.yaml`. +- Each file specifies `crdName`, `tests.onCreate` (create + validate or expect error), and `tests.onUpdate` (create, update, validate or expect error). +- The Go code in `test/apis/generator.go` auto-generates Ginkgo specs from these YAML files using `DescribeTable`. + +### envtest Setup +- Suite file: `test/apis/suite_test.go`. +- Bootstraps `envtest.Environment` with CRDs from `config/crd/bases/`. +- Uses Kubernetes API version >= 1.25 (required for CEL validation). +- Uses `komega` for object comparison with `IgnoreAutogeneratedMetadata`. + +### Running +- `make test-apis` invokes `hack/test-apis.sh`, which runs Ginkgo with `--randomize-all --randomize-suites --keep-going --timeout=30m`. +- In CI (`OPENSHIFT_CI=true`), JUnit and coverage artifacts are written to `ARTIFACT_DIR`. + +### Adding a New API Test +1. Create or edit a `.testsuite.yaml` file under `api/v1alpha1/tests//`. +2. Define `onCreate` entries with `initial` YAML (input) and either `expected` YAML (success) or `expectedError` (failure substring). +3. Define `onUpdate` entries with `initial`, `updated`, `expected`/`expectedError`/`expectedStatusError`. Optionally use `initialCRDPatches` for ratcheting tests. +4. The generator picks them up automatically; no Go code changes needed. + +## E2E Tests (`test/e2e/`) + +### Build Tag +- All files in `test/e2e/` and `test/utils/` require `//go:build e2e`. The Makefile passes `-tags e2e` via `go test -C test`. + +### Label-Based Platform Selection +Tests are filtered by Ginkgo labels. The default filter is `"Platform: isSubsetOf {AWS}"`. + +| Label | Scope | Required Secrets | +|-------|-------|-----------------| +| `Platform:AWS` | AWS SecretStore + ExternalSecret + PushSecret | `aws-creds` in `kube-system` | +| `CrossPlatform:GCP-AWS` | GCP cluster using AWS Secrets Manager | `aws-creds` in `kube-system` | +| `Provider:Bitwarden` | Bitwarden ESO provider via ClusterSecretStore | `bitwarden-creds` in `external-secrets-operator` | +| `API:Bitwarden` | Direct HTTP tests to bitwarden-sdk-server | `bitwarden-creds` in `external-secrets-operator` | + +Run a specific suite: `make test-e2e E2E_GINKGO_LABEL_FILTER="Provider:Bitwarden"`. +Run all: `make test-e2e E2E_GINKGO_LABEL_FILTER=""`. + +### E2E Structure +- Suite setup (`e2e_suite_test.go`): initializes K8s clients (`kubernetes.Clientset`, `dynamic.DynamicClient`, `controller-runtime client.Client`), sets timeout with 5-minute cleanup buffer, configures JSON/JUnit reports. +- `BeforeAll`: creates test namespace (prefix `external-secrets-e2e-test-`), creates/verifies `ExternalSecretsConfig` CR, waits for operator/operand pods. +- `BeforeEach`: verifies operand pods are ready. +- `AfterEach`: on failure, dumps artifacts via `utils.DumpE2EArtifacts()` to `ARTIFACT_DIR` or `_output/`. +- `AfterAll`: deletes test namespace and cluster CR. + +### Test Data +- YAML fixtures embedded via `//go:embed testdata/*` in `e2e_test.go`. +- Pattern substitution in YAML via `utils.ReplacePatternInAsset("${PATTERN}", "value")`. +- Fixtures live in `test/e2e/testdata/`. + +### E2E Test Utilities (`test/utils/`) +Key helpers (all require `e2e` build tag): +- `DynamicResourceLoader`: create/delete K8s resources from YAML files or `Unstructured` objects. +- `VerifyPodsReadyByPrefix(ctx, clientset, namespace, prefixes)`: poll until pods with given name prefixes are Running + Ready. +- `WaitForESOResourceReady(ctx, dynamicClient, gvr, namespace, name, timeout)`: poll until ESO resource has `Ready=True` condition. +- `WaitForExternalSecretsConfigReady(ctx, dynamicClient, name, timeout)`: wait for both `Ready=True` and `Degraded=False`. +- `CleanupESOOperandAndRelated(ctx, cfg)`: best-effort cleanup of operand CRs, webhooks, RBAC, namespace. +- `GetRandomString(n)`: random alphanumeric string for unique resource names. + +### Adding a New E2E Test +1. Add `//go:build e2e` at the top of any new file. +2. Place test in `test/e2e/` package; use existing `suiteClientset`, `suiteDynamicClient`, `suiteRuntimeClient`. +3. Apply a Ginkgo `Label(...)` to the `Context` or `Describe` for platform filtering. +4. Use `Ordered` for tests with shared state dependencies. +5. Create test namespace with `GenerateName: testNamespacePrefix`; clean up in `AfterAll`. +6. Use `Eventually(...).Should(Succeed())` for async assertions with polling. +7. Use `defer loader.DeleteFromFile(...)` for resource cleanup. +8. Add YAML fixtures to `test/e2e/testdata/`. + +## Coverage + +- `make test-unit` writes `cover.out` (Go coverage profile) to the repo root. +- E2E coverage uses a special instrumented image: `make docker-build-coverage` builds from `images/ci/Dockerfile.coverage`. `make e2e-coverage-collect` gathers profiles from the running operator and optionally uploads to Codecov. + +## CI Notes + +- `GOFLAGS` is cleared for test targets (line: `fmt vet test test-unit test-e2e run update-vendor update-dep: GOFLAGS=`) to avoid conflicts with `go.work`. +- envtest uses Kubernetes `1.32.0` binaries (`ENVTEST_K8S_VERSION`). +- E2E timeout defaults to `1h` (`E2E_TIMEOUT`); Ginkgo suite timeout is set to `E2E_TIMEOUT - 5min`. +- `ARTIFACT_DIR` controls where CI artifacts (JUnit XML, JSON reports, failure dumps) are written. From dfed87b228ac2c04c0036029550ea725461d41b4 Mon Sep 17 00:00:00 2001 From: Siddhi Bhor Date: Mon, 29 Jun 2026 17:04:17 +0530 Subject: [PATCH 2/3] Adds CONTRIBUTIONG.MD & ARCHITECTURE.MD --- AGENTS.md | 18 ++- ARCHITECTURE.md | 248 ++++++++++++++++++++++++++++++ CONTRIBUTING.md | 191 +++++++++++++++++++++++ docs/api-contracts-guidelines.md | 6 +- docs/error-handling-guidelines.md | 38 +++-- docs/integration-guidelines.md | 9 +- docs/performance-guidelines.md | 20 ++- docs/security-guidelines.md | 12 +- docs/testing-guidelines.md | 17 +- 9 files changed, 514 insertions(+), 45 deletions(-) create mode 100644 ARCHITECTURE.md create mode 100644 CONTRIBUTING.md diff --git a/AGENTS.md b/AGENTS.md index c1c183842..b8f5d4184 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -19,9 +19,9 @@ Three controllers run in a single binary: | Controller | Package | Watches | Purpose | |---|---|---|---| -| `external-secrets-controller` | `pkg/controller/external_secrets/` | `ExternalSecretsConfig` CR + all managed resources | Installs/reconciles operand deployments, RBAC, services, webhooks, network policies | +| `external-secrets-controller` | `pkg/controller/external_secrets/` | `ExternalSecretsConfig` CR + `ExternalSecretsManager` (spec) + managed resources (Deployments, RBAC, Services, Secrets metadata, ConfigMaps, NetworkPolicies, Webhooks) + conditionally `cert-manager.io/Certificate` | Installs/reconciles operand deployments, RBAC, services, webhooks, network policies | | `external-secrets-manager` | `pkg/controller/external_secrets_manager/` | `ExternalSecretsManager` CR + `ExternalSecretsConfig` status | Aggregates controller statuses into a global status CR | -| `crd-annotator` | `pkg/controller/crd_annotator/` | ESO CRDs + `ExternalSecretsConfig` | Adds cert-manager CA injection annotations to operand CRDs (conditional) | +| `crd-annotator` | `pkg/controller/crd_annotator/` | ESO CRDs (metadata, label-filtered) + `ExternalSecretsConfig` | Adds cert-manager CA injection annotations to operand CRDs (conditional; only registered when cert-manager is installed) | ## Project Structure @@ -29,7 +29,9 @@ Three controllers run in a single binary: api/v1alpha1/ CRD type definitions, conditions, shared types, deepcopy api/v1alpha1/tests/ Declarative YAML test suites for CRD validation bindata/ Static operand YAML manifests (compiled into Go via go-bindata) -cmd/ Operator entrypoint (main.go) +bundle/ OLM bundle manifests +cmd/external-secrets-operator/ Operator entrypoint (main.go, separate Go module) +docs/ Guideline docs (security, performance, error handling, etc.) config/ Kustomize manifests (CRDs, RBAC, manager, samples, bundle) hack/ Shell scripts (codegen, verification, CI helpers) images/ci/ CI Dockerfiles (coverage-instrumented builds) @@ -41,11 +43,13 @@ pkg/controller/ Controller implementations external_secrets/ Main operand reconciliation controller external_secrets_manager/ Status aggregation controller pkg/operator/ Manager setup, controller registration + assets/ Generated bindata (bindata.go) pkg/version/ Build-time version info (ldflags) test/apis/ API integration tests (Ginkgo + envtest) test/e2e/ End-to-end tests (Ginkgo + live cluster) test/utils/ E2E test helpers tools/ Go module for build-time tool dependencies +vendor/ Workspace-level vendoring (go work vendor) ``` ## Go Workspace and Module Layout @@ -70,13 +74,13 @@ The repo uses `go.work` with four modules: `.`, `./cmd/external-secrets-operator | `make lint` | Run golangci-lint with all configured linters | | `make lint-fix` | Run golangci-lint with auto-fix | | `make update` | Full regeneration: codegen + manifests + operand manifests + bindata + bundle + docs | -| `make verify` | Run vet + fmt + verify-deps + verify-bindata + verify-generated + govulncheck + check-git-diff | +| `make verify` | Run vet + fmt + verify-deps + verify-bindata + verify-bindata-assets + verify-generated + govulncheck + check-git-diff | | `make update-vendor` | Update vendor directory across all workspace modules | | `make update-dep PKG=x@v` | Update a single dependency across all modules | | `make manifests` | Regenerate CRD YAML, RBAC, webhook configs from kubebuilder markers | | `make generate` | Regenerate DeepCopy methods | | `make docs` | Regenerate API reference docs | -| `make clean` | Remove bin/, _output/, cover.out | +| `make clean` | Remove bin/, _output/, cover.out, dist/ | After any code change, run `make update && make verify` to ensure all generated files are consistent. CI runs `check-git-diff` which fails if generated files are stale. @@ -182,9 +186,9 @@ Resources that depend on CR configuration use a slice of `{assetName string, con - Run `make lint` and `make test` locally before submitting. - Run `make verify` to ensure generated files are in sync. - Add unit tests for new reconciliation logic using table-driven tests and `FakeCtrlClient`. -- Add API test cases in `api/v1alpha1/tests//` for any new CRD field or validation rule. +- Add API test cases in `api/v1alpha1/tests//` (e.g., `externalsecretsconfig.operator.openshift.io/`) for any new CRD field or validation rule. - Add E2E test cases with appropriate Ginkgo labels for platform-specific tests. -- Follow existing error wrapping patterns: `common.FromClientError` for API calls, `common.NewIrrecoverableError` for config validation failures. +- Follow existing error wrapping patterns: `common.FromClientError` for API calls, `common.NewIrrecoverableError` for config validation failures, `common.NewUserConfigurationError` for invalid user-provided configuration. - Commit messages should reference the relevant Jira ticket (e.g., `OCPBUGS-12345: description`). - PR reviewers/approvers are listed in `OWNERS`. diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md new file mode 100644 index 000000000..2aa6da03b --- /dev/null +++ b/ARCHITECTURE.md @@ -0,0 +1,248 @@ +# Architecture + +This document describes the high-level architecture of the External Secrets Operator for Red Hat OpenShift. +If you want to familiarize yourself with the codebase, you are in the right place! + +For detailed guidelines on specific areas, see the files listed in `AGENTS.md`. +For contribution process, see `CONTRIBUTING.md`. + +## Bird's Eye View + +On the highest level, this is a Kubernetes operator that installs and manages the upstream [external-secrets](https://github.com/external-secrets/external-secrets) application on OpenShift clusters. The upstream project provides the actual secret-syncing logic (ExternalSecret, SecretStore, etc.). This operator does **not** embed or fork that code. Instead, it manages upstream resources as a set of static YAML manifests that are compiled into the binary, decoded at runtime, mutated with operator-controlled configuration, and applied imperatively to the cluster. + +``` + ┌─────────────────────────────────────────────┐ + │ ExternalSecretsConfig CR │ + │ (singleton "cluster", user-facing API) │ + └──────────────────┬──────────────────────────┘ + │ + ┌──────────────────────┼──────────────────────┐ + │ │ │ + ┌─────────▼──────────┐ ┌───────▼────────┐ ┌──────────▼─────────┐ + │ external-secrets │ │ external- │ │ crd-annotator │ + │ controller │ │ secrets-manager│ │ controller │ + │ │ │ │ │ │ + │ Decodes bindata, │ │ Aggregates │ │ Patches ESO CRDs │ + │ mutates, creates │ │ controller │ │ with cert-manager │ + │ operand resources │ │ statuses into │ │ CA injection │ + │ (Deployments, │ │ a global │ │ annotations │ + │ RBAC, Services, │ │ status CR │ │ │ + │ NetworkPolicies, │ │ │ │ (conditional: only │ + │ Webhooks, etc.) │ │ │ │ when cert-manager │ + │ │ │ │ │ is installed) │ + └────────────────────┘ └────────────────┘ └────────────────────┘ + │ + ▼ + ┌────────────────────┐ + │ Upstream operand │ + │ (external-secrets │ + │ deployments in │ + │ "external-secrets"│ + │ namespace) │ + └────────────────────┘ +``` + +Two singleton CRs drive the system: + +- **ExternalSecretsConfig** (name: `cluster`): the primary user-facing CR. Controls operand installation, cert-manager toggle, Bitwarden plugin, proxy, network policies, per-component overrides, and custom trusted CA bundles. +- **ExternalSecretsManager** (name: `cluster`): global operator-level config, auto-created at install. Provides lower-priority defaults and optional features (e.g., `UnsafeAllowGenericTargets`). + +The precedence chain for shared fields is: `ExternalSecretsConfig > ExternalSecretsManager > OLM environment variables` (proxy only). + +## Code Map + +This section describes important directories and data structures. +Pay attention to the **Architecture Invariant** sections. + +### `api/v1alpha1/` + +CRD type definitions. This is where `ExternalSecretsConfig` and `ExternalSecretsManager` structs live, along with shared types (`CommonConfigs`, `ProxyConfig`, `Mode`, etc.), condition constants, and deepcopy code. + +Validation is entirely CEL-based via `+kubebuilder:validation:XValidation` markers on the types. There are no admission webhooks for the operator's own CRDs. + +`api/v1alpha1/tests/` contains declarative YAML test suites (`.testsuite.yaml` files) organized by CRD API group domain. These are automatically picked up by the test generator. + +**Architecture Invariant:** both CRDs are cluster-scoped singletons enforced by CEL (`self.metadata.name == 'cluster'`). There is exactly one instance of each, ever. + +**Architecture Invariant:** several fields on `CertManagerConfig` (`mode`, `issuerRef`, `injectAnnotations`) are immutable once set, enforced by CEL `self == oldSelf` rules. This prevents configuration drift that would leave the cluster in an inconsistent state. + +### `bindata/` + +Static YAML manifests for the upstream operand. These are the Deployments, Services, RBAC, NetworkPolicies, Certificates, and ValidatingWebhookConfigurations that the operator creates in the cluster. + +Files in `bindata/external-secrets/resources/` follow the pattern `_.yml`. Network policies in `bindata/external-secrets/` use `.yaml`. + +These files are compiled into Go via go-bindata and live as `pkg/operator/assets/bindata.go` at build time. + +**Architecture Invariant:** bindata manifests are templates, not final resources. The controller decodes them, mutates them (namespace, labels, annotations, images, env vars, security context), and then creates or updates them. Never deploy bindata YAML directly to a cluster. + +**Architecture Invariant:** do not hand-edit `bindata.go`. Run `make update` to regenerate it from the YAML sources. + +### `cmd/external-secrets-operator/` + +The operator binary entrypoint. This is a separate Go module. It sets up the controller-runtime manager, configures metrics/webhook servers (with HTTP/2 disabled by default), wires the cache builder, and starts the controllers. + +**Architecture Invariant:** HTTP/2 is disabled for metrics and webhook servers to mitigate known HTTP/2 vulnerabilities. The `--enable-http2` flag defaults to `false`. + +### `pkg/controller/` + +The heart of the operator. Three controller packages plus shared infrastructure: + +#### `pkg/controller/external_secrets/` + +The main reconciliation controller. Watches `ExternalSecretsConfig`, `ExternalSecretsManager`, and all managed operand resources. The reconciliation flow in `install_external_secrets.go` creates resources in strict dependency order: + +``` +Namespace → NetworkPolicies → ServiceAccounts → Certificates → Secrets +→ TrustedCA ConfigMap → RBAC → Services → Deployments +→ ValidatingWebhooks → CR annotation tracking +``` + +Each resource follows the same pattern: decode from bindata → mutate → check existence → create or update if drifted → record events. + +`controller.go` is the entry point. It defines `controllerManagedResources` (the set of resource types the controller owns), `NewCacheBuilder` (manager-level label-filtered cache), and `SetupWithManager` (watch/predicate wiring). + +`constants.go` contains all asset path constants, label keys, env var names, and network policy prefixes. + +**Architecture Invariant:** the reconciliation order must not change. CR annotation tracking is patched last to ensure obsolete annotations are removed before tracking is updated. + +**Architecture Invariant:** never return both `RequeueAfter` and a non-nil error from `Reconcile`. Return one or the other. + +#### `pkg/controller/external_secrets_manager/` + +Status aggregation controller. Watches `ExternalSecretsManager` and `ExternalSecretsConfig` status. Copies per-controller conditions into the manager's `controllerStatuses` list. Also handles auto-creation of the default `ExternalSecretsManager` CR at startup. + +#### `pkg/controller/crd_annotator/` + +Conditionally registered (only when cert-manager is installed). Watches ESO CRD metadata and `ExternalSecretsConfig`. When cert-manager is enabled with `injectAnnotations: "true"`, patches all ESO CRDs with the `cert-manager.io/inject-ca-from` annotation. + +Builds its own custom cache (`BuildCustomClient`) because it watches a disjoint resource set from the main controller. + +#### `pkg/controller/client/` + +The `CtrlClient` interface. All controllers interact with Kubernetes through this interface, not the raw controller-runtime `client.Client`. It adds `UpdateWithRetry`, `StatusUpdate`, and `Exists` methods. Unit tests use counterfeiter-generated fakes in `client/fakes/`. + +**Architecture Invariant:** all resource updates go through `UpdateWithRetry`, which wraps `retry.RetryOnConflict` with a read-modify-write pattern. This prevents silent data loss from stale resource versions. + +#### `pkg/controller/common/` + +Shared utilities used by all controllers: + +- `errors.go` — `ReconcileError` type with three reasons: `IrrecoverableError` (no retry), `RetryRequiredError` (30s requeue), `UserConfigurationError` (no retry, watches drive recovery). `FromClientError` auto-classifies Kubernetes API errors. +- `constants.go` — singleton object names, annotation keys, version strings. +- `utils.go` — `HasObjectChanged` (type-specific drift detection), `Decode*ObjBytes` (bindata decoders that panic on failure), `ApplyResourceMetadata`, finalizer helpers, the `Now` resettable-once type. + +**Architecture Invariant:** `Decode*ObjBytes` helpers panic on failure. This is intentional — they are called only with compile-time-known static assets, so a panic indicates a build-time bug, not a runtime error. + +#### `pkg/controller/commontest/` + +Shared test fixtures: `TestExternalSecretsConfig()`, `TestExternalSecretsManager()`, standard test constants and a sentinel error. + +### `pkg/operator/` + +Manager setup and controller registration. `setup_manager.go` wires all three controllers and handles the default ESM resource creation. `assets/bindata.go` is the generated bindata. + +### `config/` + +Kustomize manifests for CRDs, RBAC, manager deployment, samples, and OLM bundle inputs. CRDs in `config/crd/bases/` and RBAC in `config/rbac/role.yaml` are generated — do not hand-edit them. + +### `test/` + +A separate Go module (`./test` in `go.work`) containing: + +- `test/apis/` — API integration tests using Ginkgo + envtest. A generator in `generator.go` auto-creates Ginkgo specs from the YAML test suites in `api/v1alpha1/tests/`. +- `test/e2e/` — End-to-end tests using Ginkgo against a live cluster. Filtered by Ginkgo labels (`Platform:AWS`, `Provider:Bitwarden`, etc.). +- `test/utils/` — Shared E2E helpers (dynamic resource loading, pod readiness polling, artifact dumping). + +### `hack/` + +Shell scripts for codegen, verification, and CI. Notable scripts: +- `go-fips.sh` — enables FIPS build tags for production. +- `test-apis.sh` — runs API tests with Ginkgo flags. +- `e2e-coverage.sh` — manages coverage-instrumented E2E builds. + +### `tools/` + +A separate Go module for build-time tool dependencies (controller-gen, golangci-lint, ginkgo, etc.). All tools are vendored and built from source. + +## Cross-Cutting Concerns + +### Two Clients + +The operator maintains two Kubernetes clients: + +- **Cached client** (`r.CtrlClient`): reads from the manager cache, which is configured with label selectors (`app=external-secrets`) via `NewCacheBuilder`. Used for all managed operand resources. +- **Uncached client** (`r.UncachedClient`): reads directly from the API server. Used only for objects not tracked by the cache — cert-manager Issuers, user-provided Secrets (Bitwarden `secretRef`), and one-time bootstrap operations. + +**Architecture Invariant:** never use the uncached client for objects in `controllerManagedResources`. The cache is designed for those. + +### Drift Detection + +The operator continuously reconciles all managed resources back to desired state. `HasObjectChanged` in `pkg/controller/common/utils.go` uses type-specific field comparison (not full `DeepEqual`) to detect drift: + +- Deployments: containers, init containers, volumes, affinity, tolerations, env vars, security context, revision history limit, etc. +- RBAC: Rules (Roles) or RoleRef+Subjects (Bindings) +- NetworkPolicies: PodSelector, PolicyTypes, Ingress, Egress +- Webhooks: individual webhook entries by name +- Certificates: full Spec via DeepEqual + +Annotations are compared using managed-key tracking, which only checks keys the operator manages, ignoring annotations set by external controllers. This prevents infinite reconcile loops. + +**Architecture Invariant:** if someone manually modifies a managed ClusterRole, Deployment, or NetworkPolicy, the operator will detect the change and revert it. This is a critical security property. + +### Error Classification + +Reconciliation errors flow through three channels: + +| Reason | Requeue? | Status | +|---|---|---| +| `IrrecoverableError` | No | Degraded=True | +| `UserConfigurationError` | No (except NotFound → 30s) | Degraded=True | +| `RetryRequiredError` | Yes (30s) | Ready=False/Progressing | + +`FromClientError` auto-classifies Kubernetes API errors: `Unauthorized`, `Forbidden`, `Invalid`, and `BadRequest` become irrecoverable; everything else becomes retry-required. + +### Code Generation + +Several artifacts are generated and must be committed: + +- `zz_generated.deepcopy.go` — from `make generate` +- `config/crd/bases/*.yaml` — from `make manifests` +- `pkg/operator/assets/bindata.go` — from `make update-bindata` +- `config/rbac/role.yaml` — from `make manifests` +- `docs/api_reference.md` — from `make docs` + +`make update` runs the full pipeline. `make verify` checks that generated files are fresh via `check-git-diff`. CI will reject PRs with stale generated files. + +**Architecture Invariant:** never edit generated files by hand. Always use `make update`. + +### Container Security + +All operand containers are hardened programmatically via `updateContainerSecurityContext()`: + +- `AllowPrivilegeEscalation: false` +- `Capabilities.Drop: ["ALL"]` +- `ReadOnlyRootFilesystem: true` +- `RunAsNonRoot: true` +- `SeccompProfile.Type: RuntimeDefault` + +All Dockerfiles run as `USER 65534:65534` (nobody). The reconciler drift-detects container security context changes and reverts them. + +### Network Policy Architecture + +The operator deploys a **deny-all** base NetworkPolicy, then layers specific allow-policies (prefixed `eso-sys-`) for API server egress, webhook traffic, DNS, and conditionally cert-controller and Bitwarden. User-defined policies (prefixed `eso-user-`) are restricted to egress only. + +### Go Workspace + +The repo uses `go.work` with four modules: `.`, `./cmd/external-secrets-operator`, `./test`, `./tools`. This means: + +- `GOFLAGS` is cleared for test and fmt targets to avoid `-mod=vendor` conflicting with `go.work`. +- Dependencies are updated across all modules via `make update-dep PKG=pkg@version`. +- Vendoring is workspace-level (`go work vendor`). +- All build-time tools are vendored and built from source. + +### Relationship to Upstream + +This operator manages — but does not contain — the upstream [external-secrets](https://github.com/external-secrets/external-secrets) project. The upstream project defines the CRDs that end users interact with (`ExternalSecret`, `SecretStore`, `ClusterSecretStore`, `PushSecret`, etc.) and the controllers that sync secrets from external providers. This operator's job is to deploy, configure, and lifecycle-manage those upstream components on OpenShift, providing an opinionated, security-hardened, and OLM-integrated installation. + +The upstream operand version is controlled by the `RELATED_IMAGE_EXTERNAL_SECRETS` environment variable, typically set by OLM/CSV. The operator has no compile-time dependency on the upstream Go code. diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 000000000..efc84d71f --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,191 @@ +# Contributing to External Secrets Operator for Red Hat OpenShift + +Thank you for your interest in contributing! This guide covers the PR process, review expectations, coding conventions, and testing requirements. + +## Getting Started + +1. Fork the repository and clone your fork. +2. Create a feature branch from `main`. +3. Set up your environment (Go version matching `go.mod`, `kubectl` v1.32.1+, access to a Kubernetes cluster for E2E tests). + +## Development Workflow + +### Build and Verify + +```sh +make build # Full build: codegen + manifests + fmt + vet + compile +make build-operator # Fast rebuild (binary only, no codegen) +``` + +After **any** code change, regenerate and verify before committing: + +```sh +make update && make verify +``` + +This regenerates all code, manifests, bindata, and bundle artifacts, then runs verification checks including `check-git-diff`. CI will reject PRs with stale generated files. + +### Linting + +```sh +make lint # Run golangci-lint with all configured linters +make lint-fix # Auto-fix linting issues +``` + +### Testing + +Run the full local test suite (no cluster required): + +```sh +make test # Unit + API integration tests +``` + +Individual test targets: + +| Target | Description | +|---|---| +| `make test-unit` | Unit tests (excludes `test/e2e`, `test/apis`, `test/utils`) | +| `make test-apis` | API validation tests via envtest | +| `make test-e2e` | End-to-end tests against a live cluster | + +For E2E tests with label filters: + +```sh +make test-e2e E2E_GINKGO_LABEL_FILTER="" +``` + +See [test/e2e/README.md](test/e2e/README.md) for E2E prerequisites and suite-specific instructions. + +### Pre-Submission Checklist + +Before opening a PR, always run: + +```sh +make lint +make test +make update && make verify +``` + +## Coding Conventions + +### Import Order + +Imports must follow the order enforced by `gci` in `.golangci.yml`: + +1. Standard library +2. Third-party packages +3. `github.com/openshift/external-secrets-operator` (project-local) +4. Blank imports, dot imports, aliases, local module + +### File Headers + +All `.go` files must include the Apache 2.0 license header from [`hack/boilerplate.go.txt`](hack/boilerplate.go.txt). + +### Naming Conventions + +- **Go packages**: Controller packages use `snake_case` (`external_secrets`, `crd_annotator`). +- **Controller names**: kebab-case in strings (`"external-secrets-controller"`, `"crd-annotator"`). +- **Finalizers**: `./`. +- **Asset name constants**: camelCase with pattern `_AssetName`. +- **Bindata YAML files**: `_.yml` in `bindata/external-secrets/resources/`. + +### Linter Rules + +The repo uses golangci-lint v2 (see [`.golangci.yml`](.golangci.yml)). Key rules to be aware of: + +- **depguard** blocks `math/rand` (use `math/rand/v2`), `github.com/sirupsen/logrus`, and `github.com/pkg/errors` (use `errors`/`fmt`). +- **golines** enforces a max line length of 200 characters. +- **gofmt** rewrites `interface{}` to `any`. + +### Error Handling + +- Wrap API call errors with `common.FromClientError`. +- Use `common.NewIrrecoverableError` for config validation failures. +- Never return both `RequeueAfter` and a non-nil error from `Reconcile`. + +### Generated Files + +Never edit these files by hand — always use `make update`: + +- `zz_generated.deepcopy.go` +- `config/crd/bases/*.yaml` +- `pkg/operator/assets/bindata.go` +- `config/rbac/role.yaml` +- `docs/api_reference.md` + +## Testing Requirements + +### Unit Tests + +- Add unit tests for new reconciliation logic using table-driven tests and `FakeCtrlClient`. +- Unit tests use counterfeiter-generated fakes from `pkg/controller/client/fakes/`. +- Regenerate fakes after changing the `CtrlClient` interface: `go generate ./pkg/controller/client/...` + +### API Tests + +- Add test cases in `api/v1alpha1/tests//` for any new CRD field or validation rule. +- API tests run via envtest (Ginkgo). + +### E2E Tests + +- Add E2E test cases with appropriate Ginkgo labels for platform-specific tests. +- E2E tests require a live cluster with the operator deployed and operands stable. + +## Pull Request Process + +### Creating a PR + +1. Keep diffs small and focused. One logical change per PR. +2. Write a clear, descriptive PR title. +3. Reference the relevant Jira ticket in your commit messages (e.g., `OCPBUGS-12345: description`). +4. Describe your changes and the motivation behind them in the PR body. + +### What to Include + +- **Code changes** with corresponding tests (unit, API, or E2E as appropriate). +- **Generated file updates** — run `make update` and commit the results. CI checks for staleness via `check-git-diff`. +- **Documentation updates** — update `README.md` or files under `docs/` when behavior changes are user-visible. + +### Review Process + +- PRs are reviewed and approved by maintainers listed in the [OWNERS](OWNERS) file. +- All CI checks must pass before merge. +- Reviewers will check for adherence to the architectural patterns documented in [AGENTS.md](AGENTS.md), including reconciler structure, resource reconciliation flow, and proper use of the `CtrlClient` interface. + +### CI Expectations + +CI runs the following checks (among others): + +- `make verify` — vet, fmt, dependency verification, bindata verification, generated file verification, govulncheck, and `check-git-diff`. +- `make lint` — full golangci-lint suite. +- `make test` — unit and API integration tests. + +Ensure all of these pass locally before pushing. + +## Go Workspace + +The repo uses `go.work` with four modules (`.`, `./cmd/external-secrets-operator`, `./test`, `./tools`). Key implications: + +- Do **not** use `-mod=vendor` for `go test` or `go fmt` — the Makefile handles `GOFLAGS` clearing. +- To add a dependency: `make update-dep PKG=pkg@version`, then `make update-vendor`. +- All build-time tools are vendored and built from source. Do not install tools globally. + +## Container Builds + +The default container tool is `podman`. Override with: + +```sh +CONTAINER_TOOL=docker make docker-build +``` + +## Additional Resources + +- [AGENTS.md](AGENTS.md) — Architecture, controller details, and common pitfalls +- [docs/testing-guidelines.md](docs/testing-guidelines.md) — Detailed testing framework and patterns +- [docs/api-contracts-guidelines.md](docs/api-contracts-guidelines.md) — CRD types, kubebuilder markers, CEL validation +- [docs/error-handling-guidelines.md](docs/error-handling-guidelines.md) — ReconcileError types, retry logic, status conditions +- [docs/security-guidelines.md](docs/security-guidelines.md) — Container security, RBAC, TLS, network policies + +## License + +By contributing, you agree that your contributions will be licensed under the [Apache License 2.0](LICENSE). diff --git a/docs/api-contracts-guidelines.md b/docs/api-contracts-guidelines.md index 708535ca1..71e947d86 100644 --- a/docs/api-contracts-guidelines.md +++ b/docs/api-contracts-guidelines.md @@ -82,7 +82,7 @@ The repo uses `//nolint:kubeapilinter` comments for intentional deviations. Docu ### ExternalSecretsConfig Status - Embeds `ConditionalStatus` (from `meta.go`) which holds `[]metav1.Condition` with standard merge markers (`+listType=map`, `+listMapKey=type`, `+patchMergeKey=type`, `+patchStrategy=merge`). - Uses standard `metav1.Condition` with `ObservedGeneration` set to `esc.GetGeneration()`. -- Condition types: `Ready` and `Degraded` (defined as constants in `conditions.go`). +- Condition types: `Ready`, `Degraded`, and `UpdateAnnotation` (all defined as constants in `conditions.go`). The `UpdateAnnotation` condition is set by the `crd-annotator` controller on the `ExternalSecretsConfig` status. - Reasons: `Failed`, `Ready`, `Progressing`, `Completed` (also constants, with `Progressing` defined as constant `ReasonInProgress`). - Both conditions are set atomically before a single status update call. - Additional status fields: `externalSecretsImage`, `bitwardenSDKServerImage`. @@ -116,7 +116,7 @@ Use `+kubebuilder:default` for server-side defaulting: - `CommonConfigs` (in `meta.go`) is embedded via `json:",inline"` in both `ApplicationConfig` and `GlobalConfig`. It provides `logLevel`, `resources`, `affinity`, `tolerations`, `nodeSelector`, `proxy`. - `ObjectReference`, `SecretReference`, `ConfigMapKeyReference`, `ConditionalStatus` are reusable building blocks in `meta.go`. -- Named types (`Mode`, `ManagementState`, `ComponentName`) with constants must be used instead of raw strings for enum fields. +- Named types (`Mode`, `ManagementState`, `ComponentName`, `FeatureName`) with constants must be used instead of raw strings for enum fields. ## Code Generation Pipeline @@ -131,7 +131,7 @@ Never edit `zz_generated.deepcopy.go` or CRD YAML files by hand. ## API Integration Tests -API validation is tested via declarative YAML test suites in `api/v1alpha1/tests//`. Each `.testsuite.yaml` file defines `onCreate` and `onUpdate` test cases specifying `initial`, `expected`, and `expectedError` YAML. The test framework (in `test/apis/`) installs CRDs into a real envtest API server (requires Kube >= 1.25 for CEL). +API validation is tested via declarative YAML test suites in `api/v1alpha1/tests//` (e.g., `externalsecretsconfig.operator.openshift.io/`, `externalsecretsmanager.operator.openshift.io/`). Each `.testsuite.yaml` file defines `onCreate` and `onUpdate` test cases specifying `initial`, `expected`, and `expectedError` YAML. The test framework (in `test/apis/`) installs CRDs into a real envtest API server (requires Kube >= 1.25 for CEL). When adding a new CEL validation rule or field constraint, add corresponding test cases to the relevant `.testsuite.yaml` covering: - Valid creation (initial + expected) diff --git a/docs/error-handling-guidelines.md b/docs/error-handling-guidelines.md index 2eb97326c..6d7b7ed71 100644 --- a/docs/error-handling-guidelines.md +++ b/docs/error-handling-guidelines.md @@ -4,37 +4,44 @@ Resource reconciliation errors flow through the `ReconcileError` type defined in `pkg/controller/common/errors.go`. It carries a `Reason` (`ErrorReason` string) that determines requeue behavior, a human-readable `Message`, and the underlying `Err`. Setup errors (finalizers, client construction, CR fetching) use plain `fmt.Errorf` wrapping as described below. -### Two Error Reasons +### Three Error Reasons | Reason | Constant | Requeue? | When to use | |---|---|---|---| | `IrrecoverableError` | `common.IrrecoverableError` | No | Invalid config, missing env vars, permission errors, bad requests | | `RetryRequiredError` | `common.RetryRequiredError` | Yes (30s) | Transient network issues, resource conflicts, timeouts, not-found | +| `UserConfigurationError` | `common.UserConfigurationError` | No (unless NotFound) | Invalid or incomplete user-provided configuration (e.g., missing referenced Secret or Issuer). Sets Degraded. Recovery is driven by watches on the affected resource; NotFound errors still use periodic requeue (30s) until the referenced object exists. | ### Constructor Functions - `common.NewIrrecoverableError(err, messageFmt, args...)` -- for errors that cannot be fixed by retrying. - `common.NewRetryRequiredError(err, messageFmt, args...)` -- for transient errors worth retrying. +- `common.NewUserConfigurationError(err, messageFmt, args...)` -- for errors caused by invalid or incomplete user configuration. The operator sets Degraded but generally does not requeue (relying on watches instead), except when the underlying error is NotFound. - `common.FromClientError(err, messageFmt, args...)` -- auto-classifies Kubernetes API errors: `Unauthorized`, `Forbidden`, `Invalid`, and `BadRequest` become irrecoverable; everything else becomes retry-required. -- All three return `nil` when the input `err` is `nil`. Always check this at call sites when the constructor is the last expression before return. +- All four return `nil` when the input `err` is `nil`. Always check this at call sites when the constructor is the last expression before return. ### Checking Error Type -Use `common.IsIrrecoverableError(err)` to check whether an error is irrecoverable. It uses `errors.As` to traverse wrapped error chains. There is no corresponding `IsRetryRequiredError` -- the convention is: if it is not irrecoverable and is a `ReconcileError`, it is retryable. +- `common.IsIrrecoverableError(err)` -- checks whether an error is irrecoverable. Uses `errors.As` to traverse wrapped error chains. +- `common.IsRetryRequiredError(err)` -- checks whether an error is retry-required. +- `common.IsUserConfigurationError(err)` -- checks whether an error is a user configuration error. +- `common.IsUserConfigurationNotFound(err)` -- checks whether a user configuration error is caused by a missing object (NotFound). Used to decide whether to requeue. ## Error-to-Reconcile-Result Mapping The main reconcile dispatcher in `processReconcileRequest` (in `pkg/controller/external_secrets/controller.go`) maps errors to `ctrl.Result` as follows: ``` -err == nil -> Result{}, nil (success, no requeue) -IsIrrecoverableError -> Result{}, errUpdate (no requeue; only status update error propagates) -retryable error -> Result{RequeueAfter: 30s}, nil (manual requeue, no error to controller-runtime) -status update failure -> Result{}, errUpdate (let controller-runtime handle backoff) -NotFound on primary CR -> Result{}, nil (skip reconciliation silently) +err == nil -> Result{}, nil (success, no requeue) +IsIrrecoverableError -> Result{}, errUpdate (no requeue; only status update error propagates) +IsUserConfigurationError -> Result{}, errUpdate (no requeue; Degraded=True; watches drive recovery) + IsUserConfigurationNotFound -> Result{RequeueAfter: 30s}, nil (requeue because watches won't fire for unmanaged objects) +RetryRequiredError -> Result{RequeueAfter: 30s}, nil (manual requeue, no error to controller-runtime) +status update failure -> Result{}, errUpdate (let controller-runtime handle backoff) +NotFound on primary CR -> Result{}, nil (skip reconciliation silently) ``` -Key rule: never return both `RequeueAfter` and a non-nil error. For recoverable errors, return `RequeueAfter` with `nil` error. For irrecoverable errors, return empty `Result` so no requeue happens. +Key rule: never return both `RequeueAfter` and a non-nil error. For recoverable errors, return `RequeueAfter` with `nil` error. For irrecoverable and user-configuration errors, return empty `Result` so no requeue happens (except user-configuration NotFound, which requeues). The default requeue interval is `common.DefaultRequeueTime` (30 seconds), defined in `pkg/controller/common/constants.go`. @@ -97,12 +104,13 @@ The ESM default resource creation (`pkg/controller/external_secrets_manager/exte ### Condition Update Rules -1. On irrecoverable error: set `Degraded=True/Failed` and `Ready=False/Ready`. -2. On retryable error: set `Degraded=False/Ready` and `Ready=False/Progressing` with the error message. -3. On success: set `Degraded=False/Ready` and `Ready=True/Ready`. -4. Set both conditions atomically via `apimeta.SetStatusCondition` before calling `updateStatus`. -5. Only call `updateStatus` when `SetStatusCondition` returns `true` (condition actually changed). -6. Always include `ObservedGeneration` from the CR's current `.metadata.generation`. +1. On irrecoverable error: set `Degraded=True/Failed` and `Ready=False/Failed`. +2. On user configuration error: set `Degraded=True/Failed` and `Ready=False/Failed` with message `"user configuration is invalid: ..."`. +3. On retryable error: set `Degraded=False/Ready` and `Ready=False/Progressing` with the error message. +4. On success: set `Degraded=False/Ready` and `Ready=True/Ready`. +5. Set both conditions atomically via `apimeta.SetStatusCondition` before calling `updateStatus`. +6. Only call `updateStatus` when `SetStatusCondition` returns `true` (condition actually changed). +7. Always include `ObservedGeneration` from the CR's current `.metadata.generation`. ### Error Aggregation on Status Update Failure diff --git a/docs/integration-guidelines.md b/docs/integration-guidelines.md index 2f9692502..32ab07684 100644 --- a/docs/integration-guidelines.md +++ b/docs/integration-guidelines.md @@ -7,7 +7,7 @@ This operator does NOT embed or fork the upstream external-secrets project. It m ## CRD Hierarchy and Singleton Pattern - **ExternalSecretsConfig** (`externalsecretsconfigs.operator.openshift.io`): primary CR, singleton named `cluster`. Controls operand installation, cert-manager toggle, Bitwarden plugin, proxy, network policies, and per-component overrides. -- **ExternalSecretsManager** (`externalsecretsmanagers.operator.openshift.io`): global config CR, also singleton named `cluster`. Provides lower-priority defaults (labels, resources, proxy, tolerations, affinity, nodeSelector, logLevel) that ExternalSecretsConfig overrides. +- **ExternalSecretsManager** (`externalsecretsmanagers.operator.openshift.io`): global config CR, also singleton named `cluster`. Provides lower-priority defaults (labels, resources, proxy, tolerations, affinity, nodeSelector, logLevel) that ExternalSecretsConfig overrides. Also provides optional `features` (e.g., `UnsafeAllowGenericTargets`) that apply across managed deployments. - Precedence chain for shared fields: `ExternalSecretsConfig > ExternalSecretsManager > OLM environment variables` (proxy only). ## Static Manifest (Bindata) Pattern @@ -30,7 +30,8 @@ Never reorder. CR annotation tracking (managed-annotations) is patched last to e Many resources are conditionally created based on CR spec. The pattern uses a slice of `{assetName, condition}` structs: - **cert-controller deployment/service**: created when cert-manager is NOT enabled (`!isCertManagerConfigEnabled(esc)`) -- **bitwarden deployment/service/certificate/network-policy**: created when Bitwarden IS enabled (`isBitwardenConfigEnabled(esc)`) +- **bitwarden deployment/service/service-account/network-policy**: created when Bitwarden IS enabled (`isBitwardenConfigEnabled(esc)`) +- **bitwarden certificate**: created only when Bitwarden IS enabled AND no `secretRef` is provided AND cert-manager is enabled - **webhook TLS secret**: created only when cert-manager is NOT enabled (cert-manager manages the secret otherwise, named `external-secrets-webhook-cm` to avoid collision) ## cert-manager Integration @@ -45,7 +46,7 @@ Many resources are conditionally created based on CR spec. The pattern uses a sl - **Trusted CA bundle**: when proxy is configured, a ConfigMap `external-secrets-trusted-ca-bundle` is created with label `config.openshift.io/inject-trusted-cabundle: "true"`. OpenShift's Cluster Network Operator (CNO) injects cluster CA certs into it. The ConfigMap data is owned by CNO; the operator only manages labels/annotations. Mounted at `/etc/pki/tls/certs` in all containers. - **Proxy**: both uppercase and lowercase variants (`HTTP_PROXY`/`http_proxy`, etc.) are set on all containers and init containers. Proxy is removed cleanly when config is cleared. -- **Network policies**: default deny-all is always applied. Static allow policies for API server egress, webhook, DNS, and optionally cert-controller/bitwarden are applied from bindata. Users add custom egress-only policies via `controllerConfig.networkPolicies` with component targeting. +- **Network policies**: default deny-all is always applied. Static allow policies (prefixed `eso-sys-`) for API server egress, webhook, DNS, and optionally cert-controller/bitwarden are applied from bindata. An automatic proxy egress NetworkPolicy (`eso-sys-allow-proxy-egress`) is created when proxy URLs are set and `networkPolicyProvisioning` is `Managed`. Users add custom egress-only policies via `controllerConfig.networkPolicies` (prefixed `eso-user-`) with component targeting. - **Security context**: all containers get hardened `SecurityContext` (non-root, read-only root FS, drop ALL capabilities, seccomp RuntimeDefault). - **Console capability annotation**: ConsoleYAMLSample resources require `capability.openshift.io/name: Console` annotation. @@ -66,7 +67,7 @@ Container images in static YAML manifests (`bindata/`) have placeholder values t - `BITWARDEN_SDK_SERVER_IMAGE_VERSION`: version label for bitwarden resources - `OPERATOR_IMAGE_VERSION`: operator version -The reconciler returns an irrecoverable error if image env vars are empty. +The reconciler returns an irrecoverable error if `RELATED_IMAGE_EXTERNAL_SECRETS` or `RELATED_IMAGE_BITWARDEN_SDK_SERVER` is empty. The version env vars (`OPERAND_EXTERNAL_SECRETS_IMAGE_VERSION`, `BITWARDEN_SDK_SERVER_IMAGE_VERSION`, `OPERATOR_IMAGE_VERSION`) may be empty without causing errors -- version labels will simply be blank. ## Client Architecture diff --git a/docs/performance-guidelines.md b/docs/performance-guidelines.md index cbd4c439c..2dc9bca60 100644 --- a/docs/performance-guidelines.md +++ b/docs/performance-guidelines.md @@ -27,23 +27,26 @@ The controller uses different predicate combinations per resource type to minimi | Resource | Watch Method | Predicates | Rationale | |---|---|---|---| | ExternalSecretsConfig (primary) | `For()` | `GenerationChangedPredicate` | Ignore status-only updates | -| Deployments | `Watches()` | `GenerationChangedPredicate` + `managedResources` | Skip status/replica updates | +| Deployments | `Watches()` | `Or(GenerationChangedPredicate, LabelChangedPredicate)` + `managedResources` | Skip status/replica updates; catch label changes | | Secrets | `WatchesMetadata()` | `LabelChangedPredicate` | Avoid caching Secret data in memory | +| ConfigMaps | `Watches()` | `ResourceVersionChangedPredicate` + `managedOrWatchedResources` | Watch both operator-managed and user-referenced (trustedCABundle) ConfigMaps | | Other managed resources | `Watches()` | `managedResources` (label filter) | Only watch operator-owned objects | -| ExternalSecretsManager | `Watches()` | `GenerationChangedPredicate` + `managedResources` | Skip status-only changes | +| ExternalSecretsManager | `Watches()` | `GenerationChangedPredicate` (no `managedResources`) | Skip status-only changes; ESM is not labeled `app=external-secrets` | +| Certificates (conditional) | `Watches()` | `managedResources` | Only when cert-manager CRD is installed | | CRDs (crd_annotator) | `WatchesMetadata()` | `AnnotationChangedPredicate` + label filter | Only metadata matters | When adding new watched resources, always apply at minimum the `managedResources` predicate to avoid reconciling unrelated objects. Use `WatchesMetadata()` when only labels/annotations matter -- it avoids fetching full object bodies. ### Map Function Convention -All controllers map events to a single reconcile key: the CR name (`common.ExternalSecretsConfigObjectName` = "cluster"). The map function checks `obj.GetLabels()[requestEnqueueLabelKey] == requestEnqueueLabelValue` and returns an empty slice for non-matching objects. Never enqueue multiple requests from a single event. +The `external-secrets-controller` maps events to a single reconcile key: `common.ExternalSecretsConfigObjectName` ("cluster"). The map function checks `hasManagedOrWatchLabel(objLabels)` which matches either the managed resource label (`app=external-secrets` via `ManagedResourceLabelKey`/`ManagedResourceLabelValue`) or the watch label (`externalsecretsconfig.operator.openshift.io/watching`), and returns an empty slice for non-matching objects. The `crd-annotator` uses a separate map function with its own label filter (`external-secrets.io/component=controller`). The `external-secrets-manager` controller uses `EnqueueRequestForObject` directly for its primary CR. Never enqueue multiple requests from a single event. ## Reconciliation Patterns ### Error Classification and Requeue Strategy -The operator classifies errors into two categories via `common.ReconcileError`: +The operator classifies errors into three categories via `common.ReconcileError`: - **IrrecoverableError**: Config validation failures, permission errors, bad requests. Returns `ctrl.Result{}` with no requeue. Examples: missing CRD, invalid cert-manager config, missing env var. +- **UserConfigurationError**: Invalid or incomplete user-provided configuration. Returns `ctrl.Result{}` with no requeue (Degraded=True); recovery is driven by watches. Exception: `IsUserConfigurationNotFound` requeues with 30s since watches won't fire for unmanaged objects. - **RetryRequiredError**: Transient API failures, conflicts. Requeues with `DefaultRequeueTime` (30s). The helper `common.FromClientError()` auto-classifies based on API error type. Convention: never return both a `RequeueAfter` result and an error simultaneously. Return the error alone and let controller-runtime handle backoff, or return `RequeueAfter` with nil error. @@ -68,7 +71,7 @@ Status subresource updates use `retry.RetryOnConflict(retry.DefaultRetry, ...)` ## Concurrency Primitives ### Resettable sync.Once (`common.Now`) -The `Now` type in `pkg/controller/common/utils.go` extends `sync.Once` with a `Reset()` method using `sync.Mutex` + `atomic.Uint32` with double-checked locking. It is used in the `external_secrets_manager` controller to emit a warning event only once per error cycle, resetting when the error resolves. Use this type when you need one-shot behavior that can be re-armed. +The `Now` type in `pkg/controller/common/utils.go` extends `sync.Once` with a `Reset()` method using `sync.Mutex` + `atomic.Uint32` with double-checked locking. It is used in both the `external_secrets_manager` controller (to emit a warning event only once per error cycle) and the `external_secrets` controller (for one-shot validation warnings in trusted CA bundle handling, reset on successful reconciliation via `r.now.Reset()`). Use this type when you need one-shot behavior that can be re-armed. No goroutines are spawned directly by controller code -- all concurrency is handled by the controller-runtime framework. Do not introduce raw goroutines in reconcile loops. @@ -76,10 +79,15 @@ No goroutines are spawned directly by controller code -- all concurrency is hand ### HasObjectChanged Pattern The `common.HasObjectChanged()` function uses type-specific field comparison (not full `reflect.DeepEqual` on the entire object) to detect drift. Each resource type has a dedicated comparison: -- Deployments: compares replicas, containers, volumes, affinity, tolerations, node selector, env vars individually +- Deployments: compares replicas, containers (including init containers, security context, resources, ports, probes), volumes, affinity, tolerations, node selector, env vars, service account, DNS policy, template labels, revision history limit - RBAC: compares only Rules (Roles) or RoleRef+Subjects (Bindings) - Services: compares Type, Ports, Selector - Webhooks: compares individual webhook entries by name +- Certificates: compares full `Spec` via `DeepEqual` +- NetworkPolicies: compares PodSelector, PolicyTypes, Ingress, Egress +- ServiceAccounts: metadata-only drift detection (spec comparison always returns `false`) +- Secrets: uses `ObjectMetadataModified()` only (data co-managed by cert-controller) +- ConfigMaps: metadata-only patching pattern (similar to Secrets) Annotations are compared using managed-key tracking (`annotationMapsModified`) which only checks keys the operator manages, ignoring annotations set by external controllers (e.g., `deployment.kubernetes.io/revision`). This prevents infinite reconcile loops. diff --git a/docs/security-guidelines.md b/docs/security-guidelines.md index af391d8a1..a7863e678 100644 --- a/docs/security-guidelines.md +++ b/docs/security-guidelines.md @@ -31,7 +31,7 @@ The metrics server uses `filters.WithAuthenticationAndAuthorization` as `FilterP The repo manages RBAC at three distinct levels. Changing one level does not affect the others. -**1. Operator's own ClusterRole** (`config/rbac/role.yaml`): Grants permissions the operator process needs. Generated from `+kubebuilder:rbac` markers in `pkg/controller/external_secrets/controller.go`. When adding new API interactions, add markers there and run `make manifests`. +**1. Operator's own ClusterRole** (`config/rbac/role.yaml`): Grants permissions the operator process needs. Generated from `+kubebuilder:rbac` markers in `pkg/controller/external_secrets/controller.go` and `pkg/controller/external_secrets_manager/controller.go`. When adding new API interactions, add markers to the appropriate controller file and run `make manifests`. **2. Operand ClusterRoles** (static YAML in `bindata/external-secrets/resources/`): - `clusterrole_external-secrets-controller.yml` -- main controller (secrets CRUD, ESO CRDs) @@ -68,9 +68,9 @@ Two certificate provisioning paths exist, controlled by `spec.controllerConfig.c **Built-in cert-controller** (default): A separate deployment (`external-secrets-cert-controller`) manages TLS certificates for the webhook. The cert-controller has its own RBAC and creates/updates the `external-secrets-webhook` Secret. The webhook reads certs from `/tmp/certs` volume mount. -**cert-manager integration**: When enabled, `Certificate` resources are created from templates in `bindata/`. The `issuerRef` is validated to exist before creating the Certificate (via `assertIssuerRefExists`). The webhook secret name changes to `external-secrets-webhook-cm` to avoid clashing with the cert-controller secret. The `cert-manager.io/inject-ca-from` annotation is conditionally added to webhook configurations. +**cert-manager integration**: When enabled, `Certificate` resources are created from templates in `bindata/`. The `issuerRef` is validated to exist before creating the Certificate (via `assertIssuerRefExists`). The webhook secret name changes to `external-secrets-webhook-cm` to avoid clashing with the cert-controller secret. The `cert-manager.io/inject-ca-from` annotation is conditionally added to webhook configurations and ESO CRDs (via the `crd-annotator` controller) only when `injectAnnotations` is set to `"true"` (not merely when cert-manager mode is Enabled). -Key constraint: `certManager.mode` and `issuerRef` are immutable once set (enforced via `XValidation:rule="self == oldSelf"`). +Key constraint: `certManager.mode`, `issuerRef`, and `injectAnnotations` are immutable once set (enforced via `XValidation:rule="self == oldSelf"`). ## Bitwarden TLS Requirements @@ -102,7 +102,7 @@ Validating webhooks use: - `timeoutSeconds: 5` - Webhook listens on port `10250` (not the default 443) -The SecretStore webhook does not explicitly set `failurePolicy`, which defaults to `Fail` in Kubernetes. Maintain `Fail` for security-critical validations. +The SecretStore webhook also explicitly sets `failurePolicy: Fail` on both SecretStore and ClusterSecretStore webhooks. Maintain `Fail` for security-critical validations. ## Singleton Pattern (Cluster-Scoped CRs) @@ -118,7 +118,9 @@ E2E tests reference AWS credentials via a well-known Secret (`aws-creds` in `kub ## Reconciler Drift Detection -The operator reconciles all managed resources back to desired state. RBAC rules, deployments, webhook configurations, and network policies are compared field-by-field (via `HasObjectChanged` in `pkg/controller/common/utils.go`). If any resource is externally modified (e.g., someone manually adds permissions to a ClusterRole), the operator detects the drift and reverts it. This is a critical security property -- do not disable drift detection for security-sensitive resources. +The operator reconciles all managed resources back to desired state. RBAC rules, deployments (including container security context), webhook configurations, network policies, and certificates are compared field-by-field (via `HasObjectChanged` in `pkg/controller/common/utils.go`). If any resource is externally modified (e.g., someone manually adds permissions to a ClusterRole), the operator detects the drift and reverts it. This is a critical security property -- do not disable drift detection for security-sensitive resources. + +Note: drift detection does not currently cover pod-level `SecurityContext`, `hostNetwork`, or webhook `failurePolicy`. Manual changes to those fields will not be automatically reverted. ## Error Classification for Security diff --git a/docs/testing-guidelines.md b/docs/testing-guidelines.md index 270acef99..87e5b9ba0 100644 --- a/docs/testing-guidelines.md +++ b/docs/testing-guidelines.md @@ -2,7 +2,7 @@ ## Test Architecture -This repo has three test tiers, each in a separate Go module boundary (the repo uses `go.work` with modules at `.`, `./test`, `./cmd/external-secrets-operator`, `./tools`): +This repo has three test tiers. The repo uses `go.work` with modules at `.`, `./test`, `./cmd/external-secrets-operator`, `./tools`. Unit tests live in the root `.` module; API and E2E tests share the `./test` module: | Tier | Location | Make Target | Framework | Cluster Required | |------|----------|-------------|-----------|-----------------| @@ -10,7 +10,7 @@ This repo has three test tiers, each in a separate Go module boundary (the repo | API Integration | `test/apis/` | `make test-apis` | Ginkgo/Gomega + envtest | No (envtest) | | E2E | `test/e2e/` | `make test-e2e` | Ginkgo/Gomega + live cluster | Yes | -`make test` runs both `test-apis` and `test-unit` (no cluster needed). +`make test` runs `manifests`, `generate`, `fmt`, `vet`, then both `test-apis` and `test-unit` (no cluster needed). It also builds envtest binaries as a prerequisite. ## Unit Tests (`pkg/`) @@ -73,19 +73,26 @@ Tests are filtered by Ginkgo labels. The default filter is `"Platform: isSubsetO | Label | Scope | Required Secrets | |-------|-------|-----------------| | `Platform:AWS` | AWS SecretStore + ExternalSecret + PushSecret | `aws-creds` in `kube-system` | +| `Platform:Generic` | Platform-independent operand tests | None | | `CrossPlatform:GCP-AWS` | GCP cluster using AWS Secrets Manager | `aws-creds` in `kube-system` | | `Provider:Bitwarden` | Bitwarden ESO provider via ClusterSecretStore | `bitwarden-creds` in `external-secrets-operator` | | `API:Bitwarden` | Direct HTTP tests to bitwarden-sdk-server | `bitwarden-creds` in `external-secrets-operator` | +| `Suite:Bitwarden` | Full Bitwarden suite | `bitwarden-creds` in `external-secrets-operator` | +| `Proxy:HTTP` | Proxy-related tests | Proxy configuration | +| `Migration` | Migration/upgrade tests | None | +| `PostUpgradeCheck` | Post-upgrade verification | None | + +Note: some E2E test contexts (e.g., Environment Variables, Deployment Revision History, Annotations, Network Policies, Trusted CA Bundle) have **no platform label** and only run when the label filter is empty (`E2E_GINKGO_LABEL_FILTER=""`). Run a specific suite: `make test-e2e E2E_GINKGO_LABEL_FILTER="Provider:Bitwarden"`. Run all: `make test-e2e E2E_GINKGO_LABEL_FILTER=""`. ### E2E Structure -- Suite setup (`e2e_suite_test.go`): initializes K8s clients (`kubernetes.Clientset`, `dynamic.DynamicClient`, `controller-runtime client.Client`), sets timeout with 5-minute cleanup buffer, configures JSON/JUnit reports. -- `BeforeAll`: creates test namespace (prefix `external-secrets-e2e-test-`), creates/verifies `ExternalSecretsConfig` CR, waits for operator/operand pods. +- Suite setup (`e2e_suite_test.go`): `BeforeSuite` initializes K8s clients (`kubernetes.Clientset`, `dynamic.DynamicClient`, `controller-runtime client.Client`), sets timeout with 5-minute cleanup buffer, configures JSON/JUnit reports. `AfterSuite` runs global `CleanupESOOperandAndRelated`. +- `BeforeAll` (in `e2e_test.go`): creates test namespace (prefix `external-secrets-e2e-test-`), creates/verifies `ExternalSecretsConfig` CR, waits for operator/operand pods. - `BeforeEach`: verifies operand pods are ready. - `AfterEach`: on failure, dumps artifacts via `utils.DumpE2EArtifacts()` to `ARTIFACT_DIR` or `_output/`. -- `AfterAll`: deletes test namespace and cluster CR. +- `AfterAll` (in `e2e_test.go`): deletes test namespace and cluster CR. ### Test Data - YAML fixtures embedded via `//go:embed testdata/*` in `e2e_test.go`. From d220dda6446f74cd07df4369a322fd1832f7728d Mon Sep 17 00:00:00 2001 From: Siddhi Bhor Date: Wed, 1 Jul 2026 15:24:50 +0530 Subject: [PATCH 3/3] resolved review comments --- AGENTS.md | 2 +- ARCHITECTURE.md | 4 ++-- docs/api-contracts-guidelines.md | 36 ++++++++++++++++++++++++-------- docs/integration-guidelines.md | 6 +++--- docs/security-guidelines.md | 4 +++- 5 files changed, 36 insertions(+), 16 deletions(-) diff --git a/AGENTS.md b/AGENTS.md index b8f5d4184..8c897be8b 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -25,7 +25,7 @@ Three controllers run in a single binary: ## Project Structure -``` +```text api/v1alpha1/ CRD type definitions, conditions, shared types, deepcopy api/v1alpha1/tests/ Declarative YAML test suites for CRD validation bindata/ Static operand YAML manifests (compiled into Go via go-bindata) diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md index 2aa6da03b..a7e5ee424 100644 --- a/ARCHITECTURE.md +++ b/ARCHITECTURE.md @@ -10,7 +10,7 @@ For contribution process, see `CONTRIBUTING.md`. On the highest level, this is a Kubernetes operator that installs and manages the upstream [external-secrets](https://github.com/external-secrets/external-secrets) application on OpenShift clusters. The upstream project provides the actual secret-syncing logic (ExternalSecret, SecretStore, etc.). This operator does **not** embed or fork that code. Instead, it manages upstream resources as a set of static YAML manifests that are compiled into the binary, decoded at runtime, mutated with operator-controlled configuration, and applied imperatively to the cluster. -``` +```text ┌─────────────────────────────────────────────┐ │ ExternalSecretsConfig CR │ │ (singleton "cluster", user-facing API) │ @@ -92,7 +92,7 @@ The heart of the operator. Three controller packages plus shared infrastructure: The main reconciliation controller. Watches `ExternalSecretsConfig`, `ExternalSecretsManager`, and all managed operand resources. The reconciliation flow in `install_external_secrets.go` creates resources in strict dependency order: -``` +```text Namespace → NetworkPolicies → ServiceAccounts → Certificates → Secrets → TrustedCA ConfigMap → RBAC → Services → Deployments → ValidatingWebhooks → CR annotation tracking diff --git a/docs/api-contracts-guidelines.md b/docs/api-contracts-guidelines.md index 71e947d86..06d6c3aa4 100644 --- a/docs/api-contracts-guidelines.md +++ b/docs/api-contracts-guidelines.md @@ -1,5 +1,9 @@ # API Contracts Guidelines +## General Rules + +- **No functions in the API package.** The `api/v1alpha1/` package must contain only type definitions, constants, markers, and generated code (deepcopy). Business logic, helpers, and utility functions belong in `pkg/controller/` or `pkg/operator/`. This keeps the API package purely declarative and avoids unintentional coupling between API and implementation. + ## API Group and Versioning - Group: `operator.openshift.io`, Version: `v1alpha1`. All types live in `api/v1alpha1/`. @@ -20,7 +24,8 @@ Both are enforced singletons via CEL: `self.metadata.name == 'cluster'`. Both us ## Required Kubebuilder Markers on CRD Types Every root CRD type must carry these markers: -``` + +```go +genclient +genclient:nonNamespaced +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object @@ -101,7 +106,9 @@ All controllers follow the same retry-on-conflict pattern: ## Defaults -Use `+kubebuilder:default` for server-side defaulting: +**Going forward:** prefer defaulting within the controller, not in the CRD schema. Per OpenShift API conventions: *"With configuration APIs, we typically default fields within the controller and not within the API. This means that the platform has the ability to make changes to the defaults over time."* New fields should omit `+kubebuilder:default` and instead apply defaults at reconcile time so the operator retains the ability to evolve defaults across releases without requiring CRD schema migrations. + +The following existing fields already use `+kubebuilder:default` (legacy; do not add more): - `logLevel`: 1 - `mode`: `Disabled` - `injectAnnotations`: `"false"` @@ -145,13 +152,24 @@ Run API tests with `make test-apis` or via the general `make test`. This operator does NOT use kubebuilder-generated admission webhooks for its own CRDs. CEL-based CRD validation handles all admission validation. The webhook code in `pkg/controller/external_secrets/` manages the upstream external-secrets operand's `ValidatingWebhookConfiguration` resources (not the operator's own API validation). +## Godoc Requirements + +Every exported field in `api/v1alpha1/` types must have a Godoc comment that: +- Explains the field's **purpose** clearly enough for an end user unfamiliar with the implementation. +- Documents **interactions** with other fields (e.g., "Only relevant when `certManager.mode` is `Enabled`"). +- States **limitations** or constraints (e.g., max length, immutability, allowed values). +- Describes **default behavior** when the field is omitted or zero-valued. + +The Godoc comment on any API field serves as the primary user-facing documentation and is extracted into generated API reference docs (`make docs`). Write for the audience of a cluster administrator, not an operator developer. + ## Adding New API Fields Checklist 1. Add the Go field to the appropriate struct in `api/v1alpha1/`. -2. Add validation markers (min/max, enum, XValidation) for spec fields; status fields typically omit validation. -3. Use pointer types for optional sub-structs; value types for required scalars with defaults. -4. Add `+optional` or `+required` marker. Use `json:"fieldName,omitempty"` for optional fields. -5. Exception: listMapKey fields must NOT have `omitempty`; add `//nolint:kubeapilinter` with reason. -6. Write test cases in the corresponding `.testsuite.yaml` for both valid and invalid inputs. -7. Run `make update && make verify` to regenerate and validate all artifacts. -8. If the field adds a new condition type or reason, add constants in `conditions.go`. +2. Write a Godoc comment on the field covering purpose, interactions, limitations, and default behavior (see Godoc Requirements above). +3. Add validation markers (min/max, enum, XValidation) for spec fields; status fields typically omit validation. +4. Use pointer types for optional sub-structs; value types for required scalars with defaults. +5. Add `+optional` or `+required` marker. Use `json:"fieldName,omitempty"` for optional fields. +6. Exception: listMapKey fields must NOT have `omitempty`; add `//nolint:kubeapilinter` with reason. +7. Write test cases in the corresponding `.testsuite.yaml` for both valid and invalid inputs. +8. Run `make update && make verify` to regenerate and validate all artifacts. +9. If the field adds a new condition type or reason, add constants in `conditions.go`. diff --git a/docs/integration-guidelines.md b/docs/integration-guidelines.md index 32ab07684..99c160044 100644 --- a/docs/integration-guidelines.md +++ b/docs/integration-guidelines.md @@ -77,9 +77,9 @@ The reconciler returns an irrecoverable error if `RELATED_IMAGE_EXTERNAL_SECRETS ## Label and Annotation Conventions -- All managed resources carry `app: external-secrets` label (cache filter key). -- Default labels: `app`, `app.kubernetes.io/version`, `app.kubernetes.io/managed-by: external-secrets-operator`, `app.kubernetes.io/part-of: external-secrets-operator`. -- Disallowed label prefixes (rejected silently): `app.kubernetes.io/`, `external-secrets.io/`, `rbac.authorization.k8s.io/`, `servicebinding.io/controller`, `app`. +- All managed resources carry `app: external-secrets` label (cache filter key). This label is set programmatically by the operator and is not user-configurable. +- Default labels (operator-managed): `app`, `app.kubernetes.io/version`, `app.kubernetes.io/managed-by: external-secrets-operator`, `app.kubernetes.io/part-of: external-secrets-operator`. +- Disallowed label prefixes for **user-supplied** labels (rejected silently via `disallowedLabelMatcher`): `app.kubernetes.io/`, `external-secrets.io/`, `rbac.authorization.k8s.io/`, `servicebinding.io/controller`, `app`. Users cannot override operator-managed labels through the CR; the operator sets these itself. - Disallowed annotation domain prefixes (rejected by CRD CEL validation): `kubernetes.io/`, `k8s.io/`, `openshift.io/`, `cert-manager.io/`. - Managed annotation keys are tracked in a base64-encoded JSON array on the CR annotation `externalsecretsconfig.operator.openshift.io/managed-annotations`. diff --git a/docs/security-guidelines.md b/docs/security-guidelines.md index a7863e678..616ae032a 100644 --- a/docs/security-guidelines.md +++ b/docs/security-guidelines.md @@ -49,9 +49,11 @@ Key convention: The controller ClusterRole grants `secrets` verbs `get/list/watc ## RBAC Label Protection The regex `disallowedLabelMatcher` in `install_external_secrets.go` prevents users from overriding internal labels via `ControllerConfig.Labels`: -``` + +```regex ^app.kubernetes.io\/|^external-secrets.io\/|^rbac.authorization.k8s.io\/|^servicebinding.io\/controller$|^app$ ``` + When adding new internal labels, add them to this regex. ## Annotation Domain Restrictions