playbook: add pickupFirst mode for shared-certificate distribution (TPP) by jmeldrum76 · Pull Request #650 · Venafi/vcert

jmeldrum76 · 2026-05-12T20:21:23Z

Add `pickupFirst` mode to vcert playbook for shared-certificate distribution (TPP)

BUSINESS PROBLEM

Many customers operate the "one cert, many endpoints" pattern: a single TLS certificate (often a wildcard) is installed on dozens to hundreds of heterogeneous endpoints — Apache servers, NGINX, F5/NetScaler load balancers, Imperva, etc. — all serving the same FQDN(s). When the cert is renewed in TPP (manually via Aperture, automatically via a renewal policy, or via vcert on a designated leader host), every follower needs to install that exact same cert + key during its own maintenance window, which may be days or weeks after the renewal happens.

vcert's current playbook (vcert run -f apache.yaml) is built around the assumption that the host running the playbook owns the enrollment — it always tries to enroll / renew through the request block. That means:

- In a shared-wildcard scenario, every follower host running the playbook would attempt to enroll its own cert against TPP, each generating its own keypair. That's the opposite of "one wildcard everywhere."
- Followers can't simply track the leader's renewal: the playbook has no mode for "fetch whatever cert TPP currently has at this object DN and install it locally if it's different."
- Operators end up writing custom shell wrappers around vcert pickup to bridge this gap. We did exactly this for our customer: ~250 lines of bash that drive vcert pickup, compare thumbprints, and decide whether to install or defer to the existing renewal path.

Business impact: every customer with shared / wildcard certs across multiple endpoints either accepts staggered-renewal pain, builds bespoke distribution scripts, or pushes the cert manually. The pattern is common enough that vcert should support it natively.

PROPOSED SOLUTION

Add an opt-in pickupFirst mode to the playbook request block. With one new boolean field (and one optional override), a follower host's playbook becomes a self-healing converger to whatever the platform currently holds at a given cert object.

certificateTasks:
  - name: apache-cert
    renewBefore: 30d
    request:
      csr: service
      pickupFirst: true                    # ← new, default false
      pickupId: '\VED\Policy\...\cert-dn'  # ← new, optional; defaults to zone\CN
      zone: '\VED\Policy\Demo\Apache'
      subject:
        commonName: 'shared.example.com'
        ...
    installations:
      - format: PEM
        file: /etc/pki/tls/certs/apache.crt
        keyFile: /etc/pki/tls/private/apache.key
        chainFile: /etc/pki/tls/certs/apache-chain.crt
        afterInstallAction: "systemctl reload httpd"
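
As a rough illustration of the `pickupId` default noted in the YAML comment above (`zone\CN`), the fallback could be resolved along these lines. The `Request` struct and `resolvePickupID` are hypothetical names for this sketch, not the patch's actual types, and trimming a trailing backslash from the zone is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// Request is a hypothetical, trimmed-down stand-in for the playbook request block.
type Request struct {
	Zone        string
	CommonName  string
	PickupFirst bool   // defaults to false when the field is absent
	PickupID    string // optional explicit certificate object DN
}

// resolvePickupID returns the explicit PickupID when set,
// otherwise falls back to zone\CN (assumed joining rule).
func resolvePickupID(r Request) string {
	if r.PickupID != "" {
		return r.PickupID
	}
	return strings.TrimRight(r.Zone, `\`) + `\` + r.CommonName
}

func main() {
	r := Request{
		Zone:        `\VED\Policy\Demo\Apache`,
		CommonName:  "shared.example.com",
		PickupFirst: true,
	}
	fmt.Println(resolvePickupID(r))
}
```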

When pickupFirst: true:

1. Locate (TPP): `RetrieveCertificateMetaData(dn)`, one cheap GET that returns thumbprint + ValidTo with no PEM / key payload.
2. Compare the result against the installed cert's SHA-1 thumbprint and NotAfter.
3. Decide:

| State | Action |
|---|---|
| Thumbprint matches installed | Defer to the existing `renewBefore` window check (normal playbook flow takes over). |
| Platform cert is newer | Full `RetrieveCertificate` for cert + chain + key, install at the playbook's paths via the existing installer chain, run `afterInstallAction`. No enrollment. |
| Platform cert is older than installed | Log "refusing downgrade", exit cleanly. |
| Platform cert not found | Fall through to the existing enroll flow (handles initial enrollment naturally). |

Backwards compatibility: absent pickupFirst (or pickupFirst: false), the playbook behaves byte-identically to today. Existing customer playbooks are unaffected.

Architectural notes from a working prototype

- Implemented as a single new file pkg/playbook/app/service/pickup_first.go (~150 lines) plus three small public helpers in vcertutil and installer. The patch is purely additive: zero deletions, zero modifications to existing logic. The new field defaults make every untouched code path identical to current behavior.
- Hot path (thumbprint match) is ~50 ms on TPP, much cheaper than a full pickup. Doesn't touch the private-key vault. Doesn't exercise PKCS#8 decryption. Scales to any tenant size because RetrieveCertificateMetaData is O(1) by DN.
- Reuses every existing component: runInstaller, CreateX509Cert (handles PKCS#8 encrypted-key decryption), afterInstallAction, backup / rollback. No new installer code.
- The "platform older than installed" path is a genuine safety win: it prevents accidental downgrades when an admin imports an older cert into TPP by mistake.

Diffstat against `v5.13.2`

 pkg/playbook/app/domain/playbookRequest.go |   2 +
 pkg/playbook/app/installer/crypto.go       |   4 +
 pkg/playbook/app/service/pickup_first.go   | 133 +++++++++++++++++++++++++++
 pkg/playbook/app/service/service.go        |   8 ++
 pkg/playbook/app/vcertutil/vcertutil.go    | 143 +++++++++++++++++++++++++++++
 5 files changed, 290 insertions(+)

Scope for v1: TPP only

VCP support would require a different locator strategy. Its cert-object model is fundamentally different:

- Multiple cert lineages per CN can coexist.
- versionType (CURRENT / OLD) and certificateStatus (ACTIVE / RETIRED) are independent state machines.
- managedCertificateId (the lineage identifier) is not currently a server-side searchable field.

The proposed implementation silently no-ops on non-TPP backends so VCP / Firefly / NGTS playbooks see zero behavior change and zero error noise. VCP-native support is a clean follow-up issue once the locator abstraction lands.

CURRENT ALTERNATIVES

In production for a customer today, we have evaluated or are doing all of the following:

- Bespoke shell wrapper around vcert pickup and vcert run. Reads install paths and renewBefore from the playbook YAML, drives the four-branch decision tree (newer pickup → install / match in window → renew / match outside window → no-op / nothing in TPP → initial enroll), handles PKCS#8 key decryption before write (because Apache without SSLPassPhraseDialog can't load encrypted keys), filters stderr noise, writes timestamped backups. Roughly 250 lines of bash that every customer in this situation ends up writing variants of.
- vcert pickup driven by cron with custom diffing. Same pattern, different language.
- Cert pushed manually out of band (rsync from a leader, configuration-management drift). Skips vcert entirely; the cert object in TPP becomes informational rather than authoritative.
- Accept staggered downtime: run vcert run --force-renew on every host in a coordinated maintenance window, even though only one of them actually needed to enroll.

All four approaches reinvent the same logic and put the burden on the operator. Native support in vcert would replace all of them with one YAML flag.

VENAFI EXPERIENCE

- Working with vcert v5 (currently v5.12.3 in the customer environment; verified the proposed implementation also compiles and tests cleanly against v5.13.2 / master). Daily use of the playbook engine, vcert pickup, vcert run, and the standalone vcert enroll / vcert renew commands.
- TPP customer for several years; both interactive Aperture use and API-driven via vcert. Mix of enrollment patterns: user-provided CSR, service-generated, mixed key-retrieval policies across folders.
- Have prototyped this feature end-to-end on a live TPP lab and verified all seven decision scenarios:
  - backwards-compat (no pickupFirst field)
  - hot-path match
  - install-newer
  - refuse-downgrade
  - in-renew-window-defer-to-enroll
  - initial-enroll
  - VCP-silent-noop

A working prototype patch (pickupFirst.patch) is attached. Five files, +290 lines, zero deletions, zero modifications to existing code paths. Apply with git apply pickupFirst.patch from the vcert repo root.

Adds an opt-in `pickupFirst: true` field on the playbook `request:` block. When enabled (TPP only for v1), `vcert run` queries the cert object's current metadata first and installs whatever the platform holds rather than enrolling a new cert on every follower host. This matches the common "one cert, many endpoints" pattern (wildcards, load-balancer pools) where one team renews centrally and many followers need to converge to the same cert+key on their own maintenance windows.

Decision flow on each run:
- locate (TPP RetrieveCertificateMetaData) - cheap O(1) metadata GET
- thumbprint matches installed -> defer to renewBefore check
- platform cert newer than installed -> full pickup + install, no enroll
- platform cert older than installed -> refuse downgrade (safety guard)
- platform cert not found -> fall through to existing enroll

The change is purely additive: 5 files, +290 lines, 0 deletions, 0 modifications to existing logic. Existing playbooks without `pickupFirst` are byte-identical to current behavior. On VCP/Firefly/NGTS the feature silently no-ops (ErrLocateNotSupported); VCP-native support is a planned follow-up that needs a different locator strategy (cert-object DN model differs).

Files:
- pkg/playbook/app/domain/playbookRequest.go: PickupFirst, PickupID fields
- pkg/playbook/app/vcertutil/vcertutil.go: LocateLatestCN, locateTPP, PickupCertificateByLocator
- pkg/playbook/app/installer/crypto.go: LoadInstalledPEM (export)
- pkg/playbook/app/service/pickup_first.go: orchestrator (new file)
- pkg/playbook/app/service/service.go: Execute() hook

Verified end-to-end against a live TPP lab across seven scenarios: backwards-compat / hot-path match / install-newer-pickup / refuse-downgrade / in-renew-window-defer-to-enroll / initial-enroll / non-TPP silent-noop.

Signed-off-by: Jeremy Meldrum <21229220+jmeldrum76@users.noreply.github.com>

jmeldrum76 · 2026-05-13T15:20:03Z

For context: I'm submitting this as a Venafi (CyberArk) colleague — happy to follow whatever internal review process the playbook engine maintainers want before this gets merged. The commit is from my personal GitHub identity but the work is internal to the company. Let me know if there's an internal Jira/design-doc step I should do, a reviewer to ping, or anything else (README-PLAYBOOK.md update, tests, CHANGELOG) you'd like added to this PR before review. Happy to push follow-up commits on the same branch.

jmeldrum76 requested a review from luispresuelVenafi as a code owner May 12, 2026 20:21

jmeldrum76 wants to merge 1 commit into Venafi:master from jmeldrum76:add-pickup-first-mode
