feat(tdx): add TDX backend with configfs-tsm direct adapter #68
0xHansLee wants to merge 5 commits into
Separate shared enclave interfaces and sealed DB plumbing from SGX-specific implementation so a TDX backend can be added without duplicating SGX assumptions or expanding caller coupling.
Constraint: TDX support needs a backend boundary while preserving existing SGX behavior.
Rejected: adding TDX directly into the monolithic enclave package | it would mix SGX and TDX assumptions and make review harder.
Confidence: high
Scope-risk: moderate
Directive: Keep backend-specific code under enclave/{sgx,tdx,noop}; shared callers should use enclave.TEE and narrow capability interfaces.
Tested: pre-commit hooks passed during history cleanup, including formatting, golangci-lint, and go tests.
Not-tested: fresh coverage profile before this commit was created.
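A minimal sketch of what "narrow capability interfaces" can look like in Go. The names `Quoter`, `Sealer`, `noopBackend`, and `attest` are hypothetical illustrations; the actual method set of `enclave.TEE` is not shown in this commit message.

```go
package main

import (
	"errors"
	"fmt"
)

// Quoter and Sealer are hypothetical capability interfaces; the real
// enclave.TEE method set is not shown in this PR description.
type Quoter interface {
	Quote(reportData []byte) ([]byte, error)
}

type Sealer interface {
	Seal(plaintext []byte) ([]byte, error)
}

var errUnsupported = errors.New("capability not supported by this backend")

// noopBackend stands in for a backend package such as enclave/noop.
type noopBackend struct{}

// Compile-time check that the backend offers both capabilities.
var (
	_ Quoter = noopBackend{}
	_ Sealer = noopBackend{}
)

func (noopBackend) Quote(reportData []byte) ([]byte, error) {
	return append([]byte("noop-quote:"), reportData...), nil
}

func (noopBackend) Seal([]byte) ([]byte, error) {
	return nil, errUnsupported
}

// attest needs only the Quoter capability, so it works with any backend
// that can produce a quote and never sees sealing details.
func attest(q Quoter, reportData []byte) (string, error) {
	quote, err := q.Quote(reportData)
	if err != nil {
		return "", fmt.Errorf("attest: %w", err)
	}
	return string(quote), nil
}

func main() {
	out, err := attest(noopBackend{}, []byte("nonce"))
	fmt.Println(out, err)
}
```

A caller written this way does not change when an SGX or TDX backend is swapped in, which is the coupling the directive is trying to limit.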
The PR introduces a well-structured TDX backend with good test coverage and clear backend-agnostic abstractions. Coverage is limited by input truncation (most of Key findings:
Review iteration 5 · Commit 313e41e · 2026-06-08T09:23:33Z |
`hashAlg := providers[0].Hash`
Check that `len(providers) > 0` before indexing.
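One way to address the comment, as a sketch only: the `Provider` type below is a hypothetical stand-in (only the `Hash` field appears in the diff, and its real type is not shown), and `firstHashAlg` is an illustrative helper name.

```go
package main

import (
	"errors"
	"fmt"
)

// Provider is a hypothetical stand-in for the element type of the
// `providers` slice; only the Hash field is taken from the diff line.
type Provider struct {
	Name string
	Hash string
}

var errNoProviders = errors.New("no providers configured")

// firstHashAlg returns the hash algorithm of the first provider, or an
// error instead of panicking when the slice is empty.
func firstHashAlg(providers []Provider) (string, error) {
	if len(providers) == 0 {
		return "", errNoProviders
	}
	return providers[0].Hash, nil
}

func main() {
	if _, err := firstHashAlg(nil); err != nil {
		fmt.Println("empty slice:", err)
	}
	alg, _ := firstHashAlg([]Provider{{Name: "direct", Hash: "sha256"}})
	fmt.Println("hash algorithm:", alg)
}
```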
## Summary

On-chain validator for TDX V4/V5 attestation quotes used by the DKG flow. Mirrors `SGXValidationHook` in structure (`Ownable2Step + Pausable`) with TDX-specific quote field offsets.

## Identity model (v3: RTMR3-bound binary, RTMR2-included platform)

Split along the operator / cloud boundary, where "operator-controlled" means *the running Go binary*:

- **Binary identity** (operator-controlled): `keccak256(RTMR3)`, whitelisted via `EnclaveTypeData.codeCommitment` (same path as SGX `MRENCLAVE`). RTMR3 is self-extended by the kernel exactly once at startup with `SHA-384(/proc/self/exe)` (see [story-kernel #68](piplabs/story-kernel#68)), so its post-extend value `SHA-384(0x00..00 || SHA-384(elf))` is fully determined by the running ELF.
- **Platform identity** (cloud-managed): `keccak256(MRTD || RTMR0 || RTMR1 || RTMR2)`, approved via the hook's `approvePlatform` admin. RTMR2 moves into the platform half because it measures TD initrd + kernel cmdline, both boot-image properties decided at TD launch by the firmware/boot chain, never the userspace Go binary loaded from disk after boot.

Why not `keccak256(RTMR2)` for the binary half: on every direct-launch cloud TD measured, RTMR2 contains the TD initrd + cmdline digest. The story-kernel Go binary is loaded from the rootfs after boot and is never fed into RTMR2 by any link of the boot chain, so a `keccak256(RTMR2)` rule would commit to a boot-image artifact, not to the running ELF. The v3 split corrects this by claiming RTMR3 explicitly for the userspace binary half. This decouples binary release governance from cloud firmware vintage rotation, so each axis updates independently.

`REPORT_DATA[0:32]` carries the kernel-bound caller commitment from `DKG.register` and is compared against `expectedDataCommitment`, mirroring SGX's instance binding.

## Onboarding flow

A TDX enclave type is enabled in two steps:

1. Governance: `DKG.whitelistEnclaveType(enclaveType, { codeCommitment = keccak256(RTMR3), validationHookAddr = tdxProxy })`.
2. Hook owner: `TDXValidationHook.approvePlatform(keccak256(MRTD || RTMR0 || RTMR1 || RTMR2), label)`.

## Contract changes

- `TDXValidationHook` (new). `_computeBinaryCommitment` reads `RTMR3` at offset 520; `_computePlatformCommitment` assembles the 192-byte preimage from MRTD (offset 184) + the contiguous RTMR0/RTMR1/RTMR2 block (offsets 376..519).
- `ITDXValidationHook` (new). Admin surface: `setAutomataValidationAddr`, `approvePlatform`, `revokePlatform`, `isPlatformApproved`.
- `IAutomataDcapAttestationFee`: adds 1-arg overload (PR #816 pattern) so standard TCB transitions take effect automatically.

`DKG.sol` / `IDKG.sol` are unchanged: ABI/storage compatible with the live deployment.

## Adversary cost: both halves must succeed

To pass `register()` an attacker must produce a TDX quote where:

- `RTMR3` hashes to an approved binary commitment, **and**
- `MRTD || RTMR0 || RTMR1 || RTMR2` hashes to an approved platform commitment.

The RTMR3 self-extend can only push values *into* RTMR3 (RTMRs are append-only). An attacker binary `B'` that self-extends RTMR3 with `SHA-384(honest_B)` to forge the binary commitment still has to be loaded by the boot chain, which measures the kernel image + initrd into RTMR1 and the initrd + cmdline into RTMR2. Either modification shifts the platform commitment off any honest entry, and the platform-approval check fails.
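The offsets under "Contract changes" can be exercised off chain with a short Go sketch. The function names `rtmr3` and `platformPreimage` are illustrative, not the contract's; the 48-byte field width is inferred from the 192-byte preimage of four fields. Keccak-256 is not in the Go standard library (it is available as `sha3.NewLegacyKeccak256` in `golang.org/x/crypto/sha3`), so the sketch stops at the preimage and leaves the hash to the caller.

```go
package main

import (
	"errors"
	"fmt"
)

// Offsets as stated in the PR description; each measurement is 48 bytes
// (a 192-byte preimage built from four fields).
const (
	measurementLen = 48
	offMRTD        = 184
	offRTMR0       = 376 // RTMR0, RTMR1, RTMR2 are contiguous: 376..519
	offRTMR3       = 520
	minQuoteLen    = offRTMR3 + measurementLen
)

var errShortQuote = errors.New("quote too short for TD measurements")

// rtmr3 returns a copy of the RTMR3 field, the input to the binary commitment.
func rtmr3(quote []byte) ([]byte, error) {
	if len(quote) < minQuoteLen {
		return nil, errShortQuote
	}
	return append([]byte(nil), quote[offRTMR3:offRTMR3+measurementLen]...), nil
}

// platformPreimage returns MRTD || RTMR0 || RTMR1 || RTMR2, the input to
// the platform commitment.
func platformPreimage(quote []byte) ([]byte, error) {
	if len(quote) < minQuoteLen {
		return nil, errShortQuote
	}
	out := make([]byte, 0, 4*measurementLen)
	out = append(out, quote[offMRTD:offMRTD+measurementLen]...)
	out = append(out, quote[offRTMR0:offRTMR0+3*measurementLen]...)
	return out, nil
}

func main() {
	// Synthetic buffer, not a real quote: every byte encodes its position.
	quote := make([]byte, minQuoteLen)
	for i := range quote {
		quote[i] = byte(i % 251)
	}
	pre, _ := platformPreimage(quote)
	bin, _ := rtmr3(quote)
	fmt.Println(len(pre), len(bin))
}
```

Applying Keccak-256 to each returned slice would give the two values the hook compares against its allowlists.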
Adds Intel TDX as a second supported TEE backend, alongside SGX. The backend lives under enclave/tdx/ and is selected at build time via 'make build-tdx' (-tags tdx).

Hybrid identity model (schema v3):
- Binary identity = keccak256(RTMR3), self-extended by the kernel once at startup with SHA-384(/proc/self/exe). Mirrors the SGX MRENCLAVE slot.
- Platform identity = keccak256(MRTD || RTMR0 || RTMR1 || RTMR2). Computed and approved on chain by TDXValidationHook; the kernel only exposes the constituent measurements.

Why RTMR3 (and not RTMR2): RTMR2 measures the TD initrd + cmdline (a boot-image property decided at TD launch). The userspace Go binary is loaded from disk after boot and is never reflected in RTMR2, so a keccak256(RTMR2) rule would commit to a boot artifact instead of the running ELF. RTMR3 is the first software-defined register the TD payload can claim, so the kernel extends it via configfs-tsm exactly once during init(), before any operator-supplied code path runs. The post-extend value reduces to SHA-384(0x00..00 || SHA-384(elf)) and is fully determined by the running binary.

Self-extend failure (missing tdx_guest driver, sysfs permission, EBUSY from the TDX module) is fail-closed: both the quote provider and the TPM are swapped for fail-closed stubs, so the backend never emits a quote with an unbound RTMR3 nor seals material against an unbound identity.

Other highlights:
- Vendor adapter abstraction at enclave/tdx/platform/. The supported vendor is 'direct' (configfs-tsm on kernel >= 6.7). Runtime selection via STORY_TDX_VENDOR with Probe() auto-detect.
- Sealed storage uses TPM2 PolicyOR over per-provider PCR sets combined with an AES-GCM hybrid wrap, so the TPM only seals the data-encryption key.
- Startup self-check verifies TPM responsiveness, non-zero PCRs, the PolicyOR digest, and a quote round-trip whose REPORT_DATA carries a canary. A bootstrap-mode WARN log surfaces the empirically measured digest so operators can pin it on first deploy.
- Identity exposes 32-byte keccak256(RTMR3) as CodeCommitment, matching the chain-side TDXValidationHook contract.
- Unit tests cover the hash-and-extend halves of extendBinaryMeasurementOnce (hashSelfBinary, writeRTMRExtend) and pin the fresh-boot RTMR3 derivation formula against the live devnet value (TestRTMR3_FreshBootDerivation).

Tests pass on Linux under make test-cover (SGX), test-noop, and test-tdx; enclave/tdx package coverage is 87.5%. SGX backend behaviour is unchanged.
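The fresh-boot derivation and the fail-closed behaviour described in this commit message can be sketched as follows. This is an illustration under assumptions, not the code in enclave/tdx: `expectedRTMR3`, `quoter`, `echoQuoter`, `failClosedQuoter`, and `selectQuoter` are hypothetical names, and the real extend goes through configfs-tsm rather than being computed in process.

```go
package main

import (
	"crypto/sha512"
	"errors"
	"fmt"
)

// expectedRTMR3 models one extend of an all-zero 48-byte register:
// new = SHA-384(old || digest), with old = 0x00..00 and
// digest = SHA-384(elf).
func expectedRTMR3(elf []byte) [sha512.Size384]byte {
	digest := sha512.Sum384(elf)
	buf := make([]byte, 0, 2*sha512.Size384)
	buf = append(buf, make([]byte, sha512.Size384)...) // fresh-boot value
	buf = append(buf, digest[:]...)
	return sha512.Sum384(buf)
}

// quoter is a hypothetical stand-in for the backend's quote provider.
type quoter interface {
	Quote(reportData []byte) ([]byte, error)
}

// echoQuoter is a fake "real" provider used only for this demonstration.
type echoQuoter struct{}

func (echoQuoter) Quote(reportData []byte) ([]byte, error) { return reportData, nil }

// failClosedQuoter refuses every request and reports why.
type failClosedQuoter struct{ cause error }

func (f failClosedQuoter) Quote([]byte) ([]byte, error) {
	return nil, fmt.Errorf("quote refused, RTMR3 is not bound: %w", f.cause)
}

// selectQuoter swaps in the fail-closed stub when the extend step failed.
func selectQuoter(real quoter, extendErr error) quoter {
	if extendErr != nil {
		return failClosedQuoter{cause: extendErr}
	}
	return real
}

var errExtendFailed = errors.New("rtmr extend failed")

func main() {
	fmt.Printf("expected RTMR3: %x\n", expectedRTMR3([]byte("example ELF bytes")))
	q := selectQuoter(echoQuoter{}, errExtendFailed)
	_, err := q.Quote([]byte("report data"))
	fmt.Println("after failed extend:", err)
}
```

Choosing the stub once at startup means later call sites need no extra checks: a backend whose RTMR3 was never bound cannot produce a quote at all.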
GitHub Actions runners are intermittently failing to download the SHA
that the v5 moving tag of actions/setup-go currently resolves to:
##[error]An action could not be found at the URI
'https://codeload.github.com/actions/setup-go/tar.gz/40f1582b...'
The same SHA is fetchable from outside the runner network, so this is
a runner-side cache miss against the resolved v5 commit. Bumping to v6
forces resolution to a fresh SHA. No behavioural change — v6 is a
drop-in for our 'go-version: 1.24' usage.
…ary-bound sealing
The prior commit added PCR 12 to the seal policy as the binary-identity
half of the TPM2 PolicyPCR. PCR 12 is meaningful only if some component
above story-kernel — the initrd, in practice — extends it with the
running ELF's hash before exec'ing the kernel. This commit ships the
scripts and operator guide that make that hand-off concrete.
Adds enclave/tdx/setup/scripts/:
- measure-binary.sh    initrd stub: SHA-256(story-kernel ELF) → PCR 12
- verify-pcr12.sh      post-boot check: PCR 12 == SHA256(0 || SHA256(ELF))
- verify-platform.sh   extract MRTD/RTMR0..2 from a fresh TDX quote and
                       compute the platform_commitment that
                       TDXValidationHook.approvedPlatforms expects
                       (for governance handoff)
- test/                docker harness that runs swtpm + tpm2-tools
                       and exercises the scripts end-to-end
Adds enclave/tdx/setup/production-image.md: the production deployment
recipe — what the image must do, boot sequence with the in-TD swtpm +
PCR 12 step, reproducibility constraints, GCP/bare-metal deployment,
on-chain platform_commitment handoff, and a post-boot operator
checklist.
Cross-links: README.md "Per–guest-interface setup" promotes the
production-image.md path to first position; direct.md keeps its DEV/QA
warning and now points to the production guide.
Tested locally:
- shellcheck (sh dialect) clean across all three scripts.
- Docker-based test harness runs swtpm in TCP socket mode + tpm2-tools
and asserts: (a) measure-binary.sh extends PCR 12 with the right
SHA-256, (b) verify-pcr12.sh PASSes for the legit ELF, (c)
verify-pcr12.sh FAILs for a tampered ELF, (d) PCR 12 is
reboot-deterministic for the same ELF (swtpm wipe → re-measure →
identical PCR value).
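The PCR 12 check that verify-pcr12.sh performs can be mirrored in Go as a sketch. It reads the "0" in `SHA256(0 || SHA256(ELF))` as the 32-byte all-zero reset value of a SHA-256 PCR, which is an assumption about the script; `expectedPCR12` and `verifyPCR12` are illustrative names.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// expectedPCR12 models a single TPM extend on a freshly reset PCR:
// new = SHA-256(old || digest), with old = 32 zero bytes and
// digest = SHA-256(elf).
func expectedPCR12(elf []byte) [sha256.Size]byte {
	digest := sha256.Sum256(elf)
	var buf [2 * sha256.Size]byte // first half stays zero: the reset value
	copy(buf[sha256.Size:], digest[:])
	return sha256.Sum256(buf[:])
}

// verifyPCR12 reports whether a PCR value read after boot matches the
// value derived from the given ELF bytes.
func verifyPCR12(pcr [sha256.Size]byte, elf []byte) bool {
	return pcr == expectedPCR12(elf)
}

func main() {
	elf := []byte("legit story-kernel ELF bytes (placeholder)")
	pcr := expectedPCR12(elf) // what the initrd stub would have extended

	fmt.Println("legit ELF passes:", verifyPCR12(pcr, elf))
	fmt.Println("tampered ELF passes:", verifyPCR12(pcr, append(elf, 0x00)))
	fmt.Println("same value after re-measure:", pcr == expectedPCR12(elf))
}
```

The three printed checks correspond to assertions (b), (c), and (d) of the Docker harness described above.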
… note
Validation on a real GCP c3-standard-4 TDX VM surfaced two minor setup
issues that the docs had glossed over:
- eth-utils alone has no default keccak backend; `pip install eth-utils`
succeeds but the first keccak() call dies with ImportError unless
pycryptodome (or pysha3) is also installed. Add an explicit
`eth-hash[pycryptodome]` install instruction and probe the backend
at startup with an empty-string keccak so the error message points
at the missing backend, not just the missing module.
- Operators commonly want eth-utils inside a venv (stock Ubuntu pip
can't write to the system site-packages without --break-system-packages,
and -EXTERNALLY-MANAGED-ENVIRONMENT keeps biting). Add a PYTHON
env override so the script can be pointed at a venv's python3
without rewriting PATH.
The platform_commitment extraction was end-to-end validated against a
real V4 TDX quote pulled via configfs-tsm on a GCP c3-standard-4
direct-launch confidential VM. The MRTD captured matched the value
documented in enclave/tdx/README.md's "Supported platforms" table for
that SKU.
shellcheck (sh dialect) and the Docker-based test harness for the
measurement scripts both stay clean.
#68's 64/65 key-length asserts conflict with the now-merged #844 — please change them to 32/64 before merging. This PR pins
Pinning the lengths to keep the packed preimage injective is the right idea — just pin to the correct values. Before merging, please change:
See issue #73 and story PR #844 for the kernel↔contract alignment. |
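Why pinned lengths matter for a packed preimage can be shown with a short Go sketch. The comment does not say which field is 32 bytes and which is 64, so `packPinned` takes the lengths as parameters; `pack` and `packPinned` are hypothetical helpers, not functions from either repository.

```go
package main

import (
	"errors"
	"fmt"
)

// pack concatenates two fields with no length prefix or separator.
func pack(a, b []byte) []byte {
	out := make([]byte, 0, len(a)+len(b))
	out = append(out, a...)
	return append(out, b...)
}

var errBadLength = errors.New("field length does not match pinned length")

// packPinned concatenates only when both fields have the pinned lengths,
// which makes the field boundary in the output unambiguous.
func packPinned(a, b []byte, lenA, lenB int) ([]byte, error) {
	if len(a) != lenA || len(b) != lenB {
		return nil, fmt.Errorf("%w: got %d and %d, want %d and %d",
			errBadLength, len(a), len(b), lenA, lenB)
	}
	return pack(a, b), nil
}

func main() {
	// Without pinned lengths, two different inputs share one preimage.
	x := pack([]byte("ab"), []byte("c"))
	y := pack([]byte("a"), []byte("bc"))
	fmt.Println("collision without pinning:", string(x) == string(y))

	// With pinned lengths, a shifted boundary is rejected.
	_, err := packPinned(make([]byte, 33), make([]byte, 63), 32, 64)
	fmt.Println("shifted boundary:", err)
}
```

The pinned values have to match what the other side of the kernel↔contract boundary produces, which is why the review asks for the asserts to be corrected and kept.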
Summary
Adds an Intel TDX backend to story-kernel and aligns the kernel-reported identity with Story's hybrid on-chain
`TDXValidationHook` under schema v3 (RTMR3-bound binary commitment).

- … `enclave/tdx/`, selectable via `-tags tdx` / `make build-tdx`. Implements the existing `enclave.TEE` / `enclave.SealDB` contract so `service/`, `server/`, and `store/` stay backend-agnostic.
- … `enclave/tdx/platform/` with the upstream Linux `configfs-tsm` path (`direct` adapter) as the supported vendor. Runtime selection via `STORY_TDX_VENDOR` with auto-detect fallback through `Probe()`.
- … `supportedProviders`) combined with an AES-GCM hybrid wrap, so the TPM only seals the data-encryption key. PolicyOR digest population is driven by a bootstrap-mode WARN log so operators can pin the empirically measured digest on first deploy.
- … `codeCommitment = keccak256(RTMR3)` to match the hybrid hook's binary half. RTMR3 is self-extended by the kernel exactly once at startup with `SHA-384(/proc/self/exe)`, so the post-extend value `SHA-384(0x00..00 || SHA-384(elf))` is fully determined by the running ELF. The chain-side hook independently checks the platform half `keccak256(MRTD || RTMR0 || RTMR1 || RTMR2)` against its `approvePlatform` whitelist; the kernel side does not gate on that.
- … `V4.report_data` carries a canary.

How identity flows on chain
The matrix decomposes: N binaries × M platform vintages collapse from N×M into N + M governance rows, and each row tracks exactly one axis.
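The N + M decomposition can be illustrated with two independent allowlists. This is a toy model in Go, not the Solidity hook: the `hook` type and the string commitments are placeholders for the 32-byte keccak256 values.

```go
package main

import "fmt"

// hook keeps one allowlist per axis, as in the hybrid identity model.
type hook struct {
	binaries  map[string]bool // approved binary commitments
	platforms map[string]bool // approved platform commitments
}

func newHook(binaries, platforms []string) hook {
	h := hook{binaries: map[string]bool{}, platforms: map[string]bool{}}
	for _, b := range binaries {
		h.binaries[b] = true
	}
	for _, p := range platforms {
		h.platforms[p] = true
	}
	return h
}

// accepts requires both halves to be approved independently.
func (h hook) accepts(binary, platform string) bool {
	return h.binaries[binary] && h.platforms[platform]
}

// rows is the number of governance entries that must be maintained.
func (h hook) rows() int { return len(h.binaries) + len(h.platforms) }

// combinations is the number of (binary, platform) pairs those rows cover.
func (h hook) combinations() int { return len(h.binaries) * len(h.platforms) }

func main() {
	h := newHook(
		[]string{"bin-v1", "bin-v2", "bin-v3"},
		[]string{"fw-a", "fw-b"},
	)
	fmt.Printf("rows=%d combinations=%d\n", h.rows(), h.combinations()) // prints "rows=5 combinations=6"
	fmt.Println(h.accepts("bin-v2", "fw-b"))   // true
	fmt.Println(h.accepts("bin-evil", "fw-b")) // false
}
```

Adding a fourth binary release adds one row and immediately covers both platform vintages, without touching the platform list.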