
feat(tdx): add TDX backend with configfs-tsm direct adapter#68

Open
0xHansLee wants to merge 5 commits into main from feat/impl-tdx-backend

@0xHansLee 0xHansLee commented May 24, 2026

Collaborator

Summary

Adds an Intel TDX backend to story-kernel and aligns the kernel-reported identity with Story's hybrid on-chain TDXValidationHook under schema v3 (RTMR3-bound binary commitment).

  • New backend at enclave/tdx/, selectable via -tags tdx / make build-tdx. Implements the existing enclave.TEE / enclave.SealDB contract so service/, server/, and store/ stay backend-agnostic.
  • Vendor adapter abstraction at enclave/tdx/platform/ with the upstream Linux configfs-tsm path (direct adapter) as the supported vendor. Runtime selection via STORY_TDX_VENDOR with auto-detect fallback through Probe().
  • Sealed storage uses TPM2 PolicyOR over per-provider PCR sets (supportedProviders) combined with an AES-GCM hybrid wrap, so the TPM only seals the data-encryption key. PolicyOR digest population is driven by a bootstrap-mode WARN log so operators can pin the empirically measured digest on first deploy.
  • Identity reports codeCommitment = keccak256(RTMR3) to match the hybrid hook's binary half. RTMR3 is self-extended by the kernel exactly once at startup with SHA-384(/proc/self/exe), so the post-extend value SHA-384(0x00..00 || SHA-384(elf)) is fully determined by the running ELF. The chain-side hook independently checks the platform half keccak256(MRTD || RTMR0 || RTMR1 || RTMR2) against its approvePlatform whitelist; the kernel side does not gate on that.
  • Startup self-check asserts TPM responsiveness, non-zero PCRs, provider-digest match (or bootstrap-mode WARN), and a quote round-trip whose V4.report_data carries a canary.
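The RTMR3 derivation described above can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: `extendRTMR` and `expectedRTMR3` are hypothetical names, and the final `keccak256(RTMR3)` step is omitted because Keccak-256 is not in Go's standard library.

```go
package main

import (
	"crypto/sha512"
	"fmt"
)

// rtmrSize is the width of a TDX runtime measurement register (SHA-384).
const rtmrSize = sha512.Size384

// extendRTMR models the append-only extend operation:
// new = SHA-384(old || digest).
func extendRTMR(old, digest [rtmrSize]byte) [rtmrSize]byte {
	buf := make([]byte, 0, 2*rtmrSize)
	buf = append(buf, old[:]...)
	buf = append(buf, digest[:]...)
	return sha512.Sum384(buf)
}

// expectedRTMR3 returns the value RTMR3 should hold after exactly one
// self-extend with SHA-384(elf), starting from the all-zero register.
func expectedRTMR3(elf []byte) [rtmrSize]byte {
	var zero [rtmrSize]byte
	return extendRTMR(zero, sha512.Sum384(elf))
}

func main() {
	// Placeholder bytes (assumption): the kernel hashes /proc/self/exe.
	rtmr3 := expectedRTMR3([]byte("example ELF bytes"))
	fmt.Printf("RTMR3 = %x\n", rtmr3)
}
```

Because the starting register is all zeros and the extend happens once, the result depends only on the ELF bytes, which is what lets an operator precompute the commitment off-line.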

How identity flows on chain

kernel side                                on-chain TDXValidationHook
RTMR3 ─── keccak256 ──► codeCommitment ──► whitelistEnclaveType[type=2].codeCommitment
                          (binary half)      (operator-managed; rotates with the ELF)

MRTD ┐
RTMR0│
RTMR1├── keccak256 ──► platformCommitment ──► approvedPlatforms[h]
RTMR2┘    (platform half, derived on chain)    (operator-managed; rotates with cloud firmware/initrd)

The matrix decomposes: N binaries × M platform vintages collapse from N×M into N + M governance rows, and each row tracks exactly one axis.
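To make the N + M claim concrete, here is a small in-memory model of the two governance tables. It is a sketch under assumptions: `hook`, `commitment`, and the map layout are hypothetical and only mirror the shape of the on-chain check, in which both halves must be approved.

```go
package main

import "fmt"

// commitment stands in for a 32-byte keccak256 digest.
type commitment [32]byte

// hook is a hypothetical in-memory model of the two governance tables.
type hook struct {
	binaries  map[commitment]bool // N rows: one per approved ELF
	platforms map[commitment]bool // M rows: one per approved platform vintage
}

// accepts reports whether a quote passes: both halves must be approved.
func (h hook) accepts(code, platform commitment) bool {
	return h.binaries[code] && h.platforms[platform]
}

// rows returns the number of governance entries that must be maintained.
func (h hook) rows() int { return len(h.binaries) + len(h.platforms) }

func main() {
	h := hook{binaries: map[commitment]bool{}, platforms: map[commitment]bool{}}
	for i := byte(1); i <= 3; i++ { // N = 3 binaries
		h.binaries[commitment{i}] = true
	}
	for j := byte(1); j <= 4; j++ { // M = 4 platform vintages
		h.platforms[commitment{j}] = true
	}
	accepted := 0
	for i := byte(0); i <= 5; i++ {
		for j := byte(0); j <= 5; j++ {
			if h.accepts(commitment{i}, commitment{j}) {
				accepted++
			}
		}
	}
	fmt.Printf("rows=%d accepted combinations=%d\n", h.rows(), accepted)
	// prints: rows=7 accepted combinations=12
}
```

Seven rows cover all twelve valid (binary, platform) pairs, and rotating a binary or a platform touches exactly one row.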

Separate shared enclave interfaces and sealed DB plumbing from SGX-specific implementation so a TDX backend can be added without duplicating SGX assumptions or expanding caller coupling.

Constraint: TDX support needs a backend boundary while preserving existing SGX behavior.
Rejected: adding TDX directly into the monolithic enclave package | it would mix SGX and TDX assumptions and make review harder.
Confidence: high
Scope-risk: moderate
Directive: Keep backend-specific code under enclave/{sgx,tdx,noop}; shared callers should use enclave.TEE and narrow capability interfaces.
Tested: pre-commit hooks passed during history cleanup, including formatting, golangci-lint, and go tests.
Not-tested: fresh coverage profile before this commit was created.
@0xHansLee 0xHansLee self-assigned this May 24, 2026

wiz-837b06c6da Bot commented May 24, 2026


Wiz Scan Summary

Scanner                         Findings
Vulnerabilities                 -
Sensitive Data                  -
Secrets                         -
IaC Misconfigurations           -
SAST Findings                   -
Software Management Findings    -
Total                           -


@0xHansLee 0xHansLee force-pushed the feat/impl-tdx-backend branch 2 times, most recently from d32137d to 1ab3d10 Compare May 26, 2026 09:06
@0xHansLee 0xHansLee force-pushed the feat/impl-tdx-backend branch from 05004f6 to 12f1e6f Compare May 26, 2026 13:11
@0xHansLee 0xHansLee marked this pull request as ready for review May 26, 2026 13:17

jinn-agent Bot commented May 26, 2026


The PR introduces a well-structured TDX backend with good test coverage and clear backend-agnostic abstractions. Coverage is limited by input truncation (most of enclave/tdx/ source files are present but patches are null). A few correctness and reliability issues were found across the visible files.

Key findings:

  1. sealdb.Close() silently drops the stor.Close() error when db.Close() also fails — dual-error case is not surfaced.
  2. build-sgx/build-tdx Makefile targets forward the user-supplied $(build_tags) variable, meaning BUILD_TAGS=tdx make build-sgx produces -tags "sgx tdx" which breaks the documented mutual-exclusion invariant and will cause a runtime panic on the first enclave.Default() call.
  3. make test-cover (used by the CI gotest.yml) now passes -tags sgx, which is fine only if the SGX backend compiles cleanly without hardware — worth verifying the CI run.
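For finding 1, one possible fix is to join both close errors so neither is dropped. This is a sketch, not the PR's code: the `sealDB` struct, its field types, and `failingCloser` are assumptions made for illustration; `errors.Join` requires Go 1.20 or later.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// sealDB is a hypothetical stand-in for the type under review: it owns a
// database handle and a backing store, both of which must be closed.
type sealDB struct {
	db   io.Closer
	stor io.Closer
}

// Close closes both resources and reports every failure.
// errors.Join returns nil when both results are nil.
func (s *sealDB) Close() error {
	return errors.Join(s.db.Close(), s.stor.Close())
}

// failingCloser returns the configured error (possibly nil) from Close.
type failingCloser struct{ err error }

func (f failingCloser) Close() error { return f.err }

var (
	errDB   = errors.New("db close failed")
	errStor = errors.New("stor close failed")
)

func main() {
	s := &sealDB{db: failingCloser{errDB}, stor: failingCloser{errStor}}
	err := s.Close()
	// Both causes remain discoverable through errors.Is.
	fmt.Println(errors.Is(err, errDB), errors.Is(err, errStor))
}
```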

Review iteration 5 · Commit 313e41e · 2026-06-08T09:23:33Z

@0xHansLee 0xHansLee force-pushed the feat/impl-tdx-backend branch from 12f1e6f to b504b4b Compare May 27, 2026 01:07
Comment thread enclave/tdx/seal.go
}
}

hashAlg := providers[0].Hash

Contributor


Check that len(providers) > 0 before indexing providers[0]; an empty slice would panic here.
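A guard along these lines would address the comment. The `provider` struct, the string-typed `Hash` field, and the `policyHashAlg` helper are hypothetical; the real types in enclave/tdx/seal.go are not visible in this thread.

```go
package main

import (
	"errors"
	"fmt"
)

// provider is a hypothetical, trimmed-down shape; the real type in
// enclave/tdx/seal.go may differ.
type provider struct {
	Name string
	Hash string
}

var errNoProviders = errors.New("seal: no providers configured")

// policyHashAlg returns the hash algorithm of the first provider and
// returns an error instead of panicking when the list is empty.
func policyHashAlg(providers []provider) (string, error) {
	if len(providers) == 0 {
		return "", errNoProviders
	}
	return providers[0].Hash, nil
}

func main() {
	if _, err := policyHashAlg(nil); err != nil {
		fmt.Println("rejected:", err)
	}
	alg, _ := policyHashAlg([]provider{{Name: "example", Hash: "sha256"}})
	fmt.Println("hash algorithm:", alg)
}
```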

0xHansLee added a commit to piplabs/story that referenced this pull request May 29, 2026
## Summary

On-chain validator for TDX V4/V5 attestation quotes used by the DKG
flow. Mirrors `SGXValidationHook` in structure (`Ownable2Step +
Pausable`) with TDX-specific quote field offsets.

## Identity model (v3 — RTMR3-bound binary, RTMR2-included platform)

Split along the operator / cloud boundary, where "operator-controlled"
means *the running Go binary*:

- **Binary identity** (operator-controlled): `keccak256(RTMR3)`,
whitelisted via `EnclaveTypeData.codeCommitment` (same path as SGX
`MRENCLAVE`). RTMR3 is self-extended by the kernel exactly once at
startup with `SHA-384(/proc/self/exe)` (see [story-kernel
#68](piplabs/story-kernel#68)), so its
post-extend value `SHA-384(0x00..00 || SHA-384(elf))` is fully
determined by the running ELF.
- **Platform identity** (cloud-managed): `keccak256(MRTD || RTMR0 ||
RTMR1 || RTMR2)`, approved via the hook's `approvePlatform` admin. RTMR2
moves into the platform half because it measures TD initrd + kernel
cmdline — both boot-image properties decided at TD launch by the
firmware/boot chain, never the userspace Go binary loaded from disk
after boot.

Why not `keccak256(RTMR2)` for the binary half: on every direct-launch
cloud TD measured, RTMR2 contains the TD initrd + cmdline digest. The
story-kernel Go binary is loaded from the rootfs after boot and is never
fed into RTMR2 by any link of the boot chain — so a `keccak256(RTMR2)`
rule would commit to a boot-image artifact, not to the running ELF. The
v3 split corrects this by claiming RTMR3 explicitly for the userspace
binary half.

This decouples binary release governance from cloud firmware vintage
rotation, so each axis updates independently. `REPORT_DATA[0:32]`
carries the kernel-bound caller commitment from `DKG.register` and is
compared against `expectedDataCommitment`, mirroring SGX's instance
binding.

## Onboarding flow

A TDX enclave type is enabled in two steps:

1. Governance: `DKG.whitelistEnclaveType(enclaveType, { codeCommitment =
keccak256(RTMR3), validationHookAddr = tdxProxy })`.
2. Hook owner: `TDXValidationHook.approvePlatform(keccak256(MRTD ||
RTMR0 || RTMR1 || RTMR2), label)`.

## Contract changes

- `TDXValidationHook` (new). `_computeBinaryCommitment` reads `RTMR3` at
offset 520; `_computePlatformCommitment` assembles the 192-byte preimage
from MRTD (offset 184) + the contiguous RTMR0/RTMR1/RTMR2 block (offsets
376..519).
- `ITDXValidationHook` (new). Admin surface:
`setAutomataValidationAddr`, `approvePlatform`, `revokePlatform`,
`isPlatformApproved`.
- `IAutomataDcapAttestationFee`: adds 1-arg overload (PR #816 pattern)
so standard TCB transitions take effect automatically.
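The offsets listed for `TDXValidationHook` can be cross-checked off-chain. The following is a sketch only: the function names are hypothetical, the buffer in `main` is synthetic rather than a real quote, and the final keccak256 step is left out because it needs a non-standard-library package.

```go
package main

import (
	"errors"
	"fmt"
)

// Field offsets within the quote, as stated in the commit message above.
const (
	regSize     = 48          // MRTD and each RTMR are SHA-384 digests
	offMRTD     = 184         // MRTD
	offRTMR0    = 376         // RTMR0, RTMR1, RTMR2 are contiguous: 376..519
	offRTMR3    = 520         // RTMR3
	preimageLen = 4 * regSize // 192 bytes
)

var errShortQuote = errors.New("quote too short for the measurement fields")

// platformPreimage returns MRTD || RTMR0 || RTMR1 || RTMR2 (192 bytes).
func platformPreimage(quote []byte) ([]byte, error) {
	if len(quote) < offRTMR3+regSize {
		return nil, errShortQuote
	}
	out := make([]byte, 0, preimageLen)
	out = append(out, quote[offMRTD:offMRTD+regSize]...)
	out = append(out, quote[offRTMR0:offRTMR3]...)
	return out, nil
}

// rtmr3 returns the 48-byte RTMR3 field (the binary-half preimage).
func rtmr3(quote []byte) ([]byte, error) {
	if len(quote) < offRTMR3+regSize {
		return nil, errShortQuote
	}
	return quote[offRTMR3 : offRTMR3+regSize], nil
}

func main() {
	quote := make([]byte, offRTMR3+regSize) // synthetic zero-filled buffer
	quote[offMRTD], quote[offRTMR0], quote[offRTMR3] = 0xAA, 0xB0, 0xC3
	pre, _ := platformPreimage(quote)
	r3, _ := rtmr3(quote)
	fmt.Printf("preimage=%d bytes, first=%#x, rtmr0=%#x, rtmr3=%#x\n",
		len(pre), pre[0], pre[regSize], r3[0])
}
```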

`DKG.sol` / `IDKG.sol` are unchanged — ABI/storage compatible with the
live deployment.

## Adversary cost — both halves must succeed

To pass `register()` an attacker must produce a TDX quote where:

- `RTMR3` hashes to an approved binary commitment, **and**
- `MRTD || RTMR0 || RTMR1 || RTMR2` hashes to an approved platform
commitment.

The RTMR3 self-extend can only push values *into* RTMR3 (RTMRs are
append-only). An attacker binary `B'` that self-extends RTMR3 with
`SHA-384(honest_B)` to forge the binary commitment still has to be
loaded by the boot chain — which measures the kernel image + initrd into
RTMR1 and the initrd + cmdline into RTMR2. Either modification shifts
the platform commitment off any honest entry, and the platform-approval
check fails.
0xHansLee added 2 commits June 8, 2026 17:29
Adds Intel TDX as a second supported TEE backend, alongside SGX. The
backend lives under enclave/tdx/ and is selected at build time via
'make build-tdx' (-tags tdx).

Hybrid identity model (schema v3):
- Binary identity  = keccak256(RTMR3), self-extended by the kernel once
  at startup with SHA-384(/proc/self/exe). Mirrors the SGX MRENCLAVE
  slot.
- Platform identity = keccak256(MRTD || RTMR0 || RTMR1 || RTMR2).
  Computed and approved on chain by TDXValidationHook; the kernel only
  exposes the constituent measurements.

Why RTMR3 (and not RTMR2): RTMR2 measures the TD initrd + cmdline (a
boot-image property decided at TD launch). The userspace Go binary is
loaded from disk after boot and is never reflected in RTMR2 — so a
keccak256(RTMR2) rule would commit to a boot artifact instead of the
running ELF. RTMR3 is the first software-defined register the TD
payload can claim, so the kernel extends it via configfs-tsm exactly
once during init(), before any operator-supplied code path runs. The
post-extend value reduces to SHA-384(0x00..00 || SHA-384(elf)) and is
fully determined by the running binary.

Self-extend failure (missing tdx_guest driver, sysfs permission, EBUSY
from the TDX module) is fail-closed: both the quote provider and the
TPM are swapped for fail-closed stubs, so the backend never emits a
quote with an unbound RTMR3 nor seals material against an unbound
identity.
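The fail-closed swap can be pictured roughly as follows. Everything here is an assumption made for illustration: the `quoteProvider` interface, `failClosedProvider`, and `selectProvider` are hypothetical names, and only the quote-provider half is shown (the TPM stub would follow the same pattern).

```go
package main

import (
	"errors"
	"fmt"
)

// quoteProvider is a hypothetical narrow interface for quote generation.
type quoteProvider interface {
	Quote(reportData [64]byte) ([]byte, error)
}

var errUnboundRTMR3 = errors.New("tdx: RTMR3 self-extend failed, backend is fail-closed")

// failClosedProvider refuses every request and records why.
type failClosedProvider struct{ cause error }

func (p failClosedProvider) Quote([64]byte) ([]byte, error) {
	return nil, fmt.Errorf("%w: %v", errUnboundRTMR3, p.cause)
}

// selectProvider returns the working provider only when the self-extend
// succeeded; otherwise it returns a stub that never emits a quote.
func selectProvider(working quoteProvider, extendErr error) quoteProvider {
	if extendErr != nil {
		return failClosedProvider{cause: extendErr}
	}
	return working
}

// stubProvider stands in for a functioning backend in this example.
type stubProvider struct{}

func (stubProvider) Quote([64]byte) ([]byte, error) { return []byte("quote"), nil }

func main() {
	p := selectProvider(stubProvider{}, errors.New("EBUSY"))
	_, err := p.Quote([64]byte{})
	fmt.Println(err)
}
```

Deciding once at init time, rather than checking a flag on every call, means no later code path can reach the real provider after a failed extend.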

Other highlights:
- Vendor adapter abstraction at enclave/tdx/platform/. The supported
  vendor is 'direct' (configfs-tsm on kernel >= 6.7). Runtime selection
  via STORY_TDX_VENDOR with Probe() auto-detect.
- Sealed storage uses TPM2 PolicyOR over per-provider PCR sets combined
  with an AES-GCM hybrid wrap, so the TPM only seals the data-encryption
  key.
- Startup self-check verifies TPM responsiveness, non-zero PCRs, the
  PolicyOR digest, and a quote round-trip whose REPORT_DATA carries a
  canary. A bootstrap-mode WARN log surfaces the empirically measured
  digest so operators can pin it on first deploy.
- Identity exposes 32-byte keccak256(RTMR3) as CodeCommitment, matching
  the chain-side TDXValidationHook contract.
- Unit tests cover the hash-and-extend halves of extendBinaryMeasurementOnce
  (hashSelfBinary, writeRTMRExtend) and pin the fresh-boot RTMR3 derivation
  formula against the live devnet value (TestRTMR3_FreshBootDerivation).
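The hybrid wrap from the sealed-storage bullet can be sketched as an envelope scheme: AES-GCM encrypts the payload under a fresh data-encryption key, and only that key goes to the sealer. The `keySealer` interface, the `envelope` layout, and `memSealer` are assumptions; `memSealer` is a test double that provides no protection and merely stands in for the TPM2 PolicyOR seal.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"fmt"
)

// keySealer is a hypothetical interface for the TPM half of the scheme.
type keySealer interface {
	Seal(key []byte) ([]byte, error)
	Unseal(blob []byte) ([]byte, error)
}

// envelope holds the sealed key alongside the AES-GCM output.
type envelope struct {
	SealedKey  []byte
	Nonce      []byte
	Ciphertext []byte
}

func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// wrap encrypts plaintext under a fresh 256-bit key and seals only the key.
func wrap(s keySealer, plaintext []byte) (*envelope, error) {
	dek := make([]byte, 32)
	if _, err := rand.Read(dek); err != nil {
		return nil, err
	}
	gcm, err := newGCM(dek)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	sealed, err := s.Seal(dek)
	if err != nil {
		return nil, err
	}
	ct := gcm.Seal(nil, nonce, plaintext, nil)
	return &envelope{SealedKey: sealed, Nonce: nonce, Ciphertext: ct}, nil
}

// unwrap unseals the key and authenticates and decrypts the payload.
func unwrap(s keySealer, e *envelope) ([]byte, error) {
	dek, err := s.Unseal(e.SealedKey)
	if err != nil {
		return nil, err
	}
	gcm, err := newGCM(dek)
	if err != nil {
		return nil, err
	}
	return gcm.Open(nil, e.Nonce, e.Ciphertext, nil)
}

// memSealer is a test double, NOT a TPM: it copies the key unchanged.
type memSealer struct{}

func (memSealer) Seal(k []byte) ([]byte, error)   { return append([]byte(nil), k...), nil }
func (memSealer) Unseal(b []byte) ([]byte, error) { return append([]byte(nil), b...), nil }

func main() {
	env, err := wrap(memSealer{}, []byte("secret share"))
	if err != nil {
		panic(err)
	}
	out, err := unwrap(memSealer{}, env)
	fmt.Println(string(out), err)
}
```

Sealing only the 32-byte key keeps the TPM payload small and constant in size regardless of how much data is stored.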

Tests pass on Linux under make test-cover (SGX), test-noop, and test-tdx;
enclave/tdx package coverage is 87.5%. SGX backend behaviour is
unchanged.

GitHub Actions runners are intermittently failing to download the SHA
that the v5 moving tag of actions/setup-go currently resolves to:

  ##[error]An action could not be found at the URI
    'https://codeload.github.com/actions/setup-go/tar.gz/40f1582b...'

The same SHA is fetchable from outside the runner network, so this is
a runner-side cache miss against the resolved v5 commit. Bumping to v6
forces resolution to a fresh SHA. No behavioural change — v6 is a
drop-in for our 'go-version: 1.24' usage.
@0xHansLee 0xHansLee force-pushed the feat/impl-tdx-backend branch from b504b4b to 6a3ca9c Compare June 8, 2026 08:30
0xHansLee added 2 commits June 8, 2026 17:53
…ary-bound sealing

The prior commit added PCR 12 to the seal policy as the binary-identity
half of the TPM2 PolicyPCR. PCR 12 is meaningful only if some component
above story-kernel — the initrd, in practice — extends it with the
running ELF's hash before exec'ing the kernel. This commit ships the
scripts and operator guide that make that hand-off concrete.

Adds enclave/tdx/setup/scripts/:
  - measure-binary.sh   initrd stub: SHA-256(story-kernel ELF) → PCR 12
  - verify-pcr12.sh     post-boot check: PCR 12 == SHA256(0 || SHA256(ELF))
  - verify-platform.sh  extract MRTD/RTMR0..2 from a fresh TDX quote and
                        compute the platform_commitment that
                        TDXValidationHook.approvedPlatforms expects
                        (for governance handoff)
  - test/               docker harness that runs swtpm + tpm2-tools
                        and exercises the scripts end-to-end

Adds enclave/tdx/setup/production-image.md: the production deployment
recipe — what the image must do, boot sequence with the in-TD swtpm +
PCR 12 step, reproducibility constraints, GCP/bare-metal deployment,
on-chain platform_commitment handoff, and a post-boot operator
checklist.

Cross-links: README.md "Per–guest-interface setup" promotes the
production-image.md path to first position; direct.md keeps its DEV/QA
warning and now points to the production guide.

Tested locally:
  - shellcheck (sh dialect) clean across all three scripts.
  - Docker-based test harness runs swtpm in TCP socket mode + tpm2-tools
    and asserts: (a) measure-binary.sh extends PCR 12 with the right
    SHA-256, (b) verify-pcr12.sh PASSes for the legit ELF, (c)
    verify-pcr12.sh FAILs for a tampered ELF, (d) PCR 12 is
    reboot-deterministic for the same ELF (swtpm wipe → re-measure →
    identical PCR value).
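The check that verify-pcr12.sh performs can be modelled in a few lines. This is an illustrative sketch, not a port of the script: `expectedPCR12` and `verifyPCR12` are hypothetical names, and the PCR value is computed locally instead of being read from a TPM.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// expectedPCR12 models one TPM extend from the all-zero reset value:
// PCR = SHA-256(0x00..00 || SHA-256(elf)).
func expectedPCR12(elf []byte) [sha256.Size]byte {
	var reset [sha256.Size]byte
	digest := sha256.Sum256(elf)
	buf := append(reset[:], digest[:]...)
	return sha256.Sum256(buf)
}

// verifyPCR12 reports whether a PCR value matches the given ELF bytes.
func verifyPCR12(pcr [sha256.Size]byte, elf []byte) bool {
	return pcr == expectedPCR12(elf)
}

func main() {
	elf := []byte("legit ELF bytes") // placeholder input (assumption)
	pcr := expectedPCR12(elf)
	fmt.Println(verifyPCR12(pcr, elf), verifyPCR12(pcr, []byte("tampered ELF bytes")))
	// prints: true false
}
```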
… note

Validation on a real GCP c3-standard-4 TDX VM surfaced two minor setup
issues that the docs had glossed over:

  - eth-utils alone has no default keccak backend; `pip install eth-utils`
    succeeds but the first keccak() call dies with ImportError unless
    pycryptodome (or pysha3) is also installed. Add an explicit
    `eth-hash[pycryptodome]` install instruction and probe the backend
    at startup with an empty-string keccak so the error message points
    at the missing backend, not just the missing module.
  - Operators commonly want eth-utils inside a venv (stock Ubuntu pip
    refuses to write to the system site-packages without
    --break-system-packages and fails with the
    externally-managed-environment error). Add a PYTHON
    env override so the script can be pointed at a venv's python3
    without rewriting PATH.

The platform_commitment extraction was end-to-end validated against a
real V4 TDX quote pulled via configfs-tsm on a GCP c3-standard-4
direct-launch confidential VM. The MRTD captured matched the value
documented in enclave/tdx/README.md's "Supported platforms" table for
that SKU.

shellcheck (sh dialect) and the Docker-based test harness for the
measurement scripts both stay clean.
@yingyangxu2026

Contributor

#68's 64/65 key-length asserts conflict with the now-merged #844 — please change them to 32/64 before merging.

This PR pins dkgPubKeySize=64 / enclaveCommKeySize=65 in service/utils.go::calculateReportData, with a comment saying they match DKG.sol::register. Both values are now wrong:

  1. Out of sync with the contract: #844 (merged into dkg/dev) already changed DKG.sol::register to require 32/64, to align with what the kernel actually emits.
  2. Out of sync with the kernel's own output (the more serious one): the kernel's DKG runs on ed25519, so dkgPubKey is inherently 32 bytes; enclaveCommKey is the secp256k1 key with the 0x04 prefix stripped = 64 bytes. This PR would feed the real 32/64 keys into its own 64/65 asserts → registration fails. The current dd7eca07 kernel registers fine precisely because the older calculateReportData had no such asserts.

Pinning the lengths to keep the packed preimage injective is the right idea — just pin to the correct values. Before merging, please change:

  • service/utils.go: dkgPubKeySize 64 → 32, enclaveCommKeySize 65 → 64
  • service/utils_test.go: validDkgPubKey 64 → 32; validEnclaveCommKey 65 → 64 and drop k[0] = 0x04 (the kernel already strips the prefix, so there's no 0x04 in the 64-byte form)

See issue #73 and story PR #844 for the kernel↔contract alignment.
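A sketch of the corrected length checks, assuming the sizes requested above. The constant names come from the comment; `checkKeyLengths` and `stripUncompressedPrefix` are hypothetical helpers, and the actual body of `calculateReportData` is not reproduced here.

```go
package main

import "fmt"

const (
	dkgPubKeySize      = 32 // ed25519 public key
	enclaveCommKeySize = 64 // secp256k1 key with the 0x04 prefix stripped
)

// checkKeyLengths pins both inputs to fixed sizes so that a packed
// (unprefixed) concatenation of the two stays injective.
func checkKeyLengths(dkgPubKey, enclaveCommKey []byte) error {
	if len(dkgPubKey) != dkgPubKeySize {
		return fmt.Errorf("dkgPubKey: got %d bytes, want %d", len(dkgPubKey), dkgPubKeySize)
	}
	if len(enclaveCommKey) != enclaveCommKeySize {
		return fmt.Errorf("enclaveCommKey: got %d bytes, want %d", len(enclaveCommKey), enclaveCommKeySize)
	}
	return nil
}

// stripUncompressedPrefix drops the leading 0x04 from a 65-byte
// uncompressed secp256k1 public key, yielding the 64-byte form.
func stripUncompressedPrefix(key []byte) ([]byte, error) {
	if len(key) != enclaveCommKeySize+1 || key[0] != 0x04 {
		return nil, fmt.Errorf("not a 65-byte uncompressed key")
	}
	return key[1:], nil
}

func main() {
	dkg := make([]byte, 32)
	raw := make([]byte, 65)
	raw[0] = 0x04
	comm, err := stripUncompressedPrefix(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(checkKeyLengths(dkg, comm)) // 32/64 passes
	fmt.Println(checkKeyLengths(make([]byte, 64), raw))
}
```

With the old 64/65 constants, the first call would fail for the keys the kernel actually emits, which is the registration failure described above.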
