feat: add Azure ephemeral full-caching preview #186

Merged
steipete merged 6 commits into openclaw:main from jwmoss:feat/azure-ephemeral-preview
May 30, 2026

Conversation

jwmoss (Contributor) commented May 30, 2026

Summary

  • add --azure-os-disk ephemeral-preview / azure.osDisk: ephemeral-preview for Azure ephemeral OS disk full caching
  • set diffDiskSettings.enableFullCaching: true with Compute API 2025-04-01 on direct CLI and Worker VM create paths
  • filter Crabbox Azure fallback candidates for preview-compatible x64 and ARM64 local-disk SKUs, including Windows promotion from too-small classes to D8/D16
  • reject unsupported explicit preview SKUs before shared Azure network allocation and keep native checkpoint paths refusing ephemeral OS disk leases

Live findings

  • Azure announced public preview: Ephemeral OS Disk with full caching for VM/VMSS; this PR wires that preview capability into Crabbox with explicit SKU validation
  • Go SDK armcompute/v6 does not expose enableFullCaching, and neither does v7.3.0, which was also checked; so only the preview VM create path uses a narrow raw ARM PUT
  • direct Linux x64 Standard_D8ads_v6 and brokered Linux x64 Standard_D8ads_v6 both returned diffDiskSettings.enableFullCaching: true from Azure after create
  • Azure Resource SKUs audit across the hardcoded Crabbox VM sizes matched the fallback filter; unsupported preview SKUs are now rejected before resource group/VNet/NIC/PIP allocation
  • Linux ARM64 pds_v6 SKUs advertise ephemeral OS disk support and NVMe placement in East US; live ARM64 create is blocked in this tenant by Dpdsv6 and spot vCPU quotas
  • Windows x64 Standard_D8ads_v6 DiskSpd, 4K random write, 512 MiB, 15s, QD32 x 4: ephemeral = 138.58 MiB/s, 35475.34 IOPS, 3.416 ms avg latency; ephemeral-preview = 139.09 MiB/s, 35607.92 IOPS, 3.551 ms avg latency
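To put the DiskSpd numbers above side by side, here is a small sketch that computes the relative change from ephemeral to ephemeral-preview. The helper name is hypothetical and the inputs are copied from the bullet. This is a single 15s run on one SKU, so the deltas are a reading of this sample rather than a general performance claim.

```go
package main

import "fmt"

// percentChange returns the relative change from base to candidate, in percent.
func percentChange(base, candidate float64) float64 {
	return (candidate - base) / base * 100
}

func main() {
	// Figures copied from the DiskSpd bullet: ephemeral vs ephemeral-preview.
	fmt.Printf("throughput:  %+.2f%%\n", percentChange(138.58, 139.09))
	fmt.Printf("IOPS:        %+.2f%%\n", percentChange(35475.34, 35607.92))
	fmt.Printf("avg latency: %+.2f%%\n", percentChange(3.416, 3.551))
}
```

For this run that works out to roughly +0.37% throughput, +0.37% IOPS, and +3.95% average latency, so the two modes are close to each other on this 4K random-write workload.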

Verification

  • go build -trimpath -o bin/crabbox ./cmd/crabbox
  • go test ./internal/cli -run 'Azure|CoordinatorCreateLease.*Azure|ApplyNativeCheckpointForkConfigHonorsAzureOSDisk'
  • go test ./internal/providers/azure
  • go vet ./...
  • npm run format:check --prefix worker
  • npm run lint --prefix worker
  • npm run check --prefix worker
  • npm test --prefix worker -- test/azure.test.ts test/config.test.ts
  • npm run build --prefix worker
  • git diff --check
  • CRABBOX_CONFIG=/dev/null CRABBOX_AZURE_LOCATION=eastus bin/crabbox run --provider azure --target windows --type Standard_D8ads_v6 --azure-os-disk ephemeral --market on-demand --no-sync --preflight --stop-after always --slug diskspd-ephemeral --shell '<DiskSpd workload>'
  • CRABBOX_CONFIG=/dev/null CRABBOX_AZURE_LOCATION=eastus bin/crabbox run --provider azure --target windows --type Standard_D8ads_v6 --azure-os-disk ephemeral-preview --market on-demand --no-sync --preflight --stop-after always --slug diskspd-preview --shell '<DiskSpd workload>'
  • az resource list -g crabbox-leases --query "[?contains(name,'diskspd') || contains(name,'ephem-preview')].{type:type,name:name}" -o table returned no leftover resources

clawsweeper Bot commented May 30, 2026

Codex review: needs maintainer review before merge. Reviewed May 30, 2026, 12:12 PM ET / 16:12 UTC.

Summary
The branch adds ephemeral-preview as an Azure OS disk mode across direct CLI and Worker provisioning, filters fallback SKUs, documents checkpoint limits, and adds Go/Worker tests.

Reproducibility: not applicable. This is a new feature PR, not a report of broken existing behavior. The review checked the branch diff, current-main absence, tests, and the PR body's live Azure proof instead of reproducing a failing current-main bug.

Review metrics: 2 noteworthy metrics.

  • Diff Size: 16 files, +1073/-123. The feature spans CLI, Worker, docs, and tests, so maintainers should review both provider paths before merge.
  • Provisioning Paths: 2 VM create paths changed. Direct Go provisioning and Worker brokered provisioning both now send preview full-caching payloads.

Merge readiness
Overall: 🐚 platinum hermit
Proof: 🦞 diamond lobster
Patch quality: 🐚 platinum hermit
Result: ready for maintainer review.

Overall follows the weaker of proof and patch quality, so missing proof can cap an otherwise strong patch.

Rank-up moves:

  • [P2] Maintainer should decide whether Crabbox should accept the preview API and SKU-heuristic support burden before merge.

Risk before merge

  • [P1] The new mode depends on Azure Compute API 2025-04-01 public-preview behavior plus Crabbox-maintained SKU heuristics, so CI and one-region live proof cannot fully settle Azure rollout, quota, and regional support variance.
  • [P1] Both direct Go CLI and Worker VM creation paths now carry a raw preview ARM request path for this one field, so maintainers should explicitly accept that support burden before merge.

Maintainer options:

  1. Accept The Opt-In Preview Surface
    A maintainer can choose to merge with the preview clearly scoped to ephemeral-preview and own the API-version/SKU-heuristic risk.
  2. Require Broader Azure Proof
    Ask for another live check in a second Azure region or SKU family if rollout confidence is required before merge.
  3. Pause For Stable Azure Support
    Park or close the PR if maintaining a raw public-preview ARM path is not worth carrying until SDK or GA support exists.

Next step before merge

  • [P2] The remaining blocker is maintainer acceptance of Azure public-preview provider support, not a narrow automated repair.

Security
Cleared: The diff uses existing Azure credentials for a scoped ARM VM create path and I did not find new secret storage, broader permissions, or untrusted code execution.

Review details

Best possible solution:

Land the opt-in preview only if maintainers accept provider-preview ownership, keeping managed disks as the default and preserving checkpoint refusal for ephemeral OS disk leases.

Do we have a high-confidence way to reproduce the issue?

Not applicable: this is a new feature PR, not a broken existing behavior report. The review checked the branch diff, current-main absence, tests, and the PR body's live Azure proof instead of reproducing a failing current-main bug.

Is this the best way to solve the issue?

Unclear until maintainer signoff: the implementation is narrow and opt-in with focused tests and live proof, but accepting a public-preview Azure API and hardcoded SKU heuristic is a provider/product decision.

AGENTS.md: found and applied where relevant.

Codex review notes: model gpt-5.5, reasoning high; reviewed against c81969e6903b.

Label changes

Label justifications:

  • P2: This is a normal-priority opt-in provider feature with bounded blast radius but meaningful maintainer review needs.
  • merge-risk: 🚨 other: The concrete merge risk is Azure public-preview/provider-rollout uncertainty rather than compatibility, auth, security, delivery, or availability breakage in existing defaults.
  • rating: 🐚 platinum hermit: Overall readiness is 🐚 platinum hermit; proof is 🦞 diamond lobster and patch quality is 🐚 platinum hermit.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR.
  • proof: sufficient: Contributor real behavior proof is sufficient (live_output). The PR body includes after-fix live Azure output for direct and brokered creates, DiskSpd comparison data, SKU audit notes, and cleanup verification.

Evidence reviewed

What I checked:

  • Repository policy read: Read the full target AGENTS.md and applied the provider-neutral/core-boundary and release-note guidance while reviewing the Azure-specific adapter changes. (AGENTS.md:1, c81969e6903b)
  • Current main lacks the requested mode: Current main has no ephemeral-preview, enableFullCaching, or 2025-04-01 implementation, so the PR is not already implemented on the default branch. (c81969e6903b)
  • Direct CLI implementation: The PR branch validates the new mode and routes full-caching VM creation through a raw ARM PUT using the preview Compute API because the SDK path does not expose the new field. (internal/cli/azure.go:960, b107bde654fe)
  • Worker implementation: The Worker path sets diffDiskSettings.enableFullCaching, switches the VM PUT to the preview API for the new OS disk mode, and keeps snapshot-backed forks on the stable managed-disk path. (worker/src/azure.ts:709, b107bde654fe)
  • Focused regression coverage: The PR adds tests for full-caching payload/API version selection, unsupported SKU rejection before network allocation, filtered defaults, and snapshot forks staying managed. (worker/test/azure.test.ts:687, b107bde654fe)
  • Real behavior proof in PR body: The PR body reports live direct and brokered Azure creates returning diffDiskSettings.enableFullCaching: true, Windows DiskSpd comparison output, SKU audit findings, and post-run cleanup verification. (b107bde654fe)

Likely related people:

  • Vincent Koc: Blame on main attributes the Azure OS disk normalization and fallback tables to the merged Azure provider commit, and recent log history shows multiple Azure provider fixes. (role: introduced current Azure provider surface; confidence: high; commits: 710cf4b5b079, 11f143748dc1, 811d48708f8d; files: internal/cli/azure.go, worker/src/config.ts, docs/providers/azure.md)
  • steipete: Recent Azure fallback/network commits and the PR head's final repair commits are authored by Peter Steinberger, including snapshot-fork sizing and spot fallback cleanup. (role: recent adjacent owner; confidence: high; commits: b107bde654fe, 6f84ebe18c90, 1a93338fda11; files: internal/cli/azure.go, worker/src/azure.ts, worker/src/config.ts)
  • jwmoss: Jonathan Moss authored the initial PR commit and has prior merged Azure OS disk/provider work on main, including defaulting Azure leases to checkpointable OS disks. (role: Azure feature contributor; confidence: high; commits: 562dba5ca4f6, 389c696cb578, 00725544c767; files: internal/cli/azure.go, worker/src/azure.ts, docs/features/azure.md)

What the crustacean ranks mean
  • 🦀 challenger crab: rare, exceptional readiness with strong proof, clean implementation, and convincing validation.
  • 🦞 diamond lobster: very strong readiness with only minor maintainer review expected.
  • 🐚 platinum hermit: good normal PR, likely mergeable with ordinary maintainer review.
  • 🦐 gold shrimp: useful signal, but proof or patch confidence is still limited.
  • 🦪 silver shellfish: thin signal; proof, validation, or implementation needs work.
  • 🧂 unranked krab: not merge-ready because proof is missing/unusable or there are serious correctness or safety concerns.
  • 🌊 off-meta tidepool: rating does not apply to this item.

Shiny media proof means a screenshot, video, or linked artifact directly shows the changed behavior. Runtime, network, CSP, and security claims still need visible diagnostics.

How this review workflow works
  • ClawSweeper keeps one durable marker-backed review comment per issue or PR.
  • Re-runs edit this comment so the latest verdict, findings, and automation markers stay together instead of adding duplicate bot comments.
  • A fresh review can be triggered by eligible @clawsweeper re-review comments, exact-item GitHub events, scheduled/background review runs, or manual workflow dispatch.
  • PR/issue authors and users with repository write access can comment @clawsweeper re-review or @clawsweeper re-run on an open PR or issue to request a fresh review only.
  • Maintainers can also comment @clawsweeper review to request a fresh review only.
  • Fresh-review commands do not start repair, autofix, rebase, CI repair, or automerge.
  • Maintainer-only repair and merge flows require explicit commands such as @clawsweeper autofix, @clawsweeper automerge, @clawsweeper fix ci, or @clawsweeper address review.
  • Maintainers can comment @clawsweeper explain to ask for more context, or @clawsweeper stop to stop active automation.

clawsweeper Bot added the proof: sufficient, rating: 🐚 platinum hermit, status: 👀 ready for maintainer look, P2, and merge-risk: 🚨 other labels May 30, 2026
steipete force-pushed the feat/azure-ephemeral-preview branch from 78dc325 to b107bde May 30, 2026 16:05
steipete merged commit 190257b into openclaw:main May 30, 2026
5 checks passed

Labels

  • merge-risk: 🚨 other: 🚨 Merging this PR has meaningful risk outside the owned taxonomy.
  • P2: Normal priority bug or improvement with limited blast radius.
  • proof: sufficient: Contributor real behavior proof is sufficient.
  • rating: 🐚 platinum hermit: Good normal PR readiness with ordinary maintainer review expected.
  • status: 👀 ready for maintainer look: ClawSweeper has no concrete contributor-facing blocker left for this PR.

2 participants