Harden Azure provisioning recovery #15697
|
🚀 Dogfood this PR with:
curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 15697
Or
iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 15697" |
d224200 to 4956a53
|
Re-running the failed jobs in the CI workflow for this pull request because 1 job was identified as a retry-safe transient failure in the CI run attempt.
|
|
@copilot fix the failing test |
I investigated all Azure test classes locally and all tests pass (BicepProvisionerTests, EnvironmentResourceExtensionsTests, ResourcePreparerTests, ProvisioningContextProviderTests, DeployerTests, BicepResourceTests; 178+ tests total). I was unable to access the CI logs to identify the specific failing test due to DNS restrictions in this environment. The previous two commits (95026c9, 7bbaf1f) addressed DI resolution failures for
Warning: Firewall rules blocked me from connecting to one or more addresses.
|
|
/deployment-test |
|
🚀 Deployment tests starting on PR #15697... This will deploy to real Azure infrastructure. Results will be posted here when complete. |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
How do you feel about test coverage? I think there is more new code than test code, which probably means there are gaps. However, it is difficult to unit test Azure APIs. Maybe if they were behind a client with an interface that could be mocked, the surrounding code could be well tested. |
Keep deployment cancellation available while an Azure deployment is active, show get-azure-resource JSON results immediately, preserve resource-specific location defaults, and address follow-up review feedback around command definitions and cached deployment JSON parsing. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
We have an end-to-end deployment test that actually passes, and this code had 0 test coverage before, so I'm feeling pretty good 😄 |
Some of them are; that's why we can write unit tests to begin with. But at some point we deploy a Bicep file and that's the end of it. I'll see if I can get most of the scenarios unit tested, though I care more about the UX right now. I just ran through the UX and there were some gaps to fill and experiences to improve. BTW, we could really use a progress primitive for long-running ops like this. |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Increase blocking wait budgets in Azure provisioning command tests so CI contention does not cause false timeouts while waiting for queued operations to start. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace fixed-time waits in Azure command interaction tests with signal-versus-operation coordination so tests do not depend on sleeps or timeout budgets. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Make Azure command interaction tests primarily synchronize on explicit operation signals, add watchdog diagnostics for hangs, and snapshot mutable fake state under locks. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
❓ CLI E2E Tests unknown: 113 passed, 0 failed, 2 unknown
📹 Recordings uploaded automatically from CI run #27225236780 |
Description
This PR introduces AzureProvisioningController, a serialized control loop that coordinates all run-mode Azure provisioning operations. It replaces the inline provisioning logic that previously lived in AzureProvisioner with a channel-based queue that serializes startup provisioning, dashboard commands, CLI commands, per-resource recovery actions, and background drift detection through a single processing loop.

Demos
Dashboard command menu: opens the storage resource commands and shows Cancel deployment disabled once provisioning has completed.
CLI demos:
get-azure-resource

Controller architecture
The controller uses a Channel<QueuedOperation> with a single reader. Every operation (provision, reprovision, reset, change-location, change-context, cancel-deployment, delete-resource, delete-environment, drift-check) is modeled as a typed intent record that gets enqueued and processed one at a time. This eliminates races between concurrent dashboard commands, CLI commands, and the periodic drift monitor.

Within a provisioning pass, individual resources fan out concurrently but are ordered by dependency. Each resource gets a per-resource ProvisioningTaskCompletionSource that downstream resources await before starting their own deployment. The TCS is completed through the controller lifecycle paths, so dependents unblock as soon as their prerequisites finish rather than waiting for the entire batch.

What the provisioning stack can do now
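The controller architecture described above can be illustrated with a small, language-neutral sketch. This is not the PR's C# code: it uses Python's asyncio.Queue as a stand-in for Channel<QueuedOperation> and asyncio futures as a stand-in for the per-resource ProvisioningTaskCompletionSource. The class name ProvisioningController, the fields on QueuedOperation, and the enqueue and _provision helpers are all hypothetical names chosen for illustration.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class QueuedOperation:
    """Hypothetical intent record: an operation kind plus its inputs."""
    kind: str
    # Resource name -> names of the resources it depends on (assumed shape).
    resources: Dict[str, List[str]] = field(default_factory=dict)
    done: Optional[asyncio.Future] = None


class ProvisioningController:
    """Single-reader loop: queued operations are processed one at a time."""

    def __init__(self):
        self._queue = asyncio.Queue()
        self.log = []

    async def enqueue(self, op):
        # Callers (dashboard, CLI, drift monitor) only enqueue and await.
        op.done = asyncio.get_running_loop().create_future()
        await self._queue.put(op)
        return await op.done

    async def run(self):
        while True:
            op = await self._queue.get()
            try:
                self.log.append(f"start {op.kind}")
                result = None
                if op.kind == "provision":
                    result = await self._provision(op.resources)
                self.log.append(f"end {op.kind}")
                op.done.set_result(result)
            except Exception as exc:
                op.done.set_exception(exc)

    async def _provision(self, resources):
        # One completion future per resource, awaited by its dependents.
        loop = asyncio.get_running_loop()
        completions = {name: loop.create_future() for name in resources}
        order = []

        async def deploy(name):
            try:
                # Wait only for this resource's own prerequisites.
                await asyncio.gather(*(completions[d] for d in resources[name]))
                await asyncio.sleep(0)  # placeholder for the real deployment
                order.append(name)
                completions[name].set_result(None)
            except Exception as exc:
                # Propagate failure so dependents do not wait forever.
                if not completions[name].done():
                    completions[name].set_exception(exc)
                raise

        # All resources start concurrently; the futures impose the ordering.
        await asyncio.gather(*(deploy(name) for name in resources))
        return order
```

Because only run() reads from the queue, a drift check enqueued during a provisioning pass cannot interleave with it. Inside a pass, each resource starts as soon as its own dependencies complete, without waiting for unrelated resources.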
Resource commands (per-resource):
Environment commands (all resources):
Command state invariants:
Background drift detection:
Azure resource metadata:
azure.subscription.id, azure.resource.group, azure.tenant.id, azure.tenant.domain, azure.location, and resource.source (full ARM deployment id)
Location overrides:
InvalidResourceLocation conflicts
Other changes
BicepProvisioner: hardened checksum reuse validation, unified Azure identity metadata across fresh/cached paths, predicted deployment-id stamping for failed resources
RunModeProvisioningContextProvider: refactored Azure context acquisition and interactive prompting
AzureResourcePreparer: wires per-resource commands into the app model with command-specific enabled/disabled state
AzureProvisioningController in run mode (fixes DI failures in publish/test scenarios)
Test coverage
Checklist
<remarks /> and <code /> elements on your triple slash comments?
aspire.dev issue: