Fail fast on terminal AKS machine failures #1212

Merged
karenychen merged 6 commits into Azure:main from karenychen:feat/k8s-machine-client-az-cli
Jun 12, 2026
@karenychen

Summary

  • add ListMachines polling during machine readiness waits to fail early when expected machines terminally fail
  • keep BatchPutMachine request sizing within the 50-machine service limit
  • record successful_machines as a count instead of uploading machine names
  • add focused unit coverage for timeout propagation, ListMachines pagination, terminal failure detection, and batch sizing
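The batch sizing and metadata bullets above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `split_into_batches`, `build_operation_metadata`, and `MAX_MACHINES_PER_BATCH` are hypothetical names, and only the 50-machine limit and the `successful_machines` key come from the summary.

```python
from typing import Dict, List, Sequence

# The 50-machine BatchPutMachine limit is taken from the PR summary.
MAX_MACHINES_PER_BATCH = 50


def split_into_batches(
    machine_names: Sequence[str],
    batch_size: int = MAX_MACHINES_PER_BATCH,
) -> List[List[str]]:
    """Split machine names into request-sized batches (hypothetical helper)."""
    if not 1 <= batch_size <= MAX_MACHINES_PER_BATCH:
        raise ValueError(
            f"batch_size must be between 1 and {MAX_MACHINES_PER_BATCH}, got {batch_size}"
        )
    return [
        list(machine_names[start:start + batch_size])
        for start in range(0, len(machine_names), batch_size)
    ]


def build_operation_metadata(succeeded: Sequence[str]) -> Dict[str, int]:
    """Record only how many machines succeeded, not their names."""
    return {"successful_machines": len(succeeded)}
```

Validating `batch_size` up front means an oversized request is rejected locally rather than by the service.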

Validation

  • python3 -m pytest modules/python/tests/test_aks_machine_client.py modules/python/tests/test_machine_crud.py modules/python/tests/test_crud_main.py -q
  • python3 -m py_compile modules/python/clients/aks_machine_client.py modules/python/tests/test_aks_machine_client.py
  • PYTHONPATH=modules/python python3 -m pylint modules/python/tests/test_aks_machine_client.py modules/python/clients/aks_machine_client.py modules/python/crud/azure/machine_crud.py modules/python/crud/main.py
  • git diff --check

Copilot AI left a comment

Pull request overview

This PR improves AKS Machine-mode scale-out observability and failure behavior by failing readiness waits early when the AKS Machine resources reach terminal failure states, while also enforcing the 50-machine BatchPutMachine request limit and tightening recorded operation metadata.

Changes:

  • Add Machine API ListMachines polling during node readiness waits to fail fast when expected machines are terminal and failed.
  • Enforce BatchPutMachine sizing constraints (≤ 50 machines per request), including worker-based batch sizing validation.
  • Record successful_machines as a count (not machine names) and add targeted unit tests for the new behaviors.
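One way the fail-fast readiness wait described above might look is sketched below. All names here (`wait_for_machines_ready`, `MachineTerminalFailure`, the `list_machines` and `nodes_ready` callables) are hypothetical, and the `provisioningState` field and its `"Failed"` value are assumptions about the Machine API response shape rather than details confirmed by this PR.

```python
import time
from typing import Callable, Dict, Iterable, List, Set

# Assumption: a machine counts as terminally failed when its
# provisioningState is "Failed". The real client may check more states.
TERMINAL_FAILED_STATES = {"Failed"}


class MachineTerminalFailure(RuntimeError):
    """Raised when an expected machine reaches a terminal failed state."""


def find_failed_machines(machines: Iterable[Dict], expected: Set[str]) -> List[str]:
    """Return sorted names of expected machines that terminally failed."""
    return sorted(
        m["name"]
        for m in machines
        if m.get("name") in expected
        and m.get("provisioningState") in TERMINAL_FAILED_STATES
    )


def wait_for_machines_ready(
    expected: Set[str],
    list_machines: Callable[[], Iterable[Dict]],
    nodes_ready: Callable[[], bool],
    timeout_s: float,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll until nodes are ready, failing early on terminal machine failures."""
    deadline = time.monotonic() + timeout_s
    while True:
        failed = find_failed_machines(list_machines(), expected)
        if failed:
            # Fail fast instead of waiting out the full timeout.
            raise MachineTerminalFailure(f"machines failed: {', '.join(failed)}")
        if nodes_ready():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"machines not ready after {timeout_s}s")
        sleep(interval_s)
```

Checking for failed machines before checking readiness on each iteration means a terminal failure surfaces on the first poll that observes it, and machines outside the expected set are ignored.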

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| modules/python/clients/aks_machine_client.py | Adds ListMachines pagination + terminal failure detection hooked into readiness polling; enforces 50-machine BatchPutMachine sizing; stores successful machine count metadata. |
| modules/python/tests/test_aks_machine_client.py | Updates metadata assertions and adds focused unit coverage for timeout propagation, ListMachines pagination, terminal failure detection, and batch sizing limits. |
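For readers unfamiliar with the ListMachines pagination mentioned above, here is a rough sketch of collecting every page of a list response. It assumes the common Azure Resource Manager convention of a `value` array plus an optional `nextLink`; `collect_all_machines` and `fetch_page` are hypothetical names, and the actual client in this PR may page differently.

```python
from typing import Callable, Dict, List, Optional, Set


def collect_all_machines(fetch_page: Callable[[Optional[str]], Dict]) -> List[Dict]:
    """Follow nextLink until the listing is exhausted (hypothetical helper).

    fetch_page(None) is expected to return the first page; fetch_page(link)
    returns the page at that link. Each page is assumed to look like
    {"value": [...], "nextLink": "..."} with nextLink absent on the last page.
    """
    machines: List[Dict] = []
    seen_links: Set[str] = set()
    next_link: Optional[str] = None
    while True:
        page = fetch_page(next_link)
        machines.extend(page.get("value", []))
        next_link = page.get("nextLink")
        if not next_link:
            return machines
        if next_link in seen_links:
            # Guard against a service or mock that keeps returning the same link.
            raise RuntimeError(f"pagination loop detected at {next_link}")
        seen_links.add(next_link)
```

Passing the page fetcher as a callable keeps the loop easy to cover in unit tests with a fake that returns canned pages.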

@karenychen karenychen force-pushed the feat/k8s-machine-client-az-cli branch 12 times, most recently from 8d6a269 to d5821f9 Compare June 5, 2026 21:27
@karenychen karenychen force-pushed the feat/k8s-machine-client-az-cli branch from d5821f9 to f70bd18 Compare June 5, 2026 21:45
@karenychen karenychen force-pushed the feat/k8s-machine-client-az-cli branch from 86e59b2 to 20f516f Compare June 10, 2026 22:22
@karenychen karenychen force-pushed the feat/k8s-machine-client-az-cli branch from 20f516f to 6f93d9e Compare June 10, 2026 22:39
@karenychen karenychen force-pushed the feat/k8s-machine-client-az-cli branch from 86dc1d6 to 62e2cb5 Compare June 12, 2026 00:37
@karenychen karenychen merged commit 7761d87 into Azure:main Jun 12, 2026
2 checks passed
PabloTriv pushed a commit that referenced this pull request Jun 12, 2026

3 participants