
Commit e9cb883

Add a section about Job integration
1 parent 3360920 commit e9cb883

1 file changed

Lines changed: 76 additions & 3 deletions


content/en/blog/_posts/2026/workload-aware-scheduling-1-36.md

@@ -11,7 +11,7 @@ author: >
 Bartosz Rejman (Google),
 Jon Huhn (Microsoft),
 Maciej Wyrzuc (Google),
-TBD
+Heba Elayoty (Microsoft)
 ---
 
 AI/ML and batch workloads introduce unique scheduling challenges that go beyond simple Pod-by-Pod scheduling.
@@ -328,7 +328,80 @@ utilize DRA for scalable device management.

## Integration with the Job controller

In Kubernetes v1.36, the Job controller can create and manage Workload and PodGroup objects on your behalf, so that Jobs representing a tightly coupled parallel application, such as distributed AI training, are gang-scheduled without any additional tooling. Without this integration, you would have to create the Workload and PodGroup yourself and wire their references into the Pod template; now the Job controller automates this natively.

When the [`WorkloadWithJob`](/docs/reference/command-line-tools-reference/feature-gates/#WorkloadWithJob) feature gate is enabled, the Job controller automatically:

* creates a `Workload` and a corresponding runtime `PodGroup` for each qualifying Job,
* sets `.spec.schedulingGroup` on every Pod the Job creates, so the scheduler treats them as a single gang, and
* sets the Job as the owner of the generated objects, so they are garbage-collected when the Job is deleted.
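To make the wiring concrete, here is a hypothetical sketch of a Pod created for such a Job. Only the `.spec.schedulingGroup` field name comes from the description above; the reference shape, the Pod and PodGroup names, and the rest of the spec are assumptions for illustration.

```yaml
# Hypothetical sketch only: the shape of schedulingGroup and the
# generated PodGroup name are assumptions, not a confirmed API.
apiVersion: v1
kind: Pod
metadata:
  name: my-training-job-0-x7k2p    # hypothetical Pod of an Indexed Job
spec:
  schedulingGroup:
    name: my-training-job          # assumed reference to the generated PodGroup
  restartPolicy: Never
  containers:
  - name: worker
    image: registry.example/trainer:latest
```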

### When does the integration kick in?

To keep the first feature iteration predictable, the Job controller only creates a Workload and PodGroup when the Job has a well-defined, fixed shape:

* `.spec.parallelism` is greater than 1
* [`.spec.completionMode`](/docs/concepts/workloads/controllers/job/#completion-mode) is set to `Indexed`
* `.spec.completions` is equal to `.spec.parallelism`
* `schedulingGroup` is not already set on the Pod template

These conditions describe the class of Jobs that gang scheduling can reason about: each Pod has a stable identity (`Indexed`), the gang size is known and fixed at admission time (`parallelism` == `completions`), and no other controller has already claimed scheduling responsibility (the `schedulingGroup` field is unset). Jobs that do not meet these conditions are scheduled Pod-by-Pod, exactly as before.
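For contrast, here is a sketch of a Job that does not meet these conditions, so its Pods keep being scheduled one by one; the name and image are made up for illustration.

```yaml
# Does NOT qualify for gang scheduling: completionMode defaults to
# NonIndexed and completions is unset, so the gang size is not fixed.
apiVersion: batch/v1
kind: Job
metadata:
  name: adhoc-workers        # hypothetical name
spec:
  parallelism: 4             # parallel, but no stable per-Pod identity
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: registry.example/worker:latest
```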

If you set `schedulingGroup` on the Pod template yourself (for example, because a higher-level controller is managing the workload), the Job controller leaves the Pod template alone and does not create its own Workload or PodGroup. This makes the feature safe to enable in clusters that already use an external batch system.

Here is an example of a Job that qualifies for gang scheduling:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: training-job
  namespace: job-ns
spec:
  completionMode: Indexed
  parallelism: 4
  completions: 4
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: registry.example/trainer:latest
```

The Job controller creates a Workload and a PodGroup owned by this Job, and every Pod it creates carries a `.spec.schedulingGroup` that points at the generated PodGroup. The Pods are then scheduled together once all four can be placed at the same time, using the PodGroup scheduling cycle described earlier in this post.
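The garbage-collection behavior implies standard `metadata.ownerReferences` wiring. A hypothetical sketch of the generated PodGroup follows; the `apiVersion`, the spec fields, and the object name are assumptions for illustration, and only the ownerReference mechanism itself is standard Kubernetes.

```yaml
# Hypothetical sketch of the generated PodGroup. The apiVersion, kind
# spelling, object name, and spec field are assumptions; the ownerReference
# is the standard mechanism behind garbage collection on Job deletion.
apiVersion: scheduling.k8s.io/v1alpha1     # assumed API group/version
kind: PodGroup
metadata:
  name: training-job
  namespace: job-ns
  ownerReferences:
  - apiVersion: batch/v1
    kind: Job
    name: training-job
    uid: 0e3f1c7a-0000-0000-0000-000000000000   # placeholder UID
    controller: true
    blockOwnerDeletion: true
spec:
  minCount: 4   # assumed field: gang size taken from parallelism/completions
```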

### What's not covered yet

The current constraints limit this integration to static, indexed, fully parallel Jobs. Support for additional workload shapes, including elastic Jobs and other built-in workload controllers, is tracked in [KEP-5547](https://kep.k8s.io/5547), and the current constraints for Jobs may be relaxed in future Kubernetes releases.

## What's next?
@@ -376,7 +449,7 @@ Once the prerequisite is met, you can enable specific features:
 [`DRAWorkloadResourceClaims`](/docs/reference/command-line-tools-reference/feature-gates/#DRAWorkloadResourceClaims)
 feature gate on the `kube-apiserver`, `kube-controller-manager`, `kube-scheduler` and `kubelet`.
 * Workload API integration with the Job controller: Enable the
-[`EnableWorkloadWithJob`](/docs/reference/command-line-tools-reference/feature-gates/#EnableWorkloadWithJob)
+[`WorkloadWithJob`](/docs/reference/command-line-tools-reference/feature-gates/#WorkloadWithJob)
 feature gate on the `kube-apiserver` and `kube-controller-manager`.
 
 We encourage you to try out workload-aware scheduling in your test clusters