@@ -11,7 +11,7 @@ author: >
1111 Bartosz Rejman (Google),
1212 Jon Huhn (Microsoft),
1313 Maciej Wyrzuc (Google),
14- TBD
14+ Heba Elayoty (Microsoft)
1515---
1616
1717AI/ML and batch workloads introduce unique scheduling challenges that go beyond simple Pod-by-Pod scheduling.
@@ -328,7 +328,80 @@ utilize DRA for scalable device management.
328328
329329## Integration with the Job controller
330330
331- TBD
331+ In Kubernetes v1.36, the Job controller can create and manage Workload and PodGroup objects on your behalf,
332+ so that Jobs representing a tightly coupled parallel application, such as distributed AI training,
333+ are gang-scheduled without any additional tooling. Without this integration, you would have to
334+ create the Workload and PodGroup yourself and wire their references into the Pod template.
335+ Now, the Job controller automates this process natively.
336+
337+ When the [`WorkloadWithJob`](/docs/reference/command-line-tools-reference/feature-gates/#WorkloadWithJob)
338+ feature gate is enabled, the Job controller automatically:
339+
340+ * creates a `Workload` and a corresponding runtime `PodGroup` for each qualifying Job,
341+
342+ * sets `.spec.schedulingGroup` on every Pod the Job creates
343+ so the scheduler treats them as a single gang, and
344+
345+ * sets the Job as the owner of the generated objects,
346+ so they are garbage-collected when the Job is deleted.
347+
348+ ### When does the integration kick in?
349+
350+ To keep the first feature iteration predictable, the Job controller only creates a
351+ Workload and PodGroup when the Job has a well-defined, fixed shape:
352+
353+ * `.spec.parallelism` is greater than 1
354+
355+ * [`.spec.completionMode`](/docs/concepts/workloads/controllers/job/#completion-mode) is set to `Indexed`
356+
357+ * `.spec.completions` is equal to `.spec.parallelism`
358+
359+ * `schedulingGroup` is not already set on the Pod template
360+
361+ These conditions describe the class of Jobs that gang scheduling can reason about:
362+ each Pod has a stable identity (`Indexed`), the gang size is known and fixed at admission time
363+ (`parallelism` == `completions`), and no other controller has already claimed scheduling responsibility
364+ (`schedulingGroup` field is unset). Jobs that do not meet these conditions are scheduled Pod-by-Pod,
365+ exactly as before.
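
The four conditions above can be read as a single predicate over the Job manifest. The following is a minimal Python sketch of that predicate, not the Job controller's actual implementation. The function name `qualifies_for_gang_scheduling` is hypothetical, the manifest is modeled as a plain dictionary, and the value assigned to `schedulingGroup` in the second example is a placeholder, because the shape of that field is not shown in this post.

```python
import copy


def qualifies_for_gang_scheduling(job: dict) -> bool:
    """Return True if the Job manifest meets the four conditions listed above.

    Illustrative only: this mirrors the prose, not the controller's code.
    """
    spec = job.get("spec") or {}
    pod_spec = (spec.get("template") or {}).get("spec") or {}

    # An unset parallelism defaults to 1 in the Job API.
    parallelism = spec.get("parallelism") or 1
    completions = spec.get("completions")

    return (
        parallelism > 1                                # more than one Pod in the gang
        and spec.get("completionMode") == "Indexed"    # stable per-Pod identity
        and completions == parallelism                 # fixed, fully parallel gang size
        and pod_spec.get("schedulingGroup") is None    # nobody else claimed scheduling
    )


# A Job with a fixed shape: Indexed, parallelism equal to completions.
training_job = {
    "apiVersion": "batch/v1",
    "kind": "Job",
    "metadata": {"name": "training-job", "namespace": "job-ns"},
    "spec": {
        "completionMode": "Indexed",
        "parallelism": 4,
        "completions": 4,
        "template": {"spec": {"restartPolicy": "Never"}},
    },
}

# The same Job, but with schedulingGroup already set on the Pod template.
# The value is a placeholder; only its presence matters for this check.
externally_managed = copy.deepcopy(training_job)
externally_managed["spec"]["template"]["spec"]["schedulingGroup"] = {
    "placeholder": "managed-elsewhere"
}

print(qualifies_for_gang_scheduling(training_job))
print(qualifies_for_gang_scheduling(externally_managed))
```

In this sketch, the first Job qualifies and the second does not, which matches the behavior described for Pod templates that already carry a `schedulingGroup`.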
366+
367+ If you set `schedulingGroup` on the Pod template yourself (for example,
368+ because a higher-level controller is managing the workload), the Job controller leaves
369+ the Pod template alone and does not create its own Workload or PodGroup. This makes the feature
370+ safe to enable in clusters that already use an external batch system.
371+
372+ Here is an example of a Job that qualifies for gang scheduling:
373+
374+ ```yaml
375+ apiVersion: batch/v1
376+ kind: Job
377+ metadata:
378+   name: training-job
379+   namespace: job-ns
380+ spec:
381+   completionMode: Indexed
382+   parallelism: 4
383+   completions: 4
384+   template:
385+     spec:
386+       restartPolicy: Never
387+       containers:
388+       - name: worker
389+         image: registry.example/trainer:latest
390+ ```
391+
392+ The Job controller creates a Workload and a PodGroup owned by this Job,
393+ and every Pod it creates carries a `.spec.schedulingGroup` that points at the generated PodGroup.
394+ The Pods are then scheduled together once all four can be placed at the same time using
395+ the PodGroup scheduling cycle described earlier in this post.
396+
397+ ### What's not covered yet
398+
399+ The current constraints limit this integration to static, indexed, fully-parallel Jobs.
400+ Support for additional workload shapes, including elastic Jobs and other built-in controllers,
401+ is tracked in [KEP-5547](https://kep.k8s.io/5547).
402+
403+ In future Kubernetes releases, this integration will expand to support additional workload controllers,
404+ and the current constraints for Jobs may be relaxed.
332405
333406## What's next?
334407
@@ -376,7 +449,7 @@ Once the prerequisite is met, you can enable specific features:
376449 [`DRAWorkloadResourceClaims`](/docs/reference/command-line-tools-reference/feature-gates/#DRAWorkloadResourceClaims)
377450 feature gate on the `kube-apiserver`, `kube-controller-manager`, `kube-scheduler` and `kubelet`.
378451* Workload API integration with the Job controller: Enable the
379- [`EnableWorkloadWithJob `](/docs/reference/command-line-tools-reference/feature-gates/#EnableWorkloadWithJob)
452+ [`WorkloadWithJob`](/docs/reference/command-line-tools-reference/feature-gates/#WorkloadWithJob)
380453 feature gate on the `kube-apiserver` and `kube-controller-manager`.
381454
382455We encourage you to try out workload-aware scheduling in your test clusters