Was v1alpha2 batch scheduler #4962
Signed-off-by: Mark Rossetti <marosset@microsoft.com>
@Future-Outlier - Here is the Kubernetes workload-aware scheduling work implemented as a batch scheduler plugin.
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
Reviewed by Cursor Bugbot for commit 90d2a29.
        if existing.DeletionTimestamp != nil {
            return fmt.Errorf("PodGroup %s/%s is being deleted (finalizer pending), will retry", podGroup.Namespace, podGroup.Name)
        }
    }
Stale PodGroup kept on exists
High Severity
After a stale Workload is recreated, syncSchedulingResources treats an existing PodGroup as success when Create returns AlreadyExists and the object is not terminating. It never compares or updates SchedulingPolicy (e.g. gang MinCount). A pre-delete PodGroup can keep an old policy while the new Workload templates match the cluster, so gang sizing stays wrong until manual intervention.
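One possible shape for a fix, sketched with stand-in types rather than the real API: on AlreadyExists, fetch the existing object, keep the retry-on-deletion check from the diff above, and then reconcile the policy instead of returning success. `PodGroup`, `Store`, `memStore`, `ErrAlreadyExists`, and `syncPodGroup` are all hypothetical names for illustration; `MinCount` stands in for the gang field of SchedulingPolicy, and the sketch assumes that field can be updated in place. If it turns out to be immutable, deleting the PodGroup and retrying would be the alternative.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAlreadyExists is a hypothetical stand-in for an AlreadyExists API error.
var ErrAlreadyExists = errors.New("already exists")

// PodGroup is a trimmed-down, hypothetical stand-in for the real API type.
type PodGroup struct {
	Namespace, Name string
	Deleting        bool  // stands in for DeletionTimestamp != nil
	MinCount        int32 // stands in for the gang SchedulingPolicy
}

// Store is a hypothetical client interface, not the controller's real client.
type Store interface {
	Create(pg *PodGroup) error
	Get(namespace, name string) (*PodGroup, error)
	Update(pg *PodGroup) error
}

// syncPodGroup creates the desired PodGroup, or reconciles an existing one
// so that a stale policy does not survive a Workload recreation.
func syncPodGroup(s Store, desired *PodGroup) error {
	err := s.Create(desired)
	if err == nil {
		return nil
	}
	if !errors.Is(err, ErrAlreadyExists) {
		return fmt.Errorf("create PodGroup %s/%s: %w", desired.Namespace, desired.Name, err)
	}
	existing, err := s.Get(desired.Namespace, desired.Name)
	if err != nil {
		return fmt.Errorf("get PodGroup %s/%s: %w", desired.Namespace, desired.Name, err)
	}
	if existing.Deleting {
		return fmt.Errorf("PodGroup %s/%s is being deleted (finalizer pending), will retry", desired.Namespace, desired.Name)
	}
	if existing.MinCount == desired.MinCount {
		return nil // policy already matches, nothing to do
	}
	// Assumption: the policy is mutable in place.
	existing.MinCount = desired.MinCount
	return s.Update(existing)
}

// memStore is an in-memory Store used only to exercise the sketch.
type memStore map[string]*PodGroup

func key(namespace, name string) string { return namespace + "/" + name }

func (m memStore) Create(pg *PodGroup) error {
	k := key(pg.Namespace, pg.Name)
	if _, ok := m[k]; ok {
		return ErrAlreadyExists
	}
	c := *pg
	m[k] = &c
	return nil
}

func (m memStore) Get(namespace, name string) (*PodGroup, error) {
	pg, ok := m[key(namespace, name)]
	if !ok {
		return nil, errors.New("not found")
	}
	c := *pg
	return &c, nil
}

func (m memStore) Update(pg *PodGroup) error {
	k := key(pg.Namespace, pg.Name)
	if _, ok := m[k]; !ok {
		return errors.New("not found")
	}
	c := *pg
	m[k] = &c
	return nil
}
```

With this shape, a leftover PodGroup whose `MinCount` differs from the desired value is updated on the next sync, while a terminating one still returns the retry error.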


Why are these changes needed?
These changes implement batch scheduling with the Kubernetes-native workload-aware scheduling APIs, exposed through the KubeRay batch scheduler interface.
They are a re-implementation of #4723, prompted by several suggestions to see what that work would look like as a batch scheduler plugin.
I think there are a few pros and a few cons to this approach:
Pros
Cons
Related issue number
Part of #4344
Checks