
fix(runtimes): implement JAX reserved environment variable validation in Validate() #3670

Open
AdeshDeshmukh wants to merge 1 commit into kubeflow:master from AdeshDeshmukh:fix-jax-validate-reserved-envs
Conversation

@AdeshDeshmukh

What this PR does / why we need it:

The JAX plugin's Validate() method at pkg/runtime/framework/plugins/jax/jax.go:52-54 was previously a no-op (return nil, nil). This allowed users to manually set JAX_NUM_PROCESSES, JAX_PROCESS_ID, or JAX_COORDINATOR_ADDRESS in their TrainJob's trainer.env. When the controller runs EnforceMLPolicy, it silently overwrites these user-provided values via apply.UpsertEnvVars, so the job runs with settings the user did not ask for and the resulting training failures are hard to debug.
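To illustrate the silent overwrite described above, here is a minimal, self-contained sketch. The `EnvVar` type and `upsertEnvVars` helper are hypothetical stand-ins written for this example; they are not the real `corev1.EnvVar` or `apply.UpsertEnvVars`, and the values 8 and 2 are made up.

```go
package main

import "fmt"

// EnvVar is a simplified, hypothetical stand-in for a container env entry.
type EnvVar struct {
	Name  string
	Value string
}

// upsertEnvVars is a hypothetical helper that mimics upsert semantics:
// an update replaces an existing entry with the same name, otherwise it
// is appended. It is NOT the real apply.UpsertEnvVars.
func upsertEnvVars(envs []EnvVar, updates ...EnvVar) []EnvVar {
	for _, u := range updates {
		replaced := false
		for i := range envs {
			if envs[i].Name == u.Name {
				envs[i] = u
				replaced = true
				break
			}
		}
		if !replaced {
			envs = append(envs, u)
		}
	}
	return envs
}

func main() {
	// The user sets a reserved variable by hand (illustrative value).
	userEnvs := []EnvVar{{Name: "JAX_NUM_PROCESSES", Value: "8"}}

	// The controller later injects its own value for the same name.
	final := upsertEnvVars(userEnvs, EnvVar{Name: "JAX_NUM_PROCESSES", Value: "2"})

	// The user's value is gone and no error was reported along the way.
	fmt.Printf("%s=%s\n", final[0].Name, final[0].Value)
}
```

With no validation step in front of the upsert, nothing tells the user that their value was discarded.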

All other ML policy plugins (Torch, MPI, XGBoost) validate and reject TrainJobs that set reserved environment variables. This PR brings JAX to the same standard by:

  1. Adding JAXEnvNumProcesses, JAXEnvProcessID, and JAXEnvCoordinatorAddress constants and a JAXReservedEnvNames set in pkg/constants/constants.go
  2. Implementing real validation in the JAX plugin's Validate() method, which checks for reserved envs and returns field.Invalid errors, following the exact pattern used by the Torch plugin
  3. Adding comprehensive test coverage with 9 test cases covering guard clauses, non-reserved envs, and reserved env rejection
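The approach in the list above can be sketched roughly as follows. This is a standalone approximation using only the standard library, not the code in this PR: `EnvVar`, `jaxReservedEnvNames`, and `validateReservedEnvs` are hypothetical names, plain `error` values stand in for `field.Invalid`, and the `trainer.env[i].name` path in the message is an assumption about how the field would be reported.

```go
package main

import "fmt"

// EnvVar is a simplified, hypothetical stand-in for a container env entry.
type EnvVar struct {
	Name  string
	Value string
}

// jaxReservedEnvNames lists the variable names the PR description calls
// reserved. Using a map as a set is an assumption made for this sketch.
var jaxReservedEnvNames = map[string]struct{}{
	"JAX_NUM_PROCESSES":       {},
	"JAX_PROCESS_ID":          {},
	"JAX_COORDINATOR_ADDRESS": {},
}

// validateReservedEnvs returns one error per user-supplied env var whose
// name is reserved. A nil or empty input yields no errors.
func validateReservedEnvs(envs []EnvVar) []error {
	var errs []error
	for i, env := range envs {
		if _, reserved := jaxReservedEnvNames[env.Name]; reserved {
			errs = append(errs, fmt.Errorf(
				"trainer.env[%d].name: invalid value %q: reserved for the JAX runtime",
				i, env.Name))
		}
	}
	return errs
}

func main() {
	envs := []EnvVar{
		{Name: "LOG_LEVEL", Value: "debug"},      // not reserved, accepted
		{Name: "JAX_PROCESS_ID", Value: "3"},     // reserved, rejected
		{Name: "JAX_NUM_PROCESSES", Value: "8"}, // reserved, rejected
	}
	for _, err := range validateReservedEnvs(envs) {
		fmt.Println(err)
	}
}
```

Rejecting the TrainJob at validation time turns a silent overwrite into an explicit error that names the offending entry.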

Which issue(s) this PR fixes (optional, in Fixes #<issue number>, #<issue number>, ... format, will close the issue(s) when PR gets merged):
Fixes #3669

Checklist:

  • Docs included if any changes are user facing

Copilot AI review requested due to automatic review settings June 30, 2026 08:45
@google-oss-prow google-oss-prow Bot requested review from jinchihe and kuizhiqing June 30, 2026 08:45
@google-oss-prow

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign andreyvelich for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copilot AI left a comment

Copilot was unable to review this pull request because the user who requested the review has reached their quota limit.

… in Validate()

Signed-off-by: Adesh Deshmukh <adeshkd123@gmail.com>
@AdeshDeshmukh AdeshDeshmukh force-pushed the fix-jax-validate-reserved-envs branch from d76a25a to a878529 Compare June 30, 2026 08:47

Development

Successfully merging this pull request may close these issues.

JAX plugin Validate() is a no-op - missing reserved env validation

2 participants