Remove deprecated and unsafe type=bool usage on boolean CLI flags #5528

Open
ahmadki wants to merge 3 commits into NVIDIA:main from ahmadki:ahmadki/fix-boolean-type

@ahmadki ahmadki commented Jun 28, 2026

Member
  • I, the PR author, have personally reviewed every line of this PR.

What does this PR do?

Removes type=bool (and related) usage on boolean CLI options across the
codebase, where it is deprecated at best and dangerous at worst, and adds a
guard test so the unsafe cases cannot reappear.

  • argparse.BooleanOptionalAction stores a literal bool, so type, choices,
    and metavar are never applied. Passing them is deprecated: older
    interpreters ignore them and newer ones reject them outright, which makes the
    parser fail to build. Removed type=bool from five RL flags in _add_rl_args.
  • type=bool on a value-taking argparse argument is dangerous: argparse runs
    the raw string through bool(), and bool() of any non-empty string is
    True, so --flag False silently enables the flag. Converted --onnx-safe
    and --mix-hidden-states to action='store_true'.
  • type=bool on a Click is_flag or --x/--no-x option is redundant: Click
    resolves these to its BOOL type regardless (verified on the pinned Click
    8.4.1, behavior identical). Removed it from seven options in the CI scripts.
  • Added a static guard test that rejects both argparse patterns anywhere under
    megatron/, so they cannot be reintroduced.
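
For reviewers who want a feel for what such a static guard involves, here is a minimal sketch built on the standard `ast` module. It is not the test added in this PR: the names `find_unsafe_bool_args`, `IGNORED_KWARGS`, and `_is_boolean_optional_action` are hypothetical, and the sketch assumes it is enough to inspect keyword arguments of calls to any method named `add_argument`.

```python
import ast

# Keyword arguments that BooleanOptionalAction never applies (per the first bullet).
IGNORED_KWARGS = {"type", "choices", "metavar"}


def _is_boolean_optional_action(node: ast.expr) -> bool:
    """Match `argparse.BooleanOptionalAction` as well as a bare `BooleanOptionalAction`."""
    if isinstance(node, ast.Attribute):
        return node.attr == "BooleanOptionalAction"
    if isinstance(node, ast.Name):
        return node.id == "BooleanOptionalAction"
    return False


def find_unsafe_bool_args(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, reason) pairs for offending add_argument calls."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        is_add_argument = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Attribute)
            and node.func.attr == "add_argument"
        )
        if not is_add_argument:
            continue
        # Skip `**kwargs` expansions, whose keyword name is None.
        kwargs = {kw.arg: kw.value for kw in node.keywords if kw.arg is not None}
        action = kwargs.get("action")
        if action is not None and _is_boolean_optional_action(action):
            extra = sorted(IGNORED_KWARGS.intersection(kwargs))
            if extra:
                reason = "BooleanOptionalAction with " + ", ".join(extra)
                problems.append((node.lineno, reason))
        else:
            type_arg = kwargs.get("type")
            if isinstance(type_arg, ast.Name) and type_arg.id == "bool":
                problems.append((node.lineno, "type=bool on a value-taking argument"))
    return sorted(problems)


# Hypothetical input, parsed as text only (never executed).
sample = """
parser.add_argument("--a", action=argparse.BooleanOptionalAction, type=bool)
parser.add_argument("--b", type=bool, default=False)
parser.add_argument("--c", action="store_true")
"""
for lineno, reason in find_unsafe_bool_args(sample):
    print(f"line {lineno}: {reason}")
```

A real guard would read every `.py` file under megatron/ and fail the test if the combined list is non-empty. Because the check works on the syntax tree rather than on a built parser, it runs the same way on any interpreter version.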

Issue tracking

Linked issue: N/A (small bug fix and cleanup)

Contribution process

Pre-checks

  • I have added relevant unit tests
  • I have added relevant functional tests
  • I have added proper typing to my code
  • I have added relevant documentation
  • I have run the autoformatter.sh on my PR

ahmadki added 3 commits June 28, 2026 14:13
argparse.BooleanOptionalAction stores a literal True/False constant, so the
type, choices, and metavar keyword arguments are never applied. Passing them is
meaningless and deprecated: older interpreters ignore them, and newer ones
reject them outright, raising TypeError when the parser is built.

Five RL flags in _add_rl_args passed type=bool to BooleanOptionalAction. Remove
it; behavior is unchanged wherever the code runs.
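
As an illustration of the cleaned-up form, the sketch below registers one flag with BooleanOptionalAction and no type argument. `--example-flag` and `build_parser` are hypothetical names, not the RL flags or helpers touched by this commit.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # No type/choices/metavar: the action stores True or False by itself
    # and registers the matching --no-example-flag form.
    parser.add_argument(
        "--example-flag",
        action=argparse.BooleanOptionalAction,
        default=False,
    )
    return parser


parser = build_parser()
assert parser.parse_args([]).example_flag is False
assert parser.parse_args(["--example-flag"]).example_flag is True
assert parser.parse_args(["--no-example-flag"]).example_flag is False
```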

Add a static guard test that scans the megatron package for BooleanOptionalAction
calls passing type/choices/metavar, plus a behavioral test asserting the RL flags
still register and parse as real booleans.

Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
argparse applies the `type` callable to the raw string, and `bool()` of any
non-empty string is True, so `type=bool` on a value-taking argument means
`--flag False` silently *enables* the flag (only `--flag ""` yields False).

Two flags were affected:
  - megatron/training/arguments.py: --onnx-safe (the parsed attribute is unused
    and no caller passes the flag)
  - examples/post_training/modelopt/convert_model.py: --mix-hidden-states
    (consumed only as a truthy check, default False, no caller passes a value)

Both are pure on/off toggles, so convert them to action='store_true'
(default False), matching the surrounding flags (e.g. --quick-geglu).
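
The footgun and the fix can be reproduced in isolation. This is a small sketch with a hypothetical `--flag` option and hypothetical helper names, not the actual --onnx-safe or --mix-hidden-states definitions.

```python
import argparse


def build_unsafe_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # argparse calls bool() on the raw string, so any non-empty value is True.
    parser.add_argument("--flag", type=bool, default=False)
    return parser


def build_safe_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # A pure on/off toggle: present means True, absent means False.
    parser.add_argument("--flag", action="store_true")
    return parser


unsafe = build_unsafe_parser()
assert unsafe.parse_args(["--flag", "False"]).flag is True  # silently enabled
assert unsafe.parse_args(["--flag", ""]).flag is False      # only "" is falsy

safe = build_safe_parser()
assert safe.parse_args([]).flag is False
assert safe.parse_args(["--flag"]).flag is True
```

With store_true the flag no longer accepts a value, so `--flag False` is rejected as an unrecognized argument instead of being read as True.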

Extend the argparse guard test to also reject add_argument(type=bool) anywhere
under megatron/, so this footgun cannot reappear in library code.

Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
These Click options are boolean flags (declared with is_flag=True or the
--x/--no-x slash form). Click resolves such flags to its BOOL type regardless
of the type argument, so type=bool is redundant: the resolved option type is
BOOL with or without it, and parsing is identical. It is also a documented
anti-pattern (pallets/click#1062); combined with is_flag it has historically
made a missing flag default to None instead of the declared default.

Removing it is behavior-preserving on the pinned Click (8.4.1), verified by
introspection (resolved type is BOOL either way) and by parity tests showing
the missing-flag value is unchanged (the explicit False/True default). The
affected commands still build and print --help cleanly.

No CLI behavior changes. The value-taking --record-checkpoints option (declared
type=str elsewhere) is unaffected.

Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
copy-pr-bot Bot commented Jun 28, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.
