docs: fix README drift + add consistency test (NVBug 6366190) #4488
pruprakash wants to merge 2 commits into
Light Code Review
Nice cleanup: the README fixes all look correct against current source, and the regression test is a good idea. Two issues to address:
1. Test directory will be skipped by pytest (critical)
2. DCLM README: clone vs submodule path mismatch
The preprocess command now correctly uses
Suggested test cases
No perf tests impacted.
Align documented examples with current source and add a regression test so the drift cannot silently return:
- scripts/training/README.md: gpt_126m_pretrain_config -> vanilla_gpt_pretrain_config; nonexistent qwen25_vl_pretrain_config -> qwen25_vl_7b_sft_config; document --hf_path.
- tutorials/recipes/llama/README.md: ../../conversion -> examples/conversion; GPTDatasetConfig field seq_length -> sequence_length (and matching comment).
- tutorials/data/dclm/README.md: preprocess_data.py -> 3rdparty/Megatron-LM/; setup step now inits the bundled submodule (git submodule update --init 3rdparty/Megatron-LM) instead of cloning a standalone copy, so the documented path is consistent for both container and repo.
- tests/unit_tests/doc_consistency/test_readme_consistency.py: stdlib-only test asserting documented recipes/paths/fields/flags match source (red-green verified). Placed in doc_consistency/ (not docs/) so pyproject norecursedirs does not exclude it from collection.

NVBug: 6366190
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Pruthviraj Prakash <pruprakash@nvidia.com>
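The collection concern behind the doc_consistency/ placement can be illustrated with a short sketch. It assumes, for illustration only, that the repository's norecursedirs setting contains a pattern matching `docs`; the real pattern list is not shown in this PR. pytest documents norecursedirs as fnmatch-style patterns applied to directory basenames, which the hypothetical helper `is_pruned` approximates in simplified form.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Assumed pattern list for illustration; the real pyproject value is not shown in this PR.
ASSUMED_NORECURSEDIRS = ["docs", "build", ".*"]


def is_pruned(test_path: str, patterns=ASSUMED_NORECURSEDIRS) -> bool:
    """Return True if any parent directory's basename matches a pattern.

    Simplified approximation of pytest's norecursedirs handling, which applies
    fnmatch-style patterns to directory basenames during collection.
    """
    parents = PurePosixPath(test_path).parent.parts
    return any(fnmatch(part, pat) for part in parents for pat in patterns)


old_path = "tests/unit_tests/docs/test_readme_consistency.py"
new_path = "tests/unit_tests/doc_consistency/test_readme_consistency.py"
print(is_pruned(old_path))  # the docs/ directory matches the assumed "docs" pattern
print(is_pruned(new_path))  # doc_consistency/ matches none of the assumed patterns
```

Under that assumption, a test file under docs/ would never be collected and so could never fail, which is why the directory name matters for a regression test.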
5a1d5e9 to aafba7a
/ok to test aafba7a
/ok to test aafba7a
/ok to test a839369
Summary
Fixes NVBug 6366190: 6 documentation drifts across 4 Megatron-Bridge README/tutorial files, aligned with current source, plus a regression test so the drift can't silently return.
Changes
- scripts/training/README.md: gpt_126m_pretrain_config → vanilla_gpt_pretrain_config; nonexistent qwen25_vl_pretrain_config → qwen25_vl_7b_sft_config; document the --hf_path flag.
- tutorials/recipes/llama/README.md: conversion path ../../conversion → examples/conversion/convert_checkpoints.py; GPTDatasetConfig field seq_length → sequence_length (and matching comment).
- tutorials/data/dclm/README.md: Megatron-LM/tools/preprocess_data.py → 3rdparty/Megatron-LM/tools/preprocess_data.py.
- tests/unit_tests/doc_consistency/test_readme_consistency.py: new stdlib-only test asserting documented recipes/paths/fields/flags match source.
Verification
Red-green: FAIL on pre-fix docs, PASS after. 5 passed in nvcr.io/nvidian/nemo:26.06.rc9.
All six issues reproduced inside the container on 2026-06-23.
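As a rough illustration of what a stdlib-only consistency check of this kind might look like, here is a minimal sketch. It is not the test added in this PR: the README excerpt and source text are inlined stand-ins, the helper names are hypothetical, and it assumes recipes are defined as top-level functions whose names end in `_config`.

```python
import re

# Hypothetical README excerpt and recipe source, inlined so the sketch runs standalone.
README_TEXT = """
Available recipes: vanilla_gpt_pretrain_config, qwen25_vl_7b_sft_config.
"""
SOURCE_TEXT = """
def vanilla_gpt_pretrain_config():
    ...

def qwen25_vl_7b_sft_config():
    ...
"""


def documented_recipes(readme: str) -> set[str]:
    """Collect identifiers ending in _config that the README mentions."""
    return set(re.findall(r"\b[a-z0-9_]+_config\b", readme))


def defined_functions(source: str) -> set[str]:
    """Collect top-level function names defined in the source text."""
    return set(re.findall(r"^def\s+(\w+)\s*\(", source, flags=re.MULTILINE))


def missing_recipes(readme: str, source: str) -> set[str]:
    """Return documented recipe names that have no matching definition."""
    return documented_recipes(readme) - defined_functions(source)


def test_documented_recipes_exist():
    # Green case: every recipe the README names is defined in the source.
    assert missing_recipes(README_TEXT, SOURCE_TEXT) == set()
```

Running `missing_recipes` against a README that still names a stale recipe returns that name, which gives the red half of a red-green check.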
Reference
NVBug 6366190