Releases: NVIDIA-NeMo/Export-Deploy
NVIDIA NeMo-Export-Deploy 0.6.0
Changelog Details
- Version bump to `0.6.0rc0.dev0` by @github-actions[bot] :: PR: #642
- chore: bump `_code_freeze` workflow to `v0.86.0` by @ko3n1g :: PR: #643
- build: Bump vLLM to address CVE by @ko3n1g :: PR: #644
- chore(beep boop 🤖): bump FW-CI-templates workflow pins to v0.88.0 by @svcnvidia-nemo-ci :: PR: #646
- Fix MLA model issues by @oyilmaz-nvidia :: PR: #647
- build: drop rc0 pre-release tag and add dynamic git versioning by @ko3n1g :: PR: #648
- build: Set trt-llm and vllm for 26.04 by @chtruong814 :: PR: #650
- Fix VLM no image inference issue and add tests by @meatybobby :: PR: #634
- docs: bump versions1.json to 0.5.0 (latest) by @ko3n1g :: PR: #655
- docs: add SECURITY.md by @chtruong814 :: PR: #659
- ci: add base_sha to codecov/codecov-action upload step by @ko3n1g :: PR: #660
- ci: build container once and share across downstream tests by @chtruong814 :: PR: #661
- Remove trt-llm by @oyilmaz-nvidia :: PR: #662
- Bump to vllm 0.20.1 and latest MBridge commit by @chtruong814 :: PR: #678
- fix: Pin flashinfer-python to 0.6.8.post1 by @chtruong814 :: PR: #679
- ci: Major refactor of release-workflows by @ko3n1g :: PR: #663
- ci: remove build-docs workflow by @ko3n1g :: PR: #680
- ci: validate release branch-rules by @ko3n1g :: PR: #683
- ci: Bump CI image to 26.04 pytorch and 0.20.1 vllm by @chtruong814 :: PR: #696
- fix: use eager attention for bidirectional ONNX export by @oliverholworthy :: PR: #698
- Fix tokenizer issue with chat template by @oyilmaz-nvidia :: PR: #697
- Be able to run individual tests by @oyilmaz-nvidia :: PR: #694
- beep boop 🤖: Bumping NeMo-Export-Deploy to v0.6.1 by @nemo-automation-bot[bot] :: PR: #707
- Set PATCH version to 0 in package_info.py by @balasaajay :: PR: #708
NVIDIA NeMo-Export-Deploy 0.5.0
Changelog Details
- Version bump to `0.5.0rc0.dev0` by @github-actions[bot] :: PR: #580
- ci: Add secrets detector by @chtruong814 :: PR: #578
- Add apply_chat_template to HF vllm Ray deployment by @athitten :: PR: #581
- Onur/remove nemo2 trtllm support by @oyilmaz-nvidia :: PR: #576
- Remove MM trt-llm files for nemo2 by @oyilmaz-nvidia :: PR: #583
- ci: Adding to codeowners by @chtruong814 :: PR: #585
- Remove more nemo2 and unused code. by @oyilmaz-nvidia :: PR: #584
- docs: Remove uv sync with uv_args by @thomasdhc :: PR: #586
- Update to use latest MBridge by @chtruong814 :: PR: #589
- Add inference_max_seq_len to ray mbridge deployment path by @athitten :: PR: #588
- Remove nemo imports by @oyilmaz-nvidia :: PR: #594
- ci: Fix wheel build test and publish by @chtruong814 :: PR: #595
- ci: Re-enable onnx test by @chtruong814 :: PR: #597
- ci: Update release-docs workflow to use FW-CI-templates v0.72.0 by @chtruong814 :: PR: #599
- feat: Pass ETP and Sequence Parallel to inframework Ray deployment by @ko3n1g :: PR: #600
- ci: Update release workflows to include changelog and docs by @chtruong814 :: PR: #604
- build: Remove torchao by @chtruong814 :: PR: #606
- build: Upgrade vllm to 0.14.1 by @chtruong814 :: PR: #609
- Add support for stop_words in Ray MBridge deployment by @athitten :: PR: #605
- Add vllm docs for mbridge ckpt by @oyilmaz-nvidia :: PR: #573
- Docs update: remove nemo2 and fix import by @oyilmaz-nvidia :: PR: #608
- Update CI docker image and set vllm eager enforce_eager to False by @chtruong814 :: PR: #614
- Fix building doc and remove all nemo 2.0 docs by @oyilmaz-nvidia :: PR: #615
- Fix multimodal deployment sampling params by @meatybobby :: PR: #602
- docs: Enable nightly docs build on main branch by @chtruong814 :: PR: #619
- Set materialize_only_last_token_logits=False when log_probs = True by @athitten :: PR: #613
- ci: Add-credentials-for-docs by @ko3n1g :: PR: #623
- Fix release workflow reference by @chtruong814 :: PR: #625
- Fix mbridge inference for latest mbridge by @oyilmaz-nvidia :: PR: #627
- feat: Add support for batching of Ray Serve requests by @pthombre :: PR: #629
- Remove all nemo2 imports from old repo by @oyilmaz-nvidia :: PR: #628
- build: Bump export-deploy dependencies for 26.04 by @chtruong814 :: PR: #633
- Docs: remove vLLM install step from mbridge vllm quickstart by @oyilmaz-nvidia :: PR: #618
- Announce Python 3.12 migration by @ko3n1g :: PR: #630
- ci: Enable claude review by @thomasdhc :: PR: #635
- ci: Fix sso user check by @chtruong814 :: PR: #637
- chore: test FW-CI-templates ko3n1g/fix/linkcheck-retry-backoff by @ko3n1g :: PR: #638
- ci: upgrade GitHub Actions for Node.js 24 compatibility by @ko3n1g :: PR: #639
- Add legacy_model_format param by @oyilmaz-nvidia :: PR: #641
- chore: Move to Py3.12 by @ko3n1g :: PR: #631
- cp: `build: Bump vLLM to address CVE (644)` into `r0.5.0` by @svcnvidia-nemo-ci :: PR: #645
- cp: `Fix MLA model issues (647)` into `r0.5.0` by @svcnvidia-nemo-ci :: PR: #649
- cp: `build: Set trt-llm and vllm for 26.04 (650)` into `r0.5.0` by @svcnvidia-nemo-ci :: PR: #651
NVIDIA NeMo-Export-Deploy 0.4.0
Highlights
- vLLM support for Megatron-Bridge LLM checkpoints.
- Removal of NeMo 2.0 support.
- Deployment support for Megatron-Bridge VLM checkpoints.
Changelog Details
- Eval logprob benchmarks support for HF via vLLM with Ray by @athitten :: PR: #479
- feat: add labeler by @pablo-garay :: PR: #483
- Support apply_chat_template in NeMo MM in-framework deployment by @meatybobby :: PR: #440
- NeMo-Export-Deploy 0.2.1 changelog by @pablo-garay :: PR: #489
- Add torch_dtype and default values by @oyilmaz-nvidia :: PR: #466
- Fix max token input by @oyilmaz-nvidia :: PR: #478
- Remove scheduled cron job from release workflow by @pablo-garay :: PR: #494
- feat: Add anchor by @pablo-garay :: PR: #495
- [Eval] Fixes for compatibility between Pytriton, Ray deployments with nemo-run by @athitten :: PR: #501
- New script path by @oyilmaz-nvidia :: PR: #487
- Update trt-llm doc for nemo 2 by @oyilmaz-nvidia :: PR: #506
- Change type for --runtime_env in ray in-fw deployment script by @athitten :: PR: #505
- fix : New peft release adjust fix by @pablo-garay :: PR: #514
- fix: ensure vLLM receives valid params regardless of env changes by @pablo-garay :: PR: #516
- Fix minor doc issue by @oyilmaz-nvidia :: PR: #521
- Update changelog for release 0.3.0 by @oyilmaz-nvidia :: PR: #522
- Update nvidia-sphinx-theme by @chtruong814 :: PR: #528
- Update changelog for version 0.3.1 by @pablo-garay :: PR: #537
- Minor fixes for MBridge nemotron deployment by @athitten :: PR: #518
- docs: Update docs version to latest by @chtruong814 :: PR: #553
- docs: Fixing version1.json by @aschilling-nv :: PR: #554
- Properly Handle DynamicInferenceRequestRecord with latest Mcore by @chtruong814 :: PR: #559
- Add vllm support for mbridge by @oyilmaz-nvidia :: PR: #555
- Temp fix for k8s issue by @ko3n1g :: PR: #565
- ci: Enable AWS runners by @chtruong814 :: PR: #557
- docs: Release docs by @ko3n1g :: PR: #566
- Remove nemo from in-framework deployment by @oyilmaz-nvidia :: PR: #568
- Fix chat endpoint support for Ray in-framework MBridge deployment by @athitten :: PR: #572
- build: Update dependencies for 26.02 by @chtruong814 :: PR: #567
- Remove nemo2 vllm support by @oyilmaz-nvidia :: PR: #571
- Update multimodal in-framework FastAPI from NeMo to Megatron Bridge by @meatybobby :: PR: #511
- Fix chat endpoint support for HF deployment with Ray by @athitten :: PR: #575
- Add Ray Serve Deployment Support for Multimodal Models by @meatybobby :: PR: #574
- cp: `Add apply_chat_template to HF vllm Ray deployment (581)` into `r0.4.0` by @ko3n1g :: PR: #582
- cp: `Remove more nemo2 and unused code. (584)` into `r0.4.0` by @ko3n1g :: PR: #587
- cp: `docs: Remove uv sync with uv_args (586)` into `r0.4.0` by @ko3n1g :: PR: #591
- cp: `Add inference_max_seq_len to ray mbridge deployment path (588)` into `r0.4.0` by @ko3n1g :: PR: #593
- cp: Fix wheel build test and publish (#595) in r0.4.0 by @chtruong814 :: PR: #596
- cp: Re-enable onnx test (#597) in r0.4.0 by @chtruong814 :: PR: #598
- cp: `ci: Update release-docs workflow to use FW-CI-templates v0.72.0 (599)` into `r0.4.0` by @ko3n1g :: PR: #601
- cp: `ci: Update release workflows to include changelog and docs (604)` into `r0.4.0` by @ko3n1g :: PR: #607
- cp: `build: Remove torchao (606)` into `r0.4.0` by @ko3n1g :: PR: #610
- cp: build: Upgrade vllm to 0.14.1 (#609) into r0.4.0 by @chtruong814 :: PR: #611
- docs: Update docs for 0.4.0 by @chtruong814 :: PR: #612
- cp: `Update CI docker image and set vllm eager enforce_eager to False (614)` into `r0.4.0` by @svcnvidia-nemo-ci :: PR: #617
- docs: Update docs version for 0.4.0 release by @chtruong814 :: PR: #620
NVIDIA NeMo-Export-Deploy 0.3.1
NVIDIA NeMo-Export-Deploy 0.3.0
- Update TensorRT-LLM export to use NeMo->HF->TensorRT-LLM export path
- Add chat template support for VLM deployment.
- Bug fixes and folder name updates, such as renaming nlp to llm.
NVIDIA NeMo-Export-Deploy 0.2.1
NVIDIA NeMo-Export-Deploy 0.2.0
- Megatron-LM and Megatron-Bridge model deployment support with Triton Inference Server and Ray Serve
- Multi-node, multi-instance Ray Serve-based deployment for NeMo 2, Megatron-Bridge, and Megatron-LM models.
- Update vLLM export to use NeMo->HF->vLLM export path
- Multi-Modal deployment for NeMo 2 models with Triton Inference Server
- NeMo Retriever Text Reranking ONNX and TensorRT export support
NVIDIA NeMo-Export-Deploy 0.2.0rc2
Prerelease: NVIDIA NeMo-Export-Deploy 0.2.0rc2 (2025-08-18)
NVIDIA NeMo-Export-Deploy 0.1.1
ci: Mock DCO check
Signed-off-by: oliver könig <okoenig@nvidia.com>
NVIDIA NeMo-Export-Deploy 0.2.0rc1
Prerelease: NVIDIA NeMo-Export-Deploy 0.2.0rc1 (2025-08-14)