NVIDIA-NeMo · blisc · Jun 25, 2026
diff --git a/.claude/skills/babysit-pr/SKILL.md b/.claude/skills/babysit-pr/SKILL.md
@@ -21,9 +21,9 @@ If no PR number is clear, ask for it before proceeding.
 ### Step 1 — Get the full picture
 
 ```bash
-gh pr view <PR_NUMBER> --repo NVIDIA-NeMo/NeMo
-gh pr checks <PR_NUMBER> --repo NVIDIA-NeMo/NeMo
-gh pr diff <PR_NUMBER> --repo NVIDIA-NeMo/NeMo
+gh pr view <PR_NUMBER> --repo NVIDIA-NeMo/Speech
+gh pr checks <PR_NUMBER> --repo NVIDIA-NeMo/Speech
+gh pr diff <PR_NUMBER> --repo NVIDIA-NeMo/Speech
 ```
 
 Determine the current state:
@@ -47,9 +47,9 @@ The **"Isort and Black Formatting"** workflow (`reformat_with_isort_and_black` j
 Check out the PR branch and inspect the failure logs:
 
 ```bash
-gh pr checkout <PR_NUMBER> --repo NVIDIA-NeMo/NeMo
-gh run list --repo NVIDIA-NeMo/NeMo --branch <branch-name>
-gh run view <RUN_ID> --repo NVIDIA-NeMo/NeMo --log-failed
+gh pr checkout <PR_NUMBER> --repo NVIDIA-NeMo/Speech
+gh run list --repo NVIDIA-NeMo/Speech --branch <branch-name>
+gh run view <RUN_ID> --repo NVIDIA-NeMo/Speech --log-failed
 ```
 
 Before attempting a fix, check `git log` for recent commits. If you see a previous fix attempt that addressed the same failure and it is still failing, **stop and tell the user** — the issue needs human attention. Do not keep retrying the same fix.
@@ -67,7 +67,7 @@ git push
 After pushing a fix, add the "Run CICD" label to re-trigger the CI pipeline:
 
 ```bash
-gh pr edit <PR_NUMBER> --repo NVIDIA-NeMo/NeMo --add-label "Run CICD"
+gh pr edit <PR_NUMBER> --repo NVIDIA-NeMo/Speech --add-label "Run CICD"
 ```
 
 The "CICD NeMo" workflow is triggered by this label and removes it automatically when done.

diff --git a/.claude/skills/fix-issue/SKILL.md b/.claude/skills/fix-issue/SKILL.md
@@ -1,6 +1,6 @@
 ---
 name: fix-issue
-description: Fix a GitHub issue in NeMo Speech (NVIDIA-NeMo/NeMo). Read the issue, reproduce the bug with a failing test, implement the fix, and verify tests pass. Only opens a PR if the user explicitly asks for it.
+description: Fix a GitHub issue in NeMo Speech (NVIDIA-NeMo/Speech). Read the issue, reproduce the bug with a failing test, implement the fix, and verify tests pass. Only opens a PR if the user explicitly asks for it.
 ---
 
 # fix-issue
@@ -28,7 +28,7 @@ Read the issue description carefully. Identify:
 
 ## Workflow
 
-1. Read the issue: `gh issue view <ISSUE_NUMBER> --repo NVIDIA-NeMo/NeMo`
+1. Read the issue: `gh issue view <ISSUE_NUMBER> --repo NVIDIA-NeMo/Speech`
 2. Understand the bug — identify the relevant code
 3. Write a minimal reproduction test in `tests/` that demonstrates the failure
 4. Run the test to confirm it fails: `pytest <your_test_file> -v`
@@ -49,7 +49,7 @@ git checkout -b fix/<ISSUE_NUMBER>-<short-description>
 git add <changed files>
 git commit -s -m "Fix <short-description> (closes #<ISSUE_NUMBER>)"
 git push origin fix/<ISSUE_NUMBER>-<short-description>
-gh pr create --repo NVIDIA-NeMo/NeMo \
+gh pr create --repo NVIDIA-NeMo/Speech \
   --title "Fix <short-description>" \
   --body "$(cat <<'EOF'
 # What does this PR do ?

@@ -1,4 +1,4 @@
-> [!IMPORTANT]  
-> The `Update branch` button must only be pressed in very rare occassions.
+> [!IMPORTANT]
+> The `Update branch` button must only be pressed on very rare occasions.
 > An outdated branch is never blocking the merge of a PR.
 > Please reach out to the automation team before pressing that button.
@@ -18,7 +18,7 @@ Add a one line overview of what this PR aims to accomplish.
 - You can potentially add a usage example below
 
 ```python
-# Add a code snippet demonstrating how to use this 
+# Add a code snippet demonstrating how to use this
 ```
 
 # GitHub Actions CI
@@ -33,12 +33,12 @@ To run CI on an untrusted fork, a NeMo user with write access must first click "
 
 **Pre checks**:
 
-- [ ] Make sure you read and followed [Contributor guidelines](https://github.com/NVIDIA/NeMo/blob/main/CONTRIBUTING.md)
+- [ ] Make sure you read and followed [Contributor guidelines](https://github.com/NVIDIA-NeMo/Speech/blob/main/CONTRIBUTING.md)
 - [ ] Did you write any new necessary tests?
 - [ ] Did you add or update any necessary documentation?
 - [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
   - [ ] Reviewer: Does the PR have correct import guards for all optional libraries?
-  
+
 **PR Type**:
 
 - [ ] New Feature
@@ -50,7 +50,7 @@ If you haven't finished some of the above items you can still open "Draft" PR.
 ## Who can review?
 
 Anyone in the community is free to review the PR once the checks have passed.
-[Contributor guidelines](https://github.com/NVIDIA/NeMo/blob/main/CONTRIBUTING.md) contains specific people who can review PRs to various areas.
+[Contributor guidelines](https://github.com/NVIDIA-NeMo/Speech/blob/main/CONTRIBUTING.md) lists the people who can review PRs in various areas.
 
 # Additional Information
 

@@ -99,7 +99,7 @@ jobs:
           build-args: |
             IMAGE_LABEL=nemo-core
             NEMO_TAG=${{ github.sha }}
-            NEMO_REPO=https://github.com/NVIDIA/NeMo
+            NEMO_REPO=https://github.com/NVIDIA-NeMo/Speech
             PR_NUMBER=${{ github.event.pull_request.number || 0 }}
           cache-from: |
             type=registry,ref=${{ inputs.registry }}/nemo-speech:${{ inputs.image-name }}-buildcache-main,mode=max

@@ -15,15 +15,15 @@ jobs:
       - name: Get release branch names
         id: get-branch
         run: |
-          latest_branch=$(git ls-remote --heads https://github.com/NVIDIA/Megatron-LM.git 'refs/heads/core_r*' | 
-            grep -o 'core_r[0-9]\+\.[0-9]\+\.[0-9]\+' | 
-            sort -V | 
+          latest_branch=$(git ls-remote --heads https://github.com/NVIDIA/Megatron-LM.git 'refs/heads/core_r*' |
+            grep -o 'core_r[0-9]\+\.[0-9]\+\.[0-9]\+' |
+            sort -V |
             tail -n1)
           echo "mcore_release_branch=$latest_branch" >> $GITHUB_OUTPUT
 
-          latest_branch=$(git ls-remote --heads https://github.com/NVIDIA/NeMo.git 'refs/heads/r*' | 
-            grep -o 'r[0-9]\+\.[0-9]\+\.[0-9]\+' | 
-            sort -V | 
+          latest_branch=$(git ls-remote --heads https://github.com/NVIDIA-NeMo/Speech.git 'refs/heads/r*' |
+            grep -o 'r[0-9]\+\.[0-9]\+\.[0-9]\+' |
+            sort -V |
             tail -n1)
           echo "nemo_release_branch=$latest_branch" >> $GITHUB_OUTPUT
 
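The release-branch lookup in the hunk above depends on `sort -V` (version sort) so that, for example, `r2.10.0` ranks above `r2.9.1`; a plain lexical `sort` would pick `r2.9.1` instead. The sketch below wraps the same `grep -o | sort -V | tail -n1` pipeline in a hypothetical `latest_branch` helper and feeds it fabricated `git ls-remote --heads` output so it runs offline. The helper name, the placeholder SHAs, and the branch names are illustrative assumptions, not part of the workflow; it also assumes GNU `grep` and GNU `sort`, as on Ubuntu-based runners.

```shell
#!/usr/bin/env bash
# Sketch of the release-branch lookup used in the workflow step above.
# Assumes GNU grep (for the \+ BRE operator) and GNU sort (for -V).

# latest_branch PATTERN (hypothetical helper)
#   Reads `git ls-remote --heads` lines ("<sha>\trefs/heads/<name>") on stdin,
#   keeps only the branch names matching PATTERN, and prints the highest version.
latest_branch() {
  local pattern="$1"
  grep -o "$pattern" | sort -V | tail -n1
}

# Offline example: the SHAs and branch names below are made-up placeholders.
printf '%s\trefs/heads/%s\n' 1a2b r2.7.0 3c4d r2.10.0 5e6f r2.9.1 \
  | latest_branch 'r[0-9]\+\.[0-9]\+\.[0-9]\+'
# prints: r2.10.0
```

In the real workflow the `refs/heads/r*` filter passed to `git ls-remote` is what keeps `core_r*` branches out of the NeMo lookup, since the bare `r[0-9]...` pattern would otherwise also match inside a name like `core_r0.14.0`.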

@@ -22,7 +22,7 @@ jobs:
             -H "Accept: application/vnd.github+json" \
             -H "Authorization: Bearer $GITHUB_TOKEN" \
             -H "X-GitHub-Api-Version: 2022-11-28" \
-            https://api.github.com/repos/NVIDIA/NeMo/actions/runners)
+            https://api.github.com/repos/NVIDIA-NeMo/Speech/actions/runners)
 
           MATRIX=$(echo $RUNNERS \
             | jq -c '[

diff --git a/CITATION.cff b/CITATION.cff
@@ -2,7 +2,7 @@ cff-version: 1.2.0
 message: "If you use this software, please cite it as below."
 title: "NeMo: a toolkit for Conversational AI and Large Language Models"
 url: https://nvidia.github.io/NeMo/
-repository-code: https://github.com/NVIDIA/NeMo
+repository-code: https://github.com/NVIDIA-NeMo/Speech
 authors:
   - family-names: Harper
     given-names: Eric
@@ -16,7 +16,7 @@ authors:
     given-names: Yang
   - family-names: Bakhturina
     given-names: Evelina
-  - family-names: Noroozi 
+  - family-names: Noroozi
     given-names: Vahid
   - family-names: Subramanian
     given-names: Sandeep

diff --git a/README.md b/README.md
@@ -1,7 +1,7 @@
 [![Project Status: Active -- The project has reached a stable, usable state and is being actively developed.](http://www.repostatus.org/badges/latest/active.svg)](http://www.repostatus.org/#active)
 [![Documentation](https://readthedocs.com/projects/nvidia-nemo/badge/?version=main)](https://docs.nvidia.com/nemo/speech/nightly/)
-[![CodeQL](https://github.com/nvidia/nemo/actions/workflows/codeql.yml/badge.svg?branch=main&event=push)](https://github.com/nvidia/nemo/actions/workflows/codeql.yml)
-[![NeMo core license and license for collections in this repo](https://img.shields.io/badge/License-Apache%202.0-brightgreen.svg)](https://github.com/NVIDIA/NeMo/blob/master/LICENSE)
+[![CodeQL](https://github.com/NVIDIA-NeMo/Speech/actions/workflows/codeql.yml/badge.svg?branch=main&event=push)](https://github.com/NVIDIA-NeMo/Speech/actions/workflows/codeql.yml)
+[![NeMo core license and license for collections in this repo](https://img.shields.io/badge/License-Apache%202.0-brightgreen.svg)](https://github.com/NVIDIA-NeMo/Speech/blob/main/LICENSE)
 [![Release version](https://badge.fury.io/py/nemo-toolkit.svg)](https://badge.fury.io/py/nemo-toolkit)
 [![Python version](https://img.shields.io/pypi/pyversions/nemo-toolkit.svg)](https://badge.fury.io/py/nemo-toolkit)
 [![PyPi total downloads](https://static.pepy.tech/personalized-badge/nemo-toolkit?period=total&units=international_system&left_color=grey&right_color=brightgreen&left_text=downloads)](https://pepy.tech/project/nemo-toolkit)
@@ -17,7 +17,7 @@ weight checkpoints and demos!
 > For the latest stable released version, please use [the 26.02 NGC container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo?version=26.02).
 
 - 2026-06: [Nemotron-3.5-ASR-Streaming-0.6B](https://huggingface.co/nvidia/nemotron-3.5-asr-streaming-0.6b) has been released with 40 languages supported, controllable latency 80ms-1s, and 240-2400 1xH100 concurrent streams. Built on cache-aware Fastconformer architecture.
-- 2026-04: [Parakeet-unified-en-0.6b](https://huggingface.co/nvidia/parakeet-unified-en-0.6b) has been released with high-quality offline and streaming (with a minimum latency of 160ms) inference in one model for English language with punctuation and capitalization support. 
+- 2026-04: [Parakeet-unified-en-0.6b](https://huggingface.co/nvidia/parakeet-unified-en-0.6b) has been released with high-quality offline and streaming (with a minimum latency of 160ms) inference in one model for English language with punctuation and capitalization support.
 - 2026-03: [Nemotron 3 VoiceChat](https://build.nvidia.com/nvidia/nemotron-voicechat/modelcard) is now released in Early Access. Built on the Nemotron Nano v2 LLM backbone with Nemotron speech and TTS decoder, VoiceChat delivers full-duplex, natural, interruptible conversations with low latency. Try out [the demo](https://build.nvidia.com/nvidia/nemotron-voicechat) and apply for [early access](https://developer.nvidia.com/nemotron-voicechat-early-access).
 - 2026-03: [Nemotron-Speech-Streaming v2603](https://huggingface.co/nvidia/nemotron-speech-streaming-en-0.6b) has been
     updated. It has been trained on a larger and more diverse corpus, resulting in lower WER across all latency modes.
@@ -31,7 +31,7 @@ weight checkpoints and demos!
     on the latency-accuracy Pareto curve!
 - 2026-01: MagpieTTS was released.
 - 2026: This repo has pivoted to focus on audio, speech, and multimodal LLM. For the last NeMo release with support for more
-    modalities, see [v2.7.0](https://github.com/NVIDIA-NeMo/NeMo/releases/tag/v2.7.0)
+    modalities, see [v2.7.0](https://github.com/NVIDIA-NeMo/Speech/releases/tag/v2.7.0)
 - 2025-08: [Parakeet V3](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3) and
     [Canary V2](https://huggingface.co/nvidia/canary-1b-v2) have been released with speech recognition and translation
     support for 25 European languages.
@@ -77,7 +77,7 @@ The recommended way to install NeMo Speech is from source with [uv](https://docs
 ### From source with uv (recommended)
 
 ```bash
-git clone https://github.com/NVIDIA-NeMo/NeMo.git
-cd NeMo
+git clone https://github.com/NVIDIA-NeMo/Speech.git
+cd Speech
 uv sync --extra all --extra cu13     # CUDA 13.x (recommended) — use --extra cu12 for CUDA 12.x
 ```
@@ -93,7 +93,7 @@ This installs our supported stack (Python 3.13, PyTorch 2.12, CUDA 13.2) into `.
 To build the container from source (CUDA 13 / H100+ by default):
 
 ```bash
-git clone https://github.com/NVIDIA-NeMo/NeMo.git
-cd NeMo
+git clone https://github.com/NVIDIA-NeMo/Speech.git
+cd Speech
 docker buildx build -f docker/Dockerfile -t nemo-speech .          # CUDA 13 / H100+ (default)
 docker run --rm -it --gpus all -v "$PWD:/workspace" nemo-speech bash
@@ -121,8 +121,8 @@ pip install 'nemo-toolkit[asr,tts,cu12]' --extra-index-url https://download.pyto
 ## Contribute to NeMo
 
 We welcome community contributions! Please refer to
-[CONTRIBUTING.md](https://github.com/NVIDIA-NeMo/NeMo/blob/main/CONTRIBUTING.md) for the process.
+[CONTRIBUTING.md](https://github.com/NVIDIA-NeMo/Speech/blob/main/CONTRIBUTING.md) for the process.
 
 ## Licenses
 
-NeMo is licensed under the [Apache License 2.0](https://github.com/NVIDIA/NeMo?tab=Apache-2.0-1-ov-file).
+NeMo is licensed under the [Apache License 2.0](https://github.com/NVIDIA-NeMo/Speech?tab=Apache-2.0-1-ov-file).
diff --git a/docs/source/apis.rst b/docs/source/apis.rst
@@ -6,7 +6,7 @@ NeMo APIs
 
 You can learn more about the underlying principles of the NeMo codebase in this section.
 
-The `NeMo Toolkit codebase <https://github.com/NVIDIA/NeMo>`__ is composed of a `core <https://github.com/NVIDIA/NeMo/tree/main/nemo/core>`__ section which contains the main building blocks of the framework, and various `collections <https://github.com/NVIDIA/NeMo/tree/main/nemo/collections>`__ which help you
+The `NeMo Toolkit codebase <https://github.com/NVIDIA-NeMo/Speech>`__ is composed of a `core <https://github.com/NVIDIA-NeMo/Speech/tree/main/nemo/core>`__ section which contains the main building blocks of the framework, and various `collections <https://github.com/NVIDIA-NeMo/Speech/tree/main/nemo/collections>`__ which help you
 build specialized AI models.
 
 You can learn more about aspects of the NeMo "core" by following the links below:

diff --git a/docs/source/asr/asr_customization/legacy_language_modeling_and_customization.rst b/docs/source/asr/asr_customization/legacy_language_modeling_and_customization.rst
@@ -7,11 +7,11 @@ N-gram Language Model Fusion
 In this approach, an N-gram LM is trained on text data, then it is used in fusion with beam search decoding to find the
 best candidates. The beam search decoders in NeMo support language models trained with KenLM library (
 `https://github.com/kpu/kenlm <https://github.com/kpu/kenlm>`__).
-The beam search decoders and KenLM library are not installed by default in NeMo. 
+The beam search decoders and KenLM library are not installed by default in NeMo.
 You need to install them to be able to use beam search decoding and N-gram LM.
-Please refer to `scripts/asr_language_modeling/ngram_lm/install_beamsearch_decoders.sh <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/ngram_lm/install_beamsearch_decoders.sh>`__
-on how to install them. Alternatively, you can build Docker image
-`scripts/installers/Dockerfile.ngramtools <https://github.com/NVIDIA/NeMo/blob/stable/scripts/installers/Dockerfile.ngramtools>`__ with all the necessary dependencies.
+Please refer to `scripts/asr_language_modeling/ngram_lm/install_beamsearch_decoders.sh <https://github.com/NVIDIA-NeMo/Speech/blob/stable/scripts/asr_language_modeling/ngram_lm/install_beamsearch_decoders.sh>`__
+for instructions on how to install them. Alternatively, you can build the Docker image
+`scripts/installers/Dockerfile.ngramtools <https://github.com/NVIDIA-NeMo/Speech/blob/stable/scripts/installers/Dockerfile.ngramtools>`__ with all the necessary dependencies.
 
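-Please, refer to :ref:`train-ngram-lm` for more details on how to train an N-gram LM using KenLM library.
+Please refer to :ref:`train-ngram-lm` for more details on how to train an N-gram LM using the KenLM library.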
 
@@ -31,7 +31,7 @@ Evaluate by Beam Search Decoding and N-gram LM
 
 NeMo's beam search decoders are capable of using the KenLM's N-gram models to find the best candidates.
 The script to evaluate an ASR model with beam search decoding and N-gram models can be found at
-`scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py>`__.
+`scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py <https://github.com/NVIDIA-NeMo/Speech/blob/stable/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py>`__.
 
 This script has a large number of possible argument overrides; therefore, it is recommended that you use ``python eval_beamsearch_ngram_ctc.py --help`` to see the full list of arguments.
 
@@ -119,7 +119,7 @@ The width of the beam search (``--beam_width``) specifies the number of top cand
     and ``pyctcdecode`` via the ``decoding`` subconfig.
 
-To learn more about evaluating the ASR models with N-gram LM, refer to the tutorial here: Offline ASR Inference with Beam Search and External Language Model Rescoring
-`Offline ASR Inference with Beam Search and External Language Model Rescoring <https://colab.research.google.com/github/NVIDIA/NeMo/blob/main/tutorials/asr/Offline_ASR.ipynb>`_
+To learn more about evaluating ASR models with an N-gram LM, refer to the tutorial
+`Offline ASR Inference with Beam Search and External Language Model Rescoring <https://colab.research.google.com/github/NVIDIA-NeMo/Speech/blob/main/tutorials/asr/Offline_ASR.ipynb>`_.
 
 Beam Search Engines
 -------------------
@@ -215,7 +215,7 @@ Beam Search ngram Decoding for Transducer Models (RNNT and HAT)
 ===============================================================
 
 You can also find a similar script to evaluate an RNNT/HAT model with beam search decoding and N-gram models at:
-`scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_transducer.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_transducer.py>`_
+`scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_transducer.py <https://github.com/NVIDIA-NeMo/Speech/blob/stable/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_transducer.py>`_
 
 .. code-block::
 
@@ -244,14 +244,14 @@ Weighted Finite-State Transducers (WFST) are finite-state machines with input an
     More precisely, WFST decoding is more of a greedy N-depth search with LM.
     Thus, it is asymptotically worse than conventional beam search decoding algorithms, but faster.
 
-**WARNING**  
+**WARNING**
 At the moment, NeMo supports WFST decoding only for CTC models and word-based LMs.
 
 To run WFST decoding in NeMo, one needs to provide a NeMo ASR model and either an ARPA LM or a WFST LM (advanced). An ARPA LM can be built from source text with KenLM as follows: ``<kenlm_bin_path>/lmplz -o <ngram_length> --arpa <out_arpa_path> --prune <ngram_prune>``.
 
 The script to evaluate an ASR model with WFST decoding and N-gram models can be found at
 `scripts/asr_language_modeling/ngram_lm/eval_wfst_decoding_ctc.py
-<https://github.com/NVIDIA/NeMo/blob/main/scripts/asr_language_modeling/ngram_lm/eval_wfst_decoding_ctc.py>`__.
+<https://github.com/NVIDIA-NeMo/Speech/blob/main/scripts/asr_language_modeling/ngram_lm/eval_wfst_decoding_ctc.py>`__.
 
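-This script has a large number of possible argument overrides, therefore it is advised to use ``python eval_wfst_decoding_ctc.py --help`` to see the full list of arguments.
+This script has a large number of possible argument overrides; therefore, it is recommended that you use ``python eval_wfst_decoding_ctc.py --help`` to see the full list of arguments.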
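
Because this change rewrites repository URLs across many files, a text search for leftover references to the old locations can help confirm that none were missed. The sketch below is a hypothetical `find_stale_refs` helper (not part of this PR) built on GNU `grep`. Its pattern is an assumption: it targets `github.com/NVIDIA/NeMo`, `github.com/NVIDIA-NeMo/NeMo`, Colab `github/NVIDIA/NeMo` paths, and API `repos/NVIDIA/NeMo` paths, case-insensitively, while skipping sibling names such as `NeMo-Curator`. It only matches URL-like paths, so directory names inside shell snippets still need a manual look.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list lines that still point at the old repository paths.
# Assumes GNU grep (-r recursive, -n line numbers, -i case-insensitive,
# -E extended regex, --exclude-dir).
find_stale_refs() {
  # $1: directory to scan; defaults to the current directory.
  # Matches ".../NVIDIA/NeMo" or ".../NVIDIA-NeMo/NeMo" when "NeMo" is followed
  # by a non-name character (".", "/", ")", ...) or the end of the line.
  grep -rniE --exclude-dir=.git \
    '(github\.com/|github/|repos/)nvidia(-nemo)?/nemo([^a-z0-9_-]|$)' \
    "${1:-.}"
}

# Usage from the root of a checkout (grep exits non-zero when nothing matches):
#   find_stale_refs . || echo "no stale repository references found"
```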