14 changes: 7 additions & 7 deletions .claude/skills/babysit-pr/SKILL.md
@@ -21,9 +21,9 @@ If no PR number is clear, ask for it before proceeding.
### Step 1 — Get the full picture

```bash
gh pr view <PR_NUMBER> --repo NVIDIA-NeMo/NeMo
gh pr checks <PR_NUMBER> --repo NVIDIA-NeMo/NeMo
gh pr diff <PR_NUMBER> --repo NVIDIA-NeMo/NeMo
gh pr view <PR_NUMBER> --repo NVIDIA-NeMo/Speech
gh pr checks <PR_NUMBER> --repo NVIDIA-NeMo/Speech
gh pr diff <PR_NUMBER> --repo NVIDIA-NeMo/Speech
```

Determine the current state:
@@ -47,9 +47,9 @@ The **"Isort and Black Formatting"** workflow (`reformat_with_isort_and_black` j
Check out the PR branch and inspect the failure logs:

```bash
gh pr checkout <PR_NUMBER> --repo NVIDIA-NeMo/NeMo
gh run list --repo NVIDIA-NeMo/NeMo --branch <branch-name>
gh run view <RUN_ID> --repo NVIDIA-NeMo/NeMo --log-failed
gh pr checkout <PR_NUMBER> --repo NVIDIA-NeMo/Speech
gh run list --repo NVIDIA-NeMo/Speech --branch <branch-name>
gh run view <RUN_ID> --repo NVIDIA-NeMo/Speech --log-failed
```

Before attempting a fix, check `git log` for recent commits. If you see a previous fix attempt that addressed the same failure and it is still failing, **stop and tell the user** — the issue needs human attention. Do not keep retrying the same fix.
@@ -67,7 +67,7 @@ git push
After pushing a fix, add the "Run CICD" label to re-trigger the CI pipeline:

```bash
gh pr edit <PR_NUMBER> --repo NVIDIA-NeMo/NeMo --add-label "Run CICD"
gh pr edit <PR_NUMBER> --repo NVIDIA-NeMo/Speech --add-label "Run CICD"
```

The "CICD NeMo" workflow is triggered by this label and removes it automatically when done.
6 changes: 3 additions & 3 deletions .claude/skills/fix-issue/SKILL.md
@@ -1,6 +1,6 @@
---
name: fix-issue
description: Fix a GitHub issue in NeMo Speech (NVIDIA-NeMo/NeMo). Read the issue, reproduce the bug with a failing test, implement the fix, and verify tests pass. Only opens a PR if the user explicitly asks for it.
description: Fix a GitHub issue in NeMo Speech (NVIDIA-NeMo/Speech). Read the issue, reproduce the bug with a failing test, implement the fix, and verify tests pass. Only opens a PR if the user explicitly asks for it.
---

# fix-issue
@@ -28,7 +28,7 @@ Read the issue description carefully. Identify:

## Workflow

1. Read the issue: `gh issue view <ISSUE_NUMBER> --repo NVIDIA-NeMo/NeMo`
1. Read the issue: `gh issue view <ISSUE_NUMBER> --repo NVIDIA-NeMo/Speech`
2. Understand the bug — identify the relevant code
3. Write a minimal reproduction test in `tests/` that demonstrates the failure
4. Run the test to confirm it fails: `pytest <your_test_file> -v`
@@ -49,7 +49,7 @@ git checkout -b fix/<ISSUE_NUMBER>-<short-description>
git add <changed files>
git commit -s -m "Fix <short-description> (closes #<ISSUE_NUMBER>)"
git push origin fix/<ISSUE_NUMBER>-<short-description>
gh pr create --repo NVIDIA-NeMo/NeMo \
gh pr create --repo NVIDIA-NeMo/Speech \
--title "Fix <short-description>" \
--body "$(cat <<'EOF'
# What does this PR do ?
10 changes: 5 additions & 5 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -1,4 +1,4 @@
> [!IMPORTANT]
> [!IMPORTANT]
> The `Update branch` button must only be pressed on very rare occasions.
> An outdated branch is never blocking the merge of a PR.
> Please reach out to the automation team before pressing that button.
@@ -18,7 +18,7 @@ Add a one line overview of what this PR aims to accomplish.
- You can potentially add a usage example below

```python
# Add a code snippet demonstrating how to use this
# Add a code snippet demonstrating how to use this
```

# GitHub Actions CI
@@ -33,12 +33,12 @@ To run CI on an untrusted fork, a NeMo user with write access must first click "

**Pre checks**:

- [ ] Make sure you read and followed [Contributor guidelines](https://github.com/NVIDIA/NeMo/blob/main/CONTRIBUTING.md)
- [ ] Make sure you read and followed [Contributor guidelines](https://github.com/NVIDIA-NeMo/Speech/blob/main/CONTRIBUTING.md)
- [ ] Did you write any new necessary tests?
- [ ] Did you add or update any necessary documentation?
- [ ] Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- [ ] Reviewer: Does the PR have correct import guards for all optional libraries?

**PR Type**:

- [ ] New Feature
@@ -50,7 +50,7 @@ If you haven't finished some of the above items you can still open "Draft" PR.
## Who can review?

Anyone in the community is free to review the PR once the checks have passed.
[Contributor guidelines](https://github.com/NVIDIA/NeMo/blob/main/CONTRIBUTING.md) contains specific people who can review PRs to various areas.
[Contributor guidelines](https://github.com/NVIDIA-NeMo/Speech/blob/main/CONTRIBUTING.md) contains specific people who can review PRs to various areas.

# Additional Information

2 changes: 1 addition & 1 deletion .github/workflows/_build_container.yml
@@ -99,7 +99,7 @@ jobs:
build-args: |
IMAGE_LABEL=nemo-core
NEMO_TAG=${{ github.sha }}
NEMO_REPO=https://github.com/NVIDIA/NeMo
NEMO_REPO=https://github.com/NVIDIA-NeMo/Speech
PR_NUMBER=${{ github.event.pull_request.number || 0 }}
cache-from: |
type=registry,ref=${{ inputs.registry }}/nemo-speech:${{ inputs.image-name }}-buildcache-main,mode=max
12 changes: 6 additions & 6 deletions .github/workflows/mcore-tag-bump-bot.yml
@@ -15,15 +15,15 @@ jobs:
- name: Get release branch names
id: get-branch
run: |
latest_branch=$(git ls-remote --heads https://github.com/NVIDIA/Megatron-LM.git 'refs/heads/core_r*' |
grep -o 'core_r[0-9]\+\.[0-9]\+\.[0-9]\+' |
sort -V |
latest_branch=$(git ls-remote --heads https://github.com/NVIDIA/Megatron-LM.git 'refs/heads/core_r*' |
grep -o 'core_r[0-9]\+\.[0-9]\+\.[0-9]\+' |
sort -V |
tail -n1)
echo "mcore_release_branch=$latest_branch" >> $GITHUB_OUTPUT

latest_branch=$(git ls-remote --heads https://github.com/NVIDIA/NeMo.git 'refs/heads/r*' |
grep -o 'r[0-9]\+\.[0-9]\+\.[0-9]\+' |
sort -V |
latest_branch=$(git ls-remote --heads https://github.com/NVIDIA-NeMo/Speech.git 'refs/heads/r*' |
grep -o 'r[0-9]\+\.[0-9]\+\.[0-9]\+' |
sort -V |
tail -n1)
echo "nemo_release_branch=$latest_branch" >> $GITHUB_OUTPUT
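The branch-selection step above keeps only names matching `rX.Y.Z`, version-sorts them with `sort -V`, and takes the last one. A minimal Python sketch of the same selection for the NeMo `r*` branches, assuming `git ls-remote` output lines of the form `<sha>\trefs/heads/<branch>`, shows why a version sort is needed: a plain string sort would rank `r2.9.3` above `r2.10.1`. The helper name `latest_release_branch` is hypothetical and not part of the workflow.

```python
import re

# Same shape as the workflow's grep pattern: r<major>.<minor>.<patch>
RELEASE_RE = re.compile(r"r(\d+)\.(\d+)\.(\d+)")


def latest_release_branch(ls_remote_lines):
    """Return the highest rX.Y.Z branch name, or None if there is none.

    Hypothetical helper mirroring `grep -o ... | sort -V | tail -n1`.
    """
    candidates = []
    for line in ls_remote_lines:
        match = RELEASE_RE.search(line)
        if match:
            # Compare numerically, component by component, like `sort -V`.
            key = tuple(int(part) for part in match.groups())
            candidates.append((key, match.group(0)))
    if not candidates:
        return None
    return max(candidates)[1]


refs = [
    "1111111\trefs/heads/r2.7.0",
    "2222222\trefs/heads/r2.10.1",
    "3333333\trefs/heads/r2.9.3",
]
print(latest_release_branch(refs))  # r2.10.1
print(sorted(r.split("/")[-1] for r in refs)[-1])  # r2.9.3 (plain string sort)
```

Comparing integer tuples rather than strings is what makes `r2.10.1` win here, which is the behaviour the workflow relies on `sort -V` for.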

2 changes: 1 addition & 1 deletion .github/workflows/monitor-vms.yml
@@ -22,7 +22,7 @@ jobs:
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer $GITHUB_TOKEN" \
-H "X-GitHub-Api-Version: 2022-11-28" \
https://api.github.com/repos/NVIDIA/NeMo/actions/runners)
https://api.github.com/repos/NVIDIA-NeMo/Speech/actions/runners)
MATRIX=$(echo $RUNNERS \
| jq -c '[
4 changes: 2 additions & 2 deletions CITATION.cff
@@ -2,7 +2,7 @@ cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "NeMo: a toolkit for Conversational AI and Large Language Models"
url: https://nvidia.github.io/NeMo/
repository-code: https://github.com/NVIDIA/NeMo
repository-code: https://github.com/NVIDIA-NeMo/Speech
authors:
- family-names: Harper
given-names: Eric
@@ -16,7 +16,7 @@ authors:
given-names: Yang
- family-names: Bakhturina
given-names: Evelina
- family-names: Noroozi
- family-names: Noroozi
given-names: Vahid
- family-names: Subramanian
given-names: Sandeep
16 changes: 8 additions & 8 deletions README.md
@@ -1,7 +1,7 @@
[![Project Status: Active -- The project has reached a stable, usable state and is being actively developed.](http://www.repostatus.org/badges/latest/active.svg)](http://www.repostatus.org/#active)
[![Documentation](https://readthedocs.com/projects/nvidia-nemo/badge/?version=main)](https://docs.nvidia.com/nemo/speech/nightly/)
[![CodeQL](https://github.com/nvidia/nemo/actions/workflows/codeql.yml/badge.svg?branch=main&event=push)](https://github.com/nvidia/nemo/actions/workflows/codeql.yml)
[![NeMo core license and license for collections in this repo](https://img.shields.io/badge/License-Apache%202.0-brightgreen.svg)](https://github.com/NVIDIA/NeMo/blob/master/LICENSE)
[![CodeQL](https://github.com/NVIDIA-NeMo/Speech/actions/workflows/codeql.yml/badge.svg?branch=main&event=push)](https://github.com/NVIDIA-NeMo/Speech/actions/workflows/codeql.yml)
[![NeMo core license and license for collections in this repo](https://img.shields.io/badge/License-Apache%202.0-brightgreen.svg)](https://github.com/NVIDIA-NeMo/Speech/blob/master/LICENSE)
[![Release version](https://badge.fury.io/py/nemo-toolkit.svg)](https://badge.fury.io/py/nemo-toolkit)
[![Python version](https://img.shields.io/pypi/pyversions/nemo-toolkit.svg)](https://badge.fury.io/py/nemo-toolkit)
[![PyPi total downloads](https://static.pepy.tech/personalized-badge/nemo-toolkit?period=total&units=international_system&left_color=grey&right_color=brightgreen&left_text=downloads)](https://pepy.tech/project/nemo-toolkit)
@@ -17,7 +17,7 @@ weight checkpoints and demos!
> For the latest stable released version, please use [the 26.02 NGC container](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nemo?version=26.02).

- 2026-06: [Nemotron-3.5-ASR-Streaming-0.6B](https://huggingface.co/nvidia/nemotron-3.5-asr-streaming-0.6b) has been released with 40 languages supported, controllable latency 80ms-1s, and 240-2400 1xH100 concurrent streams. Built on cache-aware Fastconformer architecture.
- 2026-04: [Parakeet-unified-en-0.6b](https://huggingface.co/nvidia/parakeet-unified-en-0.6b) has been released with high-quality offline and streaming (with a minimum latency of 160ms) inference in one model for English language with punctuation and capitalization support.
- 2026-04: [Parakeet-unified-en-0.6b](https://huggingface.co/nvidia/parakeet-unified-en-0.6b) has been released with high-quality offline and streaming (with a minimum latency of 160ms) inference in one model for English language with punctuation and capitalization support.
- 2026-03: [Nemotron 3 VoiceChat](https://build.nvidia.com/nvidia/nemotron-voicechat/modelcard) is now released in Early Access. Built on the Nemotron Nano v2 LLM backbone with Nemotron speech and TTS decoder, VoiceChat delivers full-duplex, natural, interruptible conversations with low latency. Try out [the demo](https://build.nvidia.com/nvidia/nemotron-voicechat) and apply for [early access](https://developer.nvidia.com/nemotron-voicechat-early-access).
- 2026-03: [Nemotron-Speech-Streaming v2603](https://huggingface.co/nvidia/nemotron-speech-streaming-en-0.6b) has been
updated. It has been trained on a larger and more diverse corpus, resulting in lower WER across all latency modes.
@@ -31,7 +31,7 @@ weight checkpoints and demos!
on the latency-accuracy Pareto curve!
- 2026-01: MagpieTTS was released.
- 2026: This repo has pivoted to focus on audio, speech, and multimodal LLM. For the last NeMo release with support for more
modalities, see [v2.7.0](https://github.com/NVIDIA-NeMo/NeMo/releases/tag/v2.7.0)
modalities, see [v2.7.0](https://github.com/NVIDIA-NeMo/Speech/releases/tag/v2.7.0)
- 2025-08: [Parakeet V3](https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3) and
[Canary V2](https://huggingface.co/nvidia/canary-1b-v2) have been released with speech recognition and translation
support for 25 European languages.
@@ -77,7 +77,7 @@ The recommended way to install NeMo Speech is from source with [uv](https://docs
### From source with uv (recommended)

```bash
git clone https://github.com/NVIDIA-NeMo/NeMo.git
git clone https://github.com/NVIDIA-NeMo/Speech.git
cd Speech
uv sync --extra all --extra cu13 # CUDA 13.x (recommended) — use --extra cu12 for CUDA 12.x
```
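When no target directory is given, `git clone` names the checkout after the last path component of the URL with a trailing `.git` removed, so cloning `https://github.com/NVIDIA-NeMo/Speech.git` produces a `Speech/` directory, and the `cd` that follows has to use that name (alternatively, pass an explicit directory, as in `git clone <url> NeMo`). The helper below is a hypothetical, simplified approximation of that naming rule for URLs of this shape; it is not git's own implementation and ignores edge cases such as bare-repository paths.

```python
from urllib.parse import urlparse


def default_clone_dir(url: str) -> str:
    """Approximate the directory name `git clone <url>` creates by default.

    Simplified sketch: take the last path component and drop a `.git` suffix.
    """
    path = urlparse(url).path.rstrip("/")
    name = path.rsplit("/", 1)[-1]
    if name.endswith(".git"):
        name = name[: -len(".git")]
    return name


print(default_clone_dir("https://github.com/NVIDIA-NeMo/Speech.git"))  # Speech
print(default_clone_dir("https://github.com/NVIDIA-NeMo/NeMo.git"))  # NeMo
```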
@@ -93,7 +93,7 @@ This installs our supported stack (Python 3.13, PyTorch 2.12, CUDA 13.2) into `.
To build the container from source (CUDA 13 / H100+ by default):

```bash
git clone https://github.com/NVIDIA-NeMo/NeMo.git
git clone https://github.com/NVIDIA-NeMo/Speech.git
cd Speech
docker buildx build -f docker/Dockerfile -t nemo-speech . # CUDA 13 / H100+ (default)
docker run --rm -it --gpus all -v "$PWD:/workspace" nemo-speech bash
@@ -121,8 +121,8 @@ pip install 'nemo-toolkit[asr,tts,cu12]' --extra-index-url https://download.pyto
## Contribute to NeMo

We welcome community contributions! Please refer to
[CONTRIBUTING.md](https://github.com/NVIDIA-NeMo/NeMo/blob/main/CONTRIBUTING.md) for the process.
[CONTRIBUTING.md](https://github.com/NVIDIA-NeMo/Speech/blob/main/CONTRIBUTING.md) for the process.

## Licenses

NeMo is licensed under the [Apache License 2.0](https://github.com/NVIDIA/NeMo?tab=Apache-2.0-1-ov-file).
NeMo is licensed under the [Apache License 2.0](https://github.com/NVIDIA-NeMo/Speech?tab=Apache-2.0-1-ov-file).
2 changes: 1 addition & 1 deletion docs/source/apis.rst
@@ -6,7 +6,7 @@ NeMo APIs

You can learn more about the underlying principles of the NeMo codebase in this section.

The `NeMo Toolkit codebase <https://github.com/NVIDIA/NeMo>`__ is composed of a `core <https://github.com/NVIDIA/NeMo/tree/main/nemo/core>`__ section which contains the main building blocks of the framework, and various `collections <https://github.com/NVIDIA/NeMo/tree/main/nemo/collections>`__ which help you
The `NeMo Toolkit codebase <https://github.com/NVIDIA-NeMo/Speech>`__ is composed of a `core <https://github.com/NVIDIA-NeMo/Speech/tree/main/nemo/core>`__ section which contains the main building blocks of the framework, and various `collections <https://github.com/NVIDIA-NeMo/Speech/tree/main/nemo/collections>`__ which help you
build specialized AI models.

You can learn more about aspects of the NeMo "core" by following the links below:
@@ -7,11 +7,11 @@ N-gram Language Model Fusion
In this approach, an N-gram LM is trained on text data, then it is used in fusion with beam search decoding to find the
best candidates. The beam search decoders in NeMo support language models trained with KenLM library (
`https://github.com/kpu/kenlm <https://github.com/kpu/kenlm>`__).
The beam search decoders and KenLM library are not installed by default in NeMo.
The beam search decoders and KenLM library are not installed by default in NeMo.
You need to install them to be able to use beam search decoding and N-gram LM.
Please refer to `scripts/asr_language_modeling/ngram_lm/install_beamsearch_decoders.sh <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/ngram_lm/install_beamsearch_decoders.sh>`__
Please refer to `scripts/asr_language_modeling/ngram_lm/install_beamsearch_decoders.sh <https://github.com/NVIDIA-NeMo/Speech/blob/stable/scripts/asr_language_modeling/ngram_lm/install_beamsearch_decoders.sh>`__
on how to install them. Alternatively, you can build Docker image
`scripts/installers/Dockerfile.ngramtools <https://github.com/NVIDIA/NeMo/blob/stable/scripts/installers/Dockerfile.ngramtools>`__ with all the necessary dependencies.
`scripts/installers/Dockerfile.ngramtools <https://github.com/NVIDIA-NeMo/Speech/blob/stable/scripts/installers/Dockerfile.ngramtools>`__ with all the necessary dependencies.

Please, refer to :ref:`train-ngram-lm` for more details on how to train an N-gram LM using KenLM library.

@@ -31,7 +31,7 @@ Evaluate by Beam Search Decoding and N-gram LM

NeMo's beam search decoders are capable of using the KenLM's N-gram models to find the best candidates.
The script to evaluate an ASR model with beam search decoding and N-gram models can be found at
`scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py>`__.
`scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py <https://github.com/NVIDIA-NeMo/Speech/blob/stable/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_ctc.py>`__.

This script has a large number of possible argument overrides; therefore, it is recommended that you use ``python eval_beamsearch_ngram_ctc.py --help`` to see the full list of arguments.

@@ -119,7 +119,7 @@ The width of the beam search (``--beam_width``) specifies the number of top cand
and ``pyctcdecode`` via the ``decoding`` subconfig.

To learn more about evaluating the ASR models with N-gram LM, refer to the tutorial here: Offline ASR Inference with Beam Search and External Language Model Rescoring
`Offline ASR Inference with Beam Search and External Language Model Rescoring <https://colab.research.google.com/github/NVIDIA/NeMo/blob/main/tutorials/asr/Offline_ASR.ipynb>`_
`Offline ASR Inference with Beam Search and External Language Model Rescoring <https://colab.research.google.com/github/NVIDIA-NeMo/Speech/blob/main/tutorials/asr/Offline_ASR.ipynb>`_

Beam Search Engines
-------------------
@@ -215,7 +215,7 @@ Beam Search ngram Decoding for Transducer Models (RNNT and HAT)
===============================================================

You can also find a similar script to evaluate an RNNT/HAT model with beam search decoding and N-gram models at:
`scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_transducer.py <https://github.com/NVIDIA/NeMo/blob/stable/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_transducer.py>`_
`scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_transducer.py <https://github.com/NVIDIA-NeMo/Speech/blob/stable/scripts/asr_language_modeling/ngram_lm/eval_beamsearch_ngram_transducer.py>`_

.. code-block::

@@ -244,14 +244,14 @@ Weighted Finite-State Transducers (WFST) are finite-state machines with input an
More precisely, WFST decoding is more of a greedy N-depth search with LM.
Thus, it is asymptotically worse than conventional beam search decoding algorithms, but faster.

**WARNING**
**WARNING**
At the moment, NeMo supports WFST decoding only for CTC models and word-based LMs.

To run WFST decoding in NeMo, one needs to provide a NeMo ASR model and either an ARPA LM or a WFST LM (advanced). An ARPA LM can be built from source text with KenLM as follows: ``<kenlm_bin_path>/lmplz -o <ngram_length> --arpa <out_arpa_path> --prune <ngram_prune>``.

The script to evaluate an ASR model with WFST decoding and N-gram models can be found at
`scripts/asr_language_modeling/ngram_lm/eval_wfst_decoding_ctc.py
<https://github.com/NVIDIA/NeMo/blob/main/scripts/asr_language_modeling/ngram_lm/eval_wfst_decoding_ctc.py>`__.
<https://github.com/NVIDIA-NeMo/Speech/blob/main/scripts/asr_language_modeling/ngram_lm/eval_wfst_decoding_ctc.py>`__.

This script has a large number of possible argument overrides, therefore it is advised to use ``python eval_wfst_decoding_ctc.py --help`` to see the full list of arguments.
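Most hunks in this PR rewrite a link that pointed at an old repository location (`github.com/NVIDIA/NeMo` or `github.com/NVIDIA-NeMo/NeMo`) so that it points at `github.com/NVIDIA-NeMo/Speech`. A minimal sketch for spotting links that were missed, assuming the old locations appear as plain URLs (including the `api.github.com/repos/...` and Colab `.../github/...` forms seen in the hunks) and matching case-insensitively because some old links are lowercase (`nvidia/nemo`). The pattern, `find_stale_links`, and `scan_tree` are hypothetical and not part of the repository.

```python
import re
from pathlib import Path

# Old repository locations: github.com/NVIDIA/NeMo or github.com/NVIDIA-NeMo/NeMo,
# also in the api.github.com/repos/... and Colab .../github/... forms.
# The lookahead stops names such as "NeMo-Run" or "NeMo_x" from matching.
STALE_REPO_RE = re.compile(
    r"(?:github\.com/(?:repos/)?|/github/)nvidia(?:-nemo)?/nemo(?![\w-])",
    re.IGNORECASE,
)


def find_stale_links(text):
    """Return (line_number, matched_text) pairs for old-repo references."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for match in STALE_REPO_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


def scan_tree(root, suffixes=(".md", ".rst", ".yml", ".yaml", ".cff", ".py")):
    """Scan text files under `root` and yield (path, line_number, match)."""
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            for lineno, hit in find_stale_links(text):
                yield path, lineno, hit


sample = "\n".join([
    "url: https://github.com/NVIDIA/NeMo/blob/main/LICENSE",
    "new: https://github.com/NVIDIA-NeMo/Speech/blob/main/CONTRIBUTING.md",
    "api: https://api.github.com/repos/NVIDIA/NeMo/actions/runners",
    "badge: https://github.com/nvidia/nemo/actions/workflows/codeql.yml",
    "other: https://github.com/NVIDIA/NeMo-Run",
])
for lineno, hit in find_stale_links(sample):
    print(lineno, hit)
```

Matches are candidates for review rather than guaranteed errors; the new `NVIDIA-NeMo/Speech` links and similarly named repositories such as `NeMo-Run` are deliberately not reported.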
