fix(asr): clamp diarization cluster count to max_num_speakers by vprosoho · Pull Request #15835 · NVIDIA-NeMo/Speech

vprosoho · 2026-06-26T20:35:51Z

What does this PR do ?

Fixes speaker over-counting in clustering diarization for short sessions, where the final number of clusters could exceed the configured max_num_speakers.

Collection: ASR (speaker diarization / clustering)

Changelog

nemo/collections/asr/parts/utils/offline_clustering.py: in SpeakerClustering.forward_unit_infer, limit the chosen cluster count with n_clusters = min(n_clusters, max_num_speakers).
tests/collections/speaker_tasks/utils/test_diar_utils.py: add test_offline_speaker_clustering_enhanced_count_respects_max_num_speakers_cpu, a unit test that passes an enhanced speaker count larger than max_num_speakers and verifies that the number of output clusters is capped.

Usage

No usage change. The API is the same as before; the only difference is that the number of output clusters no longer exceeds max_num_speakers.

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Hopefully these are the right people to CC, judging from the git history:
cc @tango4j & @nithinraok
Apologies if not!

Additional Information

For short sessions, SpeakerClustering.forward_infer estimates the speaker count via getEnhancedSpeakerCount(), which constructs NMESC with max_num_speakers=emb.shape[0] (the number of embedding segments) instead of the configured max_num_speakers. The resulting est_num_of_spk_enhanced is then consumed in forward_unit_infer without re-applying the limit, so a short audio file can be clustered into more speakers than max_num_speakers allows.

Clamp n_clusters to max_num_speakers after the speaker count is selected. This is a no-op for the oracle and standard NME estimation paths (both already bounded by max_num_speakers) and fixes the over-counting that can occur on the enhanced-count path.

Signed-off-by: Vadym Prokopov <vprokopov@sohosquared.com>
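The selection-then-clamp flow described in this commit message can be sketched roughly as follows. This is a simplified toy model, not the NeMo code: the function name select_n_clusters, its signature, and the branch order are assumptions made for illustration; only the final min(n_clusters, max_num_speakers) step is taken from the PR.

```python
from typing import Optional


def select_n_clusters(
    max_num_speakers: int,
    oracle_num_speakers: int = -1,
    est_num_of_spk: int = 1,
    est_num_of_spk_enhanced: Optional[int] = None,
) -> int:
    """Toy model of cluster-count selection (hypothetical, not the NeMo implementation)."""
    if oracle_num_speakers > 0:
        # Oracle path: the caller supplies the speaker count directly.
        n_clusters = oracle_num_speakers
    elif est_num_of_spk_enhanced is not None:
        # Enhanced-count path: this estimate may exceed max_num_speakers.
        n_clusters = est_num_of_spk_enhanced
    else:
        # Standard estimation path.
        n_clusters = est_num_of_spk
    # The clamp from this PR: a no-op whenever the count is already within the limit.
    return min(n_clusters, max_num_speakers)
```

In this model, an enhanced estimate of 8 with max_num_speakers=2 yields 2 clusters, while counts already within the limit pass through unchanged.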

getEnhancedSpeakerCount estimates the speaker count with max_num_speakers=emb.shape[0], so for short sessions est_num_of_spk_enhanced can exceed the requested max_num_speakers.

Add a CPU unit test that calls SpeakerClustering.forward_unit_infer with an enhanced count larger than max_num_speakers and asserts the number of output clusters is capped at max_num_speakers. Fails before the clamp fix (returns 8 clusters), passes after (capped at 2/3).

Signed-off-by: Vadym Prokopov <vprokopov@sohosquared.com>
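The shape of such a regression check can be illustrated without NeMo. The sketch below uses a hypothetical stand-in, toy_cluster_labels, that assigns segments to clusters round-robin; it is not SpeakerClustering.forward_unit_infer, and the segment count of 16 is an arbitrary assumption. The enhanced count of 8 and the limits of 2 and 3 mirror the numbers quoted above.

```python
from typing import List


def toy_cluster_labels(
    num_segments: int,
    enhanced_count: int,
    max_num_speakers: int,
    clamp: bool = True,
) -> List[int]:
    """Hypothetical stand-in for clustering: round-robin labels over n_clusters."""
    n_clusters = enhanced_count
    if clamp:
        # Same idea as the fix: never use more clusters than max_num_speakers.
        n_clusters = min(n_clusters, max_num_speakers)
    return [i % n_clusters for i in range(num_segments)]


def count_clusters(labels: List[int]) -> int:
    """Number of distinct cluster labels in the output."""
    return len(set(labels))


def check_capped() -> None:
    # With the clamp, the output is capped at max_num_speakers (2 or 3).
    for max_num_speakers in (2, 3):
        labels = toy_cluster_labels(16, 8, max_num_speakers)
        assert count_clusters(labels) == max_num_speakers
    # Without the clamp, the enhanced count of 8 leaks through.
    assert count_clusters(toy_cluster_labels(16, 8, 2, clamp=False)) == 8


check_capped()
```

Checking the number of distinct labels, rather than an internal variable, keeps the assertion tied to what a caller actually observes.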

copy-pr-bot · 2026-06-26T20:35:54Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

vprosoho added 2 commits June 26, 2026 11:39

github-actions Bot added ASR community-request labels Jun 26, 2026

vprosoho wants to merge 2 commits into NVIDIA-NeMo:main from vprosoho:fix/diarization-clustering-respect-max-num-speakers
