MM-Speech/MMSpeech

MMSpeech

Tools and frameworks for multimodal speech generation and dialogue

🗣️ Speech Synthesis

🧨 [2026] SwanBench-Speech

SwanBench-Speech is a comprehensive benchmark designed to evaluate the performance of long-form speech generation models. It has three key properties:

  1) Rich speech scenarios; 2) Comprehensive evaluation dimensions; 3) Valuable insights

👏 [2025] DiTReducio

DiTReducio is a training-free acceleration framework that compresses computations in DiT-based TTS models through a progressive calibration process.

🔦 [2023] Make-An-Audio

Make-An-Audio is a prompt-enhanced diffusion model for text-to-audio generation that 1) introduces pseudo prompt enhancement with a distill-then-reprogram approach; 2) leverages a spectrogram autoencoder to predict the self-supervised audio representation instead of waveforms.

Code: https://github.com/Text-to-Audio/Make-An-Audio

🔥 [2021] FastSpeech1&2

FastSpeech proposes a novel feed-forward network based on Transformer that generates mel-spectrograms in parallel for TTS.

Code: https://github.com/ming024/FastSpeech2
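
As a rough illustration of why parallel generation is possible: FastSpeech predicts a duration for each phoneme and expands the phoneme-level hidden states to frame level (the paper calls this step a length regulator), after which all mel frames can be decoded in a single pass. The sketch below is a minimal pure-Python toy of that expansion step only, not code from the linked repository; the function name `length_regulate`, the variable names, and the numbers are hypothetical.

```python
from typing import List, Sequence


def length_regulate(
    hidden: Sequence[Sequence[float]],
    durations: Sequence[int],
) -> List[List[float]]:
    """Repeat each phoneme-level vector durations[i] times (frame-level expansion)."""
    if len(hidden) != len(durations):
        raise ValueError("hidden and durations must have the same length")
    frames: List[List[float]] = []
    for vec, d in zip(hidden, durations):
        if d < 0:
            raise ValueError("durations must be non-negative")
        # Copy the vector once per predicted frame; d == 0 drops the phoneme.
        frames.extend(list(vec) for _ in range(d))
    return frames


# Hypothetical toy input: three phonemes with 2-dimensional hidden states.
phoneme_states = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
durations = [2, 0, 3]  # predicted number of mel frames per phoneme

frames = length_regulate(phoneme_states, durations)
print(len(frames))  # total frames = sum(durations)
```

Because the output length is fixed by the durations up front, a frame-level decoder can process every position at once instead of generating frames one after another.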

👥 Spoken Dialogue

🎧 Spatial Audio

🔥 [2025] MRSAudio

MRSAudio is a large-scale multimodal spatial audio dataset designed to advance research in spatial audio understanding and generation. MRSAudio spans four distinct components: MRSLife, MRSSpeech, MRSMusic, and MRSSing, covering diverse real-world scenarios.

Code: https://github.com/MRSAudio/MRSAudio_Main
