Releases · RapidAI/RapidSpeech.cpp

17 Jun 03:16

lovemefan

v1.2.0

6929847

Release v1.2.0 (Latest)

🎉 New Features

CosyVoice3 TTS Engine

Full C++ port of CosyVoice3: core structures, Qwen2 LLM backbone, frontend, speaker module, and multi-quantization adaptor
Streaming online mode (rs-tts-online) with a chunked pipeline
QKV concat fusion for ~9% LM speedup
Voice baking and reuse (--save-voice / --voice)
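
The QKV concat fusion mentioned above can be illustrated with a toy sketch: stacking the Q, K, and V projection weights into one matrix lets a single matrix multiply replace three, and the result is sliced back afterwards. The matrices, sizes, and helper names below are hypothetical and are not taken from the RapidSpeech.cpp sources.

```python
# Toy illustration of QKV concat fusion using plain Python lists.
# All names and sizes are hypothetical, chosen only for illustration.

def matvec(w, x):
    """Multiply a row-major matrix w (rows x cols) by a vector x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def fuse_qkv(wq, wk, wv):
    """Stack the three projection matrices row-wise into one matrix."""
    return wq + wk + wv

def split_qkv(fused_out, d_q, d_k):
    """Slice the fused output vector back into q, k, and v."""
    q = fused_out[:d_q]
    k = fused_out[d_q:d_q + d_k]
    v = fused_out[d_q + d_k:]
    return q, k, v

wq = [[1.0, 0.0], [0.0, 1.0]]
wk = [[2.0, 1.0]]
wv = [[0.5, -1.0]]
x = [3.0, 4.0]

# Three separate projections versus one fused projection followed by a split.
separate = (matvec(wq, x), matvec(wk, x), matvec(wv, x))
fused = split_qkv(matvec(fuse_qkv(wq, wk, wv), x), len(wq), len(wk))
```

Both paths produce the same q, k, and v; the fused path simply issues one larger multiply, which is typically where a speedup of this kind comes from.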

Kokoro TTS

Kokoro TTS engine with frontend (Chinese G2P + ITN) and frontend data tools

Other TTS

MeloTTS checkpoint loading support
WeText text normalization processor + cppjieba submodule integration

SenseVoice KWS

Keyword spotting on SenseVoice: rs-kws CLI + core engine

ASR Enhancements

Speaker verification for offline ASR
imatrix quantization for ASR models
Expanded quantization type coverage across models
Renamed asr-online → rs-asr-vad-online for consistency

CI / Release

GitHub Actions release workflow for prebuilt binaries (Linux/Windows/macOS, CPU/CUDA/Metal)

🐛 Bug Fixes

Python 3.12 segfault: removed py::call_guard<py::gil_scoped_release>() from push_audio (ASR/VAD) and synthesize, which crashed on Python 3.12
Fixed a VAD segfault and test failure
CosyVoice3: apply output_norm before the speech_lm_head projection
Fixed a SenseVoice load error and an f16 forward error
Fixed the Windows STATUS_DLL_NOT_FOUND error
Fixed compile errors on Windows and Linux CUDA
Added the missing pthread link on Linux
Python package build no longer triggers RS_BUILD_TESTS
AudioProcessor internal methods are now properly hidden in the Python bindings
Test script and CI binding test fixes (SKIP_RETURN_CODE 77)
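
For context on the SKIP_RETURN_CODE 77 item: CTest's SKIP_RETURN_CODE test property marks a test as skipped, rather than failed, when the test process exits with the configured code. A minimal sketch of a binding test that follows this convention might look like the following; the function name and the idea of skipping on a missing model file are assumptions for illustration, not the project's actual test script.

```python
import os

SKIP_RETURN_CODE = 77  # exit code named in the release note

def run_binding_test(model_path):
    """Return 0 on success, or 77 when the test cannot run.

    Hypothetical helper: skipping on a missing model file is an
    assumption about why a binding test might be skipped.
    """
    if not model_path or not os.path.exists(model_path):
        return SKIP_RETURN_CODE
    # A real test would load the model and run inference here.
    return 0

# With no model available, the test reports "skipped" instead of "failed".
code_without_model = run_binding_test(None)
```

A wrapper script would pass the returned value to sys.exit so that CTest sees it as the process exit code.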


28 May 14:25

lovemefan

v1.1.0

758f682

Releases v1.1.0

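📦 Model Support

OmniVoice TTS: 28-layer Qwen3-0.6B + 32-step MaskGIT diffusion + DAC vocoder, with voice cloning from reference audio
OpenVoice2 TTS + MeloTTS text frontend, supporting Chinese, English, and Japanese
SenseVoice streaming ASR (2-pass: partial results + final-segment rescoring)
FireRedVAD (100 fps, EMA smoothing + adaptive silence threshold); Silero VAD upgraded to v6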
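
As a rough illustration of the EMA smoothing mentioned for FireRedVAD, the sketch below smooths per-frame speech probabilities with an exponential moving average before thresholding. At 100 fps each frame covers 10 ms. The smoothing factor, the fixed threshold (used here in place of the adaptive silence threshold), and the function names are assumptions, not the project's implementation.

```python
def ema_smooth(probs, alpha=0.5):
    """Exponential moving average over per-frame speech probabilities.

    alpha is a hypothetical smoothing factor; higher values react faster.
    """
    smoothed = []
    state = None
    for p in probs:
        # The first frame initializes the average; later frames blend in.
        state = p if state is None else alpha * p + (1.0 - alpha) * state
        smoothed.append(state)
    return smoothed

def speech_flags(probs, threshold=0.5, alpha=0.5):
    """Mark frames as speech when the smoothed probability reaches the threshold.

    A fixed threshold is a simplification of an adaptive one.
    """
    return [s >= threshold for s in ema_smooth(probs, alpha)]

frames = [0.0, 1.0, 0.0, 1.0, 1.0, 1.0]
smoothed = ema_smooth(frames)
flags = speech_flags(frames)
```

Smoothing damps single-frame spikes, so isolated noisy frames are less likely to flip the speech/silence decision.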

WASM demo:
https://rapidai-rapidspeech-wasm.hf.space
https://rapidai-rapidspeech-wasm.ms.show

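🚀 New Features

Multi-language bindings: Python (pybind11), Node.js, and WASM APIs are all available, covering ASR / TTS / VAD
WebGPU backend: native Dawn + browser emdawnwebgpu, so WASM builds can also run GPU inference
DAC Metal fused kernels (macOS): dispatches reduced from ~300 to ~30, significantly speeding up the vocoder on Apple Silicon
imatrix activation-aware quantization: new rs-imatrix tool for more stable accuracy in low-bit quantization
Multiple PyPI distributions: rapidspeech / rapidspeech-cuda / rapidspeech-metal, supporting cp39–cp313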

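⚡ Performance

OmniVoice reuses the compute graph across the 32 diffusion steps, with intermediate buffers hoisted out of the loop
The mul_mat calls for the 8 codebooks are fused into a single matrix multiplication when conditions allow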

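🐛 Bug Fixes

Fixed the permute axes and rel_v gather indices in OpenVoice2 relative-position attention; the 6 Transformer layers now match PyTorch bit-exactly
Fixed the OmniVoice fc.weight transpose bug, so the full TTS pipeline now runs end to end
Fixed Linux Python wheel RPATH/NEEDED issues and Windows MSVC compilation (__attribute__((used)), UTF-8, M_PI)
Fixed dropped characters in online ASR and FunASR Q4_K_M quantization accuracy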


27 Apr 10:38

lovemefan

v1.0.0

7ec1867

New ASR model: FunASR-nano

📦 Model Support

Added FunASR-nano (ASR), with q3, q4, q5, q6, and q8 quantization support

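🚀 New Features

Flash Attention support: the Qwen3 attention layer gains a ggml_flash_attn_ext path, toggled by the llm_cparams::flash_attn parameter (enabled by default). Compared with the standard multi-head attention implementation, Flash Attention offers better memory locality and compute efficiency on long sequences.
Quantization tool rs-quantize: a new command-line tool for model quantization, supporting Q4_0, Q4_K, Q5_0, Q5_K, Q8_0, and other formats. funasr-nano-fp16 (1955 MB) → Q4_K (596 MB), a 3.3x reduction.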
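
The 3.3x figure can be checked from the two file sizes. The sketch below also estimates an average bits-per-weight value, under the assumption that the fp16 file consists almost entirely of 16-bit weights; that estimate is illustrative only and is not stated in the release notes.

```python
FP16_MB = 1955  # funasr-nano-fp16 size from the release note
Q4K_MB = 596    # Q4_K size from the release note

# Compression ratio between the two files.
ratio = FP16_MB / Q4K_MB

# Rough average bits per weight, assuming the fp16 file is
# almost entirely 16-bit weights (an assumption, not a measured value).
avg_bits = 16 * Q4K_MB / FP16_MB

print(f"compression: {ratio:.2f}x, ~{avg_bits:.2f} bits/weight")
```

An average somewhat above 4 bits per weight would be consistent with a 4-bit K-quant format that stores per-block scales and keeps some tensors at higher precision.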

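⚡ Performance

Decode appends only the new KV column: each step reads back just the last KV column (~0.1 MB) instead of the full cache (~50 MB/step)
Prefill outputs logits for the last position only: logits transfer drops from 242 MB to 0.6 MB
Embedding lookup inlined into the decode graph: removes the separate per-step mini embed graph, saving 36 alloc/sched_reset calls
KV cache stays resident on the GPU: ggml_backend_alloc_ctx_tensors allocates a dedicated KV buffer on the GPU, eliminating the ~100 MB full CPU→GPU upload per step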
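
The logits saving can be sanity-checked with a back-of-the-envelope calculation. The sketch assumes F32 logits and a vocabulary of 151,936 entries (the Qwen3 tokenizer size); neither value is stated in the release notes, so treat the result as an estimate.

```python
VOCAB_SIZE = 151_936   # assumption: Qwen3 vocabulary size
BYTES_PER_LOGIT = 4    # assumption: logits stored as F32

def logits_mb(n_positions):
    """Size in MB of the logits for n_positions prompt positions."""
    return n_positions * VOCAB_SIZE * BYTES_PER_LOGIT / 1e6

# Logits for a single (last) position.
last_only = logits_mb(1)

# Number of prompt positions implied by the 242 MB figure.
implied_positions = 242 / last_only
```

Under these assumptions one position costs about 0.6 MB, and 242 MB corresponds to a prompt of roughly 400 positions, which matches the shape of the reported saving.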

GPU benchmark (Apple M1 Pro, funasr-nano-fp16, 15 s audio):

Stage	Before	After	Improvement
Decoder	4.36s	3.45s	21%
RTF	0.205	0.170	17%
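
The improvement column appears to be the relative reduction from the before value, which the following short check reproduces from the table's own numbers (the helper name is hypothetical):

```python
def relative_reduction(before, after):
    """Fraction by which a value dropped relative to its starting point."""
    return (before - after) / before

decoder = relative_reduction(4.36, 3.45)  # decoder time in seconds
rtf = relative_reduction(0.205, 0.170)    # real-time factor

print(f"Decoder: {decoder:.0%}, RTF: {rtf:.0%}")
```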

🐛 Bug Fixes

Silent memcpy failure on the Metal backend: set_position_ids/set_input_data now use ggml_backend_tensor_set, fixing writes when the GPU tensor's data pointer is null
KV output tensor buffer not allocated: the prefill branch now marks the 3D ggml_cont tensor directly as an output, avoiding an abort caused by the scheduler not allocating a dedicated buffer
Causal mask shape: changed from [n_tokens, n_kv] to [n_kv, n_tokens], fixing a broadcast failure during decode when n_tokens=1
Flash Attention mask type: ggml_flash_attn_ext requires an F16 mask, so F32 masks are now converted to F16 automatically
Flash Attention tensor layout: q/k/v are permuted from [d_k, n_head, n_seq] to [d_k, n_seq, n_head] to match the ggml_flash_attn_ext interface
CI build fixes: Windows DLL search path, hidden-visibility link errors for rs_log/load_wav_file, and PowerShell 2>/dev/null compatibility
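
To make the causal mask fix concrete, here is a minimal sketch of a mask laid out as [n_kv, n_tokens] in ggml's dimension order, where the first dimension is the contiguous one. In row-major Python terms that is one row per query token and one column per KV position. The function name and the 0 / -inf encoding are assumptions for illustration, not the project's code.

```python
import math

def causal_mask(n_tokens, n_kv):
    """Build a causal mask with one row per query token and n_kv columns.

    0.0 means "may attend", -inf means "masked". The last n_tokens KV
    positions are assumed to belong to the current query tokens.
    """
    n_past = n_kv - n_tokens  # positions already in the KV cache
    mask = []
    for i in range(n_tokens):
        # Query i sees all cached positions plus current tokens up to itself.
        row = [0.0 if j <= n_past + i else -math.inf for j in range(n_kv)]
        mask.append(row)
    return mask

prefill = causal_mask(3, 3)  # three prompt tokens, empty cache
decode = causal_mask(1, 4)   # one new token, three cached positions
```

In the decode case the mask is a single row spanning every KV position, which lines up with attention scores that also have n_kv as their contiguous dimension.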

