Collection of resources on the applications of Large Language Models (LLMs) in Audio AI.
Updated May 10, 2026
Deep Learning for Speech
FastLongSpeech is a novel framework designed to extend the capabilities of Large Speech-Language Models for efficient long-speech processing without necessitating dedicated long-speech training data.
Code for the AAAI 2025 paper “Large Language Models Are Read/Write Policy-Makers for Simultaneous Generation”.
Federated learning for Speech LLMs (WavLM+TinyLlama & Voxtral) with Flower and PyTorch Lightning. LoRA fine-tuning across MLS clients with FedAvg/FedProx, IID and speaker-based partitioning, per-round LR decay, W&B logging, and multi-GPU Ray simulation. Only adapter weights are shared; raw audio never leaves the client.
Provides Whisper-based audio transcription and translation through lightweight C++ libraries for easy integration into LLM projects.