KAME: TANDEM ARCHITECTURE FOR ENHANCING KNOWLEDGE IN REAL-TIME SPEECH-TO-SPEECH CONVERSATIONAL AI
KAME is a spoken dialogue system built on top of the Kyutai Moshi codebase. This repository keeps the Python inference stack needed for:
- running KAME's oracle-guided dialogue server with a web UI
- loading KAME-compatible kame modules from kame_finetune
This repository focuses on the Python inference path around
kame.server_oracle; the generic kame.server flow remains available
for compatibility.
KAME running oracle-guided spoken dialogue with live browser interaction.
Compared with the upstream Moshi repository, KAME adds and maintains the oracle-guided dialogue path used for our experiments and demos. The primary entrypoint is:

python -m kame.server_oracle --help

or, after installing the package in editable mode:

kame-server-oracle --help

This server provides the KAME-specific inference path and serves a browser UI.
If --static is not provided, the server can fetch static assets automatically.
For compatibility, the generic Python server is also retained:

python -m kame.server --help

- kame.server_oracle requires OPENAI_API_KEY.
- ASR is enabled by default and uses Google Cloud Speech-to-Text. Set GOOGLE_APPLICATION_CREDENTIALS to a valid Google Cloud credential JSON file before starting the server.
- The current oracle-guided server path is configured for English dialogue and ASR (en-US).
- If --static is omitted, the browser UI assets are fetched automatically at startup.
- kame.server_oracle sends conversation text to OpenAI Chat Completions.
- If ASR is enabled, kame.server_oracle sends audio to Google Cloud Speech-to-Text.
- kame.server_oracle currently supports only a single active WebSocket session at a time; concurrent sessions are rejected with 503 Server busy.
- Plaintext local session logs are disabled by default. Enable them explicitly with --log-dir or MOSHI_LOG_DIR if you want to persist transcripts and token streams locally.
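
The single-session behavior described above can be pictured with a small asyncio sketch. This is not KAME's actual implementation: `SingleSessionGate`, `handle_session`, and `demo` are hypothetical names, and the `(status, reason)` pair only mimics the `503 Server busy` response mentioned in the list.

```python
import asyncio


class SingleSessionGate:
    """Admit one session at a time; reject others instead of queueing them."""

    def __init__(self) -> None:
        self._busy = False

    def try_acquire(self) -> bool:
        # There is no await between the check and the assignment, so this is
        # atomic with respect to other coroutines on the same event loop.
        if self._busy:
            return False
        self._busy = True
        return True

    def release(self) -> None:
        self._busy = False


async def handle_session(gate: SingleSessionGate, work) -> tuple[int, str]:
    """Return an HTTP-style (status, reason) pair for one connection attempt."""
    if not gate.try_acquire():
        return 503, "Server busy"
    try:
        await work()
        return 200, "OK"
    finally:
        # Always free the slot, even if the session raised.
        gate.release()


async def demo() -> list[tuple[int, str]]:
    gate = SingleSessionGate()

    async def fake_dialogue() -> None:
        await asyncio.sleep(0.05)  # stands in for a live dialogue session

    # Two clients connect at the same time; only the first is admitted.
    return await asyncio.gather(
        handle_session(gate, fake_dialogue),
        handle_session(gate, fake_dialogue),
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Rejecting rather than queueing keeps a second client from waiting indefinitely on a real-time session; a client that receives the busy response can simply retry later.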
The parts of this repository that matter for KAME are:
- src/kame/: installable KAME Python package
- src/kame/server_oracle.py: oracle-guided server entrypoint
- src/kame/server.py: generic non-oracle server retained for compatibility
- src/kame/models/: language model and checkpoint loading code used by kame_finetune
The published distribution name is kame-model, while the Python import
namespace is kame. This repository now uses a standard root project layout
with the package source under src/kame/.
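
Because the distribution name and the import name differ, it can help to check both when debugging an install. The sketch below uses only the standard library; `describe_install` is a hypothetical helper, not part of the kame package.

```python
from importlib import metadata
from importlib.util import find_spec


def describe_install(dist_name: str = "kame-model", module_name: str = "kame") -> dict:
    """Report the installed distribution version and whether the module is importable."""
    try:
        # Distribution metadata is keyed by the distribution name (kame-model).
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    # Import resolution is keyed by the top-level module name (kame).
    importable = find_spec(module_name) is not None
    return {
        "distribution": dist_name,
        "version": version,
        "module": module_name,
        "importable": importable,
    }


if __name__ == "__main__":
    print(describe_install())
```

If `version` is `None` while `importable` is true, the module is probably being picked up from the working directory rather than from an installed distribution.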
The public checkpoint can be loaded directly from Hugging Face with
--hf-repo. The package distribution name is kame-model, while the Python
module name is kame.
uv init --bare --python 3.12
uv add "kame-model @ git+https://github.com/SakanaAI/kame.git@1a69ee29dbd201d400f841459d87871154881047"
export OPENAI_API_KEY=...
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/google-cloud-credentials.json
uv run python -m kame.server_oracle_parallel \
--hf-repo SakanaAI/kame \
--host 0.0.0.0 \
--port 8998 \
--device cuda

Then open http://localhost:8998.
kame-model is not published on PyPI yet, so the example above installs it
directly from GitHub. For reproducible runs, pin a release tag or commit instead
of installing from main.
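
When scripting the quickstart, it can be convenient to wait until the server port accepts connections before opening the browser. The helper below is a minimal sketch: `wait_for_port` is a hypothetical name, the host and port defaults mirror the example command, and a successful TCP connection only shows that the port is listening, not that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str = "localhost", port: int = 8998,
                  timeout_s: float = 60.0, interval_s: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening.
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            time.sleep(interval_s)
    return False


if __name__ == "__main__":
    if wait_for_port():
        print("Server is accepting connections at http://localhost:8998")
    else:
        print("Timed out waiting for the server port")
```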
Notes:
- Python >=3.10 is supported; the command above uses Python 3.12 because it is the version used for verification.
- OPENAI_API_KEY is required by kame.server_oracle.
- ASR is enabled by default and requires Google Cloud Speech-to-Text. Before running the server, set up a Google Cloud project for Speech-to-Text and configure Application Default Credentials with GOOGLE_APPLICATION_CREDENTIALS.
- For local smoke tests without Google Speech-to-Text, pass --no-enable-asr. This skips ASR and does not exercise the full oracle-guided spoken-dialogue path.
- --config-path, --moshi-weight, --mimi-weight, and --tokenizer are not needed for the public Hugging Face checkpoint in the usual case. config.json in the Hugging Face repo resolves the model weights, Mimi checkpoint, tokenizer, and optional generation settings.
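
The environment requirements in these notes can be checked before launching the server. The following is a minimal sketch, not part of the kame package: `preflight` is a hypothetical helper, and it only covers the `GOOGLE_APPLICATION_CREDENTIALS` file path described here, not other ways of configuring Application Default Credentials.

```python
import os
from collections.abc import Mapping


def preflight(env: Mapping[str, str], enable_asr: bool = True) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    # The oracle-guided server needs an OpenAI key regardless of ASR.
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    # Google credentials only matter when ASR is left enabled
    # (that is, when --no-enable-asr is not passed).
    if enable_asr:
        cred_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
        if not cred_path:
            problems.append("GOOGLE_APPLICATION_CREDENTIALS is not set")
        elif not os.path.isfile(cred_path):
            problems.append(f"credential file not found: {cred_path}")
    return problems


if __name__ == "__main__":
    for problem in preflight(os.environ):
        print("preflight:", problem)
```

Passing `enable_asr=False` mirrors a smoke test run with `--no-enable-asr`, where only the OpenAI key is checked.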
pip install -e .
python -m kame.server_oracle --help

kame_finetune can then depend on this repository directly from the repo root,
for example via a local editable path dependency.
This repository is intentionally narrower than the original Moshi release. The main supported workflow is:
- install the Python package from the repository root
- run server_oracle.py for oracle-guided interactive inference, or server.py for the generic server path
- use the same Python package as the kame-model dependency from kame_finetune
The kame-model Python package is distributed under the MIT License.
This repository is derived from the Kyutai Moshi codebase and retains the
relevant upstream license files and notices. Additional inherited notices,
including LICENSE.audiocraft, are kept at the project
root. Model weights and datasets, when distributed separately, may be subject to
different license terms.
KAME is derived from the Kyutai Moshi repository. We retain the original license files and attribution for the inherited codebase, and extend the Python inference stack with KAME-specific functionality.
Please keep the existing license files in this repository, including LICENSE.audiocraft.
If you use KAME in your research, please cite:
@article{kuroki2025kame,
title={KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI},
author={Kuroki, So and Kubo, Yotaro and Akiba, Takuya and Tang, Yujin},
journal={arXiv preprint arXiv:2510.02327},
year={2025}
}