KAME

KAME: TANDEM ARCHITECTURE FOR ENHANCING KNOWLEDGE IN REAL-TIME SPEECH-TO-SPEECH CONVERSATIONAL AI

KAME is a spoken dialogue system built on top of the Kyutai Moshi codebase. This repository keeps the Python inference stack needed for:

running KAME's oracle-guided dialogue server with a web UI
loading KAME-compatible kame modules from kame_finetune

The public-facing focus of this repository is the Python inference path around kame.server_oracle, while keeping the generic kame.server flow available for compatibility.

KAME running oracle-guided spoken dialogue with live browser interaction.

What KAME Adds

Compared with the upstream Moshi repository, KAME adds and maintains the oracle-guided dialogue path used for our experiments and demos. The primary entrypoint is:

python -m kame.server_oracle --help

or, after installing the package in editable mode:

kame-server-oracle --help

This server provides the KAME-specific inference path and serves a browser UI. If --static is not provided, the server can fetch static assets automatically. For compatibility, the generic Python server is also retained:

python -m kame.server --help

Runtime Notes

kame.server_oracle requires OPENAI_API_KEY.
ASR is enabled by default and uses Google Cloud Speech-to-Text. Set GOOGLE_APPLICATION_CREDENTIALS to a valid Google Cloud credential JSON file before starting the server.
The current oracle-guided server path is configured for English dialogue and ASR (en-US).
If --static is omitted, the browser UI assets are fetched automatically at startup.
kame.server_oracle sends conversation text to OpenAI Chat Completions.
If ASR is enabled, kame.server_oracle sends audio to Google Cloud Speech-to-Text.
kame.server_oracle currently supports only a single active WebSocket session at a time; concurrent sessions are rejected with 503 Server busy.
Plaintext local session logs are disabled by default. Enable them explicitly with --log-dir or MOSHI_LOG_DIR if you want to persist transcripts and token streams locally.
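
The single-session behavior noted above can be pictured with a small sketch. This is an illustration only, not KAME's actual implementation: SingleSessionGate, handle_session, and the 200 status returned for a finished session are hypothetical, and only the 503 Server busy response is taken from the notes above.

```python
import asyncio


class SingleSessionGate:
    """Hypothetical guard: admits one session; others are refused, not queued."""

    def __init__(self) -> None:
        self._busy = False

    def try_enter(self) -> bool:
        # No await between the check and the set, so this is race-free
        # on a single asyncio event loop.
        if self._busy:
            return False
        self._busy = True
        return True

    def leave(self) -> None:
        self._busy = False


async def handle_session(gate: SingleSessionGate, run_session):
    """Run one session, or refuse with 503 if another one is active."""
    if not gate.try_enter():
        return 503, "Server busy"
    try:
        await run_session()
    finally:
        # Always release the slot, even if the session raised.
        gate.leave()
    return 200, "OK"  # placeholder status for a completed session


async def demo():
    gate = SingleSessionGate()
    started = asyncio.Event()
    release = asyncio.Event()

    async def long_session():
        started.set()
        await release.wait()

    first = asyncio.create_task(handle_session(gate, long_session))
    await started.wait()
    # The first session is still active, so this one is refused immediately.
    second = await handle_session(gate, long_session)
    release.set()
    return await first, second
```

Calling asyncio.run(demo()) runs one long session and one overlapping attempt; the overlapping attempt is refused rather than queued, which matches the behavior a client should expect when the server is already in use.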

Repository Layout

The parts of this repository that matter for KAME are:

src/kame/: installable KAME Python package
src/kame/server_oracle.py: oracle-guided server entrypoint
src/kame/server.py: generic non-oracle server retained for compatibility
src/kame/models/: language model and checkpoint loading code used by kame_finetune

The published distribution name is kame-model, while the Python import namespace is kame. This repository now uses a standard root project layout with the package source under src/kame/.
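
Because the distribution name and the import namespace differ, it can help to check both after installation. The following is a minimal sketch using only the Python standard library; describe_install is a hypothetical helper, not part of the kame package.

```python
from importlib import metadata, util


def describe_install(dist_name: str = "kame-model", module_name: str = "kame") -> dict:
    """Report whether a distribution is installed and its module is importable."""
    try:
        # Looks up the installed distribution by its published name.
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    # Looks up the top-level import name without importing it.
    importable = util.find_spec(module_name) is not None
    return {
        "distribution": dist_name,
        "version": version,
        "module": module_name,
        "importable": importable,
    }


if __name__ == "__main__":
    print(describe_install())
```

A version of None together with importable set to False suggests that the package is not installed in the active environment.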

Typical Usage

Run from the Hugging Face Checkpoint

The public checkpoint can be loaded directly from Hugging Face with --hf-repo.

uv init --bare --python 3.12
uv add "kame-model @ git+https://github.com/SakanaAI/kame.git@1a69ee29dbd201d400f841459d87871154881047"

export OPENAI_API_KEY=...
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/google-cloud-credentials.json

uv run python -m kame.server_oracle \
  --hf-repo SakanaAI/kame \
  --host 0.0.0.0 \
  --port 8998 \
  --device cuda

Then open http://localhost:8998.
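
Before opening the browser, a quick check that something is listening on the port can save time. The sketch below is a generic TCP probe and assumes nothing about KAME's HTTP routes; port_is_open is a hypothetical helper, and a successful connection only shows that the port accepts connections, not that the model has finished loading.

```python
import socket


def port_is_open(host: str = "localhost", port: int = 8998, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    state = "accepting connections" if port_is_open() else "not reachable"
    print(f"localhost:8998 is {state}")
```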

kame-model is not yet published on PyPI, so the example above installs it directly from GitHub, pinned to a specific commit. For reproducible runs, keep a release tag or commit pinned rather than installing from main.

Notes:

Python >=3.10 is supported; the command above uses Python 3.12 because it is the version used for verification.
OPENAI_API_KEY is required by kame.server_oracle.
ASR is enabled by default and requires Google Cloud Speech-to-Text. Before running the server, set up a Google Cloud project for Speech-to-Text and configure Application Default Credentials with GOOGLE_APPLICATION_CREDENTIALS.
For local smoke tests without Google Speech-to-Text, pass --no-enable-asr. This skips ASR and does not exercise the full oracle-guided spoken-dialogue path.
--config-path, --moshi-weight, --mimi-weight, and --tokenizer are not needed for the public Hugging Face checkpoint in the usual case.
config.json in the Hugging Face repo resolves the model weights, Mimi checkpoint, tokenizer, and optional generation settings.
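
The environment requirements in the notes above can be checked before launching the server. This is a minimal sketch under the assumptions stated in those notes (OPENAI_API_KEY is always required; GOOGLE_APPLICATION_CREDENTIALS must point to a file unless ASR is disabled with --no-enable-asr). The preflight function is hypothetical and is not part of kame.

```python
import os
from typing import List, Mapping


def preflight(env: Mapping[str, str], asr_enabled: bool = True) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems: List[str] = []
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    if asr_enabled:
        cred_path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
        if not cred_path:
            problems.append("GOOGLE_APPLICATION_CREDENTIALS is not set")
        elif not os.path.isfile(cred_path):
            problems.append(
                f"GOOGLE_APPLICATION_CREDENTIALS does not point to a file: {cred_path}"
            )
    return problems


if __name__ == "__main__":
    for problem in preflight(os.environ):
        print("preflight:", problem)
```

Passing asr_enabled=False mirrors a --no-enable-asr smoke test, where the Google credential check is skipped.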

Local Development

pip install -e .
python -m kame.server_oracle --help

kame_finetune can then depend on this repository directly from the repo root, for example via a local editable path dependency.

Scope

This repository is intentionally narrower than the original Moshi release. The main supported workflow is:

install the Python package from the repository root
run server_oracle.py for oracle-guided interactive inference, or server.py for the generic server path
use the same Python package as the kame-model dependency from kame_finetune

License

The kame-model Python package is distributed under the MIT License. This repository is derived from the Kyutai Moshi codebase and retains the relevant upstream license files and notices. Additional inherited notices, including LICENSE.audiocraft, are kept at the project root. Model weights and datasets, when distributed separately, may be subject to different license terms.

Attribution

KAME is derived from the Kyutai Moshi repository. We retain the original license files and attribution for the inherited codebase, and extend the Python inference stack with KAME-specific functionality.

Please keep the existing license files in this repository, including LICENSE, LICENSE-MIT, and LICENSE.audiocraft.

Citation

If you use KAME in your research, please cite:

@article{kuroki2025kame,
  title={KAME: Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI},
  author={Kuroki, So and Kubo, Yotaro and Akiba, Takuya and Tang, Yujin},
  journal={arXiv preprint arXiv:2510.02327},
  year={2025}
}
