
\\ ~ 🐢⚡ Key-Aware Management Engine ⚡🐢 ~ // (API Rotation Plugin) for Agent Zero

KAME: the learning carousel that keeps your AI agent alive



❤️ Support the project

If KAME saved you from a rate-limit hell, consider a tip:

Bitcoin: 36BGYhMEVFgY8PLGMVux93pjGt92KVM6dJ

Every sat helps me keep this project alive and learning.



4P1 R0T4T10N - 4FRE3D0M


🎯 What is KAME?

KAME is what API rotation should have been.

Round-robin libraries cycle keys blindly. They keep banging on a key that just hit a 429 because they have no memory. They have no idea which key has capacity left. They retry through dead keys and call it "resilience."

KAME does the opposite of every assumption round-robin makes:

🧠 Learns from every 429

Parses the provider's own retry-delay and respects it to the second on per-minute limits. On a daily quota it knows not to trust a misleadingly short delay and rests that key for a real cooldown instead.

No guessing. No fixed backoff. Per-minute or daily, on any provider, KAME does the right thing.
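
The decision described above can be sketched as follows. This is a minimal illustration and not KAME's actual code: the function name, its parameters, and the 20-second fallback are assumptions made for the example, while the 1-hour figure is the documented default for daily_quota_cooldown_seconds.

```python
from typing import Optional

# Documented default for daily_quota_cooldown_seconds (1 hour).
DAILY_QUOTA_COOLDOWN_SECONDS = 3600.0


def cooldown_for_rate_limit(is_daily_or_account_limit: bool,
                            parsed_retry_delay: Optional[float],
                            fallback_seconds: float = 20.0) -> float:
    """Return how long (in seconds) to rest a key after a rate-limit error.

    Hypothetical helper: the name and the fallback value are assumptions.
    """
    if is_daily_or_account_limit:
        # A daily quota does not refill in a second, so a short hint is ignored.
        return DAILY_QUOTA_COOLDOWN_SECONDS
    if parsed_retry_delay is not None and parsed_retry_delay > 0:
        # Per-minute limit: respect the provider's own delay.
        return parsed_retry_delay
    # No usable hint from the provider: fall back to a fixed value.
    return fallback_seconds
```

With these assumptions, a daily 429 that carries a 1-second hint still rests the key for an hour, while a per-minute 429 with a 37-second hint rests it for 37 seconds.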

🎯 Picks the right key, every time

A 60-second sliding window tracks each key's recent activity. KAME selects the key with the most remaining capacity, not just the next one in line.

LRU tie-break ensures even spreading across the pool.
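
A 60-second sliding window of this kind can be kept as a plain list of timestamps per key. The sketch below is an assumption about the shape of the idea, not the engine's code; prune_window and recent_load are hypothetical names.

```python
from typing import List

WINDOW_SECONDS = 60.0  # the window length stated above


def prune_window(request_log: List[float], now: float) -> List[float]:
    """Keep only the timestamps that fall inside the last 60 seconds."""
    cutoff = now - WINDOW_SECONDS
    return [t for t in request_log if t > cutoff]


def recent_load(request_log: List[float], now: float) -> int:
    """Number of requests a key has served (or has pending) in the window."""
    return len(prune_window(request_log, now))
```

A key with a lower recent_load has more remaining capacity, so it is the better candidate for the next request.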

💤 Sleeps intelligently when keys cool

When the entire pool is temporarily exhausted, KAME reads the soonest recovery time and sleeps until then (capped at 60s, re-checking after) instead of burning wasted 429 requests that prolong the cooldown.

Production-proven: 45 wasted requests in v0.5.7.x → 0 in v1.0.0.

🤝 Trusts the connection

Zero artificial timeouts. If the API accepts your request without error, KAME waits patiently for it to finish, even if it's a 90,000-token compression that takes 90 seconds.

No death-loops on slow models. No "timed out" crashes during legitimate work.

🥷 Stays invisible

Every wait includes random.uniform(0.1, 1.5) seconds of jitter. No two waits are identical. Anti-bot detection systems can't fingerprint KAME, and multi-client deployments never sync-collide on the same recovery moment.

KAME doesn't look like a bot. KAME looks like a thoughtful human who took a coffee break at exactly the right time.
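
As a small illustration of the jitter rule, the helper below adds random.uniform(0.1, 1.5) seconds to any base wait. The function name and the injectable rng parameter are assumptions made so the example can be tested; only the jitter range comes from the text above.

```python
import random


def jittered_wait(base_seconds: float, rng=random) -> float:
    """Return base_seconds plus 0.1 to 1.5 seconds of random jitter.

    rng is any object with a uniform(a, b) method; it defaults to the
    random module and can be a seeded random.Random for repeatable tests.
    """
    return base_seconds + rng.uniform(0.1, 1.5)
```

Two clients that compute the same base wait will therefore usually wake at slightly different moments instead of colliding on the same instant.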

You give it a comma-separated list of API keys. It gives you an agent that never stops.


📈 Production validation (real log, May 2026)

You don't have to take my word for it. Here's a single day of intensive Agent Zero usage:

| Metric | Value |
| --- | --- |
| KAME-managed operations | 1,163 |
| Rate limit (429) events encountered | 117 |
| Rate limits resolved by rotation alone | 116 (99.1%) |
| Pool-fully-sick events requiring sleep | 1 |
| False pulses (wasted retries against sick keys) | 0 |
| KAME engine crashes | 0 |
| Pool state "healthy" during operations | ~99% |

The single sleep event tells the whole story:

  • KAME predicted wake: 18:09:00
  • KAME actual wake: 18:09:00.291
  • Off by: 291 milliseconds (the random jitter)
  • After waking: picked the recovered key in 0.08ms, request succeeded

That's what production-validated means: predictions accurate within the jitter window, zero crashes, zero wasted requests.

v1.0.1 update: surviving a daily-quota storm (May 29, 2026)

A second real-world run, on a 15-key Gemini pool, hit the exact failure mode v1.0.1 was built for: a wave of daily-quota exhaustion that eventually took the entire pool cold at once. Here is KAME's own session summary at the 100-operation mark:

[KAME] Session: 100 ok · 15 limited (min 0, daily 15, quota 0) · 1 long-sleep · 11 server · 0 timeout · 0 auth · 0 other

| Metric | Value |
| --- | --- |
| Operations completed | 100 |
| Rate limits classified as daily (correctly) | 15 / 15 (the v1.0.1 fix) |
| Rate limits mis-classified as per-minute (the v1.0.0 bug) | 0 |
| Auth / timeout / unknown errors | 0 |
| KAME engine crashes | 0 |

When all 15 keys went cold, KAME announced the outage once, then slept quietly (no API calls) and woke within seconds of the first recovery:

  • Announced once: "All keys cooling - next recovery in ~17m … retry around 06:32:23."
  • Then ETA-sleeps only: "Sleeping 17.8s … next key in ~16s" → woke and picked the recovered key in 0.16ms.

And the eternal carousel never gave up. The single hardest call rode out the whole storm and still returned a successful response:

  • One unified_call: 2154.1s wall time · 9 rotations · 18 sleeps · 1049s of local waiting · ✅ success.
  • The pool then recovered all the way back to 15/15 healthy and stayed there for the rest of the ~6-hour session.

v1.0.0 would have trusted the provider's misleading short retryDelay on those daily 429/503s and re-probed dead keys roughly once per second for hours. v1.0.1 rested each one for a real hour, slept through the total outage, and lost zero requests.


⚡ Quick start (3 steps)

  1. Copy the api_rotation_by_kame/ folder into /a0/usr/plugins/
  2. In Agent Zero → Settings → Model Provider, enter your keys separated by commas: key1, key2, key3, key4, ...
  3. Restart Agent Zero. That's it.

No config required. No tuning. No code changes anywhere. The plugin monkey-patches Agent Zero's LiteLLM layer at boot and reverts cleanly on uninstall.

Look for this banner on startup:

=======================================================
  🐢⚡ KAME v1.0.1 - ACTIVE
  ✓ Identity-Aware Health
  ✓ Eternal Carousel Rotation
  ✓ RPM-Aware Predictive Selection
  ✓ Anti-Dogpile Guard
  ✓ Anti-Thundering-Herd (Pending Counter)
  ✓ Trust the Connection (No Artificial Timeouts)
  ✓ KAME-Aware Compression Guard
  ✓ Hybrid Learning (Parsed retry-delay + ETA-driven sleep)
  ✓ Daily-Quota & Account-Limit Aware (multi-provider)
  ✓ Adaptive Backoff (provider-agnostic safety net)
  ✓ Rate Limiter Lock Fix
  ✓ Token Callback Support
  ✓ Friendly Error Reporting (real status + kind)
  Note: keys are shown as anonymized ids (e.g. 'k3f9a1'), NOT your real keys.
=======================================================

🆚 KAME vs Plain Round-Robin

| | Plain round-robin | KAME |
| --- | --- | --- |
| Selection logic | "next in line" | most remaining capacity (RPM-aware predictive) |
| Behavior on 429 | retry same key with backoff | read provider's retry-delay, sleep that exact time |
| Concurrent calls | all dogpile on key #1 | spread across keys (anti-dogpile + anti-thundering-herd) |
| Sick key recovery | guessed (often wrong) | respected to the second (parsed from response) |
| Wasted 429 requests | many | zero |
| Detectable as bot | yes (regular spin) | no (jitter on every wait) |
| Daily-quota / out-of-credit key | trusts a misleading short retry, hammers a dead key | detected → real cooldown, any provider |
| Compression flow | breaks on token limit | rotates mid-compression, finishes anyway |
| Memory of failures | none | identity-aware health (per provider:model) |
| Recovery from "all sick" pool | infinite retry, kills your quota | ETA-driven sleep, wake exactly on time |

If you're using round-robin, your keys are spending half their quota proving they're still rate-limited. With KAME, every request that hits the API actually gets answered.


🛡️ The 13 Shields

| # | Shield | What it gives you |
| --- | --- | --- |
| 1 | 🆔 Identity-Aware Health | Tracks key health per provider:model pair. Your gemini-2.5-flash pool is separate from your gemini-2.5-pro pool, so a 429 on one doesn't disable the other. |
| 2 | 🔄 Eternal Carousel | Infinite rotation. Never gives up, never crashes. Survives any combination of failures. |
| 3 | 📊 RPM-Aware Predictive Selection | 60-second sliding window per key. Picks the one with most remaining capacity. LRU tie-break for even spreading. |
| 4 | 🛡️ Anti-Dogpile Guard | At selection, the chosen key is marked busy NOW. Concurrent calls naturally pick different keys. |
| 5 | 🐎 Anti-Thundering-Herd | The pending request counts in the 60s window BEFORE it completes, so other threads route around it. |
| 6 | 💤 ETA-Driven Sleep | When all keys are sick, sleep until the soonest recovery (capped 60s, then re-check). Re-select after waking. Never call the API with a sick key. |
| 7 | 🎲 Smart Hybrid Jitter | random.uniform(0.1, 1.5) seconds on every wait. Anti-bot-detection. Prevents multi-client sync collisions. |
| 8 | 🤝 Trust the Connection | Zero artificial timeouts. Slow legitimate work runs to completion. |
| 9 | 📦 KAME-Powered Compression | History compression goes through the same eternal carousel. Multi-key rotation during summarization. |
| 10 | 📅 Daily-Quota & Account-Limit Aware | Detects daily-quota and out-of-credit (insufficient_quota) errors across providers and applies a real cooldown, instead of trusting a misleadingly short retry and hammering a dead key once per second. Configurable (daily_quota_cooldown_seconds, default 1h). |
| 11 | 📈 Adaptive Backoff | Provider-agnostic safety net: if the same key keeps hitting rate limits, its cooldown escalates (20s → 40s → 80s … up to the ceiling) and resets on the first success. Kills re-probe bursts even when the provider strips all error details. |
| 12 | 🔒 Rate Limiter Deadlock Fix | Replaces A0's asyncio.Lock with threading.Lock, eliminating an async deadlock under specific concurrency patterns. |
| 13 | 🧹 Clean Uninstall | hooks.py::uninstall() reverts every monkey-patch. Drop the folder and KAME is gone, with no leftover state. |
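
The escalation in shield 11 (20s → 40s → 80s, capped at a ceiling) reads like a doubling schedule. A minimal sketch under that assumption is shown below; the function name is hypothetical, the 20-second base is taken from the table, and the ceiling default mirrors daily_quota_cooldown_seconds.

```python
def adaptive_cooldown(consecutive_rl: int,
                      base_seconds: float = 20.0,
                      ceiling_seconds: float = 3600.0) -> float:
    """Cooldown after the Nth consecutive rate limit on one key (N >= 1).

    Assumes plain doubling per consecutive failure, capped at the ceiling.
    A success would reset consecutive_rl to 0 in the caller.
    """
    if consecutive_rl < 1:
        return 0.0
    return min(base_seconds * 2 ** (consecutive_rl - 1), ceiling_seconds)
```

Under this assumption the first four consecutive failures give 20, 40, 80 and 160 seconds, and the value stops growing once it reaches the ceiling.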

🔬 How it works

flowchart TD
    A[Agent Zero asks LiteLLM for a chat] --> B[KAME monkey-patched unified_call]
    B --> C[_get_best_key for provider:model]
    C --> D{Any healthy keys?}

    D -->|Yes| E[Pick key with most<br/>remaining capacity]
    D -->|No, all sick| F[Read soonest sick_until]

    E --> G[Mark anti-dogpile + anti-herd]
    G --> H[acompletion - real API call]
    H --> I{Success?}

    F --> J[Sleep min ETA+0.5s, 60s<br/>+ jitter 0.1-1.5s]
    J --> K[NO API calls during sleep]
    K --> C

    I -->|Yes| L[Mark healthy<br/>reset backoff<br/>Return response]
    I -->|No, rate-limit| M[Classify error +<br/>parse retry-delay]
    M --> N{Daily / account limit?}
    N -->|Yes| O[Long cooldown<br/>ignore misleading delay]
    N -->|No| P[Per-minute: trust delay<br/>+ adaptive backoff]
    O --> Q[Set sick_until]
    P --> Q
    Q --> C

    style E fill:#10b981
    style F fill:#f59e0b
    style L fill:#10b981
    style O fill:#f59e0b
    style J fill:#3b82f6

The whole engine is ~1,050 lines in a single file (kame_engine.py), monkey-patching LiteLLMChatWrapper.unified_call, Topic.summarize_messages, Bulk.summarize, and the framework's rate limiter.

Technical deep-dive (state schema + selection algorithm)

Per-key health state

Every API key carries this dictionary, scoped under {provider}:{model}:

{
    "sick_until":     float,    # epoch time when key becomes available again
    "last_used":      float,    # for LRU tie-break + anti-dogpile
    "request_log":    [float],  # 60s sliding window of request timestamps
    "last_sick_at":   float,    # for compression-aware "fresh recovery" filter
    "consecutive_rl": int,      # consecutive rate-limit fails -> adaptive backoff (resets on success)
}

Selection algorithm

best_key = min(healthy, key=lambda k: (
    len(pool[k]["request_log"]),  # primary: most remaining 60s-window capacity
    pool[k]["last_used"],         # secondary: LRU for even spreading
))
# Then: mark used NOW (anti-dogpile)
#       count pending NOW in request_log (anti-thundering-herd)

In a 15-key pool firing 100 requests in 60 seconds, KAME spreads them roughly evenly (~6-7 per key) without you doing anything.
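
To make the selection rule concrete, here is a self-contained sketch that can be run on its own. It is an illustration under assumptions, not the engine's code: new_key_state and pick_key are hypothetical names, and the pool is a plain dict of the per-key state shown above.

```python
import time
from typing import Dict, Optional


def new_key_state() -> dict:
    """Fresh per-key state using the field names from the schema above."""
    return {"sick_until": 0.0, "last_used": 0.0, "request_log": [],
            "last_sick_at": 0.0, "consecutive_rl": 0}


def pick_key(pool: Dict[str, dict], now: Optional[float] = None) -> Optional[str]:
    """Pick the healthy key with the fewest requests in the last 60 seconds."""
    now = time.time() if now is None else now
    # Drop timestamps that have slid out of the 60-second window.
    for state in pool.values():
        state["request_log"] = [t for t in state["request_log"] if t > now - 60.0]
    healthy = [k for k, s in pool.items() if s["sick_until"] <= now]
    if not healthy:
        return None  # the caller would sleep until the soonest recovery
    best = min(healthy, key=lambda k: (len(pool[k]["request_log"]),
                                       pool[k]["last_used"]))
    pool[best]["last_used"] = now           # mark used NOW (anti-dogpile)
    pool[best]["request_log"].append(now)   # count pending NOW (anti-herd)
    return best
```

In this sketch, 100 calls spread over 50 seconds on a 15-key pool land 6 or 7 times on each key, which is the even spreading described above.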

ETA-driven sleep formula

# inside the rotation loop, reached only when every key in pool is sick
now = time.time()
soonest_eta = min(pool[k]["sick_until"] - now for k in pool)
if soonest_eta > 3.0:
    wait = min(soonest_eta + 0.5, 60.0) + random.uniform(0.1, 1.5)
else:
    wait = 2.0 + random.uniform(0.1, 1.5)  # fallback for very short ETAs
await asyncio.sleep(wait)
continue   # never fall through with a sick key
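
The same formula can be isolated as a pure function, which makes the three cases (normal ETA, capped long ETA, very short ETA) easy to check. The function name and the injectable rng are assumptions made for this example; the constants are the ones in the formula above.

```python
import random
from typing import Iterable


def eta_sleep_seconds(sick_until_times: Iterable[float], now: float,
                      rng=random) -> float:
    """Seconds to sleep when every key is sick, following the formula above."""
    soonest_eta = min(t - now for t in sick_until_times)
    jitter = rng.uniform(0.1, 1.5)
    if soonest_eta > 3.0:
        # Sleep just past the soonest recovery, never longer than 60s per cycle.
        return min(soonest_eta + 0.5, 60.0) + jitter
    # Very short (or already elapsed) ETA: fixed 2s fallback.
    return 2.0 + jitter
```

For a key that recovers in 7 seconds the result falls between 7.6 and 9.0 seconds, and for a key that recovers in an hour it falls between 60.1 and 61.5 seconds, after which the pool is re-checked.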

📊 Logging (silent / normal / verbose)

KAME explains itself in plain language. One setting, kame_log_level, controls how much it writes to the Docker log. The rotation algorithm is identical at every level; this only changes what you see. Change it in Settings → Plugins → KAME → Log level; it takes effect on the next monologue start (no restart).

normal (default): success is never silent

One compact line per successful call, plus rotations, limit hits, sleeps and errors. Keys appear as anonymized fingerprints (configurable via key_log_style). The pool-health count shows only when the pool is degraded, so a healthy pool stays quiet:

[KAME] Chat|gemini-2.5-flash ✅ k0a770

When a key hits a limit, KAME tells you the real reason (v1.0.1, any provider) and rotates. The success line then says, in plain words, how many rotations it took (no more cryptic "3 attempts"):

[KAME] Chat|gemini-2.5-flash k0a770 ⏳ 429 per-minute → wait 37s · next key...
[KAME] Chat|gemini-2.5-flash ✅ k1b8c2 · 1 rotation · pool 14/15 healthy

A daily-quota or out-of-credit key is rested for a real cooldown instead of being hammered once per second:

[KAME] Chat|gemini-2.5-flash k1b8c2 ⏳ 429 daily-quota → cooling 1h · next key...
[KAME] Chat|gemini-2.5-flash ✅ k2c9d4 · 1 rotation · pool 13/15 healthy

verbose: full diagnostics

Everything in normal, plus a Calling... heartbeat, the picked-key line, per-call wall time, the full pool snapshot on every success, a cascade breakdown, and a periodic session summary. Best while tuning the key pool or when you suspect KAME is the bottleneck and want to prove it isn't:

[KAME] Chat|gemini-2.5-flash ➡ Calling...
[KAME] Chat|gemini-2.5-flash ➡ k0a770 picked in 0.08ms
[KAME] Chat|gemini-2.5-flash ✅ k0a770 in 2.4s | pool 15/15 healthy

After a cascade across several keys (with a sleep in the middle):

[KAME] Chat|gemini-2.5-flash ✅ k2c9d4 in 9.4s | pool 13/15 healthy | 5 rotations, 1 sleep

silent: the documented exception

KAME stays out of the log entirely: no banner, no per-call line, no rotation or sleep notices. Only a hard, unrecoverable error still surfaces. Internal stats and key health are still tracked; only the log output is suppressed. Use it when you want the plugin to be invisible in the Docker log for fully unattended runs.

Sleep is always visible (except in silent)

When the whole pool is cooling, KAME never goes dark without telling you. Near a recovery it logs each cycle (throttled); on a long outage (e.g. a full daily quota) it announces once, then waits quietly instead of spamming the log:

[KAME] Chat|gemini-2.5-flash 💤 All keys cooling. Sleeping 7.7s (no API calls) - next key in ~7s (wake 18:09:00)
[KAME] Chat|gemini-2.5-flash 💤 All keys cooling - next recovery in ~1h. Waiting quietly (no API calls), retry around 19:05:00.

The sleep notice means KAME is intentionally waiting, not stuck, so you never mistake a healthy cooldown for a hang.


⚙️ Settings

| Setting | Default | Purpose |
| --- | --- | --- |
| kame_log_level | normal | How much KAME writes to the log: silent (nothing but hard errors), normal (one line per success + events; pool count only when degraded), or verbose (full diagnostics). A legacy verbose_trace: true still maps to verbose. |
| daily_quota_cooldown_seconds | 3600 | How long to rest a key after a daily-quota / out-of-credit error (any provider). Also the adaptive-backoff ceiling. Clamped 1–86400. |
| key_log_style | fingerprint | How keys appear in logs: fingerprint (anonymized id, never leaks the secret), prefix8 (first 8 chars), or full (debug only). |
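
The defaults and the clamp in the table can be pictured as a small normalization step. This is a sketch under assumptions, not the plugin's loader: normalize_settings is a hypothetical name, and treating the legacy verbose_trace flag as a fallback used only when kame_log_level is absent is a guess about precedence.

```python
LOG_LEVELS = ("silent", "normal", "verbose")
KEY_LOG_STYLES = ("fingerprint", "prefix8", "full")


def normalize_settings(raw: dict) -> dict:
    """Apply the documented defaults and the 1-86400 clamp to raw settings."""
    if "kame_log_level" in raw:
        level = raw["kame_log_level"]
    elif raw.get("verbose_trace") is True:
        level = "verbose"  # legacy flag (precedence here is an assumption)
    else:
        level = "normal"
    if level not in LOG_LEVELS:
        level = "normal"

    try:
        cooldown = int(raw.get("daily_quota_cooldown_seconds", 3600))
    except (TypeError, ValueError):
        cooldown = 3600
    cooldown = max(1, min(86400, cooldown))  # clamp to 1..86400

    style = raw.get("key_log_style", "fingerprint")
    if style not in KEY_LOG_STYLES:
        style = "fingerprint"

    return {"kame_log_level": level,
            "daily_quota_cooldown_seconds": cooldown,
            "key_log_style": style}
```

An empty settings dict yields normal, 3600 and fingerprint, which matches the Default column.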

Everything else is opinionated and validated in production: the algorithm, sleep timing, jitter range, 60s RPM window, and quarantine logic are all tuned and tested. If you really want to tweak, the code in kame_engine.py is well-commented.


🗑️ Uninstall

rm -rf /a0/usr/plugins/api_rotation_by_kame/
# Restart Agent Zero

KAME's hooks.py::uninstall() runs BEFORE deletion and reverts every monkey-patch. No leftover state.
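
Reverting a monkey-patch cleanly usually comes down to remembering the original attribute before replacing it. The sketch below shows that general pattern only; PatchRegistry and Demo are hypothetical names for illustration, and nothing here is taken from KAME's hooks.py.

```python
class PatchRegistry:
    """Remember original attributes so every patch can be undone later."""

    def __init__(self):
        self._originals = []

    def patch(self, owner, name, replacement):
        # Save the original before overwriting it.
        self._originals.append((owner, name, getattr(owner, name)))
        setattr(owner, name, replacement)

    def revert_all(self):
        # Undo in reverse order so stacked patches unwind correctly.
        while self._originals:
            owner, name, original = self._originals.pop()
            setattr(owner, name, original)


class Demo:
    def greet(self):
        return "original"


registry = PatchRegistry()
registry.patch(Demo, "greet", lambda self: "patched")
patched_result = Demo().greet()
registry.revert_all()
restored_result = Demo().greet()
```

After revert_all() the class behaves exactly as it did before patching, which is the property an uninstall hook needs.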


❓ FAQ

Do I need to restart Agent Zero after installing KAME?

No. A0 hot-reloads plugins: dropping KAME into your plugins folder (or toggling it on) clears the plugin/extension caches, and KAME activates on the next agent turn, with no container restart, no framework restart, and no "reload the page" prompt (KAME ships no extensions/webui). The only prerequisite is having multiple API keys configured in A0's normal model settings. KAME never stores keys itself; it rotates the ones A0 already has.

I only have one API key. Does KAME help?

With one key, KAME is roughly equivalent to A0's native rate limiter. The eternal-carousel magic needs multiple keys; 5 or more are recommended for good RPM spreading.

KAME picked the same key twice in a row. Bug?

Likely not. If you have only 2-3 keys and one just succeeded with fresh RPM capacity, RPM-aware selection may legitimately re-pick it. With more keys (10+), this becomes very rare due to anti-dogpile.

I'm seeing "429 daily-quota → cooling 1h". Is this a bug?

No, that's the v1.0.1 daily-quota shield working. KAME detected a daily-quota or out-of-credit error and is resting that key for a real cooldown (default 1h, set by daily_quota_cooldown_seconds) instead of trusting a misleadingly short retry and hammering a dead key once per second. Your other keys keep working; the rested key is re-tried after the cooldown.

Compression takes a long time. Is KAME slowing it down?

No. KAME's "Trust the Connection" philosophy means zero artificial timeouts. A 90,000-token compression that legitimately takes 90 seconds takes 90 seconds. Without KAME, A0's native flow can crash with "timed out"; KAME lets it finish.

Does verbose / debug logging cost extra API calls?

Zero. Every log level is pure local instrumentation: it only changes how many lines KAME prints, never how it calls the API. Even silent still tracks stats and key health internally; it just doesn't write them out.

Can I use KAME with Anthropic / OpenAI / others, not just Gemini?

Yes. KAME is provider-agnostic. It works wherever Agent Zero's LiteLLM layer works. The retry-delay parser handles Google, OpenAI, Anthropic, Groq, and generic HTTP Retry-After headers, including compound durations like "6m 11.52s". v1.0.1's daily-quota detection and adaptive backoff are also multi-provider, so an exhausted daily key is handled correctly no matter who serves it.
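
To show what parsing a compound duration such as "6m 11.52s" can look like, here is a minimal sketch. It is not KAME's parser: parse_retry_delay is a hypothetical name, the accepted units (h, m, s, ms) are an assumption, and HTTP-date values of Retry-After are deliberately not handled.

```python
import re
from typing import Optional

_UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
_NUMBER = r"\d+(?:\.\d+)?"
# "ms" is listed before "m" so that "500ms" is not read as 500 minutes.
_PART = re.compile(rf"({_NUMBER})\s*(ms|h|m|s)")
_WHOLE = re.compile(rf"(?:\s*{_NUMBER}\s*(?:ms|h|m|s))+\s*")


def parse_retry_delay(text: str) -> Optional[float]:
    """Convert '37s', '6m 11.52s' or a bare '120' into seconds, else None."""
    text = text.strip().lower()
    if re.fullmatch(_NUMBER, text):
        return float(text)  # a bare number is taken as seconds
    if not _WHOLE.fullmatch(text):
        return None  # anything that is not purely number+unit parts
    return sum(float(value) * _UNIT_SECONDS[unit]
               for value, unit in _PART.findall(text))
```

Returning None for anything unrecognized leaves room for a safety net such as the adaptive backoff to take over when the provider gives no usable hint.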

What if all my keys die permanently?

KAME keeps cycling. Auth errors (401) and daily/account limits trigger a long cooldown (default 1h) for that key. If literally every key is dead, your agent sleeps (up to 60s per cycle), wakes, re-checks, and sleeps again until you fix it, announcing a long outage just once instead of spamming the log. No infinite spin against a wall, no wasted API calls.


🔧 Compatibility

  • Agent Zero: v1.14+ (verified through v1.18)
  • Python: 3.10+
  • Providers: any LiteLLM-supported provider (Google, OpenAI, Anthropic, Mistral, Groq, DeepSeek, xAI, Together, ...)
  • No new dependencies: uses stdlib only on top of what A0 already ships

🤝 Contributing

PRs welcome. The engine is intentionally small (~1,050 LOC, single file). When proposing changes:

  1. Keep the engine algorithm stable: selection, anti-dogpile, and ETA-driven sleep are battle-tested.
  2. Add features behind opt-in settings when possible (see kame_log_level / key_log_style as a pattern).
  3. Log production behavior in test_logs/ with version-named files so changes can be audited.

Bugs and feature requests via GitHub issues.


📜 License

MIT License. See LICENSE.

Copyright (c) 2026 KAME (https://github.com/Kame696)

You can use, modify, distribute, and even sell KAME; the only requirement is to keep the copyright notice and license text with any copy. The author retains copyright in KAME as the original work; the license simply grants permissive usage to others.


🪪 Evolution

KAME has been in development since early 2026, learning from real production logs at every step:

| Version | Focus | Key insight |
| --- | --- | --- |
| v1.0.1 | Quota awareness + reliability fixes + log overhaul | Google sends a misleading retryDelay: 1s on a daily 429; trusting it re-probed a dead key once per second. Fixed with strict daily/account detection + a provider-agnostic adaptive backoff, plus honest per-call error reporting. Three reliability fixes: a mid-run chat message is now received without pressing nudge agent (KAME stopped swallowing A0's InterventionException); a stream that fails after emitting content no longer re-generates from scratch on another key (got_any_chunk guard, mirroring vanilla A0); and a sustained 503 server-busy outage on a big pool now escalates gently instead of spinning forever. The logs were reworked into a clear silent/normal/verbose tri-state with self-explanatory wording (no more cryptic "N attempts"). Engine selection path unchanged. |
| v1.0.0 | First stable release | Engine validated: 1,163 ops / 117 rate limits / 0 crashes. ETA-driven sleep proven in production. |
| v0.5.8.0 | The ETA Fix | Real log revealed: pulsing every 2s against sick keys burned ~45 wasted 429s in 26s. Fixed by sleeping exactly until next recovery. |
| v0.5.7.4 | Verbose Trace | Added opt-in observability: key short id, selection latency, pool snapshot, cascade summary, compression-aware filter. |
| v0.5.7.3 | The Trust Restored | Rolled back a misattributed "bug fix" that was actually the production-validated dispersion brake. |
| v0.5.7 | Packaging Cleanup | A0 v1.15 schema compliance, clean uninstall hooks. |
| v0.5.6 | The Trust | "Trust the Connection" philosophy formalized: zero artificial timeouts. |
| v0.5.0 - v0.5.5 | The Commander → The Refined | Identity-aware health, anti-dogpile, anti-thundering-herd, smart quarantine. |
| v0.4.x | The Seed → The Strategist | Foundational rotation, eternal carousel, basic RPM-awareness. |

The lesson across versions: the only way to build something this reliable is to run it in production and read the logs honestly. Every major improvement in KAME came from a real log showing real behavior, not from theory.


🎤 Credits & Star CTA

Built by KAME. Engine refinement guided by real production log analysis, including the v0.5.7.4 log that revealed the wasted-pulse bug fixed in v0.5.8.0. Special thanks to every 429 that taught KAME something new.

⭐ Star this repo

If KAME made your agent less frustrating, drop a star ⭐. It costs you nothing and helps others find this.

Star Kame696/kame-api-engine on GitHub →


🐢⚡ KAME v1.0.1: because round-robin was never enough
