Releases: VladoIvankovic/Codeep

v2.13.0

19 Jun 08:39

Three new providers — Kimi (Moonshot), Grok (xAI), and Qwen (Alibaba) — covering the major coding models. Kimi and Qwen include their flat-fee coding-plan subscriptions alongside pay-per-use; Grok adds graded /thinking effort.

Added

  • Kimi (Moonshot AI). kimi drives the Kimi Code subscription
    (api.kimi.com/coding, model alias kimi-for-coding); kimi-api is
    pay-per-use (api.moonshot.ai, default kimi-k2.7-code); kimi-cn for
    mainland China. Keys: KIMI_CODE_API_KEY / MOONSHOT_API_KEY.
  • Qwen (Alibaba Model Studio). qwen drives the Coding Plan
    subscription (coding-intl.dashscope…, sk-sp- key); qwen-api is
    pay-per-use (DashScope, default qwen3-coder-plus); plus qwen-cn /
    qwen-cn-api and a free ModelScope tier (modelscope). Keys:
    BAILIAN_CODING_PLAN_API_KEY / DASHSCOPE_API_KEY / MODELSCOPE_API_KEY.
  • Grok (xAI). grok — pay-per-use (api.x.ai), default grok-build-0.1
    plus grok-4.3 and the fast/reasoning variants. Key: XAI_API_KEY.

Changed

  • /thinking now covers Grok (reasoning_effort — low/medium/high). Kimi
    and the Qwen coder models have no graded knob, so they stay out of the picker.
  • Qwen tool turns are sent non-streamed. DashScope rejects tools with
    stream:true, so agent turns that carry tools buffer the reply (handled
    transparently); other providers keep streaming.
  • Kimi K2.x code models fix temperature internally — Codeep withholds the
    sampling params so they don't 400.
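
The two request-shaping rules above (non-streamed Qwen tool turns, withheld sampling params on Kimi K2.x code models) can be sketched roughly as follows. This is an illustration only, not Codeep's actual code: build_request, the provider set, and the "kimi-k2" prefix check are hypothetical, and whether the free ModelScope tier follows the same streaming rule is not stated in the notes, so it is left out.

```python
from typing import Any, Optional

# Hypothetical ids for illustration; only the provider names come from the notes.
NON_STREAMING_TOOL_PROVIDERS = {"qwen", "qwen-api", "qwen-cn", "qwen-cn-api"}
FIXED_SAMPLING_MODEL_PREFIXES = ("kimi-k2",)  # assumed prefix for "Kimi K2.x" models


def build_request(provider: str, model: str, messages: list,
                  tools: Optional[list] = None,
                  temperature: Optional[float] = 0.7) -> dict:
    """Shape a chat request body according to the two rules described above."""
    body: dict = {"model": model, "messages": messages, "stream": True}

    if tools:
        body["tools"] = tools
        # Tool-carrying turns on Qwen providers are sent non-streamed.
        if provider in NON_STREAMING_TOOL_PROVIDERS:
            body["stream"] = False

    # Withhold temperature for models that fix it internally.
    fixed_sampling = model.startswith(FIXED_SAMPLING_MODEL_PREFIXES)
    if temperature is not None and not fixed_sampling:
        body["temperature"] = temperature

    return body
```

Turns without tools keep streaming on every provider, which matches the "other providers keep streaming" wording only in spirit; the real buffering logic is not shown in the notes.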

v2.12.0

16 Jun 08:34

New /thinking (alias /effort) reasoning-effort control — auto · low · medium · high · max, shown beside the model in the status bar and clamped per provider+model so it never sends a value the API rejects. Plus a Codeep agent identity and iOS-testing MCP servers.

Added

  • /thinking (alias /effort) — thinking / reasoning-effort tiers. A single
    control with five tiers (auto · low · medium · high · max) for how hard the
    model reasons. auto (default) sends nothing — each model's own default. The
    other tiers are clamped to the nearest level the active provider+model
    actually accepts, so an unsupported value is never sent: Anthropic Opus/Sonnet →
    output_config.effort, OpenAI GPT‑5.x → reasoning_effort (Max→xhigh),
    Google Gemini 3 → low/high, DeepSeek V4 & Z.AI GLM‑5.2 → high/max,
    OpenRouter → unified reasoning.effort. The active tier shows next to the
    model in the status bar; models without a graded knob (Haiku, GLM‑Turbo,
    Ollama, custom) hide it.
  • About‑Codeep persona. The agent system prompt now states what Codeep is and
    points you at the right slash‑command, backed by a curated command index.
  • MCP marketplace: iOS‑testing servers. Added iOS Simulator and Mobile
    (iOS + Android) servers for device/UI automation.
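
The clamping described for /thinking can be pictured with a small sketch. It assumes the tiers sit on a simple ladder and that "nearest" means closest position on it, with ties going to the lower tier; the tie rule and the clamp_tier helper are assumptions, not Codeep's implementation. The supported sets in the usage note (low/high for Gemini 3, high/max for DeepSeek V4) are taken from the entry above.

```python
from typing import Optional

TIERS = ["low", "medium", "high", "max"]  # "auto" is handled separately


def clamp_tier(requested: str, supported: list) -> Optional[str]:
    """Return the supported tier nearest to `requested`, or None to send nothing."""
    # "auto" sends nothing; models without a graded knob have no supported tiers.
    if requested == "auto" or not supported:
        return None
    want = TIERS.index(requested)
    # Nearest by ladder position; a tie resolves to the lower tier (an assumption).
    return min(supported, key=lambda t: (abs(TIERS.index(t) - want), TIERS.index(t)))
```

With these rules, clamp_tier("max", ["low", "high"]) yields "high" and clamp_tier("low", ["high", "max"]) yields "high", so a value outside the supported set is never produced.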

Changed

  • MCP browser server is now Playwright (supersedes Puppeteer) — the de‑facto
    browser‑automation MCP.

v2.11.2

15 Jun 08:51

Trimmed the model pickers (Claude Fable 5 is de-listed — unavailable under the US export ban — and a few older variants drop off), and editor clients (VS Code, Zed) now see API retry/backoff instead of an endless "Thinking…" spinner.

Changed

  • Model picker cleanup across every provider. Claude Fable 5 is removed
    from the Anthropic picker (unavailable under the US export ban; Opus 4.8 stays
    the default). Z.AI drops glm-5.1 and glm-5 (keeps glm-5.2 + glm-5-turbo);
    OpenAI drops gpt-5.4-nano (keeps 5.5 / 5.4 / 5.4-mini). DeepSeek, Google, and
    MiniMax are unchanged. All ids remain valid if set by hand — they're just no
    longer offered. Context/cost tables updated to match.

Fixed

  • ACP retry visibility. When a request hit a transient API error and the
    agent retried with backoff, the ACP path dropped the notice (only the bare
    iteration counter was meant to be suppressed, but the retry message went with it), so
    editor clients showed an indefinite "Thinking…" while the CLI was actually
    retrying. Retry/backoff notices ("API 429 … retrying in 10s (1/3)") and
    context warnings (⚠) are now forwarded as agent thoughts; the plain
    iteration counter stays internal.
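
As a rough picture of the filter described above: retry/backoff notices and ⚠ warnings are forwarded, the plain iteration counter is not. The message shapes matched here are guesses based on the one quoted example, and should_forward is a hypothetical helper.

```python
import re

# Assumed notice formats; only the retry example is quoted in the release note.
RETRY_RE = re.compile(r"retrying in \d+s \(\d+/\d+\)")
ITERATION_RE = re.compile(r"^Iteration \d+$")  # hypothetical counter format


def should_forward(notice: str) -> bool:
    """Decide whether a status notice is surfaced to the editor as an agent thought."""
    if ITERATION_RE.match(notice):
        return False  # the plain iteration counter stays internal
    if RETRY_RE.search(notice) or notice.startswith("⚠"):
        return True   # retry/backoff notices and context warnings
    return False      # anything else is not forwarded in this sketch
```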

v2.11.1

13 Jun 22:07

Hotfix: the Z.AI default was glm-5.2[1m], but the API rejects that id ("Unknown Model", code 1211) — so a fresh Z.AI session failed on its first request. The default is now plain glm-5.2 (which works), and the non-working [1m] variant is removed from the picker.

Fixed

  • Z.AI default model glm-5.2[1m] returned "Unknown Model". The 1M-context
    [1m] suffix from the devpack docs isn't accepted by the Z.AI chat API
    endpoints Codeep uses, so it 400'd on every request. The default (and
    cold-start default) is now glm-5.2, and glm-5.2[1m] is dropped from all
    four Z.AI providers' model lists. If your config still points at
    glm-5.2[1m], switch with /model glm-5.2.

v2.11.0

13 Jun 21:50

New default model GLM-5.2 (1M-context glm-5.2[1m]) across every Z.AI provider, plus TUI polish: ↑ recalls history, diffs render green/red, full / autocomplete, and /settings values now stick.

Added

  • GLM-5.2 — the new default Z.AI model. Added across all four Z.AI
    providers (international + China, subscription + pay-per-use): glm-5.2[1m]
    (1M context — the [1m] suffix selects the million-token window) is now the
    default, with plain glm-5.2 also offered; GLM-5.1, GLM-5 Turbo, and GLM-5
    stay available. Context-window and cost tables include both new ids. (GLM-5.2
    per-token pricing isn't published yet, so /cost mirrors GLM-5.1 for now —
    and on the GLM Coding Plan billing is a flat subscription anyway.) Editor
    clients pick this up automatically over ACP.

Fixed

  • /settings values stick now. A block of startup "migrations" ran on
    every launch and silently forced user-chosen values back up — maxTokens
    below 32768, agentMaxDuration below 480 min, API timeout and rate limits —
    so the affected settings were effectively lies. They now run exactly once
    per config (recorded via migrationVersion); after that, what you set is
    what you get.
  • ↑ recalls prompt history on an empty input. The status bar has always
    advertised "↑↓ history", but a scroll handler intercepted the arrows first,
    so history recall was unreachable. Arrows now do history (like every shell);
    scrolling lives on PgUp/PgDn and the mouse wheel.
  • New messages no longer yank you to the bottom. While you're scrolled up
    reading, incoming messages (every agent action, mid-run) used to reset the
    view to the bottom. The view now stays put and the status bar shows a
    "↓ N new · PgDn" badge until you return.

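The run-once behavior described in the /settings fix can be sketched like this. Only the migrationVersion field and the two floors (32768 tokens, 480 minutes) come from the note; the version number, the FLOORS table, and apply_migrations are hypothetical, and the timeout and rate-limit migrations are omitted because their values are not given.

```python
CURRENT_MIGRATION_VERSION = 1  # hypothetical value
FLOORS = {"maxTokens": 32768, "agentMaxDuration": 480}


def apply_migrations(config: dict) -> dict:
    """Raise legacy low values once, then leave user choices alone."""
    if config.get("migrationVersion", 0) >= CURRENT_MIGRATION_VERSION:
        return config  # already migrated: what the user set is what they get

    migrated = dict(config)
    for key, floor in FLOORS.items():
        if key in migrated and migrated[key] < floor:
            migrated[key] = floor
    # Record that the migrations ran so later launches skip them.
    migrated["migrationVersion"] = CURRENT_MIGRATION_VERSION
    return migrated
```

A value lowered after the first run survives the next launch, because the recorded version short-circuits the floor check.
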
Changed

  • Diff blocks render as diffs. ```diff fences — which the agent emits on
    every edit confirmation — now highlight +added lines green, -removed red,
    and @@ hunks cyan. Previously they fell through to JS keyword colors.
  • Every command in / autocomplete has a description. 48 of 123 rows were
    blank (the whole scaffold/git/devops family — /component, /pr, /docker, …)
    and 9 bare single-letter aliases cluttered the list. The dropdown is now
    derived from a single command registry, so a command can't ship without a
    description again; the single-letter shortcuts (/c, /t, …) still work,
    they're just not listed.
  • ~800 lines of dead UI code removed (an unused parallel chat renderer and
    two unreachable fullscreen screens) — no behavior change, but edits can no
    longer land in the wrong renderer by mistake.
  • ACP session/new now returns the prior transcript on resume. When an
    editor client reconnects with fresh: false (e.g. a VS Code window reload),
    the response carries the workspace session's history (user/assistant only,
    mirroring session/load) so the client can repaint the chat instead of
    showing blank while the agent still holds the context. Empty on a fresh
    session; older clients ignore the extra field. Powers Codeep VS Code 2.6.0's
    reload restore.
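
The diff coloring mentioned in the first item of this list amounts to a prefix check per line. A minimal sketch using standard ANSI color codes follows; the function names are hypothetical, and treating "+++"/"---" file headers like ordinary added/removed lines is a simplification the note does not address.

```python
GREEN, RED, CYAN, RESET = "\x1b[32m", "\x1b[31m", "\x1b[36m", "\x1b[0m"


def colorize_diff_line(line: str) -> str:
    """Wrap one line of a diff block in an ANSI color based on its prefix."""
    if line.startswith("@@"):
        color = CYAN       # hunk header
    elif line.startswith("+"):
        color = GREEN      # added line
    elif line.startswith("-"):
        color = RED        # removed line
    else:
        return line        # context lines are left uncolored
    return f"{color}{line}{RESET}"


def colorize_diff(block: str) -> str:
    """Colorize every line of a diff block."""
    return "\n".join(colorize_diff_line(line) for line in block.split("\n"))
```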

v2.10.0

11 Jun 09:10

/tasks add now matches the dashboard: tag a task as a bug or feature and give it a description inline (--bug / --feature / --desc), and the list tags each row with its project when global.

Added

  • Task types in /tasks add. Append --bug or --feature (or --task,
    the default) to file the task under the right type on the codeep.dev
    dashboard — e.g. /tasks add login button misaligned --bug. The flag can sit
    anywhere in the arguments and is stripped from the title; the dashboard and
    the macOS app already render the type with its own icon and color, so this
    brings all three surfaces to parity (the dashboard and macOS both let you pick
    a type; the CLI previously hardcoded task).
  • Task descriptions in /tasks add. --desc (or --description) captures
    the following words — up to the next flag — as the task's description, e.g.
    /tasks add Fix login --bug --desc NPE when the email is empty. It's the same
    field the dashboard and macOS app set; the /tasks list already prints it and
    it's injected into the agent's task-context prompt, so a CLI-set description
    immediately enriches what the agent sees. Omitted from the request when absent.
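
The flag handling for /tasks add described above can be approximated as below. parse_task_add and the returned field names are hypothetical; the flag names, the "task" default, the "up to the next flag" rule, and the omission of an absent description follow the notes. Words starting with "--" that are not known flags are treated as ordinary text here, which is an assumption.

```python
TYPE_FLAGS = {"--bug": "bug", "--feature": "feature", "--task": "task"}
DESC_FLAGS = {"--desc", "--description"}


def parse_task_add(args: str) -> dict:
    """Split the arguments of `/tasks add` into title, type, and description."""
    task_type = "task"  # the default type
    title_words = []
    desc_words = []
    in_description = False

    for word in args.split():
        if word in TYPE_FLAGS:
            task_type = TYPE_FLAGS[word]
            in_description = False  # a flag ends the description
        elif word in DESC_FLAGS:
            in_description = True
        elif in_description:
            desc_words.append(word)
        else:
            title_words.append(word)

    task = {"title": " ".join(title_words), "type": task_type}
    if desc_words:  # omitted from the request when absent
        task["description"] = " ".join(desc_words)
    return task
```

Applied to the example from the notes, "Fix login --bug --desc NPE when the email is empty" parses to the title "Fix login", the type "bug", and the description "NPE when the email is empty".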

Changed

  • /tasks list tags each row with its project when listed globally. Running
    /tasks outside a project lists pending tasks across all projects; each row
    now shows its project name (matching the macOS and dashboard task rows) so a
    mixed list is legible. Inside a project the header already names it, so rows
    stay uncluttered.
  • /tasks autocomplete description now reflects the full command — it
    covered only "show pending tasks" and hid the add/done/delete
    subcommands and the type flags from / autocomplete.

Fixed

  • /stats now shows the prompt-caching summary, and a dead duplicate cost
    case is gone. The session-cost view had two switch branches sharing a
    case 'cost': /cost always rendered the full formatCostReport() (the
    cross-surface report the editor clients use, with the prompt-caching section),
    while the second branch — the detailed /stats view — was unreachable for
    /cost yet was the only one missing that caching section. /stats now
    reports cache reads/writes and estimated savings too (parity with /cost and
    the 2.0.2 caching work), and the dead cost label was removed so the dispatch
    is honest. What /cost displays is unchanged.
  • /keysync now appears in / autocomplete. The command shipped in 2.8.0
    with a description and an ACP entry, but was missing from the TUI command
    list, so terminal users never saw it offered. (It always worked when typed.)

v2.9.0

09 Jun 22:03

Claude Fable 5 — Anthropic's most powerful model, a new tier above Opus — is now in the model picker ($10/$50 per MTok, 1M context). Opus 4.7 and 4.6 leave the picker (Opus 4.8 stays the default). Plus a real compatibility fix: temperature is no longer sent to models that reject it (Fable 5 / Opus 4.7+), which previously surfaced as an opaque 400.

Added

  • Claude Fable 5 (claude-fable-5) in the Anthropic provider — the most
    powerful Claude model, a new tier above Opus. $10 input / $50 output per
    MTok, 1M context window. Pick it with /model claude-fable-5.

Changed

  • Opus 4.7 and 4.6 removed from the model picker now that Opus 4.8 and
    Fable 5 cover both tiers. The ids remain valid — if your config still points
    at one, it keeps working; it just isn't offered for new selection.

Fixed

  • temperature is no longer sent to Anthropic models that reject it.
    Fable 5 and Opus 4.7+ return HTTP 400 when the request includes
    temperature — and that 400 was previously masked by the tools-fallback
    retry, surfacing as a generic "API error". A model-aware guard
    (modelRejectsSamplingParams) now omits the parameter on those models
    across all three Anthropic request paths (agent, fallback, plain chat);
    omission means the API default, so behavior on other models is unchanged.
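
A model-aware guard of this kind might look like the sketch below. The Python function name mirrors modelRejectsSamplingParams from the note, but the body is a guess: the Opus id format ("claude-opus-4-8") and the version regex are assumptions, and only claude-fable-5 is an id that the notes actually spell out.

```python
import re
from typing import Optional

# Assumed id shape "claude-opus-<major>-<minor>"; not confirmed by the notes.
_OPUS_VERSION = re.compile(r"claude-opus-(\d+)[-.](\d{1,2})(?!\d)")


def model_rejects_sampling_params(model: str) -> bool:
    """True for models that return HTTP 400 when temperature is sent."""
    if model.startswith("claude-fable-5"):
        return True
    match = _OPUS_VERSION.match(model)
    if match is None:
        return False
    return (int(match.group(1)), int(match.group(2))) >= (4, 7)


def build_body(model: str, messages: list, temperature: Optional[float] = None) -> dict:
    """Build a request body, omitting temperature where the model rejects it."""
    body = {"model": model, "messages": messages}
    if temperature is not None and not model_rejects_sampling_params(model):
        body["temperature"] = temperature
    return body
```

Omitting the key, rather than sending a neutral value, leaves the API default in place, which is why other models behave as before.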

v2.8.0

09 Jun 19:44

API keys are now keychain-first and stay local by default — syncing them to codeep.dev is an explicit opt-in (/keysync on), and codeep account purge-keys wipes any keys already on the server.

Added

  • /keysync on|off|status — opt in (or out) of syncing API keys to
    codeep.dev. OFF by default: your keys live only in the OS keychain unless
    you enable this. When on, codeep account push/sync upload/download keys;
    the command warns that synced keys are stored server-readable. Also available
    in /settings, and forced off by the CODEEP_NO_KEY_SYNC env var (org policy).
  • codeep account purge-keys — delete every API key stored on codeep.dev in
    one shot (cloud-only; your local OS keychain is untouched). A clean exit if you
    synced keys before and want them off the server.

Changed

  • codeep account push / account sync no longer move API keys unless cloud
    key sync is enabled (/keysync on). They still push/pull personalities,
    custom commands, and your profile as before — only the secret half is gated.
    Existing users who relied on key sync just run /keysync on once.
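
The gating described in this release reduces to one check that the env var can always override. In the sketch below, the "keySync" config key, the payload field names, and both helpers are hypothetical; treating any non-empty CODEEP_NO_KEY_SYNC value as "forced off" is also an assumption.

```python
def key_sync_enabled(config: dict, env: dict) -> bool:
    """Cloud key sync is off unless opted in, and the env var always wins."""
    if env.get("CODEEP_NO_KEY_SYNC"):  # assumption: any non-empty value forces it off
        return False
    return bool(config.get("keySync", False))  # hypothetical config key


def build_push_payload(local: dict, config: dict, env: dict) -> dict:
    """Non-secret data always syncs; API keys only when key sync is enabled."""
    public_fields = ("personalities", "customCommands", "profile")
    payload = {k: local[k] for k in public_fields if k in local}
    if key_sync_enabled(config, env) and "apiKeys" in local:
        payload["apiKeys"] = local["apiKeys"]
    return payload
```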

v2.7.0

09 Jun 11:38

A batch of review tooling: YAML review config, a codeep hook install pre-commit reviewer, codeep review --rules to list rule ids, and an opt-in codeep review --ai second opinion. Plus fixes: compiled binaries report the real version (no more "vunknown"), ACP editor sessions no longer mutate the global confirmation setting, and keychain-fallback keys get swept into the keychain once it's available.

Added

  • YAML review config. .codeep/review.yml / .codeep/review.yaml are now
    supported alongside .codeep/review.json (YAML preferred when present).
    Single-quoted YAML keeps regex backslashes literal (pattern: '\bfoo\('),
    avoiding JSON's double-escaping. Same schema; format is auto-detected.
  • codeep hook install — installs a git pre-commit (or --pre-push) hook
    that runs codeep review --fail-on <level> on your changes, blocking the
    commit when issues at/above the threshold are found (honors .codeep/review.*,
    no API key). codeep hook uninstall removes it; Codeep never overwrites a hook
    it didn't create.
  • codeep review --rules — lists the built-in rule ids (the values you can
    put in disable in .codeep/review.*) and exits.
  • codeep review --ai — opt-in: after the offline pass, asks your configured
    provider for a contextual second opinion, merged into the report as a clearly
    tagged advisory section. Needs an API key (degrades to deterministic-only
    without one) and never affects the exit code — the deterministic review stays
    authoritative, so CI (the GitHub Action) is unchanged.
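
The double-escaping point in the YAML review config entry can be checked with the standard json module. The sketch shows what a JSON string must contain to yield the regex \bfoo\(, and that the single-backslash form is not valid JSON. The YAML side is not parsed here, since the standard library has no YAML parser; per the YAML spec, a single-quoted scalar keeps backslashes literal.

```python
import json
import re

# The regex the rule author wants: word boundary, "foo", literal "(".
wanted = r"\bfoo\("

# In review.json every backslash has to be doubled inside the JSON string.
json_text = r'{"pattern": "\\bfoo\\("}'
from_json = json.loads(json_text)["pattern"]


def parses_as_json(text: str) -> bool:
    """Report whether `text` is valid JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Written with single backslashes, "\(" is an invalid JSON escape.
single_backslash = r'{"pattern": "\bfoo\("}'
```

Here from_json equals wanted, while parses_as_json(single_backslash) is False. In review.yml the same rule can be written as pattern: '\bfoo\(' with no doubling.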

Fixed

  • Keychain fallback sweep. If the OS keychain was unavailable on a prior run,
    API keys fell back to plaintext config. They're now swept into the keychain
    automatically once it becomes available (completes the 2.5.2 key-storage work).

  • Compiled binary version. The standalone binaries printed "Codeep
    vunknown" because they read the version from package.json, which isn't on
    disk in a compiled binary. The version is now baked in at build time, so
    --version is correct everywhere (npm, Homebrew, and the standalone binaries).

  • ACP confirmation setting no longer leaks/races. Manual-mode editor
    sessions used to flip the global agentConfirmWriteFile config and restore it
    non-atomically around each prompt — which could leak the session's mode into
    the terminal app and race when prompts overlapped. Write/edit confirmation is
    now scoped to the run via a per-call option, with no global config mutation.

v2.6.0

09 Jun 08:28

New: configurable code-review rules. Drop a .codeep/review.json into a repo to add your own deterministic review rules, disable built-in ones, and scope which files are reviewed — enforced the same way by codeep review (CLI) and the Codeep GitHub Action, with zero LLM cost.

Added

  • .codeep/review.json — review rules as config. The deterministic
    reviewer (codeep review, /review --static, and the GitHub Action) now
    reads a per-project config:
    • rules — your own checks: id, pattern (regex), message (required)
      plus optional flags, category, severity, suggestion, extensions.
    • disable — turn off built-in rules by id (each built-in now has a stable
      id, e.g. eval-usage, todo-comment, any-type, long-file).
    • include / exclude — glob scoping (**, *, ?).
    A missing, malformed, or partially-invalid config never breaks a review: bad
    entries are skipped with a warning and valid ones still apply.
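
The "bad entries are skipped, valid ones still apply" behavior can be sketched as a tolerant loader. load_rules and the warning texts are hypothetical; the required fields (id, pattern, message) come from the schema above. Python's re is used for the compile check, and its syntax differs in places from whatever regex engine Codeep uses, so this only illustrates the control flow.

```python
import re

REQUIRED_FIELDS = ("id", "pattern", "message")


def load_rules(config: object) -> tuple:
    """Return (valid_rules, warnings) without ever raising on bad input."""
    rules, warnings = [], []
    if not isinstance(config, dict):
        return rules, ["review config is not an object; ignoring it"]

    raw_rules = config.get("rules", [])
    if not isinstance(raw_rules, list):
        return rules, ["rules is not a list; ignoring it"]

    for index, entry in enumerate(raw_rules):
        if not isinstance(entry, dict) or any(
            not isinstance(entry.get(field), str) for field in REQUIRED_FIELDS
        ):
            warnings.append(f"rules[{index}]: missing id, pattern, or message; skipped")
            continue
        try:
            compiled = re.compile(entry["pattern"])
        except re.error as exc:
            warnings.append(f"rules[{index}]: invalid regex ({exc}); skipped")
            continue
        rules.append({**entry, "regex": compiled})
    return rules, warnings
```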

Security

  • Hardened the reviewer against untrusted custom rules. Since a PR's
    .codeep/review.json runs in CI via the Action, custom regexes are screened
    at load (length cap + a catastrophic-backtracking/ReDoS heuristic), the match
    loop guards zero-width patterns (no infinite loop) and caps matches per rule,
    and the GitHub Action bounds each review's wall-clock at 180s.
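
The screening and match-loop guards listed above might look roughly like this. Everything specific is an assumption: the two caps, the helper names, and the nested-quantifier check, which is a deliberately crude stand-in for whatever ReDoS heuristic Codeep applies. The wall-clock bound in the GitHub Action is outside the scope of the sketch.

```python
import re

MAX_PATTERN_LENGTH = 500    # hypothetical cap
MAX_MATCHES_PER_RULE = 100  # hypothetical cap
# Crude heuristic: a group containing a quantifier that is itself quantified, e.g. (a+)+
NESTED_QUANTIFIER = re.compile(r"\([^()]*[+*][^()]*\)[+*{]")


def is_pattern_acceptable(pattern: str) -> bool:
    """Screen a custom regex at load time."""
    return len(pattern) <= MAX_PATTERN_LENGTH and not NESTED_QUANTIFIER.search(pattern)


def find_matches(regex: re.Pattern, text: str) -> list:
    """Collect match offsets, guarding zero-width matches and capping the count."""
    offsets = []
    pos = 0
    while pos <= len(text) and len(offsets) < MAX_MATCHES_PER_RULE:
        match = regex.search(text, pos)
        if match is None:
            break
        offsets.append(match.start())
        # Zero-width guard: always advance by at least one character.
        pos = match.end() if match.end() > match.start() else match.end() + 1
    return offsets
```

Without the zero-width guard, a pattern such as a* would match the empty string at the same position forever; with it, the loop always terminates.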