Releases: Stackbilt-dev/llm-providers

v1.16.0 — Full CF Workers AI integration

13 Jun 11:47

What's new

4 new Cloudflare Workers AI models

| Model | Context | Tools | Use cases |
| --- | --- | --- | --- |
| @cf/nvidia/nemotron-3-120b-a12b | 256K | | HIGH_PERFORMANCE, TOOL_CALLING, LONG_CONTEXT |
| anthropic/claude-opus-4.8 | 1M | | HIGH_PERFORMANCE, LONG_CONTEXT (CF-managed Anthropic) |
| @cf/deepseek-ai/deepseek-r1-distill-qwen-32b | 80K | | RESEARCH (thinking model) |
| @cf/qwen/qwq-32b | 24K | | RESEARCH (thinking model) |

thinkingModel routing guard

New ModelCapabilities.thinkingModel?: boolean flag marks models that output chain-of-thought reasoning traces. rankModels() now excludes all thinkingModel: true entries from every use-case pool except RESEARCH, preventing them from winning direct-response routes (summary, chat, tool calling, etc.).

Affected models: @cf/deepseek-ai/deepseek-r1-distill-qwen-32b, @cf/qwen/qwq-32b, @cf/zai-org/glm-4.7-flash.
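
The guard described above can be pictured as a filter applied before ranking. This is a minimal sketch, not the library's rankModels() implementation: the ModelEntry shape, the eligibleModels helper, and the example/mislabelled-thinker entry are made up for illustration. Only the thinkingModel flag and the RESEARCH exception come from the notes.

```typescript
// Hypothetical shapes for illustration; the library's real catalog types may differ.
type UseCase = "RESEARCH" | "HIGH_PERFORMANCE" | "TOOL_CALLING" | "LONG_CONTEXT";

interface ModelEntry {
  id: string;
  useCases: UseCase[];
  thinkingModel?: boolean; // flag named in the release notes
}

// A model stays in the pool only if it lists the use case and, when it is a
// thinking model, only when the requested use case is RESEARCH.
function eligibleModels(catalog: ModelEntry[], useCase: UseCase): ModelEntry[] {
  return catalog.filter(
    (m) =>
      m.useCases.indexOf(useCase) !== -1 &&
      (m.thinkingModel !== true || useCase === "RESEARCH"),
  );
}

const sampleCatalog: ModelEntry[] = [
  {
    id: "@cf/nvidia/nemotron-3-120b-a12b",
    useCases: ["HIGH_PERFORMANCE", "TOOL_CALLING", "LONG_CONTEXT"],
  },
  { id: "@cf/qwen/qwq-32b", useCases: ["RESEARCH"], thinkingModel: true },
  // Made-up entry: a thinking model that also advertises TOOL_CALLING.
  {
    id: "example/mislabelled-thinker",
    useCases: ["RESEARCH", "TOOL_CALLING"],
    thinkingModel: true,
  },
];

const toolPool = eligibleModels(sampleCatalog, "TOOL_CALLING").map((m) => m.id);
const researchPool = eligibleModels(sampleCatalog, "RESEARCH").map((m) => m.id);
console.log(toolPool, researchPool);
```

Filtering before ranking means a thinking model cannot win a direct-response route even if its catalog metadata lists other use cases.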

Anthropic-via-CF response normalizer

anthropic/claude-opus-4.8 uses the Anthropic message wire format through the Workers AI binding. The provider now routes these through a dedicated formatter (system as a top-level field, not a system role message) and normalizes the content[{type, text}] + stop_reason response shape to the standard LLMResponse contract.
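
A rough sketch of both halves of that translation follows. The Anthropic field names (system, content, stop_reason) follow the public Messages API. The helper names, the NormalizedResponse shape, and the stop_reason mapping are assumptions for illustration and may not match the library's LLMResponse contract.

```typescript
// Request side: lift system messages into a top-level `system` field.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface AnthropicBody {
  system?: string;
  messages: ChatMessage[];
}

function toAnthropicBody(messages: ChatMessage[]): AnthropicBody {
  const system = messages
    .filter((m) => m.role === "system")
    .map((m) => m.content)
    .join("\n");
  const rest = messages.filter((m) => m.role !== "system");
  return system.length > 0 ? { system, messages: rest } : { messages: rest };
}

// Response side: flatten text blocks and map stop_reason.
interface AnthropicResponse {
  content: Array<{ type: string; text?: string }>;
  stop_reason: string | null;
}

// Hypothetical normalized shape, not the library's actual LLMResponse.
interface NormalizedResponse {
  message: string;
  finishReason: "stop" | "length" | "tool_calls" | "unknown";
}

function normalizeAnthropicResponse(res: AnthropicResponse): NormalizedResponse {
  const message = res.content
    .filter((block) => block.type === "text")
    .map((block) => block.text ?? "")
    .join("");
  let finishReason: NormalizedResponse["finishReason"] = "unknown";
  if (res.stop_reason === "end_turn" || res.stop_reason === "stop_sequence") {
    finishReason = "stop";
  } else if (res.stop_reason === "max_tokens") {
    finishReason = "length";
  } else if (res.stop_reason === "tool_use") {
    finishReason = "tool_calls";
  }
  return { message, finishReason };
}
```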

GLM-4.7-Flash reclassification (completes #93)

@cf/zai-org/glm-4.7-flash is now RESEARCH-only with thinkingModel: true, fully excluding it from all direct-response routing pools (not just deprioritized as in v1.15.0).

v1.14.5

10 Jun 12:21

Patch fix for Cloudflare bad-input error classification (issue #91).

Fixed

  • Cloudflare InvalidRequestError wrapping: CloudflareProvider now catches Workers AI AiError: Bad input responses and re-throws them as InvalidRequestError instead of propagating the raw AiError. Callers and gateways can now distinguish non-retryable bad-input failures from transient infrastructure errors without parsing raw message strings.
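
The classification step might look roughly like the sketch below. The InvalidRequestError class here is a local stand-in, not the library's export, and matching on the "Bad input" message text is an assumption about how the raw error is detected.

```typescript
// Local stand-in for the library's InvalidRequestError.
class InvalidRequestError extends Error {
  readonly original: unknown;

  constructor(message: string, original: unknown) {
    super(message);
    this.name = "InvalidRequestError";
    this.original = original;
    // Keeps instanceof working if compiled to an ES5 target.
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

// Assumption: bad-input failures are recognizable by their message text.
function classifyError(err: unknown): unknown {
  if (err instanceof Error && /bad input/i.test(err.message)) {
    return new InvalidRequestError(err.message, err);
  }
  return err; // anything else propagates unchanged
}

// Wrap a provider call so callers see the classified error.
async function runClassified<T>(call: () => Promise<T>): Promise<T> {
  try {
    return await call();
  } catch (err) {
    throw classifyError(err);
  }
}
```

A caller can then branch on `err instanceof InvalidRequestError` to skip retries and fallbacks for requests that would fail identically on every attempt.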

v1.14.4

10 Jun 10:28

Patch compatibility fix for Cloudflare local gateway streaming.

  • CloudflareProvider.streamResponse now normalizes non-streaming chat-completion JSON responses through the same parser used by generateResponse.
  • Preserves Cloudflare Kimi/Workers AI output when llm-gateway asks Cloudflare for JSON and synthesizes Claude/Codex client streams.
  • Includes reasoning_content fallback from v1.14.3 for Cloudflare reasoning models that return content: null on truncated responses.

Validation

  • npm run typecheck
  • npm test: 444/444
  • npm run test:package
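
The reasoning_content fallback mentioned in this release can be sketched as a small parser over an OpenAI-style chat-completion body. The ParsedCompletion shape and the parseCompletion name are hypothetical. The library's shared parser is not shown in these notes.

```typescript
// Subset of an OpenAI-style chat-completion JSON body.
interface ChatCompletionJson {
  choices: Array<{
    message: { content?: string | null; reasoning_content?: string | null };
    finish_reason?: string | null;
  }>;
}

// Hypothetical parsed shape for this sketch.
interface ParsedCompletion {
  text: string;
  finishReason: string | null;
  usedReasoningFallback: boolean;
}

function parseCompletion(res: ChatCompletionJson): ParsedCompletion {
  const choice = res.choices[0];
  if (choice === undefined) {
    throw new Error("Completion has no choices");
  }
  const finishReason = choice.finish_reason ?? null;
  const { content, reasoning_content } = choice.message;
  if (typeof content === "string") {
    return { text: content, finishReason, usedReasoningFallback: false };
  }
  // content is null or missing: fall back to the reasoning trace if present.
  return {
    text: reasoning_content ?? "",
    finishReason,
    usedReasoningFallback: typeof reasoning_content === "string",
  };
}
```

Running the streaming and non-streaming paths through one function like this keeps the fallback behavior identical in both.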

v1.14.2

09 Jun 22:19

v1.14.2 — 2026-06-09

Workers AI catalog expansion for Cloudflare credit-backed gateway routing.

Added

  • Cloudflare Kimi K2.6 — adds @cf/moonshotai/kimi-k2.6 as an active Workers AI catalog entry with long context, tool calling, vision, and structured-agent workload metadata.
  • Cloudflare GLM-4.7-Flash — adds @cf/zai-org/glm-4.7-flash as an active fast/balanced Workers AI catalog entry with long context and tool-calling metadata.
  • Cloudflare DeepSeek V4 Pro — adds the dashboard model slug deepseek/deepseek-v4-pro as an active high-performance Workers AI catalog entry for reasoning and coding routes.

This release triggers .github/workflows/publish.yml, which will run CI and publish @stackbilt/llm-providers@1.14.2 to npm with provenance.

v1.14.1

08 Jun 08:30
a1741c1

Patch release for Cerebras OpenAI-compatible tool-call responses that omit message.content. Includes PR #88 compatibility fix and release metadata from PR #89.

v1.14.0

07 Jun 11:10

@stackbilt/llm-providers v1.14.0

Worker gateway route-planning surface from issue #87. Additive only — no breaking changes.

Added

  • getGatewayRoutePlan() helper — packages canonical normalization, catalog routing, cache hints, capability checks, degradations, and warnings into a single Worker-friendly object for use behind OpenAI-compatible, Ollama-style, or Anthropic-compatible API routers. Accepts either compatibility LLMRequest or CanonicalLLMRequest input.
  • Route plan types: GatewayRoutePlan, GatewayRouteRequirements, GatewayRouteCapabilityReport, and GatewayRouteCachePlan describe the shape. Storage-agnostic: consumers map plan.cache onto their own KV / Cache API / D1 / R2 implementation.
  • LoRA degradation reporting — when a request carries lora and routing selects a non-Cloudflare provider, the plan reports a stripped degradation and warns that Cloudflare adapter ids are forwarded to Workers AI without validation.
  • Route plan tests: src/__tests__/gateway-routing.test.ts covers canonical→plan mapping, cache-hint handling, LoRA-on/off-Cloudflare paths, and built-in tool capability mismatches.
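
The LoRA reporting rule can be illustrated with a small stand-alone function. Every type and name here (RouteInput, PlanSketch, planLora) is hypothetical and does not reflect the real GatewayRoutePlan shape. The sketch also assumes one reading of the bullet above: the stripped degradation applies on non-Cloudflare routes, and the forwarding warning applies on Cloudflare routes.

```typescript
// Hypothetical shapes; the real GatewayRoutePlan types are not shown in these notes.
interface RouteInput {
  lora?: string;
}

interface Degradation {
  feature: string;
  action: "stripped";
  reason: string;
}

interface PlanSketch {
  provider: string;
  lora?: string;
  degradations: Degradation[];
  warnings: string[];
}

function planLora(input: RouteInput, provider: string): PlanSketch {
  const plan: PlanSketch = { provider, degradations: [], warnings: [] };
  if (input.lora === undefined) {
    return plan; // nothing to report
  }
  if (provider === "cloudflare") {
    // Assumed behavior: the adapter id passes through untouched, with a warning.
    plan.lora = input.lora;
    plan.warnings.push("LoRA adapter id is forwarded to Workers AI without validation");
  } else {
    // Assumed behavior: other providers cannot apply the adapter, so it is dropped.
    plan.degradations.push({
      feature: "lora",
      action: "stripped",
      reason: "provider " + provider + " does not accept Cloudflare LoRA adapters",
    });
  }
  return plan;
}
```

Reporting the strip in the plan, rather than silently dropping the field, lets a gateway decide whether to reject the request or proceed without the adapter.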

Validation

  • 21 test files / 441 tests passing locally and in CI
  • tsc --noEmit clean
  • Published with npm provenance

See the full entry in CHANGELOG.md.

v1.13.1 — Groq tool-call content fix

06 Jun 10:57

Fixed

  • GroqProvider now accepts tool-call-only assistant responses where Groq omits message.content while preserving finishReason: "tool_calls" and populated toolCalls.
  • Adds regression coverage for omitted content plus message.tool_calls.

Fixes #86.
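
A minimal sketch of the tolerant parsing this fix describes, using OpenAI-compatible field names (message.content, message.tool_calls). The ParsedAssistant shape and the parseAssistant helper are illustrative assumptions, not GroqProvider internals.

```typescript
interface ToolCall {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

// OpenAI-compatible assistant message; content may be omitted or null.
interface AssistantMessage {
  content?: string | null;
  tool_calls?: ToolCall[];
}

// Hypothetical parsed shape for this sketch.
interface ParsedAssistant {
  message: string;
  finishReason: string;
  toolCalls: ToolCall[];
}

function parseAssistant(msg: AssistantMessage, finishReason: string): ParsedAssistant {
  const toolCalls = msg.tool_calls ?? [];
  // Only a response with neither text nor tool calls is treated as invalid.
  if (msg.content == null && toolCalls.length === 0) {
    throw new Error("Response has neither content nor tool calls");
  }
  return { message: msg.content ?? "", finishReason, toolCalls };
}
```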

v1.13.0 - Cloudflare Workers AI cache binding support

06 Jun 09:48

Added

  • Cloudflare Workers AI run options now translate CacheHints.sessionId into x-session-affinity for provider-prefix and both cache strategies, including streaming and raw vision calls.
  • CloudflareConfig.gateway exposes typed Workers AI binding Gateway options and merges request cache metadata into the third env.AI.run() argument.
  • Cloudflare usage parsing now normalizes Workers AI cached input token counts into TokenUsage.cachedInputTokens.
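
As an illustration of the first two items, the options object passed as the third env.AI.run() argument could be assembled along these lines. The buildRunOptions helper and the local interfaces are hypothetical, and placing the header under extraHeaders is an assumption about the binding's options shape rather than something these notes confirm.

```typescript
// Hypothetical local shapes; the library's CacheHints and CloudflareConfig are richer.
interface CacheHints {
  sessionId?: string;
}

interface GatewayOptions {
  id: string;
  skipCache?: boolean;
  cacheTtl?: number;
}

interface RunOptions {
  gateway?: GatewayOptions;
  extraHeaders?: Record<string, string>; // assumed location for the header
}

function buildRunOptions(
  gateway: GatewayOptions | undefined,
  hints: CacheHints | undefined,
): RunOptions {
  const options: RunOptions = {};
  if (gateway !== undefined) {
    options.gateway = { ...gateway }; // copy so per-request changes do not leak into config
  }
  if (hints !== undefined && hints.sessionId !== undefined && hints.sessionId !== "") {
    options.extraHeaders = { "x-session-affinity": hints.sessionId };
  }
  return options;
}

// Usage inside a Worker would look something like:
//   await env.AI.run(model, inputs, buildRunOptions(config.gateway, request.cache));
```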

Validation

  • npm run typecheck
  • npm test
  • npm run test:package
  • npm audit --omit=dev

Closes #84.

v1.12.0 — Canonical provider contract

05 Jun 20:19

Canonical provider contract hardening from issue #81 / PR #82. Additive only.

Added

  • Exported canonical provider contract types, including CanonicalLLMRequest, CanonicalLLMResponse, and related helper types.
  • Added normalizeLLMRequest() to map compatibility LLMRequest fields into the canonical shape.
  • Added canonicalToLLMRequest() to convert canonical requests back into existing adapter input while providers migrate internally.
  • Added normalizeLLMResponse() for stable canonical response routing metadata, fallback/degradation fields, normalized error slots, and provider-extra metadata.
  • Added contract tests covering OpenAI-compatible, Anthropic-compatible, Groq/Cerebras, NVIDIA, and Cloudflare adapter preparation without live API calls.
  • Documented the gateway boundary: client protocol -> gateway adapter -> CanonicalLLMRequest -> llm-providers -> vendor API.

Validation

  • npm run typecheck
  • npm test
  • npm run test:package
  • npm audit --omit=dev

v1.11.0

31 May 10:58

@stackbilt/llm-providers v1.11.0

Reliability and gateway-routing hardening. Additive APIs plus bug fixes from issues #61, #62, #63, #64, #65, and #67.

Added

  • Generated VERSION export synced from package.json during build/test/publish paths.
  • Streaming usage reconciliation across OpenAI, Groq, Cerebras, NVIDIA, Anthropic, and Cloudflare streams.
  • CircuitBreaker, CircuitBreakerManager, and ExhaustionRegistry persistence APIs for Workers KV/D1/Redis/Durable Objects.
  • Workload-aware model defaults via ModelWorkloadClass, ModelPreferenceMap, getRecommendedModelForWorkload(), and getProviderDefaultModelForWorkload().
  • Provider-agnostic TokenUsage.cacheWriteInputTokens for cache write/create telemetry.
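
To show why persistence matters for a circuit breaker on Workers, here is a generic sketch in which breaker state lives in an external store so separate isolates see the same open/closed status. SketchBreaker, BreakerState, and StateStore are invented for this example and are not the library's CircuitBreaker API. The store is synchronous for brevity, whereas KV, D1, Redis, and Durable Objects are asynchronous in practice.

```typescript
interface BreakerState {
  failures: number;
  openedAt: number | null; // epoch ms when the breaker last opened
}

// Minimal synchronous store; a Map satisfies it for local testing.
interface StateStore {
  get(key: string): BreakerState | undefined;
  set(key: string, state: BreakerState): void;
}

class SketchBreaker {
  private readonly key: string;
  private readonly store: StateStore;
  private readonly threshold: number;
  private readonly cooldownMs: number;

  constructor(key: string, store: StateStore, threshold: number, cooldownMs: number) {
    this.key = key;
    this.store = store;
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
  }

  private load(): BreakerState {
    return this.store.get(this.key) ?? { failures: 0, openedAt: null };
  }

  // Closed, or open but past the cooldown (a trial request is allowed).
  canRequest(now: number): boolean {
    const state = this.load();
    return state.openedAt === null || now - state.openedAt >= this.cooldownMs;
  }

  recordFailure(now: number): void {
    const state = this.load();
    const failures = state.failures + 1;
    const openedAt = failures >= this.threshold ? now : state.openedAt;
    this.store.set(this.key, { failures, openedAt });
  }

  recordSuccess(): void {
    this.store.set(this.key, { failures: 0, openedAt: null });
  }
}
```

Because every read goes through the store, a second breaker instance created with the same key observes failures recorded by the first.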

Fixed

  • Vision requests now reject or skip providers that cannot process images instead of silently dropping image content.
  • Anthropic URL images now throw ConfigurationError instead of lossy placeholder conversion.
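
The first fix amounts to a capability check before dispatch. The sketch below uses hypothetical names (ProviderInfo, RequestSketch, selectProviders) and placeholder provider names, and it throws a plain Error where the library may use its own error class.

```typescript
// Hypothetical shapes for illustration only.
interface ProviderInfo {
  name: string;
  supportsVision: boolean;
}

interface RequestSketch {
  prompt: string;
  images?: string[];
}

// Skip providers that cannot process images, and reject the request when
// none can, rather than sending it without its image content.
function selectProviders(request: RequestSketch, providers: ProviderInfo[]): ProviderInfo[] {
  const hasImages = request.images !== undefined && request.images.length > 0;
  if (!hasImages) {
    return providers;
  }
  const capable = providers.filter((p) => p.supportsVision);
  if (capable.length === 0) {
    throw new Error("No configured provider can process image content");
  }
  return capable;
}
```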

Validation

  • npm run typecheck
  • npm test (19 files, 421 tests)
  • npm run build
  • npm run test:package
  • npm audit --omit=dev