Releases: Stackbilt-dev/llm-providers
v1.16.0 — Full CF Workers AI integration
What's new
4 new Cloudflare Workers AI models
| Model | Context | Tools | Use cases |
|---|---|---|---|
| @cf/nvidia/nemotron-3-120b-a12b | 256K | ✓ | HIGH_PERFORMANCE, TOOL_CALLING, LONG_CONTEXT |
| anthropic/claude-opus-4.8 | 1M | ✗ | HIGH_PERFORMANCE, LONG_CONTEXT (CF-managed Anthropic) |
| @cf/deepseek-ai/deepseek-r1-distill-qwen-32b | 80K | ✗ | RESEARCH (thinking model) |
| @cf/qwen/qwq-32b | 24K | ✗ | RESEARCH (thinking model) |
thinkingModel routing guard
New ModelCapabilities.thinkingModel?: boolean flag marks models that output chain-of-thought reasoning traces. rankModels() now excludes all thinkingModel: true entries from every use-case pool except RESEARCH, preventing them from winning direct-response routes (summary, chat, tool calling, etc.).
Affected models: @cf/deepseek-ai/deepseek-r1-distill-qwen-32b, @cf/qwen/qwq-32b, @cf/zai-org/glm-4.7-flash.
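The guard works like a pre-filter applied to each use-case pool before ranking. The sketch below is a hypothetical stand-in, not the library's rankModels() code. The CatalogEntry shape, the UseCase union, filterPoolForUseCase(), and the example/thinking-model id are all assumptions; only the thinkingModel flag and the RESEARCH exception come from these notes.

```typescript
// Hypothetical use-case labels, modelled on the ones named in these notes.
type UseCase = "HIGH_PERFORMANCE" | "TOOL_CALLING" | "LONG_CONTEXT" | "RESEARCH";

// Hypothetical catalog entry; only `thinkingModel` mirrors the documented flag.
interface CatalogEntry {
  id: string;
  useCases: UseCase[];
  thinkingModel?: boolean;
}

// Keep entries tagged with the use case, but drop thinking models from every
// pool except RESEARCH so they cannot win a direct-response route.
function filterPoolForUseCase(catalog: CatalogEntry[], useCase: UseCase): CatalogEntry[] {
  return catalog.filter(
    (entry) =>
      entry.useCases.includes(useCase) &&
      (useCase === "RESEARCH" || entry.thinkingModel !== true),
  );
}

const catalog: CatalogEntry[] = [
  { id: "@cf/qwen/qwq-32b", useCases: ["RESEARCH"], thinkingModel: true },
  {
    id: "@cf/nvidia/nemotron-3-120b-a12b",
    useCases: ["HIGH_PERFORMANCE", "TOOL_CALLING", "LONG_CONTEXT"],
  },
  // Hypothetical thinking model that is also tagged for tool calling.
  { id: "example/thinking-model", useCases: ["TOOL_CALLING", "RESEARCH"], thinkingModel: true },
];
```

Filtering before ranking means no scoring weight can promote a thinking model into a chat or tool-calling route.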
Anthropic-via-CF response normalizer
anthropic/claude-opus-4.8 uses the Anthropic message wire format through the Workers AI binding. The provider now routes these through a dedicated formatter (system as a top-level field, not a system role message) and normalizes the content[{type, text}] + stop_reason response shape to the standard LLMResponse contract.
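Both halves of that adapter can be sketched in a few lines. The content blocks, stop_reason values, and top-level system field follow the public Anthropic Messages format. The NormalizedResponse shape, toAnthropicBody(), and normalizeAnthropicResponse() are hypothetical and not the library's LLMResponse contract or formatter.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

// Anthropic-style request body: system lives at the top level.
interface AnthropicRequestBody {
  system?: string;
  messages: ChatMessage[];
}

interface AnthropicContentBlock {
  type: string;
  text?: string;
}

interface AnthropicStyleResponse {
  content: AnthropicContentBlock[];
  stop_reason: string | null;
}

// Hypothetical normalized shape; the real LLMResponse contract may differ.
interface NormalizedResponse {
  message: string;
  finishReason: "stop" | "length" | "tool_calls" | "unknown";
}

// Request side: lift system-role messages into the top-level `system` field.
function toAnthropicBody(messages: ChatMessage[]): AnthropicRequestBody {
  const system = messages.filter((m) => m.role === "system").map((m) => m.content).join("\n");
  const rest = messages.filter((m) => m.role !== "system");
  return system ? { system, messages: rest } : { messages: rest };
}

// Map Anthropic stop reasons onto generic finish reasons.
function mapStopReason(stopReason: string | null): NormalizedResponse["finishReason"] {
  switch (stopReason) {
    case "end_turn":
    case "stop_sequence":
      return "stop";
    case "max_tokens":
      return "length";
    case "tool_use":
      return "tool_calls";
    default:
      return "unknown";
  }
}

// Response side: concatenate text blocks and normalize the stop reason.
function normalizeAnthropicResponse(res: AnthropicStyleResponse): NormalizedResponse {
  const message = res.content
    .filter((block) => block.type === "text" && typeof block.text === "string")
    .map((block) => block.text as string)
    .join("");
  return { message, finishReason: mapStopReason(res.stop_reason) };
}
```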
GLM-4.7-Flash reclassification (completes #93)
@cf/zai-org/glm-4.7-flash is now RESEARCH-only with thinkingModel: true, fully excluding it from all direct-response routing pools (not just deprioritized as in v1.15.0).
v1.14.5
Patch fix for Cloudflare bad-input error classification (issue #91).
Fixed
- Cloudflare InvalidRequestError wrapping: CloudflareProvider now catches Workers AI "AiError: Bad input" responses and re-throws them as InvalidRequestError instead of propagating the raw AiError. Callers and gateways can now distinguish non-retryable bad-input failures from transient infrastructure errors without parsing raw message strings.
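On the caller side, the distinction lets a retry loop give up immediately on bad input. This is a minimal sketch: InvalidRequestError is declared locally as a stand-in (substitute the library's class if it is exported), and shouldRetry() and callWithRetry() are hypothetical helpers.

```typescript
// Local stand-in for the library's InvalidRequestError.
class InvalidRequestError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "InvalidRequestError";
    // Keeps instanceof working when compiled to older targets.
    Object.setPrototypeOf(this, InvalidRequestError.prototype);
  }
}

// Bad input will fail again on retry; everything else is treated as transient.
function shouldRetry(err: unknown): boolean {
  return !(err instanceof InvalidRequestError);
}

// Hypothetical retry helper: rethrows non-retryable errors at once and gives
// transient errors up to `maxAttempts` tries in total.
async function callWithRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (!shouldRetry(err)) throw err;
      lastError = err;
    }
  }
  throw lastError;
}
```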
v1.14.4
Patch compatibility fix for Cloudflare local gateway streaming.
- CloudflareProvider.streamResponse now normalizes non-streaming chat-completion JSON responses through the same parser used by generateResponse.
- Preserves Cloudflare Kimi/Workers AI output when llm-gateway asks Cloudflare for JSON and synthesizes Claude/Codex client streams.
- Includes the reasoning_content fallback from v1.14.3 for Cloudflare reasoning models that return content: null on truncated responses.
Validation:
- npm run typecheck
- npm test: 444/444
- npm run test:package
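A single parser shared by the streaming and non-streaming paths, including the reasoning_content fallback, might look like this. The payload type and parseCompletionText() are hypothetical; the real parser in CloudflareProvider likely handles more fields (tool calls, usage, finish reasons).

```typescript
// Hypothetical subset of an OpenAI-style chat-completion payload.
interface ChatCompletionJson {
  choices?: Array<{
    message?: {
      content?: string | null;
      reasoning_content?: string | null;
    };
    finish_reason?: string | null;
  }>;
}

// Prefer `content`; fall back to `reasoning_content` when a reasoning model
// returns content: null (for example on a truncated response).
function parseCompletionText(json: ChatCompletionJson): string {
  const message = json.choices?.[0]?.message;
  if (typeof message?.content === "string") return message.content;
  if (typeof message?.reasoning_content === "string") return message.reasoning_content;
  return "";
}
```

Routing both paths through one function is what keeps a JSON reply and a synthesized stream from disagreeing about the text.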
v1.14.2
v1.14.2 — 2026-06-09
Workers AI catalog expansion for Cloudflare credit-backed gateway routing.
Added
- Cloudflare Kimi K2.6 — adds @cf/moonshotai/kimi-k2.6 as an active Workers AI catalog entry with long context, tool calling, vision, and structured-agent workload metadata.
- Cloudflare GLM-4.7-Flash — adds @cf/zai-org/glm-4.7-flash as an active fast/balanced Workers AI catalog entry with long context and tool-calling metadata.
- Cloudflare DeepSeek V4 Pro — adds the dashboard model slug deepseek/deepseek-v4-pro as an active high-performance Workers AI catalog entry for reasoning and coding routes.
This release triggers .github/workflows/publish.yml, which will run CI and publish @stackbilt/llm-providers@1.14.2 to npm with provenance.
v1.14.1
v1.14.0
@stackbilt/llm-providers v1.14.0
Worker gateway route-planning surface from issue #87. Additive only — no breaking changes.
Added
- getGatewayRoutePlan() helper: packages canonical normalization, catalog routing, cache hints, capability checks, degradations, and warnings into a single Worker-friendly object for use behind OpenAI-compatible, Ollama-style, or Anthropic-compatible API routers. Accepts either a compatibility LLMRequest or a CanonicalLLMRequest as input.
- Route plan types: GatewayRoutePlan, GatewayRouteRequirements, GatewayRouteCapabilityReport, and GatewayRouteCachePlan describe the plan's shape. The plan is storage-agnostic: consumers map plan.cache onto their own KV / Cache API / D1 / R2 implementation.
- LoRA degradation reporting: when a request carries lora and routing selects a non-Cloudflare provider, the plan reports a stripped degradation and warns that Cloudflare adapter ids are forwarded to Workers AI without validation.
- Route plan tests: src/__tests__/gateway-routing.test.ts covers canonical→plan mapping, cache-hint handling, LoRA-on/off-Cloudflare paths, and built-in tool capability mismatches.
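Because the plan only describes caching, the consumer supplies storage. The sketch below assumes a plan exposing enabled, key, and ttlSeconds; those field names, CacheStore, MemoryStore, and cachedCall() are all hypothetical, since the actual GatewayRouteCachePlan fields are not listed here.

```typescript
// Hypothetical subset of a cache plan; real GatewayRouteCachePlan fields may differ.
interface CachePlanSketch {
  enabled: boolean;
  key: string;
  ttlSeconds: number;
}

// Minimal storage contract; a KV namespace, Cache API, D1 table, or R2
// bucket could each sit behind an adapter like this.
interface CacheStore {
  get(key: string): Promise<string | null>;
  put(key: string, value: string, ttlSeconds: number): Promise<void>;
}

// In-memory adapter for tests; expiry is checked lazily on read.
class MemoryStore implements CacheStore {
  private entries = new Map<string, { value: string; expiresAt: number }>();

  async get(key: string): Promise<string | null> {
    const hit = this.entries.get(key);
    if (!hit) return null;
    if (Date.now() >= hit.expiresAt) {
      this.entries.delete(key);
      return null;
    }
    return hit.value;
  }

  async put(key: string, value: string, ttlSeconds: number): Promise<void> {
    this.entries.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  }
}

// Honour the plan: bypass the store entirely when caching is disabled.
async function cachedCall(
  plan: CachePlanSketch,
  store: CacheStore,
  run: () => Promise<string>,
): Promise<string> {
  if (!plan.enabled) return run();
  const cached = await store.get(plan.key);
  if (cached !== null) return cached;
  const fresh = await run();
  await store.put(plan.key, fresh, plan.ttlSeconds);
  return fresh;
}
```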
Validation
- 21 test files / 441 tests passing locally and in CI
- tsc --noEmit clean
- Published with npm provenance
See the full entry in CHANGELOG.md.
v1.13.1 — Groq tool-call content fix
Fixed
- GroqProvider now accepts tool-call-only assistant responses where Groq omits message.content while preserving finishReason: "tool_calls" and populated toolCalls.
- Adds regression coverage for omitted content plus message.tool_calls.
Fixes #86.
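The tolerant parsing can be illustrated with a hypothetical helper. The message and tool-call shapes follow the OpenAI-compatible format that Groq returns; parseAssistant() and the ParsedAssistant shape are assumptions, not GroqProvider's code.

```typescript
// OpenAI-compatible tool call as returned in message.tool_calls.
interface ToolCallJson {
  id: string;
  type: "function";
  function: { name: string; arguments: string };
}

// `content` may be absent entirely on tool-call-only replies.
interface AssistantMessageJson {
  content?: string | null;
  tool_calls?: ToolCallJson[];
}

// Hypothetical parsed shape.
interface ParsedAssistant {
  message: string;
  finishReason: string;
  toolCalls: ToolCallJson[];
}

// Treat missing or null content as empty text when tool calls are present,
// and only reject a message that carries neither.
function parseAssistant(msg: AssistantMessageJson, finishReason: string): ParsedAssistant {
  const toolCalls = msg.tool_calls ?? [];
  if (msg.content == null && toolCalls.length === 0) {
    throw new Error("assistant message has neither content nor tool_calls");
  }
  return { message: msg.content ?? "", finishReason, toolCalls };
}
```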
v1.13.0 - Cloudflare Workers AI cache binding support
Added
- Cloudflare Workers AI run options now translate CacheHints.sessionId into x-session-affinity for the "provider-prefix" and "both" cache strategies, including streaming and raw vision calls.
- CloudflareConfig.gateway exposes typed Workers AI binding Gateway options and merges request cache metadata into the third env.AI.run() argument.
- Cloudflare usage parsing now normalizes Workers AI cached input token counts into TokenUsage.cachedInputTokens.
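The strategy gating for the session-affinity header reduces to a small pure function. CacheHintsSketch, affinityHeaders(), and the "none" strategy used in the test are hypothetical; how the resulting header is attached to the third env.AI.run() argument is deliberately left out because it is not described here.

```typescript
// Hypothetical subset of CacheHints; only sessionId is named in these notes.
interface CacheHintsSketch {
  sessionId?: string;
  strategy: string;
}

// Strategies that, per these notes, carry session affinity.
const AFFINITY_STRATEGIES = new Set(["provider-prefix", "both"]);

// Returns the affinity header only when a session id is present and the
// strategy calls for it; otherwise returns an empty header map.
function affinityHeaders(hints: CacheHintsSketch): Record<string, string> {
  if (!hints.sessionId || !AFFINITY_STRATEGIES.has(hints.strategy)) return {};
  return { "x-session-affinity": hints.sessionId };
}
```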
Validation
- npm run typecheck
- npm test
- npm run test:package
- npm audit --omit=dev
Closes #84.
v1.12.0 — Canonical provider contract
Canonical provider contract hardening from issue #81 / PR #82. Additive only.
Added
- Exported canonical provider contract types, including CanonicalLLMRequest, CanonicalLLMResponse, and related helper types.
- Added normalizeLLMRequest() to map compatibility LLMRequest fields into the canonical shape.
- Added canonicalToLLMRequest() to convert canonical requests back into existing adapter input while providers migrate internally.
- Added normalizeLLMResponse() for stable canonical response routing metadata, fallback/degradation fields, normalized error slots, and provider-extra metadata.
- Added contract tests covering OpenAI-compatible, Anthropic-compatible, Groq/Cerebras, NVIDIA, and Cloudflare adapter preparation without live API calls.
- Documented the gateway boundary: client protocol -> gateway adapter -> CanonicalLLMRequest -> llm-providers -> vendor API.
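The "gateway adapter" step in that boundary is a protocol translation. This sketch maps a trimmed OpenAI-style request into a canonical-looking object; both shapes and fromOpenAIStyle() are hypothetical, and the real CanonicalLLMRequest has more fields (normalizeLLMRequest() is the library's own entry point for LLMRequest input).

```typescript
// Client protocol: a trimmed OpenAI-style chat request (hypothetical subset).
interface OpenAIStyleRequest {
  model: string;
  messages: Array<{ role: "system" | "user" | "assistant"; content: string }>;
  max_tokens?: number;
}

// Hypothetical canonical shape handed to the provider layer.
interface CanonicalRequestSketch {
  model: string;
  system?: string;
  messages: Array<{ role: "user" | "assistant"; content: string }>;
  maxOutputTokens?: number;
}

// Gateway adapter: split out system text and rename protocol-specific fields.
function fromOpenAIStyle(req: OpenAIStyleRequest): CanonicalRequestSketch {
  const systemParts: string[] = [];
  const messages: CanonicalRequestSketch["messages"] = [];
  for (const m of req.messages) {
    if (m.role === "system") systemParts.push(m.content);
    else messages.push({ role: m.role, content: m.content });
  }
  const canonical: CanonicalRequestSketch = { model: req.model, messages };
  if (systemParts.length > 0) canonical.system = systemParts.join("\n\n");
  if (req.max_tokens !== undefined) canonical.maxOutputTokens = req.max_tokens;
  return canonical;
}
```

Keeping this translation in the gateway means the provider layer only ever sees one request shape, whichever client protocol arrived.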
Validation
- npm run typecheck
- npm test
- npm run test:package
- npm audit --omit=dev
v1.11.0
@stackbilt/llm-providers v1.11.0
Reliability and gateway-routing hardening. Additive APIs plus bug fixes from issues #61, #62, #63, #64, #65, and #67.
Added
- Generated VERSION export synced from package.json during build/test/publish paths.
- Streaming usage reconciliation across OpenAI, Groq, Cerebras, NVIDIA, Anthropic, and Cloudflare streams.
- CircuitBreaker, CircuitBreakerManager, and ExhaustionRegistry persistence APIs for Workers KV/D1/Redis/Durable Objects.
- Workload-aware model defaults via ModelWorkloadClass, ModelPreferenceMap, getRecommendedModelForWorkload(), and getProviderDefaultModelForWorkload().
- Provider-agnostic TokenUsage.cacheWriteInputTokens for cache write/create telemetry.
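Persisting breaker state across Worker isolates comes down to making the state a plain serializable snapshot. The sketch below is a generic circuit breaker, not the library's CircuitBreaker API: BreakerSnapshot, SimpleBreaker, and its threshold and cooldown parameters are hypothetical, and the snapshot could be stored in Workers KV, D1, Redis, or a Durable Object.

```typescript
// Hypothetical serializable breaker state.
interface BreakerSnapshot {
  state: "closed" | "open" | "half-open";
  failures: number;
  openedAt: number | null;
}

class SimpleBreaker {
  private snapshot: BreakerSnapshot;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;

  constructor(failureThreshold: number, cooldownMs: number, initial?: BreakerSnapshot) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.snapshot = initial ?? { state: "closed", failures: 0, openedAt: null };
  }

  // An open breaker moves to half-open once the cooldown has elapsed.
  canRequest(now: number): boolean {
    const s = this.snapshot;
    if (s.state === "open" && s.openedAt !== null && now - s.openedAt >= this.cooldownMs) {
      this.snapshot = { ...s, state: "half-open" };
    }
    return this.snapshot.state !== "open";
  }

  recordSuccess(): void {
    this.snapshot = { state: "closed", failures: 0, openedAt: null };
  }

  // A failure while half-open, or reaching the threshold, opens the breaker.
  recordFailure(now: number): void {
    const failures = this.snapshot.failures + 1;
    const trip = this.snapshot.state === "half-open" || failures >= this.failureThreshold;
    this.snapshot = trip
      ? { state: "open", failures, openedAt: now }
      : { ...this.snapshot, failures };
  }

  // Plain JSON, so JSON.stringify(breaker) yields a storable snapshot.
  toJSON(): BreakerSnapshot {
    return { ...this.snapshot };
  }
}
```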
Fixed
- Vision requests now reject or skip providers that cannot process images instead of silently dropping image content.
- Anthropic URL images now throw ConfigurationError instead of lossy placeholder conversion.
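The vision fix amounts to filtering candidates by capability and failing loudly when none remain. ProviderInfo, providersForRequest(), and the provider names are hypothetical placeholders, not the library's capability metadata.

```typescript
// Hypothetical provider descriptor.
interface ProviderInfo {
  name: string;
  supportsVision: boolean;
}

// Skip providers that cannot accept image parts; throw instead of silently
// dropping the images when no capable provider is left.
function providersForRequest(providers: ProviderInfo[], hasImages: boolean): ProviderInfo[] {
  if (!hasImages) return providers;
  const capable = providers.filter((p) => p.supportsVision);
  if (capable.length === 0) {
    throw new Error("No configured provider can process image content");
  }
  return capable;
}
```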
Validation
- npm run typecheck
- npm test (19 files, 421 tests)
- npm run build
- npm run test:package
- npm audit --omit=dev