feat(bedrock): full support for Claude Opus 4.7 (registry + API compat layer) #316
base: main
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -61,9 +61,20 @@ interface BedrockInferenceConfig { | |
| // Define interface for Bedrock additional model request fields | ||
| // This includes thinking configuration, 1M context beta, and other model-specific parameters | ||
| interface BedrockAdditionalModelFields { | ||
| - thinking?: { | ||
| - type: "enabled" | ||
| - budget_tokens: number | ||
| + thinking?: | ||
| + | { | ||
| + type: "enabled" | ||
| + budget_tokens: number | ||
| + } | ||
| + | { | ||
| + // Claude 4.7+ adaptive thinking: no budget_tokens, uses output_config.effort instead | ||
| + type: "adaptive" | ||
| + // "summarized" shows thinking content in UI; omit to keep thinking internal only | ||
| + display?: "summarized" | "none" | ||
| + } | ||
| + output_config?: { | ||
| + // Claude 4.7+ effort levels: "low" | "medium" | "high" | "xhigh" | "max" | ||
| + effort: string | ||
| } | ||
| anthropic_beta?: string[] | ||
| [key: string]: any // Add index signature to be compatible with DocumentType | ||
|
|
@@ -381,6 +392,11 @@ export class AwsBedrockHandler extends BaseProvider implements SingleCompletionH | |
| let additionalModelRequestFields: BedrockAdditionalModelFields | undefined | ||
| let thinkingEnabled = false | ||
|
|
||
| + // Detect model generation for API compatibility | ||
| + // Claude 4.7+ removed sampling params (temperature/top_p/top_k) and uses adaptive thinking | ||
| + const baseModelId = this.parseBaseModelId(modelConfig.id) | ||
| + const isGen47Model = baseModelId.includes("opus-4-7") || baseModelId.includes("sonnet-4-7") | ||
|
Comment on lines +395 to +398
Contributor
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
FILE="$(fd '^bedrock\.ts$' src/api/providers | head -n1)"
echo "== createMessage Claude 4.7 handling =="
sed -n '392,440p' "$FILE"
echo
echo "== completePrompt inferenceConfig =="
sed -n '764,779p' "$FILE"
Repository: Zoo-Code-Org/Zoo-Code
Length of output: 2877
Mirror the Claude 4.7+ handling in completePrompt
Suggested fix
+ private isClaude47Model(modelId: string): boolean {
+ const baseModelId = this.parseBaseModelId(modelId)
+ return baseModelId.includes("opus-4-7") || baseModelId.includes("sonnet-4-7")
+ }
+
override async *createMessage(
systemPrompt: string,
messages: Anthropic.Messages.MessageParam[],
@@
- const baseModelId = this.parseBaseModelId(modelConfig.id)
- const isGen47Model = baseModelId.includes("opus-4-7") || baseModelId.includes("sonnet-4-7")
+ const baseModelId = this.parseBaseModelId(modelConfig.id)
+ const isGen47Model = this.isClaude47Model(modelConfig.id)
@@
async completePrompt(prompt: string): Promise<string> {
try {
const modelConfig = this.getModel()
+ const isGen47Model = this.isClaude47Model(modelConfig.id)
@@
const inferenceConfig: BedrockInferenceConfig = {
maxTokens: modelConfig.maxTokens || (modelConfig.info.maxTokens as number),
- temperature: modelConfig.temperature ?? (this.options.modelTemperature as number),
+ ...(isGen47Model
+ ? {}
+ : { temperature: modelConfig.temperature ?? (this.options.modelTemperature as number) }),
}
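The suggested refactor can be tried in isolation with a small sketch. Everything here is an assumption made for illustration: `parseBaseModelId` is a hypothetical stand-in for the handler's method (the prefixes it strips are guesses), `buildInferenceConfig` does not exist in the PR, and the model IDs are examples only.

```typescript
// Hypothetical stand-in for the handler's parseBaseModelId(): strips an
// assumed cross-region inference prefix such as "us." or "eu.".
function parseBaseModelId(modelId: string): string {
  return modelId.replace(/^(us|eu|apac|global)\./, "")
}

// One shared check, so createMessage and completePrompt cannot drift apart.
function isClaude47Model(modelId: string): boolean {
  return /(opus|sonnet)-4-7/.test(parseBaseModelId(modelId))
}

interface InferenceConfig {
  maxTokens: number
  temperature?: number
}

// Illustrative helper: omit temperature entirely for Claude 4.7 models,
// pass it through for everything else.
function buildInferenceConfig(modelId: string, maxTokens: number, temperature: number): InferenceConfig {
  return {
    maxTokens,
    ...(isClaude47Model(modelId) ? {} : { temperature }),
  }
}

const gen47Config = buildInferenceConfig("us.anthropic.claude-opus-4-7", 8192, 0.3)
const olderConfig = buildInferenceConfig("anthropic.claude-opus-4-6", 8192, 0.3)
```

With these inputs, `gen47Config` carries only `maxTokens`, while `olderConfig` keeps its `temperature`. Routing both call sites through one predicate is the point of the suggestion; the exact matching rule is secondary.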
||
|
|
||
| // Determine if thinking should be enabled | ||
| // metadata?.thinking?.enabled: Explicitly enabled through API metadata (direct request) | ||
| // shouldUseReasoningBudget(): Enabled through user settings (enableReasoningEffort = true) | ||
|
|
@@ -392,27 +408,38 @@ export class AwsBedrockHandler extends BaseProvider implements SingleCompletionH | |
|
|
||
| if ((isThinkingExplicitlyEnabled || isThinkingEnabledBySettings) && modelConfig.info.supportsReasoningBudget) { | ||
| thinkingEnabled = true | ||
| - additionalModelRequestFields = { | ||
| - thinking: { | ||
| - type: "enabled", | ||
| - budget_tokens: metadata?.thinking?.maxThinkingTokens || modelConfig.reasoningBudget || 4096, | ||
| - }, | ||
| + if (isGen47Model) { | ||
| + // Claude 4.7+ uses adaptive thinking with effort levels; budget_tokens causes 400 error | ||
| + // display: "summarized" surfaces thinking content in Zoo Code UI | ||
| + additionalModelRequestFields = { | ||
| + thinking: { type: "adaptive", display: "summarized" }, | ||
| + output_config: { effort: "xhigh" }, | ||
| + } | ||
| + } else { | ||
| + additionalModelRequestFields = { | ||
| + thinking: { | ||
| + type: "enabled", | ||
| + budget_tokens: metadata?.thinking?.maxThinkingTokens || modelConfig.reasoningBudget || 4096, | ||
| + }, | ||
| + } | ||
| } | ||
| logger.info("Extended thinking enabled for Bedrock request", { | ||
| ctx: "bedrock", | ||
| modelId: modelConfig.id, | ||
| - thinking: additionalModelRequestFields.thinking, | ||
| + thinking: additionalModelRequestFields?.thinking, | ||
| }) | ||
| } | ||
|
|
||
| const inferenceConfig: BedrockInferenceConfig = { | ||
| maxTokens: modelConfig.maxTokens || (modelConfig.info.maxTokens as number), | ||
| - temperature: modelConfig.temperature ?? (this.options.modelTemperature as number), | ||
| + // Claude 4.7+ removed temperature parameter entirely; causes 400 error if sent | ||
| + ...(isGen47Model | ||
| + ? {} | ||
| + : { temperature: modelConfig.temperature ?? (this.options.modelTemperature as number) }), | ||
| } | ||
|
|
||
| // Check if 1M context is enabled for supported Claude 4 models | ||
| - // Use parseBaseModelId to handle cross-region inference prefixes | ||
| - const baseModelId = this.parseBaseModelId(modelConfig.id) | ||
| + // Use parseBaseModelId to handle cross-region inference prefixes (computed above) | ||
| const is1MContextEnabled = | ||
| BEDROCK_1M_CONTEXT_MODEL_IDS.includes(baseModelId as any) && this.options.awsBedrock1MContext | ||
|
|
||
|
|
||
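The thinking-configuration branch in the hunk above can be sketched as a standalone function. This is a sketch under assumptions, not the PR's code: the `Effort` union narrows the PR's `effort: string` using the levels listed in its own comment, and the `effort` parameter is added here for illustration (the PR hard-codes "xhigh").

```typescript
// Effort levels taken from the comment in the PR's interface; typing them
// as a union is this sketch's choice, not the PR's.
type Effort = "low" | "medium" | "high" | "xhigh" | "max"

type ThinkingConfig =
  | { type: "enabled"; budget_tokens: number }
  | { type: "adaptive"; display?: "summarized" | "none" }

interface AdditionalModelFields {
  thinking: ThinkingConfig
  output_config?: { effort: Effort }
}

// Claude 4.7 models get adaptive thinking plus an effort level; older
// models keep the budget_tokens form with the same 4096 fallback.
function buildThinkingFields(
  isGen47Model: boolean,
  budgetTokens?: number,
  effort: Effort = "xhigh",
): AdditionalModelFields {
  if (isGen47Model) {
    return {
      thinking: { type: "adaptive", display: "summarized" },
      output_config: { effort },
    }
  }
  return { thinking: { type: "enabled", budget_tokens: budgetTokens || 4096 } }
}

const adaptiveFields = buildThinkingFields(true, 2048)
const budgetFields = buildThinkingFields(false)
```

A discriminated union on `type` lets the compiler reject a request that mixes `budget_tokens` with adaptive thinking, which is the combination the PR comments say returns a 400 error.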
🧩 Analysis chain
🌐 Web query:
What are the official Amazon Bedrock pricing rates for model ID "anthropic.claude-opus-4-7" (input, output, prompt cache write/read), including any 1M-context tier pricing and the exact effective date/region notes?
💡 Result:
Amazon Bedrock (model ID anthropic.claude-opus-4-7) pricing rates (standard on-demand inference):
1) Tokens
- Input: $5.00 per 1M input tokens [1]
- Output: $25.00 per 1M output tokens [1]
2) Prompt caching (cache write / cache read)
- Cache read (hits): $0.50 per 1M input tokens (10% of standard input) [2]
- Cache write: not directly published as $/token in the AWS prompt-caching doc we found, but Anthropic’s pricing multipliers indicate cache write is 1.25x base for 5-minute cache writes and 2x base for 1-hour cache writes, implying for Opus 4.7:
  - 5-minute cache write: $6.25 per 1M input tokens [2]
  - 1-hour cache write: $10.00 per 1M input tokens [2]
3) 1M-context tier pricing
- No separate “1M-context tier” surcharge is listed in the sources found; Opus 4.7’s standard input/output rates apply for the 1M context window (i.e., context length doesn’t add a separate pricing tier in these sources). [2]
4) Exact effective date / region notes
- AWS prompt caching is available in all AWS Regions where the supported models are available, but the caching page does not provide an “effective date” for the specific model’s rates in the content surfaced. [3]
- The sources above do not provide an explicit effective date/region schedule for Bedrock pricing for anthropic.claude-opus-4-7; they only state the general availability and that you should use the Bedrock pricing page for the current per-model rates. [3]
Note: Bedrock pricing can vary by region and by active features (on-demand vs batch vs provisioned throughput vs reserved tiers). For the “official” per-region numbers, you should verify directly on the Bedrock Pricing page for your selected region, since the AWS HTML pricing table content was not fully extractable from the sources we captured here. [4][3]
Citations:
🌐 Web query:
Bedrock pricing "anthropic.claude-opus-4-7" 1M context "context-1m" tier pricing input output cache write/read
💡 Result:
Bedrock model: anthropic.claude-opus-4-7; 1M context (“context-1m”)
On-demand per 1M tokens (base prices)
- Input: $5.00 / 1M input tokens [1][2]
- Output: $25.00 / 1M output tokens [1][2]
Prompt caching (Claude prompt caching multipliers; Bedrock is billed per-token using its pricing page, and the cache multipliers are defined by Claude’s pricing docs)
- Cache write (5-minute TTL): 1.25× base input price [2] => $6.25 / 1M tokens [2]
- Cache write (1-hour TTL): 2.0× base input price [2] => $10.00 / 1M tokens [2]
- Cache read (hit): 0.1× base input price [2] => $0.50 / 1M tokens [2]
Cache durations
- Cache read (hit) tokens use the same duration as the preceding write [2].
Context window / “context-1m”
- Claude Opus 4.7 is specified with a 1M context window and uses standard pricing across the 1M context window (caching/batch discounts apply at standard rates) [1][2].
If you share the AWS region plus whether you mean Bedrock on-demand vs Batch vs Provisioned Throughput, I can map these into the exact Bedrock line items for that configuration (the Bedrock model card points to the Bedrock pricing page for the definitive rates) [3].
Citations:
🌐 Web query:
"context-1m-2025-08-07" pricing claude opus bedrock
💡 Result:
For the Amazon Bedrock “Claude Opus” 1M-context beta variant identified by the header value context-1m-2025-08-07, the pricing works as follows:
1) How to enable the 1M context variant
You must include the anthropic_beta parameter/header value “context-1m-2025-08-07” in your request body; otherwise you’re limited to the standard ~200K context window (and may get an error if you exceed 200K tokens). [1]
2) Base model pricing you will pay (on-demand)
AWS Bedrock’s standard token pricing for Claude Opus is listed as:
- Input: $5.00 per 1M input tokens
- Output: $25.00 per 1M output tokens
(These are the on-demand rates for the Opus model versions shown in sources.) [2][3]
3) What changes for long-context requests (premium pricing)
AWS states (in its Bedrock/Claude long-context beta documentation) that for requests exceeding 200K input tokens, the long-context rate applies to the entire request (not only the tokens above the threshold). [4]
Because AWS’s public “Bedrock pricing” page is dynamic by model/provider/region and does not show the long-context beta “context-1m-2025-08-07” premium multiplier directly in the sources I found, I cannot state the exact long-context premium percentage/multiplier for that specific beta value from AWS primary pricing text.
Practical takeaway: expect the request to be billed at the long-context premium whenever your request exceeds 200,000 input tokens, and otherwise use the standard Opus $5 / $25 per 1M input/output token rates. [4][1]
If you tell me your AWS region and which exact Opus model ID you’re calling in Bedrock (e.g., anthropic.claude-opus-4.x or opus-4.6 vs opus-4.7), I can re-check the Bedrock pricing page for the exact long-context token rates that apply there.
Citations:
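How the beta flag might be attached to a request can be sketched as follows. Treat it as a minimal illustration under assumptions: `ONE_M_CONTEXT_MODEL_IDS` is a hypothetical stand-in for the project's `BEDROCK_1M_CONTEXT_MODEL_IDS` (whose real contents are not shown in this PR), and `withContext1M` is a helper invented for this example.

```typescript
// Hypothetical list; whether this model ID is in the real constant is an assumption.
const ONE_M_CONTEXT_MODEL_IDS: readonly string[] = ["anthropic.claude-opus-4-7"]
const CONTEXT_1M_BETA = "context-1m-2025-08-07"

interface AdditionalFields {
  anthropic_beta?: string[]
  [key: string]: unknown
}

// Add the 1M-context beta flag only when the user opted in and the base
// model ID is on the supported list; otherwise return the fields unchanged.
function withContext1M(
  fields: AdditionalFields | undefined,
  baseModelId: string,
  optedIn: boolean,
): AdditionalFields | undefined {
  const enabled = optedIn && ONE_M_CONTEXT_MODEL_IDS.indexOf(baseModelId) !== -1
  if (!enabled) {
    return fields
  }
  const existing = fields?.anthropic_beta ?? []
  // Avoid sending the same beta flag twice.
  const betas = existing.indexOf(CONTEXT_1M_BETA) !== -1 ? existing : [...existing, CONTEXT_1M_BETA]
  return { ...fields, anthropic_beta: betas }
}

const flagged = withContext1M({ thinking: { type: "adaptive" } }, "anthropic.claude-opus-4-7", true)
const unflagged = withContext1M(undefined, "anthropic.claude-opus-4-7", false)
```

Here `flagged` keeps its `thinking` entry and gains `anthropic_beta`, while `unflagged` stays `undefined` because the user did not opt in.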
🌐 Web query:
AWS Bedrock prompt caching cache write price multiplier 5-minute vs 1-hour
💡 Result:
Answer: For Amazon Bedrock prompt caching (Claude models that support 1-hour TTL), the cache-write price multiplier is 1.25x of the normal input price for the default 5-minute cache, and 2x of the normal input price for the 1-hour cache. Cache reads are ~0.1x of the normal input price in both cases.
Evidence:
- AWS Bedrock docs confirm the TTL options (5 minutes or 1 hour) are available for certain Claude 4.5 models, and that the default is 5 minutes when no ttl is provided. [1]
- AWS “what’s new” notes that the 1-hour TTL is billed at a different rate than the standard 5-minute cache. [2]
- Multiple sources that summarize the published Bedrock/Claude prompt-caching rate card state: 5-minute cache writes cost 1.25x base input, while 1-hour cache writes cost 2x base input, and cache reads cost 0.1x (90% discount) of base input. [3][4]
So, the specific “multiplier” comparison you asked for is:
- 1-hour cache write vs 5-minute cache write: 2x / 1.25x = 1.6x more expensive to write when using the 1-hour TTL.
Notes:
- This is about the cache-write rate (the token cost for “writing/creating/refreshing” a cached prompt prefix). The read rate (when the next request hits the cached prefix) stays much lower. [1][3][4]
Citations:
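The multiplier arithmetic quoted in these results is easy to check mechanically. The sketch below assumes the multipliers (1.25x, 2x, 0.1x) and the $5.00 base input price exactly as reported above; the function and field names are illustrative and do not come from the PR.

```typescript
// Multipliers as quoted in the query results, relative to the base input price.
const CACHE_WRITE_5M_MULTIPLIER = 1.25
const CACHE_WRITE_1H_MULTIPLIER = 2.0
const CACHE_READ_MULTIPLIER = 0.1

interface CachePrices {
  cacheWrite5m: number
  cacheWrite1h: number
  cacheRead: number
}

// Round to four decimals so floating-point noise never leaks into a price.
function roundPrice(value: number): number {
  return Math.round(value * 1e4) / 1e4
}

// Derive cache prices (USD per 1M tokens) from a base input price (USD per 1M tokens).
function deriveCachePrices(baseInputPrice: number): CachePrices {
  return {
    cacheWrite5m: roundPrice(baseInputPrice * CACHE_WRITE_5M_MULTIPLIER),
    cacheWrite1h: roundPrice(baseInputPrice * CACHE_WRITE_1H_MULTIPLIER),
    cacheRead: roundPrice(baseInputPrice * CACHE_READ_MULTIPLIER),
  }
}

const opus47CachePrices = deriveCachePrices(5.0)
```

For a $5.00 base this yields 6.25, 10 and 0.5, matching the 5-minute write, 1-hour write and read figures in the results.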
Fix unverified extended-context (context-1m) tier pricing for Opus 4.7
In packages/types/src/providers/bedrock.ts (lines ~176-190), the base on-demand rates ($5/$25 input/output) and prompt-caching rates (cacheReadsPrice: 0.5, cacheWritesPrice: 6.25) match published Claude prompt-caching multipliers (5-minute TTL) and the published Opus base pricing.
However, the extended-context tier values in tiers[0] (marked context-1m-2025-08-07 / >200K): inputPrice: 10.0, outputPrice: 37.5, cacheWritesPrice: 12.5, cacheReadsPrice: 1.0 are not backed by the Bedrock/Claude sources found; long-context beta “context-1m” pricing is not explicitly published in a way that establishes these exact numbers.
Replace the tiers[0] pricing with the exact official Bedrock long-context beta rates for anthropic.claude-opus-4-7 (or omit tiers until verified) to avoid incorrect cost reporting/budget UX for >200K requests.
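To see why the tier values matter for cost reporting, consider a rough sketch of tier-aware cost estimation. Assumptions are flagged throughout: the `aboveInputTokens` field and the `estimateCost` helper are hypothetical, the rule that a crossed threshold reprices the whole request is taken from the query results above, and the tier rates are the PR's unverified numbers, used only to show the size of the difference.

```typescript
interface Rates {
  inputPrice: number // USD per 1M input tokens
  outputPrice: number // USD per 1M output tokens
}

// Hypothetical tier shape: applies when input tokens exceed the threshold.
interface Tier extends Rates {
  aboveInputTokens: number
}

interface ModelPricing extends Rates {
  tiers?: Tier[]
}

// Pick the highest tier whose threshold the request exceeds and bill the
// entire request at that tier's rates; fall back to base rates otherwise.
function estimateCost(pricing: ModelPricing, inputTokens: number, outputTokens: number): number {
  let rates: Rates = pricing
  let bestThreshold = -1
  for (const tier of pricing.tiers ?? []) {
    if (inputTokens > tier.aboveInputTokens && tier.aboveInputTokens > bestThreshold) {
      rates = tier
      bestThreshold = tier.aboveInputTokens
    }
  }
  return (inputTokens * rates.inputPrice + outputTokens * rates.outputPrice) / 1e6
}

// Base rates from the review; tiers omitted until verified.
const verifiedOnly: ModelPricing = { inputPrice: 5.0, outputPrice: 25.0 }
// Same base rates plus the PR's UNVERIFIED >200K tier values.
const withUnverifiedTier: ModelPricing = {
  ...verifiedOnly,
  tiers: [{ aboveInputTokens: 200000, inputPrice: 10.0, outputPrice: 37.5 }],
}

const costWithoutTier = estimateCost(verifiedOnly, 300000, 10000)
const costWithTier = estimateCost(withUnverifiedTier, 300000, 10000)
```

For a request with 300,000 input and 10,000 output tokens, the two configurations report $1.75 and $3.375 respectively, so an unverified tier nearly doubles the displayed cost. Keeping `tiers` optional makes "omit until verified" a safe default.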