feat(pepper): route Bedrock calls through tagged inference profiles [DEV-245] #42
Merged
…DEV-245] Adds per-mode AWS Application Inference Profiles so we can attribute Bedrock spend to review vs on-demand flows in Cost Explorer:
- review_model defaults to pepper-pr-review profile (Mode=review)
- on_demand_model defaults to pepper-on-demand profile (Mode=on-demand)
- model input becomes an opt-in override that wins for both modes

Both profiles wrap the same Sonnet 4.5 system inference profile that was previously used directly, so behavior is unchanged; only billing attribution changes.

Cost allocation tag activation in AWS Billing is follow-up: the tags only register after the first invocation through each profile.
Verified against DEV-245 — aligned. The workflow now routes review-mode and on-demand-mode Bedrock calls through separate Application Inference Profiles tagged Mode=review and Mode=on-demand, so Cost Explorer can finally tell you which flow is eating the budget. The three-input hierarchy (override wins, else per-mode default) is clearly documented and the resolution step at :348–367 branches exactly right. Test plan is observational (check the resolution logs, wait 24h for billing to populate) — correct shape for a config-wiring change where the proof is in the spend attribution downstream.
Yours,
Pepper
When you're ready for another look, drop a comment with @pepper review.
Summary
Closes DEV-245. Pepper currently runs both review and on-demand flows through the same Bedrock model ID, so Cost Explorer collapses them into one undifferentiated line item — we can't tell which flow drives spend.
This wires Pepper through two AWS Application Inference Profiles, both wrapping the existing us.anthropic.claude-sonnet-4-5-20250929-v1:0 system inference profile, each carrying its own cost-allocation tags:
- pepper-pr-review: arn:aws:bedrock:us-west-2:618640261060:application-inference-profile/cz21awrop223 (Product=pepper, Mode=review)
- pepper-on-demand: arn:aws:bedrock:us-west-2:618640261060:application-inference-profile/68jw718dw1jv (Product=pepper, Mode=on-demand)

Both profiles already exist in account 618640261060. The GitHubActions-ClaudeCode-Bedrock role's existing BedrockModelAccess policy already grants InvokeModel* on application-inference-profile/*, so no IAM change is required.

Workflow changes
- review_model and on_demand_model inputs default to the two profile ARNs.
- model input becomes an empty-default override that wins for both modes (back-compat for any caller currently setting model:; there are none in this repo's example).
- Resolve model for this run step picks the right ARN per resolved mode.
- claude_args: --model reads from the resolved value.

Follow-up (not in this PR)
aws ce update-cost-allocation-tags-status errored with Tag keys not found. After this lands and the first review runs, activate Product and Mode in Billing → Cost Allocation Tags. Cost Explorer will populate ~24h later.

Test plan
- Resolve model for this run step output
- @pepper comment triggers on-demand mode; on-demand profile ARN appears in resolution step output
- Product and Mode tag keys are selectable in Billing → Cost Allocation Tags
- Mode shows two non-zero series for Bedrock spend
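
For anyone checking the resolution step output against this test plan, the override-then-default order described under Workflow changes can be sketched as a small shell function. This is an illustration only, not the workflow's actual step: the function name resolve_model, its argument order, and the mode strings review and on-demand (taken from the Mode tag values) are assumptions.

```sh
#!/bin/sh
# Hypothetical sketch of the model-resolution order; not the real workflow step.
set -eu

# resolve_model MODE OVERRIDE REVIEW_MODEL ON_DEMAND_MODEL
# Prints the model identifier that would be passed to --model.
resolve_model() {
  # $1 = resolved mode, $2 = "model" override (may be empty),
  # $3 = review_model default, $4 = on_demand_model default
  if [ -n "$2" ]; then
    # A non-empty override wins for both modes.
    printf '%s\n' "$2"
    return 0
  fi

  # Otherwise fall back to the per-mode default.
  case "$1" in
    review)    printf '%s\n' "$3" ;;
    on-demand) printf '%s\n' "$4" ;;
    *)
      # Mode names are assumed; anything else is treated as an error.
      echo "unknown mode: $1" >&2
      return 1
      ;;
  esac
}

# Profile ARNs as listed in the PR description.
REVIEW_ARN="arn:aws:bedrock:us-west-2:618640261060:application-inference-profile/cz21awrop223"
ON_DEMAND_ARN="arn:aws:bedrock:us-west-2:618640261060:application-inference-profile/68jw718dw1jv"

# With no override, each mode resolves to its own profile ARN.
resolve_model review "" "$REVIEW_ARN" "$ON_DEMAND_ARN"
resolve_model on-demand "" "$REVIEW_ARN" "$ON_DEMAND_ARN"
```

With an empty override, a review run should print the pepper-pr-review ARN and an on-demand run the pepper-on-demand ARN; passing any non-empty override as the second argument returns that value for both modes.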