Add integration with Bedrock Guardrails
Overview
Add support for Amazon Bedrock Guardrails so that model responses stay within required boundaries. This enables content filtering, PII redaction, topic blocking, and word filtering to ensure agents comply with organizational policies, regulatory requirements, and safety standards.
Requirements
R1: Guardrail Policy Configuration
Users should be able to specify required guardrail policies. This includes:
Selection of pre-configured Bedrock Guardrails by ARN or ID
Configuration at multiple levels:
Global/default guardrails for all agents
Per-agent guardrails for specific use cases
Per-invocation guardrails via API parameters
Policy categories to configure:
Content filters (hate, insults, sexual, violence)
Denied topics (custom topic blocking)
Word filters (profanity and custom word lists)
PII redaction (sensitive data masking)
Handling of guardrail interventions:
Block and return error message
Redact content and continue
Log and continue (monitoring mode)
Visibility into guardrail actions (logging and UI display)
Technical Considerations
AWS Bedrock Guardrails Overview
Bedrock Guardrails provide:
Content Filters: Filter harmful content across categories (hate, insults, sexual, violence) with configurable thresholds (NONE, LOW, MEDIUM, HIGH)
Denied Topics: Block responses related to specific topics (e.g., financial advice, medical diagnosis)
Word Filters: Block or redact profanity and custom word lists
Sensitive Information Filters: Detect and redact PII (SSN, credit cards, email, phone, etc.)
Contextual Grounding: Check whether responses are grounded in source content (to reduce hallucinations)
Configuration Management
Update etc/environment.sh to include guardrail configuration:
Default guardrail ID/ARN
Per-agent guardrail overrides
Guardrail version to use (for versioned guardrails)
Intervention behavior (block, redact, monitor)
Store guardrail configurations in database:
Map user/agent to guardrail policies
Track guardrail intervention history
Environment-specific guardrails (dev vs production):
Dev: Lenient or monitoring mode for testing
Prod: Strict enforcement
Backend Changes
Bedrock API Integration
Guardrail Parameters: Add to Bedrock API calls:
guardrailIdentifier: Guardrail ID or ARN
guardrailVersion: Version of the guardrail (or "DRAFT")
Apply to both InvokeModel and InvokeModelWithResponseStream
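Response Handling: Process guardrail trace information:
Check for an intervention: stopReason "guardrail_intervened" on the Converse API, or the amazon-bedrock-guardrailAction field in the parsed InvokeModel response body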
Extract guardrail action from response metadata
Log intervention details for auditing
Provider Abstraction Updates
Extend the provider interface (from issue #74, "Add support for alternate LLM providers") to support guardrails:
configure_guardrail(guardrail_config): Set guardrail for provider
handle_guardrail_intervention(response): Process interventions
Bedrock-specific implementation initially
Consider guardrail equivalents for other providers:
OpenAI Moderation API
Anthropic API safety features
Provider-agnostic wrapper for consistent behavior
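A minimal sketch of how the two proposed methods could sit on a provider base class. The class names (LLMProvider, BedrockProvider, GuardrailConfig, InterventionResult) are hypothetical, since the real interface comes from issue #74, and the response field names are what InvokeModel is expected to return in its parsed body; verify them against the Bedrock documentation.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class GuardrailConfig:
    identifier: str          # guardrail ID or ARN
    version: str = "DRAFT"


@dataclass(frozen=True)
class InterventionResult:
    intervened: bool
    detail: Optional[Dict[str, Any]] = None


class LLMProvider(ABC):
    """Hypothetical base class; stands in for the interface from issue #74."""

    def __init__(self) -> None:
        self.guardrail: Optional[GuardrailConfig] = None

    def configure_guardrail(self, guardrail_config: Optional[GuardrailConfig]) -> None:
        # Passing None disables guardrails for this provider instance.
        self.guardrail = guardrail_config

    @abstractmethod
    def handle_guardrail_intervention(self, response: Dict[str, Any]) -> InterventionResult:
        """Inspect a parsed provider response and report any intervention."""


class BedrockProvider(LLMProvider):
    def handle_guardrail_intervention(self, response: Dict[str, Any]) -> InterventionResult:
        # Assumed field names for a parsed InvokeModel response body.
        if response.get("amazon-bedrock-guardrailAction") != "INTERVENED":
            return InterventionResult(intervened=False)
        trace = response.get("amazon-bedrock-trace", {}).get("guardrail")
        return InterventionResult(intervened=True, detail=trace)
```

Other providers would subclass the same base and map their own moderation signals onto InterventionResult, which keeps the calling code provider-agnostic.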
Database Schema
Extend existing tables or add new ones:
guardrail_configs table:
id, name, guardrail_arn, guardrail_version
config_json (store full guardrail configuration)
applies_to (global, agent_id, user_id)
created_at, updated_at
Update invocations table:
guardrail_id (reference to applied guardrail)
guardrail_intervened (boolean)
guardrail_action (blocked, redacted, none)
guardrail_trace (JSON with intervention details)
Intervention Handling
Three intervention strategies:
Block: Stop execution, return error to user
Status: 400 or 403
Error message: "Content blocked by guardrails policy"
Details: Which policy was violated
Redact: Replace sensitive content with placeholders
Continue execution with redacted content
Log redaction for audit
Show redaction indicators in UI
Monitor: Allow content but log for review
Useful for testing and tuning policies
Administrators review flagged content
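The three strategies above could be dispatched from a single function, sketched below. The Intervention dataclass, the apply_intervention name, and the choice of 403 over 400 are illustrative assumptions, as is the fail-closed fallback when redact mode has no masked text to return.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class Intervention:
    """Hypothetical summary of what the guardrail reported for one response."""
    policy: str                          # e.g. "denied_topics"
    reason: str
    redacted_text: Optional[str] = None  # masked output, if one was produced


def apply_intervention(
    mode: str,
    original_text: str,
    intervention: Intervention,
    audit_log: List[Dict[str, Any]],
) -> Tuple[int, Dict[str, Any]]:
    """Return (HTTP status, response body) for one guardrail intervention."""
    # Every strategy records the intervention for later audit.
    audit_log.append(
        {"mode": mode, "policy": intervention.policy, "reason": intervention.reason}
    )

    if mode == "redact" and intervention.redacted_text is not None:
        return 200, {"content": intervention.redacted_text, "guardrail_action": "redacted"}
    if mode == "monitor":
        # Content passes through unchanged; only the audit entry marks it.
        return 200, {"content": original_text, "guardrail_action": "none"}
    if mode in ("block", "redact"):
        # "redact" without masked text fails closed and is treated as a block.
        return 403, {
            "error": "GuardrailIntervention",
            "message": "Content blocked by guardrails policy",
            "details": {
                "action": "blocked",
                "policy_violated": intervention.policy,
                "reason": intervention.reason,
            },
        }
    raise ValueError(f"unknown intervention mode: {mode!r}")
```

Keeping the audit write ahead of the branching means monitor mode and block mode produce identical audit entries, which makes it easier to compare policies before switching a guardrail from monitoring to enforcement.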
Frontend Changes
Guardrail Configuration UI
Settings Page: Configure guardrails
Select from available Bedrock Guardrails
Preview guardrail policies
Set intervention behavior
Assign guardrails to agents
Agent Configuration: Per-agent guardrail selection
Override default guardrails for specific agents
View active policies
Invocation Display
Guardrail Indicators: Show when guardrails are active
Badge/icon indicating guardrail enabled
Tooltip with guardrail name and policies
Intervention Messages: Clear display when content is blocked
"This response was blocked by content policy"
Specific reason (if safe to display)
Guidance for users on next steps
Redaction Display: Show when content was redacted
Redacted text with placeholder: [REDACTED: EMAIL]
Visual styling to indicate redaction
Monitoring Dashboard
Guardrail Activity: New dashboard section
Intervention count by policy type
Trends over time
Top violating agents or users
Content category breakdown (hate, violence, PII, etc.)
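As a rough illustration of the aggregation the dashboard would need, the sketch below counts interventions by policy type and by agent. The row shape is an assumption: guardrail_intervened matches the proposed invocations column, while policy_violated and agent_id are hypothetical keys that would have to be derived from guardrail_trace and the invocation record.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def summarize_interventions(rows: Iterable[Dict[str, Any]], top_n: int = 3) -> Dict[str, Any]:
    """Aggregate invocation rows into dashboard-style counts."""
    by_policy: Counter = Counter()
    by_agent: Counter = Counter()
    for row in rows:
        if not row.get("guardrail_intervened"):
            continue  # only rows where a guardrail acted are counted
        by_policy[row.get("policy_violated", "unknown")] += 1
        by_agent[row.get("agent_id", "unknown")] += 1
    return {
        "total": sum(by_policy.values()),
        "by_policy": dict(by_policy),
        "top_agents": by_agent.most_common(top_n),
    }


sample_rows = [
    {"agent_id": "customer-support", "guardrail_intervened": True, "policy_violated": "pii"},
    {"agent_id": "customer-support", "guardrail_intervened": True, "policy_violated": "denied_topics"},
    {"agent_id": "internal-tools", "guardrail_intervened": False},
    {"agent_id": "internal-tools", "guardrail_intervened": True, "policy_violated": "pii"},
]
summary = summarize_interventions(sample_rows)
```

In production this would more likely be a SQL GROUP BY over the invocations table; the in-memory version only shows the shape of the result.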
IAM Permissions
Update IAM policies in iac/ templates:
bedrock:ApplyGuardrail: Permission to apply guardrails
bedrock:GetGuardrail: Permission to retrieve guardrail details
bedrock:ListGuardrails: Permission to list available guardrails
Least privilege: Only grant access to specific guardrail ARNs if possible
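One way to express the least-privilege idea is to generate the policy document from an explicit list of guardrail ARNs, as in this sketch. The function name, the Sid values, and the example ARN are placeholders; whether bedrock:ListGuardrails can be scoped more tightly than "*" should be checked against the AWS service authorization reference.

```python
import json
from typing import Any, Dict, List


def guardrail_policy_document(guardrail_arns: List[str]) -> Dict[str, Any]:
    """Build an IAM policy dict that scopes guardrail use to the given ARNs."""
    if not guardrail_arns:
        raise ValueError("at least one guardrail ARN is required")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "UseSpecificGuardrails",
                "Effect": "Allow",
                "Action": ["bedrock:ApplyGuardrail", "bedrock:GetGuardrail"],
                "Resource": sorted(guardrail_arns),
            },
            {
                # Listing is kept in its own statement; "*" is an assumption here.
                "Sid": "ListGuardrails",
                "Effect": "Allow",
                "Action": ["bedrock:ListGuardrails"],
                "Resource": "*",
            },
        ],
    }


policy = guardrail_policy_document(
    ["arn:aws:bedrock:us-east-1:123456789012:guardrail/abc123"]  # placeholder ARN
)
policy_json = json.dumps(policy, indent=2)
```

The resulting JSON can be pasted into a SAM template policy or used to review what the iac/ templates grant.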
Logging & Auditing
CloudWatch Logs: Log all guardrail interventions
Timestamp, user, agent, invocation ID
Intervention type and reason
Original content (if safe to log)
Redacted content or block reason
Compliance: May be required for regulatory compliance
HIPAA: PII redaction audit trail
Financial services: Topic blocking audit trail
Alerts: Notify on unusual intervention patterns
Sudden spike in blocks
Repeated violations by user/agent
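A structured audit record along these lines would carry the fields listed above as one JSON line per intervention. The function and field names are assumptions, and the log_original flag is a hypothetical switch corresponding to the "if safe to log" caveat; it defaults to off so that original PII is never written by accident.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def build_audit_record(
    *,
    user_id: str,
    agent_id: str,
    invocation_id: str,
    intervention_type: str,   # "blocked", "redacted", or "monitored"
    reason: str,
    original_content: Optional[str] = None,
    log_original: bool = False,
) -> str:
    """Return one JSON line describing a guardrail intervention."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "agent_id": agent_id,
        "invocation_id": invocation_id,
        "intervention_type": intervention_type,
        "reason": reason,
    }
    # Original content is included only when explicitly allowed.
    if log_original and original_content is not None:
        record["original_content"] = original_content
    return json.dumps(record, sort_keys=True)


line = build_audit_record(
    user_id="u-1",
    agent_id="customer-support",
    invocation_id="inv-42",
    intervention_type="redacted",
    reason="email address detected",
    original_content="contact me at someone@example.com",
)
```

Emitting the line through the standard logging module lets the existing CloudWatch log shipping pick it up without extra plumbing.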
Testing Strategy
Unit Tests: Mock Bedrock responses with guardrail interventions
Integration Tests: Use Bedrock test guardrails
Policy Testing: Test each policy type:
Content filters: Submit harmful content
Denied topics: Request blocked topics
Word filters: Use profanity
PII: Include SSN, credit cards, etc.
Redaction Testing: Verify PII is properly masked
Performance: Measure latency impact of guardrails
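For the unit-test item, a mocked client is enough to exercise the intervention check without calling AWS. The sketch below uses unittest.mock; the helper names are hypothetical, and the response shape (a body stream containing an amazon-bedrock-guardrailAction field) is an assumption about InvokeModel that should be confirmed against a real response.

```python
import io
import json
from unittest.mock import MagicMock


def guardrail_intervened(client, model_id, request_body, guardrail_id, guardrail_version):
    """Call InvokeModel with a guardrail and report whether it intervened."""
    response = client.invoke_model(
        modelId=model_id,
        body=json.dumps(request_body),
        guardrailIdentifier=guardrail_id,
        guardrailVersion=guardrail_version,
    )
    payload = json.loads(response["body"].read())
    return payload.get("amazon-bedrock-guardrailAction") == "INTERVENED"


def make_fake_client(action):
    """Build a stand-in for the bedrock-runtime client with a canned response."""
    client = MagicMock()
    body = json.dumps({"amazon-bedrock-guardrailAction": action}).encode("utf-8")
    client.invoke_model.return_value = {"body": io.BytesIO(body)}
    return client


def test_intervention_detected():
    client = make_fake_client("INTERVENED")
    assert guardrail_intervened(client, "model-x", {"prompt": "hi"}, "guardrail-abc123", "1")
    # The guardrail parameters must actually be forwarded to the API call.
    kwargs = client.invoke_model.call_args.kwargs
    assert kwargs["guardrailIdentifier"] == "guardrail-abc123"
    assert kwargs["guardrailVersion"] == "1"


def test_no_intervention():
    client = make_fake_client("NONE")
    assert not guardrail_intervened(client, "model-x", {"prompt": "hi"}, "guardrail-abc123", "1")
```

Both functions follow the pytest naming convention, so they would be collected automatically if placed in backend/tests/.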
Acceptance Criteria
AC1: Guardrail Configuration
AC2: Bedrock Integration
AC3: Intervention Handling
AC4: Frontend Display
AC5: Auditing & Compliance
AC6: Testing
Implementation Notes
Suggested Approach
Phase 1: Infrastructure & Configuration (Week 1)
Research Bedrock Guardrails API and capabilities
Create or identify test guardrails in Bedrock
Design database schema for guardrail configs
Add configuration parameters to etc/environment.sh
Update IAM policies with guardrail permissions
Create database migration for new tables/columns
Phase 2: Backend Integration (Week 1-2)
Extend provider interface to support guardrails
Implement Bedrock guardrail integration:
Add guardrail parameters to API calls
Parse guardrail trace from responses
Handle guardrail_intervened stop reason
Implement intervention handlers (block, redact, monitor)
Add logging for all interventions
Unit tests for guardrail logic
Phase 3: Configuration & Management (Week 2)
Build guardrail configuration service
Implement guardrail assignment (global, agent, invocation)
Create API endpoints for guardrail management
Add guardrail versioning support
Integration tests with real guardrails
Phase 4: Frontend Integration (Week 2-3)
Build guardrail configuration UI
Add guardrail indicators to invocation displays
Implement intervention message components
Create redaction display styling
Build monitoring dashboard for interventions
Phase 5: Auditing & Monitoring (Week 3)
Enhance CloudWatch logging with structured data
Create CloudWatch metrics for interventions
Set up alerts for unusual patterns
Build compliance reporting queries
Documentation for operators and users
Key Files to Modify/Create
Backend:
backend/providers/bedrock.py: Add guardrail parameters to API calls
backend/services/guardrail_service.py: Guardrail configuration and enforcement
backend/models/guardrail.py: Database models for guardrail configs
backend/middleware/guardrail_interceptor.py: Intervention handling
backend/api/guardrails.py: API endpoints for guardrail management
backend/database/migrations/: Add guardrail tables migration
Configuration:
etc/environment.sh: Add guardrail configuration
iac/: Update IAM policies with guardrail permissions
Frontend:
frontend/src/components/GuardrailConfig.tsx: Configuration UI
frontend/src/components/GuardrailIndicator.tsx: Indicator badge
frontend/src/components/GuardrailIntervention.tsx: Intervention message
frontend/src/components/RedactedContent.tsx: Redaction display
frontend/src/pages/GuardrailDashboard.tsx: Monitoring dashboard
frontend/src/api/guardrails.ts: API client
Infrastructure:
iac/: SAM templates with updated IAM policies
Tests:
backend/tests/services/test_guardrail_service.py
backend/tests/integration/test_bedrock_guardrails.py
backend/tests/middleware/test_guardrail_interceptor.py
Example Configuration
# etc/environment.sh additions
# Default guardrail
export DEFAULT_GUARDRAIL_ID="guardrail-abc123"
export DEFAULT_GUARDRAIL_VERSION="1"
# Intervention behavior
export GUARDRAIL_INTERVENTION_MODE="block" # block, redact, monitor
# Per-agent overrides (JSON format)
export AGENT_GUARDRAILS='{
  "customer-support": "guardrail-xyz789",
  "internal-tools": "guardrail-def456"
}'
# Logging
export LOG_GUARDRAIL_INTERVENTIONS="true"
export LOG_REDACTED_CONTENT="false" # Don't log original PII
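The backend could read these variables roughly as follows. This is a sketch: the resolve_guardrail name and the precedence order (per-invocation, then per-agent, then default) are assumptions drawn from the three configuration levels in R1, and per-agent overrides reuse the default version because the example configuration does not define one per agent.

```python
import json
import os
from typing import Mapping, NamedTuple, Optional


class GuardrailSelection(NamedTuple):
    identifier: str
    version: str
    mode: str


def resolve_guardrail(
    agent_name: str,
    invocation_override: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Optional[GuardrailSelection]:
    """Pick the guardrail for one call: invocation > agent > default (assumed order)."""
    per_agent = json.loads(env.get("AGENT_GUARDRAILS", "{}"))
    identifier = (
        invocation_override
        or per_agent.get(agent_name)
        or env.get("DEFAULT_GUARDRAIL_ID")
    )
    if not identifier:
        return None  # no guardrail configured at any level
    return GuardrailSelection(
        identifier=identifier,
        version=env.get("DEFAULT_GUARDRAIL_VERSION", "DRAFT"),
        mode=env.get("GUARDRAIL_INTERVENTION_MODE", "block"),
    )


example_env = {
    "DEFAULT_GUARDRAIL_ID": "guardrail-abc123",
    "DEFAULT_GUARDRAIL_VERSION": "1",
    "GUARDRAIL_INTERVENTION_MODE": "block",
    "AGENT_GUARDRAILS": '{"customer-support": "guardrail-xyz789"}',
}
selection = resolve_guardrail("customer-support", env=example_env)
```

Passing env explicitly keeps the function testable without touching the process environment.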
Bedrock API Example
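import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")
# request_body is the model-specific request dict, built elsewhere

# Before (without guardrails)
response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps(request_body),
)

# After (with guardrails)
response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps(request_body),
    guardrailIdentifier="guardrail-abc123",
    guardrailVersion="1",
    trace="ENABLED",  # include guardrail trace details in the response body
)

# Check for intervention. InvokeModel returns the model output as a stream in
# response["body"]; the guardrail fields are expected inside that parsed body
# (the Converse API instead reports stopReason == "guardrail_intervened").
payload = json.loads(response["body"].read())
if payload.get("amazon-bedrock-guardrailAction") == "INTERVENED":
    trace = payload.get("amazon-bedrock-trace", {}).get("guardrail", {})
    # Handle based on intervention mode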
Intervention Response Format
{
  "error": "GuardrailIntervention",
  "message": "Content blocked by guardrails policy",
  "details": {
    "guardrail_id": "guardrail-abc123",
    "guardrail_version": "1",
    "action": "blocked",
    "reason": "Content violated topic policy",
    "policy_violated": "denied_topics",
    "topic": "medical_advice"
  }
}
Database Schema Example
CREATE TABLE guardrail_configs (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    guardrail_arn VARCHAR(512) NOT NULL,
    guardrail_version VARCHAR(50) DEFAULT 'DRAFT',
    config_json JSONB,
    applies_to VARCHAR(50), -- 'global', 'agent', 'user'
    applies_to_id VARCHAR(255), -- agent_id or user_id if applicable
    intervention_mode VARCHAR(50) DEFAULT 'block', -- 'block', 'redact', 'monitor'
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Add columns to invocations table
ALTER TABLE invocations ADD COLUMN guardrail_id INTEGER REFERENCES guardrail_configs(id);
ALTER TABLE invocations ADD COLUMN guardrail_intervened BOOLEAN DEFAULT FALSE;
ALTER TABLE invocations ADD COLUMN guardrail_action VARCHAR(50);
ALTER TABLE invocations ADD COLUMN guardrail_trace JSONB;
CREATE INDEX idx_invocations_guardrail ON invocations(guardrail_id);
CREATE INDEX idx_invocations_intervened ON invocations(guardrail_intervened);
Policy Categories Configuration
Example of guardrail policy categories:
{
  "content_filters": {
    "hate": "HIGH",
    "insults": "HIGH",
    "sexual": "MEDIUM",
    "violence": "MEDIUM"
  },
  "denied_topics": [
    {
      "name": "Medical Advice",
      "definition": "Providing medical diagnosis or treatment recommendations",
      "type": "DENY"
    },
    {
      "name": "Financial Advice",
      "definition": "Providing specific investment or trading recommendations",
      "type": "DENY"
    }
  ],
  "word_filters": {
    "profanity": "BLOCK",
    "custom_words": ["confidential", "internal_only"]
  },
  "pii_filters": {
    "email": "REDACT",
    "phone": "REDACT",
    "ssn": "REDACT",
    "credit_card": "BLOCK"
  }
}
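If guardrails were ever created from this JSON rather than pre-configured in the console, the categories would need translating into the shape the Bedrock control-plane API expects. The sketch below shows one possible mapping to keyword arguments for boto3's create_guardrail. The function name is hypothetical, and the target field names and enum values (for example ANONYMIZE for redaction and US_SOCIAL_SECURITY_NUMBER for SSN) reflect my understanding of that API and should be verified before use.

```python
from typing import Any, Dict

# Assumed mapping from this document's short keys to Bedrock PII entity types.
PII_TYPES = {
    "email": "EMAIL",
    "phone": "PHONE",
    "ssn": "US_SOCIAL_SECURITY_NUMBER",
    "credit_card": "CREDIT_DEBIT_CARD_NUMBER",
}
# "REDACT" in this document is assumed to correspond to Bedrock's ANONYMIZE.
PII_ACTIONS = {"REDACT": "ANONYMIZE", "BLOCK": "BLOCK"}


def to_create_guardrail_kwargs(policy: Dict[str, Any]) -> Dict[str, Any]:
    """Translate the policy-category JSON into create_guardrail-style kwargs."""
    words = policy.get("word_filters", {})
    word_config: Dict[str, Any] = {
        "wordsConfig": [{"text": w} for w in words.get("custom_words", [])]
    }
    if words.get("profanity") == "BLOCK":
        word_config["managedWordListsConfig"] = [{"type": "PROFANITY"}]

    return {
        "contentPolicyConfig": {
            "filtersConfig": [
                # The same strength is applied to inputs and outputs here.
                {"type": name.upper(), "inputStrength": level, "outputStrength": level}
                for name, level in policy.get("content_filters", {}).items()
            ]
        },
        "topicPolicyConfig": {
            "topicsConfig": [
                {"name": t["name"], "definition": t["definition"], "type": t["type"]}
                for t in policy.get("denied_topics", [])
            ]
        },
        "wordPolicyConfig": word_config,
        "sensitiveInformationPolicyConfig": {
            "piiEntitiesConfig": [
                {"type": PII_TYPES[kind], "action": PII_ACTIONS[action]}
                for kind, action in policy.get("pii_filters", {}).items()
            ]
        },
    }


example_policy = {
    "content_filters": {"hate": "HIGH", "violence": "MEDIUM"},
    "denied_topics": [
        {"name": "Medical Advice", "definition": "Providing medical diagnosis", "type": "DENY"}
    ],
    "word_filters": {"profanity": "BLOCK", "custom_words": ["confidential"]},
    "pii_filters": {"email": "REDACT", "credit_card": "BLOCK"},
}
kwargs = to_create_guardrail_kwargs(example_policy)
```

The function only builds a dictionary and makes no AWS calls, so it can be unit-tested in isolation; a real call would also need a name and the blocked-message strings.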
Cost Implications
Bedrock Guardrails add a per-request cost, billed per text unit and varying by policy type (estimated at ~$0.01 per 1000 text units; verify against current AWS pricing)
Update cost tracking to include guardrail costs
Maintain consistency with formatCost functions
Show guardrail costs separately in dashboards
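A back-of-the-envelope estimate can be computed per request once the rate is known. The sketch assumes a text unit covers up to 1000 characters and that each enabled policy type is billed separately at the same rate; both the helper names and the flat-rate simplification are assumptions, and the price is passed in as a parameter rather than hard-coded.

```python
import math


def text_units(text: str) -> int:
    """Number of billable text units, assuming 1000 characters per unit."""
    return math.ceil(len(text) / 1000)


def estimate_guardrail_cost(
    text: str,
    price_per_1000_units: float,
    policies_applied: int = 1,
) -> float:
    """Estimated guardrail cost in dollars for evaluating one piece of text."""
    return text_units(text) * policies_applied * price_per_1000_units / 1000


units = text_units("a" * 2500)
cost = estimate_guardrail_cost("a" * 2500, price_per_1000_units=0.01, policies_applied=2)
```

The result can be passed through the existing formatCost functions so guardrail costs render the same way as model costs.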
References
Dependencies
Priority
Medium-High - Important for compliance and safety in production deployments
Estimated Effort
Medium-Large (2-3 weeks) - Requires Bedrock integration, database changes, frontend work, and compliance considerations