Add integration with Bedrock Guardrails #76

Description

@heeki

Overview

Add support for Amazon Bedrock Guardrails so that model responses stay within required boundaries. This enables content filtering, PII redaction, topic blocking, and word filtering to ensure agents comply with organizational policies, regulatory requirements, and safety standards.

Requirements

R1: Guardrail Policy Configuration

Users should be able to specify required guardrail policies. This includes:

  • Selection of pre-configured Bedrock Guardrails by ARN or ID
  • Configuration at multiple levels:
    • Global/default guardrails for all agents
    • Per-agent guardrails for specific use cases
    • Per-invocation guardrails via API parameters
  • Policy categories to configure:
    • Content filters (hate, insults, sexual, violence)
    • Denied topics (custom topic blocking)
    • Word filters (profanity and custom word lists)
    • PII redaction (sensitive data masking)
  • Handling of guardrail interventions:
    • Block and return error message
    • Redact content and continue
    • Log and continue (monitoring mode)
  • Visibility into guardrail actions (logging and UI display)
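
The three configuration levels imply a precedence order. A minimal sketch, assuming a per-invocation override wins over a per-agent setting, which in turn wins over the global default (the requirement does not state the order explicitly). `GuardrailRef` and `resolve_guardrail` are hypothetical names used only for illustration:

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class GuardrailRef:
    """Identifies one guardrail version (hypothetical helper type)."""
    identifier: str  # guardrail ID or ARN
    version: str = "DRAFT"


def resolve_guardrail(
    default: Optional[GuardrailRef],
    agent_overrides: Mapping[str, GuardrailRef],
    agent_id: str,
    invocation_override: Optional[GuardrailRef] = None,
) -> Optional[GuardrailRef]:
    """Pick the most specific guardrail: invocation, then agent, then global."""
    if invocation_override is not None:
        return invocation_override
    # Fall back to the global default when the agent has no override.
    return agent_overrides.get(agent_id, default)
```

Returning `None` when nothing is configured lets the caller decide whether a missing guardrail is an error or simply means "no guardrail".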

Technical Considerations

AWS Bedrock Guardrails Overview

Bedrock Guardrails provide:

  • Content Filters: Filter harmful content across categories (hate, insults, sexual, violence, plus misconduct and prompt attacks) with a configurable filter strength (NONE, LOW, MEDIUM, HIGH)
  • Denied Topics: Block responses related to specific topics (e.g., financial advice, medical diagnosis)
  • Word Filters: Block or redact profanity and custom word lists
  • Sensitive Information Filters: Detect and redact PII (SSN, credit cards, email, phone, etc.)
  • Contextual Grounding: Check if responses are grounded in source content (reduce hallucinations)

Configuration Management

  • Update etc/environment.sh to include guardrail configuration:
    • Default guardrail ID/ARN
    • Per-agent guardrail overrides
    • Guardrail version to use (for versioned guardrails)
    • Intervention behavior (block, redact, monitor)
  • Store guardrail configurations in database:
    • Map user/agent to guardrail policies
    • Track guardrail intervention history
  • Environment-specific guardrails (dev vs production):
    • Dev: Lenient or monitoring mode for testing
    • Prod: Strict enforcement

Backend Changes

Bedrock API Integration

  • Guardrail Parameters: Add to Bedrock API calls:
    • guardrailIdentifier: Guardrail ID or ARN
    • guardrailVersion: Version of the guardrail (or "DRAFT")
    • Apply to both InvokeModel and InvokeModelWithResponseStream
  • Response Handling: Process guardrail trace information:
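    • Detect interventions: InvokeModel reports them in the response body field amazon-bedrock-guardrailAction (INTERVENED or NONE); the Converse API reports them as stopReason "guardrail_intervened"
    • Extract guardrail trace details from the response (returned only when tracing is enabled on the request)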
    • Log intervention details for auditing

Provider Abstraction Updates

  • Extend provider interface (from issue #74, "Add support for alternate LLM providers") to support guardrails:
    • configure_guardrail(guardrail_config): Set guardrail for provider
    • handle_guardrail_intervention(response): Process interventions
  • Bedrock-specific implementation initially
  • Consider guardrail equivalents for other providers:
    • OpenAI Moderation API
    • Anthropic API safety features
    • Provider-agnostic wrapper for consistent behavior
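
One way the two proposed methods could sit on a provider base class is sketched below. The class names, the `invoke_kwargs` helper, and the shape of `guardrail_config` are assumptions for illustration; the actual interface is defined by issue #74. The Bedrock implementation assumes `response` is the already-parsed InvokeModel response body:

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class GuardrailCapableProvider(ABC):
    """Hypothetical base class; the real interface comes from issue #74."""

    @abstractmethod
    def configure_guardrail(self, guardrail_config: Dict[str, Any]) -> None:
        """Set the guardrail this provider applies to later calls."""

    @abstractmethod
    def handle_guardrail_intervention(
        self, response: Dict[str, Any]
    ) -> Optional[Dict[str, Any]]:
        """Return intervention details, or None if the guardrail did not act."""


class BedrockProvider(GuardrailCapableProvider):
    def __init__(self) -> None:
        self._guardrail: Dict[str, Any] = {}

    def configure_guardrail(self, guardrail_config: Dict[str, Any]) -> None:
        # Keep only the two fields the Bedrock runtime call needs.
        self._guardrail = {
            "guardrailIdentifier": guardrail_config["guardrail_id"],
            "guardrailVersion": guardrail_config.get("guardrail_version", "DRAFT"),
        }

    def invoke_kwargs(self) -> Dict[str, Any]:
        """Extra keyword arguments to merge into an invoke_model call."""
        return dict(self._guardrail)

    def handle_guardrail_intervention(
        self, response: Dict[str, Any]
    ) -> Optional[Dict[str, Any]]:
        if response.get("amazon-bedrock-guardrailAction") != "INTERVENED":
            return None
        return {
            "action": "INTERVENED",
            "trace": response.get("amazon-bedrock-trace", {}),
        }
```

Providers without a native guardrail feature could implement the same two methods on top of their own moderation endpoint, which keeps callers unaware of the difference.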

Database Schema

Extend existing tables or add new ones:

  • guardrail_configs table:
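    • id, name, guardrail_arn, guardrail_version
    • config_json (store full guardrail configuration)
    • applies_to (global, agent, or user) and applies_to_id (the agent_id or user_id, when applicable)
    • intervention_mode (block, redact, monitor)
    • created_at, updated_at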
  • Update invocations table:
    • guardrail_id (reference to applied guardrail)
    • guardrail_intervened (boolean)
    • guardrail_action (blocked, redacted, none)
    • guardrail_trace (JSON with intervention details)

Intervention Handling

Three intervention strategies:

  1. Block: Stop execution, return error to user
    • Status: 400 or 403
    • Error message: "Content blocked by guardrails policy"
    • Details: Which policy was violated
  2. Redact: Replace sensitive content with placeholders
    • Continue execution with redacted content
    • Log redaction for audit
    • Show redaction indicators in UI
  3. Monitor: Allow content but log for review
    • Useful for testing and tuning policies
    • Administrators review flagged content
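
The three strategies can be sketched as a single dispatch function. This is an illustrative outline, not the final handler: `InterventionResult` and `apply_intervention` are hypothetical names, and the sketch picks 403 for blocked content where the list above allows either 400 or 403.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InterventionResult:
    status_code: int
    body: Optional[str]          # content returned to the caller, if any
    error: Optional[str] = None  # set only when the request is blocked
    logged: bool = False         # every strategy records the intervention


def apply_intervention(mode: str, original: str, redacted: str) -> InterventionResult:
    """Map an intervention mode to an outcome for one flagged response."""
    if mode == "block":
        # Stop execution and surface an error instead of content.
        return InterventionResult(403, None, "Content blocked by guardrails policy", logged=True)
    if mode == "redact":
        # Continue, but only with the masked version of the content.
        return InterventionResult(200, redacted, logged=True)
    if mode == "monitor":
        # Pass the original content through; the log entry is the only effect.
        return InterventionResult(200, original, logged=True)
    raise ValueError(f"unknown intervention mode: {mode!r}")
```

Raising on an unknown mode avoids silently falling back to the most permissive behavior when the configuration contains a typo.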

Frontend Changes

Guardrail Configuration UI

  • Settings Page: Configure guardrails
    • Select from available Bedrock Guardrails
    • Preview guardrail policies
    • Set intervention behavior
    • Assign guardrails to agents
  • Agent Configuration: Per-agent guardrail selection
    • Override default guardrails for specific agents
    • View active policies

Invocation Display

  • Guardrail Indicators: Show when guardrails are active
    • Badge/icon indicating guardrail enabled
    • Tooltip with guardrail name and policies
  • Intervention Messages: Clear display when content is blocked
    • "This response was blocked by content policy"
    • Specific reason (if safe to display)
    • Guidance for users on next steps
  • Redaction Display: Show when content was redacted
    • Redacted text with placeholder: [REDACTED: EMAIL]
    • Visual styling to indicate redaction
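
A small sketch of how redacted text might be converted into the display placeholder shown above. It assumes the masked text arriving from the backend marks each entity with an upper-case tag in braces, such as `{EMAIL}`; that upstream format is an assumption to verify against real guardrail output, and `to_display_placeholders` is a hypothetical helper:

```python
import re

# Matches tags such as {EMAIL} or {PHONE_NUMBER}; lower-case braces are left alone.
_TAG = re.compile(r"\{([A-Z][A-Z_]*)\}")


def to_display_placeholders(masked_text: str) -> str:
    """Rewrite {TYPE} tags as [REDACTED: TYPE] for display."""
    return _TAG.sub(lambda m: f"[REDACTED: {m.group(1)}]", masked_text)
```

The frontend component would apply the equivalent transformation (or receive the already-converted text) and style the bracketed spans.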

Monitoring Dashboard

  • Guardrail Activity: New dashboard section
    • Intervention count by policy type
    • Trends over time
    • Top violating agents or users
    • Content category breakdown (hate, violence, PII, etc.)

IAM Permissions

Update IAM policies in iac/ templates:

  • bedrock:ApplyGuardrail: Permission to apply guardrails
  • bedrock:GetGuardrail: Permission to retrieve guardrail details
  • bedrock:ListGuardrails: Permission to list available guardrails
  • Least privilege: Only grant access to specific guardrail ARNs if possible
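
As a rough illustration of the least-privilege idea, the policy document below scopes the apply and get actions to specific guardrail ARNs and leaves the list action unscoped. The Sid values, region, and account ID are placeholders, and whether `bedrock:ListGuardrails` can be narrowed further should be checked against the IAM reference before this goes into the iac/ templates:

```python
import json
from typing import Any, Dict, List


def guardrail_policy(guardrail_arns: List[str]) -> Dict[str, Any]:
    """Build an IAM policy document granting access to the given guardrails."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "UseSpecificGuardrails",
                "Effect": "Allow",
                "Action": ["bedrock:ApplyGuardrail", "bedrock:GetGuardrail"],
                "Resource": guardrail_arns,
            },
            {
                "Sid": "ListGuardrails",
                "Effect": "Allow",
                "Action": ["bedrock:ListGuardrails"],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder ARN: substitute the real region, account, and guardrail ID.
    arn = "arn:aws:bedrock:us-east-1:123456789012:guardrail/abc123"
    print(json.dumps(guardrail_policy([arn]), indent=2))
```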

Logging & Auditing

  • CloudWatch Logs: Log all guardrail interventions
    • Timestamp, user, agent, invocation ID
    • Intervention type and reason
    • Original content (if safe to log)
    • Redacted content or block reason
  • Compliance: May be required for regulatory compliance
    • HIPAA: PII redaction audit trail
    • Financial services: Topic blocking audit trail
  • Alerts: Notify on unusual intervention patterns
    • Sudden spike in blocks
    • Repeated violations by user/agent
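
A sketch of a structured audit record covering the fields listed above. The field names, the logger name, and the `log_original` switch are assumptions; the point is that the original content is omitted unless logging it has been explicitly allowed:

```python
import json
import logging
from datetime import datetime, timezone
from typing import Any, Dict, Optional

logger = logging.getLogger("guardrails.audit")  # hypothetical logger name


def build_audit_record(
    user: str,
    agent: str,
    invocation_id: str,
    intervention_type: str,
    reason: str,
    redacted_content: Optional[str] = None,
    original_content: Optional[str] = None,
    log_original: bool = False,
) -> Dict[str, Any]:
    """Assemble one JSON-serializable audit entry for an intervention."""
    record: Dict[str, Any] = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "agent": agent,
        "invocation_id": invocation_id,
        "intervention_type": intervention_type,
        "reason": reason,
        "redacted_content": redacted_content,
    }
    # Original content may contain PII, so it is opt-in only.
    if log_original and original_content is not None:
        record["original_content"] = original_content
    return record


def log_intervention(record: Dict[str, Any]) -> None:
    """Emit the record as one JSON line so log queries can filter on fields."""
    logger.info(json.dumps(record))
```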

Testing Strategy

  • Unit Tests: Mock Bedrock responses with guardrail interventions
  • Integration Tests: Use Bedrock test guardrails
  • Policy Testing: Test each policy type:
    • Content filters: Submit harmful content
    • Denied topics: Request blocked topics
    • Word filters: Use profanity
    • PII: Include SSN, credit cards, etc.
  • Redaction Testing: Verify PII is properly masked
  • Performance: Measure latency impact of guardrails
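
For the unit-test item, a mocked client is enough to exercise the intervention path without calling AWS. The sketch below assumes interventions are detected through the `amazon-bedrock-guardrailAction` field of the InvokeModel response body; `invoke_with_guardrail` is a hypothetical wrapper, not existing project code:

```python
import io
import json
from typing import Any, Dict, Tuple
from unittest.mock import MagicMock


def invoke_with_guardrail(
    client: Any, model_id: str, request_body: Dict[str, Any],
    guardrail_id: str, version: str,
) -> Tuple[bool, Dict[str, Any]]:
    """Call invoke_model with a guardrail and report whether it intervened."""
    response = client.invoke_model(
        modelId=model_id,
        body=json.dumps(request_body),
        guardrailIdentifier=guardrail_id,
        guardrailVersion=version,
    )
    payload = json.loads(response["body"].read())
    return payload.get("amazon-bedrock-guardrailAction") == "INTERVENED", payload


def test_intervention_is_detected() -> None:
    client = MagicMock()
    # Mimic the streaming body that the real client returns.
    fake_body = json.dumps({"amazon-bedrock-guardrailAction": "INTERVENED"}).encode()
    client.invoke_model.return_value = {"body": io.BytesIO(fake_body)}

    intervened, _ = invoke_with_guardrail(
        client, "model-id", {"prompt": "x"}, "guardrail-abc123", "1"
    )

    assert intervened
    client.invoke_model.assert_called_once()
    assert client.invoke_model.call_args.kwargs["guardrailIdentifier"] == "guardrail-abc123"
```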

Acceptance Criteria

AC1: Guardrail Configuration

  • Guardrails configurable in etc/environment.sh
  • Per-agent guardrail assignment in database
  • API parameter for per-invocation guardrail override
  • Support for guardrail versions
  • Configuration UI for selecting and assigning guardrails
  • Documentation for all configuration options

AC2: Bedrock Integration

  • Guardrail parameters added to Bedrock API calls
  • Streaming and non-streaming invocations both support guardrails
  • Guardrail trace information extracted from responses
  • IAM permissions updated for guardrail operations
  • Provider abstraction includes guardrail interface

AC3: Intervention Handling

  • Block intervention returns clear error message
  • Redaction intervention properly masks content
  • Monitor mode logs without blocking
  • Intervention details logged to CloudWatch
  • Database records intervention metadata
  • User sees appropriate feedback for each intervention type

AC4: Frontend Display

  • Guardrail indicators visible on invocations
  • Blocked content shows clear message
  • Redacted content displayed with placeholders
  • Configuration UI allows guardrail selection
  • Monitoring dashboard shows intervention statistics
  • Invocation history shows guardrail actions

AC5: Auditing & Compliance

  • All interventions logged with full context
  • Audit trail queryable for compliance reporting
  • PII redaction fully logged (redacted versions only)
  • CloudWatch metrics for intervention counts
  • Alerts configured for unusual patterns

AC6: Testing

  • Unit tests for each intervention type
  • Integration tests with real guardrails
  • Test coverage for all policy categories
  • Performance tests measure latency impact
  • Redaction accuracy validated

Implementation Notes

Suggested Approach

Phase 1: Infrastructure & Configuration (Week 1)

  1. Research Bedrock Guardrails API and capabilities
  2. Create or identify test guardrails in Bedrock
  3. Design database schema for guardrail configs
  4. Add configuration parameters to etc/environment.sh
  5. Update IAM policies with guardrail permissions
  6. Create database migration for new tables/columns

Phase 2: Backend Integration (Week 1-2)

  1. Extend provider interface to support guardrails
  2. Implement Bedrock guardrail integration:
    • Add guardrail parameters to API calls
    • Parse guardrail trace from responses
    • Handle guardrail_intervened stop reason
  3. Implement intervention handlers (block, redact, monitor)
  4. Add logging for all interventions
  5. Unit tests for guardrail logic

Phase 3: Configuration & Management (Week 2)

  1. Build guardrail configuration service
  2. Implement guardrail assignment (global, agent, invocation)
  3. Create API endpoints for guardrail management
  4. Add guardrail versioning support
  5. Integration tests with real guardrails

Phase 4: Frontend Integration (Week 2-3)

  1. Build guardrail configuration UI
  2. Add guardrail indicators to invocation displays
  3. Implement intervention message components
  4. Create redaction display styling
  5. Build monitoring dashboard for interventions

Phase 5: Auditing & Monitoring (Week 3)

  1. Enhance CloudWatch logging with structured data
  2. Create CloudWatch metrics for interventions
  3. Set up alerts for unusual patterns
  4. Build compliance reporting queries
  5. Documentation for operators and users

Key Files to Modify/Create

Backend:

  • backend/providers/bedrock.py: Add guardrail parameters to API calls
  • backend/services/guardrail_service.py: Guardrail configuration and enforcement
  • backend/models/guardrail.py: Database models for guardrail configs
  • backend/middleware/guardrail_interceptor.py: Intervention handling
  • backend/api/guardrails.py: API endpoints for guardrail management
  • backend/database/migrations/: Add guardrail tables migration

Configuration:

  • etc/environment.sh: Add guardrail configuration
  • iac/: Update IAM policies with guardrail permissions

Frontend:

  • frontend/src/components/GuardrailConfig.tsx: Configuration UI
  • frontend/src/components/GuardrailIndicator.tsx: Indicator badge
  • frontend/src/components/GuardrailIntervention.tsx: Intervention message
  • frontend/src/components/RedactedContent.tsx: Redaction display
  • frontend/src/pages/GuardrailDashboard.tsx: Monitoring dashboard
  • frontend/src/api/guardrails.ts: API client

Infrastructure:

  • iac/: SAM templates with updated IAM policies

Tests:

  • backend/tests/services/test_guardrail_service.py
  • backend/tests/integration/test_bedrock_guardrails.py
  • backend/tests/middleware/test_guardrail_interceptor.py

Example Configuration

# etc/environment.sh additions

# Default guardrail
export DEFAULT_GUARDRAIL_ID="guardrail-abc123"
export DEFAULT_GUARDRAIL_VERSION="1"

# Intervention behavior
export GUARDRAIL_INTERVENTION_MODE="block"  # block, redact, monitor

# Per-agent overrides (JSON format)
export AGENT_GUARDRAILS='{
  "customer-support": "guardrail-xyz789",
  "internal-tools": "guardrail-def456"
}'

# Logging
export LOG_GUARDRAIL_INTERVENTIONS="true"
export LOG_REDACTED_CONTENT="false"  # Don't log original PII
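
On the backend, these variables could be read roughly as follows. This is a sketch under assumptions: `GuardrailSettings` and `load_settings` are hypothetical names, and the fallback values ("block", "DRAFT", no overrides) are illustrative defaults rather than decided behavior:

```python
import json
import os
from dataclasses import dataclass, field
from typing import Dict, Mapping

VALID_MODES = {"block", "redact", "monitor"}


@dataclass
class GuardrailSettings:
    default_id: str
    default_version: str
    mode: str
    agent_guardrails: Dict[str, str] = field(default_factory=dict)


def load_settings(env: Mapping[str, str] = os.environ) -> GuardrailSettings:
    """Read guardrail settings from the environment and validate the mode."""
    mode = env.get("GUARDRAIL_INTERVENTION_MODE", "block")
    if mode not in VALID_MODES:
        raise ValueError(f"invalid GUARDRAIL_INTERVENTION_MODE: {mode!r}")
    return GuardrailSettings(
        default_id=env.get("DEFAULT_GUARDRAIL_ID", ""),
        default_version=env.get("DEFAULT_GUARDRAIL_VERSION", "DRAFT"),
        mode=mode,
        # AGENT_GUARDRAILS is a JSON object mapping agent name to guardrail ID.
        agent_guardrails=json.loads(env.get("AGENT_GUARDRAILS", "{}")),
    )
```

Accepting the environment as a parameter keeps the loader easy to test without mutating `os.environ`.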

Bedrock API Example

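import json

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")
# request_body is the model-specific payload, built elsewhere

# Before (without guardrails)
response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps(request_body)
)

# After (with guardrails)
response = bedrock_runtime.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps(request_body),
    guardrailIdentifier="guardrail-abc123",
    guardrailVersion="1",
    trace="ENABLED"  # include the guardrail trace in the response body
)

# invoke_model returns the payload as a stream, so parse it before inspecting
payload = json.loads(response["body"].read())

# Check for intervention (the action is INTERVENED or NONE)
if payload.get("amazon-bedrock-guardrailAction") == "INTERVENED":
    trace = payload.get("amazon-bedrock-trace", {}).get("guardrail", {})
    # Handle based on intervention mode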
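
The `guardrail_intervened` stop reason is what the Converse API reports, so if the backend moves to `converse`, the check looks slightly different. A minimal sketch, with `guardrail_config` and `converse_intervention` as hypothetical helper names; the response is treated as a plain dictionary:

```python
from typing import Any, Dict, Optional


def guardrail_config(identifier: str, version: str, trace: bool = True) -> Dict[str, str]:
    """Build the guardrailConfig argument for a converse call."""
    return {
        "guardrailIdentifier": identifier,
        "guardrailVersion": version,
        "trace": "enabled" if trace else "disabled",
    }


def converse_intervention(response: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the guardrail trace if the guardrail stopped the response, else None."""
    if response.get("stopReason") != "guardrail_intervened":
        return None
    # The trace is present only when tracing was enabled on the request.
    return response.get("trace", {}).get("guardrail", {})
```

The configuration would be passed as `guardrailConfig=guardrail_config("guardrail-abc123", "1")` alongside the usual `modelId` and `messages` arguments.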

Intervention Response Format

{
  "error": "GuardrailIntervention",
  "message": "Content blocked by guardrails policy",
  "details": {
    "guardrail_id": "guardrail-abc123",
    "guardrail_version": "1",
    "action": "blocked",
    "reason": "Content violated topic policy",
    "policy_violated": "denied_topics",
    "topic": "medical_advice"
  }
}

Database Schema Example

CREATE TABLE guardrail_configs (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    guardrail_arn VARCHAR(512) NOT NULL,
    guardrail_version VARCHAR(50) DEFAULT 'DRAFT',
    config_json JSONB,
    applies_to VARCHAR(50),  -- 'global', 'agent', 'user'
    applies_to_id VARCHAR(255),  -- agent_id or user_id if applicable
    intervention_mode VARCHAR(50) DEFAULT 'block',  -- 'block', 'redact', 'monitor'
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Add columns to invocations table
ALTER TABLE invocations ADD COLUMN guardrail_id INTEGER REFERENCES guardrail_configs(id);
ALTER TABLE invocations ADD COLUMN guardrail_intervened BOOLEAN DEFAULT FALSE;
ALTER TABLE invocations ADD COLUMN guardrail_action VARCHAR(50);
ALTER TABLE invocations ADD COLUMN guardrail_trace JSONB;

CREATE INDEX idx_invocations_guardrail ON invocations(guardrail_id);
CREATE INDEX idx_invocations_intervened ON invocations(guardrail_intervened);
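
To illustrate the kind of reporting query these columns enable, the sketch below counts interventions by action. It uses an in-memory SQLite database with a simplified copy of the invocations table purely so the example is self-contained; the production schema above targets PostgreSQL, and the sample rows are made up:

```python
import sqlite3
from typing import Dict


def intervention_counts(conn: sqlite3.Connection) -> Dict[str, int]:
    """Count invocations where a guardrail intervened, grouped by action."""
    rows = conn.execute(
        "SELECT guardrail_action, COUNT(*) FROM invocations "
        "WHERE guardrail_intervened = 1 "
        "GROUP BY guardrail_action ORDER BY guardrail_action"
    ).fetchall()
    return dict(rows)


# Simplified stand-in for the real table, with illustrative rows.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE invocations (
        id INTEGER PRIMARY KEY,
        guardrail_id INTEGER,
        guardrail_intervened INTEGER DEFAULT 0,
        guardrail_action TEXT
    );
    INSERT INTO invocations (guardrail_id, guardrail_intervened, guardrail_action)
    VALUES (1, 1, 'blocked'), (1, 1, 'redacted'), (1, 0, 'none'), (2, 1, 'blocked');
    """
)
print(intervention_counts(conn))
```

The same GROUP BY shape would back the dashboard's "intervention count by policy type" view once a policy-type column or trace field is available.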

Policy Categories Configuration

Example of guardrail policy categories:

{
  "content_filters": {
    "hate": "HIGH",
    "insults": "HIGH",
    "sexual": "MEDIUM",
    "violence": "MEDIUM"
  },
  "denied_topics": [
    {
      "name": "Medical Advice",
      "definition": "Providing medical diagnosis or treatment recommendations",
      "type": "DENY"
    },
    {
      "name": "Financial Advice",
      "definition": "Providing specific investment or trading recommendations",
      "type": "DENY"
    }
  ],
  "word_filters": {
    "profanity": "BLOCK",
    "custom_words": ["confidential", "internal_only"]
  },
  "pii_filters": {
    "email": "REDACT",
    "phone": "REDACT",
    "ssn": "REDACT",
    "credit_card": "BLOCK"
  }
}
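
The JSON above is an application-level format, so it would need translating before a guardrail can be created from it. The sketch below maps it onto the policy arguments of the Bedrock `create_guardrail` operation as I understand its request shape; the field names and PII type identifiers should be verified against the API reference, and `to_create_guardrail_kwargs` is a hypothetical helper:

```python
from typing import Any, Dict

# Assumed mapping from the keys used above to Bedrock PII entity types.
PII_TYPES = {
    "email": "EMAIL",
    "phone": "PHONE",
    "ssn": "US_SOCIAL_SECURITY_NUMBER",
    "credit_card": "CREDIT_DEBIT_CARD_NUMBER",
}
# "REDACT" in the config above corresponds to masking (ANONYMIZE) in Bedrock.
PII_ACTIONS = {"REDACT": "ANONYMIZE", "BLOCK": "BLOCK"}


def to_create_guardrail_kwargs(policy: Dict[str, Any]) -> Dict[str, Any]:
    """Translate the app-level policy JSON into create_guardrail policy arguments."""
    words = policy.get("word_filters", {})
    kwargs: Dict[str, Any] = {
        "contentPolicyConfig": {
            "filtersConfig": [
                {"type": name.upper(), "inputStrength": level, "outputStrength": level}
                for name, level in policy.get("content_filters", {}).items()
            ]
        },
        "topicPolicyConfig": {
            "topicsConfig": [
                {"name": t["name"], "definition": t["definition"], "type": t["type"]}
                for t in policy.get("denied_topics", [])
            ]
        },
        "wordPolicyConfig": {
            "wordsConfig": [{"text": w} for w in words.get("custom_words", [])]
        },
        "sensitiveInformationPolicyConfig": {
            "piiEntitiesConfig": [
                {"type": PII_TYPES[key], "action": PII_ACTIONS[action]}
                for key, action in policy.get("pii_filters", {}).items()
            ]
        },
    }
    if words.get("profanity") == "BLOCK":
        kwargs["wordPolicyConfig"]["managedWordListsConfig"] = [{"type": "PROFANITY"}]
    # Note: the API may reject empty lists, so drop unused sections before calling it.
    return kwargs
```

A real call would also need a guardrail name and the blocked-input and blocked-output messages, which the policy JSON above does not carry.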

Cost Implications

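  • Bedrock Guardrails add a per-request cost, billed per text unit for each policy type applied; rates differ by policy and change over time, so take the figure from the current AWS pricing page rather than assuming a flat ~$0.01 per 1000 text units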
  • Update cost tracking to include guardrail costs
  • Maintain consistency with formatCost functions
  • Show guardrail costs separately in dashboards
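
For cost tracking, the per-request estimate reduces to simple arithmetic. The sketch below assumes a text unit covers up to 1,000 characters and that each enabled policy type is billed separately; both assumptions, and the rate passed in, should be confirmed against current AWS pricing. `estimate_guardrail_cost` is a hypothetical helper:

```python
import math


def estimate_guardrail_cost(
    char_count: int,
    price_per_1000_units: float,
    policies_enabled: int = 1,
    chars_per_unit: int = 1000,
) -> float:
    """Estimate guardrail cost in dollars for one piece of evaluated text."""
    # Partial units round up: 2,500 characters count as 3 text units.
    units = math.ceil(char_count / chars_per_unit)
    return units * policies_enabled * price_per_1000_units / 1000
```

Keeping the rate as a parameter means the value can live in configuration next to the model pricing used by the existing cost functions.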

References

Dependencies

Priority

Medium-High - Important for compliance and safety in production deployments

Estimated Effort

Medium-Large (2-3 weeks) - Requires Bedrock integration, database changes, frontend work, and compliance considerations

Labels

enhancement (New feature or request)
