Architecture Compliance Agent — Haystack 2.x RAG Pipeline

An enterprise-grade Haystack 2.x RAG agent that automatically checks IT requirements against internal architecture standards.

Deployed on AWS ECS Fargate with a FastAPI REST API, GitHub Actions CI/CD, and OIDC-based AWS authentication. LLM: Claude Sonnet 4.6 via the AWS Bedrock Converse API, protected by Bedrock Guardrails.

Built and maintained by LeopardCode.AI.

Architecture Overview

graph TB
    subgraph "Client"
        A[IT Requirement]
    end

    subgraph "AWS ECS Fargate"
        B["FastAPI REST API (port 8000)"]
        C["Haystack 2.x Pipeline"]
        D["InMemoryDocumentStore<br/>3 Architecture Standards"]
        G["Bedrock Guardrails<br/>Content Filter + PII Block"]
    end

    subgraph "CI/CD (GitHub Actions → OIDC)"
        E["GitHub Actions"]
        F["Amazon ECR"]
        H["ECS Deployment"]
    end

    A -- "POST /check {requirement}" --> B
    B -- "JSON request" --> C
    C -- "semantic search" --> D
    D -- "relevant standards" --> C
    C -- "LLM call" --> G
    G -- "filtered response" --> C
    C -- "structured report (Pydantic)" --> B
    B -- "JSON response" --> A

    E -- "docker build & push" --> F
    F -- "image pull" --> H
    H -- "force new deployment" --> B

Haystack 2.x Pipeline (Detail)

graph LR
    subgraph "Production Pipeline"
        A["SentenceTransformersTextEmbedder<br/>all-MiniLM-L6-v2"] -- "embedding" --> B["InMemoryEmbeddingRetriever"]
        B -- "documents" --> C["ChatPromptBuilder<br/>Compliance Agent"]
        C -- "ChatMessage[]" --> D["AmazonBedrockChatGenerator<br/>Claude Sonnet 4.6"]
        D -- "LLM reply (JSON)" --> E["ComplianceOutputValidator<br/>→ Pydantic Schema"]
    end

    subgraph "Fallback / Test Mode"
        F["MockComplianceEngine<br/>keyword-based, deterministic"] --> E
    end

    subgraph "Data Indexing"
        H["Architecture Standards<br/>3 standards"] --> I["SentenceTransformersDocumentEmbedder"]
        I --> J["InMemoryDocumentStore"]
    end

    J --> B
    E -- "structured Dict[str, Any]" --> K["FastAPI Response / Test Assertion"]
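
The "semantic search" step in the diagram ranks the indexed standards by vector similarity to the embedded requirement. The following is a toy sketch of that idea only: the three-dimensional vectors and the `retrieve` helper are invented for illustration, whereas the real pipeline uses `all-MiniLM-L6-v2` embeddings and Haystack's `InMemoryEmbeddingRetriever`.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: List[float],
             store: Dict[str, List[float]],
             top_k: int = 3) -> List[Tuple[str, float]]:
    """Return the top_k (document id, score) pairs, best match first."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy embeddings (assumption): one axis per standard, purely for illustration.
TOY_STORE = {
    "Standard-01": [1.0, 0.0, 0.0],
    "Standard-02": [0.0, 1.0, 0.0],
    "Standard-03": [0.0, 0.0, 1.0],
}

# A query vector that points mostly along the Standard-01 axis.
ranking = retrieve([0.9, 0.1, 0.0], TOY_STORE)
```

With only three standards in the store, `top_k=3` returns every standard, which matches the statement that all three are checked on every run.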

Two Operation Modes

Mode	Component	Use Case
Production	`ChatPromptBuilder` → `AmazonBedrockChatGenerator` (Sonnet 4.6)	AWS Bedrock runtime; handles negation, context, natural language
Mock	`MockComplianceEngine` (keyword rules with negation handling)	Local development, CI/CD, offline testing; deterministic, no AWS required

Set HAYSTACK_COMPLIANCE_MOCK=1 or pass use_mock=True to force mock mode.
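
A minimal sketch of how the two switches might be combined, assuming that either one alone is enough to force mock mode. The helper name `resolve_mock_mode` and its `env` parameter are hypothetical; only `HAYSTACK_COMPLIANCE_MOCK` and `use_mock` come from this README.

```python
import os
from typing import Mapping, Optional


def resolve_mock_mode(use_mock: bool = False,
                      env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if the mock engine should be used (hypothetical helper).

    Mock mode is forced when use_mock is True or when the environment
    variable HAYSTACK_COMPLIANCE_MOCK is set to "1".
    """
    # Allow tests to inject a fake environment instead of os.environ.
    env = os.environ if env is None else env
    return bool(use_mock) or env.get("HAYSTACK_COMPLIANCE_MOCK", "") == "1"
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.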

Project Structure

├── app/
│   ├── __init__.py              # FastAPI app (/health, /check, /standards)
│   ├── models.py                # Pydantic: StandardFinding, ComplianceReport
│   ├── components.py            # Haystack 2.x @component classes
│   │   ├── MockComplianceEngine       # Deterministic keyword engine
│   │   └── ComplianceOutputValidator  # LLM → Pydantic (with mock fallback)
│   └── pipeline.py              # ArchitectureCompliancePipeline (RAG)
├── infra/
│   ├── app.py                   # CDK entry point
│   ├── cdk.json                 # CDK configuration
│   ├── stacks.py                # CDK stacks + Bedrock Guardrail resource
│   └── requirements.txt         # CDK dependencies
├── scripts/
│   ├── deploy-aws.sh            # Full deployment
│   ├── integration_test.py      # HTTP integration tests
│   ├── test_analysis.py         # 18-scenario analysis suite
│   ├── test_analysis_report.json
│   ├── evaluate_compliance.py   # Mock vs. Bedrock → HTML dashboard
│   └── evaluation_report.html   # Generated HTML report
├── tests/
│   ├── conftest.py              # Shared fixtures (mock + Bedrock pipelines)
│   ├── test_data.json           # 18 categorized test scenarios
│   ├── test_pipeline.py         # 14 original unit tests
│   ├── test_comprehensive.py    # 48 tests across 9 categories
│   └── test_bedrock_integration.py  # 20 real Bedrock pipeline tests
├── docs/
│   └── evaluation_dashboard.png # Screenshot of the evaluation report
├── .github/workflows/
│   └── deploy.yml               # Lint → Type Check → Test → Deploy
├── pyproject.toml               # Poetry + ruff + mypy + pytest
├── Dockerfile                   # Multi-stage build
└── README.md

Architecture Standards

ID	Standard	MockEngine Rule	LLM Capability
Standard-01	Databases: AES-256 encryption + EU data residency (eu-central-1)	`unverschlüsselt` keyword + region list	Full semantic analysis
Standard-02	Microservices: gRPC/REST over TLS 1.3	`http && !tls`	Protocol inference
Standard-03	External APIs: API Gateway + OAuth2	`api gateway` substring + `oauth`	Contextual API analysis

All three standards are checked on every run. The LLM can infer violations the MockEngine misses (e.g., gRPC without TLS, implicit non-EU regions).

Pipeline Output Schema

from typing import List, Literal

from pydantic import BaseModel


class StandardFinding(BaseModel):
    standard_id: str
    is_compliant: bool
    finding_details: str          # Description of the finding (German)
    remediation: str              # Corrective action (empty if compliant)


class ComplianceReport(BaseModel):
    status: Literal["KONFORM", "NICHT KONFORM", "RELEVANZ UNKLAR"]
    reasoning_summary: str        # Concise per-standard bullet list
    findings: List[StandardFinding]  # One per standard, always 3 entries
    full_report: str              # Formatted compliance report (German)

The LLM is instructed to output JSON matching this Pydantic schema exactly. The ComplianceOutputValidator strips markdown fences, validates against the schema, and falls back to the MockComplianceEngine on any parsing failure. Report text is generated in German, the working language of the target compliance workflow.
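
The strip, validate, and fall back sequence described above can be sketched as follows. This is an illustration under stated assumptions, not the project's `ComplianceOutputValidator`: it uses only the standard library, so a simple key and status check stands in for Pydantic validation, and `strip_fences`, `parse_report`, and the `fallback` callable are hypothetical names.

```python
import json
from typing import Any, Callable, Dict

FENCE = "`" * 3  # markdown code fence, built programmatically
REQUIRED_KEYS = {"status", "reasoning_summary", "findings", "full_report"}
ALLOWED_STATUS = {"KONFORM", "NICHT KONFORM", "RELEVANZ UNKLAR"}


def strip_fences(reply: str) -> str:
    """Remove a surrounding markdown code fence from an LLM reply, if present."""
    text = reply.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line, which may carry a language tag ("json").
        text = text.split("\n", 1)[1] if "\n" in text else ""
        if text.rstrip().endswith(FENCE):
            text = text.rstrip()[: -len(FENCE)]
    return text.strip()


def parse_report(reply: str,
                 fallback: Callable[[], Dict[str, Any]]) -> Dict[str, Any]:
    """Parse the reply as a report; call fallback() on any parsing failure."""
    try:
        data = json.loads(strip_fences(reply))
        if not isinstance(data, dict) or not REQUIRED_KEYS.issubset(data):
            raise ValueError("missing report fields")
        if data["status"] not in ALLOWED_STATUS:
            raise ValueError("unknown status")
        return data
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return fallback()
```

In this sketch the fallback is injected as a callable, so the mock engine only runs when the LLM reply cannot be used.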

MockComplianceEngine

A lightweight test substitute that matches keywords deterministically — intentionally simple so tests are fast, reproducible, and free of external dependencies.

Capability	MockEngine	LLM (Sonnet 4.6)
Encryption check	`unverschlüsselt` keyword	Full NLP understanding
Region check	Keyword list (`us-east`, `us-west`, …)	All AWS regions known
HTTP/TLS	`http && !tls`	Protocol inference
API Gateway	`api gateway` substring + `oauth`	Contextual reasoning
Negation ("nicht unverschlüsselt")	Explicit double-negative rule	Inherent comprehension
Negation ("ohne API Gateway")	3-token lookback for `ohne`/`kein`/`nicht`	Full sentence comprehension
RELEVANZ UNKLAR detection	20+ infrastructure keywords	Semantic relevance scoring
English input	Partial (keywords catch `us-east`, etc.)	Full English support
gRPC without TLS	Not detected (no rule)	Flagged as violation
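
A few of the keyword rules in the table can be illustrated with a short sketch. The function names and the tokenizer are assumptions made for this example, and matching is done on whole tokens (so `https` does not count as `http`), which may differ from the actual `MockComplianceEngine`.

```python
import re
from typing import List

NEGATIONS = {"ohne", "kein", "nicht"}


def tokenize(text: str) -> List[str]:
    """Lowercase word tokens; keeps inner dots and hyphens (us-east-1, 1.3)."""
    return re.findall(r"\w+(?:[.\-]\w+)*", text.lower())


def is_negated(tokens: List[str], index: int, lookback: int = 3) -> bool:
    """True if a negation word occurs within `lookback` tokens before index."""
    return any(tok in NEGATIONS for tok in tokens[max(0, index - lookback):index])


def violates_tls_rule(text: str) -> bool:
    """Standard-02 rule from the table: `http && !tls`."""
    tokens = tokenize(text)
    return "http" in tokens and "tls" not in tokens


def api_gateway_negated(text: str) -> bool:
    """True if 'api gateway' appears with ohne/kein/nicht in the 3 tokens before it."""
    tokens = tokenize(text)
    for i in range(len(tokens) - 1):
        if tokens[i] == "api" and tokens[i + 1] == "gateway":
            return is_negated(tokens, i)
    return False


def is_unencrypted(text: str) -> bool:
    """'unverschlüsselt...' is a violation unless directly preceded by 'nicht'."""
    tokens = tokenize(text)
    for i, tok in enumerate(tokens):
        if tok.startswith("unverschlüsselt"):
            return not (i > 0 and tokens[i - 1] == "nicht")
    return False
```

Because `violates_tls_rule` only looks for the `http` token, a requirement such as "gRPC ohne Verschlüsselung" passes the rule, which is the gap listed in the last table row.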

Test Suite

Architecture

graph TB
    subgraph "Test Layers"
        U["test_pipeline.py<br/>14 tests"] --> C["test_comprehensive.py<br/>48 tests, 9 categories"]
        C --> B["test_bedrock_integration.py<br/>20 tests, real Claude Sonnet 4.6"]
        B --> E["evaluate_compliance.py<br/>18 scenarios, Mock vs. LLM comparison"]
    end

    subgraph "Engine under Test"
        M["MockComplianceEngine<br/>deterministic, offline"]
        L["Claude Sonnet 4.6<br/>AWS Bedrock Converse API"]
    end

    U --> M
    C --> M
    B --> L
    E --> M
    E --> L

    subgraph "Output"
        R["HTML Dashboard<br/>evaluation_report.html"]
        T["Terminal Summary"]
    end

    E --> R
    E --> T

Test Layers

Layer	File	Tests	Runtime	AWS Required
Original	`test_pipeline.py`	14	< 1 s	No
Comprehensive	`test_comprehensive.py`	48	< 1 s	No
Bedrock Integration	`test_bedrock_integration.py`	20	~3 min	Yes
Cross-Validation	`evaluate_compliance.py`	18 scenarios	~3 min	Yes

9 Test Categories (Comprehensive Suite)

#	Category	Tests	Coverage
1	Happy Path	5	Fully compliant across all 3 standards
2	Negative Single	8	Each standard violated individually
3	Negative Multiple	2	Two or more standards violated simultaneously
4	Edge — Negation	3	"nicht unverschlüsselt", "ohne", "kein"
5	Edge — Region	3	us-west-2, no region, encrypted without region
6	Edge — Input	5	English, mixed language, all-caps, newlines, special characters
7	Edge — Boundary	4	Minimal, single-word, short, non-architectural
8	Failure Inputs	6	Empty, whitespace, very long, numeric, stopwords, injection
9	Output Structure	12	Schema keys, finding format, status consistency, JSON

Running Tests

# All unit tests (62 tests, < 1 s)
python3 -m pytest tests/test_pipeline.py tests/test_comprehensive.py -v

# Bedrock integration (20 tests, ~3 min, requires AWS)
python3 -m pytest tests/test_bedrock_integration.py -v

# Full evaluation: Mock vs. Bedrock + HTML dashboard
python3 scripts/evaluate_compliance.py

# 18-scenario pipeline analysis
python3 scripts/test_analysis.py

Test Results

Unit Tests — 62/62 Pass

Category              Tests    Pass Rate
Happy Path             5/5     100% ████████████████████
Negative Single        8/8     100% ████████████████████
Negative Multiple      2/2     100% ████████████████████
Edge Negation          3/3     100% ████████████████████
Edge Region            3/3     100% ████████████████████
Edge Input             5/5     100% ████████████████████
Edge Boundary          4/4     100% ████████████████████
Failure Inputs         6/6     100% ████████████████████
Output Structure      12/12    100% ████████████████████

Bedrock Integration — 20/20 Pass

Claude Sonnet 4.6 (eu-central-1) correctly handles:

Scenario	Status	Notes
All standards met	`KONFORM`	AES-256, eu-central-1, gRPC+TLS, API Gateway+OAuth2
Unencrypted DB	`NICHT KONFORM`	Correctly identified
Plain HTTP	`NICHT KONFORM`	Correctly identified
Non-EU region (us-east-1)	`NICHT KONFORM`	Correctly flagged
Missing OAuth2	`NICHT KONFORM`	API Gateway without OAuth2
Double negative ("nicht unverschlüsselt")	`KONFORM`	LLM understands negation
us-west-2 region	`NICHT KONFORM`	LLM knows us-west-2 is non-EU
gRPC without TLS	`NICHT KONFORM`	MockEngine misses this
"ohne API Gateway"	`NICHT KONFORM`	LLM treats as violation (defensible)
Non-architectural input ("Hallo")	`RELEVANZ UNKLAR`	Correct classification

Mock vs. Bedrock — Evaluation Dashboard

The evaluation runs all 18 scenarios from test_data.json through both engines and generates a side-by-side HTML comparison.

Latest results (Claude Sonnet 4.6, eu-central-1):

Metric	MockEngine	Bedrock LLM	Δ	Interpretation
Pass Rate	100.0%	83.3%	−16.7%	LLM is stricter on some edge cases
Avg F1 Score	0.981	0.811	−0.170	Trade-off: stricter means more false positives
Avg Precision	0.972	0.737	−0.235	LLM flags more items as violations
Avg Recall	1.000	0.972	−0.028	Both catch nearly all actual violations
Avg Response Time	0.0 s	10.7 s	+10.7 s	LLM inference vs. instant keyword matching

The LLM's lower pass rate reflects test expectations designed around the keyword-based MockEngine. The LLM is genuinely more capable — it catches violations the MockEngine misses entirely (e.g., gRPC without TLS). The three discrepancies are edge cases where both interpretations are defensible.
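
To make the precision, recall, and F1 figures concrete, here is one way such scores could be computed for a single scenario, treating each violated standard as a positive. This is a sketch under assumptions: the exact definitions used by `evaluate_compliance.py` are not shown in this README, and the `prf1` helper and its convention of scoring 1.0 when nothing is expected or predicted are illustrative choices.

```python
from typing import Set, Tuple


def prf1(expected: Set[str], predicted: Set[str]) -> Tuple[float, float, float]:
    """Precision, recall, and F1 over sets of violated standard IDs."""
    tp = len(expected & predicted)   # violations correctly flagged
    fp = len(predicted - expected)   # flagged, but not expected
    fn = len(expected - predicted)   # expected, but missed
    # Assumed convention: an empty denominator counts as a perfect score.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 1.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical scenario: only Standard-01 is expected to be violated,
# but the engine also flags Standard-03.
precision, recall, f1 = prf1({"Standard-01"}, {"Standard-01", "Standard-03"})
```

In this hypothetical case recall stays at 1.0 while precision drops to 0.5, the same pattern as in the table: a stricter engine loses precision before it loses recall.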

Dashboard features: summary cards (pass rate, F1, precision, recall, response time), per-engine category breakdown, confusion matrix, per-test-case comparison with reasoning snippets, disagreement analysis, and an error details table.

open scripts/evaluation_report.html

Deployment

Prerequisites

AWS CLI, and GitHub Actions configured for OIDC access to AWS
Docker, Python 3.9+
Bedrock model access: Claude Sonnet 4.6 in eu-central-1

GitHub Actions CI/CD

Uses OIDC — no long-lived AWS credentials:

graph LR
    A["push to main"] --> B["Lint (ruff)"]
    B --> C["Type Check (mypy)"]
    C --> D["Test (pytest, mock mode)"]
    D --> E["Docker Build & Push to ECR"]
    E --> F["ECS Force New Deployment"]
    F --> G["Health Check Wait"]

Required secret: AWS_ROLE_ARN — an IAM role with an OIDC trust policy (permissions: ECR push, ECS update-service, Bedrock InvokeModel, ApplyGuardrail).

Infrastructure (CDK)

Resource	Details
VPC	2 AZs, 1 NAT Gateway, public/private subnets
ECS Cluster	Fargate (serverless), container insights
ECR	`haystack-compliance`, retains the last 10 images
ECS Service	1–5 tasks, auto-scaling at 70% CPU/memory
ALB	Health check on `/health`
IAM Role	Bedrock: InvokeModel, ApplyGuardrail
Bedrock Guardrail	Content filters (sexual/hate/violence/insults) + PII (email anonymized, AWS keys blocked)
Fargate Spot	Cost-optimized capacity provider
Region	`eu-central-1` (Frankfurt)

Local Development

# Install
pip install -r requirements.txt

# Mock mode (no AWS)
HAYSTACK_COMPLIANCE_MOCK=1 python3 architecture_compliance_checker.py

# FastAPI dev server
HAYSTACK_COMPLIANCE_MOCK=1 uvicorn app.main:app --reload --port 8000
# http://localhost:8000/docs (Swagger UI)

API Endpoints

`GET /health`

{ "status": "healthy", "pipeline": "haystack-architecture-compliance" }

`POST /check`

Request:

{
  "requirement": "Wir planen eine unverschlüsselte PostgreSQL-Datenbank in us-east-1."
}

Response:

{
  "status": "NICHT KONFORM",
  "reasoning_summary": "   - Standard-01: Datenbankverschlüsselung (AES-256) nicht eingehalten\n   - Standard-01: Datenhaltung außerhalb der EU (eu-central-1)",
  "findings": [
    {
      "standard_id": "Standard-01",
      "is_compliant": false,
      "finding_details": "Datenbankverschlüsselung (AES-256) nicht eingehalten\n   - Datenhaltung außerhalb der EU (eu-central-1)",
      "remediation": "Aktivieren Sie AES-256 Verschlüsselung für alle Datenbanken\nVerlagerung der Daten in die EU (eu-central-1)"
    },
    {
      "standard_id": "Standard-02",
      "is_compliant": true,
      "finding_details": "Keine Aussage zur Kommunikation — Standard als erfüllt betrachtet.",
      "remediation": ""
    },
    {
      "standard_id": "Standard-03",
      "is_compliant": true,
      "finding_details": "Keine externe API genannt — Standard als erfüllt betrachtet.",
      "remediation": ""
    }
  ],
  "full_report": "1. STATUS: NICHT KONFORM\n2. BEGRÜNDUNG:\n   ..."
}
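
A small client sketch for this endpoint, using only the Python standard library. The helper names `build_check_request`, `violated_standards`, and `run_check` are hypothetical; the URL path, request body, and response fields follow the examples above.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_check_request(base_url: str, requirement: str) -> urllib.request.Request:
    """Build the POST /check request without sending it."""
    body = json.dumps({"requirement": requirement}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/check",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def violated_standards(report: Dict[str, Any]) -> List[str]:
    """List the standard IDs whose finding is marked non-compliant."""
    return [f["standard_id"] for f in report["findings"] if not f["is_compliant"]]


def run_check(base_url: str, requirement: str, timeout: float = 60.0) -> Dict[str, Any]:
    """Send the request and decode the JSON report (requires a running server)."""
    request = build_check_request(base_url, requirement)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Against a local dev server, `violated_standards(run_check("http://localhost:8000", "..."))` would return `["Standard-01"]` for a response shaped like the one above. The generous default timeout is an assumption that reflects the roughly 10 s LLM latency reported in the evaluation.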

`GET /standards`

{
  "standards": [
    { "id": 0, "content": "Standard-01: Alle produktiven Datenbanken..." },
    { "id": 1, "content": "Standard-02: Für den Datenaustausch..." },
    { "id": 2, "content": "Standard-03: Externe APIs müssen..." }
  ],
  "count": 3
}

Security & Compliance

OIDC authentication — GitHub Actions → AWS via IAM roles, no access keys
Bedrock Guardrails — content filters (hate, sexual, violence, insults) + PII handling (email anonymized, AWS keys blocked)
IAM least privilege — ECS task role limited to bedrock:InvokeModel and bedrock:ApplyGuardrail
No secrets in the repository — .env, *.pyc, __pycache__, cdk.out/ are gitignored
Low-variance LLM output — temperature=0.0 to keep results as reproducible as possible
Input validation — FastAPI Pydantic schema validation on /check

CI/CD Pipeline

Deploys on push to main or manual workflow_dispatch:

Lint — ruff check app/ tests/ scripts/
Type check — mypy app/
Test — pytest tests/ -v (MockEngine, no AWS needed, < 1 s)
Docker build & push — multi-arch → ECR
ECS deploy — force-new-deployment, waits for service stability

Required secret: AWS_ROLE_ARN

Known Limitations

Area	Limitation	Impact
MockEngine	Keyword-based; misses gRPC without TLS, non-standard regions	Used in CI and as the validator's parse-failure fallback; the LLM covers these cases in production
Region detection	`us-`, `ap-` prefix list	The LLM knows all AWS regions
TLS version	Mock checks the `tls` keyword only	The LLM distinguishes TLS 1.2 vs. 1.3
Language	MockEngine keywords are German; English input is only partially covered	The LLM handles German + English
LLM latency and cost	~10 s per inference, ~$0.01 per check	Acceptable for a compliance workflow
Bedrock availability	Requires Claude Sonnet 4.6 access in eu-central-1	MockEngine works offline
No persistent store	Results are ephemeral (in-memory)	No audit trail without extensions

Quick Reference

# Install & run tests
pip install -r requirements.txt
python3 -m pytest tests/ -v

# FastAPI server
uvicorn app.main:app --reload --port 8000

# Check compliance
curl -X POST http://localhost:8000/check \
  -H "Content-Type: application/json" \
  -d '{"requirement": "Unverschlüsselte DB in us-east-1"}'

# Full evaluation
python3 scripts/evaluate_compliance.py
open scripts/evaluation_report.html

Built by LeopardCode.AI (AI Engineering & Consulting)
