An enterprise-grade Haystack 2.x RAG agent that automatically checks IT requirements against internal architecture standards.
Deployed on AWS ECS Fargate with a FastAPI REST API, GitHub Actions CI/CD, and OIDC-based AWS authentication. LLM: Claude Sonnet 4.6 via the AWS Bedrock Converse API, protected by Bedrock Guardrails.
Built and maintained by LeopardCode.AI.
graph TB
subgraph "Client"
A[IT Requirement]
end
subgraph "AWS ECS Fargate"
B["FastAPI REST API (port 8000)"]
C["Haystack 2.x Pipeline"]
D["InMemoryDocumentStore<br/>3 Architecture Standards"]
G["Bedrock Guardrails<br/>Content Filter + PII Block"]
end
subgraph "CI/CD (GitHub Actions → OIDC)"
E["GitHub Actions"]
F["Amazon ECR"]
H["ECS Deployment"]
end
A -- "POST /check {requirement}" --> B
B -- "JSON request" --> C
C -- "semantic search" --> D
D -- "relevant standards" --> C
C -- "LLM call" --> G
G -- "filtered response" --> C
C -- "structured report (Pydantic)" --> B
B -- "JSON response" --> A
E -- "docker build & push" --> F
F -- "image pull" --> H
H -- "force new deployment" --> B
graph LR
subgraph "Production Pipeline"
A["SentenceTransformersTextEmbedder<br/>all-MiniLM-L6-v2"] -- "embedding" --> B["InMemoryEmbeddingRetriever"]
B -- "documents" --> C["ChatPromptBuilder<br/>Compliance Agent"]
C -- "ChatMessage[]" --> D["AmazonBedrockChatGenerator<br/>Claude Sonnet 4.6"]
D -- "LLM reply (JSON)" --> E["ComplianceOutputValidator<br/>→ Pydantic Schema"]
end
subgraph "Fallback / Test Mode"
F["MockComplianceEngine<br/>keyword-based, deterministic"] --> E
end
subgraph "Data Indexing"
H["Architecture Standards<br/>3 standards"] --> I["SentenceTransformersDocumentEmbedder"]
I --> J["InMemoryDocumentStore"]
end
J --> B
E -- "structured Dict[str, Any]" --> K["FastAPI Response / Test Assertion"]
| Mode | Component | Use Case |
|---|---|---|
| Production | ChatPromptBuilder → AmazonBedrockChatGenerator (Sonnet 4.6) |
AWS Bedrock runtime; handles negation, context, natural language |
| Mock | MockComplianceEngine (keyword rules with negation handling) |
Local development, CI/CD, offline testing; deterministic, no AWS required |
Set HAYSTACK_COMPLIANCE_MOCK=1 or pass use_mock=True to force mock mode.
├── app/
│ ├── __init__.py # FastAPI (health, /check, /standards)
│ ├── models.py # Pydantic: StandardFinding, ComplianceReport
│ ├── components.py # Haystack 2.x @component classes
│ │ ├── MockComplianceEngine # Deterministic keyword engine
│ │ └── ComplianceOutputValidator # LLM → Pydantic (with mock fallback)
│ └── pipeline.py # ArchitectureCompliancePipeline (RAG)
├── infra/
│ ├── app.py # CDK entry point
│ ├── cdk.json # CDK configuration
│ ├── stacks.py # CDK stacks + Bedrock Guardrail resource
│ └── requirements.txt # CDK dependencies
├── scripts/
│ ├── deploy-aws.sh # Full deployment
│ ├── integration_test.py # HTTP integration tests
│ ├── test_analysis.py # 18-scenario analysis suite
│ ├── test_analysis_report.json
│ ├── evaluate_compliance.py # Mock vs. Bedrock → HTML dashboard
│ └── evaluation_report.html # Generated HTML report
├── tests/
│ ├── conftest.py # Shared fixtures (mock + Bedrock pipelines)
│ ├── test_data.json # 18 categorized test scenarios
│ ├── test_pipeline.py # 14 original unit tests
│ ├── test_comprehensive.py # 48 tests across 9 categories
│ └── test_bedrock_integration.py # 20 real Bedrock pipeline tests
├── docs/
│ └── evaluation_dashboard.png # Screenshot of the evaluation report
├── .github/workflows/
│ └── deploy.yml # Lint → Type Check → Test → Deploy
├── pyproject.toml # Poetry + ruff + mypy + pytest
├── Dockerfile # Multi-stage build
└── README.md
| ID | Standard | MockEngine Rule | LLM Capability |
|---|---|---|---|
| Standard-01 | Databases: AES-256 encryption + EU data residency (eu-central-1) | unverschlüsselt keyword + region list |
Full semantic analysis |
| Standard-02 | Microservices: gRPC/REST over TLS 1.3 | http && !tls |
Protocol inference |
| Standard-03 | External APIs: API Gateway + OAuth2 | api gateway substring + oauth |
Contextual API analysis |
All three standards are checked on every run. The LLM can infer violations the MockEngine misses (e.g., gRPC without TLS, implicit non-EU regions).
class StandardFinding(BaseModel):
standard_id: str
is_compliant: bool
finding_details: str # Description of the finding (German)
remediation: str # Corrective action (empty if compliant)
class ComplianceReport(BaseModel):
status: Literal["KONFORM", "NICHT KONFORM", "RELEVANZ UNKLAR"]
reasoning_summary: str # Concise per-standard bullet list
findings: List[StandardFinding] # One per standard, always 3 entries
full_report: str # Formatted compliance report (German)The LLM is instructed to output JSON matching this Pydantic schema exactly. The ComplianceOutputValidator strips markdown fences, validates against the schema, and falls back to the MockComplianceEngine on any parsing failure. Report text is generated in German, the working language of the target compliance workflow.
A lightweight test substitute that matches keywords deterministically — intentionally simple so tests are fast, reproducible, and free of external dependencies.
| Capability | MockEngine | LLM (Sonnet 4.6) |
|---|---|---|
| Encryption check | unverschlüsselt keyword |
Full NLP understanding |
| Region check | Keyword list (us-east, us-west, …) |
All AWS regions known |
| HTTP/TLS | http && !tls |
Protocol inference |
| API Gateway | api gateway substring + oauth |
Contextual reasoning |
| Negation ("nicht unverschlüsselt") | Explicit double-negative rule | Inherent comprehension |
| Negation ("ohne API Gateway") | 3-token lookback for ohne/kein/nicht |
Full sentence comprehension |
| RELEVANZ UNKLAR detection | 20+ infrastructure keywords | Semantic relevance scoring |
| English input | Partial (keywords catch us-east, etc.) |
Full English support |
| gRPC without TLS | Not detected (no rule) | Flagged as violation |
graph TB
subgraph "Test Layers"
U["test_pipeline.py<br/>14 tests"] --> C["test_comprehensive.py<br/>48 tests, 9 categories"]
C --> B["test_bedrock_integration.py<br/>20 tests, real Claude Sonnet 4.6"]
B --> E["evaluate_compliance.py<br/>18 scenarios, Mock vs. LLM comparison"]
end
subgraph "Engine under Test"
M["MockComplianceEngine<br/>deterministic, offline"]
L["Claude Sonnet 4.6<br/>AWS Bedrock Converse API"]
end
U --> M
C --> M
B --> L
E --> M
E --> L
subgraph "Output"
R["HTML Dashboard<br/>evaluation_report.html"]
T["Terminal Summary"]
end
E --> R
E --> T
| Layer | File | Tests | Runtime | AWS Required |
|---|---|---|---|---|
| Original | test_pipeline.py |
14 | < 1 s | No |
| Comprehensive | test_comprehensive.py |
48 | < 1 s | No |
| Bedrock Integration | test_bedrock_integration.py |
20 | ~3 min | Yes |
| Cross-Validation | evaluate_compliance.py |
18 scenarios | ~3 min | Yes |
| # | Category | Tests | Coverage |
|---|---|---|---|
| 1 | Happy Path | 5 | Fully compliant across all 3 standards |
| 2 | Negative Single | 8 | Each standard violated individually |
| 3 | Negative Multiple | 2 | Two or more standards violated simultaneously |
| 4 | Edge — Negation | 3 | "nicht unverschlüsselt", "ohne", "kein" |
| 5 | Edge — Region | 3 | us-west-2, no region, encrypted without region |
| 6 | Edge — Input | 5 | English, mixed language, all-caps, newlines, special characters |
| 7 | Edge — Boundary | 4 | Minimal, single-word, short, non-architectural |
| 8 | Failure Inputs | 6 | Empty, whitespace, very long, numeric, stopwords, injection |
| 9 | Output Structure | 12 | Schema keys, finding format, status consistency, JSON |
# All unit tests (62 tests, < 1 s)
python3 -m pytest tests/test_pipeline.py tests/test_comprehensive.py -v
# Bedrock integration (20 tests, ~3 min, requires AWS)
python3 -m pytest tests/test_bedrock_integration.py -v
# Full evaluation: Mock vs. Bedrock + HTML dashboard
python3 scripts/evaluate_compliance.py
# 18-scenario pipeline analysis
python3 scripts/test_analysis.pyCategory Tests Pass Rate
Happy Path 5/5 100% ████████████████████
Negative Single 8/8 100% ████████████████████
Negative Multiple 2/2 100% ████████████████████
Edge Negation 3/3 100% ████████████████████
Edge Region 3/3 100% ████████████████████
Edge Input 5/5 100% ████████████████████
Edge Boundary 4/4 100% ████████████████████
Failure Inputs 6/6 100% ████████████████████
Output Structure 12/12 100% ████████████████████
Claude Sonnet 4.6 (eu-central-1) correctly handles:
| Scenario | Status | Notes |
|---|---|---|
| All standards met | KONFORM |
AES-256, eu-central-1, gRPC+TLS, API Gateway+OAuth2 |
| Unencrypted DB | NICHT KONFORM |
Correctly identified |
| Plain HTTP | NICHT KONFORM |
Correctly identified |
| Non-EU region (us-east-1) | NICHT KONFORM |
Correctly flagged |
| Missing OAuth2 | NICHT KONFORM |
API Gateway without OAuth2 |
| Double negative ("nicht unverschlüsselt") | KONFORM |
LLM understands negation |
| us-west-2 region | NICHT KONFORM |
LLM knows us-west-2 is non-EU |
| gRPC without TLS | NICHT KONFORM |
MockEngine misses this |
| "ohne API Gateway" | NICHT KONFORM |
LLM treats as violation (defensible) |
| Non-architectural input ("Hallo") | RELEVANZ UNKLAR |
Correct classification |
The evaluation runs all 18 test_data.json scenarios through both engines and generates a side-by-side HTML comparison.
Latest results (Claude Sonnet 4.6, eu-central-1):
| Metric | MockEngine | Bedrock LLM | Δ | Interpretation |
|---|---|---|---|---|
| Pass Rate | 100.0% | 83.3% | −16.7% | LLM is stricter on some edge cases |
| Avg F1 Score | 0.981 | 0.811 | −0.170 | Trade-off: stricter means more false positives |
| Avg Precision | 0.972 | 0.737 | −0.235 | LLM flags more items as violations |
| Avg Recall | 1.0 | 0.972 | −0.028 | Both catch nearly all actual violations |
| Avg Response Time | 0.0 s | 10.7 s | +10.7 s | LLM inference vs. instant keyword matching |
The LLM's lower pass rate reflects test expectations designed around the keyword-based MockEngine. The LLM is genuinely more capable — it catches violations the MockEngine misses entirely (e.g., gRPC without TLS). The three discrepancies are edge cases where both interpretations are defensible.
Dashboard features: summary cards (pass rate, F1, precision, recall, response time), per-engine category breakdown, confusion matrix, per-test-case comparison with reasoning snippets, disagreement analysis, and an error details table.
open scripts/evaluation_report.html- AWS CLI with OIDC-configured GitHub Actions
- Docker, Python 3.9+
- Bedrock model access: Claude Sonnet 4.6 in
eu-central-1
Uses OIDC — no long-lived AWS credentials:
graph LR
A["push to main"] --> B["Lint (ruff)"]
B --> C["Type Check (mypy)"]
C --> D["Test (pytest, mock mode)"]
D --> E["Docker Build & Push to ECR"]
E --> F["ECS Force New Deployment"]
F --> G["Health Check Wait"]
Required secret: AWS_ROLE_ARN — an IAM role with an OIDC trust policy (permissions: ECR push, ECS update-service, Bedrock InvokeModel, ApplyGuardrail).
| Resource | Details |
|---|---|
| VPC | 2 AZs, 1 NAT Gateway, public/private subnets |
| ECS Cluster | Fargate (serverless), container insights |
| ECR | haystack-compliance, retains the last 10 images |
| ECS Service | 1–5 tasks, auto-scaling at 70% CPU/memory |
| ALB | Health check on /health |
| IAM Role | Bedrock: InvokeModel, ApplyGuardrail |
| Bedrock Guardrail | Content filters (sexual/hate/violence/insults) + PII (email anonymized, AWS keys blocked) |
| Fargate Spot | Cost-optimized capacity provider |
| Region | eu-central-1 (Frankfurt) |
# Install
pip install -r requirements.txt
# Mock mode (no AWS)
HAYSTACK_COMPLIANCE_MOCK=1 python3 architecture_compliance_checker.py
# FastAPI dev server
HAYSTACK_COMPLIANCE_MOCK=1 uvicorn app.main:app --reload --port 8000
# http://localhost:8000/docs (Swagger UI){ "status": "healthy", "pipeline": "haystack-architecture-compliance" }Request:
{
"requirement": "Wir planen eine unverschlüsselte PostgreSQL-Datenbank in us-east-1."
}Response:
{
"status": "NICHT KONFORM",
"reasoning_summary": " - Standard-01: Datenbankverschlüsselung (AES-256) nicht eingehalten\n - Standard-01: Datenhaltung außerhalb der EU (eu-central-1)",
"findings": [
{
"standard_id": "Standard-01",
"is_compliant": false,
"finding_details": "Datenbankverschlüsselung (AES-256) nicht eingehalten\n - Datenhaltung außerhalb der EU (eu-central-1)",
"remediation": "Aktivieren Sie AES-256 Verschlüsselung für alle Datenbanken\nVerlagerung der Daten in die EU (eu-central-1)"
},
{
"standard_id": "Standard-02",
"is_compliant": true,
"finding_details": "Keine Aussage zur Kommunikation — Standard als erfüllt betrachtet.",
"remediation": ""
},
{
"standard_id": "Standard-03",
"is_compliant": true,
"finding_details": "Keine externe API genannt — Standard als erfüllt betrachtet.",
"remediation": ""
}
],
"full_report": "1. STATUS: NICHT KONFORM\n2. BEGRÜNDUNG:\n ..."
}{
"standards": [
{ "id": 0, "content": "Standard-01: Alle produktiven Datenbanken..." },
{ "id": 1, "content": "Standard-02: Für den Datenaustausch..." },
{ "id": 2, "content": "Standard-03: Externe APIs müssen..." }
],
"count": 3
}- OIDC authentication — GitHub Actions → AWS via IAM roles, no access keys
- Bedrock Guardrails — content filters (hate, sexual, violence, insults) + PII handling (email anonymized, AWS keys blocked)
- IAM least privilege — ECS task role limited to
bedrock:InvokeModelandbedrock:ApplyGuardrail - No secrets in the repository —
.env,*.pyc,__pycache__,cdk.out/are gitignored - Deterministic LLM output —
temperature=0.0for reproducible results - Input validation — FastAPI Pydantic schema validation on
/check
Deploys on push to main or manual workflow_dispatch:
- Lint —
ruff check app/ tests/ scripts/ - Type check —
mypy app/ - Test —
pytest tests/ -v(MockEngine, no AWS needed, < 1 s) - Docker build & push — multi-arch → ECR
- ECS deploy —
force-new-deployment, waits for service stability
Required secret: AWS_ROLE_ARN
| Area | Limitation | Impact |
|---|---|---|
| MockEngine | Keyword-based; misses gRPC without TLS, non-standard regions | Only used in CI — the LLM handles all edge cases |
| Region detection | us-*, ap-* prefix list |
The LLM knows all AWS regions |
| TLS version | Mock checks the tls keyword only |
The LLM distinguishes TLS 1.2 vs. 1.3 |
| Language | MockEngine is German-only | The LLM handles German + English |
| LLM cost | ~10 s per inference, ~$0.01 per check | Acceptable for a compliance workflow |
| Bedrock availability | Requires Claude Sonnet 4.6 access in eu-central-1 | MockEngine works offline |
| No persistent store | Results are ephemeral (in-memory) | No audit trail without extensions |
# Install & run tests
pip install -r requirements.txt
python3 -m pytest tests/ -v
# FastAPI server
uvicorn app.main:app --reload --port 8000
# Check compliance
curl -X POST http://localhost:8000/check \
-H "Content-Type: application/json" \
-d '{"requirement": "Unverschlüsselte DB in us-east-1"}'
# Full evaluation
python3 scripts/evaluate_compliance.py
open scripts/evaluation_report.htmlBuilt by LeopardCode.AI — AI Engineering & Consulting
