A governed agentic workflow that turns an unstructured motor-insurance claim into a policy-grounded, human-reviewed, and fully traceable case.
ClaimFlow AI accepts a claim PDF or email, extracts structured claim data, validates what is missing, retrieves current policy evidence, recalls relevant workflow guidance, recommends one safe next action, and keeps a human reviewer in control of the final decision.
The project was built to answer a practical question:
How can AI help claim reviewers process incomplete and complex claims faster—while keeping every recommendation grounded, governed, human-reviewed, and auditable?
Architecturally, the workflow shows how each ClaimFlow AI layer contributes to one governed decision. Extraction converts an unstructured claim into structured JSON, deterministic validation identifies missing fields and evidence, policy RAG retrieves relevant policy clauses, and memory supplies applicable guidance from previously reviewed outcomes. The guarded agent combines this context to recommend and execute one permitted workflow action. Human review verifies the claim information and makes the final decision, while observability, evaluation, and memory feedback keep the complete process traceable, measurable, and continuously improving.
Agentic Workflow: https://x.com/RitikaxG/status/2061398296137199908?s=20
Memory Layer: https://x.com/RitikaxG/status/2065037735145205775?s=20
ClaimFlow AI is an end-to-end motor-claim operations workflow with:
- PDF and pasted-email intake
- schema-shaped claim JSON extraction
- deterministic validation of required fields, evidence, conflicts, and warnings
- claim-aware policy RAG with verified citations
- a guarded agent that can propose only registered tools
- human review for corrections and final decisions
- safe workflow memory built from trusted review outcomes
- an AI gateway for model, prompt, latency, token, cost, and failure metadata
- a per-run trace that connects the complete claim journey
- Week 1–6 evaluation suites and dashboards
This is not a claim chatbot and it is not an autonomous approval system. The model interprets and proposes. Deterministic software validates and constrains. Policy evidence grounds the answer. Memory offers guidance, never claim facts. The human reviewer owns approval or rejection.
- Structured extraction from claim PDFs and email text
- Deterministic validation before AI output becomes workflow state
- Policy-grounded RAG with multi-query retrieval and citation verification
- One bounded agent action per step
- Registered tools and guardrails around every proposal
- Human-in-the-loop review for missing information and final decisions
- Safe workflow memory that cannot overwrite current claim evidence
- Auditable memory lifecycle: create, strengthen, weaken, retire, and supersede
- Central AI gateway for governed model calls
- End-to-end run trace across extraction, RAG, memory, agent, review, and gateway events
- Regression evaluations for every capability added from Week 1 through Week 6
The 16 screens below follow a single claim through the complete product loop.
A reviewer uploads a claim PDF or pastes the original claim email. ClaimFlow stores the source document and creates a durable extraction run.
The extraction call passes through the AI gateway. Gemini converts the unstructured claim into schema-shaped JSON, while ClaimFlow stores the raw response, parsed result, model, prompt version, schema version, and trace metadata.
Deterministic rules inspect required fields, required evidence, conflicts, warnings, and confidence issues. In this claim, the workflow identifies missing information instead of treating a syntactically valid model response as a complete claim.
The reviewer asks whether the current claim is covered and what evidence is required. The question is combined with the latest claim context rather than sent to a general-purpose chatbot.
Claim-aware multi-query retrieval searches coverage, evidence, exclusion, and limit clauses in Postgres with pgvector. For this claim, RAG retrieves policy evidence showing that a police report is required. The answer is grounded in the retrieved wording and its citations are verified before it is saved.
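As a rough illustration of claim-aware multi-query retrieval, the sketch below plans one query per intent and merges the hits by clause. The four intents come from the text; the type names, `planQueries`, `mergeHits`, the query wording, and the sample clause IDs and scores are all hypothetical. The pgvector search that would run between the two steps is omitted.

```typescript
// Retrieval intents named in the text; every other name here is hypothetical.
type RetrievalIntent = "coverage" | "evidence" | "exclusion" | "limit";
type PlannedQuery = { intent: RetrievalIntent; text: string };
type ClauseHit = { clauseId: string; intent: RetrievalIntent; similarity: number };

// Build one query per intent from the reviewer question and the claim's incident type.
function planQueries(question: string, incidentType: string): PlannedQuery[] {
  const intents: RetrievalIntent[] = ["coverage", "evidence", "exclusion", "limit"];
  return intents.map((intent) => ({
    intent,
    text: `${intent} clauses for ${incidentType}: ${question}`,
  }));
}

// Merge hits from all queries, keeping the best-scoring hit per clause.
function mergeHits(hits: ClauseHit[]): ClauseHit[] {
  const best = new Map<string, ClauseHit>();
  for (const hit of hits) {
    const current = best.get(hit.clauseId);
    if (!current || hit.similarity > current.similarity) best.set(hit.clauseId, hit);
  }
  return Array.from(best.values()).sort((a, b) => b.similarity - a.similarity);
}

const queries = planQueries("Is a police report required?", "theft");
// Illustrative hits only; a real run would get these from the vector search.
const merged = mergeHits([
  { clauseId: "4.2", intent: "evidence", similarity: 0.81 },
  { clauseId: "4.2", intent: "coverage", similarity: 0.64 },
  { clauseId: "7.1", intent: "exclusion", similarity: 0.58 },
]);
console.log(queries.length, merged.map((hit) => hit.clauseId));
```

Deduplicating by clause keeps a clause that matches several intents from crowding out other evidence in the answer context.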
ClaimFlow searches prior reviewed outcomes for reusable workflow guidance. It finds a relevant lesson: a previous claim with a missing vehicle.registrationNumber was resolved by drafting an information request.
The memory does not supply the old vehicle registration number. It can suggest a safe process, but the current claim must provide its own facts.
The current claim state, validation result, retrieved policy evidence, review state, recent actions, and safe memory guidance are assembled into the agent context. The agent recommends one bounded next action: request the missing information.
The proposed action passes through guardrails before execution. ClaimFlow permits only registered backend tools; the model cannot directly approve, reject, mutate arbitrary data, or bypass review.
The allowed draft_information_request tool creates a durable follow-up draft for the missing registration number and required evidence. The draft is reviewable and is not silently sent as an email.
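One way to picture a durable, never-auto-sent draft is a tool keyed by an idempotency key, so a retried agent step returns the same draft instead of creating a second one. This is a minimal sketch, not the project's implementation: the class, field names, key format, and the in-memory `Map` standing in for the database are assumptions.

```typescript
// Hypothetical draft shape. The status type has no "SENT" value on purpose:
// sending stays a human action outside this tool.
type InformationRequestDraft = {
  draftId: string;
  claimId: string;
  missingItems: string[];
  status: "DRAFT";
};

type AuditEntry = { key: string; outcome: "EXECUTED" | "REPLAYED" };

class DraftInformationRequestTool {
  // In-memory stand-in for durable storage (assumption for this sketch).
  private readonly drafts = new Map<string, InformationRequestDraft>();
  readonly auditLog: AuditEntry[] = [];

  execute(idempotencyKey: string, claimId: string, missingItems: string[]): InformationRequestDraft {
    const existing = this.drafts.get(idempotencyKey);
    if (existing) {
      // Same key seen before: log the replay and return the stored draft.
      this.auditLog.push({ key: idempotencyKey, outcome: "REPLAYED" });
      return existing;
    }
    const draft: InformationRequestDraft = {
      draftId: `draft-${this.drafts.size + 1}`,
      claimId,
      missingItems: [...missingItems],
      status: "DRAFT",
    };
    this.drafts.set(idempotencyKey, draft);
    this.auditLog.push({ key: idempotencyKey, outcome: "EXECUTED" });
    return draft;
  }
}

const tool = new DraftInformationRequestTool();
const items = ["vehicle.registrationNumber", "police_report"];
const first = tool.execute("claim-1:step-3", "claim-1", items);
const retry = tool.execute("claim-1:step-3", "claim-1", items);
console.log(first.draftId === retry.draftId, tool.auditLog.length);
```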
The current claim's missing information is entered into the review workflow. ClaimFlow validates the required items for this request rather than allowing the case to advance with unresolved fields.
The reviewer records the submitted information against the durable request, preserving who supplied it and which fields or evidence it addresses.
Once the requested information is received, the review task moves out of its waiting state and returns to the active review queue with the new evidence attached.
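The waiting-to-received transition can be sketched as a small pure function that only leaves the waiting state once every required item has arrived. The state names and the `InformationRequest` shape below are hypothetical; the text only says that waiting and received-information states exist.

```typescript
// Hypothetical state names for the follow-up request.
type RequestState = "WAITING_FOR_INFORMATION" | "INFORMATION_RECEIVED";

type InformationRequest = {
  state: RequestState;
  requiredItems: string[];
  receivedItems: string[];
};

// Record newly received items; the request stays waiting until nothing is outstanding.
function recordReceivedItems(request: InformationRequest, items: string[]): InformationRequest {
  const received = new Set(request.receivedItems.concat(items));
  const outstanding = request.requiredItems.filter((item) => !received.has(item));
  return {
    ...request,
    receivedItems: Array.from(received),
    state: outstanding.length === 0 ? "INFORMATION_RECEIVED" : "WAITING_FOR_INFORMATION",
  };
}

const openRequest: InformationRequest = {
  state: "WAITING_FOR_INFORMATION",
  requiredItems: ["vehicle.registrationNumber", "police_report"],
  receivedItems: [],
};
const partial = recordReceivedItems(openRequest, ["police_report"]);
const complete = recordReceivedItems(partial, ["vehicle.registrationNumber"]);
console.log(partial.state, complete.state);
```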
The reviewer marks the retrieved memory as relevant or irrelevant. Retrieval alone never strengthens a memory; only trusted human outcomes can update its confidence and lifecycle.
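The rule that retrieval alone never strengthens a memory can be expressed as a signal handler that ignores retrieval events and adjusts confidence only on reviewer feedback. The sketch below assumes a numeric confidence in [0, 1], a fixed step, and a retirement threshold; these values and all names are illustrative, and the supersede path is left out.

```typescript
type MemoryStatus = "ACTIVE" | "RETIRED";
type Memory = { id: string; confidence: number; status: MemoryStatus };
type Signal =
  | { kind: "RETRIEVED" }
  | { kind: "REVIEWER_FEEDBACK"; relevant: boolean };

const STEP = 0.1; // assumed confidence adjustment per feedback
const RETIRE_BELOW = 0.2; // assumed retirement threshold

function applySignal(memory: Memory, signal: Signal): Memory {
  // Retired memories stay retired, and retrieval by itself changes nothing.
  if (memory.status === "RETIRED" || signal.kind === "RETRIEVED") return memory;
  const raw = memory.confidence + (signal.relevant ? STEP : -STEP);
  // Round to two decimals and clamp to [0, 1] to avoid floating-point drift.
  const confidence = Math.min(1, Math.max(0, Math.round(raw * 100) / 100));
  return {
    ...memory,
    confidence,
    status: confidence < RETIRE_BELOW ? "RETIRED" : "ACTIVE",
  };
}

const lessonMemory: Memory = { id: "mem-1", confidence: 0.5, status: "ACTIVE" };
const afterRetrieval = applySignal(lessonMemory, { kind: "RETRIEVED" });
const strengthened = applySignal(lessonMemory, { kind: "REVIEWER_FEEDBACK", relevant: true });
const retired = applySignal(
  { id: "mem-2", confidence: 0.25, status: "ACTIVE" },
  { kind: "REVIEWER_FEEDBACK", relevant: false },
);
console.log(afterRetrieval.confidence, strengthened.confidence, retired.status);
```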
The reviewer sees the source claim, extracted JSON, validation findings, requested information, policy evidence, memory guidance, and agent rationale. They fill in the current registration number, make any necessary corrections, and own the final approval decision.
ClaimFlow stores the corrected claim JSON and the EDITED_AND_APPROVED outcome without erasing the original extraction. That difference becomes auditable evidence for future evaluation and safe memory updates.
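A minimal sketch of keeping both versions side by side: the record below stores the original extraction, the corrected claim, and the list of changed fields, and derives the outcome from whether anything changed. `EDITED_AND_APPROVED` is taken from the text; the flat `ClaimJson` shape, the `APPROVED` label, the function name, and the placeholder values are assumptions.

```typescript
// Flat key/value claim for illustration; the real claim JSON is likely nested.
type ClaimJson = Record<string, string | null>;
type ReviewOutcome = "APPROVED" | "EDITED_AND_APPROVED";

type ReviewRecord = {
  original: ClaimJson;
  corrected: ClaimJson;
  changedFields: string[];
  outcome: ReviewOutcome;
};

function recordApproval(original: ClaimJson, corrected: ClaimJson): ReviewRecord {
  const keys = Array.from(new Set(Object.keys(original).concat(Object.keys(corrected))));
  const changedFields = keys.filter((key) => original[key] !== corrected[key]).sort();
  return {
    original: { ...original }, // the extraction is copied, never overwritten
    corrected: { ...corrected },
    changedFields,
    outcome: changedFields.length > 0 ? "EDITED_AND_APPROVED" : "APPROVED",
  };
}

const record = recordApproval(
  { policyNumber: "POL-1", "vehicle.registrationNumber": null },
  { policyNumber: "POL-1", "vehicle.registrationNumber": "REG-PLACEHOLDER" },
);
console.log(record.outcome, record.changedFields);
```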
The trace dashboard reconstructs what happened in order: document intake, extraction, validation, gateway calls, RAG retrieval, citations, memory retrieval and use, agent proposal, guardrail decision, tool execution, follow-up state changes, human review, and memory feedback.
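Reconstructing a run in order amounts to filtering events by run and sorting them by time. The sketch below assumes each event carries a run ID, an ISO 8601 timestamp, and a stage label; the `TraceEvent` shape and the stage names are hypothetical.

```typescript
// Hypothetical event shape: one row per recorded step of a run.
type TraceEvent = { runId: string; at: string; stage: string };

// Return the events of one run in chronological order.
// filter() creates a new array, so sorting does not mutate the input.
function buildRunTrace(events: TraceEvent[], runId: string): TraceEvent[] {
  return events
    .filter((event) => event.runId === runId)
    .sort((a, b) => Date.parse(a.at) - Date.parse(b.at));
}

const allEvents: TraceEvent[] = [
  { runId: "run-1", at: "2025-01-10T09:00:05Z", stage: "validation" },
  { runId: "run-1", at: "2025-01-10T09:00:00Z", stage: "document_intake" },
  { runId: "run-2", at: "2025-01-10T09:00:01Z", stage: "document_intake" },
  { runId: "run-1", at: "2025-01-10T09:00:02Z", stage: "extraction" },
];
const runTrace = buildRunTrace(allEvents, "run-1");
console.log(runTrace.map((event) => event.stage));
```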
Claim PDF / email
│
▼
AI extraction ───────────────► structured claim JSON
│ │
│ ▼
│ deterministic validation
│ │
│ ┌───────────────┼────────────────┐
│ ▼ ▼ ▼
│ Policy RAG workflow memory review state
│ current policy past process current case
│ evidence guidance state
│ └───────────────┼────────────────┘
│ ▼
│ guarded agent step
│ one registered tool
│ │
│ ▼
│ information request / review
│ │
│ ▼
│ human-owned decision
│ │
│ ▼
│ memory feedback + lifecycle
│
└────────► AI gateway + run trace + evaluations

The model turns a messy document into a typed claim object. ClaimFlow persists both the model output and the normalized result so the transformation can be inspected later.
Deterministic rules decide whether the extracted claim can continue or needs review. Missing fields and evidence are explicit state, not hidden inside model prose.
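To make "explicit state" concrete, here is a minimal validation sketch in plain TypeScript. The project uses Zod schemas; this example avoids the dependency to stay self-contained. The claim shape, the required-evidence list, and the function name are assumptions, while `COMPLETED` and `NEEDS_REVIEW` are the run states named in this README.

```typescript
// Hypothetical claim shape; the real schema lives in the project's shared package.
type ExtractedClaim = {
  policyNumber?: string;
  incidentDate?: string;
  vehicle?: { registrationNumber?: string };
  evidence?: string[];
};

type RunStatus = "COMPLETED" | "NEEDS_REVIEW";
type ValidationResult = {
  status: RunStatus;
  missingFields: string[];
  missingEvidence: string[];
};

// Assumed rule set, for illustration only.
const REQUIRED_EVIDENCE = ["photos", "police_report"];

function validateClaim(claim: ExtractedClaim): ValidationResult {
  const missingFields: string[] = [];
  if (!claim.policyNumber?.trim()) missingFields.push("policyNumber");
  if (!claim.incidentDate?.trim()) missingFields.push("incidentDate");
  if (!claim.vehicle?.registrationNumber?.trim()) {
    missingFields.push("vehicle.registrationNumber");
  }

  const provided = new Set(claim.evidence ?? []);
  const missingEvidence = REQUIRED_EVIDENCE.filter((item) => !provided.has(item));

  // Any gap routes the run to review; the model's own confidence is not consulted here.
  const status: RunStatus =
    missingFields.length === 0 && missingEvidence.length === 0 ? "COMPLETED" : "NEEDS_REVIEW";
  return { status, missingFields, missingEvidence };
}

const validation = validateClaim({
  policyNumber: "POL-1",
  incidentDate: "2025-01-10",
  vehicle: {},
  evidence: ["photos"],
});
console.log(validation);
```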
The policy corpus is parsed into clause-level chunks and stored with embeddings. Claim-aware queries retrieve the most relevant clauses; thresholds and citation checks prevent unsupported coverage answers. Weak evidence produces NEEDS_REVIEW, not a confident guess.
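The threshold and citation checks might look like the sketch below: an answer counts as grounded only if at least one clause clears the similarity threshold and every citation points at such a clause. The threshold value, the `GROUNDED` label, and all type and function names are assumptions; `NEEDS_REVIEW` is the status named in the text.

```typescript
type RetrievedClause = { clauseId: string; similarity: number };
type DraftAnswer = { text: string; citedClauseIds: string[] };
type AnswerStatus = "GROUNDED" | "NEEDS_REVIEW";

const MIN_SIMILARITY = 0.75; // assumed threshold, not the project's actual value

function gradeAnswer(clauses: RetrievedClause[], draft: DraftAnswer): AnswerStatus {
  // Weak retrieval: nothing clears the threshold, so abstain.
  const strongIds = new Set(
    clauses.filter((clause) => clause.similarity >= MIN_SIMILARITY).map((clause) => clause.clauseId),
  );
  if (strongIds.size === 0) return "NEEDS_REVIEW";

  // Every citation must exist and refer to a strongly retrieved clause.
  const citationsVerified =
    draft.citedClauseIds.length > 0 && draft.citedClauseIds.every((id) => strongIds.has(id));
  return citationsVerified ? "GROUNDED" : "NEEDS_REVIEW";
}

const clauses: RetrievedClause[] = [
  { clauseId: "4.2", similarity: 0.81 },
  { clauseId: "7.1", similarity: 0.58 },
];
const grounded = gradeAnswer(clauses, { text: "A police report is required.", citedClauseIds: ["4.2"] });
const weakCitation = gradeAnswer(clauses, { text: "Covered.", citedClauseIds: ["7.1"] });
console.log(grounded, weakCitation);
```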
Memory is derived from trusted corrections and review outcomes. It may say “a similar missing field was resolved through an information request,” but it may not copy a past claimant's value, override policy evidence, or make the current decision.
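One way to enforce "guidance, never claim facts" is to give the agent a projection of the memory that has no field for past values at all. In this sketch `safeUse` and `mustNotDo` follow the names used in this README, while `MemoryLesson`, `pastClaimValues`, `toAgentGuidance`, and the sample values are hypothetical.

```typescript
// Hypothetical stored lesson. Past values are kept for audit only.
type MemoryLesson = {
  id: string;
  safeUse: string;
  mustNotDo: string[];
  pastClaimValues: Record<string, string>;
};

// What the agent context receives: process guidance with no slot for claim facts.
type AgentGuidance = { memoryId: string; safeUse: string; mustNotDo: string[] };

function toAgentGuidance(lesson: MemoryLesson): AgentGuidance {
  return {
    memoryId: lesson.id,
    safeUse: lesson.safeUse,
    mustNotDo: [...lesson.mustNotDo],
  };
}

const lesson: MemoryLesson = {
  id: "mem-42",
  safeUse: "When vehicle.registrationNumber is missing, draft an information request.",
  mustNotDo: ["Do not reuse a registration number from a past claim."],
  pastClaimValues: { "vehicle.registrationNumber": "XX00XX0000" }, // placeholder value
};
const guidance = toAgentGuidance(lesson);
console.log(JSON.stringify(guidance));
```

Because the projection copies fields by name, a past claimant's value cannot reach the agent by accident even if the stored lesson grows new fields.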
A deterministic router handles obvious cases first. When planning is needed, LangChain tool-calling proposes exactly one registered action. Guardrails decide whether it is permitted, and backend code—not the model—executes it.
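The guardrail step can be sketched as a check that runs before any backend code executes: the proposed tool must be registered, its arguments must validate, and the claim must not already be in a final state. Only `draft_information_request` is taken from the text; the registry shape, reason codes, and function names are assumptions, and the LangChain planning call itself is not shown.

```typescript
type Proposal = { tool: string; args: Record<string, unknown> };
type GuardrailDecision = { allowed: true } | { allowed: false; reason: string };
type ArgsValidator = (args: Record<string, unknown>) => boolean;

// Registry of permitted tools with their argument validators (hypothetical shape).
const REGISTERED_TOOLS = new Map<string, ArgsValidator>();
REGISTERED_TOOLS.set("draft_information_request", (args) => {
  const fields = args.missingFields;
  return (
    Array.isArray(fields) &&
    fields.length > 0 &&
    fields.every((field) => typeof field === "string")
  );
});

function checkProposal(proposal: Proposal, claimIsFinal: boolean): GuardrailDecision {
  if (claimIsFinal) return { allowed: false, reason: "FINAL_STATE" };
  const validateArgs = REGISTERED_TOOLS.get(proposal.tool);
  if (!validateArgs) return { allowed: false, reason: "UNREGISTERED_TOOL" };
  if (!validateArgs(proposal.args)) return { allowed: false, reason: "INVALID_ARGUMENTS" };
  return { allowed: true };
}

const allowedDecision = checkProposal(
  { tool: "draft_information_request", args: { missingFields: ["vehicle.registrationNumber"] } },
  false,
);
// "approve_claim" is a made-up name for a tool the model might try to call.
const blockedDecision = checkProposal({ tool: "approve_claim", args: {} }, false);
console.log(allowedDecision, blockedDecision);
```

Using a `Map` instead of a plain object avoids accidental matches on inherited keys such as `constructor`.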
The reviewer can approve, edit and approve, reject, or request more information. Original extraction, corrected data, rationale, and status transitions remain available for audit.
Every model-backed call passes through the AI gateway. Per-run traces explain one claim; controlled evaluation datasets measure whether extraction, review routing, RAG, agent actions, memory behavior, and gateway failures remain reliable across many cases.
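A gateway can only measure failures across many cases if they are normalized into a small set of kinds. The sketch below maps a raw failure onto the five categories this README lists (timeout, rate limit, provider, parse, validation). The `RawFailure` shape and the retry policy are assumptions, not the project's actual rules.

```typescript
type FailureKind = "TIMEOUT" | "RATE_LIMIT" | "PROVIDER" | "PARSE" | "VALIDATION";
type NormalizedFailure = { kind: FailureKind; retryable: boolean };

// Hypothetical raw failure as the gateway might observe it.
type RawFailure = {
  timedOut?: boolean;
  httpStatus?: number;
  stage?: "parse" | "validate";
};

function normalizeFailure(raw: RawFailure): NormalizedFailure {
  if (raw.timedOut) return { kind: "TIMEOUT", retryable: true };
  if (raw.httpStatus === 429) return { kind: "RATE_LIMIT", retryable: true };
  // Parse and validation failures come from the response content, so a blind retry
  // is assumed not to help.
  if (raw.stage === "parse") return { kind: "PARSE", retryable: false };
  if (raw.stage === "validate") return { kind: "VALIDATION", retryable: false };
  // Everything else is a provider failure; only 5xx responses are treated as retryable.
  return { kind: "PROVIDER", retryable: (raw.httpStatus ?? 0) >= 500 };
}

const failures = [
  normalizeFailure({ timedOut: true }),
  normalizeFailure({ httpStatus: 429 }),
  normalizeFailure({ stage: "parse" }),
  normalizeFailure({ httpStatus: 503 }),
  normalizeFailure({ httpStatus: 400 }),
];
console.log(failures.map((failure) => `${failure.kind}:${failure.retryable}`));
```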
| Layer | Responsibility |
|---|---|
| Model intelligence | Extract ambiguous documents, draft grounded explanations, and propose one typed action. |
| Deterministic control | Validate schemas and business rules, rank evidence, enforce state transitions, and execute backend tools. |
| Policy RAG | Supply current, cited policy evidence and abstain when retrieval is weak. |
| Workflow memory | Reuse safe process lessons from trusted outcomes without supplying current claim facts. |
| Guardrails | Block unsupported tools, invalid arguments, final-state mutations, and unsafe actions. |
| Human review | Correct claim data, assess evidence, rate memory relevance, and make the final decision. |
| AI gateway | Record model, prompt, schema, trace, latency, tokens, cost, status, and normalized failures. |
| Evaluations | Test capability quality and safety contracts from Week 1 through Week 6. |
- PDF and pasted-email claim intake
- Zod-based structured claim schema
- raw and parsed model output persistence
- deterministic required-field and evidence checks
- conflict, warning, and confidence findings
- COMPLETED, NEEDS_REVIEW, and FAILED run states
- duplicate-content detection, soft delete, and restore
- clause-aware policy parsing and chunking
- pgvector similarity search
- claim-aware multi-query planning
- coverage, evidence, exclusion, and limit retrieval intents
- retrieval strength thresholds
- grounded answer generation and citation verification
- persisted questions, answers, evidence, and retrieval traces
- current-state context builder
- deterministic routing before model planning
- LangChain structured tool-calling
- one action per agent step
- typed tool registry and argument validation
- pre-execution guardrails
- idempotent tool execution and action audit logs
- durable review tasks
- information-request drafts
- waiting and received-information states
- required-item enforcement before progression
- edited claim JSON with original extraction preserved
- reviewer-owned approval and rejection
- episodic and generalized workflow lessons
- deterministic filters plus semantic retrieval
- relevance scoring and reason codes
- explicit safeUse and mustNotDo guidance
- separate retrieval, agent-use, and reviewer-feedback audit
- create, strengthen, weaken, retire, and supersede lifecycle
- trace IDs across model-backed work
- provider, model, prompt, and schema metadata
- latency, token, and estimated-cost tracking
- normalized timeout, rate-limit, provider, parse, and validation failures
- retryability and fallback metadata
- per-run workflow trace
- Week 1–6 evaluation dashboard
claimflow_ai/
├── apps/
│ └── web/ # Next.js product UI and server actions
├── packages/
│ ├── ai/ # Extraction and model-backed generation
│ ├── agent/ # Context, routing, planner, tools, guardrails, runner
│ ├── db/ # Prisma schema, migrations, and demo seed
│ ├── evals/ # Week 1–6 evaluation runners and reports
│ ├── gateway/ # Governed model calls and AiCallLog persistence
│ ├── memory/ # Memory writing, retrieval, scoring, audit, lifecycle
│ ├── rag/ # Policy ingestion, embeddings, retrieval, citations
│ └── shared/ # Shared schemas and types
├── sample-data/ # Synthetic packets, gold expectations, eval results
├── docs/ # Architecture, evidence, demos, and implementation notes
├── Dockerfile
├── docker-compose.yml
└── render.yaml

| Area | Stack |
|---|---|
| Web | Next.js 16, React 19, TypeScript, Tailwind CSS |
| Monorepo | Turborepo, Bun workspaces |
| Data | Postgres, Prisma 7, pgvector |
| AI | Gemini, LangChain tool-calling, Zod schemas |
| Reliability | Deterministic rules, typed tools, guardrails, eval runners, AI gateway |
| Deployment | Docker and Render Blueprint |
cp .env.example packages/db/.env

Add the required database and Gemini credentials described in .env.example.

docker compose up -d
bun install
bun run db:generate
bun run db:migrate
bun run demo:seed
bun run dev

Open http://localhost:3000.

bun run release:check

The release check generates Prisma types, type-checks the workspaces, runs lint, executes the deterministic Week 6 gateway evaluation, and builds the application.
| Week | Capability | What is measured |
|---|---|---|
| Week 1 | Extraction and validation | Schema conformity, field extraction, evidence checks, and run status |
| Week 2 | Human review | Routing, priority, events, corrected data, and decision state |
| Week 3 | Policy RAG | Retrieval quality, citation support, abstention, and false approval |
| Week 4 | Agent and guardrails | Tool selection, blocking, idempotency, and final-state safety |
| Week 5 | Workflow memory | Retrieval, safe use, feedback, conflicts, and lifecycle |
| Week 6 | Gateway observability | Trace completeness, failures, retries, cost, and metadata governance |
See evaluation design and results and the synthetic datasets.
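As an illustration of how metrics such as abstention and false approval could be computed over a controlled dataset, here is a small summarizer. The verdict labels, the `EvalCase` shape, and the sample cases are hypothetical and do not reproduce the project's evaluation runners or results.

```typescript
type Verdict = "COVERED" | "NOT_COVERED" | "NEEDS_REVIEW";
type EvalCase = { id: string; expected: Verdict; actual: Verdict };
type EvalSummary = {
  total: number;
  correct: number;
  abstentions: number;
  falseApprovals: number;
};

function summarize(cases: EvalCase[]): EvalSummary {
  return {
    total: cases.length,
    correct: cases.filter((c) => c.actual === c.expected).length,
    // An abstention is any case where the system deferred to a human.
    abstentions: cases.filter((c) => c.actual === "NEEDS_REVIEW").length,
    // A false approval is a COVERED answer where the gold label says otherwise.
    falseApprovals: cases.filter((c) => c.actual === "COVERED" && c.expected !== "COVERED").length,
  };
}

// Made-up cases for illustration only.
const summary = summarize([
  { id: "case-1", expected: "COVERED", actual: "COVERED" },
  { id: "case-2", expected: "NEEDS_REVIEW", actual: "NEEDS_REVIEW" },
  { id: "case-3", expected: "NOT_COVERED", actual: "COVERED" },
]);
console.log(summary);
```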
| Capability | Walkthrough |
|---|---|
| Intake, extraction, validation, and initial review boundary | Week 1 — Document Intake Reviewer |
| Policy ingestion, retrieval, thresholds, citations, and Coverage UI | Week 3 — Policy RAG Architecture |
| Agent context, tools, guardrails, information request, and review loop | Week 4 — Guarded Agentic Workflow |
| Memory retrieval, agent use, feedback, lifecycle, and patterns | Week 5 — Memory Flow Evidence |
| Gateway, final schema, complete run trace, and Week 1–6 proof | Week 6 — Observability Flow Evidence |
- All checked-in and seeded claims are synthetic.
- Extraction is validated before it becomes trusted workflow state.
- Weak retrieval or unsupported citations force human review.
- Memory is advisory and cannot overwrite current claim data.
- The agent cannot approve, reject, delete, send an email, or bypass review.
- Final claim decisions remain human-owned.
- Production deployment still requires organization-specific authentication, authorization, secrets management, retention rules, and compliance review.
- Document intake and structured extraction: implemented
- Deterministic validation and review routing: implemented
- Human review and information-request loop: implemented
- Policy RAG and citations: implemented
- Guarded agent and registered tools: implemented
- Workflow memory and lifecycle: implemented
- AI gateway and complete run trace: implemented
- Week 1–6 evaluations: implemented
- Docker and Render deployment configuration: implemented
ClaimFlow AI demonstrates agentic AI as a governed product workflow: grounded by current policy evidence, informed by safe memory, constrained by typed tools and guardrails, accountable to human review, and measurable through traces and evaluations.
















