A collection of system design documents for production AI systems - focused on architecture decisions, tradeoffs, and design patterns. No code, just thinking.

| Folder | What's inside |
|---|---|
| RAG/ | Retrieval-Augmented Generation systems for different domains and accuracy requirements |
| Agents/ | Multi-agent orchestration, loop detection, safety patterns |
| LLM-Serving/ | Model serving, scaling, batching, and GPU efficiency |
| Evaluation/ | Eval pipelines, regression detection, LLM-as-judge systems |
| Guardrails/ | Input/output safety, PII handling, constrained generation |
| Data-Pipelines/ | Ingestion, chunking, freshness, and deduplication |

- High-Accuracy RAG - Legal, medical, and compliance use cases where hallucination is unacceptable
- Medical RAG with Temporal KG - Patient history reasoning using Knowledge Graphs and specialist model routing
- Loop Detection & Recovery - Detecting and breaking agent loops without losing context
- Fintech Refund Agent - Safe irreversible action execution with confidence scoring and idempotency
- Coding Assistant Eval Pipeline - Regression detection for model upgrades using shadow evaluation and execution-based testing

Each document follows this structure:

- Problem - what we're solving and why it's hard
- Architecture - the full system design with components
- Key decisions - why each component was chosen over alternatives
- Tradeoffs - what we gave up and why it was worth it
- Failure modes - what can go wrong and how the system handles it
- Interview question - the prompt this design answers

The designs are derived from first-principles reasoning and mapped to production patterns used by systems such as Harvey AI, GitHub Copilot, Google Med-PaLM, and Midjourney.