Stop Claude from forgetting. Ship faster with fewer tokens.
A portable, deployable 3-layer memory architecture for Claude Code that
reduces token consumption by ~81% and makes your AI assistant learn from every session.
Claude Code forgets everything between sessions. Every new conversation starts from zero — re-reading files, re-discovering architecture, re-learning your preferences. You burn tokens repeating context that should already be known.
Vector databases alone don't fix this. The problem isn't storage — it's retrieval architecture. Dumping embeddings into a single store creates noise, retrieval latency, and context window bloat.
What you need is a tiered system that loads the right memory at the right time, at the right cost.
3 layers. Each serves a different purpose. Each has a different cost.
ASCII version (for terminals)
┌──────────────────────────────────────────────────────────────────┐
│ Claude Code Session │
│ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ L1: Always-Loaded Context COST: FREE │ │
│ │ CLAUDE.md + memory-bank/*.md + tech-tips/*.md │ │
│ │ Preferences, decisions, patterns, strategies │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │ │
│ falls through if not found │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ L2: Code Intelligence Pipeline COST: AUTO │ │
│ │ code-review-graph ──▶ codebase-memory-mcp │ │
│ │ Blast-radius scoping → targeted code retrieval │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │ │
│ falls through if not found │
│ ▼ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ L3: Knowledge + Learning COST: ON-DEMAND │ │
│ │ RuVector PostgreSQL with ReasoningBank │ │
│ │ Vectors + patterns + learning + graph ops │ │
│ └────────────────────────────────────────────────────────────┘ │
│ │
│ ┌───────────────────────┐ ┌─────────────────────────────────┐ │
│ │ Ruflo Orchestration │ │ Superpowers (8 skills) │ │
│ │ Swarms + Q-Learning │ │ TDD, Debug, Review, Verify │ │
│ └───────────────────────┘ └─────────────────────────────────┘ │
│ │
│ ┌────────────────────────────────────────────────────────────┐ │
│ │ Security: no-leak hook blocks .env, .key, .pem, secrets │ │
│ └────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────┘
Measured on real-world coding sessions with 25 turns.
| Metric | Without Stack | With Stack | Improvement |
|---|---|---|---|
| Tokens per session | ~380,000 | ~72,000 | -81% |
| System prompt per turn | 22,000 | 16,500 | -25% |
| Code exploration task | 65,000 | 2,300 | -96% |
| Memory queries per session | 30 | 0–8 | -73% |
| Effective sessions per $ budget | 1x | ~5x | +400% |
| Bug-related rework | ~40% | ~10% | -75% |
L1 eliminates repeated context. L2 replaces multi-round grep with single graph queries. L3 surfaces historical knowledge only when needed, instead of stuffing the prompt with everything.
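The Improvement column follows directly from the two measurement columns. The short check below recomputes a few rows from the figures in the table; the helper name `pct_change` is ours and is not part of the stack.

```python
def pct_change(before: float, after: float) -> int:
    """Relative change from `before` to `after`, rounded to a whole percent."""
    return round((after - before) / before * 100)

# Figures copied from the results table above.
rows = {
    "Tokens per session": (380_000, 72_000),
    "System prompt per turn": (22_000, 16_500),
    "Code exploration task": (65_000, 2_300),
}

for metric, (without_stack, with_stack) in rows.items():
    print(f"{metric}: {pct_change(without_stack, with_stack)}%")

# Sessions per fixed budget scale inversely with tokens per session.
sessions_multiplier = 380_000 / 72_000
print(f"Sessions per budget: ~{sessions_multiplier:.1f}x")
```

The sessions-per-budget figure comes out slightly above 5x, which is consistent with the rounded "~5x" in the table.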
- macOS or Linux
- Node.js 20+
- Docker Desktop (running)
- Claude Code CLI (npm install -g @anthropic-ai/claude-code)
- jq (brew install jq)
git clone https://github.com/rajat1021/claude-memory-stack.git
cd claude-memory-stack
./install.sh
./verify.sh
echo 'export GITHUB_TOKEN="ghp_your_token"' >> ~/.zshenv
source ~/.zshenv
./install.sh # re-run to inject token
What the installer does
- Checks prerequisites (Node.js 20+, Docker, Claude CLI, jq)
- Installs MCP servers (codebase-memory-mcp, code-review-graph, ruvector)
- Starts RuVector PostgreSQL container (port 5433, auto-restart on boot)
- Configures Docker Desktop to start on login
- Installs Ruflo orchestration
- Backs up existing config (~/.claude/CLAUDE.md, settings.json, .mcp.json)
- Deploys new config with trimmed system prompt
- Deploys hooks (2 instead of 7), skills, commands, tech-tips
- Installs Superpowers plugin
- Writes install marker and runs verification
Idempotent — safe to run multiple times.
Layer 1: Always-Loaded Context
Files in ~/.claude/CLAUDE.md and ~/.claude/memory-bank/*.md are loaded into every session's system prompt automatically. This is where you store:
- Coding style preferences and conventions
- Project-specific protocols and state
- Architecture Decision Records (ADRs)
- Tech-tips (cross-project gotchas)
Cost: Part of the system prompt. No queries, no latency, and no per-lookup cost beyond the system-prompt tokens themselves.
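To see how much always-loaded context you are carrying, you can measure the L1 files yourself. The sketch below is a rough aid, not part of the stack: the function names are ours, and the 4-characters-per-token ratio is a common rule of thumb rather than a real tokenizer.

```python
from pathlib import Path

def l1_files(claude_dir: Path) -> list[Path]:
    """Return CLAUDE.md plus every memory-bank/*.md file that exists."""
    files = []
    main = claude_dir / "CLAUDE.md"
    if main.is_file():
        files.append(main)
    bank = claude_dir / "memory-bank"
    if bank.is_dir():
        files.extend(sorted(bank.glob("*.md")))
    return files

def estimate_tokens(files: list[Path]) -> int:
    """Rough size estimate: ~4 characters per token (heuristic, an assumption)."""
    chars = sum(len(f.read_text(encoding="utf-8")) for f in files)
    return chars // 4
```

Calling `estimate_tokens(l1_files(Path.home() / ".claude"))` gives a ballpark number to compare against the system-prompt row in the results table.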
Layer 2: Code Intelligence Pipeline
Two MCP servers chained for maximum token efficiency:
Stage 1 — code-review-graph (scoping)
- Blast-radius analysis: "this change affects 4 functions in 3 files"
- Risk scoring: "high: payment_handler, low: logger"
- 22 MCP tools, 19 languages
Stage 2 — codebase-memory-mcp (precision retrieval)
- get_code_snippet(): exact function code (~200 tokens vs ~5,000 for full file)
- trace_call_path(): who calls what, depth-controlled
- 14 MCP tools, 66 languages
Together: Know WHAT to look at, then get ONLY that code.
Cost: 1 MCP call (~200–500 tokens) instead of 10+ grep/read cycles (~20,000–65,000 tokens).
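The scope-then-retrieve idea can be illustrated with a toy in-memory call graph. This is a conceptual sketch only: it does not call either MCP server, and every function name and the graph itself are hypothetical.

```python
from collections import deque

# Toy call graph: caller -> callees. All names are hypothetical.
CALLS = {
    "checkout": ["payment_handler", "logger"],
    "payment_handler": ["charge_card", "logger"],
    "refund": ["charge_card"],
    "charge_card": [],
    "logger": [],
}

# Toy snippet store standing in for a code index.
SNIPPETS = {name: f"def {name}(): ..." for name in CALLS}

def blast_radius(changed: str, max_depth: int = 2) -> list[str]:
    """Stage 1 (scoping): the changed function plus its transitive callers."""
    callers: dict[str, list[str]] = {}
    for caller, callees in CALLS.items():
        for callee in callees:
            callers.setdefault(callee, []).append(caller)
    seen, queue = {changed}, deque([(changed, 0)])
    while queue:
        name, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached, do not expand further
        for caller in callers.get(name, []):
            if caller not in seen:
                seen.add(caller)
                queue.append((caller, depth + 1))
    return sorted(seen)

def fetch_snippets(names: list[str]) -> dict[str, str]:
    """Stage 2 (retrieval): return code only for the scoped functions."""
    return {name: SNIPPETS[name] for name in names}

scope = blast_radius("charge_card")
print(scope)
print(fetch_snippets(scope))
```

Here a change to `charge_card` pulls in its callers but never touches `logger`, which is the point of scoping before retrieval: unrelated code is not loaded into context.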
Layer 3: Knowledge + Learning
RuVector PostgreSQL — replaces both pgvector (dumb storage) and memorygraph (separate knowledge graph) with one intelligent system:
| Feature | What It Does |
|---|---|
| Vector search | HNSW-indexed similarity on insights, note_chunks, observations |
| ReasoningBank | RETRIEVE → JUDGE → DISTILL → ROUTE learning loop |
| Pattern tracking | Confidence scores, success/failure counts per insight |
| Graph operations | Relationships between insights — replaces memorygraph |
| Agent routing | Auto-route tasks to optimal model via Q-Learning |
Cost: Only queried when user approves — except for keyword triggers ("history", "insights", "patterns").
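For intuition about what a similarity lookup over stored insights returns, here is a brute-force cosine-similarity search. It is a stand-in, not RuVector's implementation: an HNSW index answers the same kind of query approximately and much faster on large tables. The sample insights and their 3-dimensional vectors are made up for illustration.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query: list[float], rows: list[tuple[str, list[float]]], k: int = 2):
    """Score every row against the query and keep the k most similar."""
    scored = [(cosine(query, vec), text) for text, vec in rows]
    return sorted(scored, reverse=True)[:k]

# Hypothetical insights with toy embeddings.
INSIGHTS = [
    ("retry payment calls with backoff", [0.9, 0.1, 0.0]),
    ("logger must not print secrets", [0.0, 0.2, 0.9]),
    ("payment webhooks arrive out of order", [0.8, 0.3, 0.1]),
]

for score, text in top_k([1.0, 0.0, 0.0], INSIGHTS):
    print(f"{score:.2f}  {text}")
```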
Memory Retrieval Protocol
L1 (check first, free) → L2 (auto for code) → L3 (ask user first)
| Layer | Trigger | Cost |
|---|---|---|
| L1 | Always loaded — check first | Free |
| L2 | Auto-fires on code questions | Only when relevant |
| L3 | Asks user before querying | Only when approved |
L3 Exception: If the user says "history", "what happened last time", "insights", or "patterns" — query L3 directly without asking.
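The protocol above amounts to a small decision function. The sketch below is one way to read it, assuming the caller already knows whether L1 or L2 produced an answer; the function name, parameters, and the "ask-user" return value are ours.

```python
# Phrases from the L3 exception rule above.
L3_KEYWORDS = ("history", "what happened last time", "insights", "patterns")

def next_layer(question: str, l1_hit: bool, l2_hit: bool,
               user_approved: bool = False) -> str:
    """Pick the layer that should answer, following L1 -> L2 -> L3."""
    if l1_hit:
        return "L1"  # already in the always-loaded context
    if l2_hit:
        return "L2"  # code question answered by the graph
    if any(keyword in question.lower() for keyword in L3_KEYWORDS):
        return "L3"  # keyword exception: query without asking
    return "L3" if user_approved else "ask-user"

print(next_layer("show me the history of this bug", False, False))
print(next_layer("why did we pick this queue?", False, False))
```

In this reading the keyword exception only skips the approval step; it does not jump ahead of L1 or L2.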
Orchestration: Ruflo
- Swarm coordination — Queen/Worker hierarchy for parallel work
- Q-Learning task routing — right model for right task (Opus for hard, Haiku for simple)
- ReasoningBank — self-learning from every session
- Multi-model selection — 30–50% token savings via intelligent routing
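As background on the Q-Learning routing idea, here is a generic tabular sketch using the textbook update rule Q ← Q + α(r + γ·max Q' − Q). It is not Ruflo's code: the task kinds, the reward values, and the `Router` class are hypothetical, and exploration (for example epsilon-greedy) is left out to keep the example deterministic.

```python
from collections import defaultdict

MODELS = ("opus", "haiku")

class Router:
    def __init__(self, alpha: float = 0.5, gamma: float = 0.0):
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount; 0.0 treats each task as a one-step episode
        self.q = defaultdict(float)  # (task_kind, model) -> estimated value

    def choose(self, task_kind: str) -> str:
        """Greedy choice; ties go to the first model in MODELS."""
        return max(MODELS, key=lambda model: self.q[(task_kind, model)])

    def update(self, task_kind: str, model: str, reward: float,
               next_best: float = 0.0) -> None:
        """Standard Q-learning update toward reward + gamma * next_best."""
        key = (task_kind, model)
        target = reward + self.gamma * next_best
        self.q[key] += self.alpha * (target - self.q[key])

router = Router()
# Hypothetical feedback: reward reflects task success minus a cost penalty.
router.update("simple", "haiku", 1.0)
router.update("simple", "opus", 0.2)
router.update("hard", "opus", 1.0)
router.update("hard", "haiku", -1.0)

print(router.choose("simple"), router.choose("hard"))
```

With this feedback the router settles on the cheaper model for simple tasks and the stronger one for hard tasks, which is the behavior the routing bullet describes.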
Discipline: Superpowers (8 skills)
| Skill | What It Enforces |
|---|---|
| Brainstorming | Explore intent before implementation |
| Writing Plans | Structured specs before code |
| TDD | No production code without failing test first |
| Systematic Debugging | Root cause before fix attempts |
| Verification | Run commands + confirm output before claiming done |
| Code Review (give) | Verify work meets requirements |
| Code Review (receive) | Verify feedback technically before implementing |
| Git Worktrees | Isolated feature work with safety checks |

| Component | Type | Source | Role |
|---|---|---|---|
| codebase-memory-mcp | MCP server | DeusData/codebase-memory-mcp | L2 — code graph, 66 languages, sub-ms queries |
| code-review-graph | MCP server | tirth8205/code-review-graph | L2 — blast-radius scoping, risk scoring |
| ruvector | MCP + PostgreSQL | ruvnet/ruflo | L3 — vectors, learning, graph, routing |
| github MCP | MCP server | @anthropic-ai/github-mcp-server | GitHub integration — PRs, issues, CI |
| ruflo | Orchestration | ruvnet/ruflo | Swarms, Q-Learning routing, ReasoningBank |
| superpowers | Plugin (8 skills) | obra/superpowers | TDD, debugging, verification, code review |
| no-leak | Hook | Custom | Blocks .env, .pem, .key, credentials |
| auto-index | Hook | Custom | Re-indexes codebase on session start |
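The matching logic behind a hook like no-leak can be sketched as a pure function over file paths and shell commands. This is an illustration under assumptions, not the shipped hook: the exact patterns, the function names, and the decision to treat `.env.*` variants as secrets are ours, and wiring the check into Claude Code's hook mechanism is not shown.

```python
import shlex
from pathlib import PurePosixPath

BLOCKED_SUFFIXES = (".pem", ".key")

def is_blocked(path: str) -> bool:
    """True if the file name looks like a secret (.env, .pem, .key, credentials)."""
    name = PurePosixPath(path).name.lower()
    if name == ".env" or name.startswith(".env."):
        return True  # conservative: also blocks variants such as .env.local
    return name.endswith(BLOCKED_SUFFIXES) or name.startswith("credentials")

def command_touches_secret(command: str) -> bool:
    """True if any argument of a shell command names a blocked file."""
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes: fall back to whitespace splitting
        tokens = command.split()
    return any(is_blocked(token) for token in tokens)

print(is_blocked("config/.env"), command_touches_secret("cat ~/.ssh/id.key"))
```

Keeping the check as a pure function makes it easy to unit test before it is attached to any tool-call interception.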
Every project gets its own memory-bank. Switch projects, switch context — instantly. No cross-contamination.
cd ~/my-new-project
claude /init-project my-project
Creates (copied from ~/.claude/templates/project/ with placeholders substituted):
my-project/
├── .mcp.json # code-review-graph + codebase-memory + ruvector
├── CLAUDE.md # Project prefs + @imports
└── .claude/
└── memory-bank/
├── architecture/system-overview.md
├── decisions/README.md # ADR log
├── patterns/coding-standards.md
└── troubleshooting/known-issues.md
Placeholders substituted on init: __PROJECT_NAME__, __CMS_CODEBASE_MEMORY_BIN__ (from which codebase-memory-mcp), __CMS_DB_PASSWORD__ (from $CMS_DB_PASSWORD or default).
Then auto-runs code-review-graph build and codebase-memory-mcp index.
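The placeholder substitution step can be pictured as a plain string replacement over each template file. The sketch below is an approximation of that idea, not the actual /init-project implementation: the function name, the fallback binary name, and the default password `changeme` are hypothetical.

```python
import os
import shutil

def render_template(text: str, project_name: str) -> str:
    """Replace the three documented placeholders in one template file."""
    values = {
        "__PROJECT_NAME__": project_name,
        # Equivalent of `which codebase-memory-mcp`; fallback is an assumption.
        "__CMS_CODEBASE_MEMORY_BIN__": shutil.which("codebase-memory-mcp")
        or "codebase-memory-mcp",
        # Hypothetical default; the real default is not stated in this README.
        "__CMS_DB_PASSWORD__": os.environ.get("CMS_DB_PASSWORD", "changeme"),
    }
    for placeholder, value in values.items():
        text = text.replace(placeholder, value)
    return text

template = '{"project": "__PROJECT_NAME__", "bin": "__CMS_CODEBASE_MEMORY_BIN__"}'
print(render_template(template, "my-app"))
```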
| Command | Usage | What It Does |
|---|---|---|
| /init-project | claude /init-project my-app | Bootstrap 3-layer architecture for any project: creates CLAUDE.md, .mcp.json, memory-bank/, and indexes the codebase |
| /tech-tip | claude /tech-tip python | Capture a technology-specific gotcha to the shared tips library at ~/.claude/memory-bank/tech-tips/ |

| Hook | Fires On | What It Does |
|---|---|---|
| no-leak | Every file read/write/edit/bash | Blocks access to .env, .pem, .key, credentials.json — prevents accidental secret exposure |
| auto-index | Session start | Re-indexes codebase graph if stale (>24h) — keeps L2 code intelligence fresh |
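The staleness rule for auto-index (re-index if older than 24 hours) can be expressed as a modification-time check. This is a minimal sketch: it assumes the index leaves behind some marker file whose mtime records the last run, and the marker path and function name are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional

STALE_AFTER_SECONDS = 24 * 60 * 60  # the ">24h" threshold from the table

def is_stale(marker: Path, now: Optional[float] = None) -> bool:
    """True if the marker is missing or was last modified over 24 hours ago."""
    if not marker.exists():
        return True  # never indexed
    current = time.time() if now is None else now
    return current - marker.stat().st_mtime > STALE_AFTER_SECONDS
```

A session-start hook could call `is_stale(...)` and trigger a re-index only when it returns `True`, so fresh sessions on an up-to-date index start without delay.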

| Skill | Trigger | Purpose |
|---|---|---|
| codebase-memory-exploring | Code exploration questions | Guides Claude to use graph search instead of grep |
| codebase-memory-tracing | "Who calls this function?" | Traces call paths via graph instead of multi-round grep |
| codebase-memory-quality | Code quality questions | Dead code detection, complexity analysis via graph |
| codebase-memory-reference | MCP tool usage questions | Reference guide for graph query syntax |
| defuddle | URL content extraction | Clean markdown from web pages, saves tokens vs raw HTML |
From existing pgvector
./migration/migrate-pgvector.sh
Migrates source tables into insights, note_chunks, and observations on RuVector PostgreSQL (port 5433). Configure source credentials and table names via environment variables; see migration/migrate-pgvector.sh.
From memorygraph
./migration/migrate-memorygraph.sh
Exports entities and relations from memorygraph SQLite databases into RuVector's patterns table.
From vanilla Claude Code
Just run ./install.sh. It backs up your existing config before deploying. Originals saved as *.bak.<timestamp> in ~/.claude/.
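The backup convention (*.bak.<timestamp> next to the original) is simple to reproduce if you want to snapshot a config by hand. The sketch below assumes a `YYYYMMDD-HHMMSS` timestamp; the installer's actual timestamp format is not stated here, and the function name is ours.

```python
import shutil
import time
from pathlib import Path
from typing import Optional

def backup(path: Path) -> Optional[Path]:
    """Copy `path` to `<name>.bak.<timestamp>` beside it; None if it is missing."""
    if not path.is_file():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")  # assumed format
    dest = path.with_name(f"{path.name}.bak.{stamp}")
    shutil.copy2(path, dest)  # copy2 also preserves modification time
    return dest
```

For example, `backup(Path.home() / ".claude" / "CLAUDE.md")` returns the path of the new copy, or `None` when there is nothing to back up.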
Rollback
./migration/rollback.sh # Restore config backups
./uninstall.sh # Full removal + Docker cleanup

| File | Location | Purpose |
|---|---|---|
| CLAUDE.md | ~/.claude/CLAUDE.md | Preferences, protocols, memory retrieval rules |
| settings.json | ~/.claude/settings.json | Hooks, permissions, plugins |
| .mcp.json | ~/.claude/.mcp.json | Global MCP servers (GitHub) |
| memory-bank/ | ~/.claude/memory-bank/ | Tech-tips, shared knowledge |

| File | Location | Purpose |
|---|---|---|
| CLAUDE.md | <project>/CLAUDE.md | Project-specific instructions |
| .mcp.json | <project>/.mcp.json | Project MCP servers (code-review-graph, codebase-memory, ruvector) |
| memory-bank/ | <project>/.claude/memory-bank/ | Decisions, patterns, troubleshooting |
./uninstall.sh
Stops the Docker container, restores backed-up configs, and removes hooks, skills, and commands. Restart Claude Code afterwards.
Built by Rajat Tanwar
Powered by:
codebase-memory-mcp by DeusData •
code-review-graph by Tirth Patel •
ruflo by Ruv •
superpowers by Jesse Vincent •
Claude Code by Anthropic