diff --git a/posts/pablocalofatti/2.cortexmem.md b/posts/pablocalofatti/2.cortexmem.md new file mode 100644 index 00000000..675b901b --- /dev/null +++ b/posts/pablocalofatti/2.cortexmem.md @@ -0,0 +1,147 @@ +--- +title: 'Teaching AI to Remember: Building Persistent Memory for Coding Agents' +published: false +description: 'AI coding agents forget everything between sessions. I built cortexmem, a Rust-powered persistent memory layer, so they do not have to.' +tags: 'ai, rust, developer-tools, opensource' +cover_image: ./assets/cortexmem-cover.png +--- + +## The Conversation You've Had a Hundred Times + +It's 9 AM. You open your terminal, fire up your AI coding agent, and type: + +"Hey, remember yesterday we decided to use the repository pattern for the data layer? And that we're going with Zustand instead of Redux because of the bundle size issue?" + +Silence. A blank stare. Your agent has no idea what you're talking about. + +So you explain it again. The architecture. The tradeoffs. The bug you spent two hours debugging. The naming convention you agreed on. All of it, gone. Every session is day one. + +If you've used AI coding agents for anything beyond toy projects, you know this feeling. It's like working with the world's most brilliant colleague who has amnesia. Every morning you walk in, and they've forgotten everything from the day before. Not just the small stuff, but the *decisions*. The context that took hours to build. + +I got tired of re-explaining myself. So I built something to fix it. + +## What If Your Agent Could Remember? + +The idea behind [cortexmem](https://github.com/pablocalofatti/cortexmem) is almost embarrassingly simple: **your AI agent should remember what you taught it yesterday.** + +Not in some clever prompt-engineering way where you paste a giant Markdown file at the start of every session. Not through a RAG pipeline that needs a vector database, an embedding service, and a PhD to configure. 
I wanted something that felt like *actual memory*: save something, and it's there next time. Search for it, and the right thing comes back. No infrastructure. No cloud account. No YAML ritual. + +Just install and go. A bicycle for agent memory. + +## Zero Infrastructure, Maximum Recall + +When I sat down to design cortexmem, I had three non-negotiable principles: + +**1. It has to be a single binary.** No Docker Compose files. No "first, spin up Postgres." No microservice constellation. You download one thing, you run it, it works. I chose Rust because I wanted that self-contained feeling, and because I wanted it to be fast enough that agents could call it mid-thought without noticeable latency. + +![cortexmem CLI showing all available commands](./assets/cortexmem-cli.png) + +**2. Search has to be smart, not just fast.** The whole point of memory is finding what's relevant. Keyword search alone misses semantic connections ("auth flow" won't match "login pipeline"). Vector search alone drowns in vague similarity. So cortexmem uses *hybrid search*: BM25 keyword matching via SQLite's FTS5, plus vector KNN similarity via local embeddings (fastembed, ONNX, no API calls), fused together with Reciprocal Rank Fusion (RRF). You get the precision of keywords and the flexibility of semantics in one query. + +**3. Memory should behave like memory.** Human memory isn't a flat database. Things you use often stay sharp. Things you haven't touched in months fade. Cortexmem has a three-tier lifecycle: **buffer** (fresh, unvalidated), **working** (actively useful), and **core** (proven, long-term). Memories decay naturally over time, and frequently accessed ones get boosted. It's closer to how you actually remember things than to a filing cabinet. + +## How It Actually Works + +Cortexmem exposes 16 MCP tools that any compatible agent can call. The two you'll use most: + +**Saving a memory:** + +```text +mem_save( + content: "We decided to use the repository pattern for data access. 
+ Controllers delegate to services, services use repositories. + This keeps the data layer swappable for testing.", + observation_type: "decision", + topic_key: "architecture/data-layer-pattern", + tags: ["architecture", "nestjs", "repository-pattern"] +) +``` + +That `topic_key` is important: it's a deterministic `{family}/{slug}` path. If the agent saves another observation with the same topic key, it *upserts* instead of duplicating. Your "data layer decision" stays as one evolving memory, not seventeen conflicting copies. + +**Searching memory:** + +```text +mem_search(query: "how do we handle data access") +``` + +Behind the scenes, this fires both a keyword search and a vector search, merges the results with RRF scoring, and returns the most relevant memories. The agent gets back the decision you saved last week, with full context, without you having to explain it again. + +**Session tracking** ties it together. When you start a session (`mem_session_start`), cortexmem records it. When you end one (`mem_session_end`), you can attach a summary. Next session, the agent can pull up what happened last time and pick up where you left off. No more "so, where were we?" + +## The Part Where It Gets Interesting + +The initial version was straightforward: save, search, retrieve. Useful, but static. The memories were only as good as what the agent explicitly saved. + +Then I started asking: *what if the memory system could learn?* + +**v1.4 introduced auto-tagging and relevance learning.** When you save a memory now, cortexmem runs TF-IDF keyword extraction and fact extraction to automatically generate tags. You don't have to be meticulous about metadata; the system figures out what's important. + +More interestingly, it tracks *search-to-access patterns*. When you search for "authentication" and then access a specific memory, that connection gets recorded. 
Over time, cortexmem learns which memories are actually useful for which queries, and boosts them in future results. The more you use it, the better it gets at surfacing the right thing. + +This is where it stopped feeling like a database and started feeling like memory. + +## It Works With Everything + +One thing I was stubborn about: cortexmem is not locked to any specific agent. It speaks MCP (Model Context Protocol), which means it works with Claude Code, Cursor, Windsurf, Cline, Continue, OpenCode, VS Code, Zed, Gemini CLI, and anything else that supports MCP tool calling. + +The setup wizard (`cortexmem setup`) auto-detects which agent you're using and writes the configuration for you. Two commands take you from zero to working memory (install with either Homebrew or npm, then run setup): + +```bash +# Pick one installer; both pull from verified package registries (Homebrew tap / npm) +brew install pablocalofatti/tap/cortexmem # Homebrew +npx cortexmem-install # npm +# Then: +cortexmem setup +``` + +If you prefer to inspect before running, you can clone the [repo](https://github.com/pablocalofatti/cortexmem) and build from source with `cargo install --path .`. + +That's it. Your agent now has persistent memory. + +## The Dashboard Nobody Asked For (But Everyone Likes) + +I'll be honest: I built the TUI because I wanted to *see* what my agent was remembering. It turned into a 7-screen terminal dashboard with a Catppuccin Mocha theme that lets you browse memories, watch search results in real time, inspect session histories, and manage the memory lifecycle. + +![cortexmem TUI dashboard showing memory statistics by tier and type](./assets/cortexmem-dashboard.png) + +Is it strictly necessary? No. Is it satisfying to watch your agent's memories accumulate in a beautifully themed terminal UI? Absolutely. + +## Things I Learned Building This + +**SQLite is underrated for AI workloads.** WAL mode lets readers keep reading while a write is in progress. FTS5 gives you full-text search. sqlite-vec gives you vector operations. 
All in a single file, no server process, zero configuration. For a single-user tool like this, it's perfect. + +**Deduplication is harder than it sounds.** SHA-256 content hashing catches exact duplicates, but agents love to rephrase the same observation slightly differently. Topic key upsert handles the semantic deduplication: if two memories share a topic key, the newer one replaces the older. Not perfect, but practical. + +**Memory decay needs to be aggressive.** Early versions kept everything forever, and search quality degraded fast. The three-tier lifecycle with automatic decay was the fix. Buffer memories that never get accessed fade away. Core memories that prove their value stick around. It mirrors human psychology: not every thought deserves to be permanent. + +**152 integration tests** keep the whole thing honest. When you're building a memory system, correctness is not optional. A wrong recall is worse than no recall. + +## The Road From Here + +Cortexmem is at v1.4 now, and the roadmap is about making memories more *connected*: + +- **Memory graphs**: linking related observations so the agent can traverse context, not just search for it +- **Multi-agent memory sharing**: when one agent learns something, others benefit +- **Smarter summarization**: automatically condensing old memories into higher-level insights + +The git sync feature already lets you share memories across machines via a plain git repo. Cloud sync via PostgreSQL is there for teams. But the real frontier is making memory *compositional*: not just "I remember X" but "X relates to Y, which contradicts Z, and here's why we chose X anyway." + +## Try It + +If you use AI coding agents for real work (not demos, not tutorials, but actual projects with decisions and history), cortexmem might be the missing piece. + +```bash +brew install pablocalofatti/tap/cortexmem +cortexmem setup +``` + +Two commands. Your agent remembers now. 
+ +The repo is at [github.com/pablocalofatti/cortexmem](https://github.com/pablocalofatti/cortexmem). Stars help with visibility, issues help with direction, and PRs are welcome. The entire thing is open source. + +Because the best coding partner isn't the smartest one. It's the one that remembers what you've been building together. + +--- + +*If you liked this, you might also enjoy my previous post on why [Bicycles Are All Your AI Agents Need](https://dev.to/cloudx/bicycles-are-all-your-ai-agents-need-51oc). I write about building developer tools that make AI agents actually useful for real engineering work.* diff --git a/posts/pablocalofatti/assets/cortexmem-cli.png b/posts/pablocalofatti/assets/cortexmem-cli.png new file mode 100644 index 00000000..1c5c4962 Binary files /dev/null and b/posts/pablocalofatti/assets/cortexmem-cli.png differ diff --git a/posts/pablocalofatti/assets/cortexmem-cover.png b/posts/pablocalofatti/assets/cortexmem-cover.png new file mode 100644 index 00000000..9726b86d Binary files /dev/null and b/posts/pablocalofatti/assets/cortexmem-cover.png differ diff --git a/posts/pablocalofatti/assets/cortexmem-dashboard.png b/posts/pablocalofatti/assets/cortexmem-dashboard.png new file mode 100644 index 00000000..fb18ad06 Binary files /dev/null and b/posts/pablocalofatti/assets/cortexmem-dashboard.png differ