CrossModalRAG

This repository contains a local-first foundation for a cross-modal memory system.

Current scope:

Ingest markdown notes into SQLite.
Ingest git commits and diffs into SQLite.
Create searchable chunks from evidence.
Query with a simple lexical retriever and get cited evidence.
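
The README does not say how the lexical retriever scores chunks. As an illustration only, the sketch below ranks chunks by how often the query terms occur in them and returns each hit with its source so the answer can be cited. The `Chunk` type, its `source_uri` and `text` fields, and the `lexical_search` function are hypothetical names, not the project's API.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    source_uri: str  # hypothetical field: where the evidence came from
    text: str        # hypothetical field: the searchable chunk text


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def lexical_search(query: str, chunks: list[Chunk], top_k: int = 5) -> list[tuple[float, Chunk]]:
    """Score each chunk by the total count of query terms it contains."""
    query_terms = set(tokenize(query))
    scored = []
    for chunk in chunks:
        counts = Counter(tokenize(chunk.text))
        score = float(sum(counts[term] for term in query_terms))
        if score > 0:
            scored.append((score, chunk))
    # Highest score first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Each returned pair carries the chunk's `source_uri`, which is what a cited answer would point back to.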

Quickstart

Create a virtual environment and install:

python -m venv .venv
source .venv/bin/activate
pip install -e .

Initialize the local database:

mem init-db

Optional: seed deterministic synthetic notes and git history for smoke testing:

mem seed-sample

Ingest notes:

mem ingest-notes /path/to/obsidian/vault

Or ingest multiple vaults in one command:

mem ingest-notes /path/to/vault-a /path/to/vault-b

If you omit paths, mem ingest-notes will use all OBSIDIAN_VAULT_PATH_<n> values from your local .env.

Ingest git history:

mem ingest-git /path/to/repo --max-commits 300

Or ingest multiple repos in one command:

mem ingest-git /path/to/repo-a /path/to/repo-b --max-commits 300

If you omit paths, mem ingest-git will use all REPO_PATH_<n> values from your local .env.
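
The fallback to numbered variables such as OBSIDIAN_VAULT_PATH_<n> and REPO_PATH_<n> could work roughly as sketched below. This is an assumption about the behavior, not the project's code: it assumes plain KEY=VALUE lines in .env, orders paths by their numeric suffix, and uses a hypothetical helper named `collect_numbered_paths`.

```python
import re


def collect_numbered_paths(env_text: str, prefix: str) -> list[str]:
    """Return values of PREFIX<n>=... lines, ordered by the numeric suffix n."""
    pattern = re.compile(rf"^{re.escape(prefix)}(\d+)=(.*)$")
    found = []
    for raw_line in env_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = pattern.match(line)
        if match:
            # Drop optional surrounding quotes around the value.
            value = match.group(2).strip().strip('"').strip("'")
            if value:
                found.append((int(match.group(1)), value))
    return [path for _, path in sorted(found)]


sample_env = """
OBSIDIAN_VAULT_PATH_2=/path/to/vault-b
OBSIDIAN_VAULT_PATH_1="/path/to/vault-a"
REPO_PATH_1=/path/to/repo-a
"""
vaults = collect_numbered_paths(sample_env, "OBSIDIAN_VAULT_PATH_")
repos = collect_numbered_paths(sample_env, "REPO_PATH_")
```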

Ask a question:

mem ask "Why did I change the parser?" --top-k 5

Run retrieval evaluation (using seeded sample queries or your own queries_eval rows):

mem eval --top-k 5

CLI Commands

mem init-db
mem seed-sample [--workspace-dir PATH] [--db-path PATH] [--force]
mem ingest-notes [<vault_path> ...] (falls back to .env OBSIDIAN_VAULT_PATH_*)
mem ingest-git [<repo_path> ...] [--max-commits N] (falls back to .env REPO_PATH_*)
mem ask "<query>" [--top-k N]
mem eval [--top-k N] [--query-prefix PREFIX] [--load-queries PATH.json]

Synthetic Sample Seed Workflow

Use mem seed-sample to create a tiny deterministic sample vault and sample git repo, then ingest them into an isolated sample DB.

Creates a local workspace at ./data/sample-seed-workspace by default
Writes to a separate temp sample DB by default (does not modify your main ./data/memory.db)
Seeds synthetic markdown notes and git commits (no personal data)
Populates namespaced sample rows in queries_eval for future eval/smoke tests
Safe to re-run; unchanged content is reused and ingestion remains idempotent
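
One common way to make re-runs idempotent is to store a content hash per source and skip rows whose hash has not changed. The sketch below shows that technique against an in-memory SQLite database. The `documents` table, its columns, and the `ingest_document` function are hypothetical and are not taken from the project's schema.

```python
import hashlib
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical documents table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE documents ("
        "source_uri TEXT PRIMARY KEY, content_hash TEXT NOT NULL, content TEXT NOT NULL)"
    )
    return conn


def ingest_document(conn: sqlite3.Connection, source_uri: str, content: str) -> bool:
    """Insert or update one document; return False when nothing changed."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    row = conn.execute(
        "SELECT content_hash FROM documents WHERE source_uri = ?", (source_uri,)
    ).fetchone()
    if row is not None and row[0] == digest:
        return False  # unchanged content is reused, no write needed
    # The primary key on source_uri makes this replace any older row.
    conn.execute(
        "INSERT OR REPLACE INTO documents (source_uri, content_hash, content) VALUES (?, ?, ?)",
        (source_uri, digest, content),
    )
    conn.commit()
    return True
```

Calling `ingest_document` twice with the same content writes once and then reports that nothing changed, which is the property that makes a second seed run cheap.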

Run the sample retrieval benchmark:

mem eval --query-prefix "[sample]" --top-k 5

Use --force to rebuild the sample workspace directory from scratch. Use --db-path if you want the sample dataset in a specific non-main database path.

Data Location

By default, data is stored at:

./data/memory.db

Set CMRAG_DB_PATH to override:

export CMRAG_DB_PATH=/absolute/path/to/memory.db
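
The override rule above amounts to "use CMRAG_DB_PATH when it is set, otherwise the default path". A minimal sketch of that lookup follows; `resolve_db_path` is a hypothetical helper name, and treating an empty value as unset is an assumption.

```python
import os
from pathlib import Path

DEFAULT_DB_PATH = Path("./data/memory.db")


def resolve_db_path(environ=None) -> Path:
    """Return the database path, honoring CMRAG_DB_PATH when it is non-empty."""
    env = os.environ if environ is None else environ
    override = env.get("CMRAG_DB_PATH", "").strip()
    return Path(override) if override else DEFAULT_DB_PATH
```

Passing a plain dict instead of relying on `os.environ` keeps the function easy to test.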

Evaluation Query File Format (`mem eval --load-queries`)

Use a JSON array of rows:

[
  {
    "query_text": "What fixed the parser bounds check bug?",
    "expected_source_uris": [
      "/abs/path/to/repo@abc123",
      "/abs/path/to/note/http-parser.md"
    ]
  }
]
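
Before loading a query file, it can help to check that it matches the shape shown above. The sketch below validates only the two fields that appear in the example; `parse_eval_rows` is a hypothetical helper, and the tool itself may accept or require more than this.

```python
import json


def parse_eval_rows(raw_json: str) -> list[dict]:
    """Parse a JSON array of eval rows and check the two documented fields."""
    rows = json.loads(raw_json)
    if not isinstance(rows, list):
        raise ValueError("expected a JSON array of rows")
    for index, row in enumerate(rows):
        if not isinstance(row, dict):
            raise ValueError(f"row {index}: expected a JSON object")
        query_text = row.get("query_text")
        if not isinstance(query_text, str) or not query_text.strip():
            raise ValueError(f"row {index}: query_text must be a non-empty string")
        uris = row.get("expected_source_uris")
        if not isinstance(uris, list) or not all(isinstance(u, str) for u in uris):
            raise ValueError(f"row {index}: expected_source_uris must be a list of strings")
    return rows
```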

mem eval --load-queries file.json upserts the rows into queries_eval, then computes Recall@K, MRR@K, and an approximate citation hit rate based on the top retrieved source.
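
For readers unfamiliar with these metrics, the sketch below uses their standard textbook definitions for a single query. The exact formulas used by mem eval are not stated in this README, so treat the function names and the top-1 interpretation of the citation hit rate as assumptions. Averaging each value over all queries gives the reported metric.

```python
def recall_at_k(retrieved: list[str], expected: list[str], k: int) -> float:
    """Fraction of expected source URIs that appear in the top k results."""
    expected_set = set(expected)
    if not expected_set:
        return 0.0
    return len(expected_set & set(retrieved[:k])) / len(expected_set)


def reciprocal_rank_at_k(retrieved: list[str], expected: list[str], k: int) -> float:
    """1 / rank of the first expected URI within the top k, or 0.0 if none."""
    expected_set = set(expected)
    for rank, uri in enumerate(retrieved[:k], start=1):
        if uri in expected_set:
            return 1.0 / rank
    return 0.0


def top_source_hit(retrieved: list[str], expected: list[str]) -> bool:
    """Assumed reading of the citation hit: the top result is an expected source."""
    return bool(retrieved) and retrieved[0] in set(expected)


expected = ["/abs/path/to/repo@abc123", "/abs/path/to/note/http-parser.md"]
retrieved = ["/abs/path/to/other.md", "/abs/path/to/repo@abc123", "/abs/path/to/misc.md"]
recall = recall_at_k(retrieved, expected, k=3)
rr = reciprocal_rank_at_k(retrieved, expected, k=3)
```

In this example one of the two expected sources is found, and it sits at rank 2.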
