Marrow is a local-first, hybrid search engine for Markdown repositories. It combines full-text search (FTS5) with vector similarity (sqlite-vec) in a single SQLite database.
It can also index live GitHub issues and pull requests via a GitHub App, making them searchable alongside your documentation.
π Visit the website for a plain-English overview, quick start, and live demo.
curl -sSL https://raw.githubusercontent.com/enekos/marrow/master/install.sh | shThis downloads the latest release for your platform and copies the marrow binary into ~/.local/bin (or /usr/local/bin if writable).
- Go to the Releases page.
- Download
marrow_vX.Y.Z_windows_amd64.zip. - Extract
marrow.exeand place it in a folder on yourPATH.
Visit the Releases page, pick the archive matching your OS and architecture, extract it, and move the marrow binary to a directory on your PATH.
# Index a local directory of markdown files
make run-sync
# or: go run -tags sqlite_fts5 . sync -dir ./docs -db marrow.db -source local -default-lang en
# Start the search API (with optional GitHub repo sync)
go run -tags sqlite_fts5 . serve \
-db marrow.db \
-addr :8080 \
-detect-lang=true \
-default-lang en \
-repo-url https://github.com/owner/private-repo \
-repo-token $GITHUB_TOKEN \
-webhook-secret $WEBHOOK_SECRET \
-source github \
-local-path ./repo
# Start with GitHub App integration (issues + PRs)
go run -tags sqlite_fts5 . serve \
-db marrow.db \
-addr :8080 \
-repo-url https://github.com/owner/private-repo \
-github-app-id 3384614 \
-github-app-private-key /path/to/app-private-key.pem \
-github-webhook-secret $GITHUB_WEBHOOK_SECRETPOST /search
{
"q": "go best practices",
"limit": 10,
"lang": "en"
}When the JSON request does not include lang, the server uses automatic language detection by default. You can disable detection and force a default language with the -detect-lang and -default-lang CLI flags:
-detect-langβ Enable or disable automatic query-language detection (default:true).-default-langβ Fallback language code used when detection is disabled or when a document has nolangfrontmatter (default:en, supported:en,es,eu).
POST /webhook
Trigger a re-sync from the configured GitHub repo:
curl -X POST http://localhost:8080/webhook \
-H "X-Marrow-Secret: your-secret-key"Returns 202 Accepted immediately and performs the sync in the background.
When -github-webhook-secret is set, /webhook also accepts real GitHub App webhooks. It automatically handles:
issues(opened, edited, reopened, closed)pull_request(opened, edited, reopened, synchronize, closed)issue_comment/pull_request_review_comment(created, edited)
Configure your GitHub App's webhook URL to point to https://your-host/webhook.
make build # local build (requires -tags sqlite_fts5)
make build-vps # production build with multi-site support
make testMarrow can run as a persistent search service on a VPS, serving multiple static sites simultaneously.
Compile with the vps build tag to enable per-site CORS, API keys, rate limiting, and scheduled background sync:
go build -tags "sqlite_fts5 vps" -o marrow .Without the vps tag, the binary behaves exactly as before β ideal for local development.
See docs/vps-deployment.md for a complete walkthrough. The short version:
cp .env.example .env
# edit config.toml and Caddyfile with your domains
docker compose up -dCaddy provisions TLS automatically, and Marrow handles per-site search isolation via the Host header.
Define sites in config.toml:
[[sites]]
name = "blog"
hosts = ["search.blog.example.com"]
cors_origins = ["https://blog.example.com"]
api_key = "sk_blog_xxx"
sources = ["blog-docs"]
rate_limit_rps = 20See docs/multi-site.md for full details.
internal/watcherβ incremental local directory crawler using mtime + SHA-256internal/gitpullβ clones or pulls GitHub repos with token auth; returns changed file listinternal/syncβ orchestrates the full ingestion pipeline and deletes stale docsinternal/markdownβ extracts YAML frontmatter and plain text via goldmarkinternal/stemmerβ Unicode-aware tokenization, stopword removal, Snowball stemming (en/es/eu)internal/embedβ deterministic mock embeddings (stable 384-dim vectors)internal/dbβ SQLite schema withfts5,sqlite-vec, andsync_statetableinternal/indexβ ingestion pipeline with source trackinginternal/searchβ hybrid ranking via Reciprocal Rank Fusion (70% FTS + 30% vector) + stemmed title boostinternal/githubapiβ GitHub App authenticated client for issues and PRs
- Local directories: Only files with
mtimenewer thanlast_sync_atare hashed and re-indexed. Deleted files are removed from the DB. - GitHub repos:
git fetch --depth 1+git reset --hard FETCH_HEAD. Only files changed in the latest commit are re-indexed. - GitHub issues & PRs: Fetched via the GitHub API on startup and kept in sync via webhooks. Closed items are automatically removed from the search index.
- Create a GitHub App and note the App ID.
- Generate and download a private key (PEM file).
- Install the app on the target repository or organization.
- Run Marrow with
-github-app-id,-github-app-private-key, and optionally-github-webhook-secret. - Set the app's webhook URL to your Marrow instance's
/webhookendpoint.
GitHub items are indexed with synthetic paths like gh:owner/repo/issues/123 and gh:owner/repo/pull/456. The search API returns a doc_type field so you can distinguish between markdown files, issues, and pull requests.
Marrow ships a composite GitHub Action that builds a semantic "related articles"
graph for any Markdown-based static site as part of CI. See
docs/github-action.md.
- uses: enekos/marrow@v0
with:
content-dir: content
embedding-provider: openai
openai-api-key: ${{ secrets.OPENAI_API_KEY }}
output-path: data/related.json- Build requires
-tags sqlite_fts5because FTS5 is gated behind a build tag inmattn/go-sqlite3. - Stemming is implemented in pure Go: English (Porter2), Spanish (Snowball), and Basque (Snowball). No external NLP services required.
- An embedding provider must be configured explicitly via
embedding.provider(mock,ollama, oropenai). Dimensions are validated at startup against the 384-dim schema. The mock provider is for tests/CI β it returns pseudo-random but deterministic vectors derived from SHA-256 and gives a loud warning when used.