feat(embeddings): LlamaCppEmbeddingProvider behind MYCELIUM_LLM_PROVIDER — #178 part 1 by Dewinator · Pull Request #187 · Dewinator/mycelium

Dewinator · 2026-05-02T14:42:52Z

Summary

Lands the embedding side of Spike 2's recommendation (PR #182, spike(native-llm): node-llama-cpp validates as in-process LLM (#178); docs/native-llm-spike.md) as a real provider. This is the centerpiece of the native-app track (epic #176: native standalone app (no Docker) for macOS / Windows / Linux; Welle 1).
Setting MYCELIUM_LLM_PROVIDER=llama-cpp swaps the in-process node-llama-cpp provider in for Ollama. The default stays ollama, so existing installs are unaffected.
Identical EmbeddingProvider contract → all consumers (MemoryService, import_markdown, REM digest pre-stage) work unchanged.
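
The opt-in switch summarized above can be sketched as a small selection function. This is a minimal illustration, assuming the factory reads nothing but the environment variable; `ProviderName` and `resolveProviderName` are hypothetical names, not the code in this PR. The `llamacpp` alias and the throw on unknown values follow the factory description under "What changes".

```typescript
// Hypothetical sketch of env-driven provider selection; names are illustrative.
type ProviderName = "ollama" | "llama-cpp";

function resolveProviderName(
  env: Record<string, string | undefined>,
): ProviderName {
  // An unset variable keeps the existing default, so current installs are unaffected.
  const raw = env["MYCELIUM_LLM_PROVIDER"] ?? "ollama";
  switch (raw) {
    case "ollama":
      return "ollama";
    case "llama-cpp":
    case "llamacpp": // alias
      return "llama-cpp";
    default:
      // Fail at startup instead of silently falling back to a default.
      throw new Error(
        `Unknown MYCELIUM_LLM_PROVIDER "${raw}"; expected "ollama" or "llama-cpp"`,
      );
  }
}
```

In a real process this would be called once at startup with `process.env`, so a typo in the variable surfaces immediately rather than on the first embed.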

What changes

LlamaCppEmbeddingProvider (new) — lazy-init, dynamic import of node-llama-cpp, OS-native models dir (~/Library/Application Support/mycelium/models on macOS, %APPDATA%\mycelium\models on Windows, $XDG_DATA_HOME/mycelium/models on Linux), 768d default to match the VECTOR(768) schema, dim-mismatch guard, PGlite-style queue serializer (matches PR #185, the PGlite adapter foundation for issue #184).
createEmbeddingProvider() — env-driven factory: ollama (default) | llama-cpp (alias llamacpp). Unknown values throw at startup.
Packaging — node-llama-cpp@^3.4.0 added as optionalDependencies so niche/unsupported platforms can still install the rest of mycelium. Dynamic import inside the provider yields a clear "install node-llama-cpp or switch to ollama" error if the runtime is absent.
Tests — 7 new unit tests (selection, dim, path defaults, alias, error path) plus an opt-in real-GGUF e2e test gated on MYCELIUM_TEST_LLAMA_CPP=1 (downloads ~84 MB on first run, skipped in CI).
Docs — docs/native-llm-spike.md adds an "Implementation 1" section documenting the integration contract.
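
The per-OS models directory listed above could be resolved roughly as follows. This is a sketch under stated assumptions: `HostInfo` and `defaultModelsDir` are hypothetical names, the `MYCELIUM_LLAMA_MODELS_DIR` override is the knob named in the commit message, and the fallbacks used when `APPDATA` or `XDG_DATA_HOME` is unset (`AppData\Roaming` and `~/.local/share`) are assumptions rather than behaviour confirmed by this PR.

```typescript
import * as path from "node:path";

// Inputs are injected so the logic can be exercised for any OS from any host.
interface HostInfo {
  platform: string; // e.g. process.platform
  home: string; // e.g. os.homedir()
  env: Record<string, string | undefined>;
}

// Hypothetical helper: returns the directory where GGUF files would be stored.
function defaultModelsDir({ platform, home, env }: HostInfo): string {
  // An explicit override wins over every OS default.
  const override = env["MYCELIUM_LLAMA_MODELS_DIR"];
  if (override) return override;

  if (platform === "darwin") {
    return path.posix.join(home, "Library", "Application Support", "mycelium", "models");
  }
  if (platform === "win32") {
    // Assumed fallback when APPDATA is not set.
    const appData = env["APPDATA"] || path.win32.join(home, "AppData", "Roaming");
    return path.win32.join(appData, "mycelium", "models");
  }
  // Linux and other Unix-likes: the XDG spec treats an empty value as unset.
  const dataHome = env["XDG_DATA_HOME"] || path.posix.join(home, ".local", "share");
  return path.posix.join(dataHome, "mycelium", "models");
}
```

Using `path.posix` and `path.win32` explicitly keeps the result independent of the machine running the code, which makes the path defaults easy to unit-test.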

Why default is not flipped

The spike (docs/native-llm-spike.md §3) flagged a tokenizer-config warning on the nomic-embed-text GGUF: cross-path embedding parity with Ollama is not yet validated, and divergence below ~0.95 cosine would invalidate stored vectors on existing installs. Default-flip waits until a CI gate verifies parity. Fresh native-app installs (#176 sub-task 4, Tauri shell) start empty, so they can opt in safely.
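
The parity gate is deferred, so nothing like it ships in this PR. As a rough sketch of what such a check computes, assuming the same text is embedded through both paths and compared pairwise against the ~0.95 threshold mentioned above (`cosineSimilarity` and `passesParity` are hypothetical names):

```typescript
// Cosine similarity of two equal-length vectors.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] as number; // safe: lengths were checked above
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) {
    throw new Error("cosine similarity is undefined for a zero vector");
  }
  return dot / Math.sqrt(normA * normB);
}

// Hypothetical gate: each pair holds the Ollama vector and the llama.cpp
// vector for the same input text; every pair must reach the threshold.
function passesParity(
  pairs: ReadonlyArray<readonly [readonly number[], readonly number[]]>,
  threshold = 0.95,
): boolean {
  return pairs.every(
    ([ollamaVec, llamaVec]) => cosineSimilarity(ollamaVec, llamaVec) >= threshold,
  );
}
```

A CI job built on this would feed it a fixed set of sample texts and fail the build when `passesParity` returns false.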

Out of scope

Cosine-similarity cross-validation against Ollama (deferred CI gate)
SHA-256 verification of GGUF downloads (Pillar 6 follow-up)
Chat provider + REM-digest re-route (separate sub-task of #178, feat(app): Spike 2, the node-llama-cpp embedding + chat bridge that drops the Ollama dependency)

Test plan

npm run build — clean
npm test — 946 pass, 1 skipped (gated e2e), 0 fail
Local opt-in: MYCELIUM_TEST_LLAMA_CPP=1 npm test -- --test-name-pattern 'real GGUF' (verifies first-run download, 768d, queued concurrent embeds)
Manual: MYCELIUM_LLM_PROVIDER=llama-cpp npm run dev, remember() a memory, recall() it back

🤖 Generated with Claude Code

…DER — issue #178 part 1

Lands the embedding side of the Spike 2 recommendation (PR #182, docs/native-llm-spike.md) as a real provider in mcp-server/src/services/embeddings.ts. Default provider stays `ollama` so existing installs are unaffected; native-app installs (#176, Welle 1) will set MYCELIUM_LLM_PROVIDER=llama-cpp explicitly.

What changes

- New LlamaCppEmbeddingProvider implementing the existing EmbeddingProvider contract (embed/dimensions). Lazy init: first embed() triggers GGUF download into the OS app-data dir, model load, and embedding-context creation. Subsequent calls reuse the context.
- createEmbeddingProvider() factory honours MYCELIUM_LLM_PROVIDER ('ollama' | 'llama-cpp', alias 'llamacpp'). Unknown values throw at startup rather than silently falling back.
- Knobs: MYCELIUM_LLAMA_MODELS_DIR, MYCELIUM_LLAMA_EMBEDDING_MODEL_URI; default URI is the dim-768 GGUF that matches our schema. The provider cross-checks the loaded model's embeddingVectorSize against the configured dimensions and throws on mismatch instead of producing wrong-shaped vectors.
- Concurrency: PGlite-style Promise-chain serializer (matches PR #185). Single embedding context handles one call at a time; queue keeps a failed embed from poisoning subsequent ops.
- Packaging: node-llama-cpp added as optionalDependencies so niche or unsupported platforms can still install the rest of mycelium. Dynamic import inside the provider gives a clear "install node-llama-cpp or switch to ollama" error if the runtime is absent.
- Tests: 7 new unit tests for selection/dimensions/path defaults; one opt-in real-GGUF e2e test gated on MYCELIUM_TEST_LLAMA_CPP=1 (downloads ~84 MB on first run, skipped in CI by default).

Out of scope

- Cosine-similarity cross-validation against Ollama (deferred to a CI gate before any default flip; see the tokenizer warning in the spike doc).
- Automatic SHA-256 verification of GGUFs (Pillar 6 follow-up).
- Chat provider + REM-digest re-route (separate sub-task).

All 946 unit tests still pass; spike doc updated with the integration contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
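
The concurrency and dimension-guard points in the commit message above can be illustrated with a short sketch. It assumes a plain Promise chain is all the serializer needs; `SerialQueue` and `assertDimensions` are hypothetical names and this is not the provider's actual code.

```typescript
// Illustrative Promise-chain serializer: tasks run one at a time, in order.
class SerialQueue {
  // Tail of the chain. It never rejects because failures are swallowed below.
  private tail: Promise<void> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    // Start the task only after every previously queued task has settled.
    const result = this.tail.then(() => task());
    // Keep the chain alive after a failure; the caller still observes the
    // rejection through `result`, but later tasks are not affected by it.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}

// Refuse to hand back vectors whose size differs from the configured one.
function assertDimensions(actual: number, expected: number): void {
  if (actual !== expected) {
    throw new Error(
      `Embedding model produces ${actual}-d vectors, expected ${expected}`,
    );
  }
}
```

A provider built this way would wrap each `embed()` call in `queue.run(...)`, so the single embedding context never sees two calls at once, and would call `assertDimensions` once after the model loads.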

…-in (#176 sub-task 5)

Closes the Windows/Linux GPU strategy gap that docs/native-llm-spike.md explicitly deferred (line 121: "sub-task 5 of #176 is where DirectML/CUDA/Vulkan tradeoffs get owned"). Verified empirically against node-llama-cpp 3.18.1's package matrix in experiments/native-llm/node_modules/.package-lock.json.

Headline findings:

- The epic table's "DirectML" entry is wrong: node-llama-cpp 3.x ships no DirectML backend. Actual matrix: Metal (mac-arm64), Vulkan + CUDA (win-x64, linux-x64), CPU only on ARM Windows / ARM Linux / mac-x64.
- Vulkan, not CUDA, is the universal-GPU default for Windows + Linux x64 (covers Nvidia + AMD + Intel from one binary; CUDA is Nvidia-only and ships with a 400+ MB runtime extension).
- PR #187's hard-coded `gpu: "auto"` is already correct under this policy: node-llama-cpp's auto-picker order (Metal → CUDA → Vulkan → CPU) is what we want; no follow-up needed.
- CUDA opt-in via Settings toggle that npm-installs the cuda backend into the data dir at runtime, not the installer, following the same pattern as model files.

Five follow-up tickets enumerated for when Reed's queue drains. Not filed yet (proposed-by-agent at 3-cap, Tauri-shell scaffolding lands first).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
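
To make the policy in the findings above concrete, here is a small model of it. This is not node-llama-cpp's code: `SHIPPED`, `AUTO_ORDER`, and `pickBackend` are hypothetical names, the platform keys are an assumed `platform-arch` naming, and the `usable` parameter (backends that are installed and have working hardware) is an assumption about how availability would be fed in.

```typescript
type Backend = "metal" | "cuda" | "vulkan" | "cpu";

// Backends per platform, as reported in the findings (anything else: CPU only).
const SHIPPED: Record<string, Backend[]> = {
  "darwin-arm64": ["metal", "cpu"],
  "win32-x64": ["cuda", "vulkan", "cpu"],
  "linux-x64": ["cuda", "vulkan", "cpu"],
};

// Preference order the commit message attributes to `gpu: "auto"`.
const AUTO_ORDER: Backend[] = ["metal", "cuda", "vulkan", "cpu"];

// Returns the first backend in preference order that is both shipped for the
// platform and reported usable; CPU is always an acceptable last resort.
function pickBackend(platformArch: string, usable: readonly Backend[]): Backend {
  const shipped = SHIPPED[platformArch] ?? ["cpu"];
  for (const backend of AUTO_ORDER) {
    const isShipped = shipped.indexOf(backend) !== -1;
    const isUsable = backend === "cpu" || usable.indexOf(backend) !== -1;
    if (isShipped && isUsable) return backend;
  }
  return "cpu";
}
```

Under this model a Linux x64 machine without the opt-in CUDA backend resolves to Vulkan, which matches the "Vulkan as universal default, CUDA as opt-in" conclusion.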

…n W4.1

W4.1 of docs/wave-4-anti-echo.md says "the first PR of this wave creates the directory"; this is that scaffold. Lands two files only:

- mcp-server/src/__tests__/fixtures/anti-echo/README.md
  Developer-facing spec for the corpus shape, mirrors the governance rules from the anchor doc but written for the file-format reader.
- mcp-server/src/__tests__/fixtures/anti-echo/corpus-types.ts
  `AntiEchoCorpusFixture` + `AntiEchoCohortFixture` discriminated union over the v1.1 Lesson envelope (services/wire-types.ts).

Types only, no loader, no harness; those land alongside the first concrete fixture per category in subsequent PRs.

Why scaffold-first instead of one big "land all 8 fixtures" PR: the eight attack categories from wave-4-anti-echo.md §"Corpus categories" each have their own subtleties (cohort vs single-envelope, signing-key handling, which §10 mechanism asserts). Decomposing into one fixture per follow-up PR keeps each diff reviewable and lets the harness shape evolve from the first concrete fixture rather than from speculation.

Why this can land while the 9-PR native-app queue is open: the new directory lives entirely under `__tests__/fixtures/`, so it has zero file overlap with the native-app stack (#185 / #187 / #188 / #189 / #190 / #191 / #192 / #193 / #194).

939/939 node --test tests still green; `tsc --noEmit` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…rain)

Reed merged 10 PRs today: all 3 W4.1 anti-echo (#197/#198/#201), both W2 federation (#199/#200), 5 native-app (#190/#191/#192/#193/#194). Only the linear 4-PR #178-stack remains open (#185 independent + #187 → #188 → #189 strictly stacked). Three-cohort split collapsed to one cohort; old order-independence proofs (143rd/148th tick) now obsolete.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…open

While the previous commit was being prepared, Reed merged the remaining 4-PR native-app stack (#185, #187, #188, #189). All 14 PRs from yesterday's queue are now on main: 9 Native-App + 3 W4.1 + 2 W2.

Key shift: the "wait for queue drain" block on Native-App Sub-Tasks 3, 6, 7, 8, 9, 10 is gone. Implementation can resume directly from the spikes in docs/native-*-spike.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Dewinator mentioned this pull request May 2, 2026

feat(security): MYCELIUM_LLAMA_REQUIRE_CHECKSUM=1 fail-closed — Pillar 6 follow-up to #188 #189

Merged

3 tasks

Dewinator mentioned this pull request May 3, 2026

feat(wave-4): W4.1 — fill the anti-echo corpus (8 fixtures + node:test harness) #196

Open

feat(security): GGUF SHA-256 verification — Pillar 6 follow-up to #187 (#188) · 37fb15d

Dewinator merged commit 1e3e701 into main May 3, 2026
1 check passed

This was referenced May 3, 2026

feat(app): Tauri 2 shell scaffold — main window, tray, sidecar wiring (#176 sub-task 3) #203

Merged

feat(eval): built-in LocomoRegressionAgent — periodic memory-quality regression bench #214

Open

Dewinator merged 2 commits into main from agent/llama-cpp-embedding-178
