
feat(embeddings): LlamaCppEmbeddingProvider behind MYCELIUM_LLM_PROVIDER — #178 part 1 #187

Merged
Dewinator merged 2 commits into main from agent/llama-cpp-embedding-178
May 3, 2026


@Dewinator

Owner

Summary

What changes

  • LlamaCppEmbeddingProvider (new) — lazy-init, dynamic import of node-llama-cpp, OS-native models dir (~/Library/Application Support/mycelium/models on macOS, %APPDATA%\mycelium\models on Windows, $XDG_DATA_HOME/mycelium/models on Linux), 768d default to match VECTOR(768) schema, dim-mismatch guard, PGlite-style queue serializer (matches PR feat(native): PGlite adapter foundation — issue #184 part 1 #185).
  • createEmbeddingProvider() — env-driven factory: ollama (default) | llama-cpp (alias llamacpp). Unknown values throw at startup.
  • Packaging: node-llama-cpp@^3.4.0 added as optionalDependencies so niche/unsupported platforms can still install the rest of mycelium. Dynamic import inside the provider yields a clear "install node-llama-cpp or switch to ollama" error if the runtime is absent.
  • Tests — 7 new unit tests (selection, dim, path defaults, alias, error path) plus an opt-in real-GGUF e2e test gated on MYCELIUM_TEST_LLAMA_CPP=1 (downloads ~84 MB on first run, skipped in CI).
  • Docs: docs/native-llm-spike.md adds an "Implementation 1" section documenting the integration contract.
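For reference, the OS-native models-dir defaults from the first bullet can be written as a small resolver. This is a sketch only: `resolveModelsDir` is a hypothetical name, the `MYCELIUM_LLAMA_MODELS_DIR` override is taken from the commit message below, and the fallbacks used when `%APPDATA%` or `$XDG_DATA_HOME` is unset (the default roaming profile, `~/.local/share`) are assumptions rather than confirmed behaviour of the PR.

```typescript
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: the directory GGUF model files are downloaded into.
export function resolveModelsDir(
  env: Record<string, string | undefined> = process.env,
  platform: string = process.platform,
  home: string = os.homedir(),
): string {
  // An explicit override wins (MYCELIUM_LLAMA_MODELS_DIR is named in the commit message).
  const override = env.MYCELIUM_LLAMA_MODELS_DIR;
  if (override) return override;

  switch (platform) {
    case "darwin":
      return path.posix.join(home, "Library", "Application Support", "mycelium", "models");
    case "win32": {
      // Assumption: fall back to the default roaming profile if %APPDATA% is unset.
      const appData = env.APPDATA || path.win32.join(home, "AppData", "Roaming");
      return path.win32.join(appData, "mycelium", "models");
    }
    default: {
      // Assumption: XDG Base Directory default when $XDG_DATA_HOME is unset.
      const dataHome = env.XDG_DATA_HOME || path.posix.join(home, ".local", "share");
      return path.posix.join(dataHome, "mycelium", "models");
    }
  }
}
```

Passing `env`, `platform` and `home` as parameters keeps the path defaults unit-testable on any host, which is the kind of "path defaults" test the summary mentions.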

Why default is not flipped

The spike (docs/native-llm-spike.md §3) flagged a tokenizer-config warning on the nomic-embed-text GGUF: cross-path embedding parity with Ollama is not yet validated, and a cosine similarity below ~0.95 between the two paths would invalidate stored vectors on existing installs. The default flip waits until a CI gate verifies parity. Fresh native-app installs (#176 sub-task 4, Tauri shell) start empty, so they can opt in safely.
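A parity gate of the kind described could embed the same probe texts through both providers and fail if any pair falls below the ~0.95 cosine threshold. A minimal sketch, assuming vectors arrive as plain `number[]` and that the worst single pair (rather than an average) is the pass criterion; `cosineSimilarity` and `parityGate` are hypothetical names, not part of the PR.

```typescript
// Cosine similarity between two equal-length vectors.
export function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) throw new Error("zero vector");
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical gate: vectors[i] on each side must come from the same probe text.
export function parityGate(
  ollamaVectors: readonly number[][],
  llamaCppVectors: readonly number[][],
  threshold = 0.95,
): { pass: boolean; worst: number } {
  if (ollamaVectors.length !== llamaCppVectors.length) {
    throw new Error("probe count mismatch");
  }
  let worst = 1;
  for (let i = 0; i < ollamaVectors.length; i++) {
    worst = Math.min(worst, cosineSimilarity(ollamaVectors[i], llamaCppVectors[i]));
  }
  return { pass: worst >= threshold, worst };
}
```

Gating on the worst pair is the stricter choice: a single probe text that tokenizes differently on the two paths is exactly the failure the tokenizer warning hints at, and an average could hide it.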

Out of scope

Test plan

  • npm run build — clean
  • npm test — 946 pass, 1 skipped (gated e2e), 0 fail
  • Local opt-in: MYCELIUM_TEST_LLAMA_CPP=1 npm test -- --test-name-pattern 'real GGUF' (verifies first-run download, 768d, queued concurrent embeds)
  • Manual: MYCELIUM_LLM_PROVIDER=llama-cpp npm run dev, remember() a memory, recall() it back

🤖 Generated with Claude Code

…DER — issue #178 part 1

Lands the embedding side of the Spike 2 recommendation (PR #182,
docs/native-llm-spike.md) as a real provider in
mcp-server/src/services/embeddings.ts. Default provider stays `ollama`
so existing installs are unaffected; native-app installs (#176, Wave 1)
will set MYCELIUM_LLM_PROVIDER=llama-cpp explicitly.
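The env-driven selection can be sketched as a strict mapping from MYCELIUM_LLM_PROVIDER to a provider kind. Assumptions: the `EmbeddingProvider` shape below only approximates the existing embed/dimensions contract, `resolveProviderKind` and `createEmbeddingProviderSketch` are hypothetical names, and constructors are injected so the example stays self-contained; the real factory builds the Ollama and llama.cpp providers directly.

```typescript
export type ProviderKind = "ollama" | "llama-cpp";

// Approximation of the existing EmbeddingProvider contract (embed/dimensions).
export interface EmbeddingProvider {
  readonly dimensions: number;
  embed(text: string): Promise<number[]>;
}

// Maps MYCELIUM_LLM_PROVIDER to a provider kind. Unknown values throw at
// startup instead of silently falling back to the default.
export function resolveProviderKind(raw: string | undefined): ProviderKind {
  switch (raw ?? "") {
    case "": // Assumption: unset or empty selects the default.
    case "ollama":
      return "ollama";
    case "llama-cpp":
    case "llamacpp": // alias
      return "llama-cpp";
    default:
      throw new Error(`Unknown MYCELIUM_LLM_PROVIDER "${raw}": expected "ollama" or "llama-cpp"`);
  }
}

// Factory sketch: picks the injected constructor for the resolved kind.
export function createEmbeddingProviderSketch(
  env: Record<string, string | undefined>,
  constructors: Record<ProviderKind, () => EmbeddingProvider>,
): EmbeddingProvider {
  return constructors[resolveProviderKind(env.MYCELIUM_LLM_PROVIDER)]();
}
```

Throwing on an unknown value means a typo such as `llama_cpp` surfaces on the first start, rather than quietly writing Ollama vectors into an install that expected the native provider.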

What changes
- New LlamaCppEmbeddingProvider implementing the existing
  EmbeddingProvider contract (embed/dimensions). Lazy init: first
  embed() triggers GGUF download into the OS app-data dir, model load,
  and embedding-context creation. Subsequent calls reuse the context.
- createEmbeddingProvider() factory honours MYCELIUM_LLM_PROVIDER
  ('ollama' | 'llama-cpp', alias 'llamacpp'). Unknown values throw at
  startup rather than silently falling back.
- Knobs: MYCELIUM_LLAMA_MODELS_DIR, MYCELIUM_LLAMA_EMBEDDING_MODEL_URI;
  default URI is the dim-768 GGUF that matches our schema. The provider
  cross-checks the loaded model's embeddingVectorSize against the
  configured dimensions and throws on mismatch instead of producing
  wrong-shaped vectors.
- Concurrency: PGlite-style Promise-chain serializer (matches PR #185).
  Single embedding context handles one call at a time; queue keeps a
  failed embed from poisoning subsequent ops.
- Packaging: node-llama-cpp added as optionalDependencies so niche or
  unsupported platforms can still install the rest of mycelium. Dynamic
  import inside the provider gives a clear "install node-llama-cpp or
  switch to ollama" error if the runtime is absent.
- Tests: 7 new unit tests for selection/dimensions/path defaults; one
  opt-in real-GGUF e2e test gated on MYCELIUM_TEST_LLAMA_CPP=1
  (downloads ~84 MB on first run, skipped in CI by default).
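The Promise-chain serializer and the dimension guard from the list above can be sketched together. Assumptions: `SerialQueue` and `checkDimensions` are hypothetical names, and the model's vector size is passed in as a plain number instead of being read from a loaded node-llama-cpp model. The detail that keeps a failed embed from poisoning later ones is that the chain continues from a promise that always resolves.

```typescript
// Runs async jobs one at a time, in submission order.
export class SerialQueue {
  // Tail of the chain; always resolves, never rejects.
  private tail: Promise<void> = Promise.resolve();

  run<T>(job: () => Promise<T>): Promise<T> {
    // The job starts only after every previously submitted job has settled.
    const result = this.tail.then(job);
    // Callers still see a rejection through `result`, but the chain itself
    // absorbs it so the next job runs regardless.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    return result;
  }
}

// Throws instead of letting wrong-shaped vectors reach a VECTOR(768) column.
export function checkDimensions(modelVectorSize: number, configured: number): void {
  if (modelVectorSize !== configured) {
    throw new Error(
      `Embedding model produces ${modelVectorSize}-dim vectors but ${configured} are configured`,
    );
  }
}
```

Usage would look like `queue.run(() => context.embed(text))` for every call, where `context` stands for the single lazily created embedding context.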

Out of scope
- Cosine-similarity cross-validation against Ollama (deferred to a CI
  gate before any default flip — tokenizer warning in the spike doc).
- Automatic SHA-256 verification of GGUFs (Pillar 6 follow-up).
- Chat provider + REM-digest re-route (separate sub-task).

All 946 unit tests still pass; spike doc updated with the integration
contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dewinator added a commit that referenced this pull request May 2, 2026
…-in (#176 sub-task 5)

Closes the Windows/Linux GPU strategy gap that docs/native-llm-spike.md
explicitly deferred (line 121: "sub-task 5 of #176 is where DirectML/CUDA/
Vulkan tradeoffs get owned"). Verified empirically against node-llama-cpp
3.18.1's package matrix in experiments/native-llm/node_modules/.package-lock.json.

Headline findings:
- The epic table's "DirectML" entry is wrong — node-llama-cpp 3.x ships no
  DirectML backend. Actual matrix: Metal (mac-arm64), Vulkan + CUDA (win-x64,
  linux-x64), CPU only on ARM Windows / ARM Linux / mac-x64.
- Vulkan, not CUDA, is the universal-GPU default for Windows + Linux x64
  (covers Nvidia + AMD + Intel from one binary; CUDA is Nvidia-only and
  ships with a 400+ MB runtime extension).
- PR #187's hard-coded `gpu: "auto"` is already correct under this policy
  — node-llama-cpp's auto-picker order (Metal → CUDA → Vulkan → CPU) is
  what we want; no follow-up needed.
- CUDA is opt-in via a Settings toggle that npm-installs the CUDA backend into
  the data dir at runtime rather than shipping it in the installer, the same pattern as model files.
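The backend policy in these findings can be encoded as a small lookup, which makes the matrix and the preference order explicit. This is a sketch of the policy as described, not node-llama-cpp's own selection logic (which `gpu: "auto"` delegates to): the `platform-arch` keys in Node's `process.platform`/`process.arch` style, the `cudaInstalled` flag for the opt-in runtime, and the function names are assumptions, and actual GPU/driver detection is not modelled.

```typescript
export type Backend = "metal" | "cuda" | "vulkan" | "cpu";

// Backends shipped per platform-arch, per the findings above (node-llama-cpp 3.18.1).
const MATRIX: Record<string, Backend[]> = {
  "darwin-arm64": ["metal", "cpu"],
  "win32-x64": ["cuda", "vulkan", "cpu"],
  "linux-x64": ["cuda", "vulkan", "cpu"],
};

export function availableBackends(platform: string, arch: string): Backend[] {
  // Anything not listed (ARM Windows, ARM Linux, mac-x64) is CPU-only.
  return MATRIX[`${platform}-${arch}`] ?? ["cpu"];
}

// Auto-picker order: Metal → CUDA → Vulkan → CPU.
const PREFERENCE: Backend[] = ["metal", "cuda", "vulkan", "cpu"];

// CUDA only counts once the opt-in runtime has been installed (hypothetical flag).
export function pickBackend(platform: string, arch: string, cudaInstalled: boolean): Backend {
  const available = new Set(availableBackends(platform, arch));
  if (!cudaInstalled) available.delete("cuda");
  return PREFERENCE.find((b) => available.has(b)) ?? "cpu";
}
```

Under this policy a default Windows or Linux x64 install lands on Vulkan, and only users who flip the CUDA toggle move to CUDA.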

Five follow-up tickets enumerated for when Reed's queue drains. Not filed
yet (proposed-by-agent at 3-cap, Tauri-shell scaffolding lands first).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dewinator added a commit that referenced this pull request May 3, 2026
…n W4.1

W4.1 of docs/wave-4-anti-echo.md says "the first PR of this wave creates
the directory" — this is that scaffold. Lands two files only:

  - mcp-server/src/__tests__/fixtures/anti-echo/README.md
    Developer-facing spec for the corpus shape, mirrors the governance
    rules from the anchor doc but written for the file-format reader.
  - mcp-server/src/__tests__/fixtures/anti-echo/corpus-types.ts
    `AntiEchoCorpusFixture` + `AntiEchoCohortFixture` discriminated union
    over the v1.1 Lesson envelope (services/wire-types.ts). Types only,
    no loader, no harness — those land alongside the first concrete
    fixture per category in subsequent PRs.
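A discriminated union of this kind might take roughly the following shape. Everything beyond the two type names is hypothetical: the `kind` tag, the `category`/`description` fields, the assumption that `AntiEchoCorpusFixture` is the single-envelope variant, the `AntiEchoFixture` alias, and the stand-in `LessonEnvelope` type (the real one is the v1.1 envelope in `services/wire-types.ts`, not reproduced here). The point is that a tag lets a future loader narrow between single-envelope and cohort fixtures.

```typescript
// Stand-in for the v1.1 Lesson envelope (hypothetical fields).
export interface LessonEnvelope {
  version: "1.1";
  payload: unknown;
  signature?: string;
}

interface FixtureBase {
  category: string; // one of the attack categories (hypothetical field)
  description: string;
}

// Assumption: the corpus fixture carries a single envelope.
export interface AntiEchoCorpusFixture extends FixtureBase {
  kind: "single";
  envelope: LessonEnvelope;
}

export interface AntiEchoCohortFixture extends FixtureBase {
  kind: "cohort";
  envelopes: LessonEnvelope[];
}

// Hypothetical alias for the union of both fixture shapes.
export type AntiEchoFixture = AntiEchoCorpusFixture | AntiEchoCohortFixture;

// Narrowing on `kind` gives type-safe access to the variant-specific field.
export function envelopeCount(fixture: AntiEchoFixture): number {
  switch (fixture.kind) {
    case "single":
      return 1;
    case "cohort":
      return fixture.envelopes.length;
  }
}
```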

Why scaffold-first instead of one big "land all 8 fixtures" PR: the eight
attack categories from wave-4-anti-echo.md §"Corpus categories" each have
their own subtleties (cohort vs single-envelope, signing-key handling,
which §10 mechanism asserts). Decomposing into one fixture per follow-up
PR keeps each diff reviewable and lets the harness shape evolve from the
first concrete fixture rather than from speculation.

Why this can land while the 9-PR native-app queue is open: the new
directory lives entirely under `__tests__/fixtures/`, so it has zero file
overlap with the native-app stack (#185 / #187 / #188 / #189 / #190 /
#191 / #192 / #193 / #194). 939/939 node --test tests still green;
`tsc --noEmit` clean.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dewinator merged commit 1e3e701 into main May 3, 2026
1 check passed
Dewinator added a commit that referenced this pull request May 3, 2026
…rain)

Reed merged 10 PRs today: all 3 W4.1 anti-echo (#197/#198/#201), both W2
federation (#199/#200), 5 native-app (#190/#191/#192/#193/#194). Only the
linear 4-PR #178 stack remains open (#185 independent, plus #187 / #188 / #189
strictly stacked). The three-cohort split collapsed to one cohort, so the old
order-independence proofs (143rd/148th tick) are now obsolete.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dewinator added a commit that referenced this pull request May 3, 2026
…open

While the previous commit was being prepared, Reed merged the remaining
4-PR native-app stack (#185, #187, #188, #189). All 14 PRs from yesterday's
queue are now on main: 9 Native-App + 3 W4.1 + 2 W2.

Key shift: the "wait for queue drain" block on Native-App Sub-Tasks 3, 6,
7, 8, 9, 10 is gone. Implementation can resume directly from the spikes
in docs/native-*-spike.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>