feat(embeddings): LlamaCppEmbeddingProvider behind MYCELIUM_LLM_PROVIDER (#178 part 1) #187
Merged
…DER (issue #178 part 1)

Lands the embedding side of the Spike 2 recommendation (PR #182, docs/native-llm-spike.md) as a real provider in mcp-server/src/services/embeddings.ts. Default provider stays `ollama` so existing installs are unaffected; native-app installs (#176, Welle 1) will set MYCELIUM_LLM_PROVIDER=llama-cpp explicitly.

What changes
- New LlamaCppEmbeddingProvider implementing the existing EmbeddingProvider contract (embed/dimensions). Lazy init: first embed() triggers GGUF download into the OS app-data dir, model load, and embedding-context creation. Subsequent calls reuse the context.
- createEmbeddingProvider() factory honours MYCELIUM_LLM_PROVIDER ('ollama' | 'llama-cpp', alias 'llamacpp'). Unknown values throw at startup rather than silently falling back.
- Knobs: MYCELIUM_LLAMA_MODELS_DIR, MYCELIUM_LLAMA_EMBEDDING_MODEL_URI; default URI is the dim-768 GGUF that matches our schema. The provider cross-checks the loaded model's embeddingVectorSize against the configured dimensions and throws on mismatch instead of producing wrong-shaped vectors.
- Concurrency: PGlite-style Promise-chain serializer (matches PR #185). Single embedding context handles one call at a time; queue keeps a failed embed from poisoning subsequent ops.
- Packaging: node-llama-cpp added as optionalDependencies so niche or unsupported platforms can still install the rest of mycelium. Dynamic import inside the provider gives a clear "install node-llama-cpp or switch to ollama" error if the runtime is absent.
- Tests: 7 new unit tests for selection/dimensions/path defaults; one opt-in real-GGUF e2e test gated on MYCELIUM_TEST_LLAMA_CPP=1 (downloads ~84 MB on first run, skipped in CI by default).

Out of scope
- Cosine-similarity cross-validation against Ollama (deferred to a CI gate before any default flip; see the tokenizer warning in the spike doc).
- Automatic SHA-256 verification of GGUFs (Pillar 6 follow-up).
- Chat provider + REM-digest re-route (separate sub-task).

All 946 unit tests still pass; spike doc updated with the integration contract.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
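
The selection and dimension-guard behaviour described in this commit message could look roughly like the sketch below. It assumes the factory reads an env-like map; the names `resolveProviderName` and `assertDimensions`, the case-sensitive matching, and treating an empty value as unset are illustrative assumptions, not taken from `embeddings.ts`.

```typescript
type ProviderName = "ollama" | "llama-cpp";

// Hypothetical helper: maps MYCELIUM_LLM_PROVIDER to a provider name.
// "llamacpp" is accepted as an alias; anything else throws instead of
// silently falling back to the default.
function resolveProviderName(env: Record<string, string | undefined>): ProviderName {
  const raw = env["MYCELIUM_LLM_PROVIDER"];
  // Assumption: an unset or empty variable means "use the default".
  if (raw === undefined || raw === "") return "ollama";
  switch (raw) {
    case "ollama":
      return "ollama";
    case "llama-cpp":
    case "llamacpp":
      return "llama-cpp";
    default:
      throw new Error(
        `Unknown MYCELIUM_LLM_PROVIDER "${raw}"; expected "ollama" or "llama-cpp"`,
      );
  }
}

// Hypothetical guard: refuses to continue when the loaded model's vector
// size differs from the dimensions the schema is configured for.
function assertDimensions(modelVectorSize: number, configured: number): void {
  if (modelVectorSize !== configured) {
    throw new Error(
      `Embedding model produces ${modelVectorSize}-d vectors but ${configured} are configured`,
    );
  }
}
```

Throwing in both helpers keeps a misconfiguration visible at startup or first load, rather than letting wrong-shaped vectors reach storage.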
This was referenced May 2, 2026
Dewinator added a commit that referenced this pull request on May 2, 2026
…-in (#176 sub-task 5)

Closes the Windows/Linux GPU strategy gap that docs/native-llm-spike.md explicitly deferred (line 121: "sub-task 5 of #176 is where DirectML/CUDA/Vulkan tradeoffs get owned"). Verified empirically against node-llama-cpp 3.18.1's package matrix in experiments/native-llm/node_modules/.package-lock.json.

Headline findings:
- The epic table's "DirectML" entry is wrong: node-llama-cpp 3.x ships no DirectML backend. Actual matrix: Metal (mac-arm64), Vulkan + CUDA (win-x64, linux-x64), CPU only on ARM Windows / ARM Linux / mac-x64.
- Vulkan, not CUDA, is the universal-GPU default for Windows + Linux x64 (covers Nvidia + AMD + Intel from one binary; CUDA is Nvidia-only and ships with a 400+ MB runtime extension).
- PR #187's hard-coded `gpu: "auto"` is already correct under this policy: node-llama-cpp's auto-picker order (Metal → CUDA → Vulkan → CPU) is what we want; no follow-up needed.
- CUDA opt-in via Settings toggle that npm-installs the cuda backend into the data dir at runtime, not the installer; same pattern as model files.

Five follow-up tickets enumerated for when Reed's queue drains. Not filed yet (proposed-by-agent at 3-cap, Tauri-shell scaffolding lands first).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
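
To make the policy in these findings concrete, here is a small model of it in TypeScript. This is only a sketch of the stated preference order and package matrix, not node-llama-cpp's own selection code: `pickBackend`, the `SHIPPED` table keys, the `installed` set, and the assumption that CPU is always available as a fallback are all hypothetical.

```typescript
type Backend = "metal" | "cuda" | "vulkan" | "cpu";

// Preference order quoted in the commit message: Metal → CUDA → Vulkan → CPU.
const PREFERENCE: readonly Backend[] = ["metal", "cuda", "vulkan", "cpu"];

// Backend matrix from the findings, keyed by "<platform>-<arch>".
// Assumption: CPU is always present as the last-resort fallback.
const SHIPPED: Record<string, readonly Backend[]> = {
  "mac-arm64": ["metal", "cpu"],
  "mac-x64": ["cpu"],
  "win-x64": ["vulkan", "cuda", "cpu"],
  "linux-x64": ["vulkan", "cuda", "cpu"],
  "win-arm64": ["cpu"],
  "linux-arm64": ["cpu"],
};

// Picks the first preferred backend that exists for the target and is
// actually installed locally (CUDA only after the user opted in).
function pickBackend(target: string, installed: ReadonlySet<Backend>): Backend {
  const shipped: readonly Backend[] = SHIPPED[target] ?? ["cpu"];
  for (const backend of PREFERENCE) {
    const available = shipped.indexOf(backend) !== -1;
    if (available && (backend === "cpu" || installed.has(backend))) {
      return backend;
    }
  }
  return "cpu";
}
```

With only Vulkan installed on `win-x64` the sketch returns `vulkan`; once the opt-in CUDA backend is present it returns `cuda`, which matches the "Vulkan by default, CUDA opt-in" reading of the findings.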
Dewinator added a commit that referenced this pull request on May 3, 2026
…n W4.1
W4.1 of docs/wave-4-anti-echo.md says "the first PR of this wave creates
the directory" — this is that scaffold. Lands two files only:
- mcp-server/src/__tests__/fixtures/anti-echo/README.md
Developer-facing spec for the corpus shape, mirrors the governance
rules from the anchor doc but written for the file-format reader.
- mcp-server/src/__tests__/fixtures/anti-echo/corpus-types.ts
`AntiEchoCorpusFixture` + `AntiEchoCohortFixture` discriminated union
over the v1.1 Lesson envelope (services/wire-types.ts). Types only,
no loader, no harness — those land alongside the first concrete
fixture per category in subsequent PRs.
Why scaffold-first instead of one big "land all 8 fixtures" PR: the eight
attack categories from wave-4-anti-echo.md §"Corpus categories" each have
their own subtleties (cohort vs single-envelope, signing-key handling,
which §10 mechanism asserts). Decomposing into one fixture per follow-up
PR keeps each diff reviewable and lets the harness shape evolve from the
first concrete fixture rather than from speculation.
Why this can land while the 9-PR native-app queue is open: the new
directory lives entirely under `__tests__/fixtures/`, so it has zero file
overlap with the native-app stack (#185 / #187 / #188 / #189 / #190 /
#191 / #192 / #193 / #194). 939/939 node --test tests still green;
`tsc --noEmit` clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dewinator added a commit that referenced this pull request on May 3, 2026
…rain)

Reed merged 10 PRs today: all 3 W4.1 anti-echo (#197/#198/#201), both W2 federation (#199/#200), 5 native-app (#190/#191/#192/#193/#194). Only the linear 4-PR #178-stack remains open (#185 independent + #187 → #188 → #189 strictly stacked). Three-cohort split collapsed to one cohort; old order-independence proofs (143rd/148th tick) now obsolete.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Dewinator added a commit that referenced this pull request on May 3, 2026
…open

While the previous commit was being prepared, Reed merged the remaining 4-PR native-app stack (#185, #187, #188, #189). All 14 PRs from yesterday's queue are now on main: 9 Native-App + 3 W4.1 + 2 W2.

Key shift: the "wait for queue drain" block on Native-App Sub-Tasks 3, 6, 7, 8, 9, 10 is gone. Implementation can resume directly from the spikes in docs/native-*-spike.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
- … `docs/native-llm-spike.md`) as a real provider: the centerpiece of the native-app track (epic: native standalone app (no Docker), macOS / Windows / Linux #176, Welle 1).
- … `MYCELIUM_LLM_PROVIDER=llama-cpp` to swap the in-process `node-llama-cpp` provider in for Ollama. Default stays `ollama`, so existing installs are unaffected.
- … `EmbeddingProvider` contract → all consumers (`MemoryService`, `import_markdown`, REM digest pre-stage) work unchanged.

What changes
- `LlamaCppEmbeddingProvider` (new): lazy-init, dynamic import of `node-llama-cpp`, OS-native models dir (`~/Library/Application Support/mycelium/models` on macOS, `%APPDATA%\mycelium\models` on Windows, `$XDG_DATA_HOME/mycelium/models` on Linux), 768d default to match `VECTOR(768)` schema, dim-mismatch guard, PGlite-style queue serializer (matches PR #185, "feat(native): PGlite adapter foundation, issue #184 part 1").
- `createEmbeddingProvider()`: env-driven factory, `ollama` (default) | `llama-cpp` (alias `llamacpp`). Unknown values throw at startup.
- `node-llama-cpp@^3.4.0` added as `optionalDependencies` so niche/unsupported platforms can still install the rest of mycelium. Dynamic import inside the provider yields a clear "install node-llama-cpp or switch to ollama" error if the runtime is absent.
- … `MYCELIUM_TEST_LLAMA_CPP=1` (downloads ~84 MB on first run, skipped in CI).
- `docs/native-llm-spike.md` adds an "Implementation 1" section documenting the integration contract.

Why default is not flipped
The spike (`docs/native-llm-spike.md` §3) flagged a tokenizer-config warning on the nomic-embed-text GGUF: cross-path embedding parity with Ollama is not yet validated, and if the cosine similarity between the two paths' vectors for the same text fell below ~0.95, stored vectors on existing installs would be invalidated. The default flip waits until a CI gate verifies parity. Fresh native-app installs (#176 sub-task 4, Tauri shell) start empty, so they can opt in safely.

Out of scope
Test plan
- `npm run build`: clean
- `npm test`: 946 pass, 1 skipped (gated e2e), 0 fail
- `MYCELIUM_TEST_LLAMA_CPP=1 npm test -- --test-name-pattern 'real GGUF'` (verifies first-run download, 768d, queued concurrent embeds)
- `MYCELIUM_LLM_PROVIDER=llama-cpp npm run dev`, `remember()` a memory, `recall()` it back

🤖 Generated with Claude Code
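
The "queued concurrent embeds" item in the test plan relies on the Promise-chain serializer mentioned under "What changes". A minimal sketch of that technique follows; `SerialQueue`, `sleep`, and `demo` are hypothetical names for illustration, and the real provider may structure its queue differently.

```typescript
// Runs async tasks strictly one at a time, in submission order.
class SerialQueue {
  // Tail of the chain; it never rejects, so a failure cannot poison later tasks.
  private tail: Promise<void> = Promise.resolve();

  run<T>(task: () => Promise<T>): Promise<T> {
    // Start this task only after everything queued before it has settled.
    const result = this.tail.then(() => task());
    // Advance the tail regardless of whether the task resolved or rejected.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    // The caller still sees the task's own outcome, including rejections.
    return result;
  }
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Usage: three overlapping calls; the second one fails.
async function demo(): Promise<string[]> {
  const queue = new SerialQueue();
  const log: string[] = [];
  const first = queue.run(async () => {
    await sleep(20);
    log.push("first");
  });
  const failing = queue.run(async () => {
    throw new Error("embed failed");
  });
  const third = queue.run(async () => {
    log.push("third");
  });
  await first;
  await failing.catch(() => undefined); // the failure stays local to this call
  await third;
  return log;
}
```

In `demo`, the third task still runs after the failed one and only after the slow first task has finished, so the log ends up as `["first", "third"]`. Without the queue, the third task would log before the first.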