Epic: Felix-LM v2 — Splice-Native Bespoke Model #438

@CalebisGross

Context

Mnemonic's current production base is Gemma 4 E2B + Felix-LM low-rank spokes (rank 64, 35 layers, RQ4 quantized to ~3GB on RX 7800 XT). Two structural problems with this stack drive the v2 epic:

  1. RQ4 quantization collapses behavior. EXP-34.2 (2026-05-10): the 32-element BetaQ codebook rounds small but semantically critical logits (refusal tokens at <1% probability) to zero. Python serve Likert 3.95 → GGUF Likert 3.03 (Δ −0.92). The bf16 weights are an upper bound on deployment quality, not a faithful preview. A separate kernel-layer track is fixing RQ4 itself; this epic is the orthogonal model-side track.

  2. The architecture is someone else's wall with our paint on it. The post is Google's; the spokes are ours; crispr-lm splices into spoke tensors that were never designed for live editing. There is no architectural lever to keep refusal logits loud enough to survive RQ4 rounding, and no circuit-stability guarantee across many sequential edits.
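The rounding failure in problem 1 can be illustrated with a toy blockwise quantizer. This is a minimal sketch only: it uses a uniform signed grid scaled by the block's absolute maximum as a stand-in, and is not the BetaQ codebook or the RQ4 kernel. The function name `quantize_block` and the example values are hypothetical.

```python
def quantize_block(values, half_levels=16):
    """Round a block of values to a uniform signed grid scaled by the block absmax.

    ASSUMPTION: a uniform grid with 32 integer codes (-16..15) stands in for the
    32-element codebook; the real BetaQ codebook is not reproduced here.
    """
    absmax = max(abs(v) for v in values)
    if absmax == 0.0:
        return [0.0 for _ in values]
    step = absmax / half_levels
    quantized = []
    for v in values:
        # Nearest grid index, clamped to the signed code range.
        k = max(-half_levels, min(half_levels - 1, round(v / step)))
        quantized.append(k * step)
    return quantized


# One large value sets the step size; anything below half a step becomes zero.
block = [-4.0, 2.5, 0.9, 0.1, -0.05]
print(quantize_block(block))
```

In this toy block the step is 0.25, so the two smallest entries fall below half a step and are rounded to zero while the large entries survive. That is the same qualitative effect described above for small but critical values.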

Goal

A from-scratch bespoke 300–500M model running under 4GB VRAM at deployment quant on the local RX 7800 XT (no MI300X) that:

  • Beats a multi-teacher ensemble (Gemma 4 31B + Claude Haiku 4.5 + Gemini 3.1 Pro, all via API) on mnemonic's task suite (encoding, retrieval-synthesis, principle abstraction, episode summarization) — joint-min Likert with cross-grader rubric.
  • Survives the deployment quantization scheme without behavioral collapse.
  • Has splice tensors by construction — a dedicated trainable tensor surface for crispr-lm runtime edits, integrated into the architecture from layer 0, not retrofitted on someone else's frozen post.

Gemma 4 31B's role in this epic is one of three teachers in the ensemble, nothing more. Gemma 4 E2B + spokes (current production) is the v1 baseline we beat. Neither stays in the final shipped model.

Constraints (locked by user, 2026-05-12)

  • Pretrain from scratch. No continued-pretrain on a vendor base.
  • Local-only RX 7800 XT 16GB. No MI300X. Forces the model size into the 300–500M range — what fits training-time memory at seq_len 2048 with grad checkpointing.
  • Splice tensors by construction.
  • Multi-teacher ensemble as 30B reference. Output-matching black-box distillation only.
  • Deployment VRAM cap <4GB at quant.
  • No timeline pressure but no wasted cycles.

Compute Envelope Math

| Target params | Training memory | Throughput | Chinchilla (20 tok/param) | Modestly overtrained (100 tok/param) |
|---|---|---|---|---|
| 300M | ~7GB (weights+opt+acts) | ~6500 tok/s | 6B tok ≈ 11 days | 30B tok ≈ 53 days |
| 500M | ~10GB | ~4100 tok/s | 10B tok ≈ 28 days | 50B tok ≈ 140 days |
| 1B | ~12GB (tight) | ~2000 tok/s | 20B tok ≈ 115 days | (too long) |

Initial target: 300M, ~30B tokens (~100 tok/param), ~53 days wall-clock at the projected ~6500 tok/s. If 300M doesn't clear the quality bar, scale to 500M with lessons learned.
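The wall-clock figures in the table follow from token budget divided by throughput. A minimal sketch of that arithmetic, assuming the projected throughputs hold for the whole run (no restarts, eval pauses, or throttling); the helper name `training_days` is hypothetical:

```python
SECONDS_PER_DAY = 86_400


def training_days(tokens, tokens_per_second):
    """Wall-clock days needed to process `tokens` at a constant throughput."""
    return tokens / tokens_per_second / SECONDS_PER_DAY


# (label, parameter count, projected tok/s) taken from the table above.
candidates = [("300M", 300e6, 6500), ("500M", 500e6, 4100), ("1B", 1e9, 2000)]

for label, params, tok_per_s in candidates:
    chinchilla = training_days(20 * params, tok_per_s)
    overtrained = training_days(100 * params, tok_per_s)
    print(f"{label}: {chinchilla:.1f} days at 20 tok/param, "
          f"{overtrained:.1f} days at 100 tok/param")
```

The same function can be inverted to budget tokens for a fixed number of days: `days * SECONDS_PER_DAY * tokens_per_second`.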

Architecture: Felix-LM v2

To be frozen in EXP-41. Initial candidate: 300M dense transformer, 24 layers × hidden 1024 × 16 heads, FFN 4096, vocab 32K (custom BPE on mnemonic domain text), RoPE, SpliceCore tensors at every 4th layer (rank 8 each) — a dedicated trainable tensor surface for crispr-lm edits, distinct from any per-task spokes. Alternative under consideration: 400M parallel Mamba-2 + attention hybrid for self-speculation compatibility (Component-Aware Self-Spec arxiv:2605.01106).
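A rough parameter count for the dense candidate, as a sanity check on the "300M" label. This is a sketch under stated assumptions: a vocabulary of exactly 32,000 tokens, tied input/output embeddings, a two-matrix (non-gated) FFN, full multi-head attention, and no biases or norm parameters counted. The SpliceCore estimate further assumes each site is a low-rank pair of shape hidden × rank and rank × hidden, which the spec above does not state. Both function names are hypothetical.

```python
def dense_transformer_params(layers, hidden, ffn, vocab,
                             gated_ffn=False, tied_embeddings=True):
    """Approximate weight count; ignores biases and normalization parameters."""
    attention = 4 * hidden * hidden            # Q, K, V, and output projections
    ffn_matrices = 3 if gated_ffn else 2       # gated FFNs add a third matrix
    mlp = ffn_matrices * hidden * ffn
    embeddings = vocab * hidden * (1 if tied_embeddings else 2)
    return layers * (attention + mlp) + embeddings


def splice_core_params(layers, hidden, rank, every):
    """ASSUMPTION: one low-rank pair (hidden x rank, rank x hidden) per site."""
    sites = layers // every
    return sites * 2 * hidden * rank


base = dense_transformer_params(layers=24, hidden=1024, ffn=4096, vocab=32_000)
splice = splice_core_params(layers=24, hidden=1024, rank=8, every=4)
print(f"base: {base / 1e6:.1f}M, SpliceCore: {splice / 1e3:.1f}K")
```

Under these assumptions the transformer blocks alone account for about 302M parameters and the embeddings add roughly 33M; switching to a gated FFN (`gated_ffn=True`) would push the total above 400M, so the FFN variant is worth pinning down in EXP-41.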

Training Recipe

  • Multi-teacher harvest. Gemma 4 31B (OpenRouter/Together) + Haiku 4.5 (Anthropic Batch) + Gemini 3.1 Pro (Google Batch). ~5–10K examples per task per teacher.
  • FinePhrase-style synthetic structured-output pretraining (arxiv:2604.13977) — generator size plateaus at 1B, so the existing Gemma 4 E2B daemon (or Haiku batch) can produce the entire corpus cheaply.
  • SODA semi-on-policy black-box distillation (arxiv:2604.03873) on task-aligned slices — 10× faster + 27% less GPU memory than full OPD.
  • QUAIL refusal-loud hinge loss (arxiv:2601.15538, adapted) on a whitelist of refusal-emit and schema-critical tokens, baked in from step 0.
  • WSD LR schedule.
  • Scaling-law-aware data/compute split per Prescriptive Scaling (arxiv:2605.01640) and InfoLaw (arxiv:2605.02364).
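For reference, a warmup-stable-decay (WSD) schedule can be sketched as a pure function of the step. The linear warmup, linear decay shape, and all parameter names here are assumptions for illustration; the recipe above does not fix them.

```python
def wsd_lr(step, total_steps, peak_lr, warmup_steps, decay_steps, min_lr=0.0):
    """Warmup-stable-decay learning rate for a zero-indexed `step`.

    ASSUMPTION: linear warmup and linear decay; other decay shapes are common.
    """
    if step < warmup_steps:
        # Linear ramp that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    decay_start = total_steps - decay_steps
    if step < decay_start:
        # Stable phase: hold the peak rate.
        return peak_lr
    # Decay phase: interpolate linearly from peak_lr down to min_lr.
    progress = min(1.0, (step - decay_start) / decay_steps)
    return peak_lr + (min_lr - peak_lr) * progress


for s in (0, 99, 500, 900, 1000):
    print(s, wsd_lr(s, total_steps=1000, peak_lr=1e-3,
                    warmup_steps=100, decay_steps=200))
```

One practical property of this shape is that a checkpoint from the stable phase can be branched into a short decay run without committing to the total step count up front.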

Pre-Registered Experiment Sequence

Full pre-registration in training/docs/experiment_registry.md Phase 7. Each entry has hypothesis / variable / control / prediction / fail-conditions per .claude/rules/scientific-method.md.

| ID | Name | Role |
|---|---|---|
| EXP-40 | QUAIL hinge probe on existing v3.2 spoke | Half-day technique-validation probe. NOT the main path. Confirms QUAIL implementation correctness on a known-failing Gemma checkpoint. |
| EXP-41 | From-scratch architecture freeze | Profile candidate specs (dim, layers, attention type, splice topology), produce training/docs/felix_lm_v2_design.md |
| EXP-42 | 50M arch pilot | Scale-down sanity check before the 300M run. Verifies training stability, SpliceCore signal, throughput projections. |
| EXP-43 | Multi-teacher harvest + synthetic pretrain corpus | ~10–30B token corpus assembled from teacher distillation + FinePhrase-style synth + curated text. |
| EXP-44 | 300M from-scratch pretrain (the main run) | ~53 days wall-clock on RX 7800 XT for 30B tokens at the projected ~6500 tok/s. SODA distillation + QUAIL + WSD. |
| EXP-45 | crispr-lm splice into SpliceCore tensors | Splice the from-scratch model. |
| EXP-46 | Sequential splice circuit-stability | 100+ sequential edits with bounded drift, per Old Maps arxiv:2605.06076. |
| EXP-47 | Headline ensemble eval | "300M from-scratch beats 30B ensemble on the mnemonic task suite." |

Novel Contributions (publishable bars)

  1. Felix-LM v2 from-scratch splice-native architecture. Splice tensors integrated at training time, not retrofitted as adapters.
  2. QUAIL-for-refusal baked into pretraining. Extends QUAIL's logit-space hinge loss from unlearning to refusal-token preservation under blockwise quantization.
  3. Multi-teacher ensemble distillation for cognitive task suite. Unifies SODA + Uni-OPD with multi-teacher diversity.
  4. Live splice circuit-stability bounds. Sequential-edit Circuit Distance / Stability measurements on a from-scratch model.
  5. Single-consumer-GPU bespoke training. The whole stack trained on a single 16GB AMD RX 7800 XT, no datacenter compute.
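To make contribution 2 concrete, here is a generic logit-margin hinge restricted to a token whitelist. This is an illustrative sketch only, not the QUAIL formulation and not the contents of quail_loss.py: it assumes the penalty applies at positions whose target token is whitelisted and fires when that token's logit does not beat the runner-up by `margin`. All names and the margin value are hypothetical.

```python
def margin_hinge(logits, target_id, margin):
    """Penalty when the target logit leads the best other logit by less than margin."""
    runner_up = max(z for j, z in enumerate(logits) if j != target_id)
    return max(0.0, margin - (logits[target_id] - runner_up))


def whitelist_hinge_loss(batch_logits, targets, whitelist, margin=2.0):
    """Mean hinge penalty over positions whose target token is whitelisted.

    ASSUMPTION: positions with non-whitelisted targets contribute nothing.
    """
    losses = [
        margin_hinge(logits, target, margin)
        for logits, target in zip(batch_logits, targets)
        if target in whitelist
    ]
    return sum(losses) / len(losses) if losses else 0.0


# Token 0 plays the role of a whitelisted refusal token in this toy vocabulary.
batch = [[1.5, 1.0, 0.5], [0.0, 5.0, 0.0]]
print(whitelist_hinge_loss(batch, targets=[0, 1], whitelist={0}))
```

The idea behind a margin of this kind is that a token which wins by a wide logit gap in bf16 has more headroom to remain the winner after quantization noise; how wide the gap must be is an empirical question for EXP-40.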

Files (planned)

New:

  • training/configs/felix_lm_v2_architecture.yaml
  • training/configs/felix_lm_v2_data_mix.yaml
  • training/scripts/felix_lm_v2_model.py — from-scratch architecture
  • training/scripts/train_felix_lm_v2.py — pretraining + distillation + QUAIL loop
  • training/scripts/quail_loss.py DONE (committed in af37aa9)
  • training/scripts/harvest_multi_teacher.py
  • training/scripts/synth_structured_pretrain.py — FinePhrase-style rephraser
  • training/scripts/tokenizer_train.py — custom BPE on mnemonic domain text
  • training/docs/felix_lm_v2_design.md — architecture spec + publishable paper draft skeleton

Reused without modification:

  • training/scripts/harvest_gemini_batch.py, diff_teacher_corpora.py — generalized in harvest_multi_teacher.py
  • training/scripts/validate.py — faithfulness gate
  • training/scripts/download_*.py — corpus ingest
  • training/scripts/audit_mix.py — data mix audit
  • training/scripts/characterize_serve_output.py, compare_models.py, stress_test_hallucination.py — eval

Modified:

  • training/scripts/train_spokes.py DONE (QUAIL wire-in committed in af37aa9; lives there to serve EXP-40 probe)
  • training/docs/experiment_registry.md DONE (Phase 7 pre-registration committed in af37aa9, will be updated to from-scratch framing)
  • third_party/llama.cpp/ — coordinate with parallel quantization-kernel agent on GGUF format for the from-scratch architecture; the from-scratch model is not Gemma so it needs custom GGUF support in the fork
  • internal/api/routes/splice_tensor.go — extend for SpliceCore tensor names if the new architecture's tensor naming differs

Branch

feat/felix-lm-v2-splice-native. First commit af37aa9 adds the QUAIL loss module + EXP-40 wire-in + Phase 7 pre-registration.

🤖 Generated with Claude Code
