Prerequisites
Expected Behavior
Llama.embed() should successfully compute embeddings when called on a model constructed with embeddings=True.
Current Behavior
Llama.embed() raises a TypeError immediately, before any embedding is computed:
TypeError: LlamaBatch.add_sequence() missing 1 required positional argument: 'logits_array'
The cause: Llama.embed() in llama_cpp/llama.py (around line 1678) calls add_sequence with three positional arguments:
self._batch.add_sequence(tokens, p_batch, logits_all)
But LlamaBatch.add_sequence in llama_cpp/_internals.py (around line 1013) requires four:
def add_sequence(
    self,
    token_array: Sequence[int],
    pos_array: Sequence[int],
    seq_ids: Sequence[Sequence[int]],
    logits_array: Sequence[bool],
):
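For context, the TypeError is ordinary positional binding: the three legacy arguments land on token_array, pos_array, and seq_ids, leaving logits_array unfilled. A minimal standalone sketch with a stub function (not the real LlamaBatch) reproduces the same error:

from typing import Sequence

def add_sequence(
    token_array: Sequence[int],
    pos_array: Sequence[int],
    seq_ids: Sequence[Sequence[int]],
    logits_array: Sequence[bool],
) -> None:
    ...

# Legacy 3-arg call: tokens -> token_array, seq_id -> pos_array,
# logits_all -> seq_ids; logits_array is left unbound.
add_sequence([1, 2, 3], 0, True)
# TypeError: add_sequence() missing 1 required positional argument: 'logits_array'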
llama_cpp/llama_embedding.py (around line 262) already calls add_sequence correctly with the four-arg shape — the call site in Llama.embed() was apparently missed during the LlamaBatch.add_sequence refactor.
Environment and Context
- Hardware: x86_64, NVIDIA GeForce RTX 4090
- OS: Windows 10 22H2
- Python 3.12.9
- llama-cpp-python 0.3.36 (CUDA 12.8 prebuilt wheel)
$ python --version
Python 3.12.9
$ pip show llama-cpp-python | findstr Version
Version: 0.3.36
Failure Information (for bugs)
This is a clean regression — LlamaBatch.add_sequence was refactored from a 3-arg signature to a 4-arg one, and the call sites were updated everywhere except in Llama.embed(). llama_embedding.py shows what the new shape should look like for the embedding code path.
Steps to Reproduce
from llama_cpp import Llama
m = Llama(model_path="path/to/model.gguf", embeddings=True)
m.embed("hello")
Result:
TypeError: LlamaBatch.add_sequence() missing 1 required positional argument: 'logits_array'
Failure Logs
Traceback (most recent call last):
  File "...\Lib\site-packages\llama_cpp\llama.py", line 1678, in embed
    self._batch.add_sequence(tokens, p_batch, logits_all)
TypeError: LlamaBatch.add_sequence() missing 1 required positional argument: 'logits_array'
Suggested fix
Mirror the call shape already used in llama_cpp/llama_embedding.py:
# In llama.py Llama.embed(), replace:
self._batch.add_sequence(tokens, p_batch, logits_all)
# With something like:
self._batch.add_sequence(
    token_array=tokens,
    pos_array=list(range(len(tokens))),
    seq_ids=[p_batch],  # or [[p_batch]] * len(tokens) if per-token seq-id lists are expected
    logits_array=(
        [True] * len(tokens)
        if logits_all
        else [False] * (len(tokens) - 1) + [True]
    ),
)
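As a quick sanity check of the logits_array expression (a hypothetical 4-token prompt; the logits_all=False branch assumes last-token pooling):

tokens = [101, 102, 103, 104]  # hypothetical token ids

# logits_all=True: logits requested at every position.
assert [True] * len(tokens) == [True, True, True, True]

# logits_all=False: logits requested only at the final position.
assert [False] * (len(tokens) - 1) + [True] == [False, False, False, True]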
Workaround
Monkey-patching LlamaBatch.add_sequence to detect legacy 3-arg calls, synthesize the missing pos_array, and remap the remaining arguments works as a stopgap; a sketch follows below. This was hit while running Tencent's HY-Motion text-to-motion model, whose text encoder uses Llama.embed() against GGUF Qwen3 weights.
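A minimal sketch of that monkey-patch, assuming the 4-arg signature shown above and that the legacy third argument is a bool logits_all (the per-token seq_ids shape is a guess, not verified against llama_embedding.py):

import llama_cpp._internals as internals

_orig_add_sequence = internals.LlamaBatch.add_sequence

def _patched_add_sequence(self, *args, **kwargs):
    # Detect the legacy 3-positional-arg shape: (tokens, seq_id, logits_all).
    if len(args) == 3 and not kwargs:
        tokens, seq_id, logits_all = args
        return _orig_add_sequence(
            self,
            token_array=tokens,
            pos_array=list(range(len(tokens))),
            seq_ids=[[seq_id]] * len(tokens),  # assumed: one seq-id list per token
            logits_array=(
                [True] * len(tokens)
                if logits_all
                else [False] * (len(tokens) - 1) + [True]
            ),
        )
    # New-style calls pass through unchanged.
    return _orig_add_sequence(self, *args, **kwargs)

internals.LlamaBatch.add_sequence = _patched_add_sequence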