
Llama.embed() calls LlamaBatch.add_sequence with old 3-arg signature; missing logits_array #2211

@emptyngton

Description

Prerequisites

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Llama.embed() should successfully compute embeddings when called on a model constructed with embedding=True.

Current Behavior

Llama.embed() raises a TypeError immediately, before any embedding is computed:

TypeError: LlamaBatch.add_sequence() missing 1 required positional argument: 'logits_array'

The cause: Llama.embed() in llama_cpp/llama.py (around line 1678) calls add_sequence with three positional arguments:

self._batch.add_sequence(tokens, p_batch, logits_all)

But LlamaBatch.add_sequence in llama_cpp/_internals.py (around line 1013) requires four:

def add_sequence(
    self,
    token_array: Sequence[int],
    pos_array: Sequence[int],
    seq_ids: Sequence[Sequence[int]],
    logits_array: Sequence[bool]
):

llama_cpp/llama_embedding.py (around line 262) already calls add_sequence correctly with the four-arg shape — the call site in Llama.embed() was apparently missed during the LlamaBatch.add_sequence refactor.

Environment and Context

  • Hardware: x86_64, NVIDIA GeForce RTX 4090
  • OS: Windows 10 22H2
  • Python 3.12.9
  • llama-cpp-python 0.3.36 (CUDA 12.8 prebuilt wheel)
$ python --version
Python 3.12.9

$ pip show llama-cpp-python | findstr Version
Version: 0.3.36

Failure Information (for bugs)

This is a clean regression — LlamaBatch.add_sequence was refactored from a 3-arg signature to a 4-arg one, and the call sites were updated everywhere except in Llama.embed(). llama_embedding.py shows what the new shape should look like for the embedding code path.

Steps to Reproduce

from llama_cpp import Llama

m = Llama(model_path="path/to/model.gguf", embedding=True)
m.embed("hello")

Result:

TypeError: LlamaBatch.add_sequence() missing 1 required positional argument: 'logits_array'

Failure Logs

Traceback (most recent call last):
  File "...\Lib\site-packages\llama_cpp\llama.py", line 1678, in embed
    self._batch.add_sequence(tokens, p_batch, logits_all)
TypeError: LlamaBatch.add_sequence() missing 1 required positional argument: 'logits_array'

Suggested fix

Mirror the call shape already used in llama_cpp/llama_embedding.py:

# In llama.py Llama.embed(), replace:
self._batch.add_sequence(tokens, p_batch, logits_all)

# With something like:
self._batch.add_sequence(
    token_array=tokens,
    pos_array=list(range(len(tokens))),
    seq_ids=[p_batch],
    logits_array=[True] * len(tokens) if logits_all else [False] * (len(tokens) - 1) + [True],
)

Workaround

Monkey-patching LlamaBatch.add_sequence to detect legacy 3-arg calls and synthesize the missing pos_array works as a stopgap. I hit this while running Tencent's HY-Motion text-to-motion model, whose text encoder calls Llama.embed() against GGUF Qwen3 weights.
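For anyone else blocked on this, here is a minimal sketch of that stopgap. It wraps any class's add_sequence so legacy 3-arg calls are rewritten into the 4-arg shape before being forwarded (mirroring the shape used in the suggested fix above). The patch_add_sequence helper and the argument-translation details are my own guesses, not part of llama-cpp-python's API; verify against the actual LlamaBatch before relying on it.

```python
from typing import Any, Callable, Sequence


def patch_add_sequence(batch_cls: type) -> None:
    """Stopgap: wrap batch_cls.add_sequence so legacy 3-arg calls
    (token_array, seq_id, logits_all) are translated to the new
    4-arg shape (token_array, pos_array, seq_ids, logits_array).
    batch_cls stands in for llama_cpp's LlamaBatch."""
    original: Callable[..., Any] = batch_cls.add_sequence

    def shim(self, token_array: Sequence[int], *args: Any) -> Any:
        if len(args) == 3:
            # Already the new 4-arg shape: forward unchanged.
            return original(self, token_array, *args)
        # Legacy 3-arg shape: synthesize the missing arrays.
        seq_id, logits_all = args
        n = len(token_array)
        pos_array = list(range(n))  # positions 0..n-1
        logits_array = (
            [True] * n if logits_all else [False] * (n - 1) + [True]
        )
        return original(self, token_array, pos_array, [seq_id], logits_array)

    batch_cls.add_sequence = shim
```

Applied as `patch_add_sequence(LlamaBatch)` before constructing the Llama object, this let Llama.embed() run to completion for me, but it is no substitute for fixing the call site in llama.py.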
