Eval bug: LFM2.5-8B-A1B tool parser rejects documented <|tool_call_start|>[...]<|tool_call_end|> format

### Name and Version

version: 9378 (09e7b76c9)
built with Clang 19.1.5 for Windows x86_64

### Operating systems

Windows

### GGML backends

CUDA

### Hardware

i7-13620H - RTX 4070

### Models

https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF

### Problem description & steps to reproduce

#### Problem description

I'm seeing a parser failure with tool calls from `LiquidAI/LFM2.5-8B-A1B-GGUF` when using `llama-server`.

Normal chat completions work. The issue only shows up when tools are provided.

The model generates this tool call:

```text
<|tool_call_start|>[web_search_exa(query='current temperature in Mexico City', numResults=5)]<|tool_call_end|>
```

`llama-server` then returns a 500 error:

```json
{
  "error": {
    "code": 500,
    "message": "Failed to parse input at pos 395: <|tool_call_start|>[web_search_exa(query='current temperature in Mexico City', numResults=5)]<|tool_call_end|>",
    "type": "server_error"
  }
}
```

According to the LFM2.5 model card, this is the documented tool-call format: a bracketed list of Python-style function calls wrapped in `<|tool_call_start|>` and `<|tool_call_end|>`. The model emits exactly this format, yet llama.cpp's parser rejects it.
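
As a sanity check that the model's output really is well-formed, here is a minimal standalone Python sketch that strips the two tags and parses the bracketed payload with the standard `ast` module. It is not llama.cpp code and not its PEG grammar. `parse_lfm2_tool_calls` is a hypothetical name, and the sketch assumes keyword-only arguments with literal values, as in the sample output.

```python
import ast

START_TAG = "<|tool_call_start|>"
END_TAG = "<|tool_call_end|>"


def parse_lfm2_tool_calls(text: str) -> list:
    """Hypothetical helper: parse '<|tool_call_start|>[f(a=1), ...]<|tool_call_end|>'."""
    begin = text.index(START_TAG) + len(START_TAG)  # ValueError if the tag is missing
    end = text.index(END_TAG, begin)
    # The payload between the tags is a Python list expression of calls.
    tree = ast.parse(text[begin:end].strip(), mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a bracketed list of calls")
    calls = []
    for node in tree.body.elts:
        if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
            raise ValueError("expected name(key=value, ...) calls")
        if node.args:
            raise ValueError("positional arguments are not handled in this sketch")
        arguments = {}
        for kw in node.keywords:
            if kw.arg is None:  # **kwargs unpacking
                raise ValueError("**kwargs is not handled in this sketch")
            # literal_eval accepts an AST node and only allows literal values.
            arguments[kw.arg] = ast.literal_eval(kw.value)
        calls.append({"name": node.func.id, "arguments": arguments})
    return calls


sample = (
    "<|tool_call_start|>[web_search_exa(query='current temperature in Mexico City', "
    "numResults=5)]<|tool_call_end|>"
)
print(parse_lfm2_tool_calls(sample))
# [{'name': 'web_search_exa', 'arguments': {'query': 'current temperature in Mexico City', 'numResults': 5}}]
```

Using `ast` is only a convenient way to validate the Python-call syntax; llama.cpp has its own parser, so this just shows that the string itself is not malformed.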

**Prior art:** PR [#20251](https://github.com/ggml-org/llama.cpp/pull/20251) added a dedicated PEG parser for LFM2's Python-style tool calls, but it detected LFM2 only via the `<|tool_list_start|>` / `<|tool_list_end|>` tags. PR [#21242](https://github.com/ggml-org/llama.cpp/pull/21242) then added LFM2.5 support. My build (`09e7b76c9`, May 28 2026) includes PR #21242, yet the issue persists, which suggests a regression or incomplete detection for LFM2.5 specifically.

**Possible root cause:** In `common/chat.cpp`, `is_lfm2_template()` detects LFM2 templates by checking for `<|tool_list_start|>` and `<|tool_list_end|>`. LFM2.5's template does **not** use these tags; it only uses `<|tool_call_start|>` / `<|tool_call_end|>`. As a result, LFM2.5 may never be routed to the Python-style parser.
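
The suspected detection gap can be illustrated with a small standalone sketch. `detects_via_tool_list` mirrors the rule described above (both tool-list tags required), while `detects_via_tool_call` is a hypothetical alternative keyed on the tool-call tags. Neither is llama.cpp's actual code, and the two template strings are shortened stand-ins assumed for illustration, not the real LFM2/LFM2.5 template files.

```python
# Hypothetical illustration of tag-based template detection.
# This is NOT llama.cpp's is_lfm2_template(); it only mirrors the rule described above.
TOOL_LIST_TAGS = ("<|tool_list_start|>", "<|tool_list_end|>")
TOOL_CALL_TAGS = ("<|tool_call_start|>", "<|tool_call_end|>")


def detects_via_tool_list(template: str) -> bool:
    # Rule as described in this issue: both tool-list tags must be present.
    return all(tag in template for tag in TOOL_LIST_TAGS)


def detects_via_tool_call(template: str) -> bool:
    # Hypothetical alternative rule: require the tool-call tags instead.
    return all(tag in template for tag in TOOL_CALL_TAGS)


# Shortened stand-ins, assumed for illustration (not the real template files).
lfm2_like = "<|tool_list_start|>...<|tool_list_end|> ... <|tool_call_start|>[...]<|tool_call_end|>"
lfm25_like = "... <|tool_call_start|>[...]<|tool_call_end|> ..."

for label, template in (("LFM2-like", lfm2_like), ("LFM2.5-like", lfm25_like)):
    print(label, detects_via_tool_list(template), detects_via_tool_call(template))
# LFM2-like True True
# LFM2.5-like False True
```

Keying only on the tool-call tags could also match unrelated templates that reuse those names, so a real fix would likely need additional checks; the sketch only shows why an LFM2.5-style template would fail the tool-list check.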

#### Steps to reproduce

1. Start `llama-server` with Jinja enabled and the official `chat_template.jinja` from `LiquidAI/LFM2.5-8B-A1B`:

```bash
llama-server \
  -hf LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M \
  --jinja \
  --chat-template-file /path/to/chat_template.jinja
```

2. Send this request to `/v1/chat/completions`:

```json
{
  "model": "LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M",
  "messages": [
    {
      "role": "user",
      "content": "Search for the current temperature in Mexico City."
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "web_search_exa",
        "description": "Search the web",
        "parameters": {
          "type": "object",
          "properties": {
            "query": {
              "type": "string"
            },
            "numResults": {
              "type": "integer"
            }
          },
          "required": ["query"]
        }
      }
    }
  ],
  "tool_choice": "auto",
  "parse_tool_calls": true,
  "temperature": 0,
  "max_tokens": 256
}
```

3. The server returns:

```json
{
  "error": {
    "code": 500,
    "message": "Failed to parse input at pos 395: <|tool_call_start|>[web_search_exa(query='current temperature in Mexico City', numResults=5)]<|tool_call_end|>",
    "type": "server_error"
  }
}
```

I also tried the same request with `"parse_tool_calls": false`, but it fails with the same parser error.

#### Chat template reference

The `render_tool_calls` macro in the official LFM2.5 `chat_template.jinja` renders tool calls as:

```text
<|tool_call_start|>[func_name(arg1='value1', arg2=value2)]<|tool_call_end|>
```

Note: the template does **not** contain `<|tool_list_start|>` or `<|tool_list_end|>`, only `<|tool_call_start|>` / `<|tool_call_end|>`. This differs from LFM2, which uses both pairs.
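
For reference, the mapping from an OpenAI-style tool call (name plus JSON `arguments` string) to this format can be sketched as follows. `render_tool_call` is a hypothetical Python re-implementation written for this report, not the Jinja macro itself. It assumes string values are quoted via `repr()` and other values are emitted bare, so it may differ from the real macro on edge cases such as escaping, booleans (`True` here vs. whatever the template emits), and nested objects.

```python
import json


def render_tool_call(name: str, arguments_json: str) -> str:
    # Hypothetical sketch of the pythonic format; not the actual render_tool_calls macro.
    args = json.loads(arguments_json)
    parts = []
    for key, value in args.items():
        # Strings are quoted by repr() (single quotes unless the value contains one);
        # integers and floats are emitted bare.
        parts.append(f"{key}={value!r}")
    return f"<|tool_call_start|>[{name}({', '.join(parts)})]<|tool_call_end|>"


rendered = render_tool_call(
    "web_search_exa",
    '{"query": "current temperature in Mexico City", "numResults": 5}',
)
print(rendered)
# <|tool_call_start|>[web_search_exa(query='current temperature in Mexico City', numResults=5)]<|tool_call_end|>
```

With the arguments from the failing request, this reproduces the exact string the server refused to parse.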

### First Bad Commit

Likely related to either the autoparser refactoring ([#18675](https://github.com/ggml-org/llama.cpp/pull/18675)) or to the LFM2.5-specific handling in [#21242](https://github.com/ggml-org/llama.cpp/pull/21242) not fully covering this case.

### Relevant log output

<details>
<summary>Logs</summary>

```console
Request with parse_tool_calls=true:

{"error":{"code":500,"message":"Failed to parse input at pos 395: <|tool_call_start|>[web_search_exa(query='current temperature in Mexico City', numResults=5)]<|tool_call_end|>","type":"server_error"}}

Request with parse_tool_calls=false:

{"error":{"code":500,"message":"Failed to parse input at pos 395: <|tool_call_start|>[web_search_exa(query='current temperature in Mexico City', numResults=5)]<|tool_call_end|>","type":"server_error"}}
```
</details>
