Name and Version
version: 9378 (09e7b76)
built with Clang 19.1.5 for Windows x86_64
Operating systems
Windows
GGML backends
CUDA
Hardware
i7-13620H - RTX 4070
Models
https://huggingface.co/LiquidAI/LFM2.5-8B-A1B-GGUF
Problem description & steps to reproduce
Problem description
I'm seeing a parser failure with tool calls from LiquidAI/LFM2.5-8B-A1B-GGUF when using llama-server.
Normal chat completions work. The issue only shows up when tools are provided.
The model generates this tool call:
<|tool_call_start|>[web_search_exa(query='current temperature in Mexico City', numResults=5)]<|tool_call_end|>
llama-server then returns a 500 error:
{
"error": {
"code": 500,
"message": "Failed to parse input at pos 395: <|tool_call_start|>[web_search_exa(query='current temperature in Mexico City', numResults=5)]<|tool_call_end|>",
"type": "server_error"
}
}
From the LFM2.5 model card, this is the documented tool-call format: a Python-style function call wrapped with <|tool_call_start|> and <|tool_call_end|>. The model is producing the documented format, but the llama.cpp parser is rejecting it.
Prior art: PR #20251 added a dedicated PEG parser for LFM2's Python-style tool calls, but it only detected LFM2 via <|tool_list_start|> / <|tool_list_end|> tags. PR #21242 then added LFM2.5 support. My build (09e7b76c9, May 28 2026) includes PR #21242, yet the issue persists -- suggesting a regression or incomplete detection for LFM2.5 specifically.
Possible root cause: In common/chat.cpp, the is_lfm2_template() function detects LFM2 templates by checking for <|tool_list_start|> and <|tool_list_end|>. LFM2.5's template does not use these tags -- it only uses <|tool_call_start|> / <|tool_call_end|>. This means LFM2.5 may not be routed to the Python-style parser at all.
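To make the suspected mismatch concrete, here is a minimal Python sketch of the two detection heuristics. It is purely illustrative: it is not the code in common/chat.cpp, the function names are hypothetical, and the template strings are short stand-ins rather than the real chat templates. It only assumes that detection works the way the paragraph above describes.

```python
# Hypothetical illustration only: this is NOT the logic in common/chat.cpp.
TOOL_LIST_TAGS = ("<|tool_list_start|>", "<|tool_list_end|>")
TOOL_CALL_TAGS = ("<|tool_call_start|>", "<|tool_call_end|>")


def detects_by_tool_list(template: str) -> bool:
    """Heuristic as described in this report: require both tool_list tags."""
    return all(tag in template for tag in TOOL_LIST_TAGS)


def detects_by_tool_call(template: str) -> bool:
    """Alternative heuristic: require both tool_call tags instead."""
    return all(tag in template for tag in TOOL_CALL_TAGS)


# Stand-in fragments (assumed shapes, not the real templates).
lfm2_like = (
    "<|tool_list_start|>{{ tools }}<|tool_list_end|>"
    "<|tool_call_start|>[...]<|tool_call_end|>"
)
lfm25_like = "List of tools: {{ tools }}<|tool_call_start|>[...]<|tool_call_end|>"

for name, tmpl in (("LFM2-like", lfm2_like), ("LFM2.5-like", lfm25_like)):
    print(name, detects_by_tool_list(tmpl), detects_by_tool_call(tmpl))
```

Under these assumptions, the tool_list check misses the LFM2.5-like template while the tool_call check matches both, which is consistent with LFM2.5 never reaching the Python-style parser.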
Steps to reproduce
- Start llama-server with Jinja enabled and the official chat_template.jinja from LiquidAI/LFM2.5-8B-A1B:
llama-server \
-hf LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M \
--jinja \
--chat-template-file /path/to/chat_template.jinja
- Send this request to /v1/chat/completions:
{
"model": "LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M",
"messages": [
{
"role": "user",
"content": "Search for the current temperature in Mexico City."
}
],
"tools": [
{
"type": "function",
"function": {
"name": "web_search_exa",
"description": "Search the web",
"parameters": {
"type": "object",
"properties": {
"query": {
"type": "string"
},
"numResults": {
"type": "integer"
}
},
"required": ["query"]
}
}
}
],
"tool_choice": "auto",
"parse_tool_calls": true,
"temperature": 0,
"max_tokens": 256
}
- The server returns:
{
"error": {
"code": 500,
"message": "Failed to parse input at pos 395: <|tool_call_start|>[web_search_exa(query='current temperature in Mexico City', numResults=5)]<|tool_call_end|>",
"type": "server_error"
}
}
I also tried the same request with "parse_tool_calls": false, but it still fails with the same parser error.
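For convenience, the reproduction steps can be scripted. The following is a minimal sketch using only the Python standard library. The helper names build_payload and send are hypothetical, and the URL in the usage note assumes the server is listening on llama-server's default address; adjust it if you pass --host or --port.

```python
import json
import urllib.error
import urllib.request


def build_payload(parse_tool_calls: bool) -> dict:
    """Build the request body from the steps above (helper name is hypothetical)."""
    return {
        "model": "LiquidAI/LFM2.5-8B-A1B-GGUF:Q4_K_M",
        "messages": [
            {
                "role": "user",
                "content": "Search for the current temperature in Mexico City.",
            }
        ],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "web_search_exa",
                    "description": "Search the web",
                    "parameters": {
                        "type": "object",
                        "properties": {
                            "query": {"type": "string"},
                            "numResults": {"type": "integer"},
                        },
                        "required": ["query"],
                    },
                },
            }
        ],
        "tool_choice": "auto",
        "parse_tool_calls": parse_tool_calls,
        "temperature": 0,
        "max_tokens": 256,
    }


def send(url: str, payload: dict) -> tuple:
    """POST the payload as JSON and return (status, body), including on HTTP errors."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # llama-server reports the parser failure as an HTTP 500 with a JSON body.
        return err.code, err.read().decode("utf-8")
```

With the server running, calling send("http://127.0.0.1:8080/v1/chat/completions", build_payload(True)) and then the same with build_payload(False) should exercise both variants described above.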
Chat template reference
The render_tool_calls macro in the official LFM2.5 chat_template.jinja renders tool calls as:
<|tool_call_start|>[func_name(arg1='value1', arg2=value2)]<|tool_call_end|>
Note: the template does not contain <|tool_list_start|> or <|tool_list_end|> -- only <|tool_call_start|> / <|tool_call_end|>. This differs from LFM2, which uses both pairs.
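As a sanity check that the model output is well-formed in this format, the sketch below extracts the call using Python's ast module. This is only an illustration of the format: llama.cpp uses its own PEG parser, and parse_tool_calls here is a hypothetical helper, not an existing API. It handles keyword arguments with literal values only, which is all the example in this report needs.

```python
import ast

START, END = "<|tool_call_start|>", "<|tool_call_end|>"


def parse_tool_calls(text: str) -> list:
    """Extract Python-style calls wrapped in the tool-call tags (illustrative only)."""
    begin = text.index(START) + len(START)
    end = text.index(END, begin)
    # The payload between the tags is a Python list expression of calls.
    tree = ast.parse(text[begin:end].strip(), mode="eval")
    if not isinstance(tree.body, ast.List):
        raise ValueError("expected a list of calls")
    calls = []
    for node in tree.body.elts:
        if not isinstance(node, ast.Call) or not isinstance(node.func, ast.Name):
            raise ValueError("expected a plain function call")
        if node.args or any(kw.arg is None for kw in node.keywords):
            raise ValueError("only keyword arguments are supported in this sketch")
        # literal_eval accepts AST nodes and only evaluates literals.
        args = {kw.arg: ast.literal_eval(kw.value) for kw in node.keywords}
        calls.append({"name": node.func.id, "arguments": args})
    return calls


sample = (
    "<|tool_call_start|>[web_search_exa(query='current temperature in "
    "Mexico City', numResults=5)]<|tool_call_end|>"
)
print(parse_tool_calls(sample))
```

For the output quoted in this report, the sketch yields a single call named web_search_exa with a string query and numResults equal to 5, so the generated text itself looks valid for the documented format.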
First Bad Commit
Likely related to the autoparser refactoring (#18675) or the LFM2.5-specific handling in #21242 not fully covering the case.
Relevant log output
Logs
Request with parse_tool_calls=true:
{"error":{"code":500,"message":"Failed to parse input at pos 395: <|tool_call_start|>[web_search_exa(query='current temperature in Mexico City', numResults=5)]<|tool_call_end|>","type":"server_error"}}
Request with parse_tool_calls=false:
{"error":{"code":500,"message":"Failed to parse input at pos 395: <|tool_call_start|>[web_search_exa(query='current temperature in Mexico City', numResults=5)]<|tool_call_end|>","type":"server_error"}}