Summary
We investigated adding gemini-cli-agent as a CLI-based agent for AssetOpsBench, but it is currently not reliable for benchmark runs with TokenRouter-backed models.
The main blocker is that Gemini CLI expects Gemini-compatible auth, model, and tool-call behavior, while our preferred TokenRouter model path is OpenAI-compatible (tokenrouter/MiniMax-M3). Even with a Gemini-style TokenRouter route, the run failed while streaming tool calls.
What we tried
Smoke command:
uv run gemini-cli-agent \
--model-id tokenrouter_gemini/google/gemma-4-26b-a4b-it \
--workspace-dir /tmp/assetopsbench-gemini/smoke \
--allow-files \
--allow-bash \
--show-trajectory \
"What is the current time?"
Observed behavior:
After auth/config fixes, Gemini CLI was able to start and call an AssetOpsBench MCP tool:
tool: mcp_utilities_current_time_english
output: {
"english": "2026-06-25 17:45:00",
"iso": "2026-06-25T17:45:00.614808Z"
}
But the run ended with:
Gemini CLI error: Invalid stream: The model returned an empty response or malformed tool call.
It also attempted invalid tool calls:
Tool "generic_tool" not found. Did you mean one of: "read_file", "grep_search", "glob"?
Why this is a blocker
AssetOpsBench depends on reliable MCP tool use for operational data access. In this test, Gemini CLI could reach MCP, but the model/tool stream was malformed, causing the agent to fail before producing a usable answer.
This makes Gemini CLI unsuitable for benchmark runs with TokenRouter at the moment.
Root causes / likely causes
tokenrouter/MiniMax-M3 is OpenAI-compatible, not Gemini API-compatible, so it cannot be used directly with Gemini CLI.
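The Gemini-compatible TokenRouter route starts, but its streamed tool-call responses are not fully compatible with what Gemini CLI expects, so the CLI rejects them as an empty response or malformed tool call.
The model emitted a call to generic_tool, a tool that Gemini CLI never registered, which suggests a tool-call schema mismatch.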
Therefore, Gemini CLI + TokenRouter is not currently reliable for AssetOpsBench MCP-heavy scenarios.
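To make the two failure modes concrete (an empty or malformed call, and a call to an unregistered tool such as generic_tool), here is a minimal Python sketch of the kind of check a harness could apply to each parsed tool call before dispatching it. It is not Gemini CLI's implementation: the ToolCall shape, the validate_tool_call helper, and the registered tool set are hypothetical, and difflib.get_close_matches only stands in for however Gemini CLI actually produces its "Did you mean" suggestions.

```python
import difflib
from dataclasses import dataclass, field
from typing import Any, Optional, Set


@dataclass
class ToolCall:
    """Hypothetical shape of one parsed tool call taken from the model stream."""
    name: str
    args: Any = field(default_factory=dict)


def validate_tool_call(call: Optional[ToolCall], registered: Set[str]) -> Optional[str]:
    """Return an error message for an unusable call, or None if it can be dispatched."""
    # A missing call or an empty name corresponds to the "empty response" case.
    if call is None or not call.name:
        return "empty response or malformed tool call"
    # Arguments should be a JSON object, i.e. a dict once parsed.
    if not isinstance(call.args, dict):
        return f"malformed arguments for {call.name!r}"
    # A name outside the registered set is the generic_tool case.
    if call.name not in registered:
        hints = difflib.get_close_matches(call.name, sorted(registered), n=3, cutoff=0.0)
        return f"Tool {call.name!r} not found. Did you mean one of: {hints}?"
    return None


# Hypothetical registry mixing the MCP tool and built-ins seen in the run.
registered = {"mcp_utilities_current_time_english", "read_file", "grep_search", "glob"}
print(validate_tool_call(ToolCall("mcp_utilities_current_time_english"), registered))  # None
print(validate_tool_call(ToolCall("generic_tool"), registered))
```

Rejecting the call before dispatch keeps a schema mismatch from surfacing later as an opaque stream error; whether Gemini CLI exposes a hook for this is not established here.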
Expected behavior
Gemini CLI agent should:
Start with a TokenRouter-backed model configuration.
Call AssetOpsBench MCP tools correctly.
Receive valid streamed tool-call responses.
Produce a final answer and trajectory JSON.
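The expected behavior above could be turned into a pass/fail check over a run's trajectory. The sketch below is a hypothetical illustration: the trajectory layout ({"final_answer": ..., "steps": [{"tool": ..., "error": ...}]}), the check_run helper, and the "mcp_" name prefix (taken from the tool name in the run above) are assumptions, not AssetOpsBench's actual trajectory schema. The mock trajectory is loosely modeled on the failed run and is not the real trajectory file.

```python
import json
from typing import Any, List, Set

# Assumption: AssetOpsBench MCP tools appear with an "mcp_" prefix, as in the run above.
MCP_PREFIX = "mcp_"


def check_run(trajectory_json: str, registered: Set[str]) -> List[str]:
    """Return the expected-behavior criteria a run fails; an empty list means it passed.

    Assumes a hypothetical layout:
    {"final_answer": str, "steps": [{"tool": str, "error": str or null}, ...]}
    """
    try:
        traj: Any = json.loads(trajectory_json)
    except json.JSONDecodeError:
        return ["trajectory is not valid JSON"]
    if not isinstance(traj, dict):
        return ["trajectory is not a JSON object"]

    failures: List[str] = []
    steps = traj.get("steps", [])
    tools = [step.get("tool") for step in steps]
    # Expected: at least one AssetOpsBench MCP tool was called.
    if not any(isinstance(t, str) and t.startswith(MCP_PREFIX) for t in tools):
        failures.append("no MCP tool was called")
    # Expected: every call targets a registered tool.
    unknown = [t for t in tools if t not in registered]
    if unknown:
        failures.append(f"unregistered tools called: {unknown}")
    # Expected: tool-call responses were valid, so no step recorded an error.
    if any(step.get("error") for step in steps):
        failures.append("at least one step reported an error")
    # Expected: the run produced a non-empty final answer.
    if not str(traj.get("final_answer") or "").strip():
        failures.append("no final answer")
    return failures


registered = {"mcp_utilities_current_time_english", "read_file", "grep_search", "glob"}
observed = json.dumps({
    "final_answer": "",
    "steps": [
        {"tool": "mcp_utilities_current_time_english", "error": None},
        {"tool": "generic_tool", "error": 'Tool "generic_tool" not found.'},
    ],
})
print(check_run(observed, registered))
```

Reporting every failed criterion, rather than stopping at the first, makes it easier to tell a routing problem (no MCP call) from a schema problem (unregistered tool) when comparing backends.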
Actual behavior
The agent starts and calls at least one MCP tool, but fails with malformed stream/tool-call errors.