Add Ollama provider support (agent + judge) by adrianchung · Pull Request #74 · gke-labs/devops-bench

adrianchung · 2026-06-15T18:27:44Z

Summary

Adds an OpenAI-compatible Ollama integration so the benchmark can run fully locally (agent and/or judge) against Ollama-served models.
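Because Ollama speaks the OpenAI chat-completions protocol, an adapter mostly needs to translate tools into the OpenAI "function" tool format and read text and tool calls back out of the response. The sketch below shows that round trip under stated assumptions. It is not the code in llm_adapters.py. The `(name, description, parameters)` tool tuple and the tool name `kubectl_get` are hypothetical, and the response is handled as a plain dict so the example runs without the openai package.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical tool description: (name, description, JSON-schema parameters).
ToolSpec = Tuple[str, str, Dict[str, Any]]


def to_openai_tools(tools: List[ToolSpec]) -> List[Dict[str, Any]]:
    """Wrap each tool in the OpenAI chat-completions "function" tool format."""
    return [
        {
            "type": "function",
            "function": {"name": name, "description": description, "parameters": parameters},
        }
        for name, description, parameters in tools
    ]


def extract_text_and_calls(
    completion: Dict[str, Any],
) -> Tuple[str, List[Tuple[str, Dict[str, Any]]]]:
    """Pull the reply text and any tool calls out of a chat completion given as a dict."""
    message = completion["choices"][0]["message"]
    # content is None when the model answers only with tool calls.
    text = message.get("content") or ""
    calls = []
    for call in message.get("tool_calls") or []:
        function = call["function"]
        # Tool-call arguments arrive as a JSON-encoded string, not a dict.
        calls.append((function["name"], json.loads(function["arguments"] or "{}")))
    return text, calls
```

With the real openai client, the request would be `client.chat.completions.create(model=..., messages=..., tools=to_openai_tools(...))`, and `completion.model_dump()` produces the dict shape consumed above.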

Scope

- pkg/agents/runner/api/llm_adapters.py: OllamaClientAdapter (agent) and OllamaDeepEvalModel (judge), both backed by Ollama's OpenAI-compatible API
- pkg/evaluator/evaluate.py: routes AGENT_PROVIDER=ollama and JUDGE_PROVIDER=ollama; also adds SYSTEM_INSTRUCTION_NO_MCP so that models emit YAML directly when BENCH_USE_MCP=false (models that follow instructions strictly previously returned empty responses)
- requirements.txt: adds the openai client dependency
- scripts/setup_ollama.sh: installs Ollama and pulls the model
- tests/test_ollama_adapters.py: unit tests for the adapter and judge model
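The routing in evaluate.py is driven by environment variables. Below is a minimal sketch of how AGENT_PROVIDER, JUDGE_PROVIDER and BENCH_USE_MCP could be read together, assuming MCP is on unless explicitly disabled and assuming a default provider of "gemini". Both defaults, the `RunConfig` type, the name `SYSTEM_INSTRUCTION`, and both prompt texts are hypothetical. Only SYSTEM_INSTRUCTION_NO_MCP and the variable names come from this PR.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical prompt texts; the real SYSTEM_INSTRUCTION_NO_MCP wording is not reproduced here.
SYSTEM_INSTRUCTION = "You are a DevOps agent. Use the available MCP tools to inspect and fix the cluster."
SYSTEM_INSTRUCTION_NO_MCP = (
    "You are a DevOps agent. No tools are available. "
    "Reply directly with the YAML manifests that solve the task."
)

_FALSE_VALUES = {"0", "false", "no", "off"}


@dataclass(frozen=True)
class RunConfig:
    agent_provider: str
    judge_provider: str
    use_mcp: bool
    system_instruction: str


def load_run_config(
    env: Optional[Mapping[str, str]] = None, default_provider: str = "gemini"
) -> RunConfig:
    """Read provider routing and the MCP toggle from environment variables."""
    env = os.environ if env is None else env
    # Assumption: MCP stays enabled unless BENCH_USE_MCP is set to a false-like value.
    use_mcp = env.get("BENCH_USE_MCP", "true").strip().lower() not in _FALSE_VALUES
    return RunConfig(
        # Normalize case so "Ollama" and "ollama" route to the same provider.
        agent_provider=env.get("AGENT_PROVIDER", default_provider).strip().lower(),
        judge_provider=env.get("JUDGE_PROVIDER", default_provider).strip().lower(),
        use_mcp=use_mcp,
        # Without MCP tools, ask the model for YAML directly instead of tool calls.
        system_instruction=SYSTEM_INSTRUCTION if use_mcp else SYSTEM_INSTRUCTION_NO_MCP,
    )
```

Keeping agent and judge providers independent is what allows a local agent to be scored by a hosted judge, or the reverse.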


pradeepvrd · 2026-06-17T21:14:41Z

    return anthropic_messages
+
+
+class OllamaClientAdapter(LLMClient):


Note: This will go through some extensive refactoring in the near future, but it's OK to merge for now.

Ports the merged Ollama support (gke-labs#74) onto the new devops_bench/models structure. OllamaClientAdapter talks to an Ollama server through the openai client's OpenAI-compatible endpoint (OLLAMA_BASE_URL), self-registers under the canonical 'ollama' key, and defers the openai import so that a missing SDK surfaces as a MissingDependencyError at construction time.
- new 'ollama' optional-dependency extra (openai>=1.0.0), also added to 'all'
- get_model resolves 'ollama' via its existing dynamic per-key import
- unit tests for construction, tool formatting, call/text extraction, and message conversion

Tests: 172 passed; ruff clean.

This was referenced Jun 15, 2026:
- Add local end-to-end test harness for the Ollama provider #75 (Merged)
- Add task-generation framework + first task (debug-crashloop) #76 (Open)
- Support a local option to test for inner dev loop #72 (Closed)

adrianchung requested review from itssimrank and pradeepvrd June 15, 2026 20:05

pradeepvrd reviewed Jun 17, 2026


pradeepvrd approved these changes Jun 17, 2026


pradeepvrd merged commit 39cb71f into gke-labs:main Jun 18, 2026
6 checks passed

pradeepvrd merged 1 commit into gke-labs:main from adrianchung:add-ollama-provider


2 participants
