Add Ollama provider support (agent + judge)#74
Merged
Conversation
Adds an OpenAI-compatible Ollama integration so the benchmark can run fully locally against models served by Ollama: - pkg/agents/runner/api/llm_adapters.py: OllamaClientAdapter (agent) and OllamaDeepEvalModel (judge), backed by Ollama's OpenAI-compatible API. - pkg/evaluator/evaluate.py: route AGENT_PROVIDER=ollama / JUDGE_PROVIDER=ollama; also adds SYSTEM_INSTRUCTION_NO_MCP so models return direct YAML/manifests when BENCH_USE_MCP=false (strict instruction-followers previously returned empty). - requirements.txt: openai client dependency. - scripts/setup_ollama.sh: install Ollama and pull the model. - tests/test_ollama_adapters.py: unit tests for the adapter and judge model.
This was referenced Jun 15, 2026
pradeepvrd
reviewed
Jun 17, 2026
| return anthropic_messages | ||
|
|
||
|
|
||
| class OllamaClientAdapter(LLMClient): |
Collaborator
There was a problem hiding this comment.
Note: This will go thorugh some extensive refactoring in near future. But ok to merge for now.
pradeepvrd
approved these changes
Jun 17, 2026
pradeepvrd
added a commit
to pradeepvrd/devops-bench
that referenced
this pull request
Jun 18, 2026
Ports the merged Ollama support (gke-labs#74) onto the new devops_bench/models structure. OllamaClientAdapter talks to an Ollama server via the openai client's OpenAI-compatible endpoint (OLLAMA_BASE_URL), self-registers under the canonical 'ollama' key, and defers the openai import so a missing SDK surfaces as MissingDependencyError at construction. - new 'ollama' optional-dependency extra (openai>=1.0.0), added to 'all' - get_model resolves 'ollama' via its existing dynamic per-key import - unit tests for construction, tool formatting, call/text extraction, and message conversion Tests: 172 passed; ruff clean.
pradeepvrd
added a commit
to pradeepvrd/devops-bench
that referenced
this pull request
Jun 23, 2026
Ports the merged Ollama support (gke-labs#74) onto the new devops_bench/models structure. OllamaClientAdapter talks to an Ollama server via the openai client's OpenAI-compatible endpoint (OLLAMA_BASE_URL), self-registers under the canonical 'ollama' key, and defers the openai import so a missing SDK surfaces as MissingDependencyError at construction. - new 'ollama' optional-dependency extra (openai>=1.0.0), added to 'all' - get_model resolves 'ollama' via its existing dynamic per-key import - unit tests for construction, tool formatting, call/text extraction, and message conversion Tests: 172 passed; ruff clean.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds an OpenAI-compatible Ollama integration so the benchmark can run fully locally (agent and/or judge) against Ollama-served models.
Scope
pkg/agents/runner/api/llm_adapters.py—OllamaClientAdapter(agent) +OllamaDeepEvalModel(judge)pkg/evaluator/evaluate.py—AGENT_PROVIDER=ollama/JUDGE_PROVIDER=ollamarouting; plusSYSTEM_INSTRUCTION_NO_MCPso models emit direct YAML whenBENCH_USE_MCP=false(strict instruction-followers previously returned empty responses)requirements.txt—openaiclientscripts/setup_ollama.sh— install Ollama + pull modeltests/test_ollama_adapters.py— unit tests