- Environment ID: `swe-grep-oss`
- Short description: Environment for evaluating and developing models like SWE-grep
- Primary dataset(s): SWE-Bench Lite
- Type: <single-turn | multi-turn | tool use>
- Parser: <e.g., ThinkParser, XMLParser, custom>
- Rubric overview:
Run an evaluation with your model of choice (repos are cloned automatically and deleted after each rollout):
- Default rollout clone root: system temp directory under `swe-grep-oss-repos`
- Rollout directories are unique per rollout and look like `<repo>_<instance_id>_<random_suffix>`
- Repositories are cloned directly at the target commit with `git clone --revision <sha> --depth 1` when supported, with a `git init` + `fetch` fallback for older Git versions
- Set `SWE_GREP_ENV_BACKEND=sandbox` to switch from the default local env to a sandbox-backed env
- The sandbox variant uses a minimal public image (`python:3.11-slim`) with `1` CPU core, `2` GB RAM, and `5` GB disk, then installs `git`, `jq`, and `ripgrep` during setup before checking out the repo into `/workspace/repo`
uv run vf-eval swe-grep-oss \
--api-base-url https://api.openai.com/v1 \
--api-key-var OPENAI_API_KEY \
--model "gpt-4o-mini" \
--num-examples 2 \
--rollouts-per-example 1
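
The clone behaviour described in the notes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the environment's actual code: the function names (`rollout_dir`, `clone_commands`, `selected_backend`), the `/` sanitising, the hex suffix, the `"local"` default value, and the Git 2.49 version cutoff for `--revision` are all assumptions made for this example.

```python
import os
import secrets
import tempfile
from pathlib import Path
from typing import List, Mapping, Optional, Tuple

# Assumption: `git clone --revision` is available from Git 2.49 onwards.
MIN_REVISION_CLONE: Tuple[int, int] = (2, 49)


def rollout_dir(repo: str, instance_id: str, root: Optional[Path] = None) -> Path:
    """Build a unique path shaped like <repo>_<instance_id>_<random_suffix>."""
    if root is None:
        root = Path(tempfile.gettempdir()) / "swe-grep-oss-repos"
    safe_repo = repo.replace("/", "__")  # hypothetical sanitising of "owner/name"
    suffix = secrets.token_hex(4)        # hypothetical choice of random suffix
    return root / f"{safe_repo}_{instance_id}_{suffix}"


def clone_commands(
    url: str, sha: str, dest: str, git_version: Tuple[int, int]
) -> List[List[str]]:
    """Return the git invocations needed to check out `sha` into `dest`."""
    if git_version >= MIN_REVISION_CLONE:
        # Single shallow clone directly at the target commit.
        return [["git", "clone", "--revision", sha, "--depth", "1", url, dest]]
    # Fallback for older Git: init an empty repo, fetch only the commit, check it out.
    return [
        ["git", "init", dest],
        ["git", "-C", dest, "fetch", "--depth", "1", url, sha],
        ["git", "-C", dest, "checkout", "FETCH_HEAD"],
    ]


def selected_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Read SWE_GREP_ENV_BACKEND; "local" as the default value is an assumption."""
    env = os.environ if env is None else env
    return env.get("SWE_GREP_ENV_BACKEND", "local")
```

Each returned command list could be passed to `subprocess.run(cmd, check=True)` in order. Returning the commands instead of running them keeps the version-dependent choice easy to inspect without network access.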