Add OpenCode autoresearch example with eval sandbox sidecar #1570

Draft
junlin-star wants to merge 4 commits into main from devin/1779476807-sailboxes-replication

@junlin-star junlin-star commented May 22, 2026

Adds a small, self-contained example showing how to run a coding agent against an automated verifier on Modal. One Sandbox runs an OpenCode agent; a second Sandbox runs an HTTP verifier that grades candidate solutions.

The agent is given a small optimization task (find a non-negative step function with a small discrete autoconvolution score, inspired by the autocorrelation_first task from SimpleTES), edits the starter solution.py, and submits candidates to the verifier. When the agent exits, the local script fetches the ranked submissions and saves the best one.
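
To make the objective concrete, here is a minimal sketch of one way a discrete autoconvolution score could be computed for a step function. The normalization `2 * n * max(f * f) / sum(f) ** 2` is an assumption (a common discretization of the first autocorrelation inequality); the `evaluate.py` in this PR may define the score differently. The function names `autoconvolution` and `c1_score` are hypothetical.

```python
from typing import List, Sequence


def autoconvolution(f: Sequence[float]) -> List[float]:
    """Return the full discrete autoconvolution: out[k] = sum_i f[i] * f[k - i]."""
    n = len(f)
    out = [0.0] * (2 * n - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(f):
            out[i + j] += a * b
    return out


def c1_score(f: Sequence[float]) -> float:
    """Assumed score: 2 * n * max(f * f) / (sum f)^2. Lower is better."""
    if not f:
        raise ValueError("solution must be non-empty")
    if any(x < 0 for x in f):
        raise ValueError("solution must be non-negative")
    total = sum(f)
    if total <= 0:
        raise ValueError("solution must have a positive sum")
    return 2 * len(f) * max(autoconvolution(f)) / total**2


if __name__ == "__main__":
    # Constant step function: max(f * f) = n and sum(f) = n, so the score is 2.
    print(c1_score([1.0] * 8))
```

Under this assumed definition a flat function scores exactly 2, so any candidate the agent submits needs a non-flat shape to do better. The quadratic-time loop is only for readability; an FFT-based convolution would be the natural choice for long solutions.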

New files:

  • 13_sandboxes/opencode_autoresearch_with_eval/opencode_autoresearch_with_eval.py — orchestration script that:
    • Builds an agent Image (OpenCode) and a verifier Image (FastAPI), each with the task files
    • Starts a verifier sidecar Sandbox exposing /task, /submit, and /submissions
    • Runs OpenCode non-interactively against an OpenAI-compatible model (default openai/gpt-5.5, via an openai-secret Modal Secret)
    • Fetches ranked submissions and saves the best solution + agent trajectory locally
    • Includes a --smoke-test mode that exercises the submit/verify loop without invoking OpenCode or any external LLM API (used for CI)
  • 13_sandboxes/opencode_autoresearch_with_eval/autocorrelation_first/ — task assets:
    • autocorrelation_first.txt — task description
    • evaluate.py — FastAPI verifier that computes the c1 autoconvolution score and ranks submissions
    • solution.py — starter solution with an EVOLVE-BLOCK for the agent to improve
    • submit.py — loads solution.py and POSTs the result to the verifier
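
The submit/verify loop described in the file list can be sketched without FastAPI or Modal. The block below is a stand-in, not the PR's `evaluate.py`: it uses the standard library's `http.server` in place of FastAPI, implements only `/submit` and `/submissions`, and assumes a `{"solution": [...]}` request body and the score `2 * n * max(f * f) / sum(f) ** 2`. All names (`grade`, `VerifierHandler`, `smoke_test`) are hypothetical.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

SUBMISSIONS = []  # in-memory store; a real verifier might persist these


def grade(solution):
    """Assumed scoring: c1 = 2 * n * max(f * f) / (sum f)^2, lower is better."""
    if not solution or any(x < 0 for x in solution) or sum(solution) <= 0:
        return {"valid": False, "c1": None, "message": "need non-negative values with positive sum"}
    n = len(solution)
    conv = [0.0] * (2 * n - 1)
    for i, a in enumerate(solution):
        for j, b in enumerate(solution):
            conv[i + j] += a * b
    return {"valid": True, "c1": 2 * n * max(conv) / sum(solution) ** 2, "message": None}


class VerifierHandler(BaseHTTPRequestHandler):
    def _send_json(self, payload, status=200):
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        if self.path != "/submit":
            return self._send_json({"error": "not found"}, 404)
        length = int(self.headers.get("Content-Length", 0))
        result = grade(json.loads(self.rfile.read(length))["solution"])
        SUBMISSIONS.append(result)
        self._send_json(result)

    def do_GET(self):
        if self.path != "/submissions":
            return self._send_json({"error": "not found"}, 404)
        valid = [s for s in SUBMISSIONS if s["valid"]]
        self._send_json(sorted(valid, key=lambda s: s["c1"]))  # best (smallest c1) first

    def log_message(self, *args):  # silence per-request logging
        pass


def smoke_test():
    """Start the stand-in verifier, submit two candidates, return the ranking."""
    SUBMISSIONS.clear()
    server = ThreadingHTTPServer(("127.0.0.1", 0), VerifierHandler)  # port 0 picks a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    base = f"http://127.0.0.1:{server.server_address[1]}"
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))  # bypass proxies
    try:
        for candidate in ([1.0, 1.0, 1.0, 1.0], [1.0, 0.5, 0.375]):
            req = urllib.request.Request(
                f"{base}/submit",
                data=json.dumps({"solution": candidate}).encode(),
                headers={"Content-Type": "application/json"},
            )
            opener.open(req, timeout=10).close()
        with opener.open(f"{base}/submissions", timeout=10) as resp:
            return json.load(resp)
    finally:
        server.shutdown()
        server.server_close()
```

Calling `smoke_test()` exercises the same loop the `--smoke-test` mode is described as covering (submit, grade, rank) with no agent and no LLM call, which is what makes that path cheap enough for CI.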

Removed: the earlier GLM-5 swarm draft (13_sandboxes/opencode_swarm.py, 06_gpu_and_ml/llm-serving/config_glm5.yaml), which this example replaces.

Usage:

# Requires a Modal Secret named `openai-secret` containing OPENAI_API_KEY
python 13_sandboxes/opencode_autoresearch_with_eval/opencode_autoresearch_with_eval.py
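
For orientation, here is a hedged sketch of how the script's command-line flags might be parsed. The flag names come from this PR (the description and the invocation posted later in the thread), and the `openai-secret` and `openai/gpt-5.5` defaults come from the description; the use of `argparse`, the remaining defaults, and the `parse_duration` helper are assumptions.

```python
import argparse
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_duration(text: str) -> int:
    """Convert strings like '90s', '15m', or '1h' into seconds."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise argparse.ArgumentTypeError(f"invalid duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run an agent against a verifier sidecar.")
    parser.add_argument("--openai-secret", default="openai-secret")
    parser.add_argument("--model", default="openai/gpt-5.5")
    # The defaults below are assumptions; the PR does not state them.
    parser.add_argument("--max-submissions", type=int, default=None)
    parser.add_argument("--agent-timeout", type=parse_duration, default=None)
    parser.add_argument("--timeout", type=parse_duration, default=None)
    parser.add_argument("--smoke-test", action="store_true")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```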

Type of Change

  • New example for the GitHub repo
    • New example for the documentation site

Monitoring Checklist

  • Example is configured for testing in the synthetic monitoring system (runs via the --smoke-test cmd in the frontmatter, which avoids external LLM APIs)
    • Example is tested by executing with the cmd / args provided in the frontmatter
    • Example does not require third-party dependencies besides fastapi to be installed locally

Requested by: @junlin-star

- New example: 13_sandboxes/opencode_swarm.py
  Launches a swarm of OpenCode Sandboxes powered by a self-hosted GLM-5
  inference server, replicating the Sailboxes experiment (4 agents
  building a Redis clone in Rust).

- New config: 06_gpu_and_ml/llm-serving/config_glm5.yaml
  SGLang configuration for GLM-5-FP8 on 8x H200 GPUs, with speculative
  decoding tuned per the model card recommendations.

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
@devin-ai-integration

Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>

@devin-ai-integration devin-ai-integration Bot left a comment

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

@charlesfrye charlesfrye marked this pull request as ready for review May 25, 2026 00:25
@charlesfrye charlesfrye marked this pull request as draft May 25, 2026 00:26
@charlesfrye

Collaborator

Hey @junlin-star!

This is super cool! Can you provide some proof that it works?

Also, we very much want examples to be tested in CI so that they can be maintained. Can you come up with a way to do that? Feel free to DM me to discuss options and patterns.

If not, please move all of the changes into misc/.

Introduce a Modal sandbox example that runs OpenCode against a verifier sidecar, with a fast smoke-test path for example CI and compact result artifacts for agent traces and best submissions.

Co-authored-by: Cursor <cursoragent@cursor.com>
@junlin-star junlin-star changed the title Add OpenCode agent swarm example replicating Sailboxes experiment Add OpenCode autoresearch example with eval sandbox sidecar Jun 8, 2026
@junlin-star

Author

I massively simplified this to just use an OpenCode agent to do autoresearch on the autocorrelation_first task. Two sandboxes are created for this: an agent sandbox and an eval sandbox. The agent can submit answers to the eval sandbox to get feedback during its optimization loop.
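
The two-sandbox layout can be sketched with Modal's Sandbox API roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the app name, the `evaluate:app` module path, the `VERIFIER_URL` variable, and the shared image are all hypothetical, and the real script builds separate agent and verifier Images with the task files included.

```python
def run_agent_with_verifier(agent_command):
    """Start a verifier sidecar Sandbox, then run a command in an agent Sandbox."""
    import modal  # imported lazily so this module loads without Modal installed

    app = modal.App.lookup("autoresearch-sketch", create_if_missing=True)  # hypothetical name
    image = modal.Image.debian_slim().pip_install("fastapi", "uvicorn")

    # Sidecar: serves the verifier on port 8000 behind a Modal tunnel.
    # `evaluate:app` is a hypothetical path; the image would need the task files added.
    verifier = modal.Sandbox.create(
        "uvicorn", "evaluate:app", "--host", "0.0.0.0", "--port", "8000",
        app=app,
        image=image,
        encrypted_ports=[8000],
        timeout=15 * 60,
    )
    try:
        verifier_url = verifier.tunnels()[8000].url

        # Agent: learns the verifier URL from an environment variable
        # (VERIFIER_URL is an assumed name, not taken from the PR).
        agent = modal.Sandbox.create(
            app=app,
            image=image,
            secrets=[modal.Secret.from_dict({"VERIFIER_URL": verifier_url})],
            timeout=15 * 60,
        )
        try:
            proc = agent.exec(*agent_command)
            output = proc.stdout.read()  # blocks until the command's stdout closes
            proc.wait()
            return output
        finally:
            agent.terminate()
    finally:
        verifier.terminate()
```

Keeping the verifier in its own Sandbox means the agent can only reach the grader over HTTP, so it cannot edit the scoring code or the submission history it is being ranked on.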

Proof of this working:

uv run --no-project --with modal python \
  13_sandboxes/opencode_autoresearch_with_eval/opencode_autoresearch_with_eval.py \
  --openai-secret openai-secret \
  --model openai/gpt-5.5 \
  --max-submissions 3 \
  --agent-timeout 15m \
  --timeout 15m
Starting verifier sidecar...
Verifier sidecar: sb-S7DZJSsbEm1RJPCn6c440B
Verifier URL: https://ta-01ktda9wr70ay86sdkdm6fkt6q-8000-5sc8v0pnij6gpyw0gp3i9jn8k.w.modal.host
Agent Sandbox: sb-6clGkznYzIwuYcuRST0icJ
Starting OpenCode agent...
Agent trajectory saved to opencode_agent_trajectory.txt

Best submitted solutions:

[
  {
    "valid": true,
    "c1": 1.570892203519807,
    "score": 0.6365809133138155,
    "length": 4096,
    "reported": 1.570892203519807,
    "message": null,
    "solution_preview": {
      "first": [
        1.0,
        0.5,
        0.375,
        0.3125,
        0.2734375
      ],
      "last": [
        0.008820578189333012,
        0.008819500406074297,
        0.008818423017800893,
        0.008817346024271608,
        0.008816269425245445
      ],
      "omitted": 4086
    }
  }
]
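
In the record above, `score` equals `1 / c1`, and `solution_preview` keeps five values from each end of the 4096-entry solution (hence 4086 omitted). A minimal sketch of how such a record could be built and how the best submission could be picked is shown below; the function names are hypothetical and the script in this PR may do this differently.

```python
from typing import List, Optional


def preview(solution: List[float], edge: int = 5) -> dict:
    """Summarize a long solution by its first and last `edge` values."""
    if len(solution) <= 2 * edge:
        return {"first": list(solution), "last": [], "omitted": 0}
    return {
        "first": list(solution[:edge]),
        "last": list(solution[-edge:]),
        "omitted": len(solution) - 2 * edge,
    }


def make_record(solution: List[float], c1: float) -> dict:
    """Build a compact result record; score = 1 / c1 is inferred from the log above."""
    return {
        "valid": True,
        "c1": c1,
        "score": 1.0 / c1,
        "length": len(solution),
        "solution_preview": preview(solution),
    }


def best_submission(submissions: List[dict]) -> Optional[dict]:
    """Return the valid submission with the smallest c1, or None if there is none."""
    valid = [s for s in submissions if s.get("valid") and s.get("c1") is not None]
    return min(valid, key=lambda s: s["c1"], default=None)
```

Storing only a preview keeps the result artifact small while the full solution can still be saved separately.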
