Add OpenCode autoresearch example with eval sandbox sidecar by junlin-star · Pull Request #1570 · modal-labs/modal-examples

junlin-star · 2026-05-22T19:11:45Z

Adds a small, self-contained example showing how to run a coding agent against an automated verifier on Modal. One Sandbox runs an OpenCode agent; a second Sandbox runs an HTTP verifier that grades candidate solutions.

The agent is given a small optimization task (find a non-negative step function with a small discrete autoconvolution score, inspired by the autocorrelation_first task from SimpleTES), edits the starter solution.py, and submits candidates to the verifier. When the agent exits, the local script fetches the ranked submissions and saves the best one.
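For readers unfamiliar with the objective, here is a minimal sketch of how a discrete autoconvolution score could be computed. The normalization `2n * max(a*a) / (sum a)^2` is an assumption, since evaluate.py is not reproduced in this description, though it appears consistent with the c1 value reported later in this thread. The names `c1_score` and `inverse_sqrt_coefficients` are illustrative and are not taken from the example.

```python
from typing import Sequence


def autoconvolution(a: Sequence[float]) -> list[float]:
    """Full discrete convolution of a with itself (length 2n - 1)."""
    n = len(a)
    out = [0.0] * (2 * n - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(a):
            out[i + j] += x * y
    return out


def c1_score(a: Sequence[float]) -> float:
    """Assumed normalization: 2n * max(a * a) / (sum a)^2. Lower is better."""
    n = len(a)
    if n == 0 or any(x < 0 for x in a) or sum(a) <= 0:
        raise ValueError("need a non-empty, non-negative sequence with positive sum")
    return 2 * n * max(autoconvolution(a)) / sum(a) ** 2


def inverse_sqrt_coefficients(n: int) -> list[float]:
    """Taylor coefficients of (1 - x)^(-1/2): 1, 1/2, 3/8, 5/16, ..."""
    coeffs = [1.0]
    for k in range(1, n):
        coeffs.append(coeffs[-1] * (2 * k - 1) / (2 * k))
    return coeffs


# A constant step function of height 1 on 8 steps.
print(c1_score([1.0] * 8))  # prints 2.0
# A decaying candidate scores lower (better) than the constant one.
print(c1_score(inverse_sqrt_coefficients(64)))
```

Under this assumed normalization a constant step function scores exactly 2.0, so any candidate below that value improves on the trivial choice.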

New files:

13_sandboxes/opencode_autoresearch_with_eval/opencode_autoresearch_with_eval.py — orchestration script that:
- Builds an agent Image (OpenCode) and a verifier Image (FastAPI), each with the task files
- Starts a verifier sidecar Sandbox exposing /task, /submit, and /submissions
- Runs OpenCode non-interactively against an OpenAI-compatible model (default openai/gpt-5.5, via an openai-secret Modal Secret)
- Fetches ranked submissions and saves the best solution + agent trajectory locally
- Includes a --smoke-test mode that exercises the submit/verify loop without invoking OpenCode or any external LLM API (used for CI)
13_sandboxes/opencode_autoresearch_with_eval/autocorrelation_first/ — task assets:
- autocorrelation_first.txt — task description
- evaluate.py — FastAPI verifier that computes the c1 autoconvolution score and ranks submissions
- solution.py — starter solution with an EVOLVE-BLOCK for the agent to improve
- submit.py — loads solution.py and POSTs the result to the verifier
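The grade-and-rank step that the verifier performs can be sketched in plain Python, without FastAPI. This is an illustration under stated assumptions, not the code in evaluate.py: the record fields mirror the JSON shown later in this thread, `score = 1 / c1` is inferred from that output, the c1 normalization is assumed, and the function names `grade` and `rank` are hypothetical.

```python
import math
from typing import Optional, Sequence


def _c1(a: Sequence[float]) -> float:
    """Assumed score: 2n * max(a * a) / (sum a)^2, lower is better."""
    n = len(a)
    conv = [0.0] * (2 * n - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(a):
            conv[i + j] += x * y
    return 2 * n * max(conv) / sum(a) ** 2


def grade(values: Sequence[float], reported: Optional[float] = None) -> dict:
    """Validate one candidate and return a result record."""
    record = {"valid": False, "c1": None, "score": 0.0,
              "length": len(values), "reported": reported, "message": None}
    if len(values) == 0:
        record["message"] = "empty solution"
    elif any(not math.isfinite(v) or v < 0 for v in values):
        record["message"] = "values must be finite and non-negative"
    elif sum(values) <= 0:
        record["message"] = "sum of values must be positive"
    else:
        c1 = _c1(values)
        record.update(valid=True, c1=c1, score=1.0 / c1)
    return record


def rank(records: Sequence[dict]) -> list[dict]:
    """Valid records first, ordered by ascending c1; invalid ones last."""
    return sorted(records, key=lambda r: (not r["valid"], r["c1"] or math.inf))


results = [grade([1.0] * 8), grade([1.0, -1.0]), grade([1.0, 0.5, 0.375, 0.3125])]
for r in rank(results):
    print(r["valid"], r["c1"], r["message"])
```

Keeping the scoring on the verifier side means the agent's self-reported number is only recorded for comparison and never trusted for ranking.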

Removed: the earlier GLM-5 swarm draft (13_sandboxes/opencode_swarm.py, 06_gpu_and_ml/llm-serving/config_glm5.yaml), which this example replaces.

Usage:

# Requires a Modal Secret named `openai-secret` containing OPENAI_API_KEY
python 13_sandboxes/opencode_autoresearch_with_eval/opencode_autoresearch_with_eval.py

Type of Change

- New example for the GitHub repo
- New example for the documentation site

Monitoring Checklist

- Example is configured for testing in the synthetic monitoring system (runs via the --smoke-test cmd in the frontmatter, which avoids external LLM APIs)
- Example is tested by executing with the cmd / args provided in the frontmatter
- Example does not require third-party dependencies besides fastapi to be installed locally

Requested by: @junlin-star

- New example: 13_sandboxes/opencode_swarm.py
  Launches a swarm of OpenCode Sandboxes powered by a self-hosted GLM-5 inference server, replicating the Sailboxes experiment (4 agents building a Redis clone in Rust).
- New config: 06_gpu_and_ml/llm-serving/config_glm5.yaml
  SGLang configuration for GLM-5-FP8 on 8x H200 GPUs, with speculative decoding tuned per the model card recommendations.

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>

devin-ai-integration · 2026-05-22T19:11:50Z

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

Disable automatic comment and CI monitoring

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>

devin-ai-integration

✅ Devin Review: No Issues Found

Devin Review analyzed this PR and found no potential bugs to report.

View in Devin Review to see 4 additional findings.

charlesfrye · 2026-05-25T00:27:39Z

Hey @junlin-star!

This is super cool! Can you provide some proof that it works?

Also, we very much want examples to be tested in CI so that they can be maintained. Can you come up with a way to do that? Feel free to DM me to discuss options and patterns.

If not, please move all of the changes into misc/.

junlin-star · 2026-06-08T23:39:49Z

I massively simplified this to just use an OpenCode agent to do autoresearch on the autocorrelation_first task. Two sandboxes are created for this: an agent sandbox and an eval sandbox. The agent can submit answers to the eval sandbox to get feedback during its optimization loop.
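As a rough illustration of that feedback loop, the sketch below builds the HTTP request a submit step might send from the agent sandbox to the eval sandbox. Only the /submit path comes from the PR description. The payload keys (`solution`, `reported`), the helper names, and the placeholder URL are assumptions made for illustration, and the actual submit.py may differ.

```python
import json
import urllib.request
from typing import Optional, Sequence


def build_submit_request(
    verifier_url: str, values: Sequence[float], reported: Optional[float] = None
) -> urllib.request.Request:
    """Build a JSON POST to the verifier's /submit endpoint (payload shape assumed)."""
    payload = {"solution": list(values), "reported": reported}
    return urllib.request.Request(
        verifier_url.rstrip("/") + "/submit",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(
    verifier_url: str,
    values: Sequence[float],
    reported: Optional[float] = None,
    timeout: float = 30.0,
) -> dict:
    """Send one candidate and return the verifier's JSON reply as a dict."""
    request = build_submit_request(verifier_url, values, reported)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Inspect the request without touching the network (placeholder URL).
req = build_submit_request("https://verifier.example.invalid/", [1.0, 0.5], reported=1.9)
print(req.get_method(), req.full_url)
```

Separating request construction from sending keeps the payload easy to inspect in a smoke test that never reaches an external service.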

Proof of this working:

uv run --no-project --with modal python \
  13_sandboxes/opencode_autoresearch_with_eval/opencode_autoresearch_with_eval.py \
  --openai-secret openai-secret \
  --model openai/gpt-5.5 \
  --max-submissions 3 \
  --agent-timeout 15m \
  --timeout 15m
Starting verifier sidecar...
Verifier sidecar: sb-S7DZJSsbEm1RJPCn6c440B
Verifier URL: https://ta-01ktda9wr70ay86sdkdm6fkt6q-8000-5sc8v0pnij6gpyw0gp3i9jn8k.w.modal.host
Agent Sandbox: sb-6clGkznYzIwuYcuRST0icJ
Starting OpenCode agent...
Agent trajectory saved to opencode_agent_trajectory.txt

Best submitted solutions:

[
  {
    "valid": true,
    "c1": 1.570892203519807,
    "score": 0.6365809133138155,
    "length": 4096,
    "reported": 1.570892203519807,
    "message": null,
    "solution_preview": {
      "first": [
        1.0,
        0.5,
        0.375,
        0.3125,
        0.2734375
      ],
      "last": [
        0.008820578189333012,
        0.008819500406074297,
        0.008818423017800893,
        0.008817346024271608,
        0.008816269425245445
      ],
      "omitted": 4086
    }
  }
]

devin-ai-integration Bot assigned junlin-star May 22, 2026

Add lambda-test: false to skip CI run (requires external GLM-5 endpoint)

13cf6a3

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>

devin-ai-integration Bot reviewed May 22, 2026

charlesfrye marked this pull request as ready for review May 25, 2026 00:25

charlesfrye marked this pull request as draft May 25, 2026 00:26

Add OpenCode autoresearch eval example

042c23b

Introduce a Modal sandbox example that runs OpenCode against a verifier sidecar, with a fast smoke-test path for example CI and compact result artifacts for agent traces and best submissions. Co-authored-by: Cursor <cursoragent@cursor.com>

junlin-star changed the title ~~Add OpenCode agent swarm example replicating Sailboxes experiment~~ Add OpenCode autoresearch example with eval sandbox sidecar Jun 8, 2026

Merge branch 'main' into devin/1779476807-sailboxes-replication

ff57e82

junlin-star wants to merge 4 commits into main from devin/1779476807-sailboxes-replication
