Add OpenCode autoresearch example with eval sandbox sidecar #1570
junlin-star wants to merge 4 commits into
- New example: 13_sandboxes/opencode_swarm.py
  Launches a swarm of OpenCode Sandboxes powered by a self-hosted GLM-5 inference server, replicating the Sailboxes experiment (4 agents building a Redis clone in Rust).
- New config: 06_gpu_and_ml/llm-serving/config_glm5.yaml
  SGLang configuration for GLM-5-FP8 on 8x H200 GPUs, with speculative decoding tuned per the model card recommendations.

Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-Authored-By: Devin AI <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Hey @junlin-star! This is super cool! Can you provide some proof that it works? Also, we very much want examples to be tested in CI so that they can be maintained. Can you come up with a way to do that? Feel free to DM me to discuss options and patterns. If not, please move all of the changes into
Introduce a Modal sandbox example that runs OpenCode against a verifier sidecar, with a fast smoke-test path for example CI and compact result artifacts for agent traces and best submissions. Co-authored-by: Cursor <cursoragent@cursor.com>
I massively simplified this to just use an OpenCode agent to do autoresearch on the

Proof of this working:
Adds a small, self-contained example showing how to run a coding agent against an automated verifier on Modal. One Sandbox runs an OpenCode agent; a second Sandbox runs an HTTP verifier that grades candidate solutions.
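The agent-plus-verifier loop described above can be sketched locally without Modal. The snippet below is a minimal illustration under stated assumptions, not the code in this PR: it swaps FastAPI for the standard library's `http.server`, runs the verifier in a background thread instead of a second Sandbox, and uses a placeholder grader (`toy_score`) rather than the real scoring logic. The JSON payload shape (`{"values": [...]}`) and all helper names are hypothetical; only the `/submit` and `/submissions` paths come from the description.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical in-memory store; the real verifier's storage is not shown in this PR.
SUBMISSIONS = []
LOCK = threading.Lock()

# Opener that ignores proxy environment variables so localhost calls stay local.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def toy_score(values):
    """Placeholder grader (lower is better); NOT the score computed by evaluate.py."""
    return max(values) / (sum(values) / len(values))


class VerifierHandler(BaseHTTPRequestHandler):
    def _send_json(self, payload, status=200):
        body = json.dumps(payload).encode()
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_POST(self):
        # Grade one candidate and remember it.
        if self.path != "/submit":
            self._send_json({"error": "not found"}, 404)
            return
        length = int(self.headers.get("Content-Length", 0))
        values = json.loads(self.rfile.read(length))["values"]
        record = {"values": values, "score": toy_score(values)}
        with LOCK:
            SUBMISSIONS.append(record)
        self._send_json(record)

    def do_GET(self):
        # Return every submission so far, best (lowest score) first.
        if self.path != "/submissions":
            self._send_json({"error": "not found"}, 404)
            return
        with LOCK:
            ranked = sorted(SUBMISSIONS, key=lambda r: r["score"])
        self._send_json(ranked)

    def log_message(self, *args):
        pass  # keep the sketch quiet


def start_verifier():
    """Start the verifier on a free local port in a daemon thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), VerifierHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def submit(base_url, values):
    """What the agent side does: POST a candidate and read back its score."""
    req = urllib.request.Request(
        f"{base_url}/submit",
        data=json.dumps({"values": values}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with _OPENER.open(req) as resp:
        return json.load(resp)


def fetch_ranked(base_url):
    """What the local script does after the agent exits: fetch ranked results."""
    with _OPENER.open(f"{base_url}/submissions") as resp:
        return json.load(resp)
```

A caller would start the verifier, build `base_url` from `server.server_address`, call `submit` for each candidate, and take the first element of `fetch_ranked(base_url)` as the best submission. Keeping the grader behind HTTP means the agent only ever sees scores, never the grading code, which is presumably the point of running the verifier as a separate Sandbox.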
The agent is given a small optimization task (find a non-negative step function with a small discrete autoconvolution score, inspired by the `autocorrelation_first` task from SimpleTES), edits the starter `solution.py`, and submits candidates to the verifier. When the agent exits, the local script fetches the ranked submissions and saves the best one.

New files:
- `13_sandboxes/opencode_autoresearch_with_eval/opencode_autoresearch_with_eval.py`: orchestration script that:
  - […] `/task`, `/submit`, and `/submissions`
  - […] `openai/gpt-5.5`, via an `openai-secret` Modal Secret)
  - […] `--smoke-test` mode that exercises the submit/verify loop without invoking OpenCode or any external LLM API (used for CI)
- `13_sandboxes/opencode_autoresearch_with_eval/autocorrelation_first/`: task assets:
  - `autocorrelation_first.txt`: task description
  - `evaluate.py`: FastAPI verifier that computes the `c1` autoconvolution score and ranks submissions
  - `solution.py`: starter solution with an `EVOLVE-BLOCK` for the agent to improve
  - `submit.py`: loads `solution.py` and POSTs the result to the verifier

Removed: the earlier GLM-5 swarm draft (`13_sandboxes/opencode_swarm.py`, `06_gpu_and_ml/llm-serving/config_glm5.yaml`), which this example replaces.

Usage:
    # Requires a Modal Secret named `openai-secret` containing OPENAI_API_KEY
    python 13_sandboxes/opencode_autoresearch_with_eval/opencode_autoresearch_with_eval.py

Type of Change

Monitoring Checklist

- […] `--smoke-test` `cmd` in the frontmatter, which avoids external LLM APIs)
- […] `cmd`/`args` provided in the frontmatter
- […] `fastapi` to be installed locally

Requested by: @junlin-star
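
For readers unfamiliar with the task, here is a rough sketch of what a discrete autoconvolution score for a non-negative step function can look like. The actual formula in `evaluate.py` is not shown in this PR, so the normalization is an assumption: it follows the usual statement of the first autocorrelation inequality, where the function lives on an interval of length 1/2 and the score is the maximum of the autoconvolution divided by the squared integral. The function name `autoconvolution_score` is hypothetical.

```python
def autoconvolution_score(heights):
    """Score a step function given by equal-width step heights (lower is better).

    Assumed setup (not confirmed by the PR): n steps of width 1/(2n) covering an
    interval of length 1/2, scored as max(f * f) / (integral of f) ** 2.
    """
    n = len(heights)
    if n == 0:
        raise ValueError("need at least one step")
    if any(h < 0 for h in heights):
        raise ValueError("step heights must be non-negative")
    total = sum(heights)
    if total <= 0:
        raise ValueError("step function must not be identically zero")

    # Discrete autoconvolution: conv[k] = sum over i + j == k of heights[i] * heights[j].
    conv = [0.0] * (2 * n - 1)
    for i, a in enumerate(heights):
        for j, b in enumerate(heights):
            conv[i + j] += a * b

    # With step width w = 1/(2n): max(f * f) = w * max(conv), integral = w * total,
    # so the ratio simplifies to 2n * max(conv) / total**2.
    return 2 * n * max(conv) / total**2


print(autoconvolution_score([1, 1, 1, 1]))  # constant function
print(autoconvolution_score([1, 0, 0, 1]))  # mass pushed to the edges
```

Under this assumed normalization a constant function scores exactly 2, and the score does not change when all heights are scaled by the same positive factor, so the agent is effectively searching over the shape of the function.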