feat: add human-approved Jungle Grid execution demo #433

Open
dejaguarkyng wants to merge 6 commits into openagents-org:develop from dejaguarkyng:feature/jungle-grid-gpu-execution-demo
@dejaguarkyng commented Jun 3, 2026

Summary

This demo provides human-approved asynchronous Jungle Grid execution from an
OpenAgents project using a deterministic Python WorkerAgent.

It estimates before submission, records durable project state, requires an exact
human approval before billable compute can start, and reports lifecycle events,
status, polled workload logs, runtime details, and managed artifact metadata.

Current workflow

Project goal
→ estimate
→ human approval
→ optional input/script references
→ submit
→ lifecycle events and status
→ workload logs
→ runtime details
→ managed artifacts

The demo calls Jungle Grid REST directly so the OpenAgents human-approval and
project-state transitions remain explicit and testable.

Safety

  • Estimation never submits a billable workload.
  • Submission requires exact APPROVE <estimate-id> from a verified human:
    project identity.
  • Cancellation requires exact CANCEL <job-id> from a verified human:
    project identity.
  • API and workload credentials come from executor environment variables only.
  • Workload secrets are resolved only after approval and are excluded from
    estimate requests and durable project state.
  • API keys, Bearer tokens, resolved environment values, secret-bearing fields,
    and signed URLs are redacted.
  • CI and automated tests mock all Jungle Grid requests and never submit paid
    work.
  • File-backed jobs accept pre-uploaded Jungle Grid input_id references; goals
    cannot read arbitrary executor host paths.
  • Temporary artifact download URLs are not requested or stored in OpenAgents
    project state.
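
The exact-match approval rule above can be illustrated with a minimal sketch. The function name `is_exact_approval`, the `sender_is_verified_human` flag, and the sample estimate ID are hypothetical and not taken from the demo. Identity verification is assumed to happen upstream in OpenAgents and is not reproduced here.

```python
def is_exact_approval(
    message: str,
    pending_estimate_id: str,
    sender_is_verified_human: bool,
) -> bool:
    """Return True only for an exact `APPROVE <estimate-id>` from a verified human.

    Hypothetical helper: the verification result is assumed to arrive as a
    boolean computed elsewhere.
    """
    if not sender_is_verified_human or not pending_estimate_id:
        return False
    # Exact comparison: no trimming, case folding, or substring matching,
    # so "approve est_123" or "please APPROVE est_123" never starts compute.
    return message == f"APPROVE {pending_estimate_id}"
```

The same shape would apply to `CANCEL <job-id>`. Rejecting near-misses keeps an ambiguous chat message from being read as consent to billable work.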

Changes

  • Merged the latest OpenAgents develop branch and preserved runtime
    executors group authentication through password_hash.
  • Aligned the client with the current /v1/mcp/jobs routes, lifecycle-events
    endpoint, runtime endpoint, log pagination, cancellation, and managed artifact
    routes.
  • Prefer JUNGLEGRID_API_BASE, retain JUNGLE_GRID_API_URL and
    JUNGLE_GRID_API compatibility fallbacks, and normalize trailing slashes.
  • Support inference, training, fine_tuning, and batch; normalize
    fine_tuning to the REST fine-tuning value.
  • Support command arrays and retain legacy string command plus args
    compatibility.
  • Add validated input_files, script_files, expected artifacts, callback
    configuration, environment references, GPU/routing constraints, timeout,
    template, metadata, and optimization fields that are accepted by the current
    API.
  • Store estimate, submission, execution state, event IDs, and log cursor in
    project artifacts for restart-safe duplicate-submission protection.
  • Serialize approvals and cancellations per project; do not retry ambiguous
    non-idempotent submissions.
  • Poll and deduplicate lifecycle events separately from paginated workload logs.
  • Report execution phase and delayed-start information without treating empty
    startup logs as failure.
  • Bound shared events and logs and store sanitized runtime/artifact metadata on
    completion.
  • Document the committed group hash as a demo-only credential.
  • Expand focused mocked coverage to 52 tests.
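
As a rough illustration of the lifecycle-event deduplication described above, the sketch below filters a polled batch against a set of already-seen event IDs. The `id` field name and the event shape are assumptions for illustration only; the actual Jungle Grid response schema is not shown in this PR description.

```python
from typing import Any, Dict, Iterable, List, Set


def select_new_events(
    polled: Iterable[Dict[str, Any]],
    seen_ids: Set[str],
) -> List[Dict[str, Any]]:
    """Return events not yet reported and record their IDs in `seen_ids`.

    Assumes each event carries a stable string under the hypothetical key "id".
    Events without an ID are skipped because they cannot be deduplicated safely.
    """
    fresh: List[Dict[str, Any]] = []
    for event in polled:
        event_id = event.get("id")
        if not isinstance(event_id, str) or event_id in seen_ids:
            continue
        seen_ids.add(event_id)
        fresh.append(event)
    return fresh
```

Persisting `seen_ids` alongside the log cursor would let a restarted executor resume polling without re-announcing events it has already shared.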

Testing

Passed:

  • PYTHONDONTWRITEBYTECODE=1 .venv/bin/pytest tests/agents/test_jungle_grid_executor.py -q
    • 52 passed, 1 warning
  • PYTHONDONTWRITEBYTECODE=1 .venv/bin/pytest tests/network/test_agent_groups.py -q
    • 20 passed
  • .venv/bin/ruff check sdk/demos/09_jungle_grid_gpu_execution tests/agents/test_jungle_grid_executor.py
    • passed
  • .venv/bin/ruff format --check sdk/demos/09_jungle_grid_gpu_execution tests/agents/test_jungle_grid_executor.py
    • passed
  • MYPYPATH=sdk/src .venv/bin/mypy --follow-untyped-imports sdk/demos/09_jungle_grid_gpu_execution/agents/jungle_grid_executor.py
    • passed
  • OpenAgents load_network_config("sdk/demos/09_jungle_grid_gpu_execution/network.yaml")
    • loaded successfully as JungleGridGPUExecution
  • git diff --check origin/develop...HEAD
    • passed

Unrelated repository fixture failures:

  • .venv/bin/pytest tests/network/test_agent_groups.py tests/network/test_authentication.py -q
    • 27 passed, 1 setup error because
      examples/workspace_test.yaml is absent
  • .venv/bin/pytest tests/mods/test_project_mode.py -q
    • 5 failures because
      examples/test_configs/project_mode.yaml is absent

No live paid Jungle Grid job was submitted.

Compatibility

The preferred command form is:

{"command": ["python", "-c", "print('hello')"]}

The original form remains supported; command and args are joined in that order:

{"command": "python", "args": ["-c", "print('hello')"]}
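
One way the two command forms could be normalized into a single array is sketched below. The function name `normalize_command` and the validation rules are assumptions, not the demo's actual implementation.

```python
from typing import Any, Dict, List


def normalize_command(spec: Dict[str, Any]) -> List[str]:
    """Convert either supported command form into a list of strings.

    Hypothetical helper covering the preferred array form and the legacy
    string `command` plus `args` form.
    """
    command = spec.get("command")
    if isinstance(command, list):
        parts = list(command)
    elif isinstance(command, str):
        args = spec.get("args", [])
        if not isinstance(args, list):
            raise ValueError("args must be a list of strings")
        # Legacy form: the command string comes first, then args in order.
        parts = [command, *args]
    else:
        raise ValueError("command must be a list or a string")
    if not parts or not all(isinstance(p, str) and p for p in parts):
        raise ValueError("command parts must be non-empty strings")
    return parts
```

Under these assumptions both examples above normalize to the same three-element array.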

The old JUNGLE_GRID_API_URL and JUNGLE_GRID_API base variables remain
fallbacks behind the current JUNGLEGRID_API_BASE variable.
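
The variable precedence and trailing-slash normalization might look like the following sketch. The helper name `resolve_api_base` is hypothetical, and the relative order of the two legacy variables is an assumption; the description only states that both sit behind `JUNGLEGRID_API_BASE`.

```python
import os
from typing import Mapping, Optional

# Preferred variable first; the order of the two legacy names is assumed.
_BASE_VARS = ("JUNGLEGRID_API_BASE", "JUNGLE_GRID_API_URL", "JUNGLE_GRID_API")


def resolve_api_base(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first non-empty base URL without trailing slashes, or None."""
    source = os.environ if env is None else env
    for name in _BASE_VARS:
        value = source.get(name, "").strip()
        if value:
            return value.rstrip("/")
    return None
```

Passing the environment as a mapping keeps the lookup easy to exercise in mocked tests without touching the real process environment.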

Known limitations

  • The demo supports previously uploaded Jungle Grid input/script IDs. It does
    not upload arbitrary local files or copy OpenAgents artifact bytes into
    Jungle Grid.
  • Managed artifact metadata is stored, but temporary signed download URLs and
    downloaded artifact bytes are intentionally not placed in project state.
  • The demo assumes one executor process. Durable project state protects
    restarts and duplicate messages within that executor workflow.
  • CPU/memory sizing, provider pinning, and user-controlled retry policy are not
    exposed because they are not current public MCP submission fields.
  • No live billable job was run as evidence.
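
To make the artifact-metadata limitation concrete, here is a minimal sketch of dropping URL-bearing and secret-bearing fields before metadata is written to project state. The key hints and the function name `sanitize_metadata` are illustrative assumptions; the demo's real redaction rules are not listed in this description.

```python
from typing import Any

# Hypothetical substrings that mark a field as unsafe to persist.
_UNSAFE_KEY_HINTS = ("secret", "token", "api_key", "password", "signed_url", "download_url")


def sanitize_metadata(value: Any) -> Any:
    """Recursively drop dict entries whose key looks secret- or URL-bearing."""
    if isinstance(value, dict):
        return {
            key: sanitize_metadata(item)
            for key, item in value.items()
            if not any(hint in str(key).lower() for hint in _UNSAFE_KEY_HINTS)
        }
    if isinstance(value, list):
        return [sanitize_metadata(item) for item in value]
    return value
```

Dropping the fields entirely, rather than masking them, means durable state never needs to be scrubbed again if the masking format changes.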

Screenshots or evidence

The mocked integration tests exercise and assert the project-visible estimate
message, exact approval command, lifecycle updates, terminal completion/failure,
runtime-unavailable handling, and sanitized artifact metadata. No screenshots
or paid workload were created solely for this update.

Closes #432

vercel Bot commented Jun 3, 2026

@dejaguarkyng is attempting to deploy a commit to the Raphael's projects Team on Vercel.

A member of the Team first needs to authorize it.

@QuanCheng-QC

Collaborator

Overall this is a positive review: the demo's API integration, human-approval gate, secret redaction, concurrency control, and test coverage all look solid.

That said, I see one high-probability blocking issue around agent group authorization. The project template restricts execution to agent_groups: [executors], but OpenAgents appears to resolve group authorization from the runtime topology.agent_group_membership, not from the static allowlist under agent_groups.executors.metadata.agents in network.yaml.

With the README's current startup command, I don't see how the executor agent joins the executors group via group/password. My read is that it would likely land in the default guest group. If so, _get_agents_in_group("executors") would return empty, project.notification.started would not be delivered to the executor, and the demo would not trigger as documented.

This is based on a source read rather than an end-to-end run, so please correct me if there's an authorization path I missed. References for checking:

  • Group resolution: _get_agents_in_group() reads network.topology.agent_group_membership in sdk/src/openagents/mods/workspace/project/mod.py
  • Group assignment at connect time: _assign_to_requested_group / _legacy_assign_by_password in sdk/src/openagents/sdk/topology.py
  • Reference pattern: sdk/demos/08_alternative_service_project/agents/coordinator.py:213 joins the coordinators group via password_hash

Suggested fixes, following demo 08:

  1. Add a password_hash to the executors group in network.yaml;
  2. Pass the corresponding password_hash in main() / async_start(...);
  3. Or provide an end-to-end run showing that the static agents allowlist is converted into real runtime group membership.

Any of these would resolve my concern. If the static allowlist does get converted into real membership, the fastest way to clear this up would be to attach an end-to-end run showing: human starts a project → executor receives the estimate.

Once this is resolved, the rest are mostly minor nits: env var naming consistency (JUNGLE_GRID_API_KEY vs JUNGLEGRID_API_BASE), documenting the single-executor assumption, and optionally redacting the _error_detail code as well as the message.

@dejaguarkyng force-pushed the feature/jungle-grid-gpu-execution-demo branch from 902f9d2 to 7d6c00d on June 9, 2026 09:49
@dejaguarkyng

Author

Thanks — this was a good catch.

Fixed in 7d6c00d3. The executor now authenticates into the executors group at connection time, so runtime topology membership resolves correctly and project.notification.started reaches the executor.

Focused Jungle Grid and agent-group tests are passing: 62 tests total. Ruff format/lint and targeted MyPy checks also pass. The remaining project-mod failures appear unrelated and come from a missing examples/test_configs/project_mode.yaml.

Please take another look when you get a chance.

@QuanCheng-QC

Collaborator

Thanks for the thorough fix and follow-up. I re-reviewed the runtime group-auth path and the new end-to-end test, and the original blocker is resolved.

The executor now joins the executors group through the password_hash registration path, so project.notification.started can reach it as expected. The unrelated project_mode.yaml failure appears to be a pre-existing repo-layout issue, not caused by this PR.

Only two non-blocking notes remain: please have a maintainer confirm the intended external Jungle Grid REST contract, and consider adding a short “demo-only credential” note for the committed password_hash.

From my side, the blocking concern is cleared. Nice work.

@dejaguarkyng changed the title from "feat: add Jungle Grid GPU execution agent demo" to "feat: add human-approved Jungle Grid execution demo" on Jun 11, 2026
@dejaguarkyng

Author

Thanks again for reviewing this contribution.

I updated the existing branch against the latest OpenAgents develop branch and aligned the Jungle Grid demo with the current production job workflow.

The update now covers the current submission shape, pre-uploaded input and script references, lifecycle events, phase-aware status, paginated logs, runtime details, managed artifact metadata, restart-safe duplicate-submission protection, and expanded safety tests while preserving the exact human approval and cancellation requirements.

I re-ran the focused formatting, linting, typing, configuration, project-group, and mocked execution checks. The exact results and the unrelated missing-fixture failures are included in the PR description.

The PR is ready for another review.

Development

Successfully merging this pull request may close these issues.

Add Jungle Grid human-approved GPU execution agent demo

2 participants