From 39dba28ef13a10279e11a51e0d0e750b1ca3ba20 Mon Sep 17 00:00:00 2001
From: dejaguarkyng <deinvinciblekyng.1@gmail.com>
Date: Wed, 3 Jun 2026 10:59:37 +0000
Subject: [PATCH 1/5] feat: add Jungle Grid GPU execution agent demo

---
 .../IMPLEMENTATION_DECISION.md                |  57 ++
 .../09_jungle_grid_gpu_execution/README.md    | 166 +++++
 .../agents/jungle_grid_executor.py            | 571 ++++++++++++++++++
 .../09_jungle_grid_gpu_execution/network.yaml |  89 +++
 tests/agents/test_jungle_grid_executor.py     | 482 +++++++++++++++
 5 files changed, 1365 insertions(+)
 create mode 100644 sdk/demos/09_jungle_grid_gpu_execution/IMPLEMENTATION_DECISION.md
 create mode 100644 sdk/demos/09_jungle_grid_gpu_execution/README.md
 create mode 100644 sdk/demos/09_jungle_grid_gpu_execution/agents/jungle_grid_executor.py
 create mode 100644 sdk/demos/09_jungle_grid_gpu_execution/network.yaml
 create mode 100644 tests/agents/test_jungle_grid_executor.py

diff --git a/sdk/demos/09_jungle_grid_gpu_execution/IMPLEMENTATION_DECISION.md b/sdk/demos/09_jungle_grid_gpu_execution/IMPLEMENTATION_DECISION.md
new file mode 100644
index 000000000..41375c1d0
--- /dev/null
+++ b/sdk/demos/09_jungle_grid_gpu_execution/IMPLEMENTATION_DECISION.md
@@ -0,0 +1,57 @@
+# Jungle Grid Integration Decision
+
+## Selected Extension Point
+
+This contribution is a runnable demo network with a Python `WorkerAgent`. The agent
+uses OpenAgents' project mod for the long-running workflow, project messages for
+estimate and lifecycle updates, and project artifacts for logs and Jungle Grid
+artifact metadata.
+
+Jungle Grid is an external execution layer, not an OpenAgents transport, launcher
+agent type, or network mod. A demo keeps the integration provider-specific while
+showing a reusable OpenAgents pattern: an agent delegates asynchronous compute,
+waits for human approval before billable work, and returns results to a shared
+project.
+
+## Rejected Alternatives
+
+- **Launcher agent type:** Jungle Grid executes workloads; it is not an interactive
+  coding-agent runtime managed by the launcher.
+- **Core provider integration:** No OpenAgents core abstraction requires a
+  provider-specific compute backend.
+- **Jungle Grid mod:** The integration does not add network-wide event semantics or
+  shared infrastructure. Existing project events already cover the workflow.
+- **Hosted MCP entry:** OpenAgents can load external MCP tools, but the current
+  Streamable HTTP MCP connector does not perform Jungle Grid's hosted OAuth flow or
+  attach API-key headers. Adding that capability solely for this demo would be a
+  core architecture change.
+- **Local stdio MCP dependency:** The Jungle Grid stdio MCP package is supported,
+  but a direct Python API client is easier to validate, test, and constrain around
+  mandatory human approval. It also avoids requiring Node.js for a Python demo.
+
+## Jungle Grid Contract Used
+
+The demo uses the documented public execution API:
+
+- `POST /v1/jobs/estimate`
+- `POST /v1/jobs`
+- `GET /v1/jobs/{job_id}`
+- `GET /v1/jobs/{job_id}/logs`
+- `POST /v1/jobs/{job_id}/cancel`
+- `GET /v1/jobs/{job_id}/artifacts`
+- `POST /v1/jobs/{job_id}/artifacts/{artifact_id}/download`
+
+Authentication is a scoped server-side API key in `JUNGLE_GRID_API_KEY`. The
+documented lifecycle includes `pending`, `queued`, `assigned`, `running`,
+`completed`, `failed`, `rejected`, and `cancelled`.
+
+Workload environment values are not accepted in project goals. A goal may use
+`environment_from_env` to reference variables available only in the executor
+process; those values are resolved after human approval and are excluded from
+the estimate request and project-visible output.
+
+## Contribution Workflow
+
+OpenAgents' contributing guide asks contributors to create an issue for feature
+suggestions before submitting a pull request. This demo should be proposed in an
+issue and held for maintainer direction before a PR is opened.
diff --git a/sdk/demos/09_jungle_grid_gpu_execution/README.md b/sdk/demos/09_jungle_grid_gpu_execution/README.md
new file mode 100644
index 000000000..a9841b8df
--- /dev/null
+++ b/sdk/demos/09_jungle_grid_gpu_execution/README.md
@@ -0,0 +1,166 @@
+# Jungle Grid GPU Execution Demo
+
+This demo shows an OpenAgents execution agent delegating long-running AI and GPU
+workloads to [Jungle Grid](https://junglegrid.dev), an execution layer that
+places and runs AI workloads without requiring agents to manage GPU servers.
+
+The workflow fits OpenAgents because the workload is asynchronous and
+collaborative: an agent estimates the job, a human approves spending in the
+shared project, and the agent returns lifecycle updates, logs, and artifact
+metadata to the same workspace.
+
+## Security And Billing Warning
+
+Jungle Grid jobs may consume credits or incur charges. The executor never submits
+a workload when a project starts. It requires an exact approval command from a
+human identity after posting the estimate. Keep API keys in environment variables
+and do not paste secrets into project goals, messages, logs, metadata, or
+committed files. Workloads that need environment values must use
+`environment_from_env`; the executor resolves those references only after human
+approval, immediately before submission.
+
+## Prerequisites
+
+- Python with the OpenAgents development package installed.
+- A Jungle Grid account and a scoped API key that can estimate, submit, read, and
+  cancel jobs.
+- A public container image suitable for the requested workload.
+
+## Environment Variables
+
+- `JUNGLE_GRID_API_KEY` is required. The agent reads this server-side API key and
+  sends it only as a Bearer token to Jungle Grid.
+- `JUNGLEGRID_API_BASE` optionally overrides the default API base,
+  `https://api.junglegrid.dev`.
+- Any workload-specific variables referenced by `environment_from_env` must also
+  be exported in the executor process. Their values are never placed in the
+  project goal or estimate request.
+
+## Setup
+
+From the repository root, install OpenAgents with SDK and development
+dependencies so the network, agent, and test commands are available:
+
+```bash
+pip install -e ".[sdk,dev]"
+```
+
+Export the Jungle Grid API key in the shell that will run the executor. This
+keeps the credential out of the repository and network configuration:
+
+```bash
+export JUNGLE_GRID_API_KEY="jg_..."
+```
+
+## Run The Demo
+
+Start the OpenAgents network from this demo directory. The network enables the
+project mod and exposes the `Jungle Grid GPU Execution` project template:
+
+```bash
+cd sdk/demos/09_jungle_grid_gpu_execution
+openagents network start network.yaml
+```
+
+In a second terminal, start the deterministic Python executor. It does not need
+an LLM provider key:
+
+```bash
+cd sdk/demos/09_jungle_grid_gpu_execution
+python agents/jungle_grid_executor.py
+```
+
+Open Studio at `http://localhost:8700/studio`, create a project with the
+`Jungle Grid GPU Execution` template, and use a JSON object as the project goal.
+For example:
+
+```json
+{
+  "name": "openagents-batch-demo",
+  "workload_type": "batch",
+  "image": "python:3.11-slim",
+  "command": "python",
+  "args": ["-c", "print('hello from Jungle Grid')"],
+  "optimize_for": "cost"
+}
+```
+
+The agent validates the request and calls Jungle Grid's estimate endpoint. It
+posts the structured estimate and stores it as project artifact
+`jungle_grid_estimate`. No compute has been submitted at this point.
+
+For a workload that needs a credential or other environment value, export it in
+the executor shell and reference only its local variable name in the goal:
+
+```bash
+export MODEL_TOKEN="..."
+```
+
+```json
+{
+  "name": "openagents-inference-demo",
+  "workload_type": "inference",
+  "image": "example/model-server:latest",
+  "environment_from_env": {
+    "MODEL_TOKEN": "MODEL_TOKEN"
+  },
+  "optimize_for": "cost"
+}
+```
+
+The mapping key is the variable sent to the workload, and the mapping value is
+the local executor variable to resolve. Literal `environment` values, API keys,
+Bearer tokens, and secret-like metadata keys are rejected.
+
+Review the estimate. Estimates that explicitly report `available: false` or
+`can_submit: false` cannot be approved. Otherwise, reply in the project with the
+exact command shown by the agent:
+
+```text
+APPROVE <estimate-id>
+```
+
+After approval, the agent submits the workload, posts status changes such as
+submitted, queued, assigned/provisioning, running, completed, failed, rejected,
+or cancelled, and stores the final job details, logs, artifact list, and
+temporary download metadata in project artifact `jungle_grid_result`.
+
+To cancel a submitted job, reply with the exact job ID:
+
+```text
+CANCEL <job-id>
+```
+
+Cancellation is explicit and only applies when the job ID matches the project.
+Only a human identity can request cancellation. The agent reports cancellation
+failures without exposing the API key.
+
+## Failure Behavior
+
+Invalid workload JSON, missing required fields, missing API keys, timeouts,
+invalid Jungle Grid responses, and API errors are posted to the project in
+sanitized form. Failed, rejected, or cancelled jobs stop the OpenAgents project.
+Completed jobs complete the project.
+
+## Tests
+
+Run the focused mocked tests. They do not contact Jungle Grid or submit paid
+work:
+
+```bash
+pytest tests/agents/test_jungle_grid_executor.py
+```
+
+Run the repository formatter and linter checks used by the Python project:
+
+```bash
+ruff format --check sdk/demos/09_jungle_grid_gpu_execution tests/agents/test_jungle_grid_executor.py
+ruff check sdk/demos/09_jungle_grid_gpu_execution tests/agents/test_jungle_grid_executor.py
+```
+
+## Optional Live Estimate
+
+The normal demo performs a live estimate when a project starts, but it never
+automatically submits a job. Use a low-cost workload goal, review the estimate in
+the project, and do not send the approval command unless you explicitly intend
+to start billable compute.
diff --git a/sdk/demos/09_jungle_grid_gpu_execution/agents/jungle_grid_executor.py b/sdk/demos/09_jungle_grid_gpu_execution/agents/jungle_grid_executor.py
new file mode 100644
index 000000000..6e8a369c2
--- /dev/null
+++ b/sdk/demos/09_jungle_grid_gpu_execution/agents/jungle_grid_executor.py
@@ -0,0 +1,571 @@
+#!/usr/bin/env python3
+"""Jungle Grid execution agent for the OpenAgents project workflow demo."""
+
+import asyncio
+import json
+import logging
+import os
+import re
+import uuid
+from dataclasses import dataclass
+from typing import Any, Dict, Iterable, Optional
+from urllib.parse import quote
+
+import aiohttp
+
+from openagents.agents.worker_agent import WorkerAgent, on_event
+from openagents.models.event_context import EventContext
+from openagents.mods.workspace.project import DefaultProjectAgentAdapter
+
+logger = logging.getLogger(__name__)
+
+DEFAULT_API_BASE = "https://api.junglegrid.dev"
+TERMINAL_STATUSES = {"completed", "failed", "rejected", "cancelled"}
+VALID_WORKLOAD_TYPES = {"inference", "training", "fine-tuning", "batch"}
+VALID_OPTIMIZE_FOR = {"balanced", "cost", "speed"}
+SUBMIT_FIELDS = {
+    "name",
+    "workload_type",
+    "image",
+    "command",
+    "args",
+    "environment_from_env",
+    "optimize_for",
+    "template",
+    "metadata",
+}
+ESTIMATE_FIELDS = {
+    "workload_type",
+    "image",
+    "command",
+    "args",
+    "optimize_for",
+    "template",
+}
+SENSITIVE_PATTERN = re.compile(r"(?i)(bearer\s+)[^\s,;]+|jg_[A-Za-z0-9_-]+")
+SENSITIVE_KEY_PATTERN = re.compile(r"(?i)(api[_-]?key|authorization|password|secret|token)")
+
+
+class JungleGridError(Exception):
+    """Sanitized Jungle Grid client error."""
+
+    def __init__(self, code: str, message: str, status: Optional[int] = None):
+        super().__init__(message)
+        self.code = code
+        self.status = status
+
+
+def redact_sensitive(value: Any, secret: Optional[str] = None) -> str:
+    """Return a log-safe string with credentials removed."""
+    text = str(value)
+    if secret:
+        text = text.replace(secret, "[REDACTED]")
+    return SENSITIVE_PATTERN.sub(lambda match: f"{match.group(1) or ''}[REDACTED]", text)
+
+
+def _collect_string_values(value: Any) -> list[str]:
+    """Collect nested string values that must not be exposed in project output."""
+    if isinstance(value, str):
+        return [value] if value else []
+    if isinstance(value, dict):
+        strings = []
+        for nested in value.values():
+            strings.extend(_collect_string_values(nested))
+        return strings
+    if isinstance(value, list):
+        strings = []
+        for nested in value:
+            strings.extend(_collect_string_values(nested))
+        return strings
+    return []
+
+
+def _contains_sensitive_key(value: Any) -> bool:
+    """Return whether nested data uses a key commonly associated with credentials."""
+    if isinstance(value, dict):
+        return any(
+            SENSITIVE_KEY_PATTERN.search(str(key)) or _contains_sensitive_key(nested) for key, nested in value.items()
+        )
+    if isinstance(value, list):
+        return any(_contains_sensitive_key(nested) for nested in value)
+    return False
+
+
+def sanitize_project_data(value: Any, secrets: Iterable[str]) -> Any:
+    """Recursively redact credentials and workload-provided secret values."""
+    secret_values = [secret for secret in secrets if secret]
+    if isinstance(value, str):
+        result = value
+        for secret in secret_values:
+            result = result.replace(secret, "[REDACTED]")
+        return redact_sensitive(result)
+    if isinstance(value, dict):
+        return {key: sanitize_project_data(nested, secret_values) for key, nested in value.items()}
+    if isinstance(value, list):
+        return [sanitize_project_data(nested, secret_values) for nested in value]
+    return value
+
+
+def _unwrap_response(data: Any) -> Any:
+    if isinstance(data, dict) and data.get("ok") is True and "data" in data:
+        return data["data"]
+    return data
+
+
+def _error_detail(data: Any, status: int) -> tuple[str, str]:
+    if isinstance(data, dict):
+        nested = data.get("error")
+        if isinstance(nested, dict):
+            return str(nested.get("code") or "API_ERROR"), str(nested.get("message") or f"HTTP {status}")
+        return str(data.get("code") or "API_ERROR"), str(data.get("message") or f"HTTP {status}")
+    return "API_ERROR", f"HTTP {status}"
+
+
+class JungleGridClient:
+    """Small async client for Jungle Grid's documented public execution API."""
+
+    def __init__(
+        self,
+        api_base: Optional[str] = None,
+        timeout_seconds: float = 30.0,
+    ):
+        raw_api_base = api_base if api_base is not None else os.getenv("JUNGLEGRID_API_BASE", DEFAULT_API_BASE)
+        self.api_key = os.getenv("JUNGLE_GRID_API_KEY", "").strip()
+        self.api_base = raw_api_base.rstrip("/")
+        self.timeout_seconds = timeout_seconds
+
+    def _require_api_key(self) -> str:
+        if not self.api_key:
+            raise JungleGridError("MISSING_API_KEY", "JUNGLE_GRID_API_KEY is required.")
+        return self.api_key
+
+    async def _request(self, method: str, path: str, payload: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
+        api_key = self._require_api_key()
+        timeout = aiohttp.ClientTimeout(total=self.timeout_seconds)
+        headers = {
+            "Accept": "application/json",
+            "Authorization": f"Bearer {api_key}",
+            "Content-Type": "application/json",
+        }
+        try:
+            async with aiohttp.ClientSession(timeout=timeout) as session:
+                async with session.request(method, f"{self.api_base}{path}", headers=headers, json=payload) as response:
+                    text = await response.text()
+                    try:
+                        data = json.loads(text) if text.strip() else {}
+                    except json.JSONDecodeError as exc:
+                        raise JungleGridError(
+                            "INVALID_API_RESPONSE", "Jungle Grid returned invalid JSON.", response.status
+                        ) from exc
+                    if response.status < 200 or response.status >= 300:
+                        code, message = _error_detail(data, response.status)
+                        raise JungleGridError(code, redact_sensitive(message, api_key), response.status)
+        except asyncio.TimeoutError as exc:
+            raise JungleGridError("NETWORK_TIMEOUT", "Jungle Grid request timed out.") from exc
+        except aiohttp.ClientError as exc:
+            raise JungleGridError("NETWORK_ERROR", redact_sensitive(exc, api_key)) from exc
+
+        result = _unwrap_response(data)
+        if not isinstance(result, dict):
+            raise JungleGridError("INVALID_API_RESPONSE", "Jungle Grid returned an unexpected response shape.")
+        return result
+
+    async def estimate_job(self, workload: Dict[str, Any]) -> Dict[str, Any]:
+        return await self._request("POST", "/v1/jobs/estimate", workload)
+
+    async def submit_job(self, workload: Dict[str, Any]) -> Dict[str, Any]:
+        return await self._request("POST", "/v1/jobs", workload)
+
+    async def get_job(self, job_id: str) -> Dict[str, Any]:
+        return await self._request("GET", f"/v1/jobs/{quote(job_id, safe='')}")
+
+    async def get_job_logs(self, job_id: str) -> Dict[str, Any]:
+        return await self._request("GET", f"/v1/jobs/{quote(job_id, safe='')}/logs")
+
+    async def cancel_job(self, job_id: str, reason: str) -> Dict[str, Any]:
+        return await self._request("POST", f"/v1/jobs/{quote(job_id, safe='')}/cancel", {"reason": reason})
+
+    async def list_artifacts(self, job_id: str) -> Dict[str, Any]:
+        return await self._request("GET", f"/v1/jobs/{quote(job_id, safe='')}/artifacts")
+
+    async def get_artifact(self, job_id: str, artifact_id: str) -> Dict[str, Any]:
+        return await self._request(
+            "POST",
+            f"/v1/jobs/{quote(job_id, safe='')}/artifacts/{quote(artifact_id, safe='')}/download",
+        )
+
+
+def parse_workload_goal(goal: str) -> Dict[str, Any]:
+    """Parse and validate a project goal containing a Jungle Grid workload JSON object."""
+    text = goal.strip()
+    if text.startswith("```"):
+        text = re.sub(r"^```(?:json)?\s*", "", text)
+        text = re.sub(r"\s*```$", "", text)
+    try:
+        workload = json.loads(text)
+    except json.JSONDecodeError as exc:
+        raise ValueError("Project goal must be a JSON object describing the Jungle Grid workload.") from exc
+    if not isinstance(workload, dict):
+        raise ValueError("Project goal must be a JSON object.")
+    if SENSITIVE_PATTERN.search(json.dumps(workload)):
+        raise ValueError("Workload must not contain API keys or Bearer tokens.")
+
+    unsupported = sorted(set(workload) - SUBMIT_FIELDS)
+    if unsupported:
+        raise ValueError(f"Unsupported workload fields: {', '.join(unsupported)}.")
+    required = {"name", "workload_type", "image"}
+    missing = sorted(key for key in required if not isinstance(workload.get(key), str) or not workload[key].strip())
+    if missing:
+        raise ValueError(f"Missing required workload fields: {', '.join(missing)}.")
+    if workload["workload_type"] not in VALID_WORKLOAD_TYPES:
+        raise ValueError(f"workload_type must be one of: {', '.join(sorted(VALID_WORKLOAD_TYPES))}.")
+    if "optimize_for" in workload and workload["optimize_for"] not in VALID_OPTIMIZE_FOR:
+        raise ValueError(f"optimize_for must be one of: {', '.join(sorted(VALID_OPTIMIZE_FOR))}.")
+    if "args" in workload and not (
+        isinstance(workload["args"], list) and all(isinstance(item, str) for item in workload["args"])
+    ):
+        raise ValueError("args must be an array of strings.")
+    if "environment_from_env" in workload and not (
+        isinstance(workload["environment_from_env"], dict)
+        and all(
+            isinstance(key, str) and key.strip() and isinstance(value, str) and value.strip()
+            for key, value in workload["environment_from_env"].items()
+        )
+    ):
+        raise ValueError("environment_from_env must map workload variable names to local environment variable names.")
+    if _contains_sensitive_key(workload.get("metadata")):
+        raise ValueError("metadata must not contain secret-like keys.")
+    return workload
+
+
+def build_estimate_payload(workload: Dict[str, Any]) -> Dict[str, Any]:
+    """Build an estimate-only payload without submit-only or secret-bearing fields."""
+    return {key: value for key, value in workload.items() if key in ESTIMATE_FIELDS}
+
+
+def build_submit_payload(workload: Dict[str, Any]) -> Dict[str, Any]:
+    """Build a submit payload, resolving secret environment values only at submission time."""
+    payload = {key: value for key, value in workload.items() if key != "environment_from_env"}
+    references = workload.get("environment_from_env")
+    if not references:
+        return payload
+
+    missing = sorted(env_name for env_name in references.values() if not os.getenv(env_name))
+    if missing:
+        raise ValueError(f"Missing required local environment variables: {', '.join(missing)}.")
+    payload["environment"] = {name: os.environ[env_name] for name, env_name in references.items()}
+    return payload
+
+
+def public_workload(workload: Dict[str, Any]) -> Dict[str, Any]:
+    """Return workload metadata safe to share in a project message or artifact."""
+    result = dict(workload)
+    if "metadata" in result:
+        metadata = result["metadata"]
+        result["metadata"] = {key: "[REDACTED]" for key in metadata} if isinstance(metadata, dict) else "[REDACTED]"
+    return result
+
+
+def lifecycle_label(status: str) -> str:
+    """Map Jungle Grid status to a user-facing lifecycle label."""
+    if status == "assigned":
+        return "assigned (provisioning)"
+    return status
+
+
+def estimate_can_submit(estimate: Dict[str, Any]) -> bool:
+    """Return False only when the estimate explicitly reports it cannot be submitted."""
+    return estimate.get("available") is not False and estimate.get("can_submit") is not False
+
+
+@dataclass
+class ProjectExecution:
+    """State tracked between estimate, approval, submission, and completion."""
+
+    project_id: str
+    workload: Dict[str, Any]
+    estimate_id: str
+    estimate: Dict[str, Any]
+    job_id: Optional[str] = None
+    last_status: Optional[str] = None
+    approved_by: Optional[str] = None
+    submission_started: bool = False
+    submit_payload: Optional[Dict[str, Any]] = None
+    secret_values: Optional[list[str]] = None
+
+
+class JungleGridExecutorAgent(WorkerAgent):
+    """Execute approved Jungle Grid workloads and report results to an OpenAgents project."""
+
+    default_agent_id = "jungle-grid-executor"
+
+    def __init__(
+        self,
+        jungle_grid_client: Optional[JungleGridClient] = None,
+        poll_interval_seconds: float = 10.0,
+        **kwargs: Any,
+    ):
+        super().__init__(**kwargs)
+        self.jungle_grid = jungle_grid_client or JungleGridClient()
+        self.poll_interval_seconds = poll_interval_seconds
+        self.project_adapter = DefaultProjectAgentAdapter()
+        self.executions: Dict[str, ProjectExecution] = {}
+        self.monitor_tasks: Dict[str, asyncio.Task] = {}
+
+    async def on_startup(self):
+        """Bind the project adapter after the OpenAgents client is connected."""
+        self.project_adapter.bind_client(self.client)
+        self.project_adapter.bind_connector(self.client.connector)
+        self.project_adapter.bind_agent(self.agent_id)
+        logger.info("Jungle Grid executor is ready")
+
+    async def on_shutdown(self):
+        """Stop local monitor tasks without cancelling remote jobs."""
+        for task in self.monitor_tasks.values():
+            task.cancel()
+        if self.monitor_tasks:
+            await asyncio.gather(*self.monitor_tasks.values(), return_exceptions=True)
+
+    async def _post(self, project_id: str, text: str):
+        await self.project_adapter.send_project_message(project_id=project_id, content={"text": text})
+
+    async def _set_artifact(self, project_id: str, key: str, value: Dict[str, Any]):
+        await self.project_adapter.set_project_artifact(
+            project_id=project_id, key=key, value=json.dumps(value, indent=2)
+        )
+
+    def _project_secrets(self, execution: ProjectExecution) -> list[str]:
+        return [
+            self.jungle_grid.api_key,
+            *(execution.secret_values or []),
+            *_collect_string_values(execution.workload.get("metadata")),
+        ]
+
+    def _sanitize_for_project(self, value: Any, execution: ProjectExecution) -> Any:
+        return sanitize_project_data(value, self._project_secrets(execution))
+
+    def _is_human_approver(self, sender_id: str) -> bool:
+        return sender_id.startswith("human:")
+
+    @on_event("project.notification.started")
+    async def handle_project_started(self, context: EventContext):
+        """Estimate a workload and request human approval without submitting it."""
+        payload = context.incoming_event.payload
+        project_id = payload.get("project_id")
+        goal = payload.get("goal", "")
+        if not project_id:
+            return
+        try:
+            workload = parse_workload_goal(goal)
+            estimate = await self.jungle_grid.estimate_job(build_estimate_payload(workload))
+            estimate_id = uuid.uuid4().hex[:12]
+            execution = ProjectExecution(project_id, workload, estimate_id, estimate)
+            self.executions[project_id] = execution
+            shared_workload = self._sanitize_for_project(public_workload(workload), execution)
+            shared_estimate = self._sanitize_for_project(estimate, execution)
+            await self._set_artifact(
+                project_id,
+                "jungle_grid_estimate",
+                {"estimate_id": estimate_id, "workload": shared_workload, "estimate": shared_estimate},
+            )
+            if not estimate_can_submit(estimate):
+                await self._post(
+                    project_id,
+                    "Jungle Grid estimate is not currently eligible for submission.\n\n"
+                    f"```json\n{json.dumps({'estimate_id': estimate_id, 'workload': shared_workload, 'estimate': shared_estimate}, indent=2)}\n```",
+                )
+                await self.project_adapter.stop_project(
+                    project_id=project_id, reason="Jungle Grid estimate is not eligible for submission"
+                )
+                return
+            await self._post(
+                project_id,
+                "Jungle Grid estimate ready. No job has been submitted.\n\n"
+                f"```json\n{json.dumps({'estimate_id': estimate_id, 'workload': shared_workload, 'estimate': shared_estimate}, indent=2)}\n```\n\n"
+                f"A human must reply exactly `APPROVE {estimate_id}` before billable compute can start.",
+            )
+        except (ValueError, JungleGridError) as exc:
+            await self._post(
+                project_id, f"Jungle Grid estimate failed: {redact_sensitive(exc, self.jungle_grid.api_key)}"
+            )
+            await self.project_adapter.stop_project(project_id=project_id, reason="Jungle Grid estimate failed")
+
+    @on_event("project.notification.message_received")
+    async def handle_project_message(self, context: EventContext):
+        """Handle explicit approval and cancellation commands."""
+        payload = context.incoming_event.payload
+        project_id = payload.get("project_id")
+        sender_id = str(payload.get("sender_id", ""))
+        content = payload.get("content", {})
+        text = content.get("text", "") if isinstance(content, dict) else ""
+        if not project_id or not isinstance(text, str):
+            return
+        command = text
+        execution = self.executions.get(project_id)
+
+        if command.startswith("APPROVE "):
+            if not execution:
+                await self._post(project_id, "There is no pending Jungle Grid estimate for this project.")
+                return
+            if not self._is_human_approver(sender_id):
+                await self._post(
+                    project_id, "Approval rejected: billable Jungle Grid submission requires a human approver."
+                )
+                return
+            if command != f"APPROVE {execution.estimate_id}":
+                await self._post(project_id, "Approval rejected: estimate id does not match the pending estimate.")
+                return
+            if execution.submission_started:
+                suffix = f" as job `{execution.job_id}`" if execution.job_id else ""
+                await self._post(project_id, f"Jungle Grid submission has already been requested{suffix}.")
+                return
+            await self._submit_and_monitor(execution, sender_id)
+            return
+
+        if command.startswith("CANCEL "):
+            if not execution or not execution.job_id:
+                await self._post(project_id, "There is no submitted Jungle Grid job to cancel for this project.")
+                return
+            if command != f"CANCEL {execution.job_id}":
+                await self._post(project_id, "Cancellation rejected: job id does not match this project.")
+                return
+            if not self._is_human_approver(sender_id):
+                await self._post(
+                    project_id, "Cancellation rejected: Jungle Grid cancellation requires a human approver."
+                )
+                return
+            try:
+                result = await self.jungle_grid.cancel_job(
+                    execution.job_id, f"Requested from OpenAgents by {sender_id}"
+                )
+                shared_result = self._sanitize_for_project(result, execution)
+                await self._post(
+                    project_id,
+                    f"Cancellation requested for Jungle Grid job `{execution.job_id}`.\n\n```json\n{json.dumps(shared_result, indent=2)}\n```",
+                )
+            except JungleGridError as exc:
+                await self._post(
+                    project_id, f"Jungle Grid cancellation failed: {redact_sensitive(exc, self.jungle_grid.api_key)}"
+                )
+
+    async def _submit_and_monitor(self, execution: ProjectExecution, approved_by: str):
+        execution.submission_started = True
+        execution.approved_by = approved_by
+        try:
+            execution.submit_payload = build_submit_payload(execution.workload)
+            execution.secret_values = _collect_string_values(execution.submit_payload.get("environment"))
+            result = await self.jungle_grid.submit_job(execution.submit_payload)
+            job_id = str(result.get("job_id") or result.get("id") or "").strip()
+            if not job_id:
+                raise JungleGridError("INVALID_API_RESPONSE", "Jungle Grid submit response did not include a job id.")
+            execution.job_id = job_id
+            execution.last_status = str(result.get("status") or "submitted")
+            await self._set_artifact(
+                execution.project_id,
+                "jungle_grid_submission",
+                {
+                    "approved_by": approved_by,
+                    "estimate_id": execution.estimate_id,
+                    "submission": self._sanitize_for_project(result, execution),
+                },
+            )
+            await self._post(
+                execution.project_id,
+                f"Jungle Grid job submitted after approval by `{approved_by}`: `{job_id}` "
+                f"(status: `{lifecycle_label(execution.last_status)}`).",
+            )
+            task = asyncio.create_task(self._monitor(execution))
+            self.monitor_tasks[execution.project_id] = task
+        except (ValueError, JungleGridError) as exc:
+            await self._post(
+                execution.project_id,
+                f"Jungle Grid submission failed: {redact_sensitive(exc, self.jungle_grid.api_key)}",
+            )
+            await self.project_adapter.stop_project(
+                project_id=execution.project_id, reason="Jungle Grid submission failed"
+            )
+
+    async def _monitor(self, execution: ProjectExecution):
+        assert execution.job_id
+        try:
+            while True:
+                job = await self.jungle_grid.get_job(execution.job_id)
+                status = str(job.get("status") or "unknown")
+                if status != execution.last_status:
+                    execution.last_status = status
+                    await self._post(
+                        execution.project_id,
+                        f"Jungle Grid job `{execution.job_id}` is now `{lifecycle_label(status)}`.",
+                    )
+                if status in TERMINAL_STATUSES:
+                    await self._finalize(execution, job)
+                    return
+                await asyncio.sleep(self.poll_interval_seconds)
+        except JungleGridError as exc:
+            await self._post(
+                execution.project_id,
+                f"Jungle Grid monitoring failed: {redact_sensitive(exc, self.jungle_grid.api_key)}",
+            )
+            await self.project_adapter.stop_project(
+                project_id=execution.project_id, reason="Jungle Grid monitoring failed"
+            )
+        finally:
+            self.monitor_tasks.pop(execution.project_id, None)
+
+    async def _finalize(self, execution: ProjectExecution, job: Dict[str, Any]):
+        assert execution.job_id
+        logs: Dict[str, Any] = {}
+        artifacts: Dict[str, Any] = {}
+        downloads = []
+        try:
+            logs = await self.jungle_grid.get_job_logs(execution.job_id)
+        except JungleGridError as exc:
+            logs = {"error": redact_sensitive(exc, self.jungle_grid.api_key)}
+        try:
+            artifacts = await self.jungle_grid.list_artifacts(execution.job_id)
+            for artifact in artifacts.get("artifacts", []):
+                if not isinstance(artifact, dict):
+                    continue
+                artifact_id = str(artifact.get("artifact_id") or artifact.get("id") or "").strip()
+                if artifact_id:
+                    downloads.append(await self.jungle_grid.get_artifact(execution.job_id, artifact_id))
+        except JungleGridError as exc:
+            artifacts = {"error": redact_sensitive(exc, self.jungle_grid.api_key)}
+
+        result = self._sanitize_for_project(
+            {"job": job, "logs": logs, "artifacts": artifacts, "downloads": downloads},
+            execution,
+        )
+        await self._set_artifact(execution.project_id, "jungle_grid_result", result)
+        status = str(job.get("status") or "unknown")
+        await self._post(
+            execution.project_id,
+            f"Jungle Grid job `{execution.job_id}` finished with status `{status}`. "
+            "Logs and artifact metadata are stored in project artifact `jungle_grid_result`.",
+        )
+        if status == "completed":
+            await self.project_adapter.complete_project(
+                project_id=execution.project_id,
+                summary=f"Jungle Grid job {execution.job_id} completed successfully.",
+            )
+        else:
+            await self.project_adapter.stop_project(
+                project_id=execution.project_id,
+                reason=f"Jungle Grid job {execution.job_id} finished with status {status}.",
+            )
+
+
+async def main():
+    """Run the Jungle Grid executor agent."""
+    logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s")
+    agent = JungleGridExecutorAgent()
+    try:
+        await agent.async_start(network_host="localhost", network_port=8700)
+        while True:
+            await asyncio.sleep(3600)
+    finally:
+        await agent.async_stop()
+
+
+if __name__ == "__main__":
+    asyncio.run(main())
diff --git a/sdk/demos/09_jungle_grid_gpu_execution/network.yaml b/sdk/demos/09_jungle_grid_gpu_execution/network.yaml
new file mode 100644
index 000000000..dd935306c
--- /dev/null
+++ b/sdk/demos/09_jungle_grid_gpu_execution/network.yaml
@@ -0,0 +1,89 @@
+network:
+  name: JungleGridGPUExecution
+  mode: centralized
+  node_id: jungle-grid-gpu-execution-1
+  initialized: true
+  transports:
+    - type: http
+      config:
+        port: 8700
+        serve_studio: true
+        serve_mcp: true
+    - type: grpc
+      config:
+        port: 8600
+  manifest_transport: http
+  recommended_transport: grpc
+  encryption_enabled: false
+  default_agent_group: guest
+  requires_password: false
+  agent_groups:
+    executors:
+      description: Agents allowed to execute Jungle Grid project workflows
+      metadata:
+        permissions:
+          - execute_external_compute
+        agents:
+          - jungle-grid-executor
+  mods:
+    - name: openagents.mods.workspace.default
+      enabled: true
+      config:
+        custom_events_enabled: true
+    - name: openagents.mods.workspace.project
+      enabled: true
+      config:
+        max_concurrent_projects: 5
+        project_templates:
+          jungle_grid_execution:
+            name: Jungle Grid GPU Execution
+            description: Estimate, approve, execute, and monitor an AI workload on Jungle Grid
+            expose_as_tool: true
+            tool_name: run_jungle_grid_workload
+            tool_description: Start a Jungle Grid workload project. The task must be a JSON object with name, workload_type, and image; use environment_from_env for workload environment values.
+            tool_mode: async
+            agent_groups:
+              - executors
+            context: |
+              This project delegates a long-running AI or GPU workload to Jungle Grid.
+              The executor estimates cost first and will not submit a job until a human
+              replies with the exact approval command shown in the project. Do not put
+              credentials in the goal; use environment_from_env to reference variables
+              available only in the executor process.
+  created_by_version: 0.9.3
+
+network_profile:
+  discoverable: true
+  name: Jungle Grid GPU Execution
+  description: A demo of human-approved asynchronous AI and GPU workload delegation through Jungle Grid.
+  tags:
+    - demo
+    - jungle-grid
+    - gpu
+    - execution
+    - project
+  categories:
+    - demo
+    - workflow
+  country: Worldwide
+  required_openagents_version: 0.9.3
+  capacity: 10
+  authentication:
+    type: none
+  host: 0.0.0.0
+  port: 8700
+
+log_level: INFO
+data_dir: ./data/jungle-grid-gpu-execution
+runtime_limit: null
+shutdown_timeout: 30
+
+external_access:
+  default_agent_group: guest
+  auth_token: null
+  auth_token_env: null
+  instruction: null
+  exposed_tools:
+    - start_run_jungle_grid_workload
+    - get_result_run_jungle_grid_workload
+  excluded_tools: []
diff --git a/tests/agents/test_jungle_grid_executor.py b/tests/agents/test_jungle_grid_executor.py
new file mode 100644
index 000000000..288cc8fce
--- /dev/null
+++ b/tests/agents/test_jungle_grid_executor.py
@@ -0,0 +1,482 @@
+"""Mocked tests for the Jungle Grid GPU execution demo agent."""
+
+import asyncio
+import importlib.util
+import json
+from pathlib import Path
+from unittest.mock import AsyncMock
+
+import pytest
+
+from openagents.models.event import Event
+from openagents.models.event_context import EventContext
+
+MODULE_PATH = (
+    Path(__file__).parent.parent.parent
+    / "sdk"
+    / "demos"
+    / "09_jungle_grid_gpu_execution"
+    / "agents"
+    / "jungle_grid_executor.py"
+)
+SPEC = importlib.util.spec_from_file_location("jungle_grid_executor", MODULE_PATH)
+assert SPEC and SPEC.loader
+MODULE = importlib.util.module_from_spec(SPEC)
+SPEC.loader.exec_module(MODULE)
+
+JungleGridClient = MODULE.JungleGridClient
+JungleGridError = MODULE.JungleGridError
+JungleGridExecutorAgent = MODULE.JungleGridExecutorAgent
+ProjectExecution = MODULE.ProjectExecution
+build_estimate_payload = MODULE.build_estimate_payload
+build_submit_payload = MODULE.build_submit_payload
+estimate_can_submit = MODULE.estimate_can_submit
+lifecycle_label = MODULE.lifecycle_label
+parse_workload_goal = MODULE.parse_workload_goal
+public_workload = MODULE.public_workload
+redact_sensitive = MODULE.redact_sensitive
+sanitize_project_data = MODULE.sanitize_project_data
+
+
+def context(event_name, payload):
+    return EventContext(
+        incoming_event=Event(event_name=event_name, source_id="system", payload=payload),
+        event_threads={},
+        incoming_thread_id="thread-1",
+    )
+
+
+def workload():
+    return {
+        "name": "batch-demo",
+        "workload_type": "batch",
+        "image": "python:3.11-slim",
+        "command": "python",
+        "args": ["-c", "print(42)"],
+        "optimize_for": "cost",
+    }
+
+
+class FakeJungleGridClient:
+    def __init__(self):
+        self.api_key = "test-api-key"
+        self.estimate_job = AsyncMock(return_value={"available": True, "estimated_cost_usd": {"min": 0.1, "max": 0.2}})
+        self.submit_job = AsyncMock(return_value={"job_id": "job_123", "status": "queued"})
+        self.get_job = AsyncMock(return_value={"job_id": "job_123", "status": "completed"})
+        self.get_job_logs = AsyncMock(return_value={"items": [{"message": "done"}]})
+        self.cancel_job = AsyncMock(return_value={"job_id": "job_123", "status": "cancelled", "cancelled": True})
+        self.list_artifacts = AsyncMock(
+            return_value={"artifacts": [{"artifact_id": "artifact_1", "filename": "output.json"}]}
+        )
+        self.get_artifact = AsyncMock(
+            return_value={
+                "artifact": {"artifact_id": "artifact_1", "filename": "output.json"},
+                "url": "https://example.test/file",
+            }
+        )
+
+
+def agent_with_mocks(fake=None):
+    agent = JungleGridExecutorAgent(jungle_grid_client=fake or FakeJungleGridClient(), poll_interval_seconds=0)
+    agent.project_adapter = AsyncMock()
+    agent.project_adapter.send_project_message = AsyncMock(return_value={"success": True})
+    agent.project_adapter.set_project_artifact = AsyncMock(return_value={"success": True})
+    agent.project_adapter.complete_project = AsyncMock(return_value={"success": True})
+    agent.project_adapter.stop_project = AsyncMock(return_value={"success": True})
+    return agent
+
+
+@pytest.mark.asyncio
+async def test_successful_estimate_flow_posts_estimate_and_requires_approval():
+    fake = FakeJungleGridClient()
+    agent = agent_with_mocks(fake)
+
+    await agent.handle_project_started(
+        context("project.notification.started", {"project_id": "project-1", "goal": json.dumps(workload())})
+    )
+
+    fake.estimate_job.assert_awaited_once_with(build_estimate_payload(workload()))
+    fake.submit_job.assert_not_awaited()
+    assert "project-1" in agent.executions
+    message = agent.project_adapter.send_project_message.await_args.kwargs["content"]["text"]
+    assert "No job has been submitted" in message
+    assert "APPROVE" in message
+
+
+@pytest.mark.asyncio
+async def test_unavailable_estimate_never_requests_approval_or_submits():
+    fake = FakeJungleGridClient()
+    fake.estimate_job = AsyncMock(return_value={"available": False, "can_submit": False})
+    agent = agent_with_mocks(fake)
+
+    await agent.handle_project_started(
+        context("project.notification.started", {"project_id": "project-1", "goal": json.dumps(workload())})
+    )
+
+    fake.submit_job.assert_not_awaited()
+    message = agent.project_adapter.send_project_message.await_args.kwargs["content"]["text"]
+    assert "not currently eligible for submission" in message
+    assert "APPROVE" not in message
+    agent.project_adapter.stop_project.assert_awaited_once()
+
+
+@pytest.mark.asyncio
+async def test_approval_required_before_submit_and_non_human_is_rejected():
+    fake = FakeJungleGridClient()
+    agent = agent_with_mocks(fake)
+    execution = ProjectExecution("project-1", workload(), "estimate-1", {"available": True})
+    agent.executions["project-1"] = execution
+
+    await agent.handle_project_message(
+        context(
+            "project.notification.message_received",
+            {"project_id": "project-1", "sender_id": "agent:other", "content": {"text": "APPROVE estimate-1"}},
+        )
+    )
+
+    fake.submit_job.assert_not_awaited()
+    assert (
+        "requires a human approver" in agent.project_adapter.send_project_message.await_args.kwargs["content"]["text"]
+    )
+
+
+@pytest.mark.asyncio
+@pytest.mark.parametrize("command", ["APPROVE estimate-2", " APPROVE estimate-1", "APPROVE estimate-1\n"])
+async def test_approval_requires_exact_command(command):
+    fake = FakeJungleGridClient()
+    agent = agent_with_mocks(fake)
+    execution = ProjectExecution("project-1", workload(), "estimate-1", {"available": True})
+    agent.executions["project-1"] = execution
+
+    await agent.handle_project_message(
+        context(
+            "project.notification.message_received",
+            {"project_id": "project-1", "sender_id": "human:user", "content": {"text": command}},
+        )
+    )
+
+    fake.submit_job.assert_not_awaited()
+
+
+@pytest.mark.asyncio
+async def test_approved_submit_flow_starts_monitor():
+    fake = FakeJungleGridClient()
+    agent = agent_with_mocks(fake)
+    execution = ProjectExecution("project-1", workload(), "estimate-1", {"available": True})
+    agent.executions["project-1"] = execution
+    agent._monitor = AsyncMock()
+
+    await agent.handle_project_message(
+        context(
+            "project.notification.message_received",
+            {"project_id": "project-1", "sender_id": "human:user", "content": {"text": "APPROVE estimate-1"}},
+        )
+    )
+    await asyncio.sleep(0)
+
+    fake.submit_job.assert_awaited_once_with(workload())
+    assert execution.job_id == "job_123"
+    agent._monitor.assert_awaited_once_with(execution)
+
+
+@pytest.mark.asyncio
+async def test_concurrent_matching_approvals_submit_only_once():
+    fake = FakeJungleGridClient()
+    submit_started = asyncio.Event()
+    release_submit = asyncio.Event()
+
+    async def delayed_submit(_workload):
+        submit_started.set()
+        await release_submit.wait()
+        return {"job_id": "job_123", "status": "queued"}
+
+    fake.submit_job = AsyncMock(side_effect=delayed_submit)
+    agent = agent_with_mocks(fake)
+    agent._monitor = AsyncMock()
+    execution = ProjectExecution("project-1", workload(), "estimate-1", {"available": True})
+    agent.executions["project-1"] = execution
+    approval = context(
+        "project.notification.message_received",
+        {"project_id": "project-1", "sender_id": "human:user", "content": {"text": "APPROVE estimate-1"}},
+    )
+
+    first = asyncio.create_task(agent.handle_project_message(approval))
+    await submit_started.wait()
+    await agent.handle_project_message(approval)
+    release_submit.set()
+    await first
+    await asyncio.sleep(0)
+
+    fake.submit_job.assert_awaited_once_with(workload())
+
+
+@pytest.mark.asyncio
+async def test_status_polling_posts_updates_and_completes():
+    fake = FakeJungleGridClient()
+    fake.get_job = AsyncMock(
+        side_effect=[
+            {"job_id": "job_123", "status": "running"},
+            {"job_id": "job_123", "status": "completed"},
+        ]
+    )
+    agent = agent_with_mocks(fake)
+    execution = ProjectExecution("project-1", workload(), "estimate-1", {}, job_id="job_123", last_status="queued")
+
+    await agent._monitor(execution)
+
+    texts = [call.kwargs["content"]["text"] for call in agent.project_adapter.send_project_message.await_args_list]
+    assert any("`running`" in text for text in texts)
+    assert any("`completed`" in text for text in texts)
+    agent.project_adapter.complete_project.assert_awaited_once()
+
+
+@pytest.mark.asyncio
+async def test_failed_workload_stops_project():
+    fake = FakeJungleGridClient()
+    agent = agent_with_mocks(fake)
+    execution = ProjectExecution("project-1", workload(), "estimate-1", {}, job_id="job_123")
+
+    await agent._finalize(execution, {"job_id": "job_123", "status": "failed"})
+
+    agent.project_adapter.stop_project.assert_awaited_once()
+    agent.project_adapter.complete_project.assert_not_awaited()
+
+
+@pytest.mark.asyncio
+async def test_logs_and_artifacts_are_stored_in_project_artifact():
+    fake = FakeJungleGridClient()
+    agent = agent_with_mocks(fake)
+    execution = ProjectExecution("project-1", workload(), "estimate-1", {}, job_id="job_123")
+
+    await agent._finalize(execution, {"job_id": "job_123", "status": "completed"})
+
+    fake.get_job_logs.assert_awaited_once_with("job_123")
+    fake.list_artifacts.assert_awaited_once_with("job_123")
+    fake.get_artifact.assert_awaited_once_with("job_123", "artifact_1")
+    artifact_call = agent.project_adapter.set_project_artifact.await_args
+    assert artifact_call.kwargs["key"] == "jungle_grid_result"
+    assert "output.json" in artifact_call.kwargs["value"]
+
+
+@pytest.mark.asyncio
+async def test_resolved_environment_values_are_redacted_from_results(monkeypatch):
+    monkeypatch.setenv("MODEL_TOKEN", "secret-value")
+    fake = FakeJungleGridClient()
+    fake.get_job_logs = AsyncMock(return_value={"items": [{"message": "token=secret-value"}]})
+    agent = agent_with_mocks(fake)
+    requested = {**workload(), "environment_from_env": {"MODEL_TOKEN": "MODEL_TOKEN"}}
+    execution = ProjectExecution(
+        "project-1",
+        requested,
+        "estimate-1",
+        {},
+        job_id="job_123",
+        submit_payload=build_submit_payload(requested),
+        secret_values=["secret-value"],
+    )
+
+    await agent._finalize(execution, {"job_id": "job_123", "status": "completed"})
+
+    artifact_value = agent.project_adapter.set_project_artifact.await_args.kwargs["value"]
+    assert "secret-value" not in artifact_value
+    assert "[REDACTED]" in artifact_value
+
+
+@pytest.mark.asyncio
+async def test_cancellation_uses_matching_job_id():
+    fake = FakeJungleGridClient()
+    agent = agent_with_mocks(fake)
+    agent.executions["project-1"] = ProjectExecution("project-1", workload(), "estimate-1", {}, job_id="job_123")
+
+    await agent.handle_project_message(
+        context(
+            "project.notification.message_received",
+            {"project_id": "project-1", "sender_id": "human:user", "content": {"text": "CANCEL job_123"}},
+        )
+    )
+
+    fake.cancel_job.assert_awaited_once_with("job_123", "Requested from OpenAgents by human:user")
+
+
+@pytest.mark.asyncio
+async def test_non_human_cancellation_is_rejected():
+    fake = FakeJungleGridClient()
+    agent = agent_with_mocks(fake)
+    agent.executions["project-1"] = ProjectExecution("project-1", workload(), "estimate-1", {}, job_id="job_123")
+
+    await agent.handle_project_message(
+        context(
+            "project.notification.message_received",
+            {"project_id": "project-1", "sender_id": "agent:other", "content": {"text": "CANCEL job_123"}},
+        )
+    )
+
+    fake.cancel_job.assert_not_awaited()
+
+
+@pytest.mark.asyncio
+@pytest.mark.parametrize("command", ["CANCEL job_456", " CANCEL job_123", "CANCEL job_123\n"])
+async def test_cancellation_requires_exact_command(command):
+    fake = FakeJungleGridClient()
+    agent = agent_with_mocks(fake)
+    agent.executions["project-1"] = ProjectExecution("project-1", workload(), "estimate-1", {}, job_id="job_123")
+
+    await agent.handle_project_message(
+        context(
+            "project.notification.message_received",
+            {"project_id": "project-1", "sender_id": "human:user", "content": {"text": command}},
+        )
+    )
+
+    fake.cancel_job.assert_not_awaited()
+
+
+@pytest.mark.asyncio
+async def test_missing_api_key_is_reported_without_network_call(monkeypatch):
+    monkeypatch.delenv("JUNGLE_GRID_API_KEY", raising=False)
+    client = JungleGridClient()
+    with pytest.raises(JungleGridError, match="JUNGLE_GRID_API_KEY is required"):
+        await client.estimate_job(workload())
+
+
+def test_invalid_workload_is_rejected():
+    with pytest.raises(ValueError, match="Missing required workload fields"):
+        parse_workload_goal('{"workload_type": "batch"}')
+
+
+def test_workload_rejects_literal_credentials_and_secret_like_metadata():
+    with pytest.raises(ValueError, match="must not contain API keys"):
+        parse_workload_goal(json.dumps({**workload(), "command": "curl -H 'Bearer secret-value'"}))
+    with pytest.raises(ValueError, match="secret-like keys"):
+        parse_workload_goal(json.dumps({**workload(), "metadata": {"api_token": "secret-value"}}))
+
+
+def test_build_submit_payload_resolves_environment_only_at_submission(monkeypatch):
+    monkeypatch.setenv("MODEL_TOKEN", "secret-value")
+    requested = {**workload(), "environment_from_env": {"MODEL_TOKEN": "MODEL_TOKEN"}}
+
+    assert "environment_from_env" not in build_estimate_payload(requested)
+    assert build_submit_payload(requested)["environment"] == {"MODEL_TOKEN": "secret-value"}
+    assert public_workload(requested)["environment_from_env"] == {"MODEL_TOKEN": "MODEL_TOKEN"}
+
+
+def test_build_submit_payload_rejects_missing_local_environment(monkeypatch):
+    monkeypatch.delenv("MISSING_MODEL_TOKEN", raising=False)
+    requested = {**workload(), "environment_from_env": {"MODEL_TOKEN": "MISSING_MODEL_TOKEN"}}
+
+    with pytest.raises(ValueError, match="MISSING_MODEL_TOKEN"):
+        build_submit_payload(requested)
+
+
+def test_secret_redaction_removes_api_keys_and_bearer_tokens():
+    text = redact_sensitive("failed with Bearer abc123 and jg_super_secret", "jg_super_secret")
+    assert "abc123" not in text
+    assert "jg_super_secret" not in text
+    assert "[REDACTED]" in text
+
+
+def test_public_workload_redacts_metadata_values():
+    shared = public_workload({**workload(), "metadata": {"nested": {"value": "secret"}}})
+    assert shared["metadata"] == {"nested": "[REDACTED]"}
+    assert "secret" not in json.dumps(shared)
+
+
+def test_project_data_redaction_removes_nested_workload_secrets():
+    result = sanitize_project_data(
+        {"logs": [{"message": "token=secret-value"}], "error": "Bearer test-api-key"},
+        ["secret-value", "test-api-key"],
+    )
+    assert "secret-value" not in json.dumps(result)
+    assert "test-api-key" not in json.dumps(result)
+
+
+def test_estimate_can_submit_honors_explicit_unavailability():
+    assert estimate_can_submit({"available": True, "can_submit": True})
+    assert not estimate_can_submit({"available": False})
+    assert not estimate_can_submit({"can_submit": False})
+
+
+@pytest.mark.parametrize(
+    ("status", "label"),
+    [
+        ("submitted", "submitted"),
+        ("queued", "queued"),
+        ("assigned", "assigned (provisioning)"),
+        ("running", "running"),
+        ("completed", "completed"),
+        ("failed", "failed"),
+        ("rejected", "rejected"),
+        ("cancelled", "cancelled"),
+    ],
+)
+def test_lifecycle_labels(status, label):
+    assert lifecycle_label(status) == label
+
+
+class FakeResponse:
+    def __init__(self, status, text):
+        self.status = status
+        self._text = text
+
+    async def text(self):
+        return self._text
+
+    async def __aenter__(self):
+        return self
+
+    async def __aexit__(self, exc_type, exc, tb):
+        return None
+
+
+class FakeSession:
+    def __init__(self, response=None, error=None, **kwargs):
+        self.response = response
+        self.error = error
+
+    def request(self, *args, **kwargs):
+        if self.error:
+            raise self.error
+        return self.response
+
+    async def __aenter__(self):
+        return self
+
+    async def __aexit__(self, exc_type, exc, tb):
+        return None
+
+
+@pytest.mark.asyncio
+async def test_invalid_jungle_grid_response(monkeypatch):
+    monkeypatch.setenv("JUNGLE_GRID_API_KEY", "test-api-key")
+    monkeypatch.setattr(MODULE.aiohttp, "ClientSession", lambda **kwargs: FakeSession(FakeResponse(200, "not-json")))
+    client = JungleGridClient()
+
+    with pytest.raises(JungleGridError, match="invalid JSON"):
+        await client.get_job("job_123")
+
+
+@pytest.mark.asyncio
+async def test_network_timeout_is_sanitized(monkeypatch):
+    monkeypatch.setenv("JUNGLE_GRID_API_KEY", "test-api-key")
+    monkeypatch.setattr(
+        MODULE.aiohttp,
+        "ClientSession",
+        lambda **kwargs: FakeSession(error=asyncio.TimeoutError()),
+    )
+    client = JungleGridClient()
+
+    with pytest.raises(JungleGridError, match="timed out"):
+        await client.get_job("job_123")
+
+
+@pytest.mark.asyncio
+async def test_api_error_is_sanitized(monkeypatch):
+    monkeypatch.setenv("JUNGLE_GRID_API_KEY", "test-api-key")
+    body = json.dumps({"error": {"code": "FORBIDDEN", "message": "Bearer test-api-key is not allowed"}})
+    monkeypatch.setattr(MODULE.aiohttp, "ClientSession", lambda **kwargs: FakeSession(FakeResponse(403, body)))
+    client = JungleGridClient()
+
+    with pytest.raises(JungleGridError) as exc_info:
+        await client.get_job("job_123")
+    assert exc_info.value.code == "FORBIDDEN"
+    assert "test-api-key" not in str(exc_info.value)

From 7d6c00d375c63986ac5ff8766000739831a9e279 Mon Sep 17 00:00:00 2001
From: dejaguarkyng <deinvinciblekyng.1@gmail.com>
Date: Tue, 9 Jun 2026 09:48:52 +0000
Subject: [PATCH 2/5] fix: update jungle grid executor group auth

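Illustrative note: this change relies on runtime group membership being
decided by the password hash an agent presents when it connects, not by the
optional `metadata.agents` list. The block below is a minimal sketch of that
idea only. It assumes a SHA-256 hex digest, and the helper names
`hash_password` and `resolve_group` are hypothetical; it is not the OpenAgents
implementation, and the real hashing scheme may differ.

```python
import hashlib
import hmac
from typing import Dict, Optional


def hash_password(password: str) -> str:
    """Hypothetical scheme: SHA-256 hex digest (an assumption, not the real one)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def resolve_group(
    presented_hash: Optional[str],
    group_hashes: Dict[str, str],
    default_group: str = "guest",
) -> str:
    """Return the group whose configured hash matches, else the default group."""
    if presented_hash:
        for group, expected in group_hashes.items():
            # Constant-time comparison avoids leaking how many characters matched.
            if hmac.compare_digest(presented_hash, expected):
                return group
    return default_group


# Demo-only password; a real deployment would keep this out of source control.
groups = {"executors": hash_password("demo-only-password")}
print(resolve_group(hash_password("demo-only-password"), groups))
print(resolve_group(None, groups))
```

In this sketch an agent that presents no hash, or a hash that matches no
group, falls back to the default group, so it would not be eligible for a
template restricted to `executors`.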
---
 .../IMPLEMENTATION_DECISION.md                |  28 ++--
 .../09_jungle_grid_gpu_execution/README.md    |  54 +++++--
 .../agents/jungle_grid_executor.py            |  52 +++++--
 .../09_jungle_grid_gpu_execution/network.yaml |   5 +-
 tests/agents/test_jungle_grid_executor.py     | 144 +++++++++++++++++-
 5 files changed, 250 insertions(+), 33 deletions(-)

diff --git a/sdk/demos/09_jungle_grid_gpu_execution/IMPLEMENTATION_DECISION.md b/sdk/demos/09_jungle_grid_gpu_execution/IMPLEMENTATION_DECISION.md
index 41375c1d0..05857178a 100644
--- a/sdk/demos/09_jungle_grid_gpu_execution/IMPLEMENTATION_DECISION.md
+++ b/sdk/demos/09_jungle_grid_gpu_execution/IMPLEMENTATION_DECISION.md
@@ -7,11 +7,11 @@ uses OpenAgents' project mod for the long-running workflow, project messages for
 estimate and lifecycle updates, and project artifacts for logs and Jungle Grid
 artifact metadata.
 
-Jungle Grid is an external execution layer, not an OpenAgents transport, launcher
-agent type, or network mod. A demo keeps the integration provider-specific while
-showing a reusable OpenAgents pattern: an agent delegates asynchronous compute,
-waits for human approval before billable work, and returns results to a shared
-project.
+Jungle Grid is an external agentic AI workload execution and GPU orchestration
+layer, not an OpenAgents transport, launcher agent type, or network mod. A demo
+keeps the integration provider-specific while showing a reusable OpenAgents
+pattern: an agent delegates asynchronous compute, waits for human approval
+before billable work, and returns results to a shared project.
 
 ## Rejected Alternatives
 
@@ -21,10 +21,9 @@ project.
   provider-specific compute backend.
 - **Jungle Grid mod:** The integration does not add network-wide event semantics or
   shared infrastructure. Existing project events already cover the workflow.
-- **Hosted MCP entry:** OpenAgents can load external MCP tools, but the current
-  Streamable HTTP MCP connector does not perform Jungle Grid's hosted OAuth flow or
-  attach API-key headers. Adding that capability solely for this demo would be a
-  core architecture change.
+- **Hosted MCP entry:** Jungle Grid's hosted Streamable HTTP endpoint uses OAuth,
+  while local stdio uses an API key. The direct REST integration keeps approval
+  and project state inside OpenAgents without requiring an MCP auth change.
 - **Local stdio MCP dependency:** The Jungle Grid stdio MCP package is supported,
   but a direct Python API client is easier to validate, test, and constrain around
   mandatory human approval. It also avoids requiring Node.js for a Python demo.
@@ -36,15 +35,24 @@ The demo uses the documented public execution API:
 - `POST /v1/jobs/estimate`
 - `POST /v1/jobs`
 - `GET /v1/jobs/{job_id}`
+- `GET /v1/jobs/{job_id}/runtime`
 - `GET /v1/jobs/{job_id}/logs`
 - `POST /v1/jobs/{job_id}/cancel`
 - `GET /v1/jobs/{job_id}/artifacts`
 - `POST /v1/jobs/{job_id}/artifacts/{artifact_id}/download`
 
-Authentication is a scoped server-side API key in `JUNGLE_GRID_API_KEY`. The
+Authentication is a scoped server-side API key in `JUNGLE_GRID_API_KEY`; the
+REST base can be overridden with `JUNGLE_GRID_API`. The
 documented lifecycle includes `pending`, `queued`, `assigned`, `running`,
 `completed`, `failed`, `rejected`, and `cancelled`.
 
+The current REST request shape includes `model_size_gb`. Estimate responses
+describe classification, routing, capacity, rates, cost ranges, queue waits,
+start windows, warnings, and screening without starting compute. Managed
+workloads can publish regular files from `/workspace/artifacts`; temporary
+signed artifact download URLs are treated as secrets and are not stored in the
+OpenAgents project.
+
 Workload environment values are not accepted in project goals. A goal may use
 `environment_from_env` to reference variables available only in the executor
 process; those values are resolved after human approval and are excluded from
diff --git a/sdk/demos/09_jungle_grid_gpu_execution/README.md b/sdk/demos/09_jungle_grid_gpu_execution/README.md
index a9841b8df..599cf77ab 100644
--- a/sdk/demos/09_jungle_grid_gpu_execution/README.md
+++ b/sdk/demos/09_jungle_grid_gpu_execution/README.md
@@ -1,8 +1,9 @@
 # Jungle Grid GPU Execution Demo
 
 This demo shows an OpenAgents execution agent delegating long-running AI and GPU
-workloads to [Jungle Grid](https://junglegrid.dev), an execution layer that
-places and runs AI workloads without requiring agents to manage GPU servers.
+workloads to [Jungle Grid](https://junglegrid.dev), an agentic AI workload
+execution and GPU orchestration layer that classifies intent, resolves capacity,
+and places workloads without requiring agents to manage GPU servers.
 
 The workflow fits OpenAgents because the workload is asynchronous and
 collaborative: an agent estimates the job, a human approves spending in the
@@ -30,7 +31,7 @@ approval, immediately before submission.
 
 - `JUNGLE_GRID_API_KEY` is required. The agent reads this server-side API key and
   sends it only as a Bearer token to Jungle Grid.
-- `JUNGLEGRID_API_BASE` optionally overrides the default API base,
+- `JUNGLE_GRID_API` optionally overrides the default REST API base,
   `https://api.junglegrid.dev`.
 - Any workload-specific variables referenced by `environment_from_env` must also
   be exported in the executor process. Their values are never placed in the
@@ -54,6 +55,10 @@ export JUNGLE_GRID_API_KEY="jg_..."
 
 ## Run The Demo
 
+The current demo assumes exactly one executor. Run one
+`jungle-grid-executor` process so a project is estimated and submitted at most
+once.
+
 Start the OpenAgents network from this demo directory. The network enables the
 project mod and exposes the `Jungle Grid GPU Execution` project template:
 
@@ -70,6 +75,13 @@ cd sdk/demos/09_jungle_grid_gpu_execution
 python agents/jungle_grid_executor.py
 ```
 
+The script connects with the password hash configured for the `executors`
+group. On registration, OpenAgents records the agent's group in
+`network.topology.agent_group_membership`, the runtime mapping the project mod
+reads to find group members. The optional `metadata.agents` list in an
+agent-group configuration does not assign runtime membership, so this demo
+intentionally omits it.
+
 Open Studio at `http://localhost:8700/studio`, create a project with the
 `Jungle Grid GPU Execution` template, and use a JSON object as the project goal.
 For example:
@@ -79,14 +91,18 @@ For example:
   "name": "openagents-batch-demo",
   "workload_type": "batch",
   "image": "python:3.11-slim",
+  "model_size_gb": 1,
   "command": "python",
   "args": ["-c", "print('hello from Jungle Grid')"],
   "optimize_for": "cost"
 }
 ```
 
-The agent validates the request and calls Jungle Grid's estimate endpoint. It
-posts the structured estimate and stores it as project artifact
+The agent validates the request and calls the read-only
+`POST /v1/jobs/estimate` endpoint. Current estimates include workload
+classification, routing and capacity signals, hourly and total cost ranges,
+queue-wait ranges, estimated start windows, warnings, and screening details.
+The executor posts that structured estimate and stores it as project artifact
 `jungle_grid_estimate`. No compute has been submitted at this point.
 
 For a workload that needs a credential or other environment value, export it in
@@ -101,6 +117,7 @@ export MODEL_TOKEN="..."
   "name": "openagents-inference-demo",
   "workload_type": "inference",
   "image": "example/model-server:latest",
+  "model_size_gb": 7,
   "environment_from_env": {
     "MODEL_TOKEN": "MODEL_TOKEN"
   },
@@ -120,10 +137,16 @@ the agent. Estimates that explicitly report `available: false` or
 APPROVE <estimate-id>
 ```
 
-After approval, the agent submits the workload, posts status changes such as
-submitted, queued, assigned/provisioning, running, completed, failed, rejected,
-or cancelled, and stores the final job details, logs, artifact list, and
-temporary download metadata in project artifact `jungle_grid_result`.
+After approval, the agent submits with `POST /v1/jobs`, polls
+`GET /v1/jobs/{job_id}`, and posts public lifecycle changes: pending, queued,
+assigned, running, completed, failed, rejected, or cancelled. Once the job is
+terminal, it fetches runtime details from `GET /v1/jobs/{job_id}/runtime`, the
+latest 100 stored log entries, and the managed artifact list. Regular files that
+managed workloads write under `/workspace/artifacts` are eligible for automatic upload.
+
+Artifact download requests mint temporary signed URLs. The executor requests
+download metadata but redacts the URL before storing `jungle_grid_result`; do
+not log or share signed URLs.
 
 To cancel a submitted job, reply with the exact job ID:
 
@@ -142,6 +165,19 @@ invalid Jungle Grid responses, and API errors are posted to the project in
 sanitized form. Failed, rejected, or cancelled jobs stop the OpenAgents project.
 Completed jobs complete the project.
 
+The API key needs `jobs:estimate`, `jobs:submit`, `jobs:read`, and `logs:read`
+capabilities for the complete flow.
+
+## Jungle Grid Interfaces
+
+This demo calls the REST API directly so OpenAgents can enforce project-based
+human approval. Jungle Grid also provides the `jungle` CLI, whose `submit`
+command estimates and asks for confirmation before queuing, and a hosted MCP
+endpoint at `https://mcp.junglegrid.dev/mcp`. Hosted MCP uses OAuth; local stdio
+MCP uses `JUNGLE_GRID_API_KEY`. The current MCP tools are `estimate_job`,
+`submit_job`, `list_jobs`, `get_job`, `get_job_logs`, `cancel_job`,
+`list_artifacts`, and `get_artifact`.
+
 ## Tests
 
 Run the focused mocked tests. They do not contact Jungle Grid or submit paid
diff --git a/sdk/demos/09_jungle_grid_gpu_execution/agents/jungle_grid_executor.py b/sdk/demos/09_jungle_grid_gpu_execution/agents/jungle_grid_executor.py
index 6e8a369c2..23348120b 100644
--- a/sdk/demos/09_jungle_grid_gpu_execution/agents/jungle_grid_executor.py
+++ b/sdk/demos/09_jungle_grid_gpu_execution/agents/jungle_grid_executor.py
@@ -20,26 +20,32 @@
 logger = logging.getLogger(__name__)
 
 DEFAULT_API_BASE = "https://api.junglegrid.dev"
+EXECUTORS_GROUP_PASSWORD_HASH = "8fba13dab71d6fdd8a9b9db1f06e81315dfbfd69167b6097f724604db3c91cdf"
 TERMINAL_STATUSES = {"completed", "failed", "rejected", "cancelled"}
-VALID_WORKLOAD_TYPES = {"inference", "training", "fine-tuning", "batch"}
+VALID_WORKLOAD_TYPES = {"inference", "training", "batch"}
 VALID_OPTIMIZE_FOR = {"balanced", "cost", "speed"}
 SUBMIT_FIELDS = {
     "name",
     "workload_type",
     "image",
+    "model_size_gb",
     "command",
     "args",
     "environment_from_env",
     "optimize_for",
+    "constraints",
     "template",
     "metadata",
 }
 ESTIMATE_FIELDS = {
+    "name",
     "workload_type",
     "image",
+    "model_size_gb",
     "command",
     "args",
     "optimize_for",
+    "constraints",
     "template",
 }
 SENSITIVE_PATTERN = re.compile(r"(?i)(bearer\s+)[^\s,;]+|jg_[A-Za-z0-9_-]+")
@@ -116,8 +122,14 @@ def _error_detail(data: Any, status: int) -> tuple[str, str]:
     if isinstance(data, dict):
         nested = data.get("error")
         if isinstance(nested, dict):
-            return str(nested.get("code") or "API_ERROR"), str(nested.get("message") or f"HTTP {status}")
-        return str(data.get("code") or "API_ERROR"), str(data.get("message") or f"HTTP {status}")
+            return (
+                redact_sensitive(nested.get("code") or "API_ERROR"),
+                redact_sensitive(nested.get("message") or f"HTTP {status}"),
+            )
+        return (
+            redact_sensitive(data.get("code") or "API_ERROR"),
+            redact_sensitive(data.get("message") or f"HTTP {status}"),
+        )
     return "API_ERROR", f"HTTP {status}"
 
 
@@ -129,7 +141,7 @@ def __init__(
         api_base: Optional[str] = None,
         timeout_seconds: float = 30.0,
     ):
-        raw_api_base = api_base if api_base is not None else os.getenv("JUNGLEGRID_API_BASE", DEFAULT_API_BASE)
+        raw_api_base = api_base if api_base is not None else os.getenv("JUNGLE_GRID_API", DEFAULT_API_BASE)
         self.api_key = os.getenv("JUNGLE_GRID_API_KEY", "").strip()
         self.api_base = raw_api_base.rstrip("/")
         self.timeout_seconds = timeout_seconds
@@ -159,7 +171,11 @@ async def _request(self, method: str, path: str, payload: Optional[Dict[str, Any
                         ) from exc
                     if response.status < 200 or response.status >= 300:
                         code, message = _error_detail(data, response.status)
-                        raise JungleGridError(code, redact_sensitive(message, api_key), response.status)
+                        raise JungleGridError(
+                            redact_sensitive(code, api_key),
+                            redact_sensitive(message, api_key),
+                            response.status,
+                        )
         except asyncio.TimeoutError as exc:
             raise JungleGridError("NETWORK_TIMEOUT", "Jungle Grid request timed out.") from exc
         except aiohttp.ClientError as exc:
@@ -179,8 +195,11 @@ async def submit_job(self, workload: Dict[str, Any]) -> Dict[str, Any]:
     async def get_job(self, job_id: str) -> Dict[str, Any]:
         return await self._request("GET", f"/v1/jobs/{quote(job_id, safe='')}")
 
+    async def get_job_runtime(self, job_id: str) -> Dict[str, Any]:
+        return await self._request("GET", f"/v1/jobs/{quote(job_id, safe='')}/runtime")
+
     async def get_job_logs(self, job_id: str) -> Dict[str, Any]:
-        return await self._request("GET", f"/v1/jobs/{quote(job_id, safe='')}/logs")
+        return await self._request("GET", f"/v1/jobs/{quote(job_id, safe='')}/logs?tail=100")
 
     async def cancel_job(self, job_id: str, reason: str) -> Dict[str, Any]:
         return await self._request("POST", f"/v1/jobs/{quote(job_id, safe='')}/cancel", {"reason": reason})
@@ -217,6 +236,9 @@ def parse_workload_goal(goal: str) -> Dict[str, Any]:
     missing = sorted(key for key in required if not isinstance(workload.get(key), str) or not workload[key].strip())
     if missing:
         raise ValueError(f"Missing required workload fields: {', '.join(missing)}.")
+    model_size_gb = workload.get("model_size_gb")
+    if not isinstance(model_size_gb, (int, float)) or isinstance(model_size_gb, bool) or model_size_gb <= 0:
+        raise ValueError("model_size_gb must be a positive number.")
     if workload["workload_type"] not in VALID_WORKLOAD_TYPES:
         raise ValueError(f"workload_type must be one of: {', '.join(sorted(VALID_WORKLOAD_TYPES))}.")
     if "optimize_for" in workload and workload["optimize_for"] not in VALID_OPTIMIZE_FOR:
@@ -514,9 +536,14 @@ async def _monitor(self, execution: ProjectExecution):
 
     async def _finalize(self, execution: ProjectExecution, job: Dict[str, Any]):
         assert execution.job_id
+        runtime: Dict[str, Any] = {}
         logs: Dict[str, Any] = {}
         artifacts: Dict[str, Any] = {}
         downloads = []
+        try:
+            runtime = await self.jungle_grid.get_job_runtime(execution.job_id)
+        except JungleGridError as exc:
+            runtime = {"error": redact_sensitive(exc, self.jungle_grid.api_key)}
         try:
             logs = await self.jungle_grid.get_job_logs(execution.job_id)
         except JungleGridError as exc:
@@ -528,12 +555,15 @@ async def _finalize(self, execution: ProjectExecution, job: Dict[str, Any]):
                     continue
                 artifact_id = str(artifact.get("artifact_id") or artifact.get("id") or "").strip()
                 if artifact_id:
-                    downloads.append(await self.jungle_grid.get_artifact(execution.job_id, artifact_id))
+                    download = await self.jungle_grid.get_artifact(execution.job_id, artifact_id)
+                    if "url" in download:
+                        download = {**download, "url": "[REDACTED]"}
+                    downloads.append(download)
         except JungleGridError as exc:
             artifacts = {"error": redact_sensitive(exc, self.jungle_grid.api_key)}
 
         result = self._sanitize_for_project(
-            {"job": job, "logs": logs, "artifacts": artifacts, "downloads": downloads},
+            {"job": job, "runtime": runtime, "logs": logs, "artifacts": artifacts, "downloads": downloads},
             execution,
         )
         await self._set_artifact(execution.project_id, "jungle_grid_result", result)
@@ -560,7 +590,11 @@ async def main():
     logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s")
     agent = JungleGridExecutorAgent()
     try:
-        await agent.async_start(network_host="localhost", network_port=8700)
+        await agent.async_start(
+            network_host="localhost",
+            network_port=8700,
+            password_hash=EXECUTORS_GROUP_PASSWORD_HASH,
+        )
         while True:
             await asyncio.sleep(3600)
     finally:
diff --git a/sdk/demos/09_jungle_grid_gpu_execution/network.yaml b/sdk/demos/09_jungle_grid_gpu_execution/network.yaml
index dd935306c..30c0f5012 100644
--- a/sdk/demos/09_jungle_grid_gpu_execution/network.yaml
+++ b/sdk/demos/09_jungle_grid_gpu_execution/network.yaml
@@ -20,11 +20,10 @@ network:
   agent_groups:
     executors:
       description: Agents allowed to execute Jungle Grid project workflows
+      password_hash: 8fba13dab71d6fdd8a9b9db1f06e81315dfbfd69167b6097f724604db3c91cdf
       metadata:
         permissions:
           - execute_external_compute
-        agents:
-          - jungle-grid-executor
   mods:
     - name: openagents.mods.workspace.default
       enabled: true
@@ -40,7 +39,7 @@ network:
             description: Estimate, approve, execute, and monitor an AI workload on Jungle Grid
             expose_as_tool: true
             tool_name: run_jungle_grid_workload
-            tool_description: Start a Jungle Grid workload project. The task must be a JSON object with name, workload_type, and image; use environment_from_env for workload environment values.
+            tool_description: Start a Jungle Grid workload project. The task must be a JSON object with name, workload_type, image, and model_size_gb; use environment_from_env for workload environment values.
             tool_mode: async
             agent_groups:
               - executors
diff --git a/tests/agents/test_jungle_grid_executor.py b/tests/agents/test_jungle_grid_executor.py
index 288cc8fce..aa9bf248e 100644
--- a/tests/agents/test_jungle_grid_executor.py
+++ b/tests/agents/test_jungle_grid_executor.py
@@ -4,12 +4,18 @@
 import importlib.util
 import json
 from pathlib import Path
+from types import SimpleNamespace
 from unittest.mock import AsyncMock
 
 import pytest
+import yaml
 
+from openagents.core.network import AgentNetwork
 from openagents.models.event import Event
 from openagents.models.event_context import EventContext
+from openagents.models.network_config import AgentGroupConfig, NetworkConfig
+from openagents.models.transport import TransportType
+from openagents.mods.workspace.project.mod import DefaultProjectNetworkMod
 
 MODULE_PATH = (
     Path(__file__).parent.parent.parent
@@ -19,6 +25,7 @@
     / "agents"
     / "jungle_grid_executor.py"
 )
+NETWORK_CONFIG_PATH = MODULE_PATH.parent.parent / "network.yaml"
 SPEC = importlib.util.spec_from_file_location("jungle_grid_executor", MODULE_PATH)
 MODULE = importlib.util.module_from_spec(SPEC)
 assert SPEC and SPEC.loader
@@ -28,6 +35,7 @@
 JungleGridError = MODULE.JungleGridError
 JungleGridExecutorAgent = MODULE.JungleGridExecutorAgent
 ProjectExecution = MODULE.ProjectExecution
+EXECUTORS_GROUP_PASSWORD_HASH = MODULE.EXECUTORS_GROUP_PASSWORD_HASH
 build_estimate_payload = MODULE.build_estimate_payload
 build_submit_payload = MODULE.build_submit_payload
 estimate_can_submit = MODULE.estimate_can_submit
@@ -51,6 +59,7 @@ def workload():
         "name": "batch-demo",
         "workload_type": "batch",
         "image": "python:3.11-slim",
+        "model_size_gb": 1,
         "command": "python",
         "args": ["-c", "print(42)"],
         "optimize_for": "cost",
@@ -63,6 +72,7 @@ def __init__(self):
         self.estimate_job = AsyncMock(return_value={"available": True, "estimated_cost_usd": {"min": 0.1, "max": 0.2}})
         self.submit_job = AsyncMock(return_value={"job_id": "job_123", "status": "queued"})
         self.get_job = AsyncMock(return_value={"job_id": "job_123", "status": "completed"})
+        self.get_job_runtime = AsyncMock(return_value={"exit_code": 0, "stdout_tail": "done"})
         self.get_job_logs = AsyncMock(return_value={"items": [{"message": "done"}]})
         self.cancel_job = AsyncMock(return_value={"job_id": "job_123", "status": "cancelled", "cancelled": True})
         self.list_artifacts = AsyncMock(
@@ -86,6 +96,88 @@ def agent_with_mocks(fake=None):
     return agent
 
 
+@pytest.mark.asyncio
+async def test_executor_group_membership_delivers_project_start_and_returns_estimate():
+    network_yaml = yaml.safe_load(NETWORK_CONFIG_PATH.read_text())
+    executor_group = network_yaml["network"]["agent_groups"]["executors"]
+    assert executor_group["password_hash"] == EXECUTORS_GROUP_PASSWORD_HASH
+    assert "agents" not in executor_group.get("metadata", {})
+
+    config = NetworkConfig(
+        name="JungleGridGroupTest",
+        default_agent_group="guest",
+        requires_password=False,
+        agent_groups={"executors": AgentGroupConfig(**executor_group)},
+    )
+    network = AgentNetwork.create_from_config(config)
+    registration = await network.register_agent(
+        agent_id="jungle-grid-executor",
+        transport_type=TransportType.HTTP,
+        metadata={"name": "Jungle Grid Executor"},
+        certificate=None,
+        password_hash=EXECUTORS_GROUP_PASSWORD_HASH,
+    )
+    assert registration.success
+    assert network.topology.agent_group_membership["jungle-grid-executor"] == "executors"
+
+    project_mod = DefaultProjectNetworkMod()
+    project_mod.update_config(
+        {
+            "project_templates": {
+                "jungle_grid_execution": {
+                    "name": "Jungle Grid GPU Execution",
+                    "agent_groups": ["executors"],
+                }
+            }
+        }
+    )
+    project_mod.initialize()
+    project_mod.bind_network(network)
+    assert project_mod._get_agents_in_group("executors") == ["jungle-grid-executor"]
+
+    fake = FakeJungleGridClient()
+    executor = agent_with_mocks(fake)
+    delivered = []
+
+    async def deliver(event):
+        delivered.append(event)
+        if event.destination_id == "jungle-grid-executor":
+            await executor.handle_project_started(
+                EventContext(
+                    incoming_event=event,
+                    event_threads={},
+                    incoming_thread_id="project-start",
+                )
+            )
+        return SimpleNamespace(success=True)
+
+    project_mod.send_event = AsyncMock(side_effect=deliver)
+    response = await project_mod.process_system_message(
+        Event(
+            event_name="project.start",
+            source_id="human:project-owner",
+            payload={
+                "template_id": "jungle_grid_execution",
+                "goal": json.dumps(workload()),
+                "name": "Jungle Grid test",
+            },
+        )
+    )
+
+    assert response.success
+    assert "jungle-grid-executor" in response.data["authorized_agents"]
+    assert any(
+        event.event_name == "project.notification.started"
+        and event.destination_id == "jungle-grid-executor"
+        and event.payload["initiator_agent_id"] == "human:project-owner"
+        for event in delivered
+    )
+    fake.estimate_job.assert_awaited_once_with(build_estimate_payload(workload()))
+    estimate_message = executor.project_adapter.send_project_message.await_args.kwargs["content"]["text"]
+    assert "Jungle Grid estimate ready" in estimate_message
+    assert "APPROVE" in estimate_message
+
+
 @pytest.mark.asyncio
 async def test_successful_estimate_flow_posts_estimate_and_requires_approval():
     fake = FakeJungleGridClient()
@@ -250,12 +342,16 @@ async def test_logs_and_artifacts_are_stored_in_project_artifact():
 
     await agent._finalize(execution, {"job_id": "job_123", "status": "completed"})
 
+    fake.get_job_runtime.assert_awaited_once_with("job_123")
     fake.get_job_logs.assert_awaited_once_with("job_123")
     fake.list_artifacts.assert_awaited_once_with("job_123")
     fake.get_artifact.assert_awaited_once_with("job_123", "artifact_1")
     artifact_call = agent.project_adapter.set_project_artifact.await_args
     assert artifact_call.kwargs["key"] == "jungle_grid_result"
     assert "output.json" in artifact_call.kwargs["value"]
+    assert "stdout_tail" in artifact_call.kwargs["value"]
+    assert "https://example.test/file" not in artifact_call.kwargs["value"]
+    assert "[REDACTED]" in artifact_call.kwargs["value"]
 
 
 @pytest.mark.asyncio
@@ -344,6 +440,20 @@ def test_invalid_workload_is_rejected():
         parse_workload_goal('{"workload_type": "batch"}')
 
 
+def test_workload_requires_positive_model_size():
+    with pytest.raises(ValueError, match="model_size_gb"):
+        parse_workload_goal(json.dumps({**workload(), "model_size_gb": 0}))
+
+
+def test_estimate_payload_matches_current_draft_job_fields():
+    requested = {
+        **workload(),
+        "constraints": {"max_price_per_hour": 2.5, "preferred_gpu_family": "l4"},
+    }
+
+    assert build_estimate_payload(requested) == requested
+
+
 def test_workload_rejects_literal_credentials_and_secret_like_metadata():
     with pytest.raises(ValueError, match="must not contain API keys"):
         parse_workload_goal(json.dumps({**workload(), "command": "curl -H 'Bearer secret-value'"}))
@@ -472,11 +582,41 @@ async def test_network_timeout_is_sanitized(monkeypatch):
 @pytest.mark.asyncio
 async def test_api_error_is_sanitized(monkeypatch):
     monkeypatch.setenv("JUNGLE_GRID_API_KEY", "test-api-key")
-    body = json.dumps({"error": {"code": "FORBIDDEN", "message": "Bearer test-api-key is not allowed"}})
+    body = json.dumps(
+        {
+            "error": {
+                "code": "provider_jg_private_backend",
+                "message": "Bearer test-api-key is not allowed",
+            }
+        }
+    )
     monkeypatch.setattr(MODULE.aiohttp, "ClientSession", lambda **kwargs: FakeSession(FakeResponse(403, body)))
     client = JungleGridClient()
 
     with pytest.raises(JungleGridError) as exc_info:
         await client.get_job("job_123")
-    assert exc_info.value.code == "FORBIDDEN"
+    assert "jg_private_backend" not in exc_info.value.code
+    assert "[REDACTED]" in exc_info.value.code
     assert "test-api-key" not in str(exc_info.value)
+
+
+def test_client_uses_documented_rest_api_environment(monkeypatch):
+    monkeypatch.setenv("JUNGLE_GRID_API", "https://orchestrator.example.test/")
+    monkeypatch.setenv("JUNGLE_GRID_API_KEY", "test-api-key")
+
+    client = JungleGridClient()
+
+    assert client.api_base == "https://orchestrator.example.test"
+
+
+@pytest.mark.asyncio
+async def test_client_uses_documented_runtime_and_log_routes(monkeypatch):
+    monkeypatch.setenv("JUNGLE_GRID_API_KEY", "test-api-key")
+    client = JungleGridClient()
+    client._request = AsyncMock(return_value={})
+
+    await client.get_job_runtime("job_123")
+    await client.get_job_logs("job_123")
+
+    assert client._request.await_args_list[0].args == ("GET", "/v1/jobs/job_123/runtime")
+    assert client._request.await_args_list[1].args == ("GET", "/v1/jobs/job_123/logs?tail=100")

From 1164763e289f5055d661b6c1c03c347882caae64 Mon Sep 17 00:00:00 2001
From: dejaguarkyng <deinvinciblekyng.1@gmail.com>
Date: Thu, 11 Jun 2026 15:03:07 +0000
Subject: [PATCH 3/5] feat: align jungle grid demo with current job workflow

---
 .../agents/jungle_grid_executor.py            | 1083 ++++++++++++-----
 .../09_jungle_grid_gpu_execution/network.yaml |    7 +-
 2 files changed, 752 insertions(+), 338 deletions(-)
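
Note on the redaction change in this patch: the sketch below copies
SECRET_TEXT_PATTERN and redact_sensitive from the diff so the new behaviour can
be tried in isolation. The sample strings are illustrative assumptions, not
real Jungle Grid responses, and the snippet is not part of the commit.

```python
import re
from typing import Iterable

# Pattern and function body copied from this patch; the samples further down
# are made up for illustration.
SECRET_TEXT_PATTERN = re.compile(
    r"(?i)(bearer\s+)[^\s,;]+|(?<![A-Za-z0-9])(?:jg|sk)_[A-Za-z0-9_-]+|https?://[^\s\"']+[?&](?:token|signature|sig|x-amz-)[^\s\"']*"
)


def redact_sensitive(value: object, secrets: Iterable[str] = ()) -> str:
    """Replace known secrets first, then anything matching the text pattern."""
    text = str(value)
    for secret in secrets:
        if secret:
            text = text.replace(secret, "[REDACTED]")
    # group(1) keeps the "Bearer " prefix when that alternative matched.
    return SECRET_TEXT_PATTERN.sub(lambda match: f"{match.group(1) or ''}[REDACTED]", text)


# Hypothetical inputs covering the three alternatives in the pattern.
samples = [
    "Bearer test-api-key is not allowed",
    "provider_jg_private_backend",
    "GET https://example.test/file?token=abc failed",
]
for sample in samples:
    print(redact_sensitive(sample, secrets=["test-api-key"]))
```

Because `secrets` is now an iterable, callers would presumably pass a list or
tuple such as `[api_key]`; a bare string would be iterated one character at a
time.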

diff --git a/sdk/demos/09_jungle_grid_gpu_execution/agents/jungle_grid_executor.py b/sdk/demos/09_jungle_grid_gpu_execution/agents/jungle_grid_executor.py
index 23348120b..a12fc9778 100644
--- a/sdk/demos/09_jungle_grid_gpu_execution/agents/jungle_grid_executor.py
+++ b/sdk/demos/09_jungle_grid_gpu_execution/agents/jungle_grid_executor.py
@@ -1,15 +1,18 @@
 #!/usr/bin/env python3
-"""Jungle Grid execution agent for the OpenAgents project workflow demo."""
+"""Human-approved Jungle Grid execution through an OpenAgents project."""
+
+from __future__ import annotations
 
 import asyncio
+import copy
 import json
 import logging
 import os
 import re
 import uuid
-from dataclasses import dataclass
-from typing import Any, Dict, Iterable, Optional
-from urllib.parse import quote
+from dataclasses import asdict, dataclass, field
+from typing import Any, Awaitable, Callable, Iterable, Mapping, Optional
+from urllib.parse import quote, urlencode
 
 import aiohttp
 
@@ -21,35 +24,71 @@
 
 DEFAULT_API_BASE = "https://api.junglegrid.dev"
 EXECUTORS_GROUP_PASSWORD_HASH = "8fba13dab71d6fdd8a9b9db1f06e81315dfbfd69167b6097f724604db3c91cdf"
-TERMINAL_STATUSES = {"completed", "failed", "rejected", "cancelled"}
-VALID_WORKLOAD_TYPES = {"inference", "training", "batch"}
+STATE_ARTIFACT = "jungle_grid_execution_state"
+TERMINAL_STATUSES = {"completed", "failed", "rejected", "cancelled", "canceled"}
+VALID_WORKLOAD_TYPES = {"inference", "training", "fine_tuning", "batch"}
 VALID_OPTIMIZE_FOR = {"balanced", "cost", "speed"}
+VALID_GPU_CLASSES = {"consumer", "datacenter"}
+VALID_REGION_MODES = {"prefer", "strict"}
+VALID_PRIORITIES = {"low", "balanced", "high", "low_latency", "low_cost", "high_reliability"}
+VALID_PRECISIONS = {"fp32", "fp16", "bf16", "int8"}
+CONSTRAINT_FIELDS = {
+    "max_price_per_hour",
+    "gpu_type",
+    "gpu_class",
+    "preferred_gpu_family",
+    "avoid_gpu_families",
+    "region_preference",
+    "region_mode",
+    "latency_priority",
+    "cost_priority",
+}
+MAX_SHARED_LOGS = 200
+MAX_SHARED_EVENTS = 200
+
 SUBMIT_FIELDS = {
     "name",
     "workload_type",
     "image",
-    "model_size_gb",
     "command",
     "args",
     "environment_from_env",
-    "optimize_for",
-    "constraints",
+    "input_files",
+    "script_files",
+    "expected_artifacts",
     "template",
     "metadata",
-}
-ESTIMATE_FIELDS = {
-    "name",
-    "workload_type",
-    "image",
+    "callback",
     "model_size_gb",
-    "command",
-    "args",
+    "batch_size",
+    "precision",
+    "disk_gb",
+    "gpu_required",
+    "gpu_count",
+    "gpu_type",
+    "gpu_class",
+    "min_vram_gb",
+    "max_price_per_hour",
+    "preferred_gpu_family",
+    "avoid_gpu_families",
+    "region_preference",
+    "region_mode",
+    "priority",
+    "latency_priority",
+    "cost_priority",
+    "timeout_seconds",
+    "routing_mode",
     "optimize_for",
     "constraints",
-    "template",
 }
-SENSITIVE_PATTERN = re.compile(r"(?i)(bearer\s+)[^\s,;]+|jg_[A-Za-z0-9_-]+")
-SENSITIVE_KEY_PATTERN = re.compile(r"(?i)(api[_-]?key|authorization|password|secret|token)")
+ESTIMATE_FIELDS = SUBMIT_FIELDS - {"environment_from_env"}
+SECRET_KEY_PATTERN = re.compile(
+    r"(?i)(api[_-]?key|authorization|password|secret|token|auth_token|upload_url|download_url|complete_url)"
+)
+SECRET_TEXT_PATTERN = re.compile(
+    r"(?i)(bearer\s+)[^\s,;]+|(?<![A-Za-z0-9])(?:jg|sk)_[A-Za-z0-9_-]+|https?://[^\s\"']+[?&](?:token|signature|sig|x-amz-)[^\s\"']*"
+)
+INPUT_ID_PATTERN = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._:-]{2,255}$")
 
 
 class JungleGridError(Exception):
@@ -61,263 +100,515 @@ def __init__(self, code: str, message: str, status: Optional[int] = None):
         self.status = status
 
 
-def redact_sensitive(value: Any, secret: Optional[str] = None) -> str:
-    """Return a log-safe string with credentials removed."""
+def redact_sensitive(value: object, secrets: Iterable[str] = ()) -> str:
+    """Return a project-safe string with credentials and signed URLs redacted."""
     text = str(value)
-    if secret:
-        text = text.replace(secret, "[REDACTED]")
-    return SENSITIVE_PATTERN.sub(lambda match: f"{match.group(1) or ''}[REDACTED]", text)
+    for secret in secrets:
+        if secret:
+            text = text.replace(secret, "[REDACTED]")
+    return SECRET_TEXT_PATTERN.sub(lambda match: f"{match.group(1) or ''}[REDACTED]", text)
 
 
-def _collect_string_values(value: Any) -> list[str]:
-    """Collect nested string values that must not be exposed in project output."""
-    if isinstance(value, str):
-        return [value] if value else []
-    if isinstance(value, dict):
-        strings = []
-        for nested in value.values():
-            strings.extend(_collect_string_values(nested))
-        return strings
-    if isinstance(value, list):
-        strings = []
-        for nested in value:
-            strings.extend(_collect_string_values(nested))
-        return strings
-    return []
-
-
-def _contains_sensitive_key(value: Any) -> bool:
-    """Return whether nested data uses a key commonly associated with credentials."""
-    if isinstance(value, dict):
+def contains_sensitive_key(value: object) -> bool:
+    if isinstance(value, Mapping):
         return any(
-            SENSITIVE_KEY_PATTERN.search(str(key)) or _contains_sensitive_key(nested) for key, nested in value.items()
+            SECRET_KEY_PATTERN.search(str(key)) or contains_sensitive_key(nested) for key, nested in value.items()
         )
     if isinstance(value, list):
-        return any(_contains_sensitive_key(nested) for nested in value)
+        return any(contains_sensitive_key(nested) for nested in value)
     return False
 
 
-def sanitize_project_data(value: Any, secrets: Iterable[str]) -> Any:
-    """Recursively redact credentials and workload-provided secret values."""
+def sanitize_project_data(value: object, secrets: Iterable[str] = ()) -> object:
+    """Recursively redact credentials, signed URLs, and resolved environment values."""
     secret_values = [secret for secret in secrets if secret]
     if isinstance(value, str):
-        result = value
-        for secret in secret_values:
-            result = result.replace(secret, "[REDACTED]")
-        return redact_sensitive(result)
-    if isinstance(value, dict):
-        return {key: sanitize_project_data(nested, secret_values) for key, nested in value.items()}
+        return redact_sensitive(value, secret_values)
+    if isinstance(value, Mapping):
+        result: dict[str, object] = {}
+        for key, nested in value.items():
+            clean_key = str(key)
+            if SECRET_KEY_PATTERN.search(clean_key):
+                result[clean_key] = "[REDACTED]"
+            else:
+                result[clean_key] = sanitize_project_data(nested, secret_values)
+        return result
     if isinstance(value, list):
         return [sanitize_project_data(nested, secret_values) for nested in value]
     return value
 
 
-def _unwrap_response(data: Any) -> Any:
-    if isinstance(data, dict) and data.get("ok") is True and "data" in data:
+def unwrap_response(data: object) -> object:
+    if isinstance(data, Mapping) and data.get("ok") is True and "data" in data:
         return data["data"]
     return data
 
 
-def _error_detail(data: Any, status: int) -> tuple[str, str]:
-    if isinstance(data, dict):
+def error_detail(data: object, status: int) -> tuple[str, str]:
+    if isinstance(data, Mapping):
         nested = data.get("error")
-        if isinstance(nested, dict):
-            return (
-                redact_sensitive(nested.get("code") or "API_ERROR"),
-                redact_sensitive(nested.get("message") or f"HTTP {status}"),
-            )
+        source = nested if isinstance(nested, Mapping) else data
         return (
-            redact_sensitive(data.get("code") or "API_ERROR"),
-            redact_sensitive(data.get("message") or f"HTTP {status}"),
+            redact_sensitive(source.get("code") or "API_ERROR"),
+            redact_sensitive(source.get("message") or f"HTTP {status}"),
         )
     return "API_ERROR", f"HTTP {status}"
 
 
 class JungleGridClient:
-    """Small async client for Jungle Grid's documented public execution API."""
+    """Async client matching the current Jungle Grid MCP-backed REST contract."""
 
     def __init__(
         self,
         api_base: Optional[str] = None,
         timeout_seconds: float = 30.0,
+        read_retries: int = 2,
+        retry_delay_seconds: float = 0.5,
+        sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
     ):
-        raw_api_base = api_base if api_base is not None else os.getenv("JUNGLE_GRID_API", DEFAULT_API_BASE)
+        configured_base = (
+            api_base
+            or os.getenv("JUNGLEGRID_API_BASE")
+            or os.getenv("JUNGLE_GRID_API_URL")
+            or os.getenv("JUNGLE_GRID_API")
+            or DEFAULT_API_BASE
+        )
         self.api_key = os.getenv("JUNGLE_GRID_API_KEY", "").strip()
-        self.api_base = raw_api_base.rstrip("/")
+        self.api_base = configured_base.strip().rstrip("/")
         self.timeout_seconds = timeout_seconds
+        self.read_retries = max(0, read_retries)
+        self.retry_delay_seconds = max(0.0, retry_delay_seconds)
+        self.sleep = sleep
 
     def _require_api_key(self) -> str:
         if not self.api_key:
             raise JungleGridError("MISSING_API_KEY", "JUNGLE_GRID_API_KEY is required.")
         return self.api_key
 
-    async def _request(self, method: str, path: str, payload: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
+    async def _request(
+        self,
+        method: str,
+        path: str,
+        payload: Optional[dict[str, object]] = None,
+    ) -> dict[str, object]:
         api_key = self._require_api_key()
-        timeout = aiohttp.ClientTimeout(total=self.timeout_seconds)
-        headers = {
-            "Accept": "application/json",
-            "Authorization": f"Bearer {api_key}",
-            "Content-Type": "application/json",
-        }
-        try:
-            async with aiohttp.ClientSession(timeout=timeout) as session:
-                async with session.request(method, f"{self.api_base}{path}", headers=headers, json=payload) as response:
-                    text = await response.text()
-                    try:
-                        data = json.loads(text) if text.strip() else {}
-                    except json.JSONDecodeError as exc:
-                        raise JungleGridError(
-                            "INVALID_API_RESPONSE", "Jungle Grid returned invalid JSON.", response.status
-                        ) from exc
-                    if response.status < 200 or response.status >= 300:
-                        code, message = _error_detail(data, response.status)
-                        raise JungleGridError(
-                            redact_sensitive(code, api_key),
-                            redact_sensitive(message, api_key),
-                            response.status,
-                        )
-        except asyncio.TimeoutError as exc:
-            raise JungleGridError("NETWORK_TIMEOUT", "Jungle Grid request timed out.") from exc
-        except aiohttp.ClientError as exc:
-            raise JungleGridError("NETWORK_ERROR", redact_sensitive(exc, api_key)) from exc
-
-        result = _unwrap_response(data)
-        if not isinstance(result, dict):
-            raise JungleGridError("INVALID_API_RESPONSE", "Jungle Grid returned an unexpected response shape.")
-        return result
+        attempts = (self.read_retries + 1) if method == "GET" else 1  # POSTs are never retried
+        for attempt in range(attempts):
+            try:
+                timeout = aiohttp.ClientTimeout(total=self.timeout_seconds)
+                headers = {
+                    "Accept": "application/json",
+                    "Authorization": f"Bearer {api_key}",
+                    "Content-Type": "application/json",
+                }
+                async with aiohttp.ClientSession(timeout=timeout) as session:
+                    async with session.request(
+                        method, f"{self.api_base}{path}", headers=headers, json=payload
+                    ) as response:
+                        text = await response.text()
+                        try:
+                            data = json.loads(text) if text.strip() else {}
+                        except json.JSONDecodeError as exc:
+                            raise JungleGridError(
+                                "INVALID_API_RESPONSE",
+                                "Jungle Grid returned invalid JSON.",
+                                response.status,
+                            ) from exc
+                        if not 200 <= response.status < 300:
+                            code, message = error_detail(data, response.status)
+                            raise JungleGridError(code, message, response.status)
+                        result = unwrap_response(data)
+                        if not isinstance(result, dict):
+                            raise JungleGridError(
+                                "INVALID_API_RESPONSE",
+                                "Jungle Grid returned an unexpected response shape.",
+                            )
+                        return result
+            except (asyncio.TimeoutError, aiohttp.ClientError) as exc:
+                if attempt + 1 < attempts:
+                    await self.sleep(self.retry_delay_seconds * (2**attempt))
+                    continue
+                code = "NETWORK_TIMEOUT" if isinstance(exc, asyncio.TimeoutError) else "NETWORK_ERROR"
+                message = (
+                    "Jungle Grid request timed out."
+                    if code == "NETWORK_TIMEOUT"
+                    else "Jungle Grid network request failed."
+                )
+                raise JungleGridError(code, message) from exc
+            except JungleGridError as exc:
+                retryable = method == "GET" and (exc.status is None or exc.status == 429 or exc.status >= 500)
+                if retryable and attempt + 1 < attempts:
+                    await self.sleep(self.retry_delay_seconds * (2**attempt))
+                    continue
+                raise JungleGridError(
+                    redact_sensitive(exc.code, [api_key]),
+                    redact_sensitive(exc, [api_key]),
+                    exc.status,
+                ) from exc
+        raise JungleGridError("NETWORK_ERROR", "Jungle Grid request failed.")
 
-    async def estimate_job(self, workload: Dict[str, Any]) -> Dict[str, Any]:
-        return await self._request("POST", "/v1/jobs/estimate", workload)
+    async def estimate_job(self, workload: dict[str, object]) -> dict[str, object]:
+        return await self._request("POST", "/v1/mcp/jobs/estimate", workload)
 
-    async def submit_job(self, workload: Dict[str, Any]) -> Dict[str, Any]:
-        return await self._request("POST", "/v1/jobs", workload)
+    async def submit_job(self, workload: dict[str, object]) -> dict[str, object]:
+        return await self._request("POST", "/v1/mcp/jobs", workload)
 
-    async def get_job(self, job_id: str) -> Dict[str, Any]:
-        return await self._request("GET", f"/v1/jobs/{quote(job_id, safe='')}")
+    async def get_job(self, job_id: str) -> dict[str, object]:
+        return await self._request("GET", f"/v1/mcp/jobs/{quote(job_id, safe='')}")
 
-    async def get_job_runtime(self, job_id: str) -> Dict[str, Any]:
-        return await self._request("GET", f"/v1/jobs/{quote(job_id, safe='')}/runtime")
+    async def get_job_events(self, job_id: str) -> dict[str, object]:
+        return await self._request("GET", f"/v1/jobs/{quote(job_id, safe='')}/events")
 
-    async def get_job_logs(self, job_id: str) -> Dict[str, Any]:
-        return await self._request("GET", f"/v1/jobs/{quote(job_id, safe='')}/logs?tail=100")
+    async def get_job_logs(
+        self,
+        job_id: str,
+        *,
+        limit: int = 100,
+        cursor: Optional[str] = None,
+        tail: Optional[int] = None,
+    ) -> dict[str, object]:
+        params: dict[str, object] = {"limit": limit}
+        if cursor:
+            params["cursor"] = cursor
+        if tail is not None:
+            params["tail"] = tail
+        return await self._request("GET", f"/v1/mcp/jobs/{quote(job_id, safe='')}/logs?{urlencode(params)}")
+
+    async def get_job_runtime(self, job_id: str) -> dict[str, object]:
+        return await self._request("GET", f"/v1/jobs/{quote(job_id, safe='')}/runtime")
 
-    async def cancel_job(self, job_id: str, reason: str) -> Dict[str, Any]:
-        return await self._request("POST", f"/v1/jobs/{quote(job_id, safe='')}/cancel", {"reason": reason})
+    async def cancel_job(self, job_id: str, reason: str) -> dict[str, object]:
+        return await self._request(
+            "POST",
+            f"/v1/mcp/jobs/{quote(job_id, safe='')}/cancel",
+            {"reason": reason},
+        )
 
-    async def list_artifacts(self, job_id: str) -> Dict[str, Any]:
-        return await self._request("GET", f"/v1/jobs/{quote(job_id, safe='')}/artifacts")
+    async def list_artifacts(self, job_id: str) -> dict[str, object]:
+        return await self._request("GET", f"/v1/mcp/jobs/{quote(job_id, safe='')}/artifacts")
 
-    async def get_artifact(self, job_id: str, artifact_id: str) -> Dict[str, Any]:
+    async def get_artifact(self, job_id: str, artifact_id: str) -> dict[str, object]:
         return await self._request(
             "POST",
-            f"/v1/jobs/{quote(job_id, safe='')}/artifacts/{quote(artifact_id, safe='')}/download",
+            f"/v1/mcp/jobs/{quote(job_id, safe='')}/artifacts/{quote(artifact_id, safe='')}/download",
         )
 
 
-def parse_workload_goal(goal: str) -> Dict[str, Any]:
-    """Parse and validate a project goal containing a Jungle Grid workload JSON object."""
+def _string(value: object, field_name: str) -> str:
+    if not isinstance(value, str) or not value.strip():
+        raise ValueError(f"{field_name} must be a non-empty string.")
+    return value.strip()
+
+
+def _string_array(value: object, field_name: str) -> list[str]:
+    if not isinstance(value, list) or not all(isinstance(item, str) and item for item in value):
+        raise ValueError(f"{field_name} must be an array of non-empty strings.")
+    return value
+
+
+def _positive_number(value: object, field_name: str, *, allow_zero: bool = False) -> None:
+    if isinstance(value, bool) or not isinstance(value, (int, float)):
+        raise ValueError(f"{field_name} must be a number.")
+    if (value < 0) if allow_zero else (value <= 0):
+        qualifier = "zero or greater" if allow_zero else "positive"
+        raise ValueError(f"{field_name} must be {qualifier}.")
+
+
+def _validate_input_references(value: object, field_name: str) -> list[dict[str, str]]:
+    if not isinstance(value, list):
+        raise ValueError(f"{field_name} must be an array of input_id references.")
+    result: list[dict[str, str]] = []
+    for item in value:
+        if isinstance(item, str):
+            input_id = item.strip()
+        elif isinstance(item, Mapping) and set(item) == {"input_id"}:
+            input_id = _string(item.get("input_id"), f"{field_name}.input_id")
+        else:
+            raise ValueError(f"{field_name} items must contain only input_id.")
+        if not INPUT_ID_PATTERN.fullmatch(input_id):
+            raise ValueError(f"{field_name} contains an invalid input_id.")
+        result.append({"input_id": input_id})
+    return result
+
+
+def _validate_callback(value: object) -> dict[str, object]:
+    if not isinstance(value, Mapping):
+        raise ValueError("callback must be an object.")
+    unsupported = set(value) - {"url", "metadata", "auth_token_from_env"}
+    if unsupported:
+        raise ValueError(f"Unsupported callback fields: {', '.join(sorted(unsupported))}.")
+    result: dict[str, object] = {"url": _string(value.get("url"), "callback.url")}
+    metadata = value.get("metadata")
+    if metadata is not None:
+        if not isinstance(metadata, Mapping) or not all(
+            isinstance(key, str) and isinstance(item, str) for key, item in metadata.items()
+        ):
+            raise ValueError("callback.metadata must map strings to strings.")
+        if contains_sensitive_key(metadata):
+            raise ValueError("callback.metadata must not contain secret-like keys.")
+        result["metadata"] = dict(metadata)
+    auth_env = value.get("auth_token_from_env")
+    if auth_env is not None:
+        result["auth_token_from_env"] = _string(auth_env, "callback.auth_token_from_env")
+    return result
+
+
+def _validate_constraints(value: object) -> dict[str, object]:
+    if not isinstance(value, Mapping):
+        raise ValueError("constraints must be an object.")
+    unsupported = sorted(set(value) - CONSTRAINT_FIELDS)
+    if unsupported:
+        raise ValueError(f"Unsupported constraint fields: {', '.join(unsupported)}.")
+    result = dict(value)
+    if "max_price_per_hour" in result:
+        _positive_number(result["max_price_per_hour"], "constraints.max_price_per_hour")
+    if "gpu_class" in result and result["gpu_class"] not in VALID_GPU_CLASSES:
+        raise ValueError(f"constraints.gpu_class must be one of: {', '.join(sorted(VALID_GPU_CLASSES))}.")
+    if "region_mode" in result and result["region_mode"] not in VALID_REGION_MODES:
+        raise ValueError(f"constraints.region_mode must be one of: {', '.join(sorted(VALID_REGION_MODES))}.")
+    for field_name in ("latency_priority", "cost_priority"):
+        if field_name in result and result[field_name] not in {"low", "balanced", "high"}:
+            raise ValueError(f"constraints.{field_name} must be one of: balanced, high, low.")
+    if "avoid_gpu_families" in result:
+        result["avoid_gpu_families"] = _string_array(result["avoid_gpu_families"], "constraints.avoid_gpu_families")
+    for field_name in ("gpu_type", "preferred_gpu_family", "region_preference"):
+        if field_name in result:
+            result[field_name] = _string(result[field_name], f"constraints.{field_name}")
+    return result
+
+
+def parse_workload_goal(goal: str) -> dict[str, object]:
+    """Parse and validate a project goal without resolving any secrets."""
     text = goal.strip()
     if text.startswith("```"):
         text = re.sub(r"^```(?:json)?\s*", "", text)
         text = re.sub(r"\s*```$", "", text)
     try:
-        workload = json.loads(text)
+        raw = json.loads(text)
     except json.JSONDecodeError as exc:
         raise ValueError("Project goal must be a JSON object describing the Jungle Grid workload.") from exc
-    if not isinstance(workload, dict):
+    if not isinstance(raw, dict):
         raise ValueError("Project goal must be a JSON object.")
-    if SENSITIVE_PATTERN.search(json.dumps(workload)):
-        raise ValueError("Workload must not contain API keys or Bearer tokens.")
-
-    unsupported = sorted(set(workload) - SUBMIT_FIELDS)
+    if SECRET_TEXT_PATTERN.search(json.dumps(raw)):
+        raise ValueError("Workload must not contain API keys, Bearer tokens, or signed URLs.")
+    unsupported = sorted(set(raw) - SUBMIT_FIELDS)
     if unsupported:
         raise ValueError(f"Unsupported workload fields: {', '.join(unsupported)}.")
-    required = {"name", "workload_type", "image"}
-    missing = sorted(key for key in required if not isinstance(workload.get(key), str) or not workload[key].strip())
-    if missing:
-        raise ValueError(f"Missing required workload fields: {', '.join(missing)}.")
-    model_size_gb = workload.get("model_size_gb")
-    if not isinstance(model_size_gb, (int, float)) or isinstance(model_size_gb, bool) or model_size_gb <= 0:
-        raise ValueError("model_size_gb must be a positive number.")
+
+    workload = dict(raw)
+    for required in ("name", "workload_type", "image"):
+        workload[required] = _string(workload.get(required), required)
     if workload["workload_type"] not in VALID_WORKLOAD_TYPES:
         raise ValueError(f"workload_type must be one of: {', '.join(sorted(VALID_WORKLOAD_TYPES))}.")
-    if "optimize_for" in workload and workload["optimize_for"] not in VALID_OPTIMIZE_FOR:
-        raise ValueError(f"optimize_for must be one of: {', '.join(sorted(VALID_OPTIMIZE_FOR))}.")
-    if "args" in workload and not (
-        isinstance(workload["args"], list) and all(isinstance(item, str) for item in workload["args"])
-    ):
-        raise ValueError("args must be an array of strings.")
-    if "environment_from_env" in workload and not (
-        isinstance(workload["environment_from_env"], dict)
-        and all(
+
+    command = workload.get("command")
+    args = workload.get("args")
+    if isinstance(command, str):
+        workload["command"] = _string(command, "command")
+        if args is not None:
+            workload["args"] = _string_array(args, "args")
+    elif isinstance(command, list):
+        workload["command"] = _string_array(command, "command")
+        if args is not None:
+            raise ValueError("args cannot be combined with the command-array format.")
+    elif command is not None:
+        raise ValueError("command must be a string or an array of strings.")
+    elif args is not None:
+        raise ValueError("args requires command.")
+
+    for field_name in ("input_files", "script_files"):
+        if field_name in workload:
+            workload[field_name] = _validate_input_references(workload[field_name], field_name)
+    if "expected_artifacts" in workload:
+        paths = _string_array(workload["expected_artifacts"], "expected_artifacts")
+        if not all(path.startswith("/workspace/artifacts/") for path in paths):
+            raise ValueError("expected_artifacts must be paths under /workspace/artifacts/.")
+        workload["expected_artifacts"] = paths
+    if any(key in workload for key in ("local_path", "path", "file_path")):
+        raise ValueError("Arbitrary local file access is not supported.")
+
+    env_refs = workload.get("environment_from_env")
+    if env_refs is not None and (
+        not isinstance(env_refs, Mapping)
+        or not all(
             isinstance(key, str) and key.strip() and isinstance(value, str) and value.strip()
-            for key, value in workload["environment_from_env"].items()
+            for key, value in env_refs.items()
         )
     ):
-        raise ValueError("environment_from_env must map workload variable names to local environment variable names.")
-    if _contains_sensitive_key(workload.get("metadata")):
+        raise ValueError("environment_from_env must map workload names to local environment names.")
+    if contains_sensitive_key(workload.get("metadata")):
         raise ValueError("metadata must not contain secret-like keys.")
+    if "callback" in workload:
+        workload["callback"] = _validate_callback(workload["callback"])
+    if "gpu_required" in workload and not isinstance(workload["gpu_required"], bool):
+        raise ValueError("gpu_required must be a boolean.")
+
+    for field_name in ("model_size_gb", "batch_size", "disk_gb", "gpu_count", "min_vram_gb", "max_price_per_hour"):
+        if field_name in workload:
+            _positive_number(
+                workload[field_name], field_name, allow_zero=field_name in {"batch_size", "disk_gb", "gpu_count"}
+            )
+    if "timeout_seconds" in workload:
+        _positive_number(workload["timeout_seconds"], "timeout_seconds")
+    for field_name, allowed in (
+        ("gpu_class", VALID_GPU_CLASSES),
+        ("region_mode", VALID_REGION_MODES),
+        ("precision", VALID_PRECISIONS),
+        ("priority", VALID_PRIORITIES),
+        ("latency_priority", {"low", "balanced", "high"}),
+        ("cost_priority", {"low", "balanced", "high"}),
+    ):
+        if field_name in workload and workload[field_name] not in allowed:
+            raise ValueError(f"{field_name} must be one of: {', '.join(sorted(allowed))}.")
+    optimize = workload.get("routing_mode", workload.get("optimize_for"))
+    if "routing_mode" in workload and "optimize_for" in workload:
+        raise ValueError("Use routing_mode or optimize_for, not both.")
+    if optimize is not None and optimize not in VALID_OPTIMIZE_FOR:
+        raise ValueError(f"routing preference must be one of: {', '.join(sorted(VALID_OPTIMIZE_FOR))}.")
+    if "avoid_gpu_families" in workload:
+        workload["avoid_gpu_families"] = _string_array(workload["avoid_gpu_families"], "avoid_gpu_families")
+    if "constraints" in workload:
+        workload["constraints"] = _validate_constraints(workload["constraints"])
     return workload
 
 
-def build_estimate_payload(workload: Dict[str, Any]) -> Dict[str, Any]:
-    """Build an estimate-only payload without submit-only or secret-bearing fields."""
-    return {key: value for key, value in workload.items() if key in ESTIMATE_FIELDS}
+def _api_workload_type(value: object) -> object:
+    return "fine-tuning" if value == "fine_tuning" else value
 
 
-def build_submit_payload(workload: Dict[str, Any]) -> Dict[str, Any]:
-    """Build a submit payload, resolving secret environment values only at submission time."""
-    payload = {key: value for key, value in workload.items() if key != "environment_from_env"}
-    references = workload.get("environment_from_env")
-    if not references:
-        return payload
+def normalize_api_payload(workload: Mapping[str, object]) -> dict[str, object]:
+    """Convert goal compatibility aliases to the current Jungle Grid shape."""
+    payload = copy.deepcopy(dict(workload))
+    payload["workload_type"] = _api_workload_type(payload["workload_type"])
+    if "routing_mode" in payload:
+        payload["optimize_for"] = payload.pop("routing_mode")
+    if isinstance(payload.get("command"), str):
+        legacy_args = payload.pop("args", [])
+        payload["command"] = [
+            payload["command"],
+            *(legacy_args if isinstance(legacy_args, list) else []),
+        ]
+    return payload
 
-    missing = sorted(env_name for env_name in references.values() if not os.getenv(env_name))
-    if missing:
-        raise ValueError(f"Missing required local environment variables: {', '.join(missing)}.")
-    payload["environment"] = {name: os.environ[env_name] for name, env_name in references.items()}
+
+def build_estimate_payload(workload: Mapping[str, object]) -> dict[str, object]:
+    payload = normalize_api_payload({key: value for key, value in workload.items() if key in ESTIMATE_FIELDS})
+    callback = payload.get("callback")
+    if isinstance(callback, dict):
+        callback.pop("auth_token_from_env", None)
     return payload
 
 
-def public_workload(workload: Dict[str, Any]) -> Dict[str, Any]:
-    """Return workload metadata safe to share in a project message or artifact."""
+def build_submit_payload(workload: Mapping[str, object]) -> tuple[dict[str, object], list[str]]:
+    """Resolve environment-backed secrets only after human approval."""
+    payload = normalize_api_payload({key: value for key, value in workload.items() if key != "environment_from_env"})
+    secrets: list[str] = []
+    references = workload.get("environment_from_env")
+    if isinstance(references, Mapping):
+        missing = sorted(str(env_name) for env_name in references.values() if not os.getenv(str(env_name)))
+        if missing:
+            raise ValueError(f"Missing required local environment variables: {', '.join(missing)}.")
+        environment = {str(name): os.environ[str(env_name)] for name, env_name in references.items()}
+        payload["environment"] = environment
+        secrets.extend(environment.values())
+    callback = payload.get("callback")
+    if isinstance(callback, dict):
+        auth_env = callback.pop("auth_token_from_env", None)
+        if auth_env:
+            token = os.getenv(str(auth_env))
+            if not token:
+                raise ValueError(f"Missing required local environment variable: {auth_env}.")
+            callback["auth_token"] = token
+            secrets.append(token)
+    return payload, secrets
+
+
+def public_workload(workload: Mapping[str, object]) -> dict[str, object]:
     result = dict(workload)
-    if "metadata" in result:
-        metadata = result["metadata"]
-        result["metadata"] = {key: "[REDACTED]" for key in metadata} if isinstance(metadata, dict) else "[REDACTED]"
+    metadata = result.get("metadata")
+    if isinstance(metadata, Mapping):
+        result["metadata"] = {str(key): "[REDACTED]" for key in metadata}
     return result
 
 
-def lifecycle_label(status: str) -> str:
-    """Map Jungle Grid status to a user-facing lifecycle label."""
-    if status == "assigned":
-        return "assigned (provisioning)"
-    return status
+def estimate_can_submit(estimate: Mapping[str, object]) -> bool:
+    screening = estimate.get("screening")
+    if isinstance(screening, Mapping) and screening.get("can_submit") is False:
+        return False
+    return estimate.get("available") is not False and estimate.get("can_submit") is not False
 
 
-def estimate_can_submit(estimate: Dict[str, Any]) -> bool:
-    """Return whether an estimate explicitly permits submission."""
-    return estimate.get("available") is not False and estimate.get("can_submit") is not False
+def estimate_summary(estimate: Mapping[str, object]) -> str:
+    """Build a compact summary without claiming immediate capacity."""
+    parts: list[str] = []
+    cost = estimate.get("estimated_cost_usd")
+    if cost is None:
+        minimum = estimate.get("estimated_cost_min_usd")
+        maximum = estimate.get("estimated_cost_max_usd")
+        if minimum is not None or maximum is not None:
+            cost = {"min": minimum, "max": maximum}
+    if cost is not None:
+        parts.append(f"estimated cost `{json.dumps(cost, sort_keys=True)}` USD")
+    bounds = [estimate.get(f"estimated_runtime_{edge}_minutes") for edge in ("min", "max")]
+    if any(bound is not None for bound in bounds):
+        low, high = ("?" if bound is None else bound for bound in bounds)
+        parts.append(f"duration `{low}-{high}` minutes")
+    capacity = estimate.get("capacity_status")
+    if isinstance(capacity, Mapping):
+        if capacity.get("availability"):
+            parts.append(f"capacity `{capacity['availability']}`")
+        if capacity.get("immediate_capacity_confirmed") is False:
+            parts.append("immediate worker pickup not confirmed")
+    warnings = estimate.get("warnings")
+    if isinstance(warnings, list) and warnings:
+        parts.append(f"{len(warnings)} warning(s)")
+    return "; ".join(parts) if parts else "structured estimate stored in `jungle_grid_estimate`"
+
+
+def status_fingerprint(job: Mapping[str, object]) -> str:
+    fields = (
+        "status",
+        "execution_phase",
+        "status_message",
+        "phase_started_at",
+        "delayed_start",
+        "delay_reason",
+        "failure",
+    )
+    return json.dumps({key: job.get(key) for key in fields}, sort_keys=True, default=str)
 
 
 @dataclass
 class ProjectExecution:
-    """State tracked between estimate, approval, submission, and completion."""
-
     project_id: str
-    workload: Dict[str, Any]
+    workload: dict[str, object]
     estimate_id: str
-    estimate: Dict[str, Any]
+    estimate: dict[str, object]
     job_id: Optional[str] = None
-    last_status: Optional[str] = None
     approved_by: Optional[str] = None
-    submission_started: bool = False
-    submit_payload: Optional[Dict[str, Any]] = None
-    secret_values: Optional[list[str]] = None
+    submission_state: str = "pending"
+    cancel_requested: bool = False
+    terminal: bool = False
+    last_status_fingerprint: Optional[str] = None
+    log_cursor: Optional[str] = None
+    seen_event_ids: list[str] = field(default_factory=list)
+    logs: list[object] = field(default_factory=list)
+    events: list[object] = field(default_factory=list)
+    secret_values: list[str] = field(default_factory=list, repr=False)
+
+    def persisted(self) -> dict[str, object]:
+        data = asdict(self)
+        data.pop("secret_values", None)
+        return data
+
+    @classmethod
+    def from_persisted(cls, value: Mapping[str, object]) -> ProjectExecution:
+        allowed = cls.__dataclass_fields__.keys()
+        return cls(**{key: value[key] for key in allowed if key in value})  # type: ignore[arg-type]
 
 
 class JungleGridExecutorAgent(WorkerAgent):
-    """Execute approved Jungle Grid workloads and report results to an OpenAgents project."""
+    """Deterministic executor for the Jungle Grid project demo."""
 
     default_agent_id = "jungle-grid-executor"
 
@@ -325,208 +616,297 @@ def __init__(
         self,
         jungle_grid_client: Optional[JungleGridClient] = None,
         poll_interval_seconds: float = 10.0,
+        max_poll_failures: int = 3,
+        sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
         **kwargs: Any,
     ):
         super().__init__(**kwargs)
         self.jungle_grid = jungle_grid_client or JungleGridClient()
-        self.poll_interval_seconds = poll_interval_seconds
+        self.poll_interval_seconds = max(0.0, poll_interval_seconds)
+        self.max_poll_failures = max(1, max_poll_failures)
+        self.sleep = sleep
         self.project_adapter = DefaultProjectAgentAdapter()
-        self.executions: Dict[str, ProjectExecution] = {}
-        self.monitor_tasks: Dict[str, asyncio.Task] = {}
+        self.executions: dict[str, ProjectExecution] = {}
+        self.monitor_tasks: dict[str, asyncio.Task[None]] = {}
+        self.project_locks: dict[str, asyncio.Lock] = {}
 
-    async def on_startup(self):
-        """Bind the project adapter after the OpenAgents client is connected."""
+    async def on_startup(self) -> None:
         self.project_adapter.bind_client(self.client)
+        if self.client.connector is None:
+            raise RuntimeError("OpenAgents connector is unavailable during startup.")
         self.project_adapter.bind_connector(self.client.connector)
         self.project_adapter.bind_agent(self.agent_id)
         logger.info("Jungle Grid executor is ready")
 
-    async def on_shutdown(self):
-        """Stop local monitor tasks without cancelling remote jobs."""
+    async def on_shutdown(self) -> None:
         for task in self.monitor_tasks.values():
             task.cancel()
         if self.monitor_tasks:
             await asyncio.gather(*self.monitor_tasks.values(), return_exceptions=True)
 
-    async def _post(self, project_id: str, text: str):
+    async def _post(self, project_id: str, text: str) -> None:
         await self.project_adapter.send_project_message(project_id=project_id, content={"text": text})
 
-    async def _set_artifact(self, project_id: str, key: str, value: Dict[str, Any]):
+    async def _set_artifact(self, project_id: str, key: str, value: object) -> None:
+        safe = sanitize_project_data(value, [self.jungle_grid.api_key])
         await self.project_adapter.set_project_artifact(
-            project_id=project_id, key=key, value=json.dumps(value, indent=2)
+            project_id=project_id, key=key, value=json.dumps(safe, indent=2, sort_keys=True)
         )
 
-    def _project_secrets(self, execution: ProjectExecution) -> list[str]:
-        return [
-            self.jungle_grid.api_key,
-            *(execution.secret_values or []),
-            *_collect_string_values(execution.workload.get("metadata")),
-        ]
+    async def _save_state(self, execution: ProjectExecution) -> None:
+        await self._set_artifact(execution.project_id, STATE_ARTIFACT, execution.persisted())
+
+    async def _load_state(self, project_id: str) -> Optional[ProjectExecution]:
+        if project_id in self.executions:
+            return self.executions[project_id]
+        response = await self.project_adapter.get_project_artifact(project_id=project_id, key=STATE_ARTIFACT)
+        if not response.get("success"):
+            return None
+        value = response.get("data", {}).get("value")
+        if not isinstance(value, str) or not value.strip():
+            return None
+        try:
+            raw = json.loads(value)
+            if not isinstance(raw, dict):
+                return None
+            execution = ProjectExecution.from_persisted(raw)
+        except (TypeError, ValueError, json.JSONDecodeError):
+            return None
+        self.executions[project_id] = execution
+        return execution
 
-    def _sanitize_for_project(self, value: Any, execution: ProjectExecution) -> Any:
-        return sanitize_project_data(value, self._project_secrets(execution))
+    def _secrets(self, execution: ProjectExecution) -> list[str]:
+        return [self.jungle_grid.api_key, *execution.secret_values]
 
-    def _is_human_approver(self, sender_id: str) -> bool:
-        return sender_id.startswith("human:")
+    def _safe(self, value: object, execution: ProjectExecution) -> object:
+        return sanitize_project_data(value, self._secrets(execution))
+
+    @staticmethod
+    def _is_human(sender_id: str) -> bool:
+        return sender_id.startswith("human:") and len(sender_id) > len("human:")
 
     @on_event("project.notification.started")
-    async def handle_project_started(self, context: EventContext):
-        """Estimate a workload and request human approval without submitting it."""
+    async def handle_project_started(self, context: EventContext) -> None:
         payload = context.incoming_event.payload
         project_id = payload.get("project_id")
-        goal = payload.get("goal", "")
-        if not project_id:
+        if not isinstance(project_id, str) or not project_id:
             return
-        try:
-            workload = parse_workload_goal(goal)
-            estimate = await self.jungle_grid.estimate_job(build_estimate_payload(workload))
-            estimate_id = uuid.uuid4().hex[:12]
-            execution = ProjectExecution(project_id, workload, estimate_id, estimate)
-            self.executions[project_id] = execution
-            shared_workload = self._sanitize_for_project(public_workload(workload), execution)
-            shared_estimate = self._sanitize_for_project(estimate, execution)
-            await self._set_artifact(
-                project_id,
-                "jungle_grid_estimate",
-                {"estimate_id": estimate_id, "workload": shared_workload, "estimate": shared_estimate},
-            )
-            if not estimate_can_submit(estimate):
+        lock = self.project_locks.setdefault(project_id, asyncio.Lock())
+        async with lock:
+            existing = await self._load_state(project_id)
+            if existing:
+                if existing.job_id and not existing.terminal:
+                    self._ensure_monitor(existing)
+                return
+            try:
+                workload = parse_workload_goal(str(payload.get("goal", "")))
+                estimate = await self.jungle_grid.estimate_job(build_estimate_payload(workload))
+                execution = ProjectExecution(
+                    project_id=project_id,
+                    workload=workload,
+                    estimate_id=uuid.uuid4().hex[:12],
+                    estimate=estimate,
+                )
+                self.executions[project_id] = execution
+                await self._save_state(execution)
+                shared = {
+                    "estimate_id": execution.estimate_id,
+                    "workload": public_workload(workload),
+                    "estimate": estimate,
+                }
+                await self._set_artifact(project_id, "jungle_grid_estimate", shared)
+                if not estimate_can_submit(estimate):
+                    await self._post(project_id, "Jungle Grid screening blocked submission. No job was submitted.")
+                    await self.project_adapter.stop_project(
+                        project_id=project_id, reason="Jungle Grid screening blocked submission"
+                    )
+                    return
                 await self._post(
                     project_id,
-                    "Jungle Grid estimate is not currently eligible for submission.\n\n"
-                    f"```json\n{json.dumps({'estimate_id': estimate_id, 'workload': shared_workload, 'estimate': shared_estimate}, indent=2)}\n```",
+                    "Jungle Grid estimate ready. No job has been submitted. "
+                    f"Summary: {estimate_summary(estimate)}.\n\n"
+                    f"A human must reply exactly `APPROVE {execution.estimate_id}` "
+                    "before billable compute can start.",
                 )
-                await self.project_adapter.stop_project(
-                    project_id=project_id, reason="Jungle Grid estimate is not eligible for submission"
+            except (ValueError, JungleGridError) as exc:
+                await self._post(
+                    project_id,
+                    f"Jungle Grid estimate failed: {redact_sensitive(exc, [self.jungle_grid.api_key])}",
                 )
-                return
-            await self._post(
-                project_id,
-                "Jungle Grid estimate ready. No job has been submitted.\n\n"
-                f"```json\n{json.dumps({'estimate_id': estimate_id, 'workload': shared_workload, 'estimate': shared_estimate}, indent=2)}\n```\n\n"
-                f"A human must reply exactly `APPROVE {estimate_id}` before billable compute can start.",
-            )
-        except (ValueError, JungleGridError) as exc:
-            await self._post(
-                project_id, f"Jungle Grid estimate failed: {redact_sensitive(exc, self.jungle_grid.api_key)}"
-            )
-            await self.project_adapter.stop_project(project_id=project_id, reason="Jungle Grid estimate failed")
+                await self.project_adapter.stop_project(project_id=project_id, reason="Jungle Grid estimate failed")
 
     @on_event("project.notification.message_received")
-    async def handle_project_message(self, context: EventContext):
-        """Handle explicit approval and cancellation commands."""
+    async def handle_project_message(self, context: EventContext) -> None:
         payload = context.incoming_event.payload
         project_id = payload.get("project_id")
         sender_id = str(payload.get("sender_id", ""))
-        content = payload.get("content", {})
-        text = content.get("text", "") if isinstance(content, dict) else ""
-        if not project_id or not isinstance(text, str):
+        content = payload.get("content")
+        text = content.get("text") if isinstance(content, Mapping) else None
+        if not isinstance(project_id, str) or not isinstance(text, str):
             return
-        command = text
-        execution = self.executions.get(project_id)
-
-        if command.startswith("APPROVE "):
-            if not execution:
-                await self._post(project_id, "There is no pending Jungle Grid estimate for this project.")
-                return
-            if not self._is_human_approver(sender_id):
-                await self._post(
-                    project_id, "Approval rejected: billable Jungle Grid submission requires a human approver."
-                )
-                return
-            if command != f"APPROVE {execution.estimate_id}":
-                await self._post(project_id, "Approval rejected: estimate id does not match the pending estimate.")
-                return
-            if execution.submission_started:
-                suffix = f" as job `{execution.job_id}`" if execution.job_id else ""
-                await self._post(project_id, f"Jungle Grid submission has already been requested{suffix}.")
-                return
-            await self._submit_and_monitor(execution, sender_id)
+        normalized_prefix = text.strip()
+        if not normalized_prefix.startswith(("APPROVE", "CANCEL")):
             return
+        lock = self.project_locks.setdefault(project_id, asyncio.Lock())
+        async with lock:
+            execution = await self._load_state(project_id)
+            if normalized_prefix.startswith("APPROVE"):
+                await self._handle_approval(project_id, sender_id, text, execution)
+            else:
+                await self._handle_cancellation(project_id, sender_id, text, execution)
+
+    async def _handle_approval(
+        self,
+        project_id: str,
+        sender_id: str,
+        command: str,
+        execution: Optional[ProjectExecution],
+    ) -> None:
+        if not execution:
+            await self._post(project_id, "There is no pending Jungle Grid estimate for this project.")
+            return
+        if not self._is_human(sender_id):
+            await self._post(project_id, "Approval rejected: billable submission requires a verified human identity.")
+            return
+        if command != f"APPROVE {execution.estimate_id}":
+            await self._post(project_id, "Approval rejected: estimate id does not match the pending estimate.")
+            return
+        if execution.terminal or execution.submission_state != "pending":
+            suffix = f" as job `{execution.job_id}`" if execution.job_id else ""
+            await self._post(project_id, f"Jungle Grid submission has already been recorded{suffix}.")
+            return
+        await self._submit(execution, sender_id)
 
-        if command.startswith("CANCEL "):
-            if not execution or not execution.job_id:
-                await self._post(project_id, "There is no submitted Jungle Grid job to cancel for this project.")
-                return
-            if command != f"CANCEL {execution.job_id}":
-                await self._post(project_id, "Cancellation rejected: job id does not match this project.")
-                return
-            if not self._is_human_approver(sender_id):
-                await self._post(
-                    project_id, "Cancellation rejected: Jungle Grid cancellation requires a human approver."
-                )
-                return
-            try:
-                result = await self.jungle_grid.cancel_job(
-                    execution.job_id, f"Requested from OpenAgents by {sender_id}"
-                )
-                shared_result = self._sanitize_for_project(result, execution)
-                await self._post(
-                    project_id,
-                    f"Cancellation requested for Jungle Grid job `{execution.job_id}`.\n\n```json\n{json.dumps(shared_result, indent=2)}\n```",
-                )
-            except JungleGridError as exc:
-                await self._post(
-                    project_id, f"Jungle Grid cancellation failed: {redact_sensitive(exc, self.jungle_grid.api_key)}"
-                )
-
-    async def _submit_and_monitor(self, execution: ProjectExecution, approved_by: str):
-        execution.submission_started = True
+    async def _submit(self, execution: ProjectExecution, approved_by: str) -> None:
+        execution.submission_state = "submitting"
         execution.approved_by = approved_by
+        await self._save_state(execution)
         try:
-            execution.submit_payload = build_submit_payload(execution.workload)
-            execution.secret_values = _collect_string_values(execution.submit_payload.get("environment"))
-            result = await self.jungle_grid.submit_job(execution.submit_payload)
+            submit_payload, secrets = build_submit_payload(execution.workload)
+            execution.secret_values = secrets
+            result = await self.jungle_grid.submit_job(submit_payload)
             job_id = str(result.get("job_id") or result.get("id") or "").strip()
             if not job_id:
                 raise JungleGridError("INVALID_API_RESPONSE", "Jungle Grid submit response did not include a job id.")
             execution.job_id = job_id
-            execution.last_status = str(result.get("status") or "submitted")
+            execution.submission_state = "submitted"
+            execution.last_status_fingerprint = status_fingerprint(result)
+            await self._save_state(execution)
             await self._set_artifact(
                 execution.project_id,
                 "jungle_grid_submission",
                 {
                     "approved_by": approved_by,
                     "estimate_id": execution.estimate_id,
-                    "submission": self._sanitize_for_project(result, execution),
+                    "submission": self._safe(result, execution),
                 },
             )
             await self._post(
                 execution.project_id,
-                f"Jungle Grid job submitted after approval by `{approved_by}`: `{job_id}` "
-                f"(status: `{lifecycle_label(execution.last_status)}`).",
+                f"Jungle Grid job submitted after approval by `{approved_by}`: `{job_id}`.",
             )
-            task = asyncio.create_task(self._monitor(execution))
-            self.monitor_tasks[execution.project_id] = task
+            self._ensure_monitor(execution)
         except (ValueError, JungleGridError) as exc:
+            execution.submission_state = "submission_failed"
+            await self._save_state(execution)
             await self._post(
                 execution.project_id,
-                f"Jungle Grid submission failed: {redact_sensitive(exc, self.jungle_grid.api_key)}",
+                f"Jungle Grid submission failed: {redact_sensitive(exc, self._secrets(execution))}",
             )
             await self.project_adapter.stop_project(
                 project_id=execution.project_id, reason="Jungle Grid submission failed"
             )
 
-    async def _monitor(self, execution: ProjectExecution):
+    async def _handle_cancellation(
+        self,
+        project_id: str,
+        sender_id: str,
+        command: str,
+        execution: Optional[ProjectExecution],
+    ) -> None:
+        if not execution or not execution.job_id:
+            await self._post(project_id, "There is no submitted Jungle Grid job to cancel for this project.")
+            return
+        if command != f"CANCEL {execution.job_id}":
+            await self._post(project_id, "Cancellation rejected: job id does not match this project.")
+            return
+        if not self._is_human(sender_id):
+            await self._post(project_id, "Cancellation rejected: cancellation requires a verified human identity.")
+            return
+        if execution.terminal:
+            await self._post(
+                project_id, "Cancellation was not sent because this project already recorded a terminal job."
+            )
+            return
+        if execution.cancel_requested:
+            await self._post(project_id, "Cancellation has already been requested for this job.")
+            return
+        execution.cancel_requested = True
+        await self._save_state(execution)
+        try:
+            result = await self.jungle_grid.cancel_job(execution.job_id, f"Requested from OpenAgents by {sender_id}")
+            await self._post(
+                project_id,
+                f"Cancellation requested for Jungle Grid job `{execution.job_id}`: "
+                f"{json.dumps(self._safe(result, execution), sort_keys=True)}",
+            )
+            if str(result.get("status", "")).lower() in TERMINAL_STATUSES:
+                execution.terminal = True
+                await self._save_state(execution)
+                await self.project_adapter.stop_project(
+                    project_id=project_id, reason=f"Jungle Grid job {execution.job_id} was cancelled."
+                )
+        except JungleGridError as exc:
+            execution.cancel_requested = False
+            await self._save_state(execution)
+            await self._post(
+                project_id,
+                f"Jungle Grid cancellation failed: {redact_sensitive(exc, self._secrets(execution))}",
+            )
+
+    def _ensure_monitor(self, execution: ProjectExecution) -> None:
+        current = self.monitor_tasks.get(execution.project_id)
+        if current and not current.done():
+            return
+        self.monitor_tasks[execution.project_id] = asyncio.create_task(self._monitor(execution))
+
+    async def _monitor(self, execution: ProjectExecution) -> None:
         assert execution.job_id
+        failures = 0
         try:
-            while True:
-                job = await self.jungle_grid.get_job(execution.job_id)
-                status = str(job.get("status") or "unknown")
-                if status != execution.last_status:
-                    execution.last_status = status
+            while not execution.terminal:
+                try:
+                    job = await self.jungle_grid.get_job(execution.job_id)
+                    await self._collect_events(execution)
+                    await self._collect_logs(execution)
+                    failures = 0
+                except JungleGridError as exc:
+                    failures += 1
+                    if failures >= self.max_poll_failures:
+                        raise
+                    await self.sleep(self.poll_interval_seconds)
+                    continue
+                fingerprint = status_fingerprint(job)
+                if fingerprint != execution.last_status_fingerprint:
+                    execution.last_status_fingerprint = fingerprint
+                    status = str(job.get("status") or "unknown")
+                    phase = job.get("execution_phase")
+                    delayed = " (delayed start)" if job.get("delayed_start") is True else ""
+                    phase_text = f", phase `{phase}`" if phase else ""
                     await self._post(
                         execution.project_id,
-                        f"Jungle Grid job `{execution.job_id}` is now `{lifecycle_label(status)}`.",
+                        f"Jungle Grid job `{execution.job_id}` is `{status}`{phase_text}{delayed}.",
                     )
-                if status in TERMINAL_STATUSES:
+                await self._save_state(execution)
+                if str(job.get("status", "")).lower() in TERMINAL_STATUSES:
                     await self._finalize(execution, job)
                     return
-                await asyncio.sleep(self.poll_interval_seconds)
+                await self.sleep(self.poll_interval_seconds)
         except JungleGridError as exc:
             await self._post(
                 execution.project_id,
-                f"Jungle Grid monitoring failed: {redact_sensitive(exc, self.jungle_grid.api_key)}",
+                "Jungle Grid monitoring failed after bounded retries: "
+                f"{redact_sensitive(exc, self._secrets(execution))}",
             )
             await self.project_adapter.stop_project(
                 project_id=execution.project_id, reason="Jungle Grid monitoring failed"
@@ -534,44 +914,73 @@ async def _monitor(self, execution: ProjectExecution):
         finally:
             self.monitor_tasks.pop(execution.project_id, None)
 
-    async def _finalize(self, execution: ProjectExecution, job: Dict[str, Any]):
+    async def _collect_events(self, execution: ProjectExecution) -> None:
         assert execution.job_id
-        runtime: Dict[str, Any] = {}
-        logs: Dict[str, Any] = {}
-        artifacts: Dict[str, Any] = {}
-        downloads = []
+        response = await self.jungle_grid.get_job_events(execution.job_id)
+        items = response.get("items")
+        if not isinstance(items, list):
+            return
+        seen = set(execution.seen_event_ids)
+        new_items: list[object] = []
+        for item in items:
+            if not isinstance(item, Mapping):
+                continue
+            event_id = str(item.get("id") or item.get("sequence") or item.get("created_at") or "")
+            if not event_id or event_id in seen:
+                continue
+            seen.add(event_id)
+            execution.seen_event_ids.append(event_id)
+            new_items.append(self._safe(item, execution))
+        if new_items:
+            execution.events = (execution.events + new_items)[-MAX_SHARED_EVENTS:]
+            latest = new_items[-1]
+            title = latest.get("title") if isinstance(latest, Mapping) else None
+            if title:
+                await self._post(execution.project_id, f"Jungle Grid lifecycle: {title}.")
+
+    async def _collect_logs(self, execution: ProjectExecution) -> None:
+        assert execution.job_id
+        response = await self.jungle_grid.get_job_logs(execution.job_id, limit=100, cursor=execution.log_cursor)
+        items = response.get("items", response.get("logs"))
+        if isinstance(items, list) and items:
+            safe_items = self._safe(items, execution)
+            if isinstance(safe_items, list):
+                execution.logs = (execution.logs + safe_items)[-MAX_SHARED_LOGS:]
+        next_cursor = response.get("next_cursor")
+        if next_cursor is not None and str(next_cursor) != execution.log_cursor:
+            execution.log_cursor = str(next_cursor)
+
+    async def _finalize(self, execution: ProjectExecution, job: dict[str, object]) -> None:
+        assert execution.job_id
+        runtime: object = {}
+        artifacts: object = {}
         try:
             runtime = await self.jungle_grid.get_job_runtime(execution.job_id)
         except JungleGridError as exc:
-            runtime = {"error": redact_sensitive(exc, self.jungle_grid.api_key)}
-        try:
-            logs = await self.jungle_grid.get_job_logs(execution.job_id)
-        except JungleGridError as exc:
-            logs = {"error": redact_sensitive(exc, self.jungle_grid.api_key)}
+            if exc.status not in {404, 409}:
+                runtime = {"unavailable": redact_sensitive(exc, self._secrets(execution))}
+            else:
+                runtime = {"unavailable": "Runtime details are not available for this job."}
         try:
             artifacts = await self.jungle_grid.list_artifacts(execution.job_id)
-            for artifact in artifacts.get("artifacts", []):
-                if not isinstance(artifact, dict):
-                    continue
-                artifact_id = str(artifact.get("artifact_id") or artifact.get("id") or "").strip()
-                if artifact_id:
-                    download = await self.jungle_grid.get_artifact(execution.job_id, artifact_id)
-                    if "url" in download:
-                        download = {**download, "url": "[REDACTED]"}
-                    downloads.append(download)
         except JungleGridError as exc:
-            artifacts = {"error": redact_sensitive(exc, self.jungle_grid.api_key)}
-
-        result = self._sanitize_for_project(
-            {"job": job, "runtime": runtime, "logs": logs, "artifacts": artifacts, "downloads": downloads},
-            execution,
-        )
+            artifacts = {"unavailable": redact_sensitive(exc, self._secrets(execution))}
+        result = {
+            "job": self._safe(job, execution),
+            "events": execution.events,
+            "logs": execution.logs,
+            "runtime": self._safe(runtime, execution),
+            "artifacts": self._safe(artifacts, execution),
+        }
         await self._set_artifact(execution.project_id, "jungle_grid_result", result)
-        status = str(job.get("status") or "unknown")
+        execution.terminal = True
+        await self._save_state(execution)
+        status = str(job.get("status") or "unknown").lower()
         await self._post(
             execution.project_id,
             f"Jungle Grid job `{execution.job_id}` finished with status `{status}`. "
-            "Logs and artifact metadata are stored in project artifact `jungle_grid_result`.",
+            "Sanitized lifecycle events, polled logs, runtime details, and artifact metadata are in "
+            "`jungle_grid_result`. Temporary download URLs are intentionally not requested or stored.",
         )
         if status == "completed":
             await self.project_adapter.complete_project(
@@ -585,10 +994,12 @@ async def _finalize(self, execution: ProjectExecution, job: Dict[str, Any]):
             )
 
 
-async def main():
-    """Run the Jungle Grid executor agent."""
+async def main() -> None:
     logging.basicConfig(level=logging.INFO, format="%(asctime)s - %(name)s - %(levelname)s - %(message)s")
-    agent = JungleGridExecutorAgent()
+    agent = JungleGridExecutorAgent(
+        poll_interval_seconds=float(os.getenv("JUNGLE_GRID_POLL_INTERVAL_SECONDS", "10")),
+        max_poll_failures=int(os.getenv("JUNGLE_GRID_MAX_POLL_FAILURES", "3")),
+    )
     try:
         await agent.async_start(
             network_host="localhost",
diff --git a/sdk/demos/09_jungle_grid_gpu_execution/network.yaml b/sdk/demos/09_jungle_grid_gpu_execution/network.yaml
index 30c0f5012..8bfb1f445 100644
--- a/sdk/demos/09_jungle_grid_gpu_execution/network.yaml
+++ b/sdk/demos/09_jungle_grid_gpu_execution/network.yaml
@@ -20,6 +20,8 @@ network:
   agent_groups:
     executors:
       description: Agents allowed to execute Jungle Grid project workflows
+      # Demo-only group credential used to establish runtime topology membership.
+      # Replace it before adapting this network for a shared or public deployment.
       password_hash: 8fba13dab71d6fdd8a9b9db1f06e81315dfbfd69167b6097f724604db3c91cdf
       metadata:
         permissions:
@@ -39,7 +41,7 @@ network:
             description: Estimate, approve, execute, and monitor an AI workload on Jungle Grid
             expose_as_tool: true
             tool_name: run_jungle_grid_workload
-            tool_description: Start a Jungle Grid workload project. The task must be a JSON object with name, workload_type, image, and model_size_gb; use environment_from_env for workload environment values.
+            tool_description: Start a human-approved Jungle Grid workload project. The task must be a JSON object with name, workload_type, and image; use uploaded input_id references and environment_from_env for secret workload values.
             tool_mode: async
             agent_groups:
               - executors
@@ -48,7 +50,8 @@ network:
               The executor estimates cost first and will not submit a job until a human
               replies with the exact approval command shown in the project. Do not put
               credentials in the goal; use environment_from_env to reference variables
-              available only in the executor process.
+              available only in the executor process. File jobs accept previously
+              uploaded Jungle Grid input_id references and never read arbitrary host paths.
   created_by_version: 0.9.3
 
 network_profile:

From b14a0f361a565707d9b5cd74e2e3e21c3b9435c9 Mon Sep 17 00:00:00 2001
From: dejaguarkyng <deinvinciblekyng.1@gmail.com>
Date: Thu, 11 Jun 2026 15:18:07 +0000
Subject: [PATCH 4/5] test: expand jungle grid execution safety coverage

---
 tests/agents/test_jungle_grid_executor.py | 791 ++++++++++++++--------
 1 file changed, 494 insertions(+), 297 deletions(-)

diff --git a/tests/agents/test_jungle_grid_executor.py b/tests/agents/test_jungle_grid_executor.py
index aa9bf248e..b3466bc7f 100644
--- a/tests/agents/test_jungle_grid_executor.py
+++ b/tests/agents/test_jungle_grid_executor.py
@@ -1,8 +1,9 @@
-"""Mocked tests for the Jungle Grid GPU execution demo agent."""
+"""Mocked safety and contract tests for the Jungle Grid execution demo."""
 
 import asyncio
 import importlib.util
 import json
+import sys
 from pathlib import Path
 from types import SimpleNamespace
 from unittest.mock import AsyncMock
@@ -29,6 +30,7 @@
 SPEC = importlib.util.spec_from_file_location("jungle_grid_executor", MODULE_PATH)
 MODULE = importlib.util.module_from_spec(SPEC)
 assert SPEC and SPEC.loader
+sys.modules[SPEC.name] = MODULE
 SPEC.loader.exec_module(MODULE)
 
 JungleGridClient = MODULE.JungleGridClient
@@ -36,10 +38,10 @@
 JungleGridExecutorAgent = MODULE.JungleGridExecutorAgent
 ProjectExecution = MODULE.ProjectExecution
 EXECUTORS_GROUP_PASSWORD_HASH = MODULE.EXECUTORS_GROUP_PASSWORD_HASH
+STATE_ARTIFACT = MODULE.STATE_ARTIFACT
 build_estimate_payload = MODULE.build_estimate_payload
 build_submit_payload = MODULE.build_submit_payload
 estimate_can_submit = MODULE.estimate_can_submit
-lifecycle_label = MODULE.lifecycle_label
 parse_workload_goal = MODULE.parse_workload_goal
 public_workload = MODULE.public_workload
 redact_sensitive = MODULE.redact_sensitive
@@ -54,50 +56,88 @@ def context(event_name, payload):
     )
 
 
-def workload():
-    return {
-        "name": "batch-demo",
-        "workload_type": "batch",
-        "image": "python:3.11-slim",
+def workload(**updates):
+    value = {
+        "name": "training-demo",
+        "workload_type": "training",
+        "image": "pytorch/pytorch:2.4.0-cuda12.1-cudnn9-runtime",
+        "command": ["python", "-c", "print(42)"],
         "model_size_gb": 1,
-        "command": "python",
-        "args": ["-c", "print(42)"],
-        "optimize_for": "cost",
+        "routing_mode": "cost",
     }
+    value.update(updates)
+    return value
 
 
 class FakeJungleGridClient:
     def __init__(self):
-        self.api_key = "test-api-key"
-        self.estimate_job = AsyncMock(return_value={"available": True, "estimated_cost_usd": {"min": 0.1, "max": 0.2}})
+        self.api_key = "jg_test_api_key"
+        self.estimate_job = AsyncMock(
+            return_value={
+                "available": True,
+                "screening": {"can_submit": True},
+                "capacity_status": {"immediate_capacity_confirmed": False},
+            }
+        )
         self.submit_job = AsyncMock(return_value={"job_id": "job_123", "status": "queued"})
         self.get_job = AsyncMock(return_value={"job_id": "job_123", "status": "completed"})
+        self.get_job_events = AsyncMock(
+            return_value={
+                "items": [
+                    {
+                        "id": "evt_1",
+                        "type": "job.completed",
+                        "title": "Job completed",
+                        "message": "done",
+                        "created_at": "2026-06-11T00:00:00Z",
+                    }
+                ]
+            }
+        )
+        self.get_job_logs = AsyncMock(
+            return_value={
+                "items": [{"category": "workload_stdout", "message": "done"}],
+                "next_cursor": None,
+            }
+        )
         self.get_job_runtime = AsyncMock(return_value={"exit_code": 0, "stdout_tail": "done"})
-        self.get_job_logs = AsyncMock(return_value={"items": [{"message": "done"}]})
-        self.cancel_job = AsyncMock(return_value={"job_id": "job_123", "status": "cancelled", "cancelled": True})
+        self.cancel_job = AsyncMock(return_value={"job_id": "job_123", "status": "cancelled"})
         self.list_artifacts = AsyncMock(
-            return_value={"artifacts": [{"artifact_id": "artifact_1", "filename": "output.json"}]}
-        )
-        self.get_artifact = AsyncMock(
             return_value={
-                "artifact": {"artifact_id": "artifact_1", "filename": "output.json"},
-                "url": "https://example.test/file",
+                "artifacts": [
+                    {
+                        "artifact_id": "artifact_1",
+                        "filename": "output.json",
+                        "content_type": "application/json",
+                        "size_bytes": 12,
+                    }
+                ]
             }
         )
+        self.get_artifact = AsyncMock(return_value={"download_url": "https://storage.example/file?signature=secret"})
 
 
 def agent_with_mocks(fake=None):
-    agent = JungleGridExecutorAgent(jungle_grid_client=fake or FakeJungleGridClient(), poll_interval_seconds=0)
+    agent = JungleGridExecutorAgent(
+        jungle_grid_client=fake or FakeJungleGridClient(),
+        poll_interval_seconds=0,
+        sleep=AsyncMock(),
+    )
     agent.project_adapter = AsyncMock()
     agent.project_adapter.send_project_message = AsyncMock(return_value={"success": True})
     agent.project_adapter.set_project_artifact = AsyncMock(return_value={"success": True})
+    agent.project_adapter.get_project_artifact = AsyncMock(return_value={"success": True, "data": {"value": None}})
     agent.project_adapter.complete_project = AsyncMock(return_value={"success": True})
     agent.project_adapter.stop_project = AsyncMock(return_value={"success": True})
     return agent
 
 
+def message_texts(agent):
+    return [call.kwargs["content"]["text"] for call in agent.project_adapter.send_project_message.await_args_list]
+
+
 @pytest.mark.asyncio
-async def test_executor_group_membership_delivers_project_start_and_returns_estimate():
+async def test_group_authentication_runtime_membership_and_project_delivery():
     network_yaml = yaml.safe_load(NETWORK_CONFIG_PATH.read_text())
     executor_group = network_yaml["network"]["agent_groups"]["executors"]
     assert executor_group["password_hash"] == EXECUTORS_GROUP_PASSWORD_HASH
@@ -137,17 +177,11 @@ async def test_executor_group_membership_delivers_project_start_and_returns_esti
 
     fake = FakeJungleGridClient()
     executor = agent_with_mocks(fake)
-    delivered = []
 
     async def deliver(event):
-        delivered.append(event)
         if event.destination_id == "jungle-grid-executor":
             await executor.handle_project_started(
-                EventContext(
-                    incoming_event=event,
-                    event_threads={},
-                    incoming_thread_id="project-start",
-                )
+                EventContext(incoming_event=event, event_threads={}, incoming_thread_id="start")
             )
         return SimpleNamespace(success=True)
 
@@ -155,7 +189,7 @@ async def deliver(event):
     response = await project_mod.process_system_message(
         Event(
             event_name="project.start",
-            source_id="human:project-owner",
+            source_id="human:owner",
             payload={
                 "template_id": "jungle_grid_execution",
                 "goal": json.dumps(workload()),
@@ -163,364 +197,478 @@ async def deliver(event):
             },
         )
     )
-
     assert response.success
     assert "jungle-grid-executor" in response.data["authorized_agents"]
-    assert any(
-        event.event_name == "project.notification.started"
-        and event.destination_id == "jungle-grid-executor"
-        and event.payload["initiator_agent_id"] == "human:project-owner"
-        for event in delivered
-    )
-    fake.estimate_job.assert_awaited_once_with(build_estimate_payload(workload()))
-    estimate_message = executor.project_adapter.send_project_message.await_args.kwargs["content"]["text"]
-    assert "Jungle Grid estimate ready" in estimate_message
-    assert "APPROVE" in estimate_message
+    fake.estimate_job.assert_awaited_once()
 
 
 @pytest.mark.asyncio
-async def test_successful_estimate_flow_posts_estimate_and_requires_approval():
+async def test_estimate_never_submits_and_requires_exact_human_approval():
     fake = FakeJungleGridClient()
     agent = agent_with_mocks(fake)
-
     await agent.handle_project_started(
         context("project.notification.started", {"project_id": "project-1", "goal": json.dumps(workload())})
     )
-
-    fake.estimate_job.assert_awaited_once_with(build_estimate_payload(workload()))
     fake.submit_job.assert_not_awaited()
-    assert "project-1" in agent.executions
-    message = agent.project_adapter.send_project_message.await_args.kwargs["content"]["text"]
-    assert "No job has been submitted" in message
-    assert "APPROVE" in message
+    assert any("No job has been submitted" in text and "APPROVE" in text for text in message_texts(agent))
 
 
 @pytest.mark.asyncio
-async def test_unavailable_estimate_never_requests_approval_or_submits():
+async def test_screening_can_submit_false_blocks_approval():
     fake = FakeJungleGridClient()
-    fake.estimate_job = AsyncMock(return_value={"available": False, "can_submit": False})
+    fake.estimate_job.return_value = {
+        "available": True,
+        "screening": {"can_submit": False, "blocked_checks": ["resource"]},
+    }
     agent = agent_with_mocks(fake)
-
     await agent.handle_project_started(
         context("project.notification.started", {"project_id": "project-1", "goal": json.dumps(workload())})
     )
-
     fake.submit_job.assert_not_awaited()
-    message = agent.project_adapter.send_project_message.await_args.kwargs["content"]["text"]
-    assert "not currently eligible for submission" in message
-    assert "APPROVE" not in message
     agent.project_adapter.stop_project.assert_awaited_once()
+    assert not any("APPROVE" in text for text in message_texts(agent))
 
 
 @pytest.mark.asyncio
-async def test_approval_required_before_submit_and_non_human_is_rejected():
+@pytest.mark.parametrize(
+    ("sender", "command"),
+    [
+        ("agent:other", "APPROVE estimate-1"),
+        ("human:user", "APPROVE wrong"),
+        ("human:user", " APPROVE estimate-1"),
+        ("human:user", "APPROVE estimate-1\n"),
+    ],
+)
+async def test_unauthorized_or_malformed_approval_is_rejected(sender, command):
     fake = FakeJungleGridClient()
     agent = agent_with_mocks(fake)
-    execution = ProjectExecution("project-1", workload(), "estimate-1", {"available": True})
-    agent.executions["project-1"] = execution
-
+    agent.executions["project-1"] = ProjectExecution("project-1", workload(), "estimate-1", {"available": True})
     await agent.handle_project_message(
         context(
             "project.notification.message_received",
-            {"project_id": "project-1", "sender_id": "agent:other", "content": {"text": "APPROVE estimate-1"}},
+            {"project_id": "project-1", "sender_id": sender, "content": {"text": command}},
         )
     )
-
     fake.submit_job.assert_not_awaited()
-    assert (
-        "requires a human approver" in agent.project_adapter.send_project_message.await_args.kwargs["content"]["text"]
-    )
 
 
 @pytest.mark.asyncio
-@pytest.mark.parametrize("command", ["APPROVE estimate-2", " APPROVE estimate-1", "APPROVE estimate-1\n"])
-async def test_approval_requires_exact_command(command):
+async def test_duplicate_and_concurrent_approval_submit_only_once():
     fake = FakeJungleGridClient()
-    agent = agent_with_mocks(fake)
-    execution = ProjectExecution("project-1", workload(), "estimate-1", {"available": True})
-    agent.executions["project-1"] = execution
+    started = asyncio.Event()
+    release = asyncio.Event()
 
-    await agent.handle_project_message(
-        context(
-            "project.notification.message_received",
-            {"project_id": "project-1", "sender_id": "human:user", "content": {"text": command}},
-        )
+    async def delayed_submit(_payload):
+        started.set()
+        await release.wait()
+        return {"job_id": "job_123", "status": "queued"}
+
+    fake.submit_job.side_effect = delayed_submit
+    agent = agent_with_mocks(fake)
+    agent._ensure_monitor = lambda execution: None
+    agent.executions["project-1"] = ProjectExecution("project-1", workload(), "estimate-1", {"available": True})
+    approval = context(
+        "project.notification.message_received",
+        {
+            "project_id": "project-1",
+            "sender_id": "human:user",
+            "content": {"text": "APPROVE estimate-1"},
+        },
     )
+    first = asyncio.create_task(agent.handle_project_message(approval))
+    await started.wait()
+    second = asyncio.create_task(agent.handle_project_message(approval))
+    release.set()
+    await asyncio.gather(first, second)
+    fake.submit_job.assert_awaited_once()
+
 
+@pytest.mark.asyncio
+async def test_restart_recovers_submitted_state_without_resubmitting():
+    fake = FakeJungleGridClient()
+    agent = agent_with_mocks(fake)
+    persisted = ProjectExecution(
+        "project-1",
+        workload(),
+        "estimate-1",
+        {"available": True},
+        job_id="job_existing",
+        submission_state="submitted",
+    )
+    agent.project_adapter.get_project_artifact.return_value = {
+        "success": True,
+        "data": {"value": json.dumps(persisted.persisted())},
+    }
+    agent._ensure_monitor = AsyncMock()
+    await agent.handle_project_started(
+        context("project.notification.started", {"project_id": "project-1", "goal": json.dumps(workload())})
+    )
+    fake.estimate_job.assert_not_awaited()
     fake.submit_job.assert_not_awaited()
+    agent._ensure_monitor.assert_called_once()
 
 
 @pytest.mark.asyncio
-async def test_approved_submit_flow_starts_monitor():
+async def test_restart_does_not_retry_uncertain_submission():
     fake = FakeJungleGridClient()
     agent = agent_with_mocks(fake)
-    execution = ProjectExecution("project-1", workload(), "estimate-1", {"available": True})
-    agent.executions["project-1"] = execution
-    agent._monitor = AsyncMock()
-
+    persisted = ProjectExecution(
+        "project-1",
+        workload(),
+        "estimate-1",
+        {"available": True},
+        submission_state="submitting",
+    )
+    agent.project_adapter.get_project_artifact.return_value = {
+        "success": True,
+        "data": {"value": json.dumps(persisted.persisted())},
+    }
     await agent.handle_project_message(
         context(
             "project.notification.message_received",
-            {"project_id": "project-1", "sender_id": "human:user", "content": {"text": "APPROVE estimate-1"}},
+            {
+                "project_id": "project-1",
+                "sender_id": "human:user",
+                "content": {"text": "APPROVE estimate-1"},
+            },
         )
     )
-    await asyncio.sleep(0)
+    fake.submit_job.assert_not_awaited()
 
-    fake.submit_job.assert_awaited_once_with(workload())
-    assert execution.job_id == "job_123"
-    agent._monitor.assert_awaited_once_with(execution)
 
+def test_current_command_array_is_preserved():
+    requested = parse_workload_goal(json.dumps(workload()))
+    assert build_estimate_payload(requested)["command"] == ["python", "-c", "print(42)"]
+    assert build_submit_payload(requested)[0]["command"] == ["python", "-c", "print(42)"]
 
-@pytest.mark.asyncio
-async def test_concurrent_matching_approvals_submit_only_once():
-    fake = FakeJungleGridClient()
-    submit_started = asyncio.Event()
-    release_submit = asyncio.Event()
 
-    async def delayed_submit(_workload):
-        submit_started.set()
-        await release_submit.wait()
-        return {"job_id": "job_123", "status": "queued"}
+def test_legacy_command_and_args_are_combined_without_semantic_change():
+    requested = parse_workload_goal(json.dumps(workload(command="python", args=["-c", "print(42)"])))
+    assert build_submit_payload(requested)[0]["command"] == ["python", "-c", "print(42)"]
+    assert "args" not in build_submit_payload(requested)[0]
 
-    fake.submit_job = AsyncMock(side_effect=delayed_submit)
-    agent = agent_with_mocks(fake)
-    agent._monitor = AsyncMock()
-    execution = ProjectExecution("project-1", workload(), "estimate-1", {"available": True})
-    agent.executions["project-1"] = execution
-    approval = context(
-        "project.notification.message_received",
-        {"project_id": "project-1", "sender_id": "human:user", "content": {"text": "APPROVE estimate-1"}},
+
+def test_command_array_rejects_separate_args():
+    with pytest.raises(ValueError, match="cannot be combined"):
+        parse_workload_goal(json.dumps(workload(args=["extra"])))
+
+
+def test_fine_tuning_is_accepted_and_normalized():
+    requested = parse_workload_goal(json.dumps(workload(workload_type="fine_tuning")))
+    assert build_submit_payload(requested)[0]["workload_type"] == "fine-tuning"
+
+
+def test_invalid_workload_type_is_rejected():
+    with pytest.raises(ValueError, match="workload_type must be one of"):
+        parse_workload_goal(json.dumps(workload(workload_type="interactive")))
+
+
+def test_input_script_and_expected_artifacts_are_forwarded():
+    requested = parse_workload_goal(
+        json.dumps(
+            workload(
+                input_files=[{"input_id": "inp_audio123"}],
+                script_files=["inp_script123"],
+                expected_artifacts=["/workspace/artifacts/transcript.txt"],
+            )
+        )
     )
+    payload = build_submit_payload(requested)[0]
+    assert payload["input_files"] == [{"input_id": "inp_audio123"}]
+    assert payload["script_files"] == [{"input_id": "inp_script123"}]
+    assert payload["expected_artifacts"] == ["/workspace/artifacts/transcript.txt"]
 
-    first = asyncio.create_task(agent.handle_project_message(approval))
-    await submit_started.wait()
-    await agent.handle_project_message(approval)
-    release_submit.set()
-    await first
-    await asyncio.sleep(0)
 
-    fake.submit_job.assert_awaited_once_with(workload())
+@pytest.mark.parametrize(
+    "bad",
+    [
+        {"input_files": [{"local_path": "/etc/passwd"}]},
+        {"script_files": [{"input_id": "../../secret"}]},
+        {"expected_artifacts": ["/tmp/output.txt"]},
+    ],
+)
+def test_arbitrary_local_paths_and_invalid_references_are_rejected(bad):
+    with pytest.raises(ValueError):
+        parse_workload_goal(json.dumps(workload(**bad)))
+
+
+def test_environment_references_resolve_only_for_submission(monkeypatch):
+    monkeypatch.setenv("MODEL_TOKEN", "secret-value")
+    requested = parse_workload_goal(json.dumps(workload(environment_from_env={"MODEL_TOKEN": "MODEL_TOKEN"})))
+    assert "environment" not in build_estimate_payload(requested)
+    payload, secrets = build_submit_payload(requested)
+    assert payload["environment"] == {"MODEL_TOKEN": "secret-value"}
+    assert secrets == ["secret-value"]
+
+
+def test_missing_environment_reference_blocks_submission(monkeypatch):
+    monkeypatch.delenv("MISSING_TOKEN", raising=False)
+    requested = parse_workload_goal(json.dumps(workload(environment_from_env={"MODEL_TOKEN": "MISSING_TOKEN"})))
+    with pytest.raises(ValueError, match="MISSING_TOKEN"):
+        build_submit_payload(requested)
+
+
+def test_callback_auth_token_is_environment_only(monkeypatch):
+    monkeypatch.setenv("CALLBACK_TOKEN", "callback-secret")
+    requested = parse_workload_goal(
+        json.dumps(
+            workload(
+                callback={
+                    "url": "https://example.test/hooks/jungle",
+                    "metadata": {"project": "demo"},
+                    "auth_token_from_env": "CALLBACK_TOKEN",
+                }
+            )
+        )
+    )
+    estimate = build_estimate_payload(requested)
+    assert "auth_token" not in json.dumps(estimate)
+    payload, secrets = build_submit_payload(requested)
+    assert payload["callback"]["auth_token"] == "callback-secret"
+    assert secrets == ["callback-secret"]
+
+
+def test_literal_secrets_and_secret_metadata_are_rejected():
+    with pytest.raises(ValueError, match="must not contain"):
+        parse_workload_goal(json.dumps(workload(command=["curl", "-H", "Bearer secret"])))
+    with pytest.raises(ValueError, match="secret-like"):
+        parse_workload_goal(json.dumps(workload(metadata={"api_token": "value"})))
+
+
+def test_supported_resource_routing_and_timeout_fields_are_forwarded():
+    requested = parse_workload_goal(
+        json.dumps(
+            workload(
+                gpu_required=True,
+                gpu_count=1,
+                gpu_class="datacenter",
+                gpu_type="A100",
+                min_vram_gb=40,
+                region_preference="us-east",
+                region_mode="strict",
+                timeout_seconds=600,
+                precision="bf16",
+                disk_gb=50,
+            )
+        )
+    )
+    payload = build_submit_payload(requested)[0]
+    assert payload["gpu_required"] is True
+    assert payload["gpu_type"] == "A100"
+    assert payload["timeout_seconds"] == 600
+
+
+def test_constraints_reject_unverified_fields():
+    with pytest.raises(ValueError, match="Unsupported constraint fields"):
+        parse_workload_goal(json.dumps(workload(constraints={"provider": "runpod"})))
 
 
 @pytest.mark.asyncio
-async def test_status_polling_posts_updates_and_completes():
-    fake = FakeJungleGridClient()
-    fake.get_job = AsyncMock(
-        side_effect=[
-            {"job_id": "job_123", "status": "running"},
-            {"job_id": "job_123", "status": "completed"},
-        ]
+async def test_malformed_approval_posts_rejection():
+    agent = agent_with_mocks()
+    agent.executions["project-1"] = ProjectExecution("project-1", workload(), "estimate-1", {"available": True})
+    await agent.handle_project_message(
+        context(
+            "project.notification.message_received",
+            {
+                "project_id": "project-1",
+                "sender_id": "human:user",
+                "content": {"text": " APPROVE estimate-1"},
+            },
+        )
     )
-    agent = agent_with_mocks(fake)
-    execution = ProjectExecution("project-1", workload(), "estimate-1", {}, job_id="job_123", last_status="queued")
+    assert any("Approval rejected" in text for text in message_texts(agent))
 
-    await agent._monitor(execution)
 
-    texts = [call.kwargs["content"]["text"] for call in agent.project_adapter.send_project_message.await_args_list]
-    assert any("`running`" in text for text in texts)
-    assert any("`completed`" in text for text in texts)
-    agent.project_adapter.complete_project.assert_awaited_once()
+def test_estimate_can_submit_honors_screening_and_availability():
+    assert estimate_can_submit({"available": True, "screening": {"can_submit": True}})
+    assert not estimate_can_submit({"available": False})
+    assert not estimate_can_submit({"screening": {"can_submit": False}})
 
 
 @pytest.mark.asyncio
-async def test_failed_workload_stops_project():
+async def test_status_changes_are_deduplicated():
     fake = FakeJungleGridClient()
+    running = {
+        "job_id": "job_123",
+        "status": "running",
+        "execution_phase": "executing",
+        "phase_started_at": "2026-06-11T00:00:00Z",
+    }
+    fake.get_job.side_effect = [running, running, {"job_id": "job_123", "status": "completed"}]
     agent = agent_with_mocks(fake)
-    execution = ProjectExecution("project-1", workload(), "estimate-1", {}, job_id="job_123")
+    execution = ProjectExecution(
+        "project-1", workload(), "estimate-1", {}, job_id="job_123", submission_state="submitted"
+    )
+    await agent._monitor(execution)
+    assert sum("`running`" in text for text in message_texts(agent)) == 1
 
-    await agent._finalize(execution, {"job_id": "job_123", "status": "failed"})
 
-    agent.project_adapter.stop_project.assert_awaited_once()
-    agent.project_adapter.complete_project.assert_not_awaited()
+@pytest.mark.asyncio
+async def test_lifecycle_endpoint_and_event_deduplication():
+    fake = FakeJungleGridClient()
+    agent = agent_with_mocks(fake)
+    execution = ProjectExecution("project-1", workload(), "estimate-1", {}, job_id="job_123")
+    await agent._collect_events(execution)
+    await agent._collect_events(execution)
+    fake.get_job_events.assert_awaited_with("job_123")
+    assert len(execution.events) == 1
+    assert sum("Job completed" in text for text in message_texts(agent)) == 1
 
 
 @pytest.mark.asyncio
-async def test_logs_and_artifacts_are_stored_in_project_artifact():
+async def test_empty_workload_logs_during_startup_do_not_fail():
     fake = FakeJungleGridClient()
+    fake.get_job_logs.return_value = {"items": [], "next_cursor": None}
     agent = agent_with_mocks(fake)
     execution = ProjectExecution("project-1", workload(), "estimate-1", {}, job_id="job_123")
+    await agent._collect_logs(execution)
+    assert execution.logs == []
+    agent.project_adapter.stop_project.assert_not_awaited()
 
-    await agent._finalize(execution, {"job_id": "job_123", "status": "completed"})
 
-    fake.get_job_runtime.assert_awaited_once_with("job_123")
-    fake.get_job_logs.assert_awaited_once_with("job_123")
-    fake.list_artifacts.assert_awaited_once_with("job_123")
-    fake.get_artifact.assert_awaited_once_with("job_123", "artifact_1")
-    artifact_call = agent.project_adapter.set_project_artifact.await_args
-    assert artifact_call.kwargs["key"] == "jungle_grid_result"
-    assert "output.json" in artifact_call.kwargs["value"]
-    assert "stdout_tail" in artifact_call.kwargs["value"]
-    assert "https://example.test/file" not in artifact_call.kwargs["value"]
-    assert "[REDACTED]" in artifact_call.kwargs["value"]
+@pytest.mark.asyncio
+async def test_log_pagination_and_bounded_output():
+    fake = FakeJungleGridClient()
+    fake.get_job_logs.side_effect = [
+        {"items": [{"message": "first"}], "next_cursor": "cursor-1"},
+        {"items": [{"message": f"line-{index}"} for index in range(250)], "next_cursor": None},
+    ]
+    agent = agent_with_mocks(fake)
+    execution = ProjectExecution("project-1", workload(), "estimate-1", {}, job_id="job_123")
+    await agent._collect_logs(execution)
+    await agent._collect_logs(execution)
+    assert fake.get_job_logs.await_args_list[1].kwargs["cursor"] == "cursor-1"
+    assert len(execution.logs) == 200
 
 
 @pytest.mark.asyncio
-async def test_resolved_environment_values_are_redacted_from_results(monkeypatch):
-    monkeypatch.setenv("MODEL_TOKEN", "secret-value")
+async def test_runtime_unavailable_is_nonfatal_and_artifacts_have_no_signed_url():
     fake = FakeJungleGridClient()
-    fake.get_job_logs = AsyncMock(return_value={"items": [{"message": "token=secret-value"}]})
+    fake.get_job_runtime.side_effect = JungleGridError("NOT_FOUND", "not ready", 404)
+    fake.list_artifacts.return_value = {
+        "artifacts": [
+            {
+                "artifact_id": "artifact_1",
+                "filename": "output.json",
+                "download_url": "https://storage.example/file?signature=secret",
+            }
+        ]
+    }
     agent = agent_with_mocks(fake)
-    requested = {**workload(), "environment_from_env": {"MODEL_TOKEN": "MODEL_TOKEN"}}
-    execution = ProjectExecution(
-        "project-1",
-        requested,
-        "estimate-1",
-        {},
-        job_id="job_123",
-        submit_payload=build_submit_payload(requested),
-        secret_values=["secret-value"],
+    execution = ProjectExecution("project-1", workload(), "estimate-1", {}, job_id="job_123")
+    await agent._finalize(execution, {"job_id": "job_123", "status": "completed"})
+    result_call = next(
+        call
+        for call in agent.project_adapter.set_project_artifact.await_args_list
+        if call.kwargs["key"] == "jungle_grid_result"
     )
+    value = result_call.kwargs["value"]
+    assert "Runtime details are not available" in value
+    assert "https://storage.example" not in value
+    assert "signature=secret" not in value
+    fake.get_artifact.assert_not_awaited()
+    agent.project_adapter.complete_project.assert_awaited_once()
 
-    await agent._finalize(execution, {"job_id": "job_123", "status": "completed"})
 
-    artifact_value = agent.project_adapter.set_project_artifact.await_args.kwargs["value"]
-    assert "secret-value" not in artifact_value
-    assert "[REDACTED]" in artifact_value
+@pytest.mark.asyncio
+@pytest.mark.parametrize("status", ["failed", "cancelled"])
+async def test_failed_or_cancelled_job_stops_project(status):
+    agent = agent_with_mocks()
+    execution = ProjectExecution("project-1", workload(), "estimate-1", {}, job_id="job_123")
+    await agent._finalize(execution, {"job_id": "job_123", "status": status})
+    agent.project_adapter.stop_project.assert_awaited_once()
+    agent.project_adapter.complete_project.assert_not_awaited()
 
 
 @pytest.mark.asyncio
-async def test_cancellation_uses_matching_job_id():
+@pytest.mark.parametrize(
+    ("sender", "command"),
+    [
+        ("agent:other", "CANCEL job_123"),
+        ("human:user", "CANCEL job_other"),
+        ("human:user", " CANCEL job_123"),
+        ("human:user", "CANCEL job_123\n"),
+    ],
+)
+async def test_unauthorized_mismatched_or_malformed_cancellation_is_rejected(sender, command):
     fake = FakeJungleGridClient()
     agent = agent_with_mocks(fake)
     agent.executions["project-1"] = ProjectExecution("project-1", workload(), "estimate-1", {}, job_id="job_123")
-
     await agent.handle_project_message(
         context(
             "project.notification.message_received",
-            {"project_id": "project-1", "sender_id": "human:user", "content": {"text": "CANCEL job_123"}},
+            {"project_id": "project-1", "sender_id": sender, "content": {"text": command}},
         )
     )
-
-    fake.cancel_job.assert_awaited_once_with("job_123", "Requested from OpenAgents by human:user")
+    fake.cancel_job.assert_not_awaited()
 
 
 @pytest.mark.asyncio
-async def test_non_human_cancellation_is_rejected():
+async def test_duplicate_and_terminal_cancellation_are_safe():
     fake = FakeJungleGridClient()
     agent = agent_with_mocks(fake)
-    agent.executions["project-1"] = ProjectExecution("project-1", workload(), "estimate-1", {}, job_id="job_123")
-
-    await agent.handle_project_message(
-        context(
-            "project.notification.message_received",
-            {"project_id": "project-1", "sender_id": "agent:other", "content": {"text": "CANCEL job_123"}},
-        )
+    execution = ProjectExecution("project-1", workload(), "estimate-1", {}, job_id="job_123", cancel_requested=True)
+    agent.executions["project-1"] = execution
+    cancellation = context(
+        "project.notification.message_received",
+        {
+            "project_id": "project-1",
+            "sender_id": "human:user",
+            "content": {"text": "CANCEL job_123"},
+        },
     )
-
+    await agent.handle_project_message(cancellation)
+    execution.cancel_requested = False
+    execution.terminal = True
+    await agent.handle_project_message(cancellation)
     fake.cancel_job.assert_not_awaited()
 
 
 @pytest.mark.asyncio
-@pytest.mark.parametrize("command", ["CANCEL job_456", " CANCEL job_123", "CANCEL job_123\n"])
-async def test_cancellation_requires_exact_command(command):
+async def test_matching_human_cancellation_uses_recorded_job_only():
     fake = FakeJungleGridClient()
     agent = agent_with_mocks(fake)
     agent.executions["project-1"] = ProjectExecution("project-1", workload(), "estimate-1", {}, job_id="job_123")
-
     await agent.handle_project_message(
         context(
             "project.notification.message_received",
-            {"project_id": "project-1", "sender_id": "human:user", "content": {"text": command}},
+            {
+                "project_id": "project-1",
+                "sender_id": "human:user",
+                "content": {"text": "CANCEL job_123"},
+            },
         )
     )
-
-    fake.cancel_job.assert_not_awaited()
-
-
-@pytest.mark.asyncio
-async def test_missing_api_key_is_reported_without_network_call(monkeypatch):
-    monkeypatch.delenv("JUNGLE_GRID_API_KEY", raising=False)
-    client = JungleGridClient()
-    with pytest.raises(JungleGridError, match="JUNGLE_GRID_API_KEY is required"):
-        await client.estimate_job(workload())
-
-
-def test_invalid_workload_is_rejected():
-    with pytest.raises(ValueError, match="Missing required workload fields"):
-        parse_workload_goal('{"workload_type": "batch"}')
-
-
-def test_workload_requires_positive_model_size():
-    with pytest.raises(ValueError, match="model_size_gb"):
-        parse_workload_goal(json.dumps({**workload(), "model_size_gb": 0}))
-
-
-def test_estimate_payload_matches_current_draft_job_fields():
-    requested = {
-        **workload(),
-        "constraints": {"max_price_per_hour": 2.5, "preferred_gpu_family": "l4"},
-    }
-
-    assert build_estimate_payload(requested) == requested
-
-
-def test_workload_rejects_literal_credentials_and_secret_like_metadata():
-    with pytest.raises(ValueError, match="must not contain API keys"):
-        parse_workload_goal(json.dumps({**workload(), "command": "curl -H 'Bearer secret-value'"}))
-    with pytest.raises(ValueError, match="secret-like keys"):
-        parse_workload_goal(json.dumps({**workload(), "metadata": {"api_token": "secret-value"}}))
-
-
-def test_build_submit_payload_resolves_environment_only_at_submission(monkeypatch):
-    monkeypatch.setenv("MODEL_TOKEN", "secret-value")
-    requested = {**workload(), "environment_from_env": {"MODEL_TOKEN": "MODEL_TOKEN"}}
-
-    assert "environment_from_env" not in build_estimate_payload(requested)
-    assert build_submit_payload(requested)["environment"] == {"MODEL_TOKEN": "secret-value"}
-    assert public_workload(requested)["environment_from_env"] == {"MODEL_TOKEN": "MODEL_TOKEN"}
-
-
-def test_build_submit_payload_rejects_missing_local_environment(monkeypatch):
-    monkeypatch.delenv("MISSING_MODEL_TOKEN", raising=False)
-    requested = {**workload(), "environment_from_env": {"MODEL_TOKEN": "MISSING_MODEL_TOKEN"}}
-
-    with pytest.raises(ValueError, match="MISSING_MODEL_TOKEN"):
-        build_submit_payload(requested)
-
-
-def test_secret_redaction_removes_api_keys_and_bearer_tokens():
-    text = redact_sensitive("failed with Bearer abc123 and jg_super_secret", "jg_super_secret")
-    assert "abc123" not in text
-    assert "jg_super_secret" not in text
-    assert "[REDACTED]" in text
-
-
-def test_public_workload_redacts_metadata_values():
-    shared = public_workload({**workload(), "metadata": {"nested": {"value": "secret"}}})
-    assert shared["metadata"] == {"nested": "[REDACTED]"}
-    assert "secret" not in json.dumps(shared)
+    fake.cancel_job.assert_awaited_once_with("job_123", "Requested from OpenAgents by human:user")
 
 
-def test_project_data_redaction_removes_nested_workload_secrets():
-    result = sanitize_project_data(
-        {"logs": [{"message": "token=secret-value"}], "error": "Bearer test-api-key"},
-        ["secret-value", "test-api-key"],
+def test_redaction_removes_api_keys_environment_values_and_signed_urls():
+    safe = sanitize_project_data(
+        {
+            "message": "Bearer jg_test_api_key secret-value",
+            "download_url": "https://storage.example/file?signature=abc",
+            "authorization": "Bearer abc",
+        },
+        ["jg_test_api_key", "secret-value"],
     )
-    assert "secret-value" not in json.dumps(result)
-    assert "test-api-key" not in json.dumps(result)
+    encoded = json.dumps(safe)
+    assert "jg_test_api_key" not in encoded
+    assert "secret-value" not in encoded
+    assert "storage.example" not in encoded
+    assert encoded.count("[REDACTED]") >= 3
 
 
-def test_estimate_can_submit_honors_explicit_unavailability():
-    assert estimate_can_submit({"available": True, "can_submit": True})
-    assert not estimate_can_submit({"available": False})
-    assert not estimate_can_submit({"can_submit": False})
+def test_public_workload_hides_metadata_values():
+    shared = public_workload(workload(metadata={"customer": "private-value"}))
+    assert shared["metadata"] == {"customer": "[REDACTED]"}
 
 
-@pytest.mark.parametrize(
-    ("status", "label"),
-    [
-        ("submitted", "submitted"),
-        ("queued", "queued"),
-        ("assigned", "assigned (provisioning)"),
-        ("running", "running"),
-        ("completed", "completed"),
-        ("failed", "failed"),
-        ("rejected", "rejected"),
-        ("cancelled", "cancelled"),
-    ],
-)
-def test_lifecycle_labels(status, label):
-    assert lifecycle_label(status) == label
+@pytest.mark.asyncio
+async def test_missing_api_key_fails_before_network(monkeypatch):
+    monkeypatch.delenv("JUNGLE_GRID_API_KEY", raising=False)
+    with pytest.raises(JungleGridError, match="JUNGLE_GRID_API_KEY is required"):
+        await JungleGridClient().estimate_job(workload())
 
 
 class FakeResponse:
@@ -556,67 +704,116 @@ async def __aexit__(self, exc_type, exc, tb):
 
 
 @pytest.mark.asyncio
-async def test_invalid_jungle_grid_response(monkeypatch):
-    monkeypatch.setenv("JUNGLE_GRID_API_KEY", "test-api-key")
-    monkeypatch.setattr(MODULE.aiohttp, "ClientSession", lambda **kwargs: FakeSession(FakeResponse(200, "not-json")))
-    client = JungleGridClient()
-
-    with pytest.raises(JungleGridError, match="invalid JSON"):
+async def test_timeout_uses_bounded_retries_for_reads_only(monkeypatch):
+    monkeypatch.setenv("JUNGLE_GRID_API_KEY", "jg_test_api_key")
+    monkeypatch.setattr(
+        MODULE.aiohttp,
+        "ClientSession",
+        lambda **kwargs: FakeSession(error=asyncio.TimeoutError()),
+    )
+    sleep = AsyncMock()
+    client = JungleGridClient(read_retries=2, retry_delay_seconds=0, sleep=sleep)
+    with pytest.raises(JungleGridError, match="timed out"):
         await client.get_job("job_123")
+    assert sleep.await_count == 2
+    sleep.reset_mock()
+    with pytest.raises(JungleGridError, match="timed out"):
+        await client.submit_job(workload())
+    sleep.assert_not_awaited()
 
 
 @pytest.mark.asyncio
-async def test_network_timeout_is_sanitized(monkeypatch):
-    monkeypatch.setenv("JUNGLE_GRID_API_KEY", "test-api-key")
+async def test_malformed_json_response_is_handled(monkeypatch):
+    monkeypatch.setenv("JUNGLE_GRID_API_KEY", "jg_test_api_key")
     monkeypatch.setattr(
         MODULE.aiohttp,
         "ClientSession",
-        lambda **kwargs: FakeSession(error=asyncio.TimeoutError()),
+        lambda **kwargs: FakeSession(FakeResponse(200, "not-json")),
     )
-    client = JungleGridClient()
-
-    with pytest.raises(JungleGridError, match="timed out"):
-        await client.get_job("job_123")
+    with pytest.raises(JungleGridError, match="invalid JSON"):
+        await JungleGridClient(read_retries=0).get_job("job_123")
 
 
 @pytest.mark.asyncio
-async def test_api_error_is_sanitized(monkeypatch):
-    monkeypatch.setenv("JUNGLE_GRID_API_KEY", "test-api-key")
+async def test_api_error_code_and_message_are_sanitized(monkeypatch):
+    monkeypatch.setenv("JUNGLE_GRID_API_KEY", "jg_test_api_key")
     body = json.dumps(
         {
             "error": {
                 "code": "provider_jg_private_backend",
-                "message": "Bearer test-api-key is not allowed",
+                "message": "Bearer jg_test_api_key is forbidden",
             }
         }
     )
-    monkeypatch.setattr(MODULE.aiohttp, "ClientSession", lambda **kwargs: FakeSession(FakeResponse(403, body)))
-    client = JungleGridClient()
-
+    monkeypatch.setattr(
+        MODULE.aiohttp,
+        "ClientSession",
+        lambda **kwargs: FakeSession(FakeResponse(403, body)),
+    )
     with pytest.raises(JungleGridError) as exc_info:
-        await client.get_job("job_123")
+        await JungleGridClient().get_job("job_123")
     assert "jg_private_backend" not in exc_info.value.code
-    assert "[REDACTED]" in exc_info.value.code
-    assert "test-api-key" not in str(exc_info.value)
-
+    assert "jg_test_api_key" not in str(exc_info.value)
 
-def test_client_uses_documented_rest_api_environment(monkeypatch):
-    monkeypatch.setenv("JUNGLE_GRID_API", "https://orchestrator.example.test/")
-    monkeypatch.setenv("JUNGLE_GRID_API_KEY", "test-api-key")
 
+def test_client_prefers_official_api_base_and_normalizes_slashes(monkeypatch):
+    monkeypatch.setenv("JUNGLEGRID_API_BASE", "https://official.example.test///")
+    monkeypatch.setenv("JUNGLE_GRID_API_URL", "https://legacy.example.test")
     client = JungleGridClient()
+    assert client.api_base == "https://official.example.test"
+
 
-    assert client.api_base == "https://orchestrator.example.test"
+def test_client_keeps_legacy_api_base_fallback(monkeypatch):
+    monkeypatch.delenv("JUNGLEGRID_API_BASE", raising=False)
+    monkeypatch.setenv("JUNGLE_GRID_API_URL", "https://legacy.example.test/")
+    assert JungleGridClient().api_base == "https://legacy.example.test"
 
 
 @pytest.mark.asyncio
-async def test_client_uses_documented_runtime_and_log_routes(monkeypatch):
-    monkeypatch.setenv("JUNGLE_GRID_API_KEY", "test-api-key")
+async def test_client_uses_current_routes_and_log_pagination(monkeypatch):
+    monkeypatch.setenv("JUNGLE_GRID_API_KEY", "jg_test_api_key")
     client = JungleGridClient()
     client._request = AsyncMock(return_value={})
+    await client.estimate_job({})
+    await client.submit_job({})
+    await client.get_job("job 123")
+    await client.get_job_events("job 123")
+    await client.get_job_logs("job 123", limit=50, cursor="cursor-1")
+    await client.get_job_runtime("job 123")
+    await client.list_artifacts("job 123")
+    await client.get_artifact("job 123", "artifact 1")
+    await client.cancel_job("job 123", "reason")
+    paths = [call.args[1] for call in client._request.await_args_list]
+    assert paths == [
+        "/v1/mcp/jobs/estimate",
+        "/v1/mcp/jobs",
+        "/v1/mcp/jobs/job%20123",
+        "/v1/jobs/job%20123/events",
+        "/v1/mcp/jobs/job%20123/logs?limit=50&cursor=cursor-1",
+        "/v1/jobs/job%20123/runtime",
+        "/v1/mcp/jobs/job%20123/artifacts",
+        "/v1/mcp/jobs/job%20123/artifacts/artifact%201/download",
+        "/v1/mcp/jobs/job%20123/cancel",
+    ]
+
+
+def test_execution_state_never_persists_secret_values():
+    execution = ProjectExecution(
+        "project-1",
+        workload(environment_from_env={"TOKEN": "LOCAL_TOKEN"}),
+        "estimate-1",
+        {"available": True},
+        secret_values=["resolved-secret"],
+    )
+    assert "resolved-secret" not in json.dumps(execution.persisted())
+    assert execution.persisted()["workload"]["environment_from_env"] == {"TOKEN": "LOCAL_TOKEN"}
 
-    await client.get_job_runtime("job_123")
-    await client.get_job_logs("job_123")
 
-    assert client._request.await_args_list[0].args == ("GET", "/v1/jobs/job_123/runtime")
-    assert client._request.await_args_list[1].args == ("GET", "/v1/jobs/job_123/logs?tail=100")
+def test_state_artifact_name_is_stable():
+    assert STATE_ARTIFACT == "jungle_grid_execution_state"
+
+
+def test_redact_sensitive_handles_bearer_and_jungle_grid_keys():
+    text = redact_sensitive("Bearer abc and jg_super_secret")
+    assert "abc" not in text
+    assert "jg_super_secret" not in text

From 7d0bea42d3a43a162eca50cfc42e337ff99358f2 Mon Sep 17 00:00:00 2001
From: dejaguarkyng <deinvinciblekyng.1@gmail.com>
Date: Thu, 11 Jun 2026 15:18:19 +0000
Subject: [PATCH 5/5] docs: update jungle grid production demo

---
 .../IMPLEMENTATION_DECISION.md                | 153 +++++----
 .../09_jungle_grid_gpu_execution/README.md    | 317 +++++++++++-------
 2 files changed, 284 insertions(+), 186 deletions(-)
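
Note (commentary only, not part of the applied diff): the restart rules listed
under "Durable Idempotency" in IMPLEMENTATION_DECISION.md can be read as a small
decision function. The sketch below is a minimal illustration under stated
assumptions, not the code in `jungle_grid_executor.py`. `RestartAction`,
`decide_after_restart`, and the parameter names are hypothetical; only the
`submitting` state value comes from the document.

```python
from enum import Enum
from typing import Optional


class RestartAction(Enum):
    """Hypothetical outcomes for an executor that reloads persisted state."""

    DO_NOTHING = "do_nothing"                # project already terminal
    RESUME_MONITORING = "resume_monitoring"  # a job ID was recorded
    NEEDS_HUMAN_REVIEW = "needs_human_review"  # ambiguous submission, never retry
    AWAIT_APPROVAL = "await_approval"        # nothing was submitted yet


def decide_after_restart(
    submission_state: str, job_id: Optional[str], terminal: bool
) -> RestartAction:
    """Pick the safe action after a restart, preferring no duplicate billable job."""
    if terminal:
        # A terminal project is never resubmitted.
        return RestartAction.DO_NOTHING
    if job_id:
        # A recorded job resumes monitoring instead of submitting again.
        return RestartAction.RESUME_MONITORING
    if submission_state == "submitting":
        # The submit call may or may not have reached the service. Without a
        # verified idempotency key, retrying could create a second paid job.
        return RestartAction.NEEDS_HUMAN_REVIEW
    return RestartAction.AWAIT_APPROVAL
```

Checking `terminal` first means a finished project with a recorded job ID is
left alone rather than polled again; that ordering is an assumption of this
sketch.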

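Note (commentary only, not part of the applied diff): the API-base precedence
described in both documents can be sketched as follows. This is an
illustration, not the client's actual implementation. `resolve_api_base` and
`API_BASE_VARIABLES` are hypothetical names, the relative order of the two
fallback variables is an assumption, and the default URL is the one shown in
the README.

```python
import os
from typing import Mapping, Optional

# Default taken from the README; treat it as an assumption for this sketch.
DEFAULT_API_BASE = "https://api.junglegrid.dev"

# Official override first, then the compatibility fallbacks. The order of the
# two fallbacks relative to each other is assumed, not documented.
API_BASE_VARIABLES = ("JUNGLEGRID_API_BASE", "JUNGLE_GRID_API_URL", "JUNGLE_GRID_API")


def resolve_api_base(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the first configured API base with trailing slashes removed."""
    source = os.environ if env is None else env
    for name in API_BASE_VARIABLES:
        value = source.get(name, "").strip()
        if value:
            # "https://host///" and "https://host/" both become "https://host".
            return value.rstrip("/")
    return DEFAULT_API_BASE
```

Passing a mapping instead of reading `os.environ` directly keeps the function
easy to exercise without `monkeypatch`, for example
`resolve_api_base({"JUNGLE_GRID_API_URL": "https://legacy.example.test/"})`.
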
diff --git a/sdk/demos/09_jungle_grid_gpu_execution/IMPLEMENTATION_DECISION.md b/sdk/demos/09_jungle_grid_gpu_execution/IMPLEMENTATION_DECISION.md
index 05857178a..64eba8a83 100644
--- a/sdk/demos/09_jungle_grid_gpu_execution/IMPLEMENTATION_DECISION.md
+++ b/sdk/demos/09_jungle_grid_gpu_execution/IMPLEMENTATION_DECISION.md
@@ -2,64 +2,97 @@
 
 ## Selected Extension Point
 
-This contribution is a runnable demo network with a Python `WorkerAgent`. The agent
-uses OpenAgents' project mod for the long-running workflow, project messages for
-estimate and lifecycle updates, and project artifacts for logs and Jungle Grid
-artifact metadata.
-
-Jungle Grid is an external agentic AI workload execution and GPU orchestration
-layer, not an OpenAgents transport, launcher agent type, or network mod. A demo
-keeps the integration provider-specific while showing a reusable OpenAgents
-pattern: an agent delegates asynchronous compute, waits for human approval
-before billable work, and returns results to a shared project.
-
-## Rejected Alternatives
-
-- **Launcher agent type:** Jungle Grid executes workloads; it is not an interactive
-  coding-agent runtime managed by the launcher.
-- **Core provider integration:** No OpenAgents core abstraction requires a
-  provider-specific compute backend.
-- **Jungle Grid mod:** The integration does not add network-wide event semantics or
-  shared infrastructure. Existing project events already cover the workflow.
-- **Hosted MCP entry:** Jungle Grid's hosted Streamable HTTP endpoint uses OAuth,
-  while local stdio uses an API key. The direct REST integration keeps approval
-  and project state inside OpenAgents without requiring an MCP auth change.
-- **Local stdio MCP dependency:** The Jungle Grid stdio MCP package is supported,
-  but a direct Python API client is easier to validate, test, and constrain around
-  mandatory human approval. It also avoids requiring Node.js for a Python demo.
-
-## Jungle Grid Contract Used
-
-The demo uses the documented public execution API:
-
-- `POST /v1/jobs/estimate`
-- `POST /v1/jobs`
-- `GET /v1/jobs/{job_id}`
+This contribution remains a runnable demo network with a deterministic Python
+`WorkerAgent`. It uses OpenAgents projects for assignment and lifecycle,
+project messages for human approval and meaningful status changes, and project
+artifacts for durable execution state and sanitized results.
+
+Jungle Grid is an external workload execution service, not an OpenAgents
+transport, launcher, credential type, or network mod. Keeping it as a demo makes
+the approval boundary and asynchronous project behavior explicit and testable.
+The agent calls REST directly because an MCP tool call would otherwise hide the
+project-state transition around billable submission.
+
+## Jungle Grid Contract
+
+The implementation was aligned with `Jungle-Grid/mcp-server` and the current
+orchestrator API implementation, not only the README:
+
+- `POST /v1/mcp/jobs/estimate`
+- `POST /v1/mcp/jobs`
+- `GET /v1/mcp/jobs/{job_id}`
+- `GET /v1/jobs/{job_id}/events`
+- `GET /v1/mcp/jobs/{job_id}/logs`
 - `GET /v1/jobs/{job_id}/runtime`
-- `GET /v1/jobs/{job_id}/logs`
-- `POST /v1/jobs/{job_id}/cancel`
-- `GET /v1/jobs/{job_id}/artifacts`
-- `POST /v1/jobs/{job_id}/artifacts/{artifact_id}/download`
-
-Authentication is a scoped server-side API key in `JUNGLE_GRID_API_KEY`; the
-REST base can be overridden with `JUNGLE_GRID_API`. The
-documented lifecycle includes `pending`, `queued`, `assigned`, `running`,
-`completed`, `failed`, `rejected`, and `cancelled`.
-
-The current REST request shape includes `model_size_gb`. Estimate responses
-describe classification, routing, capacity, rates, cost ranges, queue waits,
-start windows, warnings, and screening without starting compute. Managed
-workloads can publish regular files from `/workspace/artifacts`; temporary
-signed artifact download URLs are treated as secrets and are not stored in the
-OpenAgents project.
-
-Workload environment values are not accepted in project goals. A goal may use
-`environment_from_env` to reference variables available only in the executor
-process; those values are resolved after human approval and are excluded from
-the estimate request and project-visible output.
-
-## Contribution Workflow
-
-OpenAgents' contributing guide asks contributors to create an issue for feature
-suggestions before submitting a pull request. This demo should be proposed in an
-issue and held for maintainer direction before a PR is opened.
+- `POST /v1/mcp/jobs/{job_id}/cancel`
+- `GET /v1/mcp/jobs/{job_id}/artifacts`
+- `POST /v1/mcp/jobs/{job_id}/artifacts/{artifact_id}/download`
+
+The official API-base override is `JUNGLEGRID_API_BASE`.
+`JUNGLE_GRID_API_URL` and the older demo variable `JUNGLE_GRID_API` remain
+compatibility fallbacks. Trailing slashes are removed.
+
+The public workload types are `inference`, `training`, `fine_tuning`, and
+`batch`; `fine_tuning` is sent to REST as `fine-tuning`. The preferred command
+shape is an array. Legacy string `command` plus string-array `args` is combined
+in order before estimation and submission.
+
+## Uploaded Files
+
+The demo accepts previously uploaded Jungle Grid `input_id` values through
+`input_files` and `script_files`. This is the minimum safe file workflow:
+
+- IDs are validated locally and then verified by Jungle Grid during estimate or
+  submission.
+- No goal field can name an executor host path.
+- Upload URLs, completion tokens, and storage credentials never enter project
+  state.
+
+Uploading OpenAgents artifacts would require a separate authorization and
+byte-transfer design. It is intentionally outside this demo rather than
+allowing a project goal to read arbitrary local files.
+
+## Durable Idempotency
+
+`jungle_grid_execution_state` records the estimate ID, submission state,
+recorded job ID, cancellation state, status fingerprint, event IDs, and log
+cursor. The agent writes `submitting` before the non-idempotent submission call
+and writes the returned job ID immediately afterward.
+
+After restart:
+
+- a recorded job resumes monitoring;
+- a terminal project is not resubmitted;
+- a `submitting` state without a recorded job is not retried automatically,
+  because the current submission contract does not expose a verified
+  idempotency key;
+- duplicate approvals and cancellations are serialized by a per-project lock.
+
+This favors avoiding a duplicate billable job over guessing after an ambiguous
+network failure.
+
+## Security Decisions
+
+- Estimation cannot submit compute.
+- Submission requires exact `APPROVE <estimate-id>` from a `human:` identity.
+- Cancellation requires exact `CANCEL <job-id>` from a `human:` identity.
+- API and workload secrets are resolved from environment variables only.
+- Callback auth uses `callback.auth_token_from_env`; literal callback secrets
+  are not accepted.
+- Metadata with secret-like keys, Bearer tokens, API-key patterns, and signed
+  URLs are rejected or redacted.
+- Artifact download URLs are not requested during finalization. The client
+  method exists to match the API, but project state stores metadata only.
+- Automated tests mock all external calls.
+
+The committed `executors.password_hash` is a demo-only group credential. Its
+purpose is to establish actual runtime topology membership so project
+notifications reach the executor. It must be replaced for a shared deployment.
+
+## Deliberately Unsupported Goal Fields
+
+The current public MCP submission contract does not expose arbitrary
+host-file paths, CPU or memory sizing, provider pinning, or user-controlled
+retry policy. The demo does not invent those fields. It supports the verified
+GPU, region, priority, timeout, callback, routing, upload-reference, template,
+metadata, and expected-artifact fields accepted by the current API.
diff --git a/sdk/demos/09_jungle_grid_gpu_execution/README.md b/sdk/demos/09_jungle_grid_gpu_execution/README.md
index 599cf77ab..f5df32ad9 100644
--- a/sdk/demos/09_jungle_grid_gpu_execution/README.md
+++ b/sdk/demos/09_jungle_grid_gpu_execution/README.md
@@ -1,202 +1,267 @@
 # Jungle Grid GPU Execution Demo
 
-This demo shows an OpenAgents execution agent delegating long-running AI and GPU
-workloads to [Jungle Grid](https://junglegrid.dev), an agentic AI workload
-execution and GPU orchestration layer that classifies intent, resolves capacity,
-and places workloads without requiring agents to manage GPU servers.
+This demo delegates asynchronous GPU workloads from an OpenAgents project to
+[Jungle Grid](https://junglegrid.dev). A deterministic Python `WorkerAgent`
+estimates first, waits for exact human approval, submits once, then polls
+lifecycle events, status, logs, runtime details, and managed artifact metadata.
 
-The workflow fits OpenAgents because the workload is asynchronous and
-collaborative: an agent estimates the job, a human approves spending in the
-shared project, and the agent returns lifecycle updates, logs, and artifact
-metadata to the same workspace.
+```text
+Project goal
+→ estimate
+→ human approval
+→ optional input/script references
+→ submit
+→ lifecycle events and status
+→ workload logs
+→ runtime details
+→ managed artifacts
+```
+
+The demo calls REST directly so the human approval boundary and durable
+OpenAgents project state remain explicit and testable. It does not require an
+LLM or an MCP runtime dependency.
 
 ## Security And Billing Warning
 
-Jungle Grid jobs may consume credits or incur charges. The executor never submits
-a workload when a project starts. It requires an exact approval command from a
-human identity after posting the estimate. Keep API keys in environment variables
-and do not paste secrets into project goals, messages, logs, metadata, or
-committed files. Workloads that need environment values must use
-`environment_from_env`; the executor resolves those references only after human
-approval, immediately before submission.
+Jungle Grid jobs may consume credits or incur charges. Project creation only
+estimates. Billable submission requires this exact command from a verified
+human identity:
 
-## Prerequisites
+```text
+APPROVE <estimate-id>
+```
 
-- Python with the OpenAgents development package installed.
-- A Jungle Grid account and a scoped API key that can estimate, submit, read, and
-  cancel jobs.
-- A public container image suitable for the requested workload.
+Cancellation also requires an exact human command:
 
-## Environment Variables
+```text
+CANCEL <job-id>
+```
 
-- `JUNGLE_GRID_API_KEY` is required. The agent reads this server-side API key and
-  sends it only as a Bearer token to Jungle Grid.
-- `JUNGLE_GRID_API` optionally overrides the default REST API base,
-  `https://api.junglegrid.dev`.
-- Any workload-specific variables referenced by `environment_from_env` must also
-  be exported in the executor process. Their values are never placed in the
-  project goal or estimate request.
+Keep credentials in executor environment variables. Do not put secrets in
+goals, messages, metadata, logs, or committed files. The demo rejects literal
+API-key/Bearer patterns, resolves workload secrets only after approval, redacts
+shared output, never reads arbitrary host paths, and never stores temporary
+signed artifact URLs.
 
-## Setup
+## Prerequisites
+
+- OpenAgents development dependencies.
+- A scoped Jungle Grid API key with estimate, submit, read, logs, artifact, and
+  cancellation access.
+- A GPU-capable public container image or configured private-image credential.
+- Previously uploaded Jungle Grid input IDs for file-backed jobs.
 
-From the repository root, install OpenAgents with SDK and development
-dependencies so the network, agent, and test commands are available:
+Install the repository package and development tools:
 
 ```bash
 pip install -e ".[sdk,dev]"
 ```
 
-Export the Jungle Grid API key in the shell that will run the executor. This
-keeps the credential out of the repository and network configuration:
+## Environment Configuration
 
 ```bash
 export JUNGLE_GRID_API_KEY="jg_..."
+export JUNGLEGRID_API_BASE="https://api.junglegrid.dev"
+export JUNGLE_GRID_POLL_INTERVAL_SECONDS="10"
+export JUNGLE_GRID_MAX_POLL_FAILURES="3"
 ```
 
-## Run The Demo
-
-The current demo assumes exactly one executor. Run one
-`jungle-grid-executor` process so a project is estimated and submitted at most
-once.
+`JUNGLEGRID_API_BASE` is the current official API-base override.
+`JUNGLE_GRID_API_URL` and `JUNGLE_GRID_API` are compatibility fallbacks. The
+executor removes trailing slashes. Workload variables referenced by
+`environment_from_env` must be exported in the executor process.
 
-Start the OpenAgents network from this demo directory. The network enables the
-project mod and exposes the `Jungle Grid GPU Execution` project template:
+## Start The Network
 
 ```bash
 cd sdk/demos/09_jungle_grid_gpu_execution
 openagents network start network.yaml
 ```
 
-In a second terminal, start the deterministic Python executor. It does not need
-an LLM provider key:
+The network enables the project mod and restricts the template to the
+`executors` group. The committed group password hash is a demo-only credential;
+replace it before a shared deployment.
+
+## Start The Executor
 
 ```bash
 cd sdk/demos/09_jungle_grid_gpu_execution
 python agents/jungle_grid_executor.py
 ```
 
-The script connects with the password hash configured for the `executors`
-group. OpenAgents records that connection in
-`network.topology.agent_group_membership`, which is the runtime source used by
-the project mod. The optional `metadata.agents` list in an agent-group
-configuration does not assign runtime membership and is intentionally not used
-by this demo.
+The executor supplies the configured group password hash during
+`async_start`. OpenAgents therefore records the executor in
+`network.topology.agent_group_membership`; static metadata alone does not
+establish group membership. Run one executor for this demo.
+
+## Create A Project
+
+Open Studio at `http://localhost:8700/studio`, choose
+`Jungle Grid GPU Execution`, and provide a JSON goal.
+
+### Simple Command Job
 
-Open Studio at `http://localhost:8700/studio`, create a project with the
-`Jungle Grid GPU Execution` template, and use a JSON object as the project goal.
-For example:
+The preferred command representation is an array:
 
 ```json
 {
-  "name": "openagents-batch-demo",
-  "workload_type": "batch",
-  "image": "python:3.11-slim",
+  "name": "openagents-training-demo",
+  "workload_type": "training",
+  "image": "pytorch/pytorch:2.4.0-cuda12.1-cudnn9-runtime",
+  "command": ["python", "-c", "import torch; print(torch.cuda.is_available())"],
   "model_size_gb": 1,
+  "gpu_required": true,
+  "routing_mode": "cost"
+}
+```
+
+The legacy string `command` plus `args` form is still accepted and is combined in order:
+
+```json
+{
+  "name": "legacy-command-demo",
+  "workload_type": "batch",
+  "image": "nvidia/cuda:12.2.0-base-ubuntu22.04",
   "command": "python",
-  "args": ["-c", "print('hello from Jungle Grid')"],
-  "optimize_for": "cost"
+  "args": ["-c", "print('hello')"]
+}
+```
+
+Accepted workload types are `inference`, `training`, `fine_tuning`, and
+`batch`.
+
+### File-Backed Job
+
+Upload files through Jungle Grid first, then use only the returned IDs:
+
+```json
+{
+  "name": "openagents-transcription",
+  "workload_type": "inference",
+  "image": "ghcr.io/example/whisper-runtime:cuda",
+  "command": [
+    "python",
+    "/workspace/scripts/transcribe.py",
+    "/workspace/inputs/audio.wav",
+    "/workspace/artifacts/transcript.txt"
+  ],
+  "script_files": [{"input_id": "inp_script123"}],
+  "input_files": [{"input_id": "inp_audio123"}],
+  "expected_artifacts": ["/workspace/artifacts/transcript.txt"]
 }
 ```
 
-The agent validates the request and calls the read-only
-`POST /v1/jobs/estimate` endpoint. Current estimates include workload
-classification, routing and capacity signals, hourly and total cost ranges,
-queue-wait ranges, estimated start windows, warnings, and screening details.
-The executor posts that structured estimate and stores it as project artifact
-`jungle_grid_estimate`. No compute has been submitted at this point.
+Inputs mount under `/workspace/inputs`, scripts under `/workspace/scripts`, and
+managed outputs belong under `/workspace/artifacts`. `local_path` and similar
+host-file fields are not supported.
 
-For a workload that needs a credential or other environment value, export it in
-the executor shell and reference only its local variable name in the goal:
+### Environment And Callback Secrets
 
 ```bash
 export MODEL_TOKEN="..."
+export CALLBACK_TOKEN="..."
 ```
 
 ```json
 {
-  "name": "openagents-inference-demo",
+  "name": "secure-inference",
   "workload_type": "inference",
-  "image": "example/model-server:latest",
-  "model_size_gb": 7,
-  "environment_from_env": {
-    "MODEL_TOKEN": "MODEL_TOKEN"
-  },
-  "optimize_for": "cost"
+  "image": "ghcr.io/example/model-runtime:cuda",
+  "environment_from_env": {"MODEL_TOKEN": "MODEL_TOKEN"},
+  "callback": {
+    "url": "https://example.com/hooks/jungle",
+    "metadata": {"source": "openagents"},
+    "auth_token_from_env": "CALLBACK_TOKEN"
+  }
 }
 ```
 
-The mapping key is the variable sent to the workload, and the mapping value is
-the local executor variable to resolve. Literal `environment` values, API keys,
-Bearer tokens, and secret-like metadata keys are rejected.
+Environment and callback token values are absent from estimates and are
+resolved only after approval.
 
-Review the estimate, then reply in the project with the exact command shown by
-the agent. Estimates that explicitly report `available: false` or
-`can_submit: false` cannot be approved:
+## Estimate And Approval
 
-```text
-APPROVE <estimate-id>
-```
+The executor calls `POST /v1/mcp/jobs/estimate`, stores a sanitized structured
+response in `jungle_grid_estimate`, and posts a short summary. It respects
+`screening.can_submit`, availability, warnings, fixes, blocked checks, routing,
+cost/rate ranges, duration, queue/start windows, and capacity fields returned by
+the API.
 
-After approval, the agent submits with `POST /v1/jobs`, polls
-`GET /v1/jobs/{job_id}`, and posts public lifecycle changes: pending, queued,
-assigned, running, completed, failed, rejected, or cancelled. On a terminal
-state it retrieves the runtime surface, the latest 100 stored log entries, and
-the managed artifact list. Regular files written by managed workloads under
-`/workspace/artifacts` are eligible for automatic upload.
+`screening.can_submit: true` does not prove immediate capacity.
+`capacity_status.immediate_capacity_confirmed` is the relevant signal. Approval
+is blocked when screening or availability explicitly rejects submission.
 
-Artifact download requests mint temporary signed URLs. The executor requests
-download metadata but redacts the URL before storing `jungle_grid_result`; do
-not log or share signed URLs.
+## Monitoring
 
-To cancel a submitted job, reply with the exact job ID:
+After approval the executor:
 
-```text
-CANCEL <job-id>
-```
+- polls `GET /v1/mcp/jobs/{job_id}` for status, execution phase, status message,
+  phase timing, delayed-start, scheduling, retry, failure, and completion data;
+- polls `GET /v1/jobs/{job_id}/events` separately for platform lifecycle events;
+- polls paginated `GET /v1/mcp/jobs/{job_id}/logs`;
+- reads `GET /v1/jobs/{job_id}/runtime` at finalization;
+- lists managed artifacts after terminal status.
 
-Cancellation is explicit and only applies when the job ID matches the project.
-Only a human identity can request cancellation. The agent reports cancellation
-failures without exposing the API key.
+Lifecycle names are not restricted to a local enum. Event IDs and log cursors
+prevent duplicates. Messages are posted only for meaningful state changes.
+Empty workload logs during scheduling, provisioning, input preparation, or
+container startup do not fail the project. This is polling, not true streaming.
 
-## Failure Behavior
+Shared event and log history is bounded to 200 entries each. API keys, Bearer
+tokens, resolved environment values, authorization fields, and signed URLs are
+redacted.
 
-Invalid workload JSON, missing required fields, missing API keys, timeouts,
-invalid Jungle Grid responses, and API errors are posted to the project in
-sanitized form. Failed, rejected, or cancelled jobs stop the OpenAgents project.
-Completed jobs complete the project.
+## Artifacts
 
-The API key needs `jobs:estimate`, `jobs:submit`, `jobs:read`, and `logs:read`
-capabilities for the complete flow.
+Regular files written under `/workspace/artifacts` are eligible for managed
+collection. `jungle_grid_result` contains sanitized job data, bounded lifecycle
+events, bounded logs, runtime details when available, and artifact IDs, names,
+paths, sizes, and content types returned by Jungle Grid.
 
-## Jungle Grid Interfaces
+The API can mint temporary artifact download URLs, but this demo intentionally
+does not request or store them. Downloading bytes into an OpenAgents artifact
+would require a separate size, authorization, and content-handling policy.
 
-This demo calls the REST API directly so OpenAgents can enforce project-based
-human approval. Jungle Grid also provides the `jungle` CLI, whose `submit`
-command estimates and asks for confirmation before queuing, and a hosted MCP
-endpoint at `https://mcp.junglegrid.dev/mcp`. Hosted MCP uses OAuth; local stdio
-MCP uses `JUNGLE_GRID_API_KEY`. The current MCP tools are `estimate_job`,
-`submit_job`, `list_jobs`, `get_job`, `get_job_logs`, `cancel_job`,
-`list_artifacts`, and `get_artifact`.
+## Cancellation And Failure
 
-## Tests
+Cancellation is accepted only for the job ID already recorded for that project.
+Unauthorized, mismatched, duplicate, and terminal-state cancellation requests
+do not call Jungle Grid.
 
-Run the focused mocked tests. They do not contact Jungle Grid or submit paid
-work:
+Safe GET requests use bounded retries with exponential backoff. Submission is
+never automatically retried because the current contract does not expose a
+verified idempotency mechanism. If the executor restarts after recording a job,
+it resumes monitoring. If it restarts with an uncertain `submitting` state and
+no job ID, it refuses to resubmit blindly.
 
-```bash
-pytest tests/agents/test_jungle_grid_executor.py
-```
+Completed jobs complete the OpenAgents project. Failed, rejected, and cancelled
+jobs stop it. Runtime details may be unavailable before assignment/startup and
+do not by themselves fail finalization.
 
-Run the repository formatter and linter checks used by the Python project:
+## Current Jungle Grid MCP Tools
+
+The current registry exposes:
+
+- `estimate_job`
+- `submit_job`
+- `upload_job_input`
+- `list_job_inputs`
+- `list_jobs`
+- `get_job`
+- `get_job_events`
+- `get_job_logs`
+- `cancel_job`
+- `list_artifacts`
+- `get_artifact`
+
+## Tests
+
+All external requests are mocked. Tests never require a Jungle Grid account,
+contact the live API, or submit paid work:
 
 ```bash
-ruff format --check sdk/demos/09_jungle_grid_gpu_execution tests/agents/test_jungle_grid_executor.py
+pytest tests/agents/test_jungle_grid_executor.py -q
 ruff check sdk/demos/09_jungle_grid_gpu_execution tests/agents/test_jungle_grid_executor.py
+ruff format --check sdk/demos/09_jungle_grid_gpu_execution tests/agents/test_jungle_grid_executor.py
+mypy --follow-untyped-imports sdk/demos/09_jungle_grid_gpu_execution/agents/jungle_grid_executor.py
 ```
-
-## Optional Live Estimate
-
-The normal demo performs a live estimate when a project starts, but it never
-automatically submits a job. Use a low-cost workload goal, review the estimate in
-the project, and do not send the approval command unless you explicitly intend
-to start billable compute.