Skip to content

feat: VPS launch-proof contract on the existing /v1/vm path#29

Merged
Svaag merged 2 commits into
mainfrom
hyrule-feature/ISSUE_HYRULE_CLOUD_28/hyrule-cloud
Jun 16, 2026
Merged

feat: VPS launch-proof contract on the existing /v1/vm path#29
Svaag merged 2 commits into
mainfrom
hyrule-feature/ISSUE_HYRULE_CLOUD_28/hyrule-cloud

Conversation

@hyrule-engineering-loop

Copy link
Copy Markdown
Contributor

Change class

app_feature

Repos touched

  • hyrule-cloud

Senior role reviews

  • network_architect: not approved
  • systems_engineer: approved
  • devops_netops: approved
  • security_auditor: not approved
  • finops_integrity: not approved
  • virtual_lab_chaos: not approved

Source-of-truth files consulted

  • hyrule-cloud:README.md

Validation gates run

Expected production impact

none

Rollback plan

Discard the generated feature worktree and branch; no production state was changed.

NOC handoff

  • handoff artifact: /var/lib/engineering-loop/runs/issue_hyrule_cloud_28/handoff/noc_handoff.json
  • rollback trigger: operator rejection, failed gates, or failed post-deploy checks
  • expected duration: none

Post-deploy checks

  • review graph state
  • run documented gates

Operator notes

Closes #28

@github-actions

github-actions Bot commented Jun 16, 2026

Copy link
Copy Markdown

PR Reviewer Guide 🔍

(Review updated until commit 0949be2)

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶

28 - Partially compliant

Compliant requirements:

  • Document customer/operator states in docs/vps-launch-proof-contract.md
  • Extend GET /v1/vm/{vm_id}/status with launch-proof fields
  • Add hyrule_cloud/services/launch_proof.py for contract logic
  • Add tests covering quote→payment_required→paid→provisioning→provisioned and failed→safe-message path
  • No generic payment-intent engine / billing / subscriptions
  • Default behavior = controlled simulation; real infra only behind HCP_LAUNCH_PROOF_REAL_XCPNG=1
  • Do not modify app pins, workflows, secrets, generated files

Non-compliant requirements:

  • scripts/smoke/vps_launch_proof.py not present in the diff

Requires further human verification:

  • uv run ruff check hyrule_cloud tests, uv run mypy hyrule_cloud, uv run pytest -q all green (cannot verify without running the gates)
  • Keep existing blocked-port policy and paid-VM cap enforced (cannot verify from diff alone)
⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🏅 Score: 78
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Payment Status Logic Bug

In build_launch_proof, when quote_row is not None and its status is neither "created" nor "consumed", the code falls through to payment_status = PaymentStatus.PAID. This means an unexpected quote status (e.g., "expired" or "cancelled") is silently treated as paid, which could allow a VM to appear paid when the underlying quote is invalid. A realistic scenario: a quote expires before the VM is created, but the VM row still references it; the status endpoint would report paid instead of surfacing the inconsistency.

if quote_row is not None:
    q_status = _safe_getattr(quote_row, "status", "")
    if q_status == "created":
        payment_status = PaymentStatus.PAYMENT_REQUIRED
    elif q_status == "consumed":
        payment_status = PaymentStatus.PAID
    else:
        payment_status = PaymentStatus.PAID
Missing Input Validation

The function build_launch_proof accepts vm_row as a plain object and uses _safe_getattr to access attributes. If vm_row is not a VMRow (e.g., a dict or None), the function silently returns default values (empty strings, False, etc.) rather than raising an error. This masks programming errors and could lead to incorrect launch-proof status being returned to customers. A realistic scenario: a refactoring changes the return type of get_vm, and the status endpoint silently returns all-default launch-proof fields instead of failing loudly.

def build_launch_proof(
    vm_row: object,
    *,
    quote_row: object | None = None,
) -> dict[str, object]:
Simulation Bypasses Payment Verification

The _simulate_provisioning method unconditionally sets the VM to READY with a fake IPv6 and marks dns_aaaa_verified=True and ssh_smoke_status="passed". This bypasses any payment verification or quote consumption checks that the real provisioning path would enforce. In simulation mode, a VM created without payment (e.g., via a direct DB insert or a bug in the create flow) would still appear as fully provisioned with all launch-proof fields set to success. This undermines the launch-proof contract's integrity even in simulation.

async def _simulate_provisioning(self, vm_id: str) -> None:
    """Controlled simulation of provisioning (issue #28).

    Skips XCP-NG, DNS, and Openprovider. Sets a fake IPv6 and flips
    the VM to READY after a short delay so the launch-proof contract
    can be exercised end-to-end without touching real infra.
    """
    import random

    log.info("provision_simulate_start", vm_id=vm_id)

    # Simulate brief provisioning work
    await asyncio.sleep(0.1)

    fake_ipv6 = f"2001:db8::{random.randint(0x1000, 0x9999):04x}"

    async with self.db() as session:
        row = await session.get(VMRow, vm_id)
        if not row:
            return
        row.ipv6 = fake_ipv6
        row.status = VMStatus.READY
        row.provisioned_at = _now()
        meta = row.metadata_ or {}
        lp = meta.get("launch_proof", {})
        lp["dns_aaaa_verified"] = True
        lp["ssh_smoke_status"] = "passed"
        meta["launch_proof"] = lp
        row.metadata_ = meta
        await session.commit()

    log.info("provision_simulate_complete", vm_id=vm_id, ipv6=fake_ipv6)

@github-actions

github-actions Bot commented Jun 16, 2026

Copy link
Copy Markdown

PR Code Suggestions ✨

Latest suggestions up to 0949be2
Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Security
Default payment status to required

The payment status logic defaults to PAID for any unknown quote status. This could
mask a bug where a new quote status is introduced but not handled. It would be safer
to default to PAYMENT_REQUIRED for unknown quote statuses, as this is the more
conservative state that would not incorrectly indicate payment was received. This
prevents potential security issues where a VM could be considered paid when it is
not.

hyrule_cloud/services/launch_proof.py [41-49]

 # --- Payment status ---
 payment_status: PaymentStatus
 if quote_row is not None:
     q_status = _safe_getattr(quote_row, "status", "")
     if q_status == "created":
         payment_status = PaymentStatus.PAYMENT_REQUIRED
     elif q_status == "consumed":
         payment_status = PaymentStatus.PAID
     else:
-        payment_status = PaymentStatus.PAID
+        payment_status = PaymentStatus.PAYMENT_REQUIRED
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential security concern where an unknown quote status could incorrectly default to PAID. Changing the default to PAYMENT_REQUIRED is a safer, more conservative approach. This is a moderate-impact improvement that prevents potential payment bypass issues.

Medium
Possible issue
Handle invalid SSH smoke status

When explicit_ssh is not None, the code converts it to a string and then to
SSHSmokeStatus. If explicit_ssh contains an invalid value (e.g., "unknown"),
SSHSmokeStatus(str(explicit_ssh)) will raise a ValueError. This could crash the
endpoint. Use a try-except block to handle invalid values gracefully, falling back
to SSHSmokeStatus.NOT_RUN.

hyrule_cloud/services/launch_proof.py [91-100]

 # --- SSH smoke test ---
 ssh_smoke: SSHSmokeStatus
 explicit_ssh = lp_meta.get("ssh_smoke_status")
 if explicit_ssh is not None:
-    ssh_smoke = SSHSmokeStatus(str(explicit_ssh))
+    try:
+        ssh_smoke = SSHSmokeStatus(str(explicit_ssh))
+    except ValueError:
+        ssh_smoke = SSHSmokeStatus.NOT_RUN
 elif vm_status in (VMStatus.READY, VMStatus.RUNNING):
     ssh_smoke = SSHSmokeStatus.PASSED
 elif vm_status == VMStatus.FAILED:
     ssh_smoke = SSHSmokeStatus.FAILED
 else:
     ssh_smoke = SSHSmokeStatus.NOT_RUN
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly identifies that an invalid value in lp_meta["ssh_smoke_status"] could raise a ValueError and crash the endpoint. Adding a try-except block is a good defensive practice. The impact is moderate as it prevents a potential crash from malformed metadata.

Low
Handle None vm_row gracefully

The function uses safe_getattr with a default of None for metadata, then falls
back to an empty dict. However, if metadata_ is an empty dict, meta_raw will be {},
and lp_meta will be {}. This is correct. However, if metadata_ is None, the or {}
will make meta_raw an empty dict, and the isinstance check will pass. This is fine.
The real issue is that the function does not handle the case where vm_row itself is
None. If vm_row is None, the call to safe_getattr will raise an AttributeError
because None has no attribute metadata
. Add a guard clause to return a default
state if vm_row is None.

hyrule_cloud/services/launch_proof.py [31-38]

 def build_launch_proof(
     vm_row: object,
     *,
     quote_row: object | None = None,
 ) -> dict[str, object]:
     """Build the launch-proof contract fields from a VMRow and optional quote."""
+    if vm_row is None:
+        return {
+            "launch_proof_status": LaunchProofStatus.PROVISIONING,
+            "payment_status": PaymentStatus.NOT_REQUIRED,
+            "dns_aaaa_verified": False,
+            "ssh_smoke_status": SSHSmokeStatus.NOT_RUN,
+            "rollback_available": False,
+            "operator_message": None,
+            "customer_message": None,
+        }
     meta_raw = _safe_getattr(vm_row, "metadata_", None) or {}
     lp_meta = meta_raw.get("launch_proof", {}) if isinstance(meta_raw, dict) else {}
Suggestion importance[1-10]: 5

__

Why: The suggestion is valid and adds defensive programming by handling a None vm_row. However, in the current PR code, build_launch_proof is only called from get_vm_public_status where row is guaranteed to be non-None (a 404 is raised earlier). The impact is low, but the suggestion is correct and improves robustness.

Low

Previous suggestions

Suggestions up to commit b83824c
CategorySuggestion                                                                                                                                    Impact
General
Restructure rollback logic

The rollback_available variable is set to False after it may have been set to True
earlier in the function (e.g., when vm_status == VMStatus.FAILED). This logic is
correct, but the ordering is fragile. Consider restructuring the logic to set
rollback_available in a single, clear conditional block to avoid future bugs if the
order of checks changes.

hyrule_cloud/services/launch_proof.py [107-108]

+# --- Rollback availability ---
 if vm_status == VMStatus.DESTROYED:
     rollback_available = False
+else:
+    rollback_available = bool(
+        lp_meta.get("rollback_available", False)
+        or (vm_status == VMStatus.FAILED)
+    )
Suggestion importance[1-10]: 3

__

Why: The suggestion is a minor improvement for clarity by restructuring the conditional logic for rollback_available. However, the existing code is already clear and correct, and the proposed change does not fix a bug or add significant value. The score is low because it addresses code style rather than correctness or impact.

Low

Human-in-the-loop completion of the Engineering Loop's draft (issue #28):
- drop unused `sqlalchemy.update` import (ruff)
- coerce `_safe_getattr` results to str before VMStatus()/operator_message
  so mypy is satisfied (object -> str|None)

The loop authored the contract (models, routes, orchestrator, launch_proof
service, tests, docs); these three fixes make it green:
ruff clean, mypy clean (80 files), 189 pytest passed.

Co-Authored-By: hyrule-engineering-loop[bot] <noreply@as215932.net>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@Svaag Svaag marked this pull request as ready for review June 16, 2026 13:10
@Svaag Svaag merged commit 81e4316 into main Jun 16, 2026
4 checks passed
@github-actions

Copy link
Copy Markdown

Persistent review updated to latest commit 0949be2

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0949be2f1e

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +171 to +173
if not use_real_provisioning():
await self._simulate_provisioning(vm_id)
return

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Default to real provisioning on paid creates

With this guard, every environment that does not set the new HCP_LAUNCH_PROOF_REAL_XCPNG=1 variable now bypasses the existing XCP-NG/DNS/Openprovider path and marks paid VMs ready via _simulate_provisioning instead. I searched the repo and the deployment/config docs only define the existing XCPNG_* settings, so a normal production deploy would accept payment but never create a real VM, returning a fake documentation IPv6 instead.

Useful? React with 👍 / 👎.

dns_aaaa_verified=lp["dns_aaaa_verified"],
ssh_smoke_status=lp["ssh_smoke_status"],
rollback_available=lp["rollback_available"],
operator_message=lp["operator_message"],

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Keep operator errors out of the public status response

GET /v1/vm/{id}/status is explicitly unauthenticated/public, but this now serializes operator_message, and build_launch_proof populates it from row.error for failed VMs. In the failure scenario covered by the new tests, anyone with the VM id can read provider-internal errors such as XCP-NG/RPC details and UUIDs from the public status page, defeating the existing sanitized status contract; this field needs to be omitted here or only returned from an authenticated operator route.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: VPS launch-proof contract on the existing /v1/vm path

1 participant