feat: add AS215932 extmon and diagnostics ops support by Svaag · Pull Request #213 · AS215932/network-operations

Svaag · 2026-06-14T00:43:04Z

Summary

This PR contains the Network Operations side of the AS215932 / Agentic ISP Support rollout.

It adds operational support for the new Hyrule Cloud BGP/network-intelligence platform and exposes pricing knobs for the new agentic diagnostics APIs.

Major pieces:

extmon role expansion for out-of-AS215932 monitoring
- Routinator
- BGPalerter
- BGPStream/PyBGPStream bootstrap
- extmon-bgp-agent
- extmon-diag-agent
- Prometheus scrape jobs and alert rules
- direct Discord alerting plus best-effort NOC webhook
BGPalerter pinned to v2.0.1 with SHA256
extmon rendered service/config templates
NOC-side AS215932 router-table snapshot timer/service templates
Hyrule Cloud runtime/deployment support for BGP data dir and ingest token
Vault/static env template price knobs for:
- BGP/IP/DNS/RDAP/WHOIS/MX/Mail
- Web/TLS
- Path diagnostics
- Port checks
- NAT/CGNAT
- Threat reputation
- VoIP/SIP
- Speedtest
docs updates for extmon and network flows

Review notes

This is paired with:

hyrule-cloud PR: x402 Agentic ISP Support API/Skill implementation
hyrule-mcp PR: AS215932 router snapshot collector

The important design constraint remains: extmon is outside AS215932 and must not hold router SSH keys. Router snapshots stay NOC-side.

Validation

After rebasing on current origin/main:

cd network-operations && ANSIBLE_ROLES_PATH=ansible/roles ansible-playbook -i ansible/inventory/hosts.yml ansible/playbooks/cloud.yml --syntax-check
playbook: ansible/playbooks/cloud.yml

Previous local validation before the branch split also covered extmon/noc/cloud syntax checks and extmon render/agent py_compile.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5a5d7daec4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-06-14T00:47:11Z

+    resolve_public(parsed.hostname or "")
+    req = urllib.request.Request(url, headers={"User-Agent": "AS215932-extmon-diag/1.0"})
+    with urllib.request.urlopen(req, timeout=timeout) as resp:


Block redirects to private addresses

When a diagnostics request probes a public URL that returns a 30x to loopback/link-local/private infrastructure (for example metadata or 127.0.0.1), resolve_public only validates the original hostname, but urllib.request.urlopen follows redirects and resolves the redirected target itself. That bypasses the non-public-address guard and can return a body sample from internal services; disable redirects or re-validate each redirect target before fetching.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-14T00:47:11Z

+          summary: 'BGP data source stale: {% raw %}{{ $labels.source }}{% endraw %}'
+
+      - alert: BGPalerterCriticalEvent
+        expr: increase(bgpalerter_alerts_total{severity="critical"}[10m]) > 0


Preserve the first BGPalerter event

For the first critical BGPalerter webhook after the agent starts, or for any new channel/type labelset, the counter series first appears at 1; Prometheus cannot compute a positive increase() without a prior sample, and later samples remain at 1. This misses the one-shot hijack/withdrawal notification until a second matching event increments the same series, so initialize counters at 0 or alert from an event timestamp/gauge instead.

Useful? React with 👍 / 👎.

chatgpt-codex-connector · 2026-06-14T00:47:11Z

+
+# BGP router table snapshots for paid Hyrule Cloud data products.
+hyrule_mcp_bgp_snapshot_dir: "{{ hyrule_mcp_state_dir }}/bgp-snapshots"
+hyrule_mcp_bgp_snapshot_ingest_url: "{{ lookup('ansible.builtin.env', 'HYRULE_BGP_INGEST_URL') | default('https://cloud.hyrule.host/v1/internal/bgp') }}"


Use the default ingest URL when env is unset

When HYRULE_BGP_INGEST_URL is not exported, Ansible's env lookup returns an empty string, and default(...) does not replace falsey values unless the second argument is true. The rendered bgp-router-snapshot.env therefore gets HYRULE_BGP_INGEST_URL= instead of the intended cloud endpoint, so the hourly snapshot job will not upload unless every deploy environment explicitly sets the URL.

Useful? React with 👍 / 👎.

github-actions · 2026-06-14T00:55:46Z

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 5 🔵🔵🔵🔵🔵
🏅 Score: 72
🧪 No relevant tests
🔒 Security concerns Yes. Multiple security concerns: (1) The CAIDA BGPStream bootstrap script is fetched via unauthenticated HTTP and piped to bash, creating a supply-chain compromise vector. (2) The diag agent's `resolve_public` function has a logic flaw that can bypass private-IP blocking under crafted DNS responses. (3) The diag agent's `ping`/`trace` probes pass user-controlled addresses to subprocess without sanitization, enabling argument injection if the token is empty or leaked. (4) The `EXTMON_DIAG_AGENT_TOKEN` defaults to empty string in the env file, meaning the agent is unprotected by default until the operator explicitly sets a token. (5) The `EXTMON_BGP_INGEST_TOKEN` and `EXTMON_BGP_CLOUDFLARE_API_TOKEN` are rendered into a world-readable env file (mode 0600 is set in the install task, but the generated file in `ansible/generated/extmon/bgp-agent.env` has no mode restriction — it inherits the default from the render task which is 0644). (6) The `HYRULE_BGP_INGEST_TOKEN` is rendered into `bgp-router-snapshot.env` with mode 0640, which is group-readable; the group is `hyrule_mcp_app_group` which may include non-root users.
⚡ Recommended focus areas for review Operational Footgun The CAIDA BGPStream repository bootstrap (lines 52-62) runs a remote shell script piped into bash via `curl -s "https://pkg.caida.org/os/$(lsb_release -si \| awk '{print tolower($0)}')/bootstrap.sh" \| bash`. This is a classic supply-chain risk: the script is fetched over plain HTTP (no HTTPS pinned), and the URL is dynamically constructed from the OS release string. If the CAIDA bootstrap endpoint is compromised or the DNS is hijacked, arbitrary code can execute as root during the apply phase. This bypasses the SHA-pinned binary pattern used elsewhere in the PR (e.g., BGPalerter). The `creates: /etc/apt/sources.list.d/caida.list` guard only prevents re-execution after the first run, not the initial compromise. - name: Apply — bootstrap CAIDA BGPStream repositories shell: \| set -euo pipefail curl -s "https://pkg.caida.org/os/$(lsb_release -si \| awk '{print tolower($0)}')/bootstrap.sh" \| bash args: executable: /bin/bash creates: /etc/apt/sources.list.d/caida.list when: - extmon_apply \| bool - extmon_install_bgpstream \| bool tags: [apply] Security Concern The `resolve_public` function (lines 25-39) has a logic flaw: if `blocked_ip(host)` raises a `ValueError`, the bare `except ValueError: pass` on line 29-30 silently swallows the exception and continues to `socket.getaddrinfo`. This means a caller can bypass the blocked-IP check by passing a hostname that is itself a private IP string (e.g., "127.0.0.1" or "10.0.0.1"). The function will skip the check, resolve the hostname (which is already an IP), and then the subsequent `blocked_ip(addr)` check on line 35 will catch it — but only if the address resolves to the same IP. If the hostname resolves to a different private IP (e.g., via a crafted DNS response), the check is effectively bypassed. In practice, an attacker who can control the `host` parameter to the `tcp` or `ping`/`trace` probes could target internal RFC1918 or loopback addresses, bypassing the intended restriction to public targets only. def resolve_public(host: str) -> list[str]: try: if blocked_ip(host): raise ValueError(f"blocked non-public target {host}") except ValueError: pass infos = socket.getaddrinfo(host, None, socket.AF_UNSPEC, socket.SOCK_STREAM) addrs = [] for info in infos: addr = info[4][0] if blocked_ip(addr): raise ValueError(f"blocked resolved non-public target {addr}") if addr not in addrs: addrs.append(addr) return addrs Security Concern The `run_probe` function (lines 62-76) allows `ping` and `trace` probes that execute subprocess commands with user-controlled `addr` values (line 70). While `resolve_public` is called first, the `addr` variable is the first resolved address from `getaddrinfo`. If an attacker can control the `host` payload and the DNS resolves to a crafted address containing shell metacharacters (e.g., `; rm -rf /`), the `subprocess.run` call on line 71 could execute arbitrary commands. Although `subprocess.run` with a list argument (not shell=True) prevents shell injection, the `addr` is passed as a positional argument to `ping`/`traceroute`, which could still allow argument injection (e.g., `-c` flag injection in ping). This is a realistic attack vector if the diag agent is exposed to untrusted clients (the token check on line 86 is optional — if `TOKEN` is empty, any POST is accepted). if kind in {"ping", "trace"}: host = str(payload["host"]) addr = resolve_public(host)[0] cmd = ["ping", "-c", "4", "-W", "2", addr] if kind == "ping" else ["traceroute", "-m", "20", addr] proc = subprocess.run(cmd, capture_output=True, text=True, timeout=20, check=False) return {"ok": proc.returncode == 0, "returncode": proc.returncode, "output": (proc.stdout or proc.stderr)[-4096:]}

github-actions · 2026-06-14T00:57:14Z

PR Code Suggestions ✨

Explore these optional code suggestions:

Category	Suggestion	Impact
Security	Fix SSRF bypass in resolve_public The `try-except` block silently swallows the `ValueError` when a blocked IP is detected, completely bypassing the SSRF protection. Remove the outer try-except so that any non-public target immediately raises an exception and prevents execution of the probe. ansible/roles/extmon/templates/diag-agent.py.j2 [25-30] def resolve_public(host: str) -> list[str]: - try: - if blocked_ip(host): - raise ValueError(f"blocked non-public target {host}") - except ValueError: - pass + if blocked_ip(host): + raise ValueError(f"blocked non-public target {host}") Suggestion importance[1-10]: 9 __ Why: The try-except silently swallows the ValueError when a blocked IP is detected, completely bypassing SSRF protection. Removing it ensures non-public targets are rejected immediately, which is a critical security fix.	High
Possible issue	Handle empty DNS resolution result `resolve_public` may return an empty list if no addresses are resolved, causing an unhandled `IndexError` that crashes the probe handler. Check the length of the list before indexing and raise a clear error if no addresses are available. ansible/roles/extmon/templates/diag-agent.py.j2 [42-45] def tcp(host: str, port: int, timeout: float = 10.0): ... - addr = resolve_public(host)[0] + addrs = resolve_public(host) + if not addrs: + raise ValueError(f"no public addresses resolved for {host}") + addr = addrs[0] Suggestion importance[1-10]: 6 __ Why: `resolve_public` may return an empty list, causing an unhandled `IndexError`. Adding a length check prevents a crash and provides a clear error message, improving robustness.	Low
General	Prevent polling loop from crashing An unexpected exception in `poll_once()` (e.g., a bug in the shared state lock) would terminate the polling thread entirely, stopping all metric collection. Wrap the call in a try-except to log any such error and continue the loop, ensuring the agent remains responsive. ansible/roles/extmon/templates/bgp-agent.py.j2 [139-142] def poll_loop() -> None: while True: - poll_once() + try: + poll_once() + except Exception: + pass # Prevent loop exit on unexpected error time.sleep(POLL_SECONDS) Suggestion importance[1-10]: 5 __ Why: An unexpected exception in `poll_once()` would terminate the polling thread, stopping all metric collection. Wrapping it in a try-except ensures the loop continues, though the individual poll functions already have error handling.	Low

Svaag · 2026-06-14T11:08:41Z

Live extmon VPS has been bootstrapped and this PR branch now points inventory at it.

Provisioned host:

IPv4: 45.32.179.134
IPv6: 2001:19f0:7402:0cd5:5400:06ff:fe40:7112
OS: Debian 13 / trixie
RAM: ~3.8GiB
Swap: ~7.7GiB
Hostname: extmon
SSH: key-based root access works with the AS215932 operator key
Personal account: svag created with AS215932 operator key and passwordless sudo
SSH hardening: password/kbd-interactive auth disabled; root login is key-only

Validated after inventory update:

ANSIBLE_CONFIG=ansible/ansible.cfg ansible -i ansible/inventory/hosts.yml extmon -m ping
extmon | SUCCESS

ANSIBLE_CONFIG=ansible/ansible.cfg ANSIBLE_ROLES_PATH=ansible/roles ansible-playbook -i ansible/inventory/hosts.yml ansible/playbooks/extmon.yml --syntax-check
OK

ANSIBLE_CONFIG=ansible/ansible.cfg ANSIBLE_ROLES_PATH=ansible/roles ansible-playbook -i ansible/inventory/hosts.yml ansible/playbooks/extmon.yml --tags validate --connection=local --limit extmon
OK

Still blocked before extmon_apply=true full service deployment:

EXTMON_DISCORD_WEBHOOK_URL
EXTMON_BGP_CLOUDFLARE_API_TOKEN
EXTMON_BGP_INGEST_TOKEN
EXTMON_DIAG_AGENT_TOKEN
OVH expiry collector credentials if that collector should be live on first apply

Svaag requested a review from a team as a code owner June 14, 2026 00:43

chatgpt-codex-connector Bot reviewed Jun 14, 2026

View reviewed changes

github-actions Bot added Possible security concern Review effort 5/5 labels Jun 14, 2026

Svaag added 6 commits June 14, 2026 16:36

feat: add AS215932 network intelligence foundations

0c6b292

chore: expose agentic diagnostics pricing knobs

7aca420

chore: point extmon inventory at Vultr host

46f04f3

fix(extmon): make live BGP monitoring deployment converge

626493f

chore(extmon): refresh generated noc firewall

63c1338

fix: wire BGP ingest and keep native rails disabled

e47057f

Svaag force-pushed the chore/agentic-diagnostics-pricing-knobs branch from eed8734 to e47057f Compare June 14, 2026 14:45

fix: keep monero wallet rpc stopped when disabled

7f67a4a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add AS215932 extmon and diagnostics ops support#213

feat: add AS215932 extmon and diagnostics ops support#213
Svaag wants to merge 7 commits into
mainfrom
chore/agentic-diagnostics-pricing-knobs

Svaag commented Jun 14, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot Jun 14, 2026

Uh oh!

chatgpt-codex-connector Bot Jun 14, 2026

Uh oh!

chatgpt-codex-connector Bot Jun 14, 2026

Uh oh!

github-actions Bot commented Jun 14, 2026

Uh oh!

github-actions Bot commented Jun 14, 2026

Uh oh!

Svaag commented Jun 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Svaag commented Jun 14, 2026

Summary

Review notes

Validation

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Jun 14, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 14, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector Bot Jun 14, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Jun 14, 2026

PR Reviewer Guide 🔍

Uh oh!

github-actions Bot commented Jun 14, 2026

PR Code Suggestions ✨

Uh oh!

Svaag commented Jun 14, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant