feat: add AS215932 extmon and diagnostics ops support #213
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5a5d7daec4
resolve_public(parsed.hostname or "")
req = urllib.request.Request(url, headers={"User-Agent": "AS215932-extmon-diag/1.0"})
with urllib.request.urlopen(req, timeout=timeout) as resp:
Block redirects to private addresses
When a diagnostics request probes a public URL that returns a 30x redirect to loopback, link-local, or private infrastructure (for example a metadata endpoint or 127.0.0.1), resolve_public validates only the original hostname, while urllib.request.urlopen follows the redirect and resolves the new target itself. This bypasses the non-public-address guard and can return a body sample from internal services. Disable redirects or re-validate each redirect target before fetching.
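One way to re-validate each redirect target is a custom redirect handler. The sketch below is a minimal illustration, not the PR's code: `resolve_public` here is a hypothetical stand-in for the PR's helper of the same name, and `ValidatingRedirectHandler`, `fetch`, and the 4 KiB read cap are assumptions made for the example.

```python
import ipaddress
import socket
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def resolve_public(hostname: str) -> list[str]:
    """Hypothetical stand-in for the PR's resolve_public: reject non-public targets."""
    if not hostname:
        raise ValueError("missing hostname")
    addresses = []
    for info in socket.getaddrinfo(hostname, None):
        # Drop any IPv6 zone suffix ("fe80::1%eth0") before parsing.
        ip = ipaddress.ip_address(info[4][0].split("%", 1)[0])
        if not ip.is_global:
            raise ValueError(f"{hostname} resolves to non-public address {ip}")
        addresses.append(str(ip))
    return addresses


class ValidatingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Re-run the public-address check on every redirect target."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        parts = urlsplit(newurl)
        if parts.scheme not in ("http", "https"):
            raise urllib.error.HTTPError(newurl, code, "blocked redirect scheme", headers, fp)
        try:
            resolve_public(parts.hostname or "")
        except ValueError as exc:
            raise urllib.error.HTTPError(newurl, code, f"blocked redirect: {exc}", headers, fp)
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def fetch(url: str, timeout: float = 5.0) -> bytes:
    """Fetch a body sample (capped at 4 KiB, an arbitrary choice) with validated redirects."""
    resolve_public(urlsplit(url).hostname or "")
    opener = urllib.request.build_opener(ValidatingRedirectHandler)
    req = urllib.request.Request(url, headers={"User-Agent": "AS215932-extmon-diag/1.0"})
    with opener.open(req, timeout=timeout) as resp:
        return resp.read(4096)
```

This still leaves a gap between validation and connection: the opener resolves the hostname again when it connects, so a DNS answer that changes in between is not covered. Connecting to the already-validated IP address would close that gap, at the cost of handling the Host header and TLS server name manually.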
summary: 'BGP data source stale: {% raw %}{{ $labels.source }}{% endraw %}'

- alert: BGPalerterCriticalEvent
  expr: increase(bgpalerter_alerts_total{severity="critical"}[10m]) > 0
Preserve the first BGPalerter event
For the first critical BGPalerter webhook after the agent starts, or for any new channel/type labelset, the counter series first appears at 1; Prometheus cannot compute a positive increase() without a prior sample, and later samples remain at 1. This misses the one-shot hijack/withdrawal notification until a second matching event increments the same series, so initialize counters at 0 or alert from an event timestamp/gauge instead.
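The effect can be reproduced with a toy model. The sketch below is not the agent's code and does not use a real Prometheus client: `naive_increase` is a deliberately simplified stand-in for PromQL `increase()` (last sample minus first, with no reset handling or extrapolation), and `AlertCounter`, `simulate`, and the `channel="hijack"` label value are hypothetical names chosen for illustration.

```python
def naive_increase(samples):
    """Rough model of PromQL increase(): last minus first sample in the window.

    Deliberately ignores counter resets and Prometheus's extrapolation.
    """
    if len(samples) < 2:
        return 0.0
    return float(samples[-1] - samples[0])


class AlertCounter:
    """Toy counter keyed by label values; optionally pre-registers series at 0."""

    def __init__(self, known_labelsets=()):
        self._values = {self._key(ls): 0 for ls in known_labelsets}

    @staticmethod
    def _key(labels):
        return tuple(sorted(labels.items()))

    def inc(self, **labels):
        key = self._key(labels)
        self._values[key] = self._values.get(key, 0) + 1

    def scrape(self, **labels):
        """Return the current value, or None if the series does not exist yet."""
        return self._values.get(self._key(labels))


def simulate(counter, labels, event_at_scrape=2, scrapes=5):
    """Scrape repeatedly, firing one event just before scrape `event_at_scrape`."""
    window = []
    for i in range(scrapes):
        if i == event_at_scrape:
            counter.inc(**labels)
        value = counter.scrape(**labels)
        if value is not None:  # an absent series contributes no samples
            window.append(value)
    return window


labels = {"severity": "critical", "channel": "hijack"}  # hypothetical label set

lazy = simulate(AlertCounter(), labels)
eager = simulate(AlertCounter(known_labelsets=[labels]), labels)

print(lazy, naive_increase(lazy))    # [1, 1, 1] 0.0
print(eager, naive_increase(eager))  # [0, 0, 1, 1, 1] 1.0
```

Pre-registering at 0 only works for label sets known at startup. For labels that are only discovered when an event arrives, the alternative named in the review (alerting on an event timestamp or gauge) avoids the problem without enumerating series in advance.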
# BGP router table snapshots for paid Hyrule Cloud data products.
hyrule_mcp_bgp_snapshot_dir: "{{ hyrule_mcp_state_dir }}/bgp-snapshots"
hyrule_mcp_bgp_snapshot_ingest_url: "{{ lookup('ansible.builtin.env', 'HYRULE_BGP_INGEST_URL') | default('https://cloud.hyrule.host/v1/internal/bgp') }}"
Use the default ingest URL when env is unset
When HYRULE_BGP_INGEST_URL is not exported, Ansible's env lookup returns an empty string, and default(...) does not replace falsey values unless the second argument is true. The rendered bgp-router-snapshot.env therefore gets HYRULE_BGP_INGEST_URL= instead of the intended cloud endpoint, so the hourly snapshot job will not upload unless every deploy environment explicitly sets the URL.
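Following the comment's own reasoning, passing `true` as the second argument, as in `default('https://cloud.hyrule.host/v1/internal/bgp', true)`, would make the filter replace the empty string. The Python sketch below only models that behaviour and does not run Jinja or Ansible: `jinja_default`, `env_lookup`, and the `Undefined` class are illustrative stand-ins written for this example.

```python
class Undefined:
    """Stand-in for Jinja's Undefined marker (a variable that was never set)."""


def jinja_default(value, default_value="", boolean=False):
    """Model of the Jinja `default` filter's documented behaviour."""
    if isinstance(value, Undefined) or (boolean and not value):
        return default_value
    return value


def env_lookup(environ, name):
    """Model of Ansible's env lookup: an unset variable yields an empty string."""
    return environ.get(name, "")


FALLBACK = "https://cloud.hyrule.host/v1/internal/bgp"
unset_env = {}  # HYRULE_BGP_INGEST_URL not exported

as_written = jinja_default(env_lookup(unset_env, "HYRULE_BGP_INGEST_URL"), FALLBACK)
with_true = jinja_default(env_lookup(unset_env, "HYRULE_BGP_INGEST_URL"), FALLBACK, True)

print(repr(as_written))  # ''
print(repr(with_true))   # 'https://cloud.hyrule.host/v1/internal/bgp'
```

An exported value still wins in both forms, so deployments that set the variable explicitly are unaffected.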
Live extmon VPS has been bootstrapped and this PR branch now points inventory at it. Provisioned host:
Validated after inventory update:
Still blocked before
eed8734 to e47057f
Summary
This PR contains the Network Operations side of the AS215932 / Agentic ISP Support rollout.
It adds operational support for the new Hyrule Cloud BGP/network-intelligence platform and exposes pricing knobs for the new agentic diagnostics APIs.
Major pieces:
- extmon-bgp-agent
- extmon-diag-agent
- v2.0.1 with SHA256
Review notes
This is paired with:
- hyrule-cloud PR: x402 Agentic ISP Support API/Skill implementation
- hyrule-mcp PR: AS215932 router snapshot collector
The important design constraint remains:
extmon is outside AS215932 and must not hold router SSH keys. Router snapshots stay NOC-side.
Validation
After rebasing on current origin/main:
Previous local validation before the branch split also covered extmon/noc/cloud syntax checks and extmon render/agent py_compile.