Skip to content

Add FRR NETCONF/YANG audit foundation#222

Open
Svaag wants to merge 1 commit into
mainfrom
feat/netconf-yang-frr-adoption
Open

Add FRR NETCONF/YANG audit foundation#222
Svaag wants to merge 1 commit into
mainfrom
feat/netconf-yang-frr-adoption

Conversation

@Svaag

@Svaag Svaag commented Jun 14, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Document AS215932 NETCONF/YANG architecture and production write gates while keeping committed configs/<router>/frr.conf canonical.
  • Add read-only FRR YANG/NETCONF audit inventory, role, playbook, and scheduled/manual audit workflow.
  • Add normalized FRR semantic and policy-intent renderers with committed generated artifacts and parity/static tests across all four routers.
  • Add disabled-by-default NETCONF endpoint firewall scaffolding and a trusted-only Containerlab NETCONF/YANG lab job gated by ENABLE_NETCONF_YANG_TESTS.

Safety / production impact

  • No production router config is changed by this PR.
  • frr_netconf_endpoint_enabled and frr_netconf_write_enabled default to false for routers.
  • TCP/830 firewall rules only render if a future PR explicitly enables a host.
  • The new production audit workflow is read-only and writes only ignored runtime snapshots/artifacts.

Validation

  • scripts/ci/iac-static.sh
  • cd ansible && for playbook in playbooks/*.yml; do ansible-playbook "$playbook" --syntax-check >/dev/null || exit 1; done
  • python3 -m py_compile scripts/netops/frr_semantic.py scripts/netops/render_frr_policy.py tests/iac/containerlab/check_netconf_yang.py tests/iac/containerlab/netconf/netconf_smoke.py
  • bash -n scripts/ci/containerlab-netconf-yang-test.sh tests/iac/containerlab/netconf/entrypoint.sh tests/iac/containerlab/netconf/install-frr-yang-modules.sh
  • ANSIBLE_ROLES_PATH=ansible/roles ansible-lint ansible/playbooks/frr-yang.yml ansible/roles/frr_yang

Not run: scripts/ci/containerlab-netconf-yang-test.sh because it builds a lab-only FRR/sysrepo/Netopeer2 image and is intended for trusted CI/manual execution.

@Svaag Svaag requested a review from a team as a code owner June 14, 2026 18:50
@github-actions

Copy link
Copy Markdown
Contributor

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🏅 Score: 88
🧪 PR contains tests
🔒 Security concerns

Hardcoded NETCONF credentials in lab test:
tests/iac/containerlab/netconf/netconf_smoke.py contains NETCONF_USER = "netconf" and NETCONF_PASSWORD = "netconf". While scoped to a lab container, these are committed in plaintext. No production secrets are exposed. No other security concerns identified.

⚡ Recommended focus areas for review

Hardcoded Credentials

The NETCONF smoke test uses hardcoded credentials (NETCONF_USER="netconf", NETCONF_PASSWORD="netconf") on lines 15-16. While this is a lab-only test, these credentials are committed to the repository. If the lab topology or Dockerfile is ever reused outside of isolated CI (e.g., a developer runs it on a shared host), the credentials are trivially guessable. Consider sourcing these from environment variables or a CI secret, even for lab use.

NETCONF_USER = "netconf"
NETCONF_PASSWORD = "netconf"
Broad Exception Handling

The check_all_bgp_established() function catches all exceptions with except Exception (line 41) and continues, which can mask critical failures (e.g., a malformed JSON response from vtysh, a connection timeout, or a Docker exec failure). This could cause the test to pass with incomplete coverage or silently skip a router. Consider catching specific exceptions or re-raising after logging.

except Exception as exc:  # noqa: BLE001 - lab diagnostic path.
    errors.append(f"{node}: failed to query BGP summary: {exc}")
    continue
Potential Config Parsing Ambiguity

The _parse_static_route function (line 94) assumes the third token is the prefix and the fourth is the next-hop. For FRR static routes with optional keywords like table, distance, or tag, this parsing may produce incorrect structured output (e.g., a distance value could be misidentified as a next-hop). While the current configs may not trigger this, the parser will silently produce wrong data for any future route that uses these modifiers. Consider validating the token count and known keyword positions.

def _parse_static_route(line: str) -> dict[str, Any]:
    parts = _tokens(line)
    route: dict[str, Any] = {"raw": line}
    if len(parts) >= 3:
        route["afi"] = parts[0]
        route["prefix"] = parts[2]
    if len(parts) >= 4:
        route["next_hop"] = parts[3]
    if len(parts) >= 5:
        route["distance_or_table"] = " ".join(parts[4:])
    return route

@github-actions

Copy link
Copy Markdown
Contributor

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Prevent duplicate static route entries

Static routes are being appended to both data["static_routes"][vrf] and
data["vrfs"][current_vrf]["static_routes"] when inside a VRF context. This
duplicates the route entry in the output. Remove the second append to data["vrfs"]
to avoid duplication, as the VRF-specific static routes are already captured in
data["static_routes"].

scripts/netops/frr_semantic.py [332-338]

 if line.startswith("ipv6 route "):
     route = _parse_static_route(line)
     vrf = current_vrf or "default"
     data["static_routes"].setdefault(vrf, []).append(route)
-    if current_vrf:
-        data["vrfs"].setdefault(current_vrf, {"name": current_vrf, "static_routes": []})["static_routes"].append(route)
     continue
Suggestion importance[1-10]: 8

__

Why: The suggestion correctly identifies a real bug where static routes inside a VRF context are duplicated in the output data structure. This could cause incorrect semantic diffs and test failures. The fix is accurate and directly addresses the issue.

Medium
Fix neighbor enforce-first-as detection

The condition parts[3:] == ["enforce-first-as"] will only match if the line is
exactly no neighbor

enforce-first-as. However, FRR syntax allows no neighbor
enforce-first-as without additional arguments. The comparison parts[3:] creates a
list of remaining tokens, which will be ["enforce-first-as"] for the exact match,
but could be ["enforce-first-as", "something"] if extra arguments are present. Use
parts[3] == "enforce-first-as" to correctly detect the command regardless of
trailing arguments.

scripts/netops/frr_semantic.py [393-400]

 if line.startswith("no neighbor "):
     parts = _tokens(line)
     if len(parts) >= 4:
         address = parts[2]
         neighbor = current_bgp["neighbors"].setdefault(address, _empty_bgp_neighbor(address))
         neighbor["raw_commands"].append(line)
-        if parts[3:] == ["enforce-first-as"]:
+        if parts[3] == "enforce-first-as":
             neighbor["enforce_first_as"] = False
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies a potential issue with the parts[3:] comparison, which could fail if extra arguments are present. However, in practice, no neighbor <address> enforce-first-as is a fixed command with no trailing arguments, so the current code works correctly. The improvement is minor but valid for robustness.

Medium
General
Handle neighbor shutdown command

The neighbor command parsing in the global BGP context does not handle neighbor


shutdown. When a neighbor is shut down, the shutdown keyword is the third token.
This command is currently ignored, which could lead to missing state in the semantic
output. Add handling for the shutdown command to set a shutdown flag on the
neighbor.

scripts/netops/frr_semantic.py [380-392]

 if line.startswith("neighbor "):
     parts = _tokens(line)
     if len(parts) >= 3:
         address = parts[1]
         neighbor = current_bgp["neighbors"].setdefault(address, _empty_bgp_neighbor(address))
         neighbor["raw_commands"].append(line)
         command = parts[2]
         if command == "remote-as" and len(parts) >= 4:
             neighbor["remote_as"] = int(parts[3])
         elif command == "description" and len(parts) >= 4:
             neighbor["description"] = " ".join(parts[3:])
         elif command == "update-source" and len(parts) >= 4:
             neighbor["update_source"] = " ".join(parts[3:])
+        elif command == "shutdown":
+            neighbor["shutdown"] = True
Suggestion importance[1-10]: 6

__

Why: The suggestion adds handling for the neighbor shutdown command, which is a valid FRR command. However, the PR's scope is to normalize FRR config into JSON for semantic diffing, and the shutdown state is not currently tracked. Adding this would be an enhancement, but it is not a bug fix or critical issue for the current use case.

Low
Make PF interface configurable

The PF rule is hardcoded to inet6 and the wg interface, which may not match all
deployment scenarios. Consider making the interface and address family configurable
via variables, or at minimum document that this rule only applies to WireGuard IPv6
traffic. This prevents silent rule ineffectiveness on non-WireGuard or IPv4-only
setups.

ansible/roles/firewall/templates/partials/_netconf_endpoint_pf.j2 [5-10]

 {% if frr_netconf_endpoint_enabled | default(false) | bool %}
 {% set netconf_sources = frr_netconf_allowed_sources_v6 | default([]) %}
+{% set netconf_if = frr_netconf_interface | default('wg') %}
 {% if netconf_sources | length > 0 %}
 # --- FRR NETCONF/YANG endpoint (disabled by default) ---
-pass in quick on wg inet6 proto tcp from { {{ netconf_sources | join(', ') }} } to any port = {{ frr_netconf_port | default(830) }} flags S/SA keep state label "NETCONF_RULE: FRR NETCONF/YANG approved sources"
+pass in quick on {{ netconf_if }} inet6 proto tcp from { {{ netconf_sources | join(', ') }} } to any port = {{ frr_netconf_port | default(830) }} flags S/SA keep state label "NETCONF_RULE: FRR NETCONF/YANG approved sources"
 
 {% endif %}
 {% endif %}
Suggestion importance[1-10]: 4

__

Why: The suggestion itself is logically sound, but the hardcoded wg interface is intentional for the current PR's WireGuard-based overlay network design, and making it configurable adds unnecessary complexity given the PR's scope. Additionally, the suggestion lacks a corresponding variable definition in the PR, making it a minor improvement at best. The score reflects the limited impact and contextual over-engineering.

Low

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f13a8cf8f4

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +18 to +20
sys.path.insert(0, str(REPO / "scripts/netops"))
from frr_semantic import parse_frr_config, semantic_json # noqa: E402
from render_frr_policy import effective_policy, load_intent, policy_json, render_policy_conf # noqa: E402

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Include netops scripts in IaC change detection

These new static tests import the parser/renderer from scripts/netops, but the iac-tests.yml change detector only treats scripts/ci/ as an IaC path. A PR that changes only scripts/netops/frr_semantic.py or scripts/netops/render_frr_policy.py will set iac_changed=false and skip static-iac/ansible-idempotency, so regressions in the semantic or policy artifact generation can merge without running the parity checks added here.

Useful? React with 👍 / 👎.

{% set netconf_sources = frr_netconf_allowed_sources_v6 | default([]) %}
{% if netconf_sources | length > 0 %}
# --- FRR NETCONF/YANG endpoint (disabled by default) ---
pass in quick on wg inet6 proto tcp from { {{ netconf_sources | join(', ') }} } to any port = {{ frr_netconf_port | default(830) }} flags S/SA keep state label "NETCONF_RULE: FRR NETCONF/YANG approved sources"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Gate NETCONF before the broad WireGuard pass

For FreeBSD routers with frr_netconf_endpoint_enabled=true, this allowlist does not actually constrain NETCONF access because pf.conf.j2 renders pass quick on wg all no state earlier, and quick stops evaluation before this source-restricted rule is reached. If the endpoint is enabled, any traffic arriving on a WireGuard interface can reach TCP/830 regardless of frr_netconf_allowed_sources_v6; the earlier WireGuard pass needs to be narrowed or ordered so port 830 is gated first.

Useful? React with 👍 / 👎.

@Svaag Svaag force-pushed the feat/netconf-yang-frr-adoption branch from ff44f07 to f13a8cf Compare June 14, 2026 23:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant