Skip to content

bgpd: prevent root node poisoning in prefix tree#21915

Draft
soumyar-roy wants to merge 1 commit into
FRRouting:masterfrom
soumyar-roy:soumya/bnc_crash
Draft

bgpd: prevent root node poisoning in prefix tree#21915
soumyar-roy wants to merge 1 commit into
FRRouting:masterfrom
soumyar-roy:soumya/bnc_crash

Conversation

@soumyar-roy
Copy link
Copy Markdown
Contributor

@soumyar-roy soumyar-roy commented May 11, 2026

When a BNC is first created via XCALLOC, resolved_prefix is {family=0, prefixlen=0}. If bgp_bnc_mark_nht_important fires before resolved_prefix is updated, it passes the zeroed prefix to bgp_afi_node_get which creates a {family=0, prefixlen=0} node in the radix trie, poisoning the table root.

Skip the old resolved_prefix lookup when family is still 0.

Signed-off-by: Soumya Roy souroy@nvidia.com

Steps:

  1. Two routers with one unnumbered BGP link, each advertising loopbacks
  2. Add a numbered eBGP peer between them using loopback IPs — the remote loopback is reachable via a BGP-learned route (from the unnumbered session)
  3. Since the remote loopback resolves via a BGP route, zebra sends NHT update with nhr->type = ZEBRA_ROUTE_BGP, which triggers bgp_bnc_mark_nht_important — but resolved_prefix is still {family=0} (XCALLOC zeroed) at this point, so it poisons the radix trie root
  4. Inject 0.0.0.0/0 from the remote peer — paths attach to the poisoned root via prefix_match

When a BNC is first created via XCALLOC, resolved_prefix is
{family=0, prefixlen=0}. If bgp_bnc_mark_nht_important fires
before resolved_prefix is updated, it passes the zeroed prefix
to bgp_afi_node_get which creates a {family=0, prefixlen=0}
node in the radix trie, poisoning the table root.

Skip the old resolved_prefix lookup when family is still 0.

Signed-off-by: Soumya Roy <souroy@nvidia.com>
@frrbot frrbot Bot added the bgp label May 11, 2026
@soumyar-roy soumyar-roy marked this pull request as draft May 11, 2026 15:53
@greptile-apps
Copy link
Copy Markdown

greptile-apps Bot commented May 11, 2026

Greptile Summary

This PR fixes a radix trie corruption in BGP's next-hop tracking (NHT) code. When a BNC (bgp_nexthop_cache) is first allocated via XCALLOC, its resolved_prefix is all-zeros (family=0). If bgp_bnc_mark_nht_important fires before resolved_prefix is populated, the old code unconditionally passed the zeroed prefix to bgp_afi_node_get, which created a spurious {family=0, prefixlen=0} node at the radix trie root.

  • Guard added: wraps the "unmark old resolved node" lookup in if (bnc->resolved_prefix.family != 0), so the root node is never created from an uninitialized prefix.
  • Scope: single-function, one-hunk change in bgpd/bgp_nht.c; no data structure or API changes.

Confidence Score: 4/5

Safe to merge; the fix correctly prevents root-node creation from an uninitialized prefix and does not regress the normal resolved-prefix lifecycle.

The guard is logically correct: resolved_prefix.family is only non-zero after an explicit prefix_copy at line 707, so skipping the trie lookup when it is zero avoids creating a spurious root node. The stale-flag scenario (flag set on nhr->prefix when nexthop_num==0 zeros resolved_prefix afterward) predates this PR — the old code also failed to clear that flag while additionally corrupting the root. Overall the change is minimal and targeted.

bgpd/bgp_nht.c — specifically the interaction between the nexthop_num==0 branch (which zeros resolved_prefix) and the flag-set path in bgp_bnc_mark_nht_important for that same call.

Important Files Changed

Filename Overview
bgpd/bgp_nht.c Adds a family != 0 guard around the old-resolved-prefix trie lookup in bgp_bnc_mark_nht_important to prevent root-node poisoning on first NHT update; fix is correct for the described race but see note on stale flag after nexthop invalidation.

Sequence Diagram

sequenceDiagram
    participant Z as Zebra
    participant NHT as bgp_process_nexthop_update
    participant MARK as bgp_bnc_mark_nht_important
    participant TRIE as BGP Radix Trie

    Z->>NHT: zapi_route update (nhr)
    NHT->>MARK: "bnc (resolved_prefix={family=0}), nhr"

    alt "BEFORE fix: family == 0, no guard"
        MARK->>TRIE: "bgp_afi_node_get(resolved_prefix={0,0})"
        TRIE-->>MARK: creates/returns root node POISONED
        MARK->>TRIE: UNSET_FLAG on root node
    end

    alt "AFTER fix: family == 0, guard skips old lookup"
        MARK->>MARK: "skip old-prefix cleanup (family==0)"
    end

    MARK->>TRIE: "bgp_afi_node_get(nhr->prefix) set BGP_NODE_NHT_RESOLVED_NODE"
    NHT->>NHT: "prefix_copy(resolved_prefix, nhr->prefix)"
    Note over NHT: resolved_prefix.family now non-zero

    Z->>NHT: next zapi_route update
    NHT->>MARK: "bnc (resolved_prefix=real prefix), nhr"
    MARK->>TRIE: bgp_afi_node_get(resolved_prefix) UNSET old flag
    MARK->>TRIE: "bgp_afi_node_get(nhr->prefix) SET new flag"
Loading

Reviews (1): Last reviewed commit: "bgpd: prevent root node poisoning in pre..." | Re-trigger Greptile

Comment thread bgpd/bgp_nht.c
if (dest) {
UNSET_FLAG(dest->flags, BGP_NODE_NHT_RESOLVED_NODE);
bgp_dest_unlock_node(dest);
if (bnc->resolved_prefix.family != 0) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the bnc->flags BGP_NEXTHOP_VALID and the check should be on line 636

Copy link
Copy Markdown
Contributor Author

@soumyar-roy soumyar-roy May 11, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In place of "if (bnc->resolved_prefix.family != 0) {" we can do if (CHECK_FLAG(bnc->flags, BGP_NEXTHOP_VALID)) {, but at line 636 we make early return in existing code , which we should not do early return for this PR fix, as lines from 648 still should run, and that flow currently can happen for the steps described. So basically the diff will become like this > git diff master
diff --git a/bgpd/bgp_nht.c b/bgpd/bgp_nht.c
index 433a58006e..cbdd858dfb 100644

--- a/bgpd/bgp_nht.c
+++ b/bgpd/bgp_nht.c
@@ -636,10 +636,12 @@ static void bgp_bnc_mark_nht_important(struct bgp_nexthop_cache *bnc, struct zap
        if (prefix_same(&bnc->resolved_prefix, &nhr->prefix))
                return;
 
-       dest = bgp_afi_node_get(table, afi, nhr->safi, &bnc->resolved_prefix, NULL);
-       if (dest) {
-               UNSET_FLAG(dest->flags, BGP_NODE_NHT_RESOLVED_NODE);
-               bgp_dest_unlock_node(dest);
+       if (CHECK_FLAG(bnc->flags, BGP_NEXTHOP_VALID)) {
+               dest = bgp_afi_node_get(table, afi, nhr->safi, &bnc->resolved_prefix, NULL);
+               if (dest) {
+                       UNSET_FLAG(dest->flags, BGP_NODE_NHT_RESOLVED_NODE);
+                       bgp_dest_unlock_node(dest);
+               }
        }
 
        dest = bgp_afi_node_get(table, afi, nhr->safi, &nhr->prefix, NULL);

Comment thread bgpd/bgp_nht.c
@@ -636,10 +636,13 @@ static void bgp_bnc_mark_nht_important(struct bgp_nexthop_cache *bnc, struct zap
if (prefix_same(&bnc->resolved_prefix, &nhr->prefix))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One test idea: can we add a topotest with two routers

  1. unnumbered BGP session advertises loopbacks, and a second numbered eBGP session is established over those loopbacks? [ That should make the remote loopback resolve]
  2. It should trigger the NHT path with nhr->type = ZEBRA_ROUTE_BGP
  3. Then advertising 0.0.0.0/0 can validate that poisoning issue does not reappear.
  4. If there is no CLI way to inspect the bad node then Validation of poisoning can be checked by assert or traces .

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants