bgpd: prevent root node poisoning in prefix tree #21915
When a BNC is first created via XCALLOC, resolved_prefix is
{family=0, prefixlen=0}. If bgp_bnc_mark_nht_important fires
before resolved_prefix is updated, it passes the zeroed prefix
to bgp_afi_node_get which creates a {family=0, prefixlen=0}
node in the radix trie, poisoning the table root.
Skip the old resolved_prefix lookup when family is still 0.
Signed-off-by: Soumya Roy <souroy@nvidia.com>
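The condition described above can be illustrated with a minimal, self-contained sketch. It is not FRR code: `toy_prefix`, `toy_bnc`, `toy_bnc_new` and `toy_has_old_prefix` are hypothetical stand-ins, and plain `calloc` is assumed to behave like XCALLOC only in that it zero-fills the allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for a prefix: only the two fields
 * the commit message talks about. */
struct toy_prefix {
	uint8_t family;     /* stays 0 until a real address family is assigned */
	uint16_t prefixlen;
};

/* Hypothetical stand-in for a next-hop cache entry (BNC). */
struct toy_bnc {
	struct toy_prefix resolved_prefix;
};

/* Zero-filling allocation, assumed to mimic what XCALLOC provides. */
static struct toy_bnc *toy_bnc_new(void)
{
	struct toy_bnc *bnc = calloc(1, sizeof(*bnc));

	assert(bnc != NULL);
	return bnc;
}

/* True only once resolved_prefix has been populated, i.e. when looking
 * up the "old" prefix in the table would be meaningful. */
static bool toy_has_old_prefix(const struct toy_bnc *bnc)
{
	return bnc->resolved_prefix.family != 0;
}
```

A freshly allocated `toy_bnc` reports no old prefix, so a caller following the commit message would skip the old-prefix lookup until `resolved_prefix` has been filled in.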
Greptile Summary
This PR fixes a radix trie corruption in BGP's next-hop tracking (NHT) code. When a BNC (
Confidence Score: 4/5
Safe to merge; the fix correctly prevents root-node creation from an uninitialized prefix and does not regress the normal resolved-prefix lifecycle. The guard is logically correct:
bgpd/bgp_nht.c, specifically the interaction between the
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Z as Zebra
participant NHT as bgp_process_nexthop_update
participant MARK as bgp_bnc_mark_nht_important
participant TRIE as BGP Radix Trie
Z->>NHT: zapi_route update (nhr)
NHT->>MARK: "bnc (resolved_prefix={family=0}), nhr"
alt "BEFORE fix: family == 0, no guard"
MARK->>TRIE: "bgp_afi_node_get(resolved_prefix={0,0})"
TRIE-->>MARK: creates/returns root node POISONED
MARK->>TRIE: UNSET_FLAG on root node
end
alt "AFTER fix: family == 0, guard skips old lookup"
MARK->>MARK: "skip old-prefix cleanup (family==0)"
end
MARK->>TRIE: "bgp_afi_node_get(nhr->prefix) set BGP_NODE_NHT_RESOLVED_NODE"
NHT->>NHT: "prefix_copy(resolved_prefix, nhr->prefix)"
Note over NHT: resolved_prefix.family now non-zero
Z->>NHT: next zapi_route update
NHT->>MARK: "bnc (resolved_prefix=real prefix), nhr"
MARK->>TRIE: bgp_afi_node_get(resolved_prefix) UNSET old flag
MARK->>TRIE: "bgp_afi_node_get(nhr->prefix) SET new flag"
if (dest) {
UNSET_FLAG(dest->flags, BGP_NODE_NHT_RESOLVED_NODE);
bgp_dest_unlock_node(dest);
if (bnc->resolved_prefix.family != 0) {
Please check the BGP_NEXTHOP_VALID flag in bnc->flags instead, and put the check at line 636.
In place of "if (bnc->resolved_prefix.family != 0) {" we can use "if (CHECK_FLAG(bnc->flags, BGP_NEXTHOP_VALID)) {". However, the existing code at line 636 is an early return, and this fix should not add another early return there: the lines from 648 onward still need to run, and that flow can currently happen in the steps described. So the diff would become:
> git diff master
diff --git a/bgpd/bgp_nht.c b/bgpd/bgp_nht.c
index 433a58006e..cbdd858dfb 100644
--- a/bgpd/bgp_nht.c
+++ b/bgpd/bgp_nht.c
@@ -636,10 +636,12 @@ static void bgp_bnc_mark_nht_important(struct bgp_nexthop_cache *bnc, struct zap
 	if (prefix_same(&bnc->resolved_prefix, &nhr->prefix))
 		return;
-	dest = bgp_afi_node_get(table, afi, nhr->safi, &bnc->resolved_prefix, NULL);
-	if (dest) {
-		UNSET_FLAG(dest->flags, BGP_NODE_NHT_RESOLVED_NODE);
-		bgp_dest_unlock_node(dest);
+	if (CHECK_FLAG(bnc->flags, BGP_NEXTHOP_VALID)) {
+		dest = bgp_afi_node_get(table, afi, nhr->safi, &bnc->resolved_prefix, NULL);
+		if (dest) {
+			UNSET_FLAG(dest->flags, BGP_NODE_NHT_RESOLVED_NODE);
+			bgp_dest_unlock_node(dest);
+		}
 	}
 	dest = bgp_afi_node_get(table, afi, nhr->safi, &nhr->prefix, NULL);
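The placement argument in this reply (guard only the old-prefix cleanup rather than returning early) can be sketched in isolation. The names below are hypothetical stand-ins: `TOY_CHECK_FLAG` and `TOY_NEXTHOP_VALID` only imitate the shape of a flag test, and the two booleans record which steps of the function would have run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins, not the FRR macros or flag values. */
#define TOY_CHECK_FLAG(v, f) ((v) & (f))
#define TOY_NEXTHOP_VALID (1u << 0)

struct toy_result {
	bool cleaned_old; /* old resolved prefix was looked up and unflagged */
	bool marked_new;  /* new prefix was looked up and flagged */
};

/* Variant A: return early when the entry is not valid yet.
 * The zeroed-prefix lookup is avoided, but so is the marking of the
 * new prefix. */
static struct toy_result toy_mark_early_return(uint32_t flags)
{
	struct toy_result r = { false, false };

	if (!TOY_CHECK_FLAG(flags, TOY_NEXTHOP_VALID))
		return r;
	r.cleaned_old = true;
	r.marked_new = true;
	return r;
}

/* Variant B: guard only the cleanup of the old prefix, as in the diff
 * above. The new prefix is marked regardless of the flag. */
static struct toy_result toy_mark_guarded_cleanup(uint32_t flags)
{
	struct toy_result r = { false, false };

	if (TOY_CHECK_FLAG(flags, TOY_NEXTHOP_VALID))
		r.cleaned_old = true;
	r.marked_new = true;
	return r;
}
```

For an entry whose flag is not set, variant A performs neither step, while variant B skips only the cleanup and still marks the new prefix, which is the behaviour the reply argues for.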
@@ -636,10 +636,13 @@ static void bgp_bnc_mark_nht_important(struct bgp_nexthop_cache *bnc, struct zap
if (prefix_same(&bnc->resolved_prefix, &nhr->prefix))
One test idea: can we add a topotest with two routers?
- An unnumbered BGP session advertises the loopbacks, and a second, numbered eBGP session is established over those loopbacks. (That should make the remote loopback resolve.)
- This should trigger the NHT path with nhr->type = ZEBRA_ROUTE_BGP.
- Advertising 0.0.0.0/0 can then validate that the poisoning issue does not reappear.
- If there is no CLI way to inspect the bad node, the poisoning can be checked with an assert or with traces.
When a BNC is first created via XCALLOC, resolved_prefix is {family=0, prefixlen=0}. If bgp_bnc_mark_nht_important fires before resolved_prefix is updated, it passes the zeroed prefix to bgp_afi_node_get which creates a {family=0, prefixlen=0} node in the radix trie, poisoning the table root.
Skip the old resolved_prefix lookup when family is still 0.
Signed-off-by: Soumya Roy <souroy@nvidia.com>
Steps: