[EVPN-MH] Add kernel patches for EVPN VXLAN Multihoming support#540
[EVPN-MH] Add kernel patches for EVPN VXLAN Multihoming support#540bdfriedman wants to merge 1 commit intosonic-net:masterfrom
Conversation
Signed-off-by: Barry Friedman (friedman) <friedman@cisco.com>
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
@bdfriedman , are these PR upstreamed to linux kernel already? or in the process of upstream? |
banidoru
left a comment
There was a problem hiding this comment.
- Bug in
br.c(patch 2):br_switchdev_eventpassesfdb_info->locked(bool) as theprotocolparameter andRTPROT_UNSPECasext_flags— argument order is wrong after the signature change - Bug in
br_fdb_add(patch 2):protocolis initialized toRTPROT_UNSPECbut never parsed fromtb[NDA_PROTOCOL]— the add path silently ignores the user-supplied protocol, unlike the delete path which parses it correctly - Patch 1 (NDA_FLAGS_EXT) and Patch 3 (NTF_EXT_VALIDATED) look structurally sound
- Patches are not upstream-accepted — these appear to be custom SONiC patches carrying significant kernel neighbor/FDB subsystem changes
| } | ||
|
|
||
| - err = __br_fdb_delete(br, p, addr, vid); | ||
| + err = __br_fdb_delete(br, p, addr, vid, protocol); |
There was a problem hiding this comment.
Bug: Wrong argument order. After adding protocol to br_fdb_external_learn_add's signature (as the 5th positional param after vid), this call site passes fdb_info->locked as protocol and appends RTPROT_UNSPEC at the end as ext_flags.
Current: (br, p, addr, vid, fdb_info->locked, false, 0, RTPROT_UNSPEC)
Expected: (br, p, addr, vid, RTPROT_UNSPEC, fdb_info->locked, false, 0)
When fdb_info->locked is true, the entry will get protocol=1 instead of RTPROT_UNSPEC, and locked will silently be forced to false.
| - bool swdev_notify, u32 ext_flags) | ||
| + const unsigned char *addr, u16 vid, u8 protocol, | ||
| + bool locked, bool swdev_notify, u32 ext_flags) | ||
| { |
There was a problem hiding this comment.
Bug: protocol is never parsed from netlink in the add path. The variable is initialized to RTPROT_UNSPEC but there's no if (tb[NDA_PROTOCOL]) protocol = nla_get_u8(tb[NDA_PROTOCOL]); anywhere in br_fdb_add. Compare with br_fdb_delete which correctly parses it. This means bridge fdb add ... proto hw will silently ignore the protocol — it will always store RTPROT_UNSPEC.
banidoru
left a comment
There was a problem hiding this comment.
- Patch ordering concern: Patch 2 (protocol field) modifies
br_fdb_external_learn_addsignature from patch 1 but gets argument order wrong inbr.c(already noted by prior reviewer). - Missing NDA_PROTOCOL parse in br_fdb_add:
protocolis initialized toRTPROT_UNSPECbut never read fromtb[NDA_PROTOCOL], so bridge fdb add silently ignores user-specified protocol (already noted). - Internal callers can silently overwrite protocol:
vxlan_snoopandvxlan_fdb_external_learn_addpassRTPROT_UNSPEC— if the entry already has a real protocol set, it gets silently cleared on update. - Patches are well-structured overall: Clean kernel patch backports with proper commit messages and sign-offs. The NTF_EXT_VALIDATED and NTF_EXT_MH_PEER_SYNC patches look correct.
| int err; | ||
|
|
||
| if (!(ndm->ndm_state & (NUD_PERMANENT|NUD_REACHABLE))) { | ||
| @@ -1281,7 +1296,7 @@ static int vxlan_fdb_add(struct ndmsg *ndm, struct nlattr *tb[], |
There was a problem hiding this comment.
Protocol overwrite on internal update. When vxlan_fdb_update_existing is called by internal callers like vxlan_snoop or vxlan_fdb_external_learn_add with RTPROT_UNSPEC, this block will overwrite any previously-set protocol value back to 0. Consider guarding with if (protocol != RTPROT_UNSPEC && f->protocol != protocol) to avoid unintentional protocol clearing by internal callers that don't carry protocol context.
| if (!fdb || READ_ONCE(fdb->dst) != p) | ||
| return -ENOENT; | ||
|
|
||
| + /* If the delete comes from a different protocol type, |
There was a problem hiding this comment.
Non-extern_learn path doesn't propagate protocol. When __br_fdb_add takes the fdb_add_entry path (non-extern_learn), the protocol parameter is unused — fdb_add_entry never receives or sets it. This means bridge fdb add ... proto hw without extern_learn will silently store RTPROT_UNSPEC. Either fdb_add_entry should also accept and set protocol, or the kernel should reject protocol with non-extern_learn entries.
banidoru
left a comment
There was a problem hiding this comment.
- VXLAN FDB protocol field not emitted in netlink dumps:
vxlan_fdb_info()is not updated to includeNDA_PROTOCOLin the protocol patch, so VXLAN FDB dumps won't show the protocol even though it's stored internally. Bridge side (fdb_fill_info) does this correctly. - Existing review comments correctly identify: wrong argument order in
br_switchdev_eventcall, missingNDA_PROTOCOLparsing inbr_fdb_add, protocol not propagated throughfdb_add_entrypath, and protocol overwrite risk from internal VXLAN callers. - NTF_EXT_VALIDATED patch is well-designed — proper state validation, GC exemption, stale-instead-of-failed fallback, and carrier-down protection.
- NTF_EXT_MH_PEER_SYNC / ext_flags plumbing through VXLAN and bridge layers looks correct.
- All three patches use
0001-prefix — consider renaming to0001-/0002-/0003-for clarity in the series.
| 6 files changed, 85 insertions(+), 41 deletions(-) | ||
|
|
||
| diff --git a/drivers/net/vxlan/vxlan_core.c b/drivers/net/vxlan/vxlan_core.c | ||
| index de1b3fa96..c34b9f75c 100644 |
There was a problem hiding this comment.
Missing: vxlan_fdb_info() does not emit NDA_PROTOCOL. The protocol field is stored in vxlan_fdb and parsed/updated correctly, but vxlan_fdb_info() (the function that builds netlink messages for VXLAN FDB dumps) is never updated to include nla_put_u8(skb, NDA_PROTOCOL, fdb->protocol). This means bridge fdb show for VXLAN devices won't display the protocol, even though it's correctly tracked internally.
Compare with the bridge side where fdb_fill_info() correctly adds NDA_PROTOCOL. The same needs to happen in vxlan_fdb_info() — likely right after the existing NDA_VNI put.
|
Patch 3 (NTF_EXT_VALIDATED) is upstreamed and merged — great. Patches 1 and 2 appear to be Cisco-specific and not yet submitted upstream. Are there plans to upstream these? Carrying out-of-tree kernel patches long-term creates maintenance burden on every kernel version bump. If there are upstream submissions in progress, linking them here would help reviewers assess the risk. |
|
The |
|
The bugs flagged by banidoru (wrong argument order in |
|
Agree with what Tamer has said about upstreaming patches. There is already a fullcone NAT patch in this repo that hasn't been upstreamed, requires significant maintenance work to update, and is currently not getting applied on the Trixie kernel because no one has updated it. I do not want another instance of that. Additionally, if a patch has been upstreamed, please use the git patch from upstream, adjust for any patch conflicts, and (preferably) add the commit ID in the patch description. |
Why I did it
This PR adds three critical Linux kernel patches required to enable EVPN VXLAN Multihoming in SONiC. These kernel enhancements provide the necessary infrastructure for:
These patches are essential for implementing the EVPN-MH feature as described in the EVPN VXLAN Multihoming HLD.
Work item tracking
How I did it
Added three kernel patches to patches-sonic directory:
1. NDA_FLAGS_EXT Support with NTF_EXT_MH_PEER_SYNC (0001-vxlan-bridge-Add-NDA_FLAGS_EXT-support-with-NTF_EXT_.patch)
This patch adds extended flags support for VXLAN and bridge FDB entries to enable multi-homing peer synchronization:
ext_flagsinvxlan_fdbstructureNTF_EXT_MH_PEER_SYNC- Indicates FDB entry is synchronized across EVPN-MH peersNEIGH_UPDATE_F_EXT_MH_PEER_SYNCfor propagating sync statevxlan_fdb_alloc()- Initialize ext_flagsvxlan_fdb_create()- Pass ext_flags parametervxlan_fdb_update_existing()- Handle ext_flags updates and notificationsvxlan_fdb_update_create()- Create FDB with ext_flagsvxlan_fdb_info()- Include NDA_FLAGS_EXT in netlink messagesFiles modified:
drivers/net/vxlan/vxlan_core.c(140 lines)drivers/net/vxlan/vxlan_private.h(21 lines)drivers/net/vxlan/vxlan_vnifilter.c(11 lines)include/net/neighbour.h(4 lines)include/uapi/linux/neighbour.h(1 line)net/bridge/br.c(4 lines)net/bridge/br_fdb.c(35 lines)net/bridge/br_private.h(5 lines)net/core/neighbour.c(13 lines)2. Protocol Field in Bridge FDB (0001-net-bridge-vxlan-Protocol-field-in-bridge-fdb.patch)
This patch introduces an optional "protocol" field for bridge FDB entries to distinguish between control plane and data plane learned MAC addresses:
Purpose: In EVPN Multihoming, MAC addresses can be learned via:
This distinction enables:
Implementation:
protocolinnet_bridge_fdb_entryandvxlan_fdbstructuresRTPROT_UNSPECwhen protocol not specified (backward compatible)Usage Example:
Files modified:
drivers/net/vxlan/vxlan_core.c(55 lines)drivers/net/vxlan/vxlan_private.h(5 lines)drivers/net/vxlan/vxlan_vnifilter.c(4 lines)net/bridge/br.c(2 lines)net/bridge/br_fdb.c(55 lines)net/bridge/br_private.h(5 lines)3. NTF_EXT_VALIDATED Flag for External Validation (0001-neighbor-Add-NTF_EXT_VALIDATED-flag-for-externally-v.patch)
This patch adds a new "extern_valid" neighbor flag to indicate entries learned and validated externally that should not be invalidated by the kernel:
Background: In EVPN multi-homing:
Solution (based on draft-rbickhart-evpn-ip-mac-proxy-adv-03):
Implementation:
NTF_EXT_VALIDATED(extern_valid) - Entry is externally validatedUse case: Required for EVPN-MH proxy advertisements where control plane needs full control over neighbor entry validity and removal decisions.
Files modified:
How to verify it
Build kernel with these patches applied:
cd sonic-linux-kernel make BLDENV=bookwormVerify NDA_FLAGS_EXT support:
Verify protocol field support:
Verify extern_valid flag:
Integration testing with EVPN-MH:
Compatibility testing:
Which release branch to backport (provide reason below if selected)
Tested branch (Please provide the tested image version)
Description for the changelog
Add kernel patches for EVPN VXLAN Multihoming: extended FDB flags (NTF_EXT_MH_PEER_SYNC), protocol field for bridge FDB entries, and extern_valid flag for externally validated neighbor entries
Link to config_db schema for YANG model changes
N/A - This PR only adds kernel patches, no CONFIG_DB schema changes
Depends on
Related upstream work
Summary:
Critical for EVPN-MH:
✅ Peer synchronization flag (NTF_EXT_MH_PEER_SYNC)
✅ Control/data plane MAC distinction (protocol field)
✅ External neighbor validation (extern_valid flag)
✅ Proxy advertisement support
✅ Prevents intermittent EVPN-MH failures