VXLAN BGP EVPN - 2 - Intra VNI switching
Introduction
In this post, we will concentrate on the BGP EVPN building blocks needed to accomplish intra-VNI switching.
Topology
Configuration
BGP L2VPN EVPN
- EVPN is a BGP address family (AFI 25 L2VPN, SAFI 70 EVPN) that performs control-plane MAC/IP learning. Because routes are learned through BGP, policies can control which VTEP learns which routes, and L2 frames can ultimately be carried across an IP network.
- The leaves and spines form iBGP peerings sourced from their loopback50 interfaces.
- The spines use the same configuration as the leaves, except that they also act as route reflectors.
Leaf 101 (use the below template to configure Leaf 102)
feature bgp
nv overlay evpn
router bgp 65501
  router-id 172.16.50.101
  log-neighbor-changes
  address-family ipv4 unicast
  address-family l2vpn evpn
  template peer TO_SPINES
    remote-as 65501
    update-source loopback50
    address-family l2vpn evpn
      send-community extended
  neighbor 172.16.50.11
    inherit peer TO_SPINES
  neighbor 172.16.50.12
    inherit peer TO_SPINES
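Since Leaf 102 reuses the same template, the per-leaf differences can be captured in a small rendering helper. The following is only a sketch: the function name `render_leaf_bgp` is hypothetical, and the Leaf 102 router ID of 172.16.50.102 is an assumption inferred from the neighbor statements on the spines.

```python
# Sketch: render the leaf BGP stanza from this post for any leaf.
# Only the router ID changes between Leaf 101 and Leaf 102.
LEAF_BGP_TEMPLATE = """\
feature bgp
nv overlay evpn
router bgp {asn}
  router-id {router_id}
  log-neighbor-changes
  address-family ipv4 unicast
  address-family l2vpn evpn
  template peer TO_SPINES
    remote-as {asn}
    update-source loopback50
    address-family l2vpn evpn
      send-community extended
{neighbors}"""


def render_leaf_bgp(router_id, spines, asn=65501):
    """Return the leaf BGP configuration as text (hypothetical helper)."""
    neighbors = "".join(
        f"  neighbor {spine}\n    inherit peer TO_SPINES\n" for spine in spines
    )
    return LEAF_BGP_TEMPLATE.format(
        asn=asn, router_id=router_id, neighbors=neighbors
    )


# Assumed router ID for Leaf 102; the spine loopbacks are taken from the post.
leaf_102 = render_leaf_bgp("172.16.50.102", ["172.16.50.11", "172.16.50.12"])
print(leaf_102)
```

The output can be pasted into Leaf 102 after checking that the router ID matches its loopback50 address.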
Spine 11 (use the below template to configure Spine 12)
feature bgp
nv overlay evpn
router bgp 65501
  router-id 172.16.50.11
  log-neighbor-changes
  address-family ipv4 unicast
  address-family l2vpn evpn
  template peer TO_LEAFS
    remote-as 65501
    update-source loopback50
    address-family l2vpn evpn
      send-community extended
      route-reflector-client
  neighbor 172.16.50.101
    inherit peer TO_LEAFS
  neighbor 172.16.50.102
    inherit peer TO_LEAFS
Verification
spine-11# sh bgp l2vpn evpn summary
BGP summary information for VRF default, address family L2VPN EVPN
BGP router identifier 172.16.50.11, local AS number 65501
BGP table version is 14, L2VPN EVPN config peers 2, capable peers 2
0 network entries and 0 paths using 0 bytes of memory
BGP attribute entries [0/0], BGP AS path entries [0/0]
BGP community entries [0/0], BGP clusterlist entries [0/0]
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
172.16.50.101   4 65501    3456    3457       14    0    0    2d09h 0
172.16.50.102   4 65501    3460    3456       14    0    0    2d09h 0
NVE interface
Next, enable the NVE interface and specify that BGP will be used as the overlay control plane. The IP address of the NVE source interface (loopback100) will appear as the next hop in BGP updates in later sections.
Leaf 101 (use the below template to configure Leaf 102)
feature nv overlay
interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback100
Host Mobility Manager
HMM is needed to track MAC addresses inside the VXLAN fabric.
Leaf 101 (use the below template to configure Leaf 102)
feature fabric forwarding
Anycast GW
All leaves in the fabric share a virtual MAC address that acts as the anycast gateway MAC. This makes the fabric look like a single core switch to the clients, so even if a client moves to another leaf, its gateway MAC stays the same.
Leaf 101 (use the below template to configure Leaf 102)
feature interface-vlan
fabric forwarding anycast-gateway-mac 2023.2023.2023
VLAN based service
Tell NX-OS that you intend to map VLANs to VNIs.
Leaf 101 (use the below template to configure Leaf 102)
feature vn-segment-vlan-based
TCAM modification
NOTE: Do this step only if you see TCAM errors on the console.
Resize the TCAM regions to 256 or 512 entries, depending on the features in use. The new carving typically takes effect only after the configuration is saved and the switch is reloaded.
Leaf 101 (use the below template to configure Leaf 102)
hardware access-list tcam region racl 512
hardware access-list tcam region vpc-convergence 256
hardware access-list tcam region arp-ether 256 double-wide
Intra-VNI service (L2VNI) in VXLAN Fabric
- Map the VLAN to an L2VNI.
- Create an EVPN instance (also called a MAC VRF) per VNI.
- With rd auto, the RD is generated as BGP router ID : (32767 + VLAN ID).
- With route-target auto, the RT is generated as AS number : VNI.
- Next, attach the VNI to the NVE interface.
- For L2 BUM traffic, define a multicast group per L2VNI.
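The two auto-derivation formulas in the list above can be checked with a few lines of Python. This is a minimal sketch, assuming a 2-byte AS number; the function names `auto_rd` and `auto_rt` are made up for illustration and are not NX-OS terms.

```python
RD_BASE = 32767  # offset from the RD formula quoted in the list above


def auto_rd(router_id: str, vlan_id: int) -> str:
    """RD = BGP router ID : (32767 + VLAN ID)."""
    return f"{router_id}:{RD_BASE + vlan_id}"


def auto_rt(asn: int, vni: int) -> str:
    """RT = AS number : VNI (sketch assumes a 2-byte AS number)."""
    return f"{asn}:{vni}"


# Values for Leaf 101: router ID 172.16.50.101, VLAN 10, VNI 10010.
leaf_101_rd = auto_rd("172.16.50.101", 10)
leaf_101_rt = auto_rt(65501, 10010)
print(leaf_101_rd, leaf_101_rt)
```

Note that the RD depends on the local VLAN ID while the RT depends only on the AS number and the VNI, so two leaves that map different VLANs to the same VNI still share the same RT.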
Leaf 101 (use the below template to configure Leaf 102)
vlan 10
  name vlan10-VNI10010
  vn-segment 10010
evpn
  vni 10010 l2
    rd auto
    route-target import auto
    route-target export auto
interface nve1
  member vni 10010
    mcast-group 224.1.1.10
interface nve1
  shutdown
PIM control plane
leaf-101# int nve1
  no shut
MRIB of leaf 101
leaf-101# show ip mroute
IP Multicast Routing Table for VRF "default"
(*, 224.1.1.10/32), uptime: 00:00:57, nve ip pim
Incoming interface: Ethernet1/2, RPF nbr: 172.16.0.12
Outgoing interface list: (count: 1)
nve1, uptime: 00:00:57, nve
As soon as the NVE interface on Leaf 101 is unshut, the leaf sends a PIM Join toward the RP to signal that it wants to receive traffic for the multicast group. The PIM Join is always forwarded based on the multicast RIB, and because of the RPF check only one path is used. In our case, Leaf 101 forwards it to Spine 12.
When the leaf starts sending L2 BUM traffic, it sends a PIM Register to the RP to announce itself as a source for the group. The PIM Register is forwarded based on the unicast RIB. In our case, the ECMP hash again selects Spine 12.
The spine that receives the PIM Register replies to the leaf with a PIM Register-Stop if no other leaves have joined the group yet.
The spine that receives the PIM Register also forwards it to the other member of its anycast RP set, which is the other spine.
spine-12# show ip mroute
IP Multicast Routing Table for VRF "default"
(*, 224.1.1.10/32), uptime: 00:44:18, pim ip
Incoming interface: loopback225, RPF nbr: 172.16.225.225
Outgoing interface list: (count: 1)
Ethernet1/2, uptime: 00:44:18, pim
(172.16.100.101/32, 224.1.1.10/32), uptime: 00:43:55, pim mrib ip
Incoming interface: Ethernet1/2, RPF nbr: 172.16.0.101, internal
Outgoing interface list: (count: 1)
Ethernet1/2, uptime: 00:40:55, pim, (RPF)
(*, 232.0.0.0/8), uptime: 00:46:12, pim ip
Incoming interface: Null, RPF nbr: 0.0.0.0
Outgoing interface list: (count: 0)
spine-11# show ip mroute
IP Multicast Routing Table for VRF "default"
(172.16.100.101/32, 224.1.1.10/32), uptime: 01:06:39, pim ip
Incoming interface: Ethernet1/1, RPF nbr: 172.16.0.101, internal
Outgoing interface list: (count: 0)
(*, 232.0.0.0/8), uptime: 01:09:01, pim ip
Incoming interface: Null, RPF nbr: 0.0.0.0
Outgoing interface list: (count: 0)
Now unshut the NVE on leaf 102.
leaf-102# int nve1
no shut
The final MRIB on the spines shows that both leaves have registered for group 224.1.1.10, as receivers and as sources.
spine-12# show ip mroute
IP Multicast Routing Table for VRF "default"
(*, 224.1.1.10/32), uptime: 01:08:51, pim ip
Incoming interface: loopback225, RPF nbr: 172.16.225.225
Outgoing interface list: (count: 2)
Ethernet1/1, uptime: 00:00:47, pim
Ethernet1/2, uptime: 01:08:51, pim
(172.16.100.101/32, 224.1.1.10/32), uptime: 01:08:28, pim mrib ip
Incoming interface: Ethernet1/2, RPF nbr: 172.16.0.101, internal
Outgoing interface list: (count: 2)
Ethernet1/1, uptime: 00:00:47, pim
Ethernet1/2, uptime: 01:05:28, pim, (RPF)
(172.16.100.102/32, 224.1.1.10/32), uptime: 00:00:06, pim mrib ip
Incoming interface: Ethernet1/1, RPF nbr: 172.16.0.102, internal
Outgoing interface list: (count: 1)
Ethernet1/2, uptime: 00:00:06, pim
(*, 232.0.0.0/8), uptime: 01:10:45, pim ip
Incoming interface: Null, RPF nbr: 0.0.0.0
Outgoing interface list: (count: 0)
Intra-VNI control plane
Leaf 101
Attach the clients and unshut the client-facing ports on the leaves.
interface Ethernet1/3
  switchport mode access
  ! use vlan 100 on Leaf 102
  switchport access vlan 10
  no shutdown
Leaf 101 learns the MAC address of "Alice" locally through data-plane source learning.
leaf-101# show mac address-table
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link,
(T) - True, (F) - False, C - ControlPlane MAC, ~ - vsan
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
C   10     0000.0000.0b0b   dynamic  0         F      F    nve1(172.16.100.102)
*   10     0000.000a.11ce   dynamic  0         F      F    Eth1/3
G    -     2023.2023.2023   static   -         F      F    sup-eth1(R)
G    -     5001.0000.1b08   static   -         F      F    sup-eth1(R)
L2FWDER installs the MAC address into the L2RIB of the EVPN instance (MAC VRF) for VNI 10010.
leaf-101# show l2route evpn mac evi 10
Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link
(Dup):Duplicate (Spl):Split (Rcv):Recv (AD):Auto-Delete (D):Del Pending
(S):Stale (C):Clear, (Ps):Peer Sync (O):Re-Originated (Nho):NH-Override
(Pf):Permanently-Frozen, (Orp): Orphan
Topology    Mac Address    Prod   Flags         Seq No     Next-Hops
----------- -------------- ------ ------------- ---------- ---------------------------------------
10          0000.0000.0b0b BGP    Rcv           0          172.16.100.102 (Label: 10010)
10          0000.000a.11ce Local  L,            0          Eth1/3
leaf-101# show vlan id 10 vn-segment
VLAN Segment-id
---- -----------
10 10010
Leaf 101 exports the MAC route from the L2RIB into the BGP process, which applies the BGP policy (export RT). BGP then attaches the path attributes and advertises the route as an EVPN Route Type 2 to the spines (the route reflectors), which reflect it to Leaf 102 (a route reflector client).
leaf-101# show bgp l2vpn evpn 0000.000a.11ce
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 172.16.50.101:32777 (L2VNI 10010)
BGP routing table entry for [2]:[0]:[0]:[48]:[0000.000a.11ce]:[0]:[0.0.0.0]/216,
version 25
Paths: (1 available, best #1)
Flags: (0x000102) (high32 00000000) on xmit-list, is not in l2rib/evpn
Advertised path-id 1
Path type: local, path is valid, is best path, no labeled nexthop
AS-Path: NONE, path locally originated
172.16.100.101 (metric 0) from 0.0.0.0 (172.16.50.101)
Origin IGP, MED not set, localpref 100, weight 32768
Received label 10010
Extcommunity: RT:65501:10010 ENCAP:8
Path-id 1 advertised to peers:
172.16.50.11 172.16.50.12
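The bracketed prefix in the output above packs several fields into one string. The sketch below splits it apart; the field names reflect my reading of the display format (route type, ESI, Ethernet tag, MAC length, MAC, IP length, IP) and should be treated as an assumption, and `parse_type2` is a hypothetical helper rather than any library call.

```python
import re

# Assumed field order of a Route Type 2 prefix as displayed above.
FIELDS = ("route_type", "esi", "ethernet_tag", "mac_len", "mac", "ip_len", "ip")


def parse_type2(prefix: str) -> dict:
    """Split '[2]:[0]:[0]:[48]:[mac]:[0]:[ip]/len' into named fields."""
    body, _, prefix_len = prefix.partition("/")
    values = re.findall(r"\[([^\]]*)\]", body)
    if len(values) != len(FIELDS) or values[0] != "2":
        raise ValueError(f"not a type-2 EVPN prefix: {prefix!r}")
    parsed = dict(zip(FIELDS, values))
    parsed["prefix_len"] = int(prefix_len)
    return parsed


route = parse_type2("[2]:[0]:[0]:[48]:[0000.000a.11ce]:[0]:[0.0.0.0]/216")
print(route["mac"], route["ip_len"])
```

An IP length of 0 with address 0.0.0.0 indicates a MAC-only advertisement, which matches the pure L2 learning shown in this section.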
Leaf 102 receives the route from both spines and stores it in the BGP process without modification, under the original RD 172.16.50.101:32777. Next, based on the EVPN import RT and the BGP best-path result, the route is imported into the local L2VNI table. During this import the RD may change: since the VNI maps to VLAN 10 on Leaf 101 but to VLAN 100 on Leaf 102, the received RD 172.16.50.101:32777 becomes 172.16.50.102:32867 (32767 + 100).
leaf-102# show bgp l2vpn evpn 0000.000a.11ce
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 172.16.50.101:32777
BGP routing table entry for [2]:[0]:[0]:[48]:[0000.000a.11ce]:[0]:[0.0.0.0]/216,
version 20
Paths: (2 available, best #2)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW
Path type: internal, path is valid, not best reason: Neighbor Address, no labeled nexthop
AS-Path: NONE, path sourced internal to AS
172.16.100.101 (metric 81) from 172.16.50.12 (172.16.50.12)
Origin IGP, MED not set, localpref 100, weight 0
Received label 10010
Extcommunity: RT:65501:10010 ENCAP:8
Originator: 172.16.50.101 Cluster list: 172.16.50.12
Advertised path-id 1
Path type: internal, path is valid, is best path, no labeled nexthop
Imported to 1 destination(s)
Imported paths list: L2-10010
AS-Path: NONE, path sourced internal to AS
172.16.100.101 (metric 81) from 172.16.50.11 (172.16.50.11)
Origin IGP, MED not set, localpref 100, weight 0
Received label 10010
Extcommunity: RT:65501:10010 ENCAP:8
Originator: 172.16.50.101 Cluster list: 172.16.50.11
Path-id 1 not advertised to any peer
Route Distinguisher: 172.16.50.102:32867 (L2VNI 10010)
BGP routing table entry for [2]:[0]:[0]:[48]:[0000.000a.11ce]:[0]:[0.0.0.0]/216,
version 21
Paths: (1 available, best #1)
Flags: (0x000212) (high32 00000000) on xmit-list, is in l2rib/evpn, is not in HW
Advertised path-id 1
Path type: internal, path is valid, is best path, no labeled nexthop, in rib
Imported from 172.16.50.101:32777:[2]:[0]:[0]:[48]:[0000.000a.11ce]:[0]:[0.0.0.0]/216
AS-Path: NONE, path sourced internal to AS
172.16.100.101 (metric 81) from 172.16.50.11 (172.16.50.11)
Origin IGP, MED not set, localpref 100, weight 0
Received label 10010
Extcommunity: RT:65501:10010 ENCAP:8
Originator: 172.16.50.101 Cluster list: 172.16.50.11
Path-id 1 not advertised to any peer
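The import behaviour visible in the output above (same route, same RT, different RD) can be modelled in a few lines. This is a simplified sketch, not how NX-OS implements it: the `MacRoute` type and `import_route` function are hypothetical, and best-path selection is left out.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class MacRoute:
    rd: str
    mac: str
    next_hop: str
    vni: int
    route_targets: frozenset


def import_route(route: MacRoute, local_rd: str,
                 import_rts: frozenset) -> Optional[MacRoute]:
    """Copy a received route into the local EVPN instance if an RT matches."""
    if not (route.route_targets & import_rts):
        return None  # no matching import RT: the route is not imported
    # The imported copy is stored under the local RD; nothing else changes.
    return replace(route, rd=local_rd)


received = MacRoute(
    rd="172.16.50.101:32777",
    mac="0000.000a.11ce",
    next_hop="172.16.100.101",
    vni=10010,
    route_targets=frozenset({"65501:10010"}),
)
imported = import_route(received, "172.16.50.102:32867",
                        frozenset({"65501:10010"}))
print(imported.rd, imported.next_hop)
```

The next hop and the VNI label are untouched by the import, which is why Leaf 102 still points at 172.16.100.101 for Alice's MAC.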
Leaf 102 installs the route into the L2RIB of the EVPN instance for VNI 10010, based on the L2VNI label received in the BGP update. The next hop is the remote leaf.
leaf-102# show l2route evpn mac evi 100
Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link
(Dup):Duplicate (Spl):Split (Rcv):Recv (AD):Auto-Delete (D):Del Pending
(S):Stale (C):Clear, (Ps):Peer Sync (O):Re-Originated (Nho):NH-Override
(Pf):Permanently-Frozen, (Orp): Orphan
Topology    Mac Address    Prod   Flags         Seq No     Next-Hops
----------- -------------- ------ ------------- ---------- ---------------------------------------
100         0000.0000.0b0b Local  L,            0          Eth1/3
100         0000.000a.11ce BGP    Rcv           0          172.16.100.101 (Label: 10010)
leaf-102# sh vlan id 100 vn-segment
VLAN Segment-id
---- -----------
100 10010
L2FWDER then installs the MAC address from the L2RIB into the MAC address table.
leaf-102# show mac address-table
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link,
(T) - True, (F) - False, C - ControlPlane MAC, ~ - vsan
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
*  100     0000.0000.0b0b   dynamic  0         F      F    Eth1/3
C  100     0000.000a.11ce   dynamic  0         F      F    nve1(172.16.100.101)
G    -     2023.2023.2023   static   -         F      F    sup-eth1(R)
G    -     5002.0000.1b08   static   -         F      F    sup-eth1(R)
BGP EVPN Type 2 PCAP
PIM data plane
alice#ping 192.168.11.12
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.11.12, timeout is 2 seconds:
.!!!!
- The ARP request from Alice is L2 BUM traffic.
- Leaf 101 forwards it to the multicast group based on its MRIB.
- The spine then looks at the OIL and sends it to the other leaves.
- Meanwhile, Leaf 101 also learns Alice's MAC address and advertises it as a BGP EVPN route to the other VTEPs.
- Leaf 102, which has the target host attached, decapsulates the ARP request and forwards it to the host.
- The host sends a unicast ARP reply.
- Since Leaf 102 has already received Alice's MAC route via BGP EVPN, it can unicast the ARP reply to the VTEP that sent the ARP request.
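The flood-versus-unicast choice in the steps above comes down to a MAC table lookup on the ingress leaf. The sketch below models only that decision, using Leaf 101's MAC table from earlier in this post; the `forward` function and its return labels are hypothetical, and multicast destination MACs and local flooding to access ports are left out for brevity.

```python
BROADCAST = "ffff.ffff.ffff"

# Leaf 101's view, taken from its MAC address table shown earlier.
MAC_TABLE_LEAF_101 = {
    "0000.000a.11ce": ("local", "Eth1/3"),
    "0000.0000.0b0b": ("vtep", "172.16.100.102"),
}


def forward(dst_mac, mac_table, mcast_group="224.1.1.10"):
    """Pick how a frame in VNI 10010 leaves the ingress leaf."""
    entry = mac_table.get(dst_mac)
    if dst_mac == BROADCAST or entry is None:
        # Broadcast or unknown unicast: flood via the L2VNI multicast group.
        return ("flood", mcast_group)
    kind, where = entry
    if kind == "vtep":
        # Known remote MAC: VXLAN-encapsulate and unicast to that VTEP.
        return ("vxlan-unicast", where)
    # Known local MAC: plain L2 switching out of the access port.
    return ("local", where)


arp_request = forward(BROADCAST, MAC_TABLE_LEAF_101)
icmp_request = forward("0000.0000.0b0b", MAC_TABLE_LEAF_101)
print(arp_request, icmp_request)
```

With an empty MAC table, the same call for 0000.0000.0b0b would return the flood result, which is the situation before the BGP EVPN route for that MAC has been received.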
Intra-VNI data plane
The ICMP request is now successfully encapsulated by Leaf 101.
Leaf 102 successfully decapsulates the ICMP request and encapsulates the ICMP reply.
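To make the encapsulation step concrete, here is a minimal sketch of the 8-byte VXLAN header that the ingress leaf places in front of the original Ethernet frame, following the layout in RFC 7348 (flags byte with the VNI-valid bit, 24-bit VNI, reserved fields set to zero). The outer Ethernet, IP, and UDP headers are omitted, and the helper names are made up for this example.

```python
import struct

VXLAN_UDP_PORT = 4789   # IANA-assigned destination port for VXLAN
VNI_VALID_FLAG = 0x08   # "I" flag: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags (8 bits) + 24 reserved bits.
    # Word 2: VNI (24 bits) + 8 reserved bits.
    return struct.pack("!II", VNI_VALID_FLAG << 24, vni << 8)


def vni_from_header(header: bytes) -> int:
    """Extract the VNI, as the egress leaf does before decapsulation."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VNI_VALID_FLAG:
        raise ValueError("VNI-valid flag not set")
    return vni_word >> 8


hdr = vxlan_header(10010)
print(hdr.hex(), vni_from_header(hdr))
```

The egress leaf uses the extracted VNI to select the local VLAN, which is how a frame sent in VLAN 10 on Leaf 101 ends up in VLAN 100 on Leaf 102.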