VXLAN BGP EVPN - 2 - Intra VNI switching

Introduction

In this post, we will concentrate on the BGP EVPN building blocks needed to accomplish intra-VNI switching.

Topology

VXLAN_underlay

Configuration

BGP L2VPN EVPN

  • EVPN is a BGP address family (AFI 25 L2VPN, SAFI 70 EVPN) that performs control-plane MAC/IP learning. Because learning happens in BGP, policies can control which routes each device learns, and L2 frames can ultimately be carried across an IP network.
  • The leaves and spines peer over iBGP using their loopback50 interfaces.
  • The spines use the same config as the leaves, except that they also act as route reflectors.

Leaf 101 (use the below template to configure Leaf 102)

feature bgp
nv overlay evpn

router bgp 65501
  router-id 172.16.50.101
  log-neighbor-changes
  address-family ipv4 unicast
  address-family l2vpn evpn

  template peer TO_SPINES
    remote-as 65501
    update-source loopback50
    address-family l2vpn evpn
      send-community extended

  neighbor 172.16.50.11
    inherit peer TO_SPINES

  neighbor 172.16.50.12
    inherit peer TO_SPINES
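
Since the same template is reused for Leaf 102, the per-leaf differences can be generated rather than edited by hand. The Python sketch below is only an illustration: `render_leaf_bgp` is a hypothetical helper, and it assumes that loopback50 addresses follow the 172.16.50.<node-id> pattern seen in this lab.

```python
# Hypothetical helper that renders the leaf BGP EVPN config for one leaf.
# Assumption: loopback50 addresses follow 172.16.50.<node-id>, as in this lab.

ASN = 65501
SPINE_IDS = (11, 12)


def render_leaf_bgp(leaf_id: int) -> str:
    """Return the BGP EVPN config text for the leaf with the given node id."""
    lines = [
        "feature bgp",
        "nv overlay evpn",
        "",
        f"router bgp {ASN}",
        f"  router-id 172.16.50.{leaf_id}",
        "  log-neighbor-changes",
        "  address-family ipv4 unicast",
        "  address-family l2vpn evpn",
        "  template peer TO_SPINES",
        f"    remote-as {ASN}",
        "    update-source loopback50",
        "    address-family l2vpn evpn",
        "      send-community extended",
    ]
    # One neighbor statement per spine, all inheriting the same peer template.
    for spine_id in SPINE_IDS:
        lines.append(f"  neighbor 172.16.50.{spine_id}")
        lines.append("    inherit peer TO_SPINES")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_leaf_bgp(102))
```

Only the router ID changes between leaves here; the peer template and the spine neighbors stay identical.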

Spine 11 (use the below template to configure Spine 12)

feature bgp
nv overlay evpn

router bgp 65501
  router-id 172.16.50.11
  log-neighbor-changes
  address-family ipv4 unicast
  address-family l2vpn evpn

  template peer TO_LEAFS
    remote-as 65501
    update-source loopback50
    address-family l2vpn evpn
      send-community extended
      route-reflector-client

  neighbor 172.16.50.101
    inherit peer TO_LEAFS

  neighbor 172.16.50.102
    inherit peer TO_LEAFS

Verification

spine-11# sh bgp l2vpn evpn summary
BGP summary information for VRF default, address family L2VPN EVPN
BGP router identifier 172.16.50.11, local AS number 65501
BGP table version is 14, L2VPN EVPN config peers 2, capable peers 2
0 network entries and 0 paths using 0 bytes of memory
BGP attribute entries [0/0], BGP AS path entries [0/0]
BGP community entries [0/0], BGP clusterlist entries [0/0]

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
172.16.50.101   4 65501    3456    3457       14    0    0    2d09h 0         
172.16.50.102   4 65501    3460    3456       14    0    0    2d09h 0

NVE interface

Next, enable the NVE interface and specify that BGP will be used for the overlay control plane. The IP address of the NVE source interface (loopback100) will appear as the next hop in BGP updates in later sections.

Leaf 101 (use the below template to configure Leaf 102)

feature nv overlay

interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback100

Host Mobility Manager

HMM is needed to track MAC addresses inside the VXLAN fabric.

Leaf 101 (use the below template to configure Leaf 102)

feature fabric forwarding

Anycast GW

All leaves in the fabric share one virtual MAC address, which acts as the anycast gateway MAC. This makes the fabric look like a single core switch to the clients, so a client's gateway MAC stays the same even when the client moves to another leaf.

Leaf 101 (use the below template to configure Leaf 102)

feature interface-vlan
fabric forwarding anycast-gateway-mac 2023.2023.2023

VLAN based service

Tell NX-OS that you intend to map VLANs to VNIs.

Leaf 101 (use the below template to configure Leaf 102)

feature vn-segment-vlan-based 

TCAM modification

NOTE: Do this step only if you see TCAM errors on the console.

Carve the TCAM regions in multiples of 256 or 512 entries, depending on the features being used.

Leaf 101 (use the below template to configure Leaf 102)

hardware access-list tcam region racl 512
hardware access-list tcam region vpc-convergence 256 
hardware access-list tcam region arp-ether 256 double-wide 

Intra-VNI service (L2VNI) in VXLAN Fabric

  • Map the VLAN to an L2 VNI.
  • Create an EVPN instance per VNI (also called a MAC VRF).
  • The auto RD is generated as BGP router ID : (32767 + VLAN ID).
  • The auto RT is generated as AS number : VNI ID.
  • Next, attach the VNI to the NVE interface.
  • For L2 BUM traffic, define a multicast group per L2VNI.
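
The two formulas above can be checked with a few lines of Python. This is a minimal sketch of the arithmetic only, not NX-OS code; the function names `auto_rd` and `auto_rt` are made up for illustration.

```python
# Sketch of the "rd auto" / "route-target auto" arithmetic described above.
# Function names are illustrative; only the formulas come from the text.

AUTO_RD_BASE = 32767


def auto_rd(router_id: str, vlan_id: int) -> str:
    """RD = BGP router ID : (32767 + VLAN ID)."""
    return f"{router_id}:{AUTO_RD_BASE + vlan_id}"


def auto_rt(asn: int, vni: int) -> str:
    """RT = AS number : VNI ID."""
    return f"{asn}:{vni}"


if __name__ == "__main__":
    print(auto_rd("172.16.50.101", 10))   # leaf 101, VLAN 10
    print(auto_rd("172.16.50.102", 100))  # leaf 102, VLAN 100
    print(auto_rt(65501, 10010))          # same on both leaves
```

The RD depends on the local VLAN ID, while the RT depends only on the AS number and the VNI. That is why two leaves can use different VLANs for VNI 10010 and still import each other's routes.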

Leaf 101 (use the below template to configure Leaf 102)

vlan 10
  name vlan10-VNI10010
  vn-segment 10010

evpn
  vni 10010 l2
    rd auto
    route-target import auto
    route-target export auto

interface nve1
  member vni 10010
    mcast-group 224.1.1.10

int nve1
  shut

PIM control plane

leaf-101# int nve1
no shut

MRIB of leaf 101

leaf-101# show ip mroute
IP Multicast Routing Table for VRF "default"

(*, 224.1.1.10/32), uptime: 00:00:57, nve ip pim 
  Incoming interface: Ethernet1/2, RPF nbr: 172.16.0.12
  Outgoing interface list: (count: 1)
    nve1, uptime: 00:00:57, nve

As soon as the NVE interface on leaf 101 is unshut, the leaf sends a PIM Join toward the RP to signal that it wants to be a receiver for the multicast group. A PIM Join is always forwarded based on the multicast RIB. Because of the RPF check, only one path is used to forward the Join. In our case, leaf 101 forwards it to spine 12.

PIM_join

When the leaf starts generating L2 BUM traffic, it sends a PIM Register to the RP to announce itself as a source for the multicast group. The PIM Register is forwarded based on the unicast RIB. In our case, the ECMP hash ends up choosing spine 12 again.

PIM_register

The spine that received the PIM Register sends a PIM Register-Stop back to the leaf if no other leaves have joined the group yet.

PIM_register_stop

The spine that received the PIM Register also forwards it to the other member of its Anycast-RP set, the other spine.

PIM_register_anycast

spine-12# show ip mroute
IP Multicast Routing Table for VRF "default"

(*, 224.1.1.10/32), uptime: 00:44:18, pim ip 
  Incoming interface: loopback225, RPF nbr: 172.16.225.225
  Outgoing interface list: (count: 1)
    Ethernet1/2, uptime: 00:44:18, pim


(172.16.100.101/32, 224.1.1.10/32), uptime: 00:43:55, pim mrib ip 
  Incoming interface: Ethernet1/2, RPF nbr: 172.16.0.101, internal
  Outgoing interface list: (count: 1)
    Ethernet1/2, uptime: 00:40:55, pim, (RPF)


(*, 232.0.0.0/8), uptime: 00:46:12, pim ip 
  Incoming interface: Null, RPF nbr: 0.0.0.0
  Outgoing interface list: (count: 0)

spine-11# show ip mroute
IP Multicast Routing Table for VRF "default"

(172.16.100.101/32, 224.1.1.10/32), uptime: 01:06:39, pim ip 
  Incoming interface: Ethernet1/1, RPF nbr: 172.16.0.101, internal
  Outgoing interface list: (count: 0)


(*, 232.0.0.0/8), uptime: 01:09:01, pim ip 
  Incoming interface: Null, RPF nbr: 0.0.0.0
  Outgoing interface list: (count: 0)

Now unshut the NVE on leaf 102.

leaf-102# int nve1
no shut

The final MRIB on the spines shows both leaves registered for group 224.1.1.10, as receivers and as sources.

spine-12# show ip mroute
IP Multicast Routing Table for VRF "default"

(*, 224.1.1.10/32), uptime: 01:08:51, pim ip 
  Incoming interface: loopback225, RPF nbr: 172.16.225.225
  Outgoing interface list: (count: 2)
    Ethernet1/1, uptime: 00:00:47, pim
    Ethernet1/2, uptime: 01:08:51, pim

(172.16.100.101/32, 224.1.1.10/32), uptime: 01:08:28, pim mrib ip 
  Incoming interface: Ethernet1/2, RPF nbr: 172.16.0.101, internal
  Outgoing interface list: (count: 2)
    Ethernet1/1, uptime: 00:00:47, pim
    Ethernet1/2, uptime: 01:05:28, pim, (RPF)

(172.16.100.102/32, 224.1.1.10/32), uptime: 00:00:06, pim mrib ip 
  Incoming interface: Ethernet1/1, RPF nbr: 172.16.0.102, internal
  Outgoing interface list: (count: 1)
    Ethernet1/2, uptime: 00:00:06, pim

(*, 232.0.0.0/8), uptime: 01:10:45, pim ip 
  Incoming interface: Null, RPF nbr: 0.0.0.0
  Outgoing interface list: (count: 0)
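
To make the (*, G) and (S, G) entries above easier to read, here is a toy Python model of the state an RP keeps. It is a sketch under simplifying assumptions, not a PIM implementation: the class and method names are invented, Joins only add interfaces to the (*, G) list, and traffic from a source is forwarded out of that list minus the interface it arrived on. NX-OS may display additional interfaces (for example ones flagged as RPF) that this model does not reproduce.

```python
# Toy model of RP multicast state. Names are illustrative, not NX-OS internals.
from collections import defaultdict


class RendezvousPoint:
    def __init__(self):
        # group -> interfaces on which a (*, G) Join was received
        self.star_g_oil = defaultdict(set)
        # group -> {source address: interface facing that source}
        self.sources = defaultdict(dict)

    def join(self, group, iface):
        """A leaf behind `iface` wants to receive traffic for `group`."""
        self.star_g_oil[group].add(iface)

    def register(self, group, source, rpf_iface):
        """A leaf announced itself as a source; remember its RPF interface."""
        self.sources[group][source] = rpf_iface

    def forwarding_oil(self, group, source):
        """Interfaces a packet from `source` is sent out of (never its own input)."""
        incoming = self.sources[group][source]
        return sorted(self.star_g_oil[group] - {incoming})


if __name__ == "__main__":
    rp = RendezvousPoint()  # spine 12 in this lab
    rp.join("224.1.1.10", "Ethernet1/2")                        # leaf 101 unshuts nve1
    rp.register("224.1.1.10", "172.16.100.101", "Ethernet1/2")  # leaf 101 sends BUM
    rp.join("224.1.1.10", "Ethernet1/1")                        # leaf 102 unshuts nve1
    rp.register("224.1.1.10", "172.16.100.102", "Ethernet1/1")  # leaf 102 sends BUM
    print(rp.forwarding_oil("224.1.1.10", "172.16.100.101"))
    print(rp.forwarding_oil("224.1.1.10", "172.16.100.102"))
```

With both leaves joined, BUM traffic from one leaf is replicated only toward the other leaf.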

Intra-VNI control plane

L2VNI

Leaf 101

Attach the clients and unshut the client-facing ports on the leaves.

int e1/3
  switchport mode access
  ! use vlan 100 for leaf 102
  switchport access vlan 10
  no shut

Leaf 101 learns the MAC of “Alice” locally through source learning.

leaf-101# show mac address-table
Legend: 
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False, C - ControlPlane MAC, ~ - vsan
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
C   10     0000.0000.0b0b   dynamic  0         F      F    nve1(172.16.100.102)
*   10     0000.000a.11ce   dynamic  0         F      F    Eth1/3
G    -     2023.2023.2023   static   -         F      F    sup-eth1(R)
G    -     5001.0000.1b08   static   -         F      F    sup-eth1(R)

L2FWDER installs the MAC address into the L2RIB of the EVPN instance (MAC VRF) for VNI 10010.

leaf-101# show l2route evpn mac evi 10

Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link 
(Dup):Duplicate (Spl):Split (Rcv):Recv (AD):Auto-Delete (D):Del Pending
(S):Stale (C):Clear, (Ps):Peer Sync (O):Re-Originated (Nho):NH-Override
(Pf):Permanently-Frozen, (Orp): Orphan

Topology    Mac Address    Prod   Flags         Seq No     Next-Hops
----------- -------------- ------ ------------- ---------- ---------------------------------------
10          0000.0000.0b0b BGP    Rcv           0          172.16.100.102 (Label: 10010)
10          0000.000a.11ce Local  L,            0          Eth1/3

leaf-101# show vlan id 10 vn-segment


VLAN Segment-id
---- -----------
10   10010     

Leaf 101 exports the MAC route from the L2RIB into the BGP process, which applies BGP policy (the export RT). BGP then attaches the path attributes and advertises a Route Type 2 to both spines (the RRs). The spines reflect it to leaf 102 (an RR client).

leaf-101# show bgp l2vpn evpn 0000.000a.11ce
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 172.16.50.101:32777    (L2VNI 10010)
BGP routing table entry for [2]:[0]:[0]:[48]:[0000.000a.11ce]:[0]:[0.0.0.0]/216,
 version 25
Paths: (1 available, best #1)
Flags: (0x000102) (high32 00000000) on xmit-list, is not in l2rib/evpn

  Advertised path-id 1
  Path type: local, path is valid, is best path, no labeled nexthop
  AS-Path: NONE, path locally originated
    172.16.100.101 (metric 0) from 0.0.0.0 (172.16.50.101)
      Origin IGP, MED not set, localpref 100, weight 32768
      Received label 10010
      Extcommunity: RT:65501:10010 ENCAP:8

  Path-id 1 advertised to peers:
    172.16.50.11       172.16.50.12   
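
The bracketed route key in the output above packs several fields into one string. The Python sketch below splits it apart. The field names reflect my reading of the NX-OS display format for Route Type 2 (route type, ESI, Ethernet tag, MAC length, MAC, IP length, IP, then the prefix length) and should be treated as an assumption; `parse_type2` is a hypothetical helper.

```python
# Sketch: split an NX-OS style EVPN Route Type 2 key into named fields.
# Field naming is an assumption based on the display format shown above.
import re
from typing import NamedTuple


class Type2Route(NamedTuple):
    route_type: int
    esi: str
    ethernet_tag: int
    mac_len: int
    mac: str
    ip_len: int
    ip: str
    prefix_len: int


_KEY = re.compile(
    r"\[(\d+)\]:\[([^\]]+)\]:\[(\d+)\]:\[(\d+)\]:"
    r"\[([0-9a-fA-F.]+)\]:\[(\d+)\]:\[([^\]]+)\]/(\d+)"
)


def parse_type2(key: str) -> Type2Route:
    match = _KEY.fullmatch(key.strip())
    if match is None:
        raise ValueError(f"not a Route Type 2 key: {key!r}")
    rtype, esi, tag, mac_len, mac, ip_len, ip, plen = match.groups()
    return Type2Route(int(rtype), esi, int(tag), int(mac_len), mac,
                      int(ip_len), ip, int(plen))


if __name__ == "__main__":
    print(parse_type2("[2]:[0]:[0]:[48]:[0000.000a.11ce]:[0]:[0.0.0.0]/216"))
```

An IP length of 0 with 0.0.0.0 indicates a MAC-only route, which is what intra-VNI switching needs.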

Leaf 102 accepts the route into the BGP process without modification, so it is first stored under the received RD 172.16.50.101:32777. Next, the route is imported into the local EVPN instance based on the EVPN import RT and the BGP best-path result. During this import, the RD changes to the local one. Since the VLAN is 10 on leaf 101 but 100 on leaf 102, the received RD 172.16.50.101:32777 becomes 172.16.50.102:32867 (32767 + 100) on the imported copy.

leaf-102# show bgp l2vpn evpn 0000.000a.11ce
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 172.16.50.101:32777
BGP routing table entry for [2]:[0]:[0]:[48]:[0000.000a.11ce]:[0]:[0.0.0.0]/216,
 version 20
Paths: (2 available, best #2)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW

  Path type: internal, path is valid, not best reason: Neighbor Address, no labeled nexthop
  AS-Path: NONE, path sourced internal to AS
    172.16.100.101 (metric 81) from 172.16.50.12 (172.16.50.12)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10010
      Extcommunity: RT:65501:10010 ENCAP:8
      Originator: 172.16.50.101 Cluster list: 172.16.50.12 

  Advertised path-id 1
  Path type: internal, path is valid, is best path, no labeled nexthop
             Imported to 1 destination(s)
             Imported paths list: L2-10010
  AS-Path: NONE, path sourced internal to AS
    172.16.100.101 (metric 81) from 172.16.50.11 (172.16.50.11)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10010
      Extcommunity: RT:65501:10010 ENCAP:8
      Originator: 172.16.50.101 Cluster list: 172.16.50.11 

  Path-id 1 not advertised to any peer

Route Distinguisher: 172.16.50.102:32867    (L2VNI 10010)
BGP routing table entry for [2]:[0]:[0]:[48]:[0000.000a.11ce]:[0]:[0.0.0.0]/216,
 version 21
Paths: (1 available, best #1)
Flags: (0x000212) (high32 00000000) on xmit-list, is in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path, no labeled nexthop, in rib
             Imported from 172.16.50.101:32777:[2]:[0]:[0]:[48]:[0000.000a.11ce]:[0]:[0.0.0.0]/216
  AS-Path: NONE, path sourced internal to AS
    172.16.100.101 (metric 81) from 172.16.50.11 (172.16.50.11)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10010
      Extcommunity: RT:65501:10010 ENCAP:8
      Originator: 172.16.50.101 Cluster list: 172.16.50.11 

  Path-id 1 not advertised to any peer
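
The import step visible in the output above (one entry under the received RD, one copy under the local RD) can be sketched in Python. This is a simplified illustration, not how NX-OS implements it: `EvpnRoute`, `MacVrf`, and `import_route` are hypothetical names, and best-path selection is left out.

```python
# Sketch of RT-based import into a local MAC VRF. All names are illustrative.
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class EvpnRoute:
    rd: str
    mac: str
    next_hop: str
    label: int
    route_targets: frozenset


@dataclass(frozen=True)
class MacVrf:
    name: str
    rd: str
    import_rts: frozenset


def import_route(route, mac_vrfs):
    """Return a copy of `route` for every MAC VRF whose import RT matches."""
    copies = []
    for vrf in mac_vrfs:
        if route.route_targets & vrf.import_rts:
            # Only the RD is rewritten; next hop, label, and RTs are kept.
            copies.append(replace(route, rd=vrf.rd))
    return copies


if __name__ == "__main__":
    received = EvpnRoute("172.16.50.101:32777", "0000.000a.11ce",
                         "172.16.100.101", 10010, frozenset({"65501:10010"}))
    leaf_102_vrf = MacVrf("L2-10010", "172.16.50.102:32867",
                          frozenset({"65501:10010"}))
    for copy in import_route(received, [leaf_102_vrf]):
        print(copy.rd, copy.next_hop, copy.label)
```

The next hop stays 172.16.100.101 on the imported copy, which is what lets leaf 102 point the MAC at the remote VTEP.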

The route is then installed into the L2RIB of the EVPN instance for VNI 10010, selected by the L2VNI label received in the BGP update. The next hop is the remote leaf.

leaf-102# show l2route evpn mac evi 100

Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link 
(Dup):Duplicate (Spl):Split (Rcv):Recv (AD):Auto-Delete (D):Del Pending
(S):Stale (C):Clear, (Ps):Peer Sync (O):Re-Originated (Nho):NH-Override
(Pf):Permanently-Frozen, (Orp): Orphan

Topology    Mac Address    Prod   Flags         Seq No     Next-Hops
----------- -------------- ------ ------------- ---------- ---------------------------------------
100         0000.0000.0b0b Local  L,            0          Eth1/3
100         0000.000a.11ce BGP    Rcv           0          172.16.100.101 (Label: 10010)

leaf-102# sh vlan id 100 vn-segment


VLAN Segment-id
---- -----------
100  10010   

L2FWDER installs the MAC from the L2RIB into the MAC address table.

leaf-102# show mac address-table
Legend: 
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False, C - ControlPlane MAC, ~ - vsan
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
*  100     0000.0000.0b0b   dynamic  0         F      F    Eth1/3
C  100     0000.000a.11ce   dynamic  0         F      F    nve1(172.16.100.101)
G    -     2023.2023.2023   static   -         F      F    sup-eth1(R)
G    -     5002.0000.1b08   static   -         F      F    sup-eth1(R)

BGP EVPN Type 2 PCAP

VXLAN_underlay

PIM data plane

alice#ping 192.168.11.12
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.11.12, timeout is 2 seconds:
.!!!!

  • The ARP request from Alice is L2 BUM traffic.
  • Leaf 101 forwards it to the multicast group based on its MRIB.
  • The spine then looks at the OIL and sends it to the other leaves.
  • Meanwhile, leaf 101 also learns Alice's MAC and advertises it as a BGP EVPN route to the other VTEPs.

VXLAN_underlay

  • Leaf 102, which has the target host attached, decapsulates the ARP request and forwards it to the host.
  • The host sends a unicast ARP reply.
  • Since leaf 102 has already received Alice's MAC route via BGP EVPN, it can unicast the ARP reply to the VTEP that sent the ARP request.
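
The ARP exchange above comes down to one decision per frame on the ingress leaf: flood to the multicast group, or unicast to a known VTEP. Below is a minimal Python sketch of that decision. It is an illustration only: `vxlan_destination` is a hypothetical function, the MAC table is reduced to a dictionary, and multicast destination MACs are ignored for brevity.

```python
# Sketch of the ingress-leaf forwarding decision for a frame in VNI 10010.
# The table layout and function name are illustrative assumptions.

BROADCAST_MAC = "ffff.ffff.ffff"


def vxlan_destination(dst_mac, mac_table, bum_group):
    """Return (kind, target) describing where the frame is sent."""
    entry = mac_table.get(dst_mac)
    if dst_mac == BROADCAST_MAC or entry is None:
        # Broadcast or unknown unicast: flood via the underlay multicast group.
        return ("multicast", bum_group)
    kind, target = entry
    if kind == "vtep":
        # MAC learned through BGP EVPN: unicast VXLAN to the remote VTEP.
        return ("unicast", target)
    # MAC learned on a local port: plain L2 switching, no encapsulation.
    return ("local", target)


# MAC table of leaf 102 once Alice's route has arrived via BGP EVPN.
LEAF_102_MACS = {
    "0000.0000.0b0b": ("port", "Eth1/3"),
    "0000.000a.11ce": ("vtep", "172.16.100.101"),
}

if __name__ == "__main__":
    print(vxlan_destination(BROADCAST_MAC, {}, "224.1.1.10"))          # ARP request
    print(vxlan_destination("0000.000a.11ce", LEAF_102_MACS, "224.1.1.10"))  # ARP reply
```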

VXLAN_underlay

Intra-VNI data plane

The ICMP request is now successfully encapsulated by leaf 101.

VXLAN_underlay

Leaf 102 successfully decapsulates the ICMP request and then encapsulates the ICMP reply.

VXLAN_underlay
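
For reference, the encapsulation step adds an 8-byte VXLAN header that carries the VNI, inside a UDP datagram to port 4789. The Python sketch below builds and reads that header following the RFC 7348 layout. It covers only the VXLAN header itself, not the outer Ethernet, IP, or UDP headers, and the function names are illustrative.

```python
# Sketch: encode and decode the 8-byte VXLAN header (RFC 7348 layout).
import struct

VXLAN_UDP_PORT = 4789   # IANA-assigned destination port for VXLAN
VNI_VALID_FLAG = 0x08   # "I" flag: the VNI field is valid


def encode_vxlan_header(vni: int) -> bytes:
    """Flags (8 bits), reserved (24), VNI (24), reserved (8)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VNI_VALID_FLAG << 24, vni << 8)


def decode_vni(header: bytes) -> int:
    """Return the VNI carried in the first 8 bytes of `header`."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VNI_VALID_FLAG:
        raise ValueError("VNI flag not set")
    return vni_word >> 8


if __name__ == "__main__":
    header = encode_vxlan_header(10010)
    print(header.hex(), decode_vni(header))
```

Both leaves use VNI 10010 in this header even though their local VLAN IDs differ (10 and 100).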