VXLAN BGP EVPN - 3 - Inter VNI routing

Introduction

In this post, we will concentrate on the BGP EVPN control plane and data plane needed to accomplish inter-VNI routing.

Topology

[Figure: topology]

Configuration

I’m going to assume you’ve followed the series so far: the steps below are performed on top of the configuration from the previous posts in this series.

Since we want to route between VNIs, we need at least two L2VNIs in the fabric. Let’s create a new L2VNI on Leaf 102:

Leaf 102

vlan 200
  name vlan200-VNI10020
  vn-segment 10020

evpn
  vni 10020 l2
    rd auto
    route-target import auto
    route-target export auto

interface nve1
  member vni 10020
    mcast-group 224.1.1.10
 
interface Ethernet1/4
  switchport access vlan 200

Now let’s perform the configuration needed for inter-VNI routing:

  • Define a VLAN for the L3VNI and assign a VNI to it.
  • Map the L3VNI to the tenant VRF.
  • Define an L3VNI SVI for VXLAN traffic forwarding.
  • Associate the L3VNI with the VRF on the NVE interface.

Leaf 101 (do the same on Leaf 102)

vlan 99
  name PRD_Tenant
  vn-segment 10099

vrf context PRD_Tenant
  vni 10099
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn

interface Vlan99
  no shutdown
  mtu 9216
  vrf member PRD_Tenant
  ip forward

interface nve1
  member vni 10099 associate-vrf
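For reference, here is a rough Python sketch of the values `route-target both auto` ends up producing. On NX-OS an auto-derived route target is `<ASN>:<VNI>`, which matches the `RT:65501:10010` and `RT:65501:10099` communities in the BGP outputs later in this post. The helper name `auto_route_target` is hypothetical, and the sketch assumes a 2-byte ASN (65501 in this fabric).

```python
def auto_route_target(asn: int, vni: int) -> str:
    """Sketch: auto-derived EVPN route target '<ASN>:<VNI>' (2-byte ASN only)."""
    if not 1 <= asn <= 65535:
        raise ValueError("this sketch only handles 2-byte ASNs")
    if not 1 <= vni <= 0xFFFFFF:
        raise ValueError("a VNI is a 24-bit value")
    return f"{asn}:{vni}"


# The L2VNIs and the L3VNI used in this post, in AS 65501
for vni in (10010, 10020, 10099):
    print(auto_route_target(65501, vni))
# 65501:10010
# 65501:10020
# 65501:10099
```

Because every leaf derives the same RT from the same VNI, routes exported for VNI 10099 on one leaf are imported into VNI 10099 on the other without any manual RT planning.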

Let’s also enable a couple of add-on features that make forwarding in our fabric more efficient:

Anycast GW

This feature makes all clients (behind different leaves) believe that they are connected to a single switch, since every leaf uses the same IP address and the same virtual MAC for the SVI.

Leaf 101 and 102

interface Vlan10
  no shutdown
  vrf member PRD_Tenant
  ip address 192.168.11.1/24
  fabric forwarding mode anycast-gateway

Leaf 102

interface Vlan200
  no shutdown
  vrf member PRD_Tenant
  ip address 192.168.22.1/24
  fabric forwarding mode anycast-gateway
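To make the idea concrete, a small Python model of the anycast gateway: the SVI for a subnet carries the same IP and the same virtual MAC on every leaf, so a host's ARP for its gateway resolves identically wherever it is attached. The MAC value and the `resolve_gateway` helper are hypothetical; on NX-OS the shared MAC is set globally with the `fabric forwarding anycast-gateway-mac` command, which is not part of the configuration shown above.

```python
from dataclasses import dataclass

# Placeholder anycast gateway MAC: the value used in this fabric is not shown
# in this post, so treat it as hypothetical.
ANYCAST_GW_MAC = "2020.0000.00aa"


@dataclass(frozen=True)
class Svi:
    name: str
    ip: str
    mac: str


# SVIs as configured above: Vlan10 on both leaves, Vlan200 only on Leaf 102.
LEAVES = {
    "leaf-101": [Svi("Vlan10", "192.168.11.1", ANYCAST_GW_MAC)],
    "leaf-102": [
        Svi("Vlan10", "192.168.11.1", ANYCAST_GW_MAC),
        Svi("Vlan200", "192.168.22.1", ANYCAST_GW_MAC),
    ],
}


def resolve_gateway(leaf: str, gateway_ip: str) -> str:
    """Return the MAC a host would learn when it ARPs for its gateway on `leaf`."""
    for svi in LEAVES[leaf]:
        if svi.ip == gateway_ip:
            return svi.mac
    raise LookupError(f"{leaf} has no SVI with IP {gateway_ip}")


# Alice gets the same gateway MAC whichever leaf she is attached to.
print(resolve_gateway("leaf-101", "192.168.11.1") == resolve_gateway("leaf-102", "192.168.11.1"))  # True
```

The practical payoff is host mobility: if Alice moves from Leaf 101 to Leaf 102, her cached gateway IP-to-MAC binding stays valid.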

ARP suppression

This feature allows the local VTEP to answer ARP requests from its ARP suppression cache when it already knows the target's MAC-IP binding, instead of flooding the request across the fabric.

Leaf 101

interface nve1
  member vni 10010
    suppress-arp

Leaf 102

interface nve1
  member vni 10010
    suppress-arp
  member vni 10020
    suppress-arp
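The decision a VTEP makes with ARP suppression enabled can be sketched in a few lines of Python. This is a simplified model with hypothetical names: the single cache entry mirrors the remote entry for Alice in Leaf 102's suppression cache later in this post, and details such as how requests for locally attached targets are handled are left out.

```python
from typing import Optional, Tuple

# Sketch of a VTEP's ARP suppression cache (IP -> MAC).
SUPPRESSION_CACHE = {
    "192.168.11.11": "0000.000a.11ce",
}


def handle_arp_request(target_ip: str) -> Tuple[str, Optional[str]]:
    """Decide what the VTEP does with an ARP request from a locally attached host.

    ("reply", mac):  the binding is cached, so the VTEP answers on the remote
                     host's behalf and nothing is flooded across the fabric.
    ("flood", None): unknown target, so the request is flooded as usual.
    """
    mac = SUPPRESSION_CACHE.get(target_ip)
    if mac is not None:
        return "reply", mac
    return "flood", None


print(handle_arp_request("192.168.11.11"))  # ('reply', '0000.000a.11ce')
print(handle_arp_request("192.168.11.99"))  # ('flood', None)
```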

Inter-VNI Control plane

[Figure: inter-VNI control-plane workflow]

There are many moving parts here, so I’ve numbered the steps of the workflow above to make it easier to follow.

Let’s assume Alice (192.168.11.11, 0000.000a.11ce), with a default gateway of 192.168.11.1, comes online. When a client comes online, it generally announces its presence using Gratuitous ARP (GARP). We already covered how the local VTEP learns the MAC address and exchanges it with the remote VTEP in the previous post, so we will jump directly to IP learning here.

  1. Leaf 101 learns Alice's MAC-IP binding from the GARP and stores it in its ARP table.
leaf-101# sh ip arp vrf PRD_Tenant
<snipped>

IP ARP Table for context PRD_Tenant
Total number of entries: 1
Address         Age       MAC Address     Interface       Flags
192.168.11.11   00:12:15  0000.000a.11ce  Vlan10 
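What makes an ARP "gratuitous" is that the sender announces its own address: the sender IP equals the target IP. A minimal Python sketch of step 1 under that definition follows; the class and function names are hypothetical, and a real switch also learns bindings from ordinary ARP traffic, which this sketch does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArpPacket:
    opcode: int      # 1 = request, 2 = reply
    sender_mac: str
    sender_ip: str
    target_ip: str


def is_gratuitous(pkt: ArpPacket) -> bool:
    """A gratuitous ARP announces the sender's own address: sender IP == target IP."""
    return pkt.sender_ip == pkt.target_ip


def glean(arp_table: dict, pkt: ArpPacket, interface: str) -> None:
    """Record the sender's MAC-IP binding from a GARP received on `interface`."""
    if is_gratuitous(pkt):
        arp_table[pkt.sender_ip] = (pkt.sender_mac, interface)


# Alice's GARP arriving on Leaf 101's Vlan10
arp_table: dict = {}
glean(arp_table, ArpPacket(1, "0000.000a.11ce", "192.168.11.11", "192.168.11.11"), "Vlan10")
print(arp_table)  # {'192.168.11.11': ('0000.000a.11ce', 'Vlan10')}
```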

2a. The Host Mobility Manager (HMM) learns the MAC-IP binding as a local route in its local host database, then sends it to L2RIB.

  • Observe that the route is learned as a /32 in the database, along with the MAC address, the SVI, and the local physical interface.
  • In the previous post we used the MAC VRF in L2RIB for intra-VNI forwarding. Now we use the IP VRF within L2RIB to store the MAC-IP information from the local host database.
leaf-101# show fabric forwarding ip local-host-db vrf PRD_Tenant

HMM host IPv4 routing table information for VRF PRD_Tenant
Status: *-valid, x-deleted, D-Duplicate, DF-Duplicate and frozen, 
        c-cleaned in 00:07:35

    Host                 MAC Address        SVI        Flags      Physical Interface
*   192.168.11.11/32     0000.000a.11ce     Vlan10     0x420201   Ethernet1/3
  • Observe the resulting MAC-IP entry in L2RIB below; the L3-Info field carries L3VNI 10099.
leaf-101# show l2route mac-ip topology 10 detail
Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link 
(Dup):Duplicate (Spl):Split (Rcv):Recv(D):Del Pending (S):Stale (C):Clear
(Ps):Peer Sync (Ro):Re-Originated (Orp):Orphan 
Topology    Mac Address    Host IP                                 Prod   Flags         Seq No     Next-Hops
----------- -------------- --------------------------------------- ------ ---------- ---------- ---------------------------------------
10          0000.000a.11ce 192.168.11.11                           HMM    L,            0         Local
            L3-Info: 10099
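The hand-off in step 2a can be modelled as turning one row of the local host DB into one row of the L2RIB MAC-IP table. This is a Python sketch with hypothetical data structures whose fields are taken from the two outputs above; it is not how NX-OS stores these entries internally. The SVI-to-VRF and VRF-to-L3VNI lookups stand in for the `vrf member PRD_Tenant` and `vni 10099` configuration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalHost:
    """One row of 'show fabric forwarding ip local-host-db'."""
    prefix: str     # e.g. "192.168.11.11/32"
    mac: str
    svi: str        # e.g. "Vlan10"
    interface: str  # e.g. "Ethernet1/3"


@dataclass(frozen=True)
class MacIpEntry:
    """One row of 'show l2route mac-ip topology <vlan> detail'."""
    topology: int   # the VLAN ID
    mac: str
    host_ip: str
    producer: str
    l3_vni: int     # the 'L3-Info' field


# Hypothetical lookups: VRF membership of each SVI, and the L3VNI of each VRF.
SVI_VRF = {"Vlan10": "PRD_Tenant"}
VRF_L3VNI = {"PRD_Tenant": 10099}


def to_l2rib(host: LocalHost) -> MacIpEntry:
    """Build the L2RIB MAC-IP entry for a host route learned by HMM."""
    net = ipaddress.ip_network(host.prefix)
    if net.prefixlen != net.max_prefixlen:
        raise ValueError("this sketch only accepts host routes (/32 for IPv4)")
    return MacIpEntry(
        topology=int(host.svi.removeprefix("Vlan")),
        mac=host.mac,
        host_ip=str(net.network_address),
        producer="HMM",
        l3_vni=VRF_L3VNI[SVI_VRF[host.svi]],
    )


alice = LocalHost("192.168.11.11/32", "0000.000a.11ce", "Vlan10", "Ethernet1/3")
print(to_l2rib(alice))
# MacIpEntry(topology=10, mac='0000.000a.11ce', host_ip='192.168.11.11', producer='HMM', l3_vni=10099)
```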

2b. Because ARP suppression is enabled per VNI on the VTEPs, the ARP suppression cache is also updated with the MAC-IP binding.

leaf-101# sh ip arp suppression-cache detail

<snipped>

Ip Address      Age      Mac Address    Vlan Physical-ifindex    Flags    Remote Vtep Addrs

192.168.11.11   00:16:22 0000.000a.11ce   10 Ethernet1/3         L

2c. HMM installs the MAC-IP information from the ARP table into the VRF's L3RIB as a /32 host route.

leaf-101# show ip route 192.168.11.11 vrf PRD_Tenant 
<snipped>

192.168.11.11/32, ubest/mbest: 1/0, attached
    *via 192.168.11.11, Vlan10, [190/0], 06:19:12, hmm
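Why do host routes matter here? Leaf 101 has no SVI in 192.168.22.0/24 (only Leaf 102 does), so it can route to Tom only through a /32 host route for him, the same kind of route Leaf 102 installs for Alice in step 5b. A Python sketch of longest-prefix match using the standard `ipaddress` module shows how such host routes coexist with the attached subnet; the table is illustrative, not captured output.

```python
import ipaddress

# Illustrative PRD_Tenant routes on Leaf 101 (not captured output): the subnet
# behind the Vlan10 SVI, Alice's HMM host route, and a BGP host route for Tom.
ROUTES = {
    ipaddress.ip_network("192.168.11.0/24"): "attached, Vlan10",
    ipaddress.ip_network("192.168.11.11/32"): "hmm, Vlan10",
    ipaddress.ip_network("192.168.22.22/32"): "bgp, next hop Leaf 102",
}


def lookup(dst: str) -> str:
    """Longest-prefix match: the most specific route containing `dst` wins."""
    addr = ipaddress.ip_address(dst)
    matches = [net for net in ROUTES if addr in net]
    if not matches:
        raise LookupError(f"no route to {dst}")
    return ROUTES[max(matches, key=lambda net: net.prefixlen)]


print(lookup("192.168.11.11"))  # hmm, Vlan10
print(lookup("192.168.11.50"))  # attached, Vlan10
print(lookup("192.168.22.22"))  # bgp, next hop Leaf 102
```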
  3. Leaf 101 installs the MAC-IP route from L2RIB into BGP L2VPN EVPN. Here the necessary BGP path attributes (for example, route targets) are applied before the route is sent to the iBGP neighbors as a Type-2 MAC/IP route.

Pay attention to the following fields in the BGP update:

  • MAC address = 00:00:00:0a:11:ce
  • IP = 192.168.11.11
  • RD = 172.16.50.101:32777
  • L2VNI = 10010
  • L3VNI = 10099
  • Encap type = VXLAN
  • Router MAC = 50:01:00:00:1b:08 (remote VTEPs use it as the destination MAC of the inner Ethernet frame when they VXLAN-encapsulate routed traffic toward Leaf 101)

[Figure: BGP EVPN update advertised by Leaf 101]

leaf-101# sh bgp l2vpn evpn 192.168.11.11
BGP routing table information for VRF default, address family L2VPN EVPN
Route Distinguisher: 172.16.50.101:32777    (L2VNI 10010)
BGP routing table entry for [2]:[0]:[0]:[48]:[0000.000a.11ce]:[32]:[192.168.11.11]/272, version 6
Paths: (1 available, best #1)
Flags: (0x000102) (high32 00000000) on xmit-list, is not in l2rib/evpn

  Advertised path-id 1
  Path type: local, path is valid, is best path, no labeled nexthop
  AS-Path: NONE, path locally originated
    172.16.100.101 (metric 0) from 0.0.0.0 (172.16.50.101)
      Origin IGP, MED not set, localpref 100, weight 32768
      Received label 10010 10099
      Extcommunity: RT:65501:10010 RT:65501:10099 ENCAP:8 Router MAC:5001.0000.1b08

  Path-id 1 advertised to peers:
    172.16.50.11       172.16.50.12
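The fields listed in step 3 can be read straight off the route key in this output. A Python sketch, assuming the bracketed fields are, in order, route type, ESI, Ethernet tag, MAC length, MAC, IP length and IP (the `/272` suffix is left uninterpreted); it also converts NX-OS dotted MACs to the colon notation used in the field list. The helper names are hypothetical.

```python
import ipaddress
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Type2Key:
    route_type: int
    esi: str
    eth_tag: int
    mac_len: int
    mac: str
    ip_len: int
    ip: Optional[str]


def dotted_to_colon(mac: str) -> str:
    """Convert Cisco dotted MAC notation (xxxx.xxxx.xxxx) to colon notation."""
    digits = mac.replace(".", "").lower()
    if len(digits) != 12 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def parse_type2(key: str) -> Type2Key:
    """Split an NX-OS Type-2 route key such as [2]:[0]:[0]:[48]:[mac]:[32]:[ip]/272."""
    body = key.split("/", 1)[0]  # ignore the trailing length shown by NX-OS
    fields = re.findall(r"\[([^\]]*)\]", body)
    if len(fields) != 7 or fields[0] != "2":
        raise ValueError(f"not a Type-2 route key: {key!r}")
    rtype, esi, tag, mac_len, mac, ip_len, ip = fields
    return Type2Key(
        route_type=int(rtype),
        esi=esi,
        eth_tag=int(tag),
        mac_len=int(mac_len),
        mac=dotted_to_colon(mac),
        ip_len=int(ip_len),
        ip=str(ipaddress.ip_address(ip)) if int(ip_len) else None,
    )


key = parse_type2("[2]:[0]:[0]:[48]:[0000.000a.11ce]:[32]:[192.168.11.11]/272")
print(key.mac, key.ip)                    # 00:00:00:0a:11:ce 192.168.11.11
print(dotted_to_colon("5001.0000.1b08"))  # 50:01:00:00:1b:08
```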
  4. Leaf 102 receives the route, unmodified, in its BGP process via the route reflectors (note the Cluster list in the output below).

5a. Next, the route is imported into the BGP table based on the EVPN import RTs, and then into L2RIB (IP VRF).

  • The RD is rewritten during this import. Since VNI 10010 maps to VLAN 10 on Leaf 101 but to VLAN 100 on Leaf 102, the received RD 172.16.50.101:32777 (32767 + 10) becomes 172.16.50.102:32867 (32767 + 100).

  • Observe that the route also appears under L3VNI 10099 in the BGP EVPN table. Its RD is rewritten here too, as BGP router ID:VRF ID = 172.16.50.102:3.

leaf-102# sh bgp l2vpn evpn 192.168.11.11
<snipped>


Route Distinguisher: 172.16.50.102:32867    (L2VNI 10010)
BGP routing table entry for [2]:[0]:[0]:[48]:[0000.000a.11ce]:[32]:[192.168.11.11]/272, version 7
Paths: (1 available, best #1)
Flags: (0x000212) (high32 00000000) on xmit-list, is in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path, no labeled nexthop, in rib
             Imported from 172.16.50.101:32777:[2]:[0]:[0]:[48]:[0000.000a.11ce]:[32]:[192.168.11.11]/272 
  AS-Path: NONE, path sourced internal to AS
    172.16.100.101 (metric 81) from 172.16.50.11 (172.16.50.11)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10010 10099
      Extcommunity: RT:65501:10010 RT:65501:10099 ENCAP:8 Router MAC:5001.0000.1b08
      Originator: 172.16.50.101 Cluster list: 172.16.50.11 

  Path-id 1 not advertised to any peer


Route Distinguisher: 172.16.50.102:3    (L3VNI 10099)
BGP routing table entry for [2]:[0]:[0]:[48]:[0000.000a.11ce]:[32]:[192.168.11.11]/272, version 8
Paths: (1 available, best #1)
Flags: (0x000202) (high32 00000000) on xmit-list, is not in l2rib/evpn, is not in HW

  Advertised path-id 1
  Path type: internal, path is valid, is best path, no labeled nexthop
             Imported from 172.16.50.101:32777:[2]:[0]:[0]:[48]:[0000.000a.11ce]:[32]:[192.168.11.11]/272 
  AS-Path: NONE, path sourced internal to AS
    172.16.100.101 (metric 81) from 172.16.50.11 (172.16.50.11)
      Origin IGP, MED not set, localpref 100, weight 0
      Received label 10010 10099
      Extcommunity: RT:65501:10010 RT:65501:10099 ENCAP:8 Router MAC:5001.0000.1b08
      Originator: 172.16.50.101 Cluster list: 172.16.50.11 

  Path-id 1 not advertised to any peer
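The RD arithmetic and RT matching in step 5a can be reproduced in a few lines of Python. This is a sketch built only from the values in the outputs above (BGP router IDs, VLAN IDs, VRF ID 3 and ASN 65501); the `32767 + VLAN ID` and `router ID:VRF ID` rules are the ones observed in this post, and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Tuple


def l2vni_rd(router_id: str, vlan: int) -> str:
    """Auto RD for an L2VNI, as observed in this post: <BGP RID>:(32767 + VLAN ID)."""
    return f"{router_id}:{32767 + vlan}"


def l3vni_rd(router_id: str, vrf_id: int) -> str:
    """Auto RD for an L3VNI, as observed in this post: <BGP RID>:<VRF ID>."""
    return f"{router_id}:{vrf_id}"


@dataclass(frozen=True)
class LocalTable:
    name: str
    import_rts: FrozenSet[str]
    rd: str


def import_route(route_rts: Iterable[str], tables: List[LocalTable]) -> List[Tuple[str, str]]:
    """Return (table, local RD) for every table with an import RT the route carries."""
    rts = set(route_rts)
    return [(t.name, t.rd) for t in tables if t.import_rts & rts]


# Leaf 102: BGP RID 172.16.50.102, VNI 10010 on VLAN 100, VNI 10020 on VLAN 200,
# VRF PRD_Tenant with VRF ID 3.
rid = "172.16.50.102"
tables = [
    LocalTable("L2VNI 10010", frozenset({"65501:10010"}), l2vni_rd(rid, 100)),
    LocalTable("L2VNI 10020", frozenset({"65501:10020"}), l2vni_rd(rid, 200)),
    LocalTable("L3VNI 10099", frozenset({"65501:10099"}), l3vni_rd(rid, 3)),
]

# Alice's Type-2 route as originated by Leaf 101 (RID 172.16.50.101, VLAN 10)
print(l2vni_rd("172.16.50.101", 10))  # 172.16.50.101:32777
for name, rd in import_route(["65501:10010", "65501:10099"], tables):
    print(name, rd)
# L2VNI 10010 172.16.50.102:32867
# L3VNI 10099 172.16.50.102:3
```

The route carries both RTs, so it lands in exactly two local tables, which is why it shows up twice in the output above, once per local RD.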

MAC-IP in L2RIB on Leaf 102

leaf-102# show l2route mac-ip topology 100 detail
Flags -(Rmac):Router MAC (Stt):Static (L):Local (R):Remote (V):vPC link 
(Dup):Duplicate (Spl):Split (Rcv):Recv(D):Del Pending (S):Stale (C):Clear
(Ps):Peer Sync (Ro):Re-Originated (Orp):Orphan 
Topology    Mac Address    Host IP                                 Prod   Flags         Seq No     Next-Hops
----------- -------------- --------------------------------------- ------ ---------- ---------- ---------------------------------------
100         0000.000a.11ce 192.168.11.11                           BGP    --            0         172.16.100.101 (Label: 10010)
            Sent To: ARP
            encap-type:1

5b. Because L3VNI 10099 is mapped to the VRF, the route is then installed from the BGP table into the VRF’s L3RIB.

leaf-102# show ip route 192.168.11.11 vrf PRD_Tenant 
<snipped>

192.168.11.11/32, ubest/mbest: 1/0
    *via 172.16.100.101%default, [200/0], 06:04:38, bgp-65501, internal, tag 65501, segid: 10099 tunnelid: 0xac106465 encap: VXLAN
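The `tunnelid: 0xac106465` in this route appears to be the next-hop VTEP address 172.16.100.101 written in hexadecimal (0xac = 172, 0x10 = 16, 0x64 = 100, 0x65 = 101). That is an observation from this output, not a documented encoding; the Python check below uses only the standard `ipaddress` module, and `tunnel_id_to_ip` is a hypothetical helper.

```python
import ipaddress


def tunnel_id_to_ip(tunnel_id: int) -> str:
    """Interpret a 32-bit tunnel ID as an IPv4 address (an observed pattern, not a spec)."""
    return str(ipaddress.IPv4Address(tunnel_id))


print(tunnel_id_to_ip(0xac106465))                        # 172.16.100.101
print(hex(int(ipaddress.IPv4Address("172.16.100.101"))))  # 0xac106465
```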
  6. Since ARP suppression is enabled on the VTEPs, L2RIB also updates the ARP suppression cache.
leaf-102# show ip arp suppression-cache  detail 

<snipped>

192.168.11.11   06:13:39 0000.000a.11ce  100 (null)              R        172.16.100.101
192.168.22.22   00:16:27 0000.0000.7011  200 Ethernet1/4         L

Inter-VNI Data plane

Check whether Alice can ping Tom (192.168.22.22).

alice#ping 192.168.22.22
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.22.22, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 7/15/44 ms

We are using symmetric IRB in this setup, so the data plane works as follows:

  • Switching: Since Alice and Tom are in different subnets, Alice sends the ICMP request to the anycast gateway on Leaf 101 (VLAN 10).

  • Routing: Leaf 101 receives the frame and finds in its PRD_Tenant routing table that 192.168.22.22 was learned via BGP with Leaf 102 as the next hop. It encapsulates the packet in VXLAN with L3VNI 10099 and routes it to Leaf 102. Observe that the destination MAC of the inner frame is Leaf 102’s Router MAC.

  • Routing: Leaf 102 removes the VXLAN header, uses VNI 10099 to select VRF PRD_Tenant, and makes a routing decision in that VRF’s routing table.

  • Switching: Finally, Leaf 102 switches the frame to Tom out of VLAN 200.
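The key detail of symmetric IRB is that traffic between the two leaves carries the L3VNI (10099) rather than either host's L2VNI. Below is a Python sketch of the 8-byte VXLAN header defined in RFC 7348 carrying VNI 10099, plus the egress VNI-to-VRF lookup. The `L3VNI_TO_VRF` table and helper names are hypothetical, and the outer UDP/IP headers and the inner Ethernet frame are left out.

```python
import struct

VXLAN_FLAG_I = 0x08  # "I" flag: the VNI field is valid (RFC 7348)

# Egress lookup on Leaf 102: the L3VNI in the header selects the VRF to route in.
L3VNI_TO_VRF = {10099: "PRD_Tenant"}


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header: flags (8) | reserved (24) | VNI (24) | reserved (8)."""
    if not 0 <= vni <= 0xFFFFFF:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!B3xI", VXLAN_FLAG_I, vni << 8)


def vxlan_vni(header: bytes) -> int:
    """Extract the VNI from a VXLAN header, checking that the I flag is set."""
    flags, word = struct.unpack("!B3xI", header[:8])
    if not flags & VXLAN_FLAG_I:
        raise ValueError("I flag not set, VNI is not valid")
    return word >> 8


# Leaf 101 (ingress) routes Alice's packet and encapsulates it with the L3VNI...
hdr = vxlan_header(10099)
print(hdr.hex())  # 0800000000277300
# ...and Leaf 102 (egress) maps the VNI back to the tenant VRF before routing.
print(L3VNI_TO_VRF[vxlan_vni(hdr)])  # PRD_Tenant
```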

[Figure: ICMP request encapsulated by Leaf 101]

[Figure: ICMP reply encapsulated by Leaf 102]