Cisco BGP EVPN VXLAN - Part 1

Here is an overview of the Cisco implementation of VXLAN using BGP EVPN for distributed control-plane operations, anycast gateway, and unicast head-end replication. I am using Cisco 9396PX devices as leaf switches and Cisco 9508 chassis switches as spines, all running iBGP. We’ll explore the basic setup with the leaf switches vPC enabled, including the Border Leaf switches, while also going over a few scenarios which can blackhole traffic and how to avoid them without an OSPF adjacency between the leaf switches.

**Please note: this is an older post from my old blog series, written in 2015. Newer switches and newer versions of code provide cleaner configuration and significant enhancements.**

This blog assumes you understand the basic setup of BGP EVPN VXLAN from reading the great Cisco documentation already available; thus, I presume you’re coming here for a more in-depth, real-world deployment scenario, with better explanations plus failure scenario testing and outputs.

The diagram below shows the connectivity in the UNDERLAY network:

Cisco BGP EVPN UNDERLAY

You can see we have three spine switches, two of which are configured as route reflectors for scalability. Below is the configuration of a single spine switch being used as a route reflector. The other route reflector is set up the same way, apart from details such as IP addresses. The third spine switch has no iBGP peering relationships at all: it just runs OSPF and forms adjacencies with all VTEPs to advertise VTEP IP reachability.


nv overlay evpn
feature ospf
feature bgp
feature nv overlay

router ospf 1
router-id 172.16.2.253
log-adjacency-changes
passive-interface default

interface Ethernet1/1
description Leaf01-9kA
link debounce time 0
mtu 9216
medium p2p
ip address 172.16.2.1/30
ip ospf network point-to-point
no ip ospf passive-interface
ip router ospf 1 area 0.0.0.0
no shutdown

interface loopback0
ip address 1.1.1.10/32
ip router ospf 1 area 0.0.0.0

router bgp 65000
router-id 1.1.1.10
address-family ipv4 unicast
neighbor 1.1.1.40 remote-as 65000
description VTEP1
password 3 SOMEPASSWORD
update-source loopback0
timers 3 9
address-family ipv4 unicast
address-family l2vpn evpn
send-community both
route-reflector-client
neighbor 1.1.1.41 remote-as 65000
description VTEP2
password 3 SOMEPASSWORD
update-source loopback0
timers 3 9
address-family l2vpn evpn
send-community both
route-reflector-client

The above forms the basis of the Underlay network on the spine and sets up the route reflectors. We have tuned this for protocol convergence speed; thus, the BGP timers are aggressive and you’ll notice “link debounce time 0”, which disables link debounce. In a nutshell, the debounce time is how long a switchport waits after going down before it notifies the supervisor, 100 msec by default. Disabling it notifies the supervisor immediately on a link failure so protocol convergence can start. If you’re worried about an unstable interface, it is quite likely that the link-flap detection mechanism will take the port down if the link is failing or flapping. Finally, we set BOTH the interface medium to p2p and the OSPF network type to point-to-point. Why? The interface type is broadcast by default, so if someone misses the command to switch OSPF to point-to-point, the medium p2p command still changes the port’s operating mode and OSPF properly adjusts to point-to-point. It is simply a good extra safeguard.
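To check that the underlay behaves as described, a handful of show commands on the spine is usually enough. This is a sketch, not captured output: the interface and neighbor address come from the spine configuration above, and the exact syntax and output can differ between NX-OS releases.

```
! Every leaf-facing link should show an adjacency in FULL state
show ip ospf neighbors

! The network type on the routed link should be point-to-point
show ip ospf interface Ethernet1/1

! Debounce setting on the link (0 ms with the configuration above)
show interface Ethernet1/1 debounce

! EVPN sessions towards the VTEPs (the route-reflector clients)
show bgp l2vpn evpn summary

! Per-neighbor detail, including the negotiated keepalive and hold timers
show bgp l2vpn evpn neighbors 1.1.1.40
```

With “timers 3 9” configured, the per-neighbor output should show a 3 second keepalive and a 9 second hold time once the session is established.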

VXLAN-OVERLAY

Now, here is the overlay view, which is only relevant to leaf switches. Pretend this is an OVERLAY named “Tenant-01”:

Below is the configuration:


nv overlay evpn
feature ospf
feature bgp
feature interface-vlan
feature vn-segment-vlan-based
feature lacp
feature vpc
feature nv overlay

fabric forwarding anycast-gateway-mac 0005.0005.0005
fabric forwarding dup-host-ip-addr-detection 5 180

class-map type qos match-any ONE
match cos 1
match dscp 26
class-map type qos match-any TWO
match cos 2
match dscp 16
class-map type qos match-any THREE
match cos 3
match dscp 48
policy-map type qos REST-YOUR-COS-FOR-UCS-FI
class ONE
set cos 2
class TWO
set cos 4
class THREE
set cos 6
policy-map type qos FOR-THE-COS-IGNORANT
class class-default
set cos 2
set dscp 16

spanning-tree vlan 1-3967 hello-time 4

vlan 201
name VXLAN-VLAN01
vn-segment 100201
vlan 202
name VXLAN-VLAN02
vn-segment 100202
vlan 203
name VXLAN-VLAN03
vn-segment 100203
vlan 950
name vPC-underlay-ptp
vlan 2999
name VLAN-FOR-BRIDGE-DOMAIN
vn-segment 29999

vrf context Tenant01
vni 29999
rd auto
address-family ipv4 unicast
route-target both auto
route-target both auto evpn
address-family ipv6 unicast
route-target both auto
route-target both auto evpn

hardware access-list tcam region vacl 0
hardware access-list tcam region e-racl 0
hardware access-list tcam region span 0
hardware access-list tcam region redirect 256
hardware access-list tcam region rp-qos 0
hardware access-list tcam region rp-ipv6-qos 0
hardware access-list tcam region rp-mac-qos 0
hardware access-list tcam region e-ipv6-qos 256
hardware access-list tcam region e-qos-lite 256
hardware access-list tcam region arp-ether 256

vpc domain 100
peer-switch
role priority 8192
system-priority 8192
peer-keepalive destination 192.168.1.1 source 192.168.1.2 interval 500 timeout 3
delay restore 5
peer-gateway
auto-recovery
ipv6 nd synchronize
ip arp synchronize

interface vlan 950
no shutdown
mtu 9216
no ip redirects
! The other vPC switch would be 1.1.2.1/31
ip address 1.1.2.0/31
ip ospf network point-to-point
ip router ospf 1 area 0.0.0.0

interface Vlan2999
description L3-VXLAN-BD
no shutdown
mtu 9216
vrf member Tenant01
no ip redirects
ip forward
ipv6 forward
no ipv6 redirects

interface Vlan201
description NET01
no shutdown
mtu 9216
no ip redirects
management
vrf member Tenant01
ip address 10.0.0.1/24
no ipv6 nd redirects
fabric forwarding mode anycast-gateway

interface Vlan202
description NET02
no shutdown
mtu 9216
no ip redirects
vrf member Tenant01
ip address 10.0.1.1/24
fabric forwarding mode anycast-gateway

interface Vlan203
description NET03
no shutdown
mtu 9216
no ip redirects
vrf member Tenant01
ip address 10.0.2.1/24
fabric forwarding mode anycast-gateway

interface port-channel50
description To Ethernet Switch B
switchport mode trunk
vpc peer-link

interface port-channel201
description Fabric-Interconnect-A
switchport mode trunk
switchport trunk allowed vlan 201-203
spanning-tree port type edge trunk
mtu 9216
service-policy type qos output REST-YOUR-COS-FOR-UCS-FI
vpc 201

interface port-channel202
description Fabric-Interconnect-B
switchport mode trunk
switchport trunk allowed vlan 201-203
spanning-tree port type edge trunk
mtu 9216
service-policy type qos output REST-YOUR-COS-FOR-UCS-FI
vpc 202

interface nve1
no shutdown
source-interface loopback0
host-reachability protocol bgp
source-interface hold-down-time 120
member vni 29999 associate-vrf
member vni 100201-100203
suppress-arp
ingress-replication protocol bgp

interface Ethernet2/1
switchport mode trunk
channel-group 50 mode active

interface Ethernet2/2
switchport mode trunk
channel-group 50 mode active

interface Ethernet2/3
no switchport
link debounce time 0
medium p2p
mtu 9216
ip address 172.16.2.18/30
no ipv6 redirects
ip ospf network point-to-point
no ip ospf passive-interface
ip router ospf 1 area 0.0.0.0
no shutdown

interface Ethernet2/4
no switchport
link debounce time 0
medium p2p
mtu 9216
ip address 172.16.3.22/30
ip ospf network point-to-point
no ip ospf passive-interface
ip router ospf 1 area 0.0.0.0
no shutdown

interface loopback0
description Loopback for NVE VTEP
ip address 1.1.100.44/32
ip address 1.1.1.102/32 secondary
ip router ospf 1 area 0.0.0.0

interface loopback1
description Loopback for BGP update-source
ip address 1.1.1.44/32
ip router ospf 1 area 0.0.0.0

router ospf 1
router-id 172.16.2.18
passive-interface default
log-adjacency-changes

router bgp 65000
router-id 1.1.1.44
log-neighbor-changes
address-family ipv4 unicast
maximum-paths ibgp 10
neighbor 1.1.1.10 remote-as 65000
description spine1
password 3 SOMEPASSWORD
update-source loopback1
timers 3 9
address-family ipv4 unicast
address-family l2vpn evpn
send-community both
neighbor 1.1.1.20 remote-as 65000
description spine2
password 3 SOMEPASSWORD
update-source loopback1
timers 3 9
address-family ipv4 unicast
address-family l2vpn evpn
send-community both
vrf Tenant01
address-family ipv4 unicast
advertise l2vpn evpn
maximum-paths ibgp 10
address-family ipv6 unicast
advertise l2vpn evpn
maximum-paths ibgp 6
evpn
vni 100201 l2
rd auto
route-target import auto
route-target export auto
vni 100202 l2
rd auto
route-target import auto
route-target export auto
vni 100203 l2
rd auto
route-target import auto
route-target export auto

ip tcp path-mtu-discovery
l2rib dup-host-mac-detection 5 180

A lot to see here, right? This is why I decided to break this into two parts: this is part 1, and my next post is part 2, covering border leafs and failure scenarios! Let’s get this initial review over with!

I will just outline all the key points here:

  • policy-map type qos REST-YOUR-COS-FOR-UCS-FI – This is for those of you who use COS in Cisco UCS and want to maintain your COS value AFTER your packet is VXLAN decapsulated. With this EVPN VXLAN configuration, the original 802.1Q header is stripped at ingress, so no COS value remains. Any DSCP value set at the virtual switch level is maintained throughout, however, so we’re assuming you mark DSCP at your virtual switch along with COS and have your own mapping from COS to DSCP. You create class-maps like the ones above (these are examples only, your mappings may be different) and then a policy-map which matches against the DSCP value marked by your virtual switch and sets the appropriate COS value. You then apply this as an OUTBOUND QOS policy on the port-channels towards your Fabric Interconnects, but you will have to adjust your TCAM regions for this to work. The other policy, FOR-THE-COS-IGNORANT, is for devices which aren’t smart enough to set either the DSCP or COS value: just apply it to the interface, inbound, and set your values as needed.
  • fabric forwarding anycast-gateway-mac 0005.0005.0005 – This is the anycast gateway MAC address. You can get “funny” here, but I like to keep it simple. Your choice.
  • fabric forwarding dup-host-ip-addr-detection 5 180 – I set the duplicate host IP detection to 5 moves in 180 seconds for my environment; tune to the values best suited for yours.
  • track objects and object list (not shown in the configuration above) – I set these to look for the BGP neighbor addresses of the route reflectors in the routing table and then assign each of them to a track object list for later assignment to the vPC. Part 2 will show and explain why.
  • hardware tcam entries – Follow these for success in this configuration, especially if you need the outbound QOS service policies.
  • vPC peer-keepalive and delay-restore timers – Set for our environment, for specific reasons we’ll explain in part 2.
  • NVE source-interface hold-down – This timer is set to 120 seconds, tuned for our environment, from the default of 300 seconds. I will explain the use of this and why I use 120 seconds in part 2
  • Loopback0 – Used ONLY for the NVE VTEP interface
  • Loopback0 secondary address – For vPC-enabled VTEPs only, this is the PROXY VTEP address used by the vPC pair
  • Loopback1 – Used ONLY as the BGP update-source
  • BGP passwords – Used for security in the Underlay. You can also use OSPF authentication for extra security.
  • VLAN 950 and interface Vlan950 – Used strictly between the vPC switch pair, in the underlay only. If a single switch in the vPC loses all of its spine links, a sub-optimal route is placed into the routing table immediately, which keeps BGP reachable. This exists only to keep traffic forwarding and prevent a blackhole; you’re still meant to figure out what happened to your spine uplinks.
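The key points above can be spot-checked on a leaf with a few show commands. Treat this as a sketch under assumptions, not captured output: the VNI and addresses are the ones from the leaf configuration above, and command availability and output format vary by NX-OS release.

```
! NVE state, source loopback and, on a vPC pair, the secondary (proxy VTEP) address
show nve interface nve1 detail

! L2 VNIs and the L3 VNI (29999) with their state and replication mode
show nve vni

! Remote VTEPs learned through BGP EVPN
show nve peers

! EVPN routes for a single L2 VNI
show bgp l2vpn evpn vni-id 100201

! MAC addresses learned locally and via BGP
show l2route evpn mac all

! vPC peer-link, keepalive and member port-channel status
show vpc

! Path to a route reflector; after losing all spine uplinks this should point across VLAN 950
show ip route 1.1.1.10

! Current TCAM carving
show hardware access-list tcam region
```

Keep in mind that on the Nexus 9000, changes to the TCAM regions only take effect after the configuration is saved and the switch is reloaded, so plan the “hardware access-list tcam region” lines before the switch carries traffic.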

So, like Forrest Gump said to all his faithful followers: “I’m pretty tired… I think I’ll go home now”. So, see you in Part 2, where the FUN is!!!