
Guide
VXLAN Network with MP-BGP EVPN Control Plane Design Guide

© 2016 Cisco and/or its affiliates. All rights reserved. This document is Cisco Public Information.

Contents

Introduction
MP-BGP EVPN Control Plane: Overview
Software and Hardware Support for the MP-BGP EVPN Control Plane
IP Transport Devices Running MP-BGP EVPN
VTEPs Running MP-BGP EVPN
Inter-VXLAN Routing
MP-BGP EVPN VXLAN Support on Cisco Nexus 9000 Series Switches
Multitenancy in MP-BGP EVPN
MP-BGP EVPN NLRI and L2VPN EVPN Address Family
Integrated Routing and Bridging with the MP-BGP EVPN Control Plane
Local-Host Learning
EVPN Route Advertisement and Remote-Host Learning
Symmetric and Asymmetric Integrated Routing and Bridging
VNIs for Bridge Domains and IP VRF Instances
VTEP Peer Discovery and Authentication in MP-BGP EVPN
Distributed Anycast Gateway in MP-BGP EVPN
ARP Suppression in MP-BGP EVPN
MP-BGP EVPN VTEP Configuration
Virtual Port-Channel VTEP in MP-BGP EVPN VXLAN
EVPN vPC VTEP Configuration
vPC VTEP MP-BGP Status and EVPN Route Updates
MP-BGP EVPN VXLAN Fabric Design
VXLAN Fabric with MP-iBGP EVPN
MP-iBGP Route Reflector on the Spine Layer
MP-iBGP Route Reflector on the Leaf Layer
MP-iBGP with Dedicated Route Reflectors
VXLAN Fabric with MP-eBGP EVPN
External Routing for MP-BGP EVPN VXLAN
Sample Configuration for eBGP Between the VXLAN EVPN Border Leaf and the External Router
Sample Configuration for OSPF Between the VXLAN EVPN Border Leaf and the External Router
Scalability Considerations for the EVPN VXLAN Border Leaf Nodes
Distribution of External Routes to the EVPN VXLAN Fabric
EVPN VXLAN Fabric Internal Network Advertisements to the Outside
EVPN Tenant Scalability on the Border Leaf Nodes
IP Host Route Scalability on the Border Leaf Nodes
Data Center Interconnect for MP-BGP EVPN VXLAN
Conclusion
For More Information

Introduction

Virtual Extensible LAN (VXLAN) is an overlay technology for network virtualization. It provides Layer-2 extension over a shared Layer-3 underlay infrastructure network by using MAC address in IP User Datagram Protocol (MAC in IP/UDP) tunneling encapsulation. The purpose of obtaining Layer-2 extension in the overlay network is to overcome the limitations of physical server racks and geographical location boundaries and achieve flexibility for workload placement within a data center or between different data centers.

The initial IETF VXLAN standard (RFC 7348) defined a multicast-based flood-and-learn VXLAN without a control plane. It relies on data-driven flood-and-learn behavior for remote VXLAN tunnel endpoint (VTEP) peer discovery and remote end-host learning. The overlay broadcast, unknown unicast, and multicast traffic is encapsulated into multicast VXLAN packets and transported to remote VTEP switches through the underlay multicast forwarding. Flooding in such a deployment can present a challenge for the scalability of the solution. The requirement to enable multicast capabilities in the underlay network also presents a challenge because some organizations do not want to enable multicast in their data centers or WAN networks.

To overcome the limitations of the flood-and-learn VXLAN as defined in RFC 7348, organizations can use Multiprotocol Border Gateway Protocol Ethernet Virtual Private Network (MP-BGP EVPN) as the control plane for VXLAN. MP-BGP EVPN has been defined by IETF as the standards-based control plane for VXLAN overlays. The MP-BGP EVPN control plane provides protocol-based VTEP peer discovery and end-host reachability information distribution that allows more scalable VXLAN overlay network designs suitable for private and public clouds. The MP-BGP EVPN control plane introduces a set of features that reduces or eliminates traffic flooding in the overlay network and enables optimal forwarding for both east-west and north-south traffic.

This document discusses the functions and configuration of MP-BGP EVPN and describes typical VXLAN overlay network designs using MP-BGP EVPN.

This document does not discuss the fundamentals of VXLAN, VXLAN in multicast-based flood-and-learn mode, or related network design options. For more information about VXLAN and VXLAN with multicast-based flood-and-learn, please refer to the following documents:
- VXLAN Overview: Cisco Nexus 9000 Series paper-c11729383.html
- VXLAN Design with Cisco Nexus 9300 Platform paper-c11732453.html

This document assumes prior knowledge about BGP, MP-BGP, and BGP and Multiprotocol Label Switching (BGP/MPLS) IP VPN. For more information, refer to the following IETF RFC documents:
- RFC 4271 - Border Gateway Protocol 4 (BGP-4): https://tools.ietf.org/html/rfc4271
- RFC 4760 - Multiprotocol Extensions for BGP-4: https://tools.ietf.org/html/rfc4760
- RFC 4364 - BGP/MPLS IP VPNs: https://tools.ietf.org/html/rfc4364#page-15
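The MAC-in-IP/UDP encapsulation mentioned at the start of this introduction can be illustrated with a minimal Python sketch. It builds only the 8-byte VXLAN header that RFC 7348 places between the outer UDP header and the original Ethernet frame. The function names are hypothetical and chosen for illustration; the outer Ethernet, IP, and UDP headers are assumed to be added by the sending stack and are not modeled here.

```python
import struct

VXLAN_UDP_PORT = 4789          # IANA-assigned destination UDP port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08    # "I" flag: the VNI field carries a valid value


def pack_vxlan_header(vni: int) -> bytes:
    """Return the 8-byte VXLAN header for a 24-bit VNI (hypothetical helper)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: 8 flag bits followed by 24 reserved bits.
    # Word 2: 24-bit VNI followed by 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(packet: bytes) -> int:
    """Read the VNI back from the first 8 bytes of a VXLAN payload."""
    flags_word, vni_word = struct.unpack("!II", packet[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8


def encapsulate(inner_ethernet_frame: bytes, vni: int) -> bytes:
    """Prepend the VXLAN header; outer UDP/IP/Ethernet headers are out of scope."""
    return pack_vxlan_header(vni) + inner_ethernet_frame


if __name__ == "__main__":
    payload = encapsulate(b"\xaa" * 14, vni=30000)
    print(payload[:8].hex(), unpack_vni(payload))
```

The 24-bit VNI field is what allows far more Layer-2 segments than the 12-bit VLAN ID space.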

MP-BGP EVPN Control Plane: Overview

MP-BGP EVPN is a control protocol for VXLAN based on industry standards. Prior to EVPN, VXLAN overlay networks operated in the flood-and-learn mode. In this mode, end-host information learning and VTEP discovery are both data plane driven, with no control protocol to distribute end-host reachability information among VTEPs. MP-BGP EVPN changes this model. It introduces control-plane learning for end hosts behind remote VTEPs. It provides control-plane and data-plane separation and a unified control plane for both Layer-2 and Layer-3 forwarding in a VXLAN overlay network.

The MP-BGP EVPN control plane offers the following main benefits:
- The MP-BGP EVPN protocol is based on industry standards, allowing multivendor interoperability.
- It enables control-plane learning of end-host Layer-2 and Layer-3 reachability information, enabling organizations to build more robust and scalable VXLAN overlay networks.
- It uses the decade-old MP-BGP VPN technology to support scalable multitenant VXLAN overlay networks.
- The EVPN address family carries both Layer-2 and Layer-3 reachability information, thus providing integrated bridging and routing in VXLAN overlay networks.
- It minimizes network flooding through protocol-based host MAC/IP route distribution and Address Resolution Protocol (ARP) suppression on the local VTEPs.
- It provides optimal forwarding for east-west and north-south traffic and supports workload mobility with the distributed anycast function.
- It provides VTEP peer discovery and authentication, mitigating the risk of rogue VTEPs in the VXLAN overlay network.
- It provides mechanisms for building active-active multihoming at Layer-2.

Software and Hardware Support for the MP-BGP EVPN Control Plane

Depending on the role a device plays in an MP-BGP EVPN VXLAN network, it may need to support only the control-plane functions or both the control-plane and data-plane functions of the VXLAN network with the MP-BGP EVPN control plane.

IP Transport Devices Running MP-BGP EVPN

IP transport devices provide IP routing in the underlay network. By running the MP-BGP EVPN protocol, they become part of the VXLAN control plane and distribute the MP-BGP EVPN routes among their MP-BGP EVPN peers. Devices might be MP-iBGP EVPN peers or route reflectors, or MP External BGP (MP-eBGP) EVPN peers. Their OS software needs to support MP-BGP EVPN so that it can understand the MP-BGP EVPN updates and distribute them to other MP-BGP EVPN peers using the standards-defined constructs. For data forwarding, IP transport devices perform IP routing based only on the outer IP address of a VXLAN encapsulated packet. They don’t need to support the VXLAN data encapsulation and decapsulation functions.

VTEPs Running MP-BGP EVPN

VTEPs running MP-BGP EVPN need to support both the control-plane and data-plane functions. In the control plane, they initiate MP-BGP EVPN routes to advertise their local hosts. They receive MP-BGP EVPN updates from their peers and install the EVPN routes in their forwarding tables. For data forwarding, they encapsulate user traffic in VXLAN and send it over the IP underlay network. In the reverse direction, they receive VXLAN encapsulated traffic from other VTEPs, decapsulate it, and forward the traffic with native Ethernet encapsulation toward the host.

The correct switch platforms need to be selected for the different network roles. For IP transport devices, the software needs to support the MP-BGP EVPN control plane, but the hardware doesn’t need to support VXLAN data-plane functions. For a VTEP, the switch needs to support both the control-plane and data-plane functions.

Inter-VXLAN Routing

The MP-BGP EVPN control plane provides integrated routing and bridging by distributing both the Layer-2 and Layer-3 reachability information for end hosts on VXLAN overlay networks. Communication between hosts in different subnets requires inter-VXLAN routing. BGP EVPN enables this communication by distributing Layer-3 reachability information in the form of either a host IP address route or an IP address prefix. In the data plane, the VTEP needs to support IP address route lookup and perform VXLAN encapsulation based on the lookup result. This capability is referred to as the VXLAN routing function. Not all switch hardware platforms support VXLAN routing, which affects the choice of hardware platform.

MP-BGP EVPN VXLAN Support on Cisco Nexus 9000 Series Switches

The MP-BGP EVPN control plane for VXLAN was introduced into Cisco NX-OS Software Release 7.0(3)I1(1) for Cisco Nexus 9000 Series Switches. The software functions will be implemented in the Cisco NX-OS software trains for other Cisco Nexus switch platforms, such as the Cisco Nexus 7000 Series Switches, as well.

In Cisco NX-OS 7.0(3)I1(1), the Cisco Nexus 9300 platform switches support both the MP-BGP EVPN control-plane functions and the VTEP data-plane functions. The Cisco Nexus 9500 platform switches support the MP-BGP EVPN control-plane functions. The VTEP data-plane functions will be added to the Cisco Nexus 9500 platform switches in a maintenance release of Cisco NX-OS 7.0(3)I1(1). The Cisco Nexus 9300 and 9500 platforms both support inter-VXLAN routing in hardware.

Although many of the MP-BGP EVPN functions and design discussions in this document are platform independent, because the Cisco Nexus 9000 Series is the first switch platform that supports this protocol, the examples are based on the Cisco Nexus 9000 Series.

Multitenancy in MP-BGP EVPN

As an extension to the existing MP-BGP, MP-BGP EVPN inherits the support for multitenancy with VPN using the virtual routing and forwarding (VRF) construct. In MP-BGP EVPN, multiple tenants can co-exist and share a common IP transport network while having their own separate VPNs in the VXLAN overlay network.

In the EVPN VXLAN overlay network, VXLAN network identifiers (VNIs) define the Layer-2 domains and enforce Layer-2 segmentation by not allowing Layer-2 traffic to traverse VNI boundaries. Similarly, Layer-3 segmentation among VXLAN tenants is achieved by applying Layer-3 VRF technology and enforcing routing isolation between tenants by using a separate Layer-3 VNI mapped to each VRF instance. Each tenant has its own VRF routing instance. IP subnets of the VNIs for a given tenant are in the same Layer-3 VRF instance that separates the Layer-3 routing domain from the other tenants.

Built-in multitenancy support is an advantage of MP-BGP EVPN VXLAN compared to multicast-based flood-and-learn VXLAN and other Layer-2 extension technologies without multitenancy capabilities. It makes VXLAN technology more suitable for cloud networks, which are deployed using the multitenant model.

MP-BGP EVPN NLRI and L2VPN EVPN Address Family

Like other network routing control protocols, MP-BGP EVPN is designed to distribute network layer reachability information (NLRI) for the network. A unique feature of EVPN NLRI is that it includes both the Layer-2 and Layer-3 reachability information for end hosts that reside in the EVPN VXLAN overlay network. In other words, it advertises both MAC and IP addresses of EVPN VXLAN end hosts. This capability forms the basis for VXLAN integrated routing and bridging support.

Layer-2 MAC addresses need to be distributed because VXLAN is a Layer-2 extension technology. Unlike a traditional VLAN, which is confined in a specific location in a network and remains within the Layer-2 and Layer-3 boundary, a VNI is a virtual Layer-2 segment in the overlay network. However, from the underlay network point of view, it can span multiple noncontiguous sites, reaching beyond the Layer-2 and Layer-3 boundary of the underlay infrastructure (Figure 1). Traffic between end hosts in the same VNI needs to be bridged in the overlay network, which means that VTEP devices in a given VNI need to know about other MAC addresses of end hosts in this VNI. Distribution of MAC addresses through BGP EVPN allows unknown unicast flooding in the VXLAN to be reduced or eliminated.

Figure 1. VNI across an Underlay IP Network

Layer-3 host IP addresses are advertised through MP-BGP EVPN so that inter-VXLAN traffic can be routed to the destination end host through an optimal path. For inter-VXLAN traffic that needs to be routed to the destination end host, host-based IP routing can provide the optimal forwarding path to the exact location of the destination host.

MP-BGP EVPN can also advertise the IP subnet prefix routes of VNIs. The prefix routes can be used to route traffic to the destination hosts when the host IP routes are missing: for instance, when the host IP routes have not yet been learned by the VTEPs through MP-BGP. A VTEP can also advertise the prefix routes outside the VXLAN network if the subnets need to be routable and made known outside the VXLAN network.
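The interplay between host routes and subnet prefix routes described above follows ordinary longest-prefix matching. The following Python sketch is an illustration under simplifying assumptions: the routing table is a plain dictionary, the next hops are symbolic VTEP names, and the function name `lookup` is hypothetical. It is not a model of any switch's forwarding implementation.

```python
import ipaddress
from typing import Dict, Optional

Rib = Dict[ipaddress.IPv4Network, str]


def lookup(rib: Rib, destination: str) -> Optional[str]:
    """Return the next-hop VTEP of the most specific matching route, if any."""
    dest = ipaddress.ip_address(destination)
    matches = [(net, vtep) for net, vtep in rib.items() if dest in net]
    if not matches:
        return None
    # The /32 host route wins over the subnet prefix when both match.
    return max(matches, key=lambda item: item[0].prefixlen)[1]


if __name__ == "__main__":
    rib: Rib = {
        # Subnet prefix route for a VNI, advertised by a VTEP that hosts the subnet.
        ipaddress.ip_network("10.1.1.0/24"): "VTEP-3",
        # Host route learned through MP-BGP EVPN: exact location of the host.
        ipaddress.ip_network("10.1.1.10/32"): "VTEP-2",
    }
    print(lookup(rib, "10.1.1.10"))  # host route is known
    print(lookup(rib, "10.1.1.20"))  # host route missing, prefix route is used
```

When the host route for 10.1.1.20 is later learned, it simply becomes the most specific match and traffic moves to the optimal path.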

EVPN NLRI is carried in BGP using the BGP multiprotocol extension with a new address family called Layer-2 VPN (L2VPN) EVPN. Similar to the VPNv4 address-family in the BGP MPLS-based IP VPN (RFC 4364), the L2VPN EVPN address-family for EVPN uses route distinguishers (RDs) to maintain uniqueness among identical routes in different VRF instances, and uses route targets (RTs) to define the policies that determine how routes are advertised and shared by different VRF instances.

A route distinguisher is an 8-octet (64-bit) number used to distinguish one set of routes (one VRF instance) from another. It is a unique number prepended to each route so that if the same route is used in several different VRF instances, BGP can treat them as distinct routes. The route distinguisher is transmitted along with the route through MP-BGP when EVPN routes are exchanged with MP-BGP peers.

Route targets can be applied to a VRF instance to control the import and export of routes between this instance and other VRF instances. The route-target attributes for a route are distributed in the form of a BGP extended community attribute, so the BGP configuration on the devices that run MP-BGP EVPN must be enabled to generate or process extended community attributes.

In the Cisco NX-OS implementation, the BGP route distinguisher and route target can be generated automatically for ease of configuration. The BGP route distinguisher can be derived automatically from the VNI and BGP router ID of the VTEP switch, and the BGP route target can be generated automatically as BGP AS:VNI. Alternatively, you can manually configure the BGP route distinguisher and route target. If all the MP-BGP EVPN VTEPs in a network are Cisco Nexus switch platforms, the recommended approach is to use autogenerated route-distinguisher and route-target values. If multiple vendors’ VTEP devices are interoperating, the recommended approach is to manually configure the values to avoid problems caused by the differences in vendors’ implementations. For eBGP deployment scenarios in which VTEPs are in different BGP domains, the BGP route targets must be manually assigned.

Integrated Routing and Bridging with the MP-BGP EVPN Control Plane

The MP-BGP EVPN control plane provides integrated routing and bridging by distributing both Layer-2 and Layer-3 reachability information for the end hosts residing in the VXLAN overlay networks. Each VTEP performs local learning to obtain MAC and IP address information from its locally attached hosts and then distributes this information through the MP-BGP EVPN control plane. Hosts attached to remote VTEPs are learned remotely through the MP-BGP control plane. This approach reduces network flooding for end-host learning and provides better control over end-host reachability information distribution.
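As a concrete view of the route distinguisher and route target discussed earlier in this section, the sketch below encodes an 8-octet Type 1 route distinguisher as laid out in RFC 4364 (2-octet type, 4-octet IPv4 address, 2-octet assigned number) and formats a route target as AS:VNI. The helper names are hypothetical. How NX-OS chooses the assigned number from the VNI is implementation-specific and not modeled; the value used here is an arbitrary placeholder.

```python
import ipaddress
import struct


def encode_rd_type1(router_id: str, assigned_number: int) -> bytes:
    """Encode an RFC 4364 Type 1 RD: type (2 octets), IPv4 (4), number (2)."""
    if not 0 <= assigned_number <= 0xFFFF:
        raise ValueError("assigned number must fit in 16 bits")
    ip_as_int = int(ipaddress.IPv4Address(router_id))
    return struct.pack("!HIH", 1, ip_as_int, assigned_number)


def auto_route_target(bgp_as: int, vni: int) -> str:
    """Format a route target as 'AS:VNI', the auto-generated form described above."""
    return f"{bgp_as}:{vni}"


if __name__ == "__main__":
    prefix = ipaddress.ip_network("10.1.1.0/24")
    # The same prefix in two VRF instances stays distinct once the RD is prepended.
    # The assigned numbers 100 and 200 are placeholders, not NX-OS-derived values.
    route_a = encode_rd_type1("10.0.0.1", 100) + prefix.network_address.packed
    route_b = encode_rd_type1("10.0.0.1", 200) + prefix.network_address.packed
    print(route_a.hex(), route_b.hex(), route_a != route_b)
    print(auto_route_target(65000, 30000))
```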

Figure 2 shows an example of end-host NLRI learning and distribution in an MP-iBGP EVPN using route reflectors.

Figure 2. MP-BGP EVPN Host NLRI Learning and Distribution

Local-Host Learning

A VTEP in MP-BGP EVPN learns the MAC addresses and IP addresses of locally attached end hosts through local learning. This learning can be local-data-plane based using the standard Ethernet and IP learning procedures, such as source MAC address learning from the incoming Ethernet frames and IP address learning when the hosts send Gratuitous ARP (GARP) and Reverse ARP (RARP) packets or ARP requests for the gateway IP address on the VTEP. Alternatively, the learning can be achieved by using a control plane or through management-plane integration between the VTEP and the local hosts.

EVPN Route Advertisement and Remote-Host Learning

After learning the local-host MAC and IP addresses, a VTEP advertises the host information in the MP-BGP EVPN control plane so that this information can be distributed to other VTEPs. This approach enables EVPN VTEPs to learn the remote end hosts in the MP-BGP EVPN control plane.

The EVPN routes are advertised through the L2VPN EVPN address-family. The BGP L2VPN EVPN routes include the following information:
- RD: Route distinguisher
- MAC address length: 6 bytes
- MAC address: Host MAC address
- IP address length: 32 or 128 bits
- IP address: Host IP address (IPv4 or IPv6)
- L2 VNI: VNI of the bridge domain to which the end host belongs
- L3 VNI: VNI associated with the tenant VRF routing instance
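The fields listed above can be gathered into a small record type to make their relationships explicit. This Python sketch is illustrative only: the class name `EvpnHostRoute` and its attribute names are hypothetical, it is not a wire-format encoder, and the sample RD, addresses, and VNI values are made up.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvpnHostRoute:
    """Illustrative container for the host route fields listed above."""
    rd: str                 # route distinguisher, shown in text form
    mac: str                # host MAC address, colon-separated hex
    ip: Optional[str]       # host IPv4 or IPv6 address, if known
    l2_vni: int             # VNI of the bridge domain of the host
    l3_vni: Optional[int]   # VNI of the tenant VRF, if routing is used

    @property
    def mac_length_bytes(self) -> int:
        return len(bytes.fromhex(self.mac.replace(":", "")))

    @property
    def ip_length_bits(self) -> int:
        if self.ip is None:
            return 0
        # 32 for IPv4, 128 for IPv6
        return ipaddress.ip_address(self.ip).max_prefixlen


if __name__ == "__main__":
    route = EvpnHostRoute(
        rd="10.0.0.1:100",            # placeholder value
        mac="00:00:5e:00:53:01",
        ip="10.1.1.10",
        l2_vni=30000,
        l3_vni=50000,
    )
    print(route.mac_length_bytes, route.ip_length_bits)
```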

MP-BGP EVPN uses the BGP extended community attribute to transmit the exported route-targets in an EVPN route. When an EVPN VTEP receives an EVPN route, it compares the route-target attributes in the received route to its locally configured route-target import policy to decide whether to import or ignore the route. This approach uses the decade-old MP-BGP VPN technology (RFC 4364) and provides scalable multitenancy in which a node that does not have a VRF locally does not import the corresponding routes. VPN scaling can be further enhanced by the use of BGP constructs such as route-target-constrained route distribution (RFC 4684).

When a VTEP switch originates MP-BGP EVPN routes for its locally learned end hosts, it uses its own VTEP address as the BGP next-hop. This BGP next-hop must remain unchanged through the route distribution across the network because the remote VTEP must learn the originating VTEP address as the next-hop for VXLAN encapsulation when forwarding packets for the overlay network.

The underlay network provides IP reachability for all the VTEP addresses that are used to route the encapsulated VXLAN packets toward the egress VTEP through the underlay network. The network devices in the underlay network need to maintain routing information only for the VTEP addresses. They don’t need to learn the EVPN routes. This approach simplifies the underlay network operation and increases its stability and scalability.

Symmetric and Asymmetric Integrated Routing and Bridging

The IETF EVPN drafts define two integrated routing and bridging (IRB) semantics: asymmetric IRB and symmetric IRB. Cisco NX-OS for Cisco Nexus switch platforms implements symmetric IRB for its scalability advantages and simplified Layer-2 and Layer-3 multitenancy support.

Asymmetric IRB

With asymmetric IRB, the ingress VTEP performs both Layer-2 bridging and Layer-3 routing lookup, whereas the egress VTEP performs only Layer-2 bridging lookup. As shown in Figure 3, with asymmetric IRB, when a packet travels between two VNIs, the ingress VTEP routes the packet from the source VNI to the destination VNI. The egress VTEP bridges the packet to the destination point within the destination VNI.

Figure 3. VXLAN Routing with Asymmetric IRB

Asymmetric IRB requires the ingress VTEP to be configured with both the source and destination VNIs for both Layer-2 and Layer-3 forwarding. Essentially, this requires each VTEP to be configured with all VNIs in the VXLAN network and to learn ARP entries and MAC addresses for all the end hosts attached to those VNIs (Figure 4). This behavior can cause scalability problems as the density of end hosts and/or the number of VXLAN VNIs in the overlay network increase.
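Returning briefly to the route-target import check described under EVPN Route Advertisement and Remote-Host Learning: a received route is imported into a VRF only if at least one of its route targets matches that VRF's import policy. The sketch below models this with set intersection. The function name, the VRF names, and the route-target values are hypothetical, and real BGP implementations apply further policy that is not shown.

```python
from typing import Dict, List, Set, Tuple

Route = Tuple[str, Set[str]]  # (prefix or host route, route targets carried with it)


def import_routes(
    received: List[Route], import_policy: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """Place each received route into every VRF whose import RTs match."""
    tables: Dict[str, List[str]] = {vrf: [] for vrf in import_policy}
    for prefix, route_targets in received:
        for vrf, wanted in import_policy.items():
            if route_targets & wanted:   # at least one route target in common
                tables[vrf].append(prefix)
    return tables


if __name__ == "__main__":
    received = [
        ("10.1.1.10/32", {"65000:50000"}),   # tenant A host route
        ("10.9.9.9/32", {"65000:50001"}),    # tenant B host route
    ]
    # This VTEP has only tenant A configured, so tenant B routes are ignored.
    print(import_routes(received, {"tenant-a": {"65000:50000"}}))
```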

Figure 4. VTEP VNI Membership in Asymmetric IRB

Symmetric IRB

With symmetric IRB, both the ingress and egress VTEPs perform Layer-2 and Layer-3 lookups. Symmetric IRB introduces some new logical constructs:
- Layer-3 VNI: Each tenant VRF instance is mapped to a unique Layer-3 VNI in the network. This mapping needs to be consistent on all the VTEPs in the network. All inter-VXLAN routed traffic is encapsulated with the Layer-3 VNI in the VXLAN header and provides the VRF context for the receiving VTEP. The receiving VTEP uses this VNI to determine the VRF context in which the inner IP packet needs to be forwarded. This VNI also provides the basis for enforcing Layer-3 segmentation in the data plane.
- VTEP router MAC address: Each VTEP has a unique system MAC address that other VTEPs can use for inter-VNI routing. This MAC address is referred to here as the router MAC address. The router MAC address is used as the inner destination MAC address for the routed VXLAN packet.

As shown in Figure 5, when a packet is sent from VNI A to VNI B, the ingress VTEP routes the packet to the Layer-3 VNI. It rewrites the inner destination MAC address to the egress VTEP’s router MAC address and encodes the Layer-3 VNI in the VXLAN header. After the egress VTEP receives the encapsulated VXLAN packet, it first decapsulates the packet by removing the VXLAN header. Then it looks at the inner packet header. Because the destination MAC address in the inner packet header is its own MAC address, it performs a Layer-3 routing lookup. The Layer-3 VNI in the VXLAN header provides the VRF context in which this routing lookup is performed.
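The symmetric IRB packet walk above can be sketched as two small functions, one for the ingress VTEP and one for the egress VTEP. This is a simplified model under stated assumptions: packets are plain records, tables are dictionaries, and all names (`VxlanPacket`, `ingress_route`, `egress_forward`) and sample values are hypothetical rather than taken from any implementation.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class VxlanPacket:
    outer_dst_vtep: str   # destination VTEP address in the outer IP header
    vni: int              # VNI carried in the VXLAN header
    inner_dst_mac: str    # destination MAC of the inner Ethernet frame
    inner_dst_ip: str     # destination IP of the inner packet


def ingress_route(
    dst_ip: str, vrf_routes: Dict[str, Tuple[str, str]], l3_vni: int
) -> VxlanPacket:
    """Route into the Layer-3 VNI and rewrite the inner MAC to the egress router MAC."""
    egress_vtep, egress_router_mac = vrf_routes[dst_ip]
    return VxlanPacket(egress_vtep, l3_vni, egress_router_mac, dst_ip)


def egress_forward(
    pkt: VxlanPacket,
    my_router_mac: str,
    l3_vni_to_vrf: Dict[int, str],
    vrf_tables: Dict[str, Dict[str, str]],
) -> Tuple[str, object, str]:
    """Decide between bridging and routing after decapsulation."""
    if pkt.inner_dst_mac != my_router_mac:
        # Not addressed to this VTEP's router MAC: bridge within the Layer-2 VNI.
        return ("bridge", pkt.vni, pkt.inner_dst_mac)
    vrf = l3_vni_to_vrf[pkt.vni]              # Layer-3 VNI selects the VRF context
    return ("route", vrf, vrf_tables[vrf][pkt.inner_dst_ip])


if __name__ == "__main__":
    routes = {"10.2.2.20": ("192.0.2.2", "00:00:5e:00:53:02")}
    pkt = ingress_route("10.2.2.20", routes, l3_vni=50000)
    print(egress_forward(pkt, "00:00:5e:00:53:02", {50000: "tenant-a"},
                         {"tenant-a": {"10.2.2.20": "Eth1/5"}}))
```

Note that the ingress function never consults the destination Layer-2 VNI, which is the property the next section builds on.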

Figure 5. VXLAN Routing with Symmetric IRB

Advantages of Symmetric IRB

With symmetric IRB, the ingress VTEP doesn’t need to know the destination VNI for inter-VNI routing. Therefore, VTEPs don’t need to learn and maintain MAC address information for the remote hosts attached to egress VNIs for which they don’t have local hosts (Figure 6). This approach results in better utilization of the MAC address table and ARP adjacencies on a VTEP. For example, in Figure 6 the host MAC addresses and ARP adjacencies in VNI-B do not need to be present on VTEP-1. As a result, the routing and bridging is more scalable than with asymmetric IRB. Cisco NX-OS implements symmetric IRB to achieve optimal learning and scaling.

Figure 6. VTEP VNI Membership with Symmetric IRB
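The scaling difference can be made concrete with a rough count of the host MAC entries a VTEP must hold under each model. The sketch below is a back-of-the-envelope illustration, not a sizing tool: the host counts are invented, the function names are hypothetical, and entries such as remote router MAC addresses are ignored.

```python
from typing import Dict, Set


def asymmetric_mac_entries(hosts_per_vni: Dict[str, int]) -> int:
    """Asymmetric IRB: every VTEP is configured with all VNIs and learns all hosts."""
    return sum(hosts_per_vni.values())


def symmetric_mac_entries(hosts_per_vni: Dict[str, int], local_vnis: Set[str]) -> int:
    """Symmetric IRB: host MACs are needed only for VNIs with local hosts."""
    return sum(count for vni, count in hosts_per_vni.items() if vni in local_vnis)


if __name__ == "__main__":
    hosts = {"VNI-A": 100, "VNI-B": 200}   # invented host counts
    vtep1_local = {"VNI-A"}                # VTEP-1 has local hosts only in VNI-A
    print(asymmetric_mac_entries(hosts))               # all hosts in all VNIs
    print(symmetric_mac_entries(hosts, vtep1_local))   # VNI-B hosts not needed
```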

VNIs for Bridge Domains and IP VRF Instances

An EVPN VXLAN tenant can have multiple Layer-2 networks, each with a corresponding VNI. These Layer-2 networks are bridge domains in the overlay network. The VNIs associated with them are often referred to as Layer-2 (L2) VNIs. Each tenant also needs a Layer-3 (L3) VNI for symmetric IRB if inter-VXLAN routing is needed. Although a VTEP can have all or a subset of the Layer-2 VNIs in a VXLAN EVPN, it must have the Layer-3 VNI for inter-VXLAN routing. All VTEPs in an EVPN must have the same Layer-3 VNI (Figure 7).

Figure 7. VNIs for Bridge Domain and IP VRF Instances

When an EVPN VTEP performs forwarding lookup and VXLAN encapsulation for the packets it receives from its local end hosts, it uses either a Layer-2 VNI or the Layer-3 VNI in the VXLAN header, depending on whether the packets need to be bridged or routed. If the destination MAC address in the original packet header does not belong to the local VTEP, the local VTEP performs a Layer-2 lookup and bridges the packet to the destination end host that is located in the same Layer-2 VNI as the source host. The local VTEP embeds this Layer-2 VNI in the VXLAN header. In this case, both the source and destination hosts are in the same Layer-2 broadcast domain. If the destination MAC address belongs to the local VTEP switch - that is, if the local VTEP is the IP gateway for the source host, and the source and destination hosts are in different IP subnets - the packet will be routed by the local VTEP. In this case, it performs Layer-3 routing lookup. It then encapsulates the packets with the Layer-3 VNI in the VXLAN header and rewrites the inner destination MAC address to the remote VTEP’s router MAC address. Upon receipt of the encapsulated VXLAN packet, the remote VTEP performs another routing lookup based on the inner IP header because the inner destination MAC address in the received packet belongs to the remote VTEP itself.

The destination VTEP address in the outer IP header of a VXLAN packet identifies the location of the destinati
