CSCvz43359 Traffic using GENEVE overlay sometimes leaves wrong VNIC when GENEVE Offload is enabled on VIC14xx – FIX

According to the Release Notes for Cisco UCS Manager, Release 4.2(1l), there is a fix for CSCvz43359 "Traffic using GENEVE overlay sometimes leaves wrong VNIC when GENEVE Offload is enabled on VIC14xx":

The following caveats related to NSX-T are resolved in Release 4.2(1l):

Defect ID: CSCvz43359
Symptom: On a Cisco UCS server using an NSX-T topology, data traffic using a GENEVE overlay sometimes left the wrong vNIC when GENEVE Offload was enabled on a VIC 1400 series Fabric Interconnect. This issue is resolved.
First Bundle Affected: 4.2(1d)C
Resolved in Release: 4.2(1l)C
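As a rough illustration of how to read the table, the sketch below checks whether a UCS Manager release string falls between the first affected bundle, 4.2(1d)C, and the fixed release, 4.2(1l)C. The helpers `parse_ucsm` and `affected_by_cscvz43359` are hypothetical, and it is an assumption that patch letters sort alphabetically within a maintenance release; always confirm against the official release notes.

```python
import re

# Hypothetical parser for UCS Manager versions such as "4.2(1d)" or bundle
# versions such as "4.2(1l)C" (trailing A/B/C bundle letter is ignored).
_UCSM_RE = re.compile(r"^(\d+)\.(\d+)\((\d+)([a-z]*)\)([A-C]?)$")


def parse_ucsm(version: str) -> tuple:
    """Turn '4.2(1l)C' into (4, 2, 1, 'l') so versions compare as tuples."""
    m = _UCSM_RE.match(version.strip())
    if not m:
        raise ValueError(f"unrecognised UCS Manager version: {version!r}")
    major, minor, maint, patch, _bundle = m.groups()
    # Assumption: patch letters sort alphabetically within a maintenance release.
    return (int(major), int(minor), int(maint), patch)


FIRST_AFFECTED = parse_ucsm("4.2(1d)C")  # from the release-note table
RESOLVED_IN = parse_ucsm("4.2(1l)C")     # from the release-note table


def affected_by_cscvz43359(version: str) -> bool:
    """True if the version lies in [first affected, resolved) per the table."""
    return FIRST_AFFECTED <= parse_ucsm(version) < RESOLVED_IN
```

For example, `affected_by_cscvz43359("4.2(1f)")` is `True`, while `"4.2(1l)"` and `"4.1(3h)"` both return `False`. The check reads the table literally and says nothing about other release trains.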

Traffic using GENEVE overlay sometimes leaves wrong VNIC when GENEVE Offload is enabled on VIC14xx

Symptom: Rapid MAC moves are observed on the Fabric Interconnects and northbound switches for MAC addresses that belong to devices using a GENEVE overlay. pktcap-uw on the ESXi host was not able to observe this phenomenon; it was only observable via tcpdump on the physical VIC adapter in the debug shell.

Conditions: This was specifically seen in an NSX-T topology, though more general use of GENEVE offloading in hardware would likely show the same behavior. The NSX-T TEP MAC addresses should stay ‘bound’ to a physical interface unless there is a topology change. In this case, we observed the TEP MACs rapidly moving from Fabric A to Fabric B and vice versa while the teaming/load-balancing policy was set to Active/Active in ESXi and NSX. NSX-T sends BFD control frames between hosts, and those BFD frames are carried over GENEVE. When GENEVE Offload is enabled in the VIC adapter policy, a small number of these BFD frames egress the wrong physical link, which causes the unexpected MAC moves on northbound devices.
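Since the symptom is a TEP MAC bouncing between Fabric A and Fabric B, one way to spot it is to count fabric changes per MAC within a short time window. The Python sketch below assumes you have already parsed MAC-move events from the Fabric Interconnect or switch logs into hypothetical `(timestamp_seconds, mac, fabric)` tuples; the function name, the 10-second window and the threshold of 3 moves are illustrative assumptions, not Cisco or VMware defaults.

```python
from collections import defaultdict


def find_flapping_macs(events, window=10.0, min_moves=3):
    """Return MACs that changed fabric at least `min_moves` times within `window` seconds.

    `events` is a hypothetical iterable of (timestamp_seconds, mac, fabric) tuples.
    """
    moves = defaultdict(list)  # mac -> timestamps at which its fabric changed
    last_fabric = {}           # mac -> fabric seen in the most recent event
    for ts, mac, fabric in sorted(events):
        prev = last_fabric.get(mac)
        if prev is not None and prev != fabric:
            moves[mac].append(ts)
        last_fabric[mac] = fabric

    flapping = set()
    for mac, times in moves.items():
        start = 0
        for end, t in enumerate(times):
            # Slide the left edge until the moves span at most `window` seconds.
            while t - times[start] > window:
                start += 1
            if end - start + 1 >= min_moves:
                flapping.add(mac)
                break
    return flapping
```

A TEP MAC that should stay on one fabric would normally produce no moves at all, so even a low threshold separates it from the A/B ping-pong described above.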

Links:

ESXi host fails with PSOD “#PF Exception 14 in world xxxx:nsx-cfgagent” during bulk vMotions in a NSX-T Environment (87352)

Be aware of the issues below found in NSX-T 3.1.3 and 3.1.3.x. If you are considering moving to NSX-T 3.1.3.x, upgrade directly to 3.1.3.6.1.

This issue is observed when bulk vMotions occur in the NSX-T environment. Some probable scenarios are:

  • Migration of multiple VMs, each with multiple vNICs
  • Multiple IP sets configured in CIDR form per rule
  • Multiple rules containing the same IP sets
  • VMs migrated from a non-upgraded NSX-T host to an upgraded NSX-T host
The above scenarios may lead to a PSOD with the following backtrace:

[Image: PSOD backtrace (PSOD_cfagent.PNG)]

Workaround

  • Set DRS to Manual on the ESXi cluster and avoid performing bulk vMotions.

Fix

Fixed in NSX-T 3.1.3.6, but upgrading directly to 3.1.3.6.1 is recommended because of the issue below.

NSX-T 3.1.3.6 Edge configured with an L4 LB stops passing all traffic (87627)

This issue is fixed in NSX-T 3.1.3.6.1. NSX-T 3.2 is not affected.
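To summarize the upgrade advice above, a small Python sketch can map an NSX-T version string to the two KB issues discussed here: 87352 (3.1.3 and 3.1.3.x, fixed in 3.1.3.6) and 87627 (3.1.3.6, fixed in 3.1.3.6.1). The function names are hypothetical, and the sketch only encodes the ranges stated in this post; an empty result for any other version is not a statement that the version is free of issues.

```python
def parse_nsxt(version: str) -> tuple:
    """Parse '3.1.3.6.1' into an int tuple padded to 5 parts so that
    '3.1.3' and '3.1.3.0' compare equal."""
    parts = tuple(int(p) for p in version.split("."))
    return parts + (0,) * (5 - len(parts))


def nsxt_known_issues(version: str) -> list:
    """Return the KB numbers from this post that apply to `version` (hypothetical helper)."""
    v = parse_nsxt(version)
    issues = []
    # KB 87352: PSOD in nsx-cfgagent on 3.1.3 / 3.1.3.x, fixed in 3.1.3.6.
    if v[:3] == (3, 1, 3) and v < parse_nsxt("3.1.3.6"):
        issues.append("87352")
    # KB 87627: 3.1.3.6 Edge with L4 LB stops passing traffic, fixed in 3.1.3.6.1.
    if v[:4] == (3, 1, 3, 6) and v < parse_nsxt("3.1.3.6.1"):
        issues.append("87627")
    return issues
```

With these ranges, 3.1.3 through 3.1.3.5 report 87352, 3.1.3.6 reports 87627, and 3.1.3.6.1 reports neither, which is why the post recommends jumping straight to 3.1.3.6.1.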

Links

vExpert 2022 Awards Announcement

Thank you to everyone who applied for vExpert and to the vExpert PROs for managing the voting process; it’s a lot of work! We are pleased to announce the list of 2022 vExperts. You can visit the vExpert Directory to see the list and profiles of each vExpert. All of the new and returning … The post vExpert 2022 Awards Announcement appeared first on VMware vExpert Blog.
