ESXi host fails with PSOD “#PF Exception 14 in world xxxx:nsx-cfgagent” during bulk vMotions in a NSX-T Environment (87352)

Be aware of the issues below, found in NSX-T 3.1.3 and 3.1.3.x. If you are considering moving to NSX-T 3.1.3.x, upgrade directly to 3.1.3.6.1.

This issue is observed when bulk vMotions occur in an NSX-T environment. The following are some of the probable scenarios:

  • Migration of multiple VMs, each with multiple vNICs
  • Multiple IP sets configured in CIDR form per rule
  • Multiple rules containing the same IP sets
  • VMs from a non-upgraded NSX-T host migrated to an upgraded NSX-T host

The above scenarios may lead to a PSOD with the following backtrace:

[Image: PSOD_cfagent.PNG]

Workaround

  • Set DRS to manual on the ESXi Cluster and avoid performing bulk vMotions

Fix

This issue is fixed in NSX-T 3.1.3.6, but upgrading directly to 3.1.3.6.1 is recommended because of the issue below.

NSX-T 3.1.3.6 Edge configured with an L4 LB stops passing all traffic (87627)

This issue is fixed in NSX-T 3.1.3.6.1. Also, NSX-T 3.2 is not impacted by this.
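
The version guidance above can be summarized in a small check. This is only a minimal sketch: the helper names `parse_version` and `known_issues` are hypothetical, the logic covers only the 3.1.3.x train described in this post, and it assumes NSX-T versions are plain dotted integers. Anything outside 3.1.3.x returns `None`, because this post makes no claim about other releases beyond noting that 3.2 is not impacted by the L4 LB issue.

```python
from typing import List, Optional, Tuple

# Fix versions taken from this post (KB 87352 and KB 87627).
PSOD_FIXED_IN = (3, 1, 3, 6)      # nsx-cfgagent PSOD fixed here
LB_FIXED_IN = (3, 1, 3, 6, 1)     # L4 LB traffic issue fixed here


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a dotted version such as '3.1.3.6.1' into (3, 1, 3, 6, 1)."""
    return tuple(int(part) for part in text.strip().split("."))


def known_issues(version: str) -> Optional[List[str]]:
    """Return the KB numbers from this post that apply to a 3.1.3.x version.

    Returns None for versions outside the 3.1.3.x train, which this
    sketch does not attempt to classify.
    """
    v = parse_version(version)
    if v[:3] != (3, 1, 3):
        return None
    issues = []
    # PSOD during bulk vMotions: 3.1.3 up to, but not including, 3.1.3.6.
    if v < PSOD_FIXED_IN:
        issues.append("87352")
    # L4 LB stops passing traffic: 3.1.3.6 only, fixed in 3.1.3.6.1.
    if v[:4] == (3, 1, 3, 6) and v < LB_FIXED_IN:
        issues.append("87627")
    return issues


if __name__ == "__main__":
    for candidate in ("3.1.3", "3.1.3.5", "3.1.3.6", "3.1.3.6.1"):
        print(candidate, known_issues(candidate))
```

An empty list means neither of the two issues described here applies, which is why 3.1.3.6.1 is the recommended target within the 3.1.3.x train.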

Author: Daniel Micanek

Senior Service Architect, SAP Platform Services Team at Tietoevry | SUSE SCA | vExpert ⭐⭐⭐⭐⭐ | vExpert NSX | VCIX-DCV/NV | VCAP-DCV/NV Design+Deploy | VCP-DCV/NV/CMA/TKO/DTM | NCIE-DP | OCP | Azure Solutions Architect | Certified Kubernetes Administrator (CKA)