ESXi 7.0 and Mellanox ConnectX-2 – support fix patch

I upgraded vCenter to version 7 successfully, but ran into trouble when updating my hosts from 6.7 to 7.

I got a warning stating that some PCI devices were incompatible, but tried anyway. The warning turned out to be justified: after the upgrade my Mellanox ConnectX-2 was no longer showing up as an available physical NIC.

First, I needed the VID/DID device codes for the MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s].

Partner | Product | Driver | VID | DID
Mellanox | MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] | mlx4_core | 15b3 | 6750
The whole table can be checked here; search for "mlx" to see the full list of Mellanox cards.
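
To find the VID/DID directly on the host, something like the following can be used (a quick sketch – the exact output formatting differs slightly between ESXi builds):

# List PCI devices and look for the Mellanox entry (VID 15b3) to read the device ID
esxcli hardware pci list | grep -i -A 12 Mellanox
# vmkchdev prints vendor:device and subvendor:subdevice IDs in hex
vmkchdev -l | grep 15b3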

Deprecated devices supported by VMKlinux drivers

These devices were supported in 6.7 or earlier only by a VMKlinux inbox driver. They are no longer supported because all support for VMKlinux drivers and their devices has been completely removed in 7.0.
Partner | Product | Driver | VID
Mellanox | MT26428 [ConnectX VPI - 10GigE / IB QDR, PCIe 2.0 5GT/s] | mlx4_core | 15b3
Mellanox | MT26488 [ConnectX VPI PCIe 2.0 5GT/s - IB DDR / 10GigE Virtualization+] | mlx4_core | 15b3
Mellanox | MT26438 [ConnectX VPI PCIe 2.0 5GT/s - IB QDR / 10GigE Virtualization+] | mlx4_core | 15b3
Mellanox | MT25408 [ConnectX VPI - 10GigE / IB SDR] | mlx4_core | 15b3
Mellanox | MT27560 Family | mlx4_core | 15b3
Mellanox | MT27551 Family | mlx4_core | 15b3
Mellanox | MT27550 Family | mlx4_core | 15b3
Mellanox | MT27541 Family | mlx4_core | 15b3
Mellanox | MT27540 Family | mlx4_core | 15b3
Mellanox | MT27531 Family | mlx4_core | 15b3
Mellanox | MT25448 [ConnectX EN 10GigE, PCIe 2.0 2.5GT/s] | mlx4_core | 15b3
Mellanox | MT25408 [ConnectX EN 10GigE 10GBaseT, PCIe Gen2 5GT/s] | mlx4_core | 15b3
Mellanox | MT27561 Family | mlx4_core | 15b3
Mellanox | MT26468 [ConnectX EN 10GigE, PCIe 2.0 5GT/s Virtualization+] | mlx4_core | 15b3
Mellanox | MT26418 [ConnectX VPI - 10GigE / IB DDR, PCIe 2.0 5GT/s] | mlx4_core | 15b3
Mellanox | MT27510 Family | mlx4_core | 15b3
Mellanox | MT26488 [ConnectX VPI PCIe 2.0 5GT/s - IB DDR / 10GigE Virtualization+] | mlx4_core | 15b3
Mellanox | MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] | mlx4_core | 15b3
Mellanox | MT25418 [ConnectX VPI - 10GigE / IB DDR, PCIe 2.0 2.5GT/s] | mlx4_core | 15b3
Mellanox | MT27530 Family | mlx4_core | 15b3
Mellanox | MT27521 Family | mlx4_core | 15b3
Mellanox | MT27511 Family | mlx4_core | 15b3
Mellanox | MT25408 [ConnectX EN 10GigE 10BASE-T, PCIe 2.0 2.5GT/s] | mlx4_core | 15b3
Mellanox | MT25408 [ConnectX IB SDR Flash Recovery] | mlx4_core | 15b3
Mellanox | MT25400 Family [ConnectX-2 Virtual Function] | mlx4_core | 15b3

Deprecated devices supported by VMKlinux drivers – full table list

How to fix it? I put together a small script, ESXi7-enable-nmlx4_co.v00.sh, to do it. Notes:

  • Edit the path to your datastore – the example uses /vmfs/volumes/ISO
  • nmlx4_co.v00.orig is a backup of the original nmlx4_co.v00
  • The new VIB is unsigned – an ALERT message will appear in the log during reboot:
    • ALERT: Failed to verify signatures of the following vib
  • An ESXi reboot is needed to load the new driver
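
# Back up the original nmlx4_co.v00 and unpack the vmtar archive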
cp /bootbank/nmlx4_co.v00 /vmfs/volumes/ISO/nmlx4_co.v00.orig
cp /bootbank/nmlx4_co.v00 /vmfs/volumes/ISO/n.tar
cd /vmfs/volumes/ISO/
vmtar -x n.tar -o output.tar
rm -f n.tar
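
# Extract the inner tar into a tmp-network working directory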
mkdir tmp-network
mv output.tar tmp-network/output.tar
cd tmp-network
tar xf output.tar
rm output.tar
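
# Add the ConnectX-2 PCI ID (15b3:6750) to the nmlx4_core driver map and PCI ID list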
echo '' >> /vmfs/volumes/ISO/tmp-network/etc/vmware/default.map.d/nmlx4_core.map
echo 'regtype=native,bus=pci,id=15b36750..............,driver=nmlx4_core' >> /vmfs/volumes/ISO/tmp-network/etc/vmware/default.map.d/nmlx4_core.map
cat /vmfs/volumes/ISO/tmp-network/etc/vmware/default.map.d/nmlx4_core.map
echo '        6750  Mellanox ConnectX-2 Dual Port 10GbE '                 >> /vmfs/volumes/ISO/tmp-network/usr/share/hwdata/default.pciids.d/nmlx4_core.ids 
cat /vmfs/volumes/ISO/tmp-network/usr/share/hwdata/default.pciids.d/nmlx4_core.ids
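
# Repack the tree, rebuild the vmtar, gzip it and put the patched module back into /bootbank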
tar -cf /vmfs/volumes/ISO/FILE.tar *
cd /vmfs/volumes/ISO/
vmtar -c FILE.tar -o output.vtar
gzip output.vtar
mv output.vtar.gz nmlx4_co.v00
rm FILE.tar
cp /vmfs/volumes/ISO/nmlx4_co.v00 /bootbank/nmlx4_co.v00

The script adds HW ID support in the file nmlx4_core.map:

*********************************************************************
/vmfs/volumes/ISO/tmp-network/etc/vmware/default.map.d/nmlx4_core.map
*********************************************************************
regtype=native,bus=pci,id=15b301f6..............,driver=nmlx4_core
regtype=native,bus=pci,id=15b301f8..............,driver=nmlx4_core
regtype=native,bus=pci,id=15b31003..............,driver=nmlx4_core
regtype=native,bus=pci,id=15b31004..............,driver=nmlx4_core
regtype=native,bus=pci,id=15b31007..............,driver=nmlx4_core
regtype=native,bus=pci,id=15b3100715b30003......,driver=nmlx4_core
regtype=native,bus=pci,id=15b3100715b30006......,driver=nmlx4_core
regtype=native,bus=pci,id=15b3100715b30007......,driver=nmlx4_core
regtype=native,bus=pci,id=15b3100715b30008......,driver=nmlx4_core
regtype=native,bus=pci,id=15b3100715b3000c......,driver=nmlx4_core
regtype=native,bus=pci,id=15b3100715b3000d......,driver=nmlx4_core
regtype=native,bus=pci,id=15b36750..............,driver=nmlx4_core
-------------------------> The last line is the FIX

And adds HW ID support in the file nmlx4_core.ids:

*******************************************************************************
/vmfs/volumes/ISO/tmp-network/usr/share/hwdata/default.pciids.d/nmlx4_core.ids
*******************************************************************************
#
# This file is mechanically generated.  Any changes you make
# manually will be lost at the next build.
#
# Please edit <driver>_devices.py file for permanent changes.
#
# Vendors, devices and subsystems.
#
# Syntax (initial indentation must be done with TAB characters):
#
# vendor  vendor_name
#       device  device_name                            <-- single TAB
#               subvendor subdevice  subsystem_name    <-- two TABs

15b3  Mellanox Technologies
        01f6  MT27500 [ConnectX-3 Flash Recovery]
        01f8  MT27520 [ConnectX-3 Pro Flash Recovery]
        1003  MT27500 Family [ConnectX-3]
        1004  MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
        1007  MT27520 Family [ConnectX-3 Pro]
                15b3 0003  ConnectX-3 Pro VPI adapter card; dual-port QSFP; FDR IB (56Gb/s) and 40GigE (MCX354A-FCC)
                15b3 0006  ConnectX-3 Pro EN network interface card 40/56GbE dual-port QSFP(MCX314A-BCCT )
                15b3 0007  ConnectX-3 Pro EN NIC; 40GigE; dual-port QSFP (MCX314A-BCC)
                15b3 0008  ConnectX-3 Pro VPI adapter card; single-port QSFP; FDR IB (56Gb/s) and 40GigE (MCX353A-FCC)
                15b3 000c  ConnectX-3 Pro EN NIC; 10GigE; dual-port SFP+ (MCX312B-XCC)
                15b3 000d  ConnectX-3 Pro EN network interface card; 10GigE; single-port SFP+ (MCX311A-XCC)
        6750  Mellanox ConnectX-2 Dual Port 10GbE
--------> The last line is the FIX

After the reboot I could see support for the MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s].

The only remaining issue is the ALERT: Failed to verify signatures of the following vib(s): [nmlx4-core].

2020-XX-XXTXX:XX:44.473Z cpu0:2097509)ALERT: Failed to verify signatures of the following vib(s): [nmlx4-core]. All tardisks validated
2020-XX-XXTXX:XX:47.909Z cpu1:2097754)Loading module nmlx4_core ...
2020-XX-XXTXX:XX:47.912Z cpu1:2097754)Elf: 2052: module nmlx4_core has license BSD
2020-XX-XXTXX:XX:47.921Z cpu1:2097754)<NMLX_INF> nmlx4_core: init_module called
2020-XX-XXTXX:XX:47.921Z cpu1:2097754)Device: 194: Registered driver 'nmlx4_core' from 42
2020-XX-XXTXX:XX:47.921Z cpu1:2097754)Mod: 4845: Initialization of nmlx4_core succeeded with module ID 42.
2020-XX-XXTXX:XX:47.921Z cpu1:2097754)nmlx4_core loaded successfully.
2020-XX-XXTXX:XX:47.951Z cpu1:2097754)<NMLX_INF> nmlx4_core: 0000:05:00.0: nmlx4_core_Attach - (nmlx4_core_main.c:2476) running
2020-XX-XXTXX:XX:47.951Z cpu1:2097754)DMA: 688: DMA Engine 'nmlx4_core' created using mapper 'DMANull'.
2020-XX-XXTXX:XX:47.951Z cpu1:2097754)DMA: 688: DMA Engine 'nmlx4_core' created using mapper 'DMANull'.
2020-XX-XXTXX:XX:47.951Z cpu1:2097754)DMA: 688: DMA Engine 'nmlx4_core' created using mapper 'DMANull'.
2020-XX-XXTXX:XX:49.724Z cpu1:2097754)<NMLX_INF> nmlx4_core: 0000:05:00.0: nmlx4_ChooseRoceMode - (nmlx4_core_main.c:382) Requested RoCE mode RoCEv1
2020-XX-XXTXX:XX:49.724Z cpu1:2097754)<NMLX_INF> nmlx4_core: 0000:05:00.0: nmlx4_ChooseRoceMode - (nmlx4_core_main.c:422) Requested RoCE mode is supported - choosing RoCEv1
2020-XX-XXTXX:XX:49.934Z cpu1:2097754)<NMLX_INF> nmlx4_core: 0000:05:00.0: nmlx4_CmdInitHca - (nmlx4_core_fw.c:1408) Initializing device with B0 steering support
2020-XX-XXTXX:XX:50.561Z cpu1:2097754)<NMLX_INF> nmlx4_core: 0000:05:00.0: nmlx4_InterruptsAlloc - (nmlx4_core_main.c:1744) Granted 38 MSIX vectors
2020-XX-XXTXX:XX:50.561Z cpu1:2097754)<NMLX_INF> nmlx4_core: 0000:05:00.0: nmlx4_InterruptsAlloc - (nmlx4_core_main.c:1766) Using MSIX
2020-XX-XXTXX:XX:50.781Z cpu1:2097754)Device: 330: Found driver nmlx4_core for device 0xxxxxxxxxxxxxxxxxxxxxxx
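
A quick check after the reboot that the card is now claimed and presented as a physical NIC (a sketch – the vmnic name and exact output depend on the host):

# The ConnectX-2 should now show up in the list of physical NICs with its driver
esxcfg-nics -l
esxcli network nic list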

Some 10 Gbps tuning and testing looks great between two ESXi 7.0 hosts, each with an MT26448:

[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-120.00 sec   131 GBytes  9380 Mbits/sec    0             sender
[  4]   0.00-120.00 sec   131 GBytes  9380 Mbits/sec                  receiver
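
The output above looks like an iperf3 run; a comparable test can be reproduced roughly like this, assuming iperf3 is available on both endpoints (for example in a test VM on each host):

# Receiver side
iperf3 -s
# Sender side – run for 120 seconds against the receiver's IP
iperf3 -c <receiver-ip> -t 120
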
RDMA support for RoCEv1

Only RoCEv1 is supported, because:

  • RoCEv2 support starts with a newer card, the Mellanox ConnectX-3 Pro
  • The RoCEv2 options are visible in the nmlx4_core driver, but when I enabled enable_rocev2 it did NOT work (see the example after the parameter list below)
[root@esxi~] esxcli system module parameters list -m nmlx4_core
Name                    Type  Value  Description
----------------------  ----  -----  -----------
enable_64b_cqe_eqe      int          Enable 64 byte CQEs/EQEs when the the FW supports this
enable_dmfs             int          Enable Device Managed Flow Steering
enable_qos              int          Enable Quality of Service support in the HCA
enable_rocev2           int          Enable RoCEv2 mode for all devices
enable_vxlan_offloads   int          Enable VXLAN offloads when supported by NIC
log_mtts_per_seg        int          Log2 number of MTT entries per segment
log_num_mgm_entry_size  int          Log2 MGM entry size, that defines the number of QPs per MCG, for example: value 10 results in 248 QP per MGM entry
msi_x                   int          Enable MSI-X
mst_recovery            int          Enable recovery mode(only NMST module is loaded)
rocev2_udp_port         int          Destination port for RoCEv2
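
For reference, this is roughly how the parameter can be enabled (a sketch – on the ConnectX-2 it did not bring RoCEv2 up for me):

# Set the module parameter (takes effect on the next module load / reboot)
esxcli system module parameters set -m nmlx4_core -p "enable_rocev2=1"
# Verify the configured value
esxcli system module parameters list -m nmlx4_core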

This is officially NOT supported, so use it only in your home lab. But it can save some money on new 10 Gbps network cards.

Cisco UCS Manager Plugin for VMware vSphere HTML Client (Version 3.0(4))

Cisco has released the 3.0(4) version of the Cisco UCS Manager VMware vSphere HTML client plugin. The UCS Manager vSphere HTML client plugin enables a virtualization administrator to view, manage, and monitor the Cisco UCS physical infrastructure. The plugin provides a physical view of the UCS hardware inventory on the HTML client.

Here are the new features in Release 3.0(4):

  • Support for VMware vSphere HTML Client 7.0
  • Includes a defect fix

Resolved Bugs

Defect ID | Symptom | First Affected in Release | Resolved in Release
CSCvv01618 | UCSM domain not visible in HTML plug-in 3.0(2). | 3.0(2) | 3.0(4)
More info here.

FIX: The virtual machine is configured for too much PMEM. 6.0 TB main memory and x PMEM exceeds the maximum allowed 6.0 TB

VMware supports Intel Optane PMem with vSphere 6.7 and 7.0. VMware and Intel worked with SAP to complete the validation of SAP HANA with Optane PMem enabled VMs.

For PoC testing I configured a VM with 1 TB RAM and 1 TB PMem, but was unable to power it on. Error: The virtual machine is configured for too much PMem. 6.0 TB main memory and 1024 GB PMem exceeds the maximum allowed 6.0 TB.

The problem was that Memory Hot Plug was enabled, because the limit is calculated differently:

With Memory Hot Plug enabled
* size of RAM x 16 + size of PMem must be <= 6 TB limit

With Memory Hot Plug disabled
* size of RAM + size of PMem must be <= 6 TB limit

SAP HANA does not support hot-add memory. Because of this, hot-add memory was not validated by SAP and VMware with SAP HANA and is therefore not supported, according to SAP HANA on VMware vSphere.

Example of how to reproduce the error:

6 TB limit
1 TB PMem
(6 TB - 1 TB PMem) = 5 TB of address space left for RAM x 16
5 * 1024 GB / 16 = 320 GB maximum RAM

With Memory Hot Plug enabled I could not start the VM with more than 320 GB of RAM – 321 GB already fails.