The lsdoctor tool is designed to help diagnose and resolve common issues related to the VMware vCenter Lookup Service. Here’s a quick overview of how to install, launch, and utilize its various functions effectively.
🛠️ Installation
To get started with lsdoctor, download the ZIP file provided and transfer it to the target node using a file transfer tool like WinSCP. If you encounter issues connecting to a vCenter Appliance using WinSCP, refer to VMware’s documentation for troubleshooting.
Steps:
Transfer the ZIP file to your vCenter node.
Extract the ZIP file:
VCSA (vCenter Server Appliance):
unzip lsdoctor.zip
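Once extracted, it is worth confirming the script runs and listing its options before making any changes. A minimal sketch, assuming the ZIP extracts to a lsdoctor-master folder (the folder name may differ depending on the version you downloaded) and that the script exposes a standard --help option:
cd lsdoctor-master
python lsdoctor.py --help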
Key Functions of lsdoctor
The lsdoctor tool comes with various options for checking and fixing issues in the vCenter Lookup Service:
--lscheck (-l): Checks for common issues without making changes.
Usage: python lsdoctor.py -l
Follow-up: Review the JSON report for findings.
--pscHaUnconfigure (-p): Removes a PSC High Availability configuration.
Usage: python lsdoctor.py -p
Follow-up: Restart services and repoint your vCenter servers.
--stalefix (-s): Cleans up stale configurations from older upgrades.
Usage: python lsdoctor.py -s
Follow-up: Restart services and re-register external solutions.
--trustfix (-t): Resolves SSL trust issues in the Lookup Service.
Usage: python lsdoctor.py -t
Follow-up: Restart services on all nodes.
--solutionusers (-u): Recreates solution users for the node.
Usage: python lsdoctor.py -u
Follow-up: Restart services on the node.
--rebuild (-r): Rebuilds service registrations for the node.
Usage: python lsdoctor.py -r
Follow-up: Restart services and re-register external solutions.
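Most of the follow-up steps above involve restarting vCenter services. On a VCSA this is typically done with service-control from the appliance shell; a sketch (run it during a maintenance window, since it stops every service on the node):
service-control --stop --all
service-control --start --all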
VMware has released an important security advisory, VMSA-2024-0019, detailing updates for VMware vCenter Server that address two significant vulnerabilities: a heap-overflow vulnerability (CVE-2024-38812) and a privilege escalation vulnerability (CVE-2024-38813). Both of these vulnerabilities could have severe implications if exploited, making it crucial for administrators to apply the necessary patches promptly.
Heap-Overflow Vulnerability (CVE-2024-38812)
Description: The first vulnerability, identified as CVE-2024-38812, is a heap-overflow vulnerability found in the vCenter Server’s implementation of the DCERPC protocol. This issue has been classified by VMware as Critical, with a maximum CVSSv3 base score of 9.8, indicating the potential for severe impact.
Known Attack Vectors: A malicious actor with network access to the vCenter Server can exploit this vulnerability by sending a specially crafted network packet. Successful exploitation could lead to remote code execution (RCE), allowing the attacker to execute arbitrary code on the vCenter Server with potentially full system privileges. This level of access could be used to disrupt services, exfiltrate sensitive data, or further compromise the virtual environment.
Privilege Escalation Vulnerability (CVE-2024-38813)
Description: The second vulnerability, CVE-2024-38813, is a privilege escalation flaw within the vCenter Server. VMware has rated this issue as Important, with a CVSSv3 base score of 7.5. While not as severe as the heap-overflow vulnerability, it still poses a significant risk.
Known Attack Vectors: An attacker with network access to the vCenter Server can exploit this vulnerability by sending a specially crafted network packet. If successful, the attacker could escalate their privileges to root, gaining full administrative control over the vCenter Server. This level of access could enable the attacker to make unauthorized changes, access sensitive information, or disrupt the entire virtual infrastructure.
Exploring the vSphere environment, I’ve found that configuring a large Host Cache with VMFS Datastore can significantly extend reboot times.
It’s a delicate balance of performance gains versus system availability. For an in-depth look at my findings and the impact on your VMware setup, stay tuned.
/usr/lib/vmware/secureboot/bin/secureBoot.py -h
usage: secureBoot.py [-h] [-a | -c | -s]
optional arguments:
-h, --help show this help message and exit
-a, --acceptance-level-check
Validate acceptance levels for installed vibs
-c, --check-capability
Check if the host is ready to enable secure boot
-s, --check-status Check if UEFI secure boot is enabled
To check if the host is ready to enable secure boot, run:
/usr/lib/vmware/secureboot/bin/secureBoot.py -c
Secure boot can be enabled: All vib signatures verified. All tardisks validated. All acceptance levels validated
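To confirm whether UEFI secure boot is actually enabled on the running host (not just possible), use the status option from the same help output:
/usr/lib/vmware/secureboot/bin/secureBoot.py -s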
After installing Windows Server 2022 update KB5022842 (OS Build 20348.1547), the guest OS cannot boot when the virtual machine is configured with secure boot enabled and runs on vSphere ESXi 6.7 U2/U3 or vSphere ESXi 7.0.x.
In the VM's vmware.log, there is an 'Image DENIED' entry like the one below:
2023-02-15T05:34:31.379Z In(05) vcpu-0 - SECUREBOOT: Signature: 0 in db, 0 in dbx, 1 unrecognized, 0 unsupported alg.
2023-02-15T05:34:31.379Z In(05) vcpu-0 - Hash: 0 in db, 0 in dbx.
2023-02-15T05:34:31.379Z In(05) vcpu-0 - SECUREBOOT: Image DENIED.
To identify the location of vmware.log files:
Establish an SSH session to your ESXi host and log in to the host CLI using the root account.
To list the locations of the configuration files for the virtual machines registered on the host, run the following command:
# vim-cmd vmsvc/getallvms | grep -i "VM_Name"
The vmware.log file is located in the virtual machine's folder along with the .vmx file.
Record the location of the .vmx configuration file for the virtual machine you are troubleshooting.
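Once you have the virtual machine's folder from the command above, you can check its vmware.log directly for the denial entry. A sketch; the datastore and folder names below are placeholders to substitute with the path returned by vim-cmd:
grep -i "Image DENIED" /vmfs/volumes/<datastore>/<VM_Name>/vmware.log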
Currently there is no resolution for virtual machines running on vSphere ESXi 6.7 U2/U3 and vSphere ESXi 7.0.x. However, the issue does not exist for virtual machines running on vSphere ESXi 8.0.x.
VMware strongly advises that you move away completely from using SD card/USB as a boot device option on any future server hardware.
SD cards can continue to be used for the bootbank partition provided that a separate persistent local device to store the OSDATA partition (32GB min., 128GB recommended) is available in the host. Preferably, the SD cards should be replaced with an M.2 or another local persistent device as the standalone boot option.
| Namespace | Command | Description |
| --- | --- | --- |
| daemon control | restart | Restart the daemons for the specified solution ID. |
| daemon control | start | Start the daemons for the specified solution ID. |
| daemon control | stop | Stop the daemons for the specified DSDK built solution. |
| daemon info | get | Get running daemon status for the specified solution ID. |
| daemon info | list | List the installed DSDK built daemons. |
| hardware pci pcipassthru | list | Display PCI device passthru configuration. |
| hardware pci pcipassthru | set | Configure PCI device for passthrough. |
| network nic attachment | add | Attach one uplink as a branch to a trunk uplink with specified VLAN ID. |
| network nic attachment | list | Show uplink attachment information. |
| network nic attachment | remove | Detach a branch uplink from its trunk. |
| network nic dcb status | get | Get the DCB information for a NIC. |
| network nic hwCap activated | list | List activated hardware capabilities of physical NICs. |
| network nic hwCap supported | list | List supported hardware capabilities of physical NICs. |
| nvme adapter | list | List all NVMe adapters. |
| nvme controller | identify | Get NVMe Identify Controller data. |
| nvme controller | list | List all NVMe controllers. |
| nvme fabrics | connect | Connect to an NVMe controller on a specified target through an adapter. |
| nvme fabrics connection | delete | Delete persistent NVMe over Fabrics connection entries. Reboot required for settings to take effect. |
| nvme fabrics connection | list | List all persistent NVMe over Fabrics connection entries. |
| nvme fabrics | disable | Disable NVMe over Fabrics for a transport protocol. |
| nvme fabrics | disconnect | Disconnect a specified NVMe controller on the specified NVMe adapter. |
| nvme fabrics | discover | Discover NVMe controllers on the specified target port through the specified NVMe adapter and list all of them. |
| nvme fabrics | enable | Enable NVMe over Fabrics for a transport protocol. |
| nvme info | get | Get NVMe host information. |
| nvme namespace | identify | Get NVMe Identify Namespace data. |
| nvme namespace | list | List all NVMe namespaces. |
| rdma iser params | set | Change iSER kernel driver settings. |
| software addon | get | Display the installed Addon on the host. |
| software | apply | Applies a complete image with a software spec that specifies base image, addon and components to install on the host. |
| software baseimage | get | Display the installed baseimage on the host. |
| software component | apply | Installs Component packages from a depot. Components may be installed, upgraded. WARNING: If your installation requires a reboot, you need to disable HA first. |
| software component | get | Displays detailed information about one or more installed Components. |
| software component | list | Lists the installed Component packages. |
| software component | remove | Removes components from the host. WARNING: If your installation requires a reboot, you need to disable HA first. |
| software component signature | verify | Verifies the signatures of installed Components and displays the name, version, vendor, acceptance level and the result of signature verification for each of them. |
| software component vib | list | List VIBs in an installed Component. |
| software sources addon | get | Display details about Addons in the depots. |
| software sources addon | list | List all Addons in the depots. |
| software sources baseimage | get | Display details about a Base Image from the depot. |
| software sources baseimage | list | List all the Base Images in a depot. |
| software sources component | get | Displays detailed information about one or more Components in the depot. |
| software sources component | list | List all the Components from depots. |
| software sources component vib | list | List VIB packages in the specified Component in a depot. |
| storage core device smart daemon | start | Enable smartd. |
| storage core device smart daemon status | get | Get status of smartd. |
| storage core device smart daemon | stop | Disable smartd. |
| storage core device smart status | get | Get status of SMART stats on a device. |
| storage core device smart status | set | Enable or disable SMART stats gathering on a device. |
| system ntp config | get | Display Network Time Protocol configuration. |
| system ntp | get | Display Network Time Protocol configuration. |
| system ntp | set | Configures the ESX Network Time Protocol agent. |
| system ptp | get | Display Precision Time Protocol configuration. |
| system ptp | set | Configures the ESX Precision Time Protocol agent. |
| system ptp stats | get | Report operational state of Precision Time Protocol Daemon. |
| vm appinfo | get | Get the state of appinfo component on the ESXi host. |
| vm appinfo | set | Modify the appinfo component on the ESXi host. |
| vsan network security | get | Get vSAN network security configurations. |
| vsan network security | set | Configure vSAN network security settings. |
The ESXCLI command set allows you to run common system administration commands against vSphere systems from an administration server of your choice. The actual list of commands depends on the system that you are running on. Run esxcli --help for a list of commands on your system.
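For example, this is how two of the commands from the table above are typed at the shell; the esxcli prefix simply goes in front of the namespace and command:
esxcli system ntp get
esxcli nvme adapter list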
I upgraded vCenter to version 7 successfully, but failed when it came to updating my hosts from 6.7 to 7.
I got a warning stating that PCI devices were incompatible, but tried anyway. That failed: my Mellanox ConnectX-2 wasn't showing up as an available physical NIC.
First, I needed the VID/DID device codes for the MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s]:
| Partner | Product | Driver | VID | DID |
| --- | --- | --- | --- | --- |
| Mellanox | MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s] | mlx4_core | 15b3 | 6750 |
The whole table can be checked here, or search for "mlx" to see the full list of Mellanox cards.
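You can also read the VID/DID directly from the host instead of the HCL table. A sketch: esxcli hardware pci list prints the Vendor ID and Device ID for every PCI device, so you still have to find the Mellanox entry in the output.
esxcli hardware pci list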
Deprecated devices supported by VMKlinux drivers
These are devices that were supported in 6.7 or earlier only by a VMKlinux inbox driver. They are no longer supported because all support for VMKlinux drivers and their devices has been completely removed in 7.0.
*********************************************************************
/vmfs/volumes/ISO/tmp-network/etc/vmware/default.map.d/nmlx4_core.map
*********************************************************************
regtype=native,bus=pci,id=15b301f6..............,driver=nmlx4_core
regtype=native,bus=pci,id=15b301f8..............,driver=nmlx4_core
regtype=native,bus=pci,id=15b31003..............,driver=nmlx4_core
regtype=native,bus=pci,id=15b31004..............,driver=nmlx4_core
regtype=native,bus=pci,id=15b31007..............,driver=nmlx4_core
regtype=native,bus=pci,id=15b3100715b30003......,driver=nmlx4_core
regtype=native,bus=pci,id=15b3100715b30006......,driver=nmlx4_core
regtype=native,bus=pci,id=15b3100715b30007......,driver=nmlx4_core
regtype=native,bus=pci,id=15b3100715b30008......,driver=nmlx4_core
regtype=native,bus=pci,id=15b3100715b3000c......,driver=nmlx4_core
regtype=native,bus=pci,id=15b3100715b3000d......,driver=nmlx4_core
regtype=native,bus=pci,id=15b36750..............,driver=nmlx4_core
-------------------------> The last line above is the fix.
Then add HW ID support in the nmlx4_core.ids file:
**************************************************************************************
/vmfs/volumes/FreeNAS/ISO/tmp-network/usr/share/hwdata/default.pciids.d/nmlx4_core.ids
**************************************************************************************
#
# This file is mechanically generated. Any changes you make
# manually will be lost at the next build.
#
# Please edit <driver>_devices.py file for permanent changes.
#
# Vendors, devices and subsystems.
#
# Syntax (initial indentation must be done with TAB characters):
#
# vendor vendor_name
# device device_name <-- single TAB
# subvendor subdevice subsystem_name <-- two TABs
15b3 Mellanox Technologies
	01f6 MT27500 [ConnectX-3 Flash Recovery]
	01f8 MT27520 [ConnectX-3 Pro Flash Recovery]
	1003 MT27500 Family [ConnectX-3]
	1004 MT27500/MT27520 Family [ConnectX-3/ConnectX-3 Pro Virtual Function]
	1007 MT27520 Family [ConnectX-3 Pro]
		15b3 0003 ConnectX-3 Pro VPI adapter card; dual-port QSFP; FDR IB (56Gb/s) and 40GigE (MCX354A-FCC)
		15b3 0006 ConnectX-3 Pro EN network interface card 40/56GbE dual-port QSFP (MCX314A-BCCT)
		15b3 0007 ConnectX-3 Pro EN NIC; 40GigE; dual-port QSFP (MCX314A-BCC)
		15b3 0008 ConnectX-3 Pro VPI adapter card; single-port QSFP; FDR IB (56Gb/s) and 40GigE (MCX353A-FCC)
		15b3 000c ConnectX-3 Pro EN NIC; 10GigE; dual-port SFP+ (MCX312B-XCC)
		15b3 000d ConnectX-3 Pro EN network interface card; 10GigE; single-port SFP+ (MCX311A-XCC)
	6750 Mellanox ConnectX-2 Dual Port 10GbE
-------------------------> The last line above is the fix.
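If you prefer making both edits from the shell rather than a text editor, a sketch using the paths shown above (note the .ids entry must be TAB-indented, per the syntax comment in the file header):
echo 'regtype=native,bus=pci,id=15b36750..............,driver=nmlx4_core' >> /vmfs/volumes/ISO/tmp-network/etc/vmware/default.map.d/nmlx4_core.map
printf '\t6750 Mellanox ConnectX-2 Dual Port 10GbE\n' >> /vmfs/volumes/FreeNAS/ISO/tmp-network/usr/share/hwdata/default.pciids.d/nmlx4_core.ids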
After a reboot I could see support for the MT26448 [ConnectX EN 10GigE, PCIe 2.0 5GT/s].
The only complaint was an ALERT: Failed to verify signatures of the following vib(s): [nmlx4-core].
2020-XX-XXTXX:XX:44.473Z cpu0:2097509)ALERT: Failed to verify signatures of the following vib(s): [nmlx4-core]. All tardisks validated
2020-XX-XXTXX:XX:47.909Z cpu1:2097754)Loading module nmlx4_core ...
2020-XX-XXTXX:XX:47.912Z cpu1:2097754)Elf: 2052: module nmlx4_core has license BSD
2020-XX-XXTXX:XX:47.921Z cpu1:2097754)<NMLX_INF> nmlx4_core: init_module called
2020-XX-XXTXX:XX:47.921Z cpu1:2097754)Device: 194: Registered driver 'nmlx4_core' from 42
2020-XX-XXTXX:XX:47.921Z cpu1:2097754)Mod: 4845: Initialization of nmlx4_core succeeded with module ID 42.
2020-XX-XXTXX:XX:47.921Z cpu1:2097754)nmlx4_core loaded successfully.
2020-XX-XXTXX:XX:47.951Z cpu1:2097754)<NMLX_INF> nmlx4_core: 0000:05:00.0: nmlx4_core_Attach - (nmlx4_core_main.c:2476) running
2020-XX-XXTXX:XX:47.951Z cpu1:2097754)DMA: 688: DMA Engine 'nmlx4_core' created using mapper 'DMANull'.
2020-XX-XXTXX:XX:47.951Z cpu1:2097754)DMA: 688: DMA Engine 'nmlx4_core' created using mapper 'DMANull'.
2020-XX-XXTXX:XX:47.951Z cpu1:2097754)DMA: 688: DMA Engine 'nmlx4_core' created using mapper 'DMANull'.
2020-XX-XXTXX:XX:49.724Z cpu1:2097754)<NMLX_INF> nmlx4_core: 0000:05:00.0: nmlx4_ChooseRoceMode - (nmlx4_core_main.c:382) Requested RoCE mode RoCEv1
2020-XX-XXTXX:XX:49.724Z cpu1:2097754)<NMLX_INF> nmlx4_core: 0000:05:00.0: nmlx4_ChooseRoceMode - (nmlx4_core_main.c:422) Requested RoCE mode is supported - choosing RoCEv1
2020-XX-XXTXX:XX:49.934Z cpu1:2097754)<NMLX_INF> nmlx4_core: 0000:05:00.0: nmlx4_CmdInitHca - (nmlx4_core_fw.c:1408) Initializing device with B0 steering support
2020-XX-XXTXX:XX:50.561Z cpu1:2097754)<NMLX_INF> nmlx4_core: 0000:05:00.0: nmlx4_InterruptsAlloc - (nmlx4_core_main.c:1744) Granted 38 MSIX vectors
2020-XX-XXTXX:XX:50.561Z cpu1:2097754)<NMLX_INF> nmlx4_core: 0000:05:00.0: nmlx4_InterruptsAlloc - (nmlx4_core_main.c:1766) Using MSIX
2020-XX-XXTXX:XX:50.781Z cpu1:2097754)Device: 330: Found driver nmlx4_core for device 0xxxxxxxxxxxxxxxxxxxxxxx
Some 10 Gbps tuning tests between two ESXi 7.0 hosts with two MT26448 cards look great.
RoCEv2 support starts with the next card up, the Mellanox ConnectX-3 Pro.
We can see RoCEv2 options in the nmlx4_core driver, but when I enabled enable_rocev2 it did NOT work (see the example after the parameter list below):
[root@esxi~] esxcli system module parameters list -m nmlx4_core
Name Type Value Description
---------------------- ---- ----- -----------
enable_64b_cqe_eqe int Enable 64 byte CQEs/EQEs when the the FW supports this
enable_dmfs int Enable Device Managed Flow Steering
enable_qos int Enable Quality of Service support in the HCA
enable_rocev2 int Enable RoCEv2 mode for all devices
enable_vxlan_offloads int Enable VXLAN offloads when supported by NIC
log_mtts_per_seg int Log2 number of MTT entries per segment
log_num_mgm_entry_size int Log2 MGM entry size, that defines the number of QPs per MCG, for example: value 10 results in 248 QP per MGM entry
msi_x int Enable MSI-X
mst_recovery int Enable recovery mode(only NMST module is loaded)
rocev2_udp_port int Destination port for RoCEv2
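For reference, this is how the parameter was toggled in my lab. A sketch: the parameter name comes from the listing above, and the host typically needs a reboot before a changed module parameter takes effect.
esxcli system module parameters set -m nmlx4_core -p "enable_rocev2=1"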
It is officially NOT supported; use it only in your home lab. But it can save the money you would otherwise spend on new 10 Gbps network cards.
Cisco has released the 3.0(4) version of the Cisco UCS Manager VMware vSphere HTML client plugin. The UCS Manager vSphere HTML client plugin enables a virtualization administrator to view, manage, and monitor the Cisco UCS physical infrastructure. The plugin provides a physical view of the UCS hardware inventory on the HTML client.