DELL R610 or R710: How to Convert an H200A to H200I for Dedicated Slot Use

For my project involving the AI tool llama.cpp, I needed to free up a PCI slot for an NVIDIA Tesla P40 GPU. I found an excellent guide and a useful video from ArtOfServer.

Based on this helpful video from ArtOfServer:

ArtOfServer wrote a small tutorial on how to modify an H200A (external) into an H200I (internal) so it can be used in the dedicated storage slot (e.g. instead of a PERC 6/i).


Install the compiler and build tools (they can be removed later):

# apt install build-essential unzip

Compile and install lsirec and lsiutil:

# mkdir lsi
# cd lsi
# wget https://github.com/marcan/lsirec/archive/master.zip
# wget https://github.com/exactassembly/meta-xa-stm/raw/master/recipes-support/lsiutil/files/lsiutil-1.72.tar.gz
# tar -zxvvf lsiutil-1.72.tar.gz
# unzip master.zip
# cd lsirec-master
# make
# chmod +x sbrtool.py
# cp -p lsirec /usr/bin/
# cp -p sbrtool.py /usr/bin/
# cd ../lsiutil
# make -f Makefile_Linux
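
If you also want lsiutil available system-wide, you can copy the freshly built binary as well (an optional step, assuming the build drops an lsiutil binary in the current directory):

# cp -p lsiutil /usr/bin/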

Modify SBR to match an internal H200I

Get bus address:

# lspci -Dmmnn | grep LSI
0000:05:00.0 "Serial Attached SCSI controller [0107]" "LSI Logic / Symbios Logic [1000]" "SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [0072]" -r03 "Dell [1028]" "6Gbps SAS HBA Adapter [1f1c]"

Bus address 0000:05:00.0

We are going to change the subsystem device ID from 0x1f1c to 0x1f1e.

Unbind and halt card:

# lsirec 0000:05:00.0 unbind
Trying unlock in MPT mode...
Device in MPT mode
Kernel driver unbound from device
# lsirec 0000:05:00.0 halt
Device in MPT mode
Resetting adapter in HCB mode...
Trying unlock in MPT mode...
Device in MPT mode
IOC is RESET

Read sbr:

# lsirec 0000:05:00.0 readsbr h200.sbr
Device in MPT mode
Using I2C address 0x54
Using EEPROM type 1
Reading SBR...
SBR saved to h200.sbr

Transform binary sbr to text file:

# sbrtool.py parse h200.sbr h200.cfg

Modify the SubsysPID on line 9 of the cfg file (e.g. using vi or vim):
from this:
SubsysPID = 0x1f1c
to this:
SubsysPID = 0x1f1e

Important: if the cfg file contains a line with:
SASAddr = 0xfffffffffffff
remove it!
Save and close the file.
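
If you prefer to script the edit instead of using vi, a minimal sketch with sed (assuming the stock values shown above) looks like this:

# sed -i 's/SubsysPID = 0x1f1c/SubsysPID = 0x1f1e/' h200.cfg
# sed -i '/SASAddr = 0xfffffffffffff/d' h200.cfg
# grep SubsysPID h200.cfg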

Build new sbr file:

# sbrtool.py build h200.cfg h200-int.sbr

Write it back to card:

# lsirec 0000:05:00.0 writesbr h200-int.sbr
Device in MPT mode
Using I2C address 0x54
Using EEPROM type 1
Writing SBR...
SBR written from h200-int.sbr

Reset the card and rescan the bus:

# lsirec 0000:05:00.0 reset
Device in MPT mode
Resetting adapter...
IOC is RESET
IOC is READY
# lsirec 0000:05:00.0 info
Trying unlock in MPT mode...
Device in MPT mode
Registers:
DOORBELL: 0x10000000
DIAG: 0x000000b0
DCR_I2C_SELECT: 0x80030a0c
DCR_SBR_SELECT: 0x2100001b
CHIP_I2C_PINS: 0x00000003
IOC is READY
# lsirec 0000:05:00.0 rescan
Device in MPT mode
Removing PCI device...
Rescanning PCI bus...
PCI bus rescan complete.

Verify new id (H200I):

# lspci -Dmmnn | grep LSI
0000:05:00.0 "Serial Attached SCSI controller [0107]" "LSI Logic / Symbios Logic [1000]" "SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] [0072]" -r03 "Dell [1028]" "PERC H200 Integrated [1f1e]"

You can now move the card to the dedicated slot 🙂

Thanks to ArtOfServer for a great video.

Host Cache can significantly extend reboot times

Exploring the vSphere environment, I’ve found that configuring a large Host Cache with VMFS Datastore can significantly extend reboot times.

It’s a delicate balance of performance gains versus system availability. For an in-depth look at my findings and the impact on your VMware setup, stay tuned.
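
To see how host cache and system swap are currently configured on a host, you can run a quick check from the ESXi shell (a sketch; the reported fields vary by build):

[~] esxcli sched swap system get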


How to Configure NVMe/TCP with vSphere 8.0 Update 1 and ONTAP 9.13.1 for VMFS Datastores

vSphere 8U1 – Deep dive on configuring NVMe-oF (Non-Volatile Memory Express over Fabrics) for VMware vSphere datastores.
What’s new

With vSphere 8.0 update 1, VMware has completed their journey to a completely native end-to-end NVMe storage stack. Prior to 8.0U1, there was a SCSI translation layer which added some complexity to the stack and slightly decreased some of the efficiencies inherent in the NVMe protocol.

ONTAP 9.12.1 added support for secure authentication over NVMe/TCP as well as increasing NVMe limits (viewable on the NetApp Hardware Universe [HWU]).

For more information, see the source blog post: How to Configure NVMe/TCP with vSphere 8.0 Update 1 and ONTAP 9.13.1 for VMFS Datastores
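
As a quick orientation on the ESXi side, these inventory commands show what the host already sees over NVMe (a sketch; the discovery and connect steps use the esxcli nvme fabrics namespace, whose exact flags differ by release, so check its built-in help or the post above):

[~] esxcli nvme adapter list
[~] esxcli nvme controller list
[~] esxcli nvme namespace list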

Deprecation of legacy BIOS support in vSphere 8.0 (84233) + Booting vSphere ESXi 8.0 may fail with “Error 10 (Out of resources)” (89682)

UCSX-TPM2-002 Trusted Platform Module 2.0 for UCS servers

    Personally, here are my recommendations for new ESXi 8.0 installations (see the quick check sketched after this list):

    • Use UEFI boot for new installations (legacy BIOS support is deprecated)
    • When purchasing new servers, choose models with TPM 2.0
    • When upgrading to ESXi 8.0, verify that UEFI boot is enabled
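
    As a quick readiness check from the ESXi shell (a sketch; output varies by hardware and release), you can confirm TPM presence and the secure boot enforcement setting:

    [~] esxcli hardware trustedboot get
    [~] esxcli system settings encryption get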

    Booting vSphere ESXi 8.0 may fail with “Error 10 (Out of resources)” (89682)

    • Hardware machine is configured to boot in legacy BIOS mode.
    • Booting stops early in the boot process with messages displayed in red on black with wording similar to “Error 10 (Out of resources) while loading module”, “Requested malloc size failed”, or “No free memory”.

    VMware’s recommended workaround is to transition the machine to UEFI boot mode permanently, as discussed in KB article 84233. There will not be a future ESXi change to allow legacy BIOS to work on this machine again.

    Deprecation of legacy BIOS support in vSphere (84233)

    VMware plans to deprecate support for legacy BIOS in server platforms.

    If you upgrade a server that was certified and running successfully with legacy BIOS to a newer release of ESXi, it is possible the server will no longer function with that release. For example, some servers may fail to boot with an “Out of resources” message because the newer ESXi release is too large to boot in legacy BIOS mode. Generally, VMware will not provide any fix or workaround for such issues besides switching the server to UEFI boot mode.

    Motivation

    UEFI provides several advantages over legacy BIOS and aligns with VMware’s goal of being “secure by default”. UEFI enables:

    • UEFI Secure Boot, a security standard that helps ensure that the server boots using only software that is trusted by the server manufacturer.
    • Automatic update of the system boot order during ESXi installation.
    • Persistent memory
    • TPM 2.0
    • Intel SGX Registration
    • Upcoming support for DPU/SmartNIC
    Securing ESXi Hosts with Trusted Platform Module
    vSphere 6.7 Support for ESXi and TPM 2.0

    List of vSphere 8.0 Knowledge base articles and Important Links (89756)

    List of Knowledge base articles for vSphere 8.0 – [Main KB] – List of vSphere 8.0 Knowledge base articles and Important Links (89756)

    “SECUREBOOT: Image DENIED.” – Virtual Machine with Windows Server 2022 KB5022842 (OS Build 20348.1547) configured with secure boot enabled not booting up (90947)

    Reference error “SECUREBOOT: Image DENIED.” for Linux VMs

    Important KB90947 Symptoms

    After installing Windows Server 2022 update KB5022842 (OS Build 20348.1547), the guest OS cannot boot when the virtual machine is configured with secure boot enabled and runs on vSphere ESXi 6.7 U2/U3 or vSphere ESXi 7.0.x.

    In the VM’s vmware.log, there is an ‘Image DENIED’ entry like the one below:

    2023-02-15T05:34:31.379Z In(05) vcpu-0 - SECUREBOOT: Signature: 0 in db, 0 in dbx, 1 unrecognized, 0 unsupported alg.
    2023-02-15T05:34:31.379Z In(05) vcpu-0 - Hash: 0 in db, 0 in dbx.
    2023-02-15T05:34:31.379Z In(05) vcpu-0 - SECUREBOOT: Image DENIED.
    To identify the location of the vmware.log files (a quick grep check is sketched after these steps):
    1. Establish an SSH session to your ESXi host.
    2. Log in to the ESXi host CLI using the root account.
    3. To list the locations of the configuration files for the virtual machines registered on the host, run the below command:
    # vim-cmd vmsvc/getallvms | grep -i "VM_Name"
    4. The vmware.log file is located in the virtual machine folder along with the .vmx file.
    5. Record the location of the .vmx configuration file for the virtual machine you are troubleshooting. For example:
    /vmfs/volumes/xxxxxxxx-xxxxxxx-c1d2-111122223333/vm1/vm1.vmx
    /vmfs/volumes/xxxxxxxx-xxxxxxx-c1d2-111122223333/vm1/vmware.log
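
    To quickly confirm the symptom in an existing log, you can grep it directly (a minimal sketch; <datastore> is a placeholder for the datastore path recorded above):

    # grep -i "SECUREBOOT" /vmfs/volumes/<datastore>/vm1/vmware.log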

    Resolution

    Currently there is no resolution for virtual machines running on vSphere ESXi 6.7 U2/U3 and vSphere ESXi 7.0.x. However, the issue does not exist for virtual machines running on vSphere ESXi 8.0.x.

    Note: vSphere ESXi 6.7 has reached End of General Support. For more information, see The End of General Support for vSphere 6.5 and vSphere 6.7 is October 15, 2022.

    Workaround

    There are three methods to avoid this issue:

    1. Upgrade the ESXi Host where the virtual machine in question is running to vSphere ESXi 8.0
    2. Disable “Secure Boot” on the VMs.
    3. Do not install the KB5022842 patch on any Windows 2022 Server virtual machine until the issue is resolved.

    See the Microsoft article for details on the updates within the patch release

    To disable the virtual machine “Secure Boot” option, follow the steps below:

    1. Power off the VM.
    2. Right-click the virtual machine and click Edit Settings.
    3. Click the VM Options tab.
    4. Under Boot Options, uncheck “Secure Boot enabled”.

    Related Information

    Uninstalling the KB5022842 patch will not resolve the issue. If the virtual machine has already been updated, then the only available options are:

    1. Upgrade the ESXi Host where the virtual machine in question is running to vSphere ESXi 8.0
    2. Disable “Secure Boot” on the VMs.

    VMware ESXi and Intel Optane NVMe – intelmas firmware update

    How to install intelmas tool

    [~] esxcli software component apply -d /vmfs/volumes/SSD/_ISO/intel-mas-tool_2.2.18-1OEM.700.0.0.15843807_20956742.zip
    Installation Result
       Components Installed: intel-mas-tool_2.2.18-1OEM.700.0.0.15843807
       Components Removed:
       Components Skipped:
       Message: Operation finished successfully.
       Reboot Required: false

    General information about the disk

    [~] /opt/intel/intelmas/intelmas show -intelssd 1
    
    - 1 Intel(R) Optane(TM) SSD 905P Series PHMB839000LW280IGN -
    
    Bootloader : EB3B0416
    Capacity : 260.83 GB (280,065,171,456 bytes)
    DevicePath : nvmeMgmt-nvmhba5
    DeviceStatus : Healthy
    Firmware : E201HPS2
    FirmwareUpdateAvailable : The selected drive contains current firmware as of this tool release.
    Index : 1
    MaximumLBA : 547002287
    ModelNumber : INTEL SSDPED1D280GAH
    NamespaceId : 1
    PercentOverProvisioned : 0.00
    ProductFamily : Intel(R) Optane(TM) SSD 905P Series
    SMARTEnabled : True
    SectorDataSize : 512
    SerialNumber : PHMB839000LW280IGN

    S.M.A.R.T information

    [~] /opt/intel/intelmas/intelmas show -nvmelog SmartHealthInfo -intelssd 1
    
    -  PHMB839000LW280IGN -
    
    - NVMeLog SMART and Health Information -
    
    Volatile memory backup device has failed : False
    Temperature has exceeded a critical threshold : False
    Temperature - Celsius : 30
    Media is in a read-only mode : False
    Power On Hours : 0x0100
    Power Cycles : 0x03
    Number of Error Info Log Entries : 0x0
    Controller Busy Time : 0x0
    Available Spare Space has fallen below the threshold : False
    Percentage Used : 0
    Critical Warnings : 0
    Data Units Read : 0x02
    Available Spare Threshold Percentage : 0
    Data Units Written : 0x0
    Unsafe Shutdowns : 0x0
    Host Write Commands : 0x0
    Device reliability has degraded : False
    Available Spare Normalized percentage of the remaining spare capacity available : 100
    Media Errors : 0x0
    Host Read Commands : 0x017F

    Show all the SMART properties for the Intel® SSD at index 1

    [~] /opt/intel/intelmas/intelmas show  -intelssd 1 -smart
    
    - SMART Attributes PHMB839000LW280IGN -
    
    - B8 -
    
    Action : Pass
    Description : End-to-End Error Detection Count
    ID : B8
    Normalized : 100
    Raw : 0
    
    - C7 -
    
    Action : Pass
    Description : CRC Error Count
    ID : C7
    Normalized : 100
    Raw : 0
    
    - E2 -
    
    Action : Pass
    Description : Timed Workload - Media Wear
    ID : E2
    Normalized : 100
    Raw : 0
    
    - E3 -
    
    Action : Pass
    Description : Timed Workload - Host Read/Write Ratio
    ID : E3
    Normalized : 100
    Raw : 0
    
    - E4 -
    
    Action : Pass
    Description : Timed Workload Timer
    ID : E4
    Normalized : 100
    Raw : 0
    
    - EA -
    
    Action : Pass
    Description : Thermal Throttle Status
    ID : EA
    Normalized : 100
    Raw : 0
    ThrottleStatus : 0 %
    ThrottlingEventCount : 0
    
    - F0 -
    
    Action : Pass
    Description : Retry Buffer Overflow Count
    ID : F0
    Normalized : 100
    Raw : 0
    
    - F3 -
    
    Action : Pass
    Description : PLI Lock Loss Count
    ID : F3
    Normalized : 100
    Raw : 0
    
    - F5 -
    
    Action : Pass
    Description : Host Bytes Written
    ID : F5
    Normalized : 100
    Raw : 0
    Raw (Bytes) : 0
    
    - F6 -
    
    Action : Pass
    Description : System Area Life Remaining
    ID : F6
    Normalized : 100
    Raw : 0

    Disk firmware update

    [~] /opt/intel/intelmas/intelmas load -intelssd 1
    WARNING! You have selected to update the drives firmware!
    Proceed with the update? (Y|N): Y
    Checking for firmware update...
    
    - Intel(R) Optane(TM) SSD 905P Series PHMB839000LW280IGN -
    
    Status : The selected drive contains current firmware as of this tool release.
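
    After the update (and a reboot, if required), you can re-check the firmware the tool reports, reusing the show command from above filtered with grep (a minimal sketch):

    [~] /opt/intel/intelmas/intelmas show -intelssd 1 | grep Firmware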

    How to add Maxtang’s NX 6412 NUC to a vDS? Fix script /etc/rc.local.d/local.sh

    How to fix the network after adding the host to a vDS: when you add the NX6412 to a vDS and reboot ESXi, the vDS has no uplink, because the USB NIC binding does not persist. You can check it with:

    # esxcfg-vswitch -l
    DVS Name         Num Ports   Used Ports  Configured Ports  MTU     Uplinks
    vDS              2560        6           512               9000    vusb0
    --cut
      DVPort ID                               In Use      Client
      468                                     0           
      469                                     0
      470                                     0
      471                                     0

    Note the DVPort ID (468 in this example); vDS is the name of your distributed switch.

    esxcfg-vswitch -P vusb0 -V 468 vDS

    It is necessary to add it to /etc/rc.local.d/local.sh before exit 0. You can use a similar script from the source Persisting USB NIC Bindings:

    # Wait up to 200 seconds for the USB NIC link to come up
    vusb0_status=$(esxcli network nic get -n vusb0 | grep 'Link Status' | awk '{print $NF}')
    count=0
    while [[ $count -lt 20 && "${vusb0_status}" != "Up" ]]
    do
        sleep 10
        count=$(( $count + 1 ))
        vusb0_status=$(esxcli network nic get -n vusb0 | grep 'Link Status' | awk '{print $NF}')
    done
    
    esxcfg-vswitch -R
    # Re-attach vusb0 as the uplink on DVPort 468 of the vDS
    esxcfg-vswitch -P vusb0 -V 468 vDS
    
    exit 0

    What’s the story with Optane?

    I am using an Intel Optane SSD 900P PCIe in my HomeLAB as ZIL/L2ARC drives for TrueNAS, but in July of 2022 Intel announced their intention to wind down the Optane business. I will try to summarize information about Intel Optane from Simon Todd’s presentation.

    My HomeLAB benchmark Optane 900P -TrueNAS ZIL L2ARC with HDD

    Optane helps a lot with IOPS for a RAID of normal HDDs. I reached 2.5 GB/s peak write performance.

    Writer Report – iozone -Raz -b lab.wks -g 1G – Optane 900P, TrueNAS ZIL/L2ARC with HDD (x-axis: file size in KB; z-axis: MB/s)

    We can see great write performance of about 1.7 GB/s for a 40 GB file size set.

    # perftests-nas ; cat iozone.txt
            Run began: Sun Dec 18 08:02:39 2022
    
            Record Size 128 kB
            File size set to 41943040 kB
            Command line used: /usr/local/bin/iozone -r 128 -s 41943040k -i 0 -i 1
            Output is in kBytes/sec
            Time Resolution = 0.000001 seconds.
            Processor cache size set to 1024 kBytes.
            Processor cache line size set to 32 bytes.
            File stride size set to 17 * record size.
    
                  kB  reclen    write  rewrite    read    reread
            41943040     128  1734542  1364683  2413381  2371527
    
    iozone test complete.
    # dd if=/dev/zero of=foo bs=1G count=1
    1+0 records in
    1+0 records out
    1073741824 bytes transferred in 1.517452 secs (707595169 bytes/sec) 707 MB/s
    
    # dd if=/dev/zero of=foo bs=512 count=1000
    1000+0 records in
    1000+0 records out
    512000 bytes transferred in 0.012079 secs (42386853 bytes/sec) 42 MB/s

    Intel® Optane™ Business Update: What Does This Mean for Warranty and Support

    As announced in Intel’s Q2 2022 earnings, after careful consideration, Intel plans to cease future development of our Optane products. We will continue development of Crow Pass on Sapphire Rapids as we engage with key customers to determine their plans to deploy this product. While we believe Optane is a superb technology, it has become impractical to deliver products at the necessary scale as a single-source supplier of Optane technology.

    We are committed to supporting Optane customers and ecosystem partners through the transition of our existing memory and storage product lines through end-of-life. We continue to sell existing Optane products, and support and the 5-year warranty terms from date of sale remain unchanged.

    Get to know Intel® Optane™ technology (source: Simon Todd – vExpert – Intel Webinar Slides)

    What makes Optane SSDs different?

      NAND SSD

      NAND garbage collection requires background writes. NAND SSD block erase process results in slower writes and inconsistent performance.

      Intel® Optane™ technology

      Intel® Optane™ technology does not use garbage collection
      Rapid, in-place writes enable consistently fast response times

      Intel® Optane™ SSDs are different by design: consistent performance, even under heavy write loads (source: Simon Todd – vExpert – Intel Webinar Slides)
      Model                              Dies per channel   Channels   Raw Capacity   Spare Area
      Intel Optane SSD 900p 280GB        3                  7          336 GB         56 GB
      Intel Optane SSD DC P4800X 375GB   4                  7          448 GB         73 GB
      Intel Optane SSD 900p 480GB        5                  7          560 GB         80 GB
      Intel Optane SSD DC P4800X 750GB   8                  7          896 GB         146 GB

      The Optane SSD DC P4800X and the Optane SSD 900p both use the same 7-channel controller, which leads to some unusual drive capacities. The 900p comes with either 3 or 5 memory dies per channel while the P4800X has 4 or 8. All models reserve about 1/6th of the raw capacity for internal use (source).
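
      For example, the 280 GB 900p is built from 336 GB of raw media: 336 GB - 280 GB = 56 GB of spare area, which is roughly 336/6.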

      Intel Optane SSD DC P4800X / 900P Hands-On Review

      Wow, Optane is fast…

      The Intel Optane SSD DC P4800X is slightly faster than the Optane SSD 900p throughout this test, but both are far faster than the flash-based SSDs. (Source)

      Maxtang’s NX 6412 NUC – update ESXi 8.0a

      VMware ESXi 8.0a release was announced:

      How to prepare a custom ESXi 8.0a ISO image for the NX6412 NUC?

      Download these files: VMware-ESXi-8.0a-20842819-depot.zip and ESXi800-VMKUSB-NIC-FLING-61054763-component-20826251.zip (the USB NIC Fling component).

      Run the script below to prepare the custom ISO image; use PowerCLI version 13.0. Problems with the upgrade to PowerCLI 13 can be fixed with the blog post PowerCLI 13 update and installation hurdles on Windows:

      # Add the ESXi 8.0a offline depot and the USB NIC Fling component
      Add-EsxSoftwareDepot .\VMware-ESXi-8.0a-20842819-depot.zip
      Add-EsxSoftwareDepot .\ESXi800-VMKUSB-NIC-FLING-61054763-component-20826251.zip
      # Clone the standard profile, add the Fling driver, and export the result as an offline bundle
      New-EsxImageProfile -CloneProfile "ESXi-8.0a-20842819-standard" -name "ESXi-8.0.0-20842819-USBNIC" -Vendor "vdan.cz"
      Add-EsxSoftwarePackage -ImageProfile "ESXi-8.0.0-20842819-USBNIC" -SoftwarePackage "vmkusb-nic-fling"
      Export-ESXImageProfile -ImageProfile "ESXi-8.0.0-20842819-USBNIC" -ExportToBundle -filepath ESXi-8.0.0-20842819-USBNIC.zip

      Upgrade to ESXi 8.0

      The upgrade may warn with “TPM_VERSION WARNING: Support for TPM version 1.2 is discontinued.” Apply the --no-hardware-warning option to ignore the warning and proceed with the transaction.

      esxcli software profile update -d /vmfs/volumes/datastore1/_ISO/ESXi-8.0.0-20842819-USBNIC.zip -p ESXi-8.0.0-20842819-USBNIC --no-hardware-warning
      Update Result
         Message: The update completed successfully, but the system needs to be rebooted for the changes to be effective.
         Reboot Required: true
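
      After the reboot, you can confirm the new build with a quick sanity check (esxcli system version get is available on any recent ESXi host):

      [~] esxcli system version get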