Cisco UCS M5 Boot Time Enhancements

How do you speed up boot time on Cisco UCS M5 servers?

Adaptive Memory Training

When this token is enabled, the BIOS saves the memory training results (optimized timing/voltage values) along with CPU/memory configuration information and reuses them on subsequent reboots to save boot time. The saved memory training results are used only if the reboot happens within 24 hours of the last save operation. This can be one of the following:

  • Disabled—Adaptive Memory Training is disabled.
  • Enabled—Adaptive Memory Training is enabled.
  • Platform Default—The BIOS uses the value for this attribute contained in the BIOS defaults for the server type and vendor.

BIOS Techlog Level

Enabling this token allows the BIOS Tech log output to be controlled at a more granular level. This reduces the number of BIOS Tech log messages that are redundant or of little use. The option denotes the type of messages included in the BIOS tech log file, and can be one of the following:

  • Minimum – Critical messages will be displayed in the log file.
  • Normal – Warning and loading messages will be displayed in the log file.
  • Maximum – Normal and information related messages will be displayed in the log file.

Note: This option is mainly for internal debugging purposes.

Note: To disable the Fast Boot option, the end user must set the tokens as follows: Adaptive Memory Training to Disabled, BIOS Techlog Level to Normal, and OptionROM Launch Optimization to Disabled.

OptionROM Launch Optimization

The Option ROM launch is controlled at the PCI slot level and is enabled by default. In configurations with a large number of network controllers and storage HBAs that have Option ROMs, all of the Option ROMs may get launched if the PCI slot Option ROM control is enabled for all of them. However, only a subset of controllers may be used in the boot process. When this token is enabled, Option ROMs are launched only for those controllers that are present in the boot policy. This can be one of the following:

  • Disabled—OptionROM Launch Optimization is disabled.
  • Enabled—OptionROM Launch Optimization is enabled.
  • Platform Default—The BIOS uses the value for this attribute contained in the BIOS defaults for the server type and vendor.

Results

The first boot after applying the new settings takes about 1-2 minutes longer.

From the second boot onward, we save about 2 minutes on every boot of a B480 M5 with 3 TB RAM.

Multiple-NIC vMotion tuning with 2x 40 Gbps

For monster SAP HANA VMs (1-3 TB RAM), I tuned several advanced system settings (AdvSystemSettings).

In the end I was able to speed up vMotion about 4x and utilize two 40 Gbps flows – VIC 1340 with Port Expander.

It has been in production since 04/2018. My final tuned settings are:

AdvSystemSettings              Default   Tuned    Description
Migrate.VMotionStreamHelpers   0         8        Number of helpers to allocate for vMotion streams
Net.NetNetqTxPackKpps          300       600      Max TX queue load (in thousand packets per second) to allow packing on the corresponding RX queue
Net.NetNetqTxUnpackKpps        600       1200     Threshold (in thousand packets per second) for TX queue load to trigger unpacking of the corresponding RX queue
Net.MaxNetifTxQueueLen         2000      10000    Maximum length of the TX queue for the physical NICs (this is enough to speed up VM communication)
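
These values can be applied per host from the ESXi shell with esxcli system settings advanced; a minimal sketch based on the table above (the option paths mirror the AdvSystemSettings names):

# Apply the tuned values from the table above (run on each ESXi host)
esxcli system settings advanced set -o /Migrate/VMotionStreamHelpers -i 8
esxcli system settings advanced set -o /Net/NetNetqTxPackKpps -i 600
esxcli system settings advanced set -o /Net/NetNetqTxUnpackKpps -i 1200
esxcli system settings advanced set -o /Net/MaxNetifTxQueueLen -i 10000

# Cross-check a value after setting it
esxcli system settings advanced list -o /Migrate/VMotionStreamHelpers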

VMware vNIC placement order does not adhere to the Cisco UCS configuration – how do you fix it?

It is better to use Cisco UCS Consistent Device Naming (CDN) with ESXi 6.7, but in some cases it is necessary to fix the mapping manually according to VMware KB 2091560 – How VMware ESXi determines the order in which names are assigned to devices.

Here is an example of how to fix it:

Check current mapping

[~] esxcfg-nics -l
 Name    PCI           MAC Address       
 vmnic0  0000:67:00.0  00:25:b5:00:a0:0e 
 vmnic1  0000:67:00.1  00:25:b5:00:b2:2f 
 vmnic2  0000:62:00.0  00:25:b5:00:a0:2e 
 vmnic3  0000:62:00.1  00:25:b5:00:b2:4f 
 vmnic4  0000:62:00.2  00:25:b5:00:a0:3e 
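
Alternatively, esxcli network nic list shows the same vmnic-to-PCI mapping together with the driver, link state, and adapter description, which helps to confirm which physical adapter each alias points to:

[~] esxcli network nic list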

[~] localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias list
 Bus type  Bus address            Alias
 pci       s00000002:03.02        vmnic4
 pci       s00000002:03.01        vmnic3
 pci       s0000000b:03.00        vmnic0
 pci       s0000000b:03.01        vmnic1
 pci       p0000:00:11.5          vmhba0
 pci       s00000002:03.00        vmnic2
 logical   pci#s0000000b:03.00#0  vmnic0
 logical   pci#s0000000b:03.01#0  vmnic1
 logical   pci#s00000002:03.01#0  vmnic3
 logical   pci#s00000002:03.02#0  vmnic4
 logical   pci#p0000:00:11.5#0    vmhba0
 logical   pci#s00000002:03.00#0  vmnic2

Remapping table for physical devices

Bus type  Bus address            Alias
 pci       s0000000b:03.00        vmnic0 --> vmnic3
 pci       s0000000b:03.01        vmnic1 --> vmnic4
 pci       s00000002:03.00        vmnic2 --> vmnic0
 pci       s00000002:03.01        vmnic3 --> vmnic1
 pci       s00000002:03.02        vmnic4 --> vmnic2

Remapping commands for physical devices

localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias store --alias vmnic0 --bus-address s00000002:03.00 --bus-type pci
localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias store --alias vmnic1 --bus-address s00000002:03.01 --bus-type pci
localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias store --alias vmnic2 --bus-address s00000002:03.02 --bus-type pci
localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias store --alias vmnic3 --bus-address s0000000b:03.00 --bus-type pci
localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias store --alias vmnic4 --bus-address s0000000b:03.01 --bus-type pci

Remapping table for logical devices

[~] localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias list
 Bus type  Bus address            Alias
 logical   pci#s0000000b:03.00#0  vmnic0 --> vmnic3
 logical   pci#s0000000b:03.01#0  vmnic1 --> vmnic4
 logical   pci#s00000002:03.00#0  vmnic2 --> vmnic0
 logical   pci#s00000002:03.01#0  vmnic3 --> vmnic1
 logical   pci#s00000002:03.02#0  vmnic4 --> vmnic2

Remapping commands for logical devices

localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias store --alias vmnic0 --bus-address pci#s00000002:03.00#0 --bus-type logical
localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias store --alias vmnic1 --bus-address pci#s00000002:03.01#0 --bus-type logical
localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias store --alias vmnic2 --bus-address pci#s00000002:03.02#0 --bus-type logical
localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias store --alias vmnic3 --bus-address pci#s0000000b:03.00#0 --bus-type logical
localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias store --alias vmnic4 --bus-address pci#s0000000b:03.01#0 --bus-type logical
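
Before rebooting, re-run the alias list to confirm that all five physical and five logical aliases were stored as intended:

[~] localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias list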

Reboot

reboot

Cross-check – now we have the target order

[~] esxcfg-nics -l
 Name    PCI           MAC Address       
 vmnic0  0000:62:00.0  00:25:b5:00:a0:2e 
 vmnic1  0000:62:00.1  00:25:b5:00:b2:4f 
 vmnic2  0000:62:00.2  00:25:b5:00:a0:3e 
 vmnic3  0000:67:00.0  00:25:b5:00:a0:0e 
 vmnic4  0000:67:00.1  00:25:b5:00:b2:2f 

Cisco UCS supports Consistent Device Naming (CDN) in ESXi 6.7

Cisco introduced Consistent Device Naming in Cisco UCS Manager Release 2.2(4).

In the past I saw that the VMware vNIC placement order did not adhere to the Cisco UCS configuration, but this issue is not seen in the latest ESXi updates – ESXi 6.5 U2 and ESXi 6.7 U1.

How VMware ESXi determines the order in which names are assigned to devices (2091560)

When there is no mechanism for the operating system to label Ethernet interfaces in a consistent manner, it becomes difficult to manage network connections as the server configuration changes.

CDN allows Ethernet interfaces to be named in a consistent manner, which makes Ethernet interface names more uniform, easy to identify, and persistent when adapter or other configuration changes are made.

To configure CDN, set the following option in the BIOS policy:

 set consistent-device-name-control cdn-name

Specifies whether consistent device naming is enabled. This can be one of the following:

  • enabled—Consistent device naming is enabled for the BIOS policy. This enables Ethernet interfaces to be named consistently.
  • disabled—Consistent device naming is disabled for the BIOS policy.
  • platform-default—The BIOS uses the value for this attribute contained in the BIOS defaults for the server type and vendor.
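
A minimal Cisco UCS Manager CLI sketch for enabling it in an existing BIOS policy (the org and the policy name SP-BIOS are placeholders for illustration):

UCS-A# scope org /
UCS-A /org # scope bios-policy SP-BIOS
UCS-A /org/bios-policy # set consistent-device-name-control enabled
UCS-A /org/bios-policy* # commit-buffer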

Proactive HA works in VCSA 6.7 with the Cisco UCS Manager Plugin for VMware vSphere HTML Client (beta version 3.0(2))

Cisco has released the 3.0(2) beta version of the Cisco UCS Manager plugin for the VMware vSphere HTML client. This version works with vSphere 6.7. It is currently running and enabled on 9 different clusters (290 hosts) and works great so far.

Here are the new and changed features in Release 3.0(2):

  • Included defect fixes
  • Added a new fault (F1706) to the Cisco UCS provider failure conditions list
  • Added support for Proactive High Availability for more than 100 hosts in vCenter

It is great to combine this with the new Cisco UCS 4.1(1) release because of Intel Post Package Repair (PPR).

  • Intel Post Package Repair (PPR) uses additional spare capacity within the DDR4 DRAM to remap and replace faulty cell areas detected during system boot time. Remapping is permanent and persists through power-down and reboot.
  • Newer memories, such as double data rate version 4 (DDR4), include so-called post-package repair (PPR) capabilities. PPR enables a compatible memory controller to remap accesses from a faulty row of a memory module to a spare row of the memory module that is not faulty.
    • Hard-PPR permanently remaps accesses from a designated faulty row to a designated spare row. A Hard-PPR row remapping survives power cycles.
    • Soft-PPR temporarily remaps accesses from a faulty row to a designated spare row. A Soft-PPR row remapping survives a "warm" reboot, but does not survive a power cycle.
  • You can enable it in the BIOS policy under Memory RAS configuration – set Select PPR Type Configuration to Hard PPR.

  • To support the alert "F1706 – ADDDC Memory RAS Problem", ADDDC Sparing is necessary.
    ADDDC Sparing—System reliability is optimized by holding memory in reserve so that it can be used in case other DIMMs fail. This mode provides some memory redundancy, but does not provide as much redundancy as mirroring.
  • Cisco recommends upgrading to 4.0(4c) or later to expand memory fault coverage. Beginning with 4.0(4c), an additional RAS feature, Adaptive Double Device Data Correction (ADDDC Sparing), is available. It is enabled and configured as "Platform Default" for the Memory RAS configuration.
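
A hedged UCS Manager CLI sketch for selecting ADDDC Sparing in a BIOS policy; the memory-ras-config token name is my assumption based on the GUI option, so verify it against your UCSM release (SP-BIOS is a placeholder policy name):

UCS-A# scope org /
UCS-A /org # scope bios-policy SP-BIOS
UCS-A /org/bios-policy # set memory-ras-config adddc-sparing
UCS-A /org/bios-policy* # commit-buffer
(the memory-ras-config token name above is assumed; check "set ?" in your release)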