Fault Resilient Memory (FRM) for Cisco UCS

The annual incidence of uncorrectable memory errors is rising. Here is one possibility: how to address it with FRM.

ESXi supports reliable memory.

Some systems have reliable memory, which is a part of memory that is less likely to have hardware memory errors than other parts of the memory in the system. If the hardware exposes information about the different levels of reliability, ESXi might be able to achieve higher system reliability.

How to enable it in Cisco UCS

The configuration is in BIOS policy / Advanced / RAS Memory

8 GB could be enough for the ESXi hypervisor …

This forces the Hypervisor and some core kernel processes to be mirrored between DIMMs so ESXi itself can survive the complete and total failure of a memory DIMM.

# esxcli hardware memory get
    Physical Memory: 540800864256 Bytes
    Reliable Memory: 8589934592 Bytes
    NUMA Node Count: 2 
#  esxcli system settings kernel list | grep useReliableMem
 useReliableMem Bool TRUE TRUE TRUE System is aware of reliable memory. 
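
To put the numbers above in perspective, here is a minimal Python sketch that parses the `esxcli hardware memory get` output and converts the byte counts to GiB. The sample text is copied from the host above; the helper names `parse_memory` and `to_gib` are made up for this illustration and are not part of any VMware tooling.

```python
import re

# Sample output copied from the host above. On a live host you would capture
# the text of `esxcli hardware memory get` instead.
SAMPLE = """\
    Physical Memory: 540800864256 Bytes
    Reliable Memory: 8589934592 Bytes
    NUMA Node Count: 2
"""

def parse_memory(output):
    """Return a dict with the byte counts for physical and reliable memory."""
    values = {}
    for line in output.splitlines():
        match = re.match(r"\s*(Physical Memory|Reliable Memory):\s*(\d+)\s*Bytes", line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values

def to_gib(num_bytes):
    """Convert bytes to GiB (2**30 bytes)."""
    return num_bytes / 2**30

mem = parse_memory(SAMPLE)
reliable_gib = to_gib(mem["Reliable Memory"])
physical_gib = to_gib(mem["Physical Memory"])
share = mem["Reliable Memory"] / mem["Physical Memory"]
print(f"Reliable: {reliable_gib:.1f} GiB of {physical_gib:.1f} GiB ({share:.2%})")
```

On this host the reliable region is exactly 8 GiB, which is roughly 1.6% of the physical memory.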

Configuring Reliable Memory in Per-virtual machine basis (2146595)

I can also decide to configure Reliable Memory for individual VMs, not only the 8 GB for the hypervisor.

To turn on the feature per VM:

  1. Edit the .vmx file using a text editor
  2. Add the parameter:
    sched.mem.reliable = "True"
  3. Save and close the file
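
If you prefer to script the edit from the steps above, the following is a minimal sketch that works on the text of a .vmx file. The function name `enable_reliable_memory` and the sample .vmx content are hypothetical; only the `sched.mem.reliable = "True"` line comes from the steps above. The function is idempotent, so running it twice does not add the parameter twice.

```python
def enable_reliable_memory(vmx_text):
    """Return vmx_text with sched.mem.reliable set to "True" exactly once."""
    key = "sched.mem.reliable"
    wanted = f'{key} = "True"'
    result = []
    found = False
    for line in vmx_text.splitlines():
        # Compare only the part before "=" so existing values get replaced.
        if line.split("=", 1)[0].strip().lower() == key:
            if not found:
                result.append(wanted)
                found = True
            # Any further duplicate of the key is dropped.
        else:
            result.append(line)
    if not found:
        result.append(wanted)
    return "\n".join(result) + "\n"

# Hypothetical sample content, for illustration only.
SAMPLE_VMX = 'displayName = "demo-vm"\nmemSize = "4096"\n'
updated = enable_reliable_memory(SAMPLE_VMX)
print(updated)
```

To apply it to a real file, read the file contents, pass them through the function, and write the result back after keeping a backup copy of the original.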

Conclusion:

  • To enable Fault Resilient Memory (FRM), I had to disable ADDDC Sparing in BIOS policy / Advanced / RAS Memory / Memory RAS configuration
  • With ADDDC and Proactive HA I can avoid about 95% of failures. Personally, I prefer to use ADDDC
  • The best option would be to have both in a future firmware …

Interesting links:

Field Notice: FN – 70432 – Improved Memory RAS Features for UCS M5 Platforms – Software Upgrade Recommended

Memory Errors and Dell EMC PowerEdge YX4X Server Memory RAS Features

Upgrading NSX-T 3.1.2 Data Center

After you finish the prerequisites for upgrading, your next step is to update the upgrade coordinator to initiate the upgrade process.

Trending support issues in VMware NSX (2131154) – This article provides information regarding trending issues and important Knowledge Base articles regarding VMware NSX Data Center for vSphere 6.x and VMware NSX-T Data Center 3.x.

NSX-T 3.1.2 – Upgrade the Upgrade Coordinator – 1/4

The upgrade coordinator runs in the NSX Manager. It is a self-contained web application that orchestrates the upgrade process of hosts, NSX Edge cluster, NSX Controller cluster, and Management plane.

NSX-T 3.1.2 – Upgrade NSX Edge Cluster – 2/4

After the upgrade coordinator has been upgraded, based on your input, the upgrade coordinator updates the NSX Edge cluster, hosts, and the Management plane. Edge upgrade unit groups consist of NSX Edge nodes that are part of the same NSX Edge cluster. You can reorder Edge upgrade unit groups and enable or disable an Edge upgrade unit group from the upgrade sequence.

NSX-T 3.1.2 – Configuring and Upgrading Hosts – 3/4

You can upgrade your hosts using the upgrade coordinator.

NSX-T 3.1.2 – Upgrade Management Plane – 4/4

The upgrade sequence upgrades the Management Plane at the end. When the Management Plane upgrade is in progress, avoid any configuration changes from any of the nodes.

NSX-T Data Center Upgrade Checklist
Task | Instructions
Review the known upgrade problems and workarounds documented in the NSX-T Data Center release notes. | See the NSX-T Data Center Release Notes.
Follow the system configuration requirements and prepare your infrastructure. | See the system requirements section of the NSX-T Data Center Installation Guide.
Evaluate the operational impact of the upgrade. | See Operational Impact of the NSX-T Data Center Upgrade.
Upgrade your supported hypervisor. | See Upgrading Your Host OS.
If you have an earlier version of NSX Intelligence installed, upgrade the NSX Intelligence appliance first. | See the Installing and Upgrading VMware NSX Intelligence documentation at https://docs.vmware.com/en/VMware-NSX-Intelligence/index.html for more information.
Complete the Pre-Upgrade Tasks. | See Pre-Upgrade Tasks.
Verify that the NSX-T Data Center environment is in a healthy state. | See Verify the Current State of NSX-T Data Center.
Download the latest NSX-T Data Center upgrade bundle. | See Download the NSX-T Data Center Upgrade Bundle.
If you are using NSX Cloud for your public cloud workload VMs, upgrade NSX Cloud components. | See Upgrading NSX Cloud Components.
Upgrade your upgrade coordinator. | See Upgrade the Upgrade Coordinator.
Upgrade the NSX Edge cluster. | See Upgrade NSX Edge Cluster.
Upgrade the hosts. | See Configuring and Upgrading Hosts.
Upgrade the Management plane. | See Upgrade Management Plane.
Post-upgrade tasks. | See Verify the Upgrade.
Troubleshoot upgrade errors. | See Troubleshooting Upgrade Failures.

NSX-T 3.1.2 – Upgrade the Upgrade Coordinator – 1/4


Prerequisites

Verify that the upgrade bundle is available. See Download the NSX-T Data Center Upgrade Bundle.

Procedure

In the NSX Manager CLI, verify that the NSX-T Data Center services are running.
tanzu-nsx> get service install-upgrade
Wed May 19 2021 UTC 18:58:00.174
Service name: install-upgrade
Service state: running
Enabled on: 10.254.201.169
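
If you collect this output in a script, for example over SSH, the check can be automated. The sketch below only parses text in the format shown above. The helper names `parse_service_status` and `is_running` are made up for this example, and how you capture the CLI output is left open.

```python
# Sample output copied from the NSX Manager CLI session above.
SAMPLE = """\
Wed May 19 2021 UTC 18:58:00.174
Service name: install-upgrade
Service state: running
Enabled on: 10.254.201.169
"""

def parse_service_status(output):
    """Collect the "Key: value" lines of the output into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # the timestamp line has no ": " separator and is skipped
            fields[key.strip()] = value.strip()
    return fields

def is_running(output, service="install-upgrade"):
    """True if the output reports the given service in state "running"."""
    fields = parse_service_status(output)
    return fields.get("Service name") == service and fields.get("Service state") == "running"

print(is_running(SAMPLE))
```
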
System > Upgrade / Proceed to Upgrade / Upload / Upgrade
  • From your browser, log in to an NSX Manager.
  • Select System > Upgrade from the navigation panel.
  • Click Proceed to Upgrade.
  • Navigate to the upgrade bundle .mub file.
  • Click Upload.
  • Click Upgrade to upgrade the upgrade coordinator.
  • Read and accept the EULA terms.
  • Click Yes to accept the notification to upgrade the upgrade coordinator.
  • Click Run Pre-Checks to verify that all the NSX-T Data Center components are ready for upgrade.
  • (Optional) Check the Pre Check Status.

NSX-T 3.1.2 – Upgrade NSX Edge Cluster – 2/4


The NSX Edge nodes are upgraded in serial mode so that when the upgrading node is down, the other nodes in the NSX Edge cluster remain active to continuously forward traffic.

Enter the NSX Edge cluster upgrade plan details:
Option | Description
Serial | Upgrade all the Edge upgrade unit groups consecutively.
Parallel | Upgrade all the Edge upgrade unit groups simultaneously.
When an upgrade unit fails to upgrade | You can fix an error on the Edge node and continue the upgrade. You cannot deselect this setting.
After each group completes | Select to pause the upgrade process after each Edge upgrade unit group finishes upgrading.

Click Start to upgrade the NSX Edge cluster.
  • Monitor the upgrade process.
    • You can view the overall upgrade status and progress details of each Edge upgrade unit group. The upgrade duration depends on the number of Edge upgrade unit groups you have in your environment.
    • Or use the CLI command get upgrade progress-status on the Edge node.

IN_PROGRESS:

tanzu-nsx-edge> get upgrade progress-status
********************************************************************* 
 Node Upgrade has been started. Please do not make any changes, until
 the upgrade operation is complete. Run "get upgrade progress-status"
 to show the progress of last upgrade step.
********************************************************************* 
 
 Tue May 18 2021 UTC 21:51:51.235
 Upgrade info:
 From-version: 3.1.1.0.0.17483065
 To-version: 3.1.2.0.0.17883603
 Upgrade steps:
 11-preinstall-enter_maintenance_mode [2021-05-18 21:45:34 - 2021-05-18 21:46:45] SUCCESS
 download_os [2021-05-18 21:47:34 - 2021-05-18 21:49:01] SUCCESS
 install_os [2021-05-18 21:49:02 - ] IN_PROGRESS

SUCCESS / Upgrade is not in progress:

tanzu-nsx-edge> get upgrade progress-status
 
 Node Upgrade has been started. Please do not make any changes, until
 the upgrade operation is complete. Run "get upgrade progress-status"
 to show the progress of last upgrade step.
 
 Tue May 18 2021 UTC 21:55:55.278
 Upgrade info:
 From-version: 3.1.1.0.0.17483065
 To-version: 3.1.2.0.0.17883603

 Upgrade steps:
 11-preinstall-enter_maintenance_mode [2021-05-18 21:45:34 - 2021-05-18 21:46:45] SUCCESS
 download_os [2021-05-18 21:47:34 - 2021-05-18 21:49:01] SUCCESS
 install_os [2021-05-18 21:49:02 - 2021-05-18 21:51:58] SUCCESS
 switch_os [2021-05-18 21:52:01 - 2021-05-18 21:52:24] SUCCESS
 reboot [2021-05-18 21:52:28 - 2021-05-18 21:53:53] SUCCESS
 migrate_users [2021-05-18 21:55:26 - 2021-05-18 21:55:31] SUCCESS


 tanzu-nsx-edge> get upgrade progress-status
 Tue May 18 2021 UTC 21:59:41.501
 Upgrade is not in progress
  • Click Run Post Checks to verify whether the Edge upgrade unit groups were successfully upgraded.
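
When polling `get upgrade progress-status` from a script, the step lines shown above can be reduced to a single state. This is a minimal sketch under the assumption that the output keeps the format of the samples above. The function names and the returned labels are made up for this example, and anything that does not match the shown patterns is reported as UNKNOWN rather than guessed.

```python
import re

# One step line looks like: " install_os [2021-05-18 21:49:02 - ] IN_PROGRESS"
STEP_RE = re.compile(r"^\s*(\S+) \[(.*?) - (.*?)\] (\w+)\s*$")

# Step lines copied from the IN_PROGRESS sample above.
SAMPLE = """\
 Upgrade steps:
 11-preinstall-enter_maintenance_mode [2021-05-18 21:45:34 - 2021-05-18 21:46:45] SUCCESS
 download_os [2021-05-18 21:47:34 - 2021-05-18 21:49:01] SUCCESS
 install_os [2021-05-18 21:49:02 - ] IN_PROGRESS
"""

def parse_steps(output):
    """Return a list of (step_name, status) tuples in the order they appear."""
    steps = []
    for line in output.splitlines():
        match = STEP_RE.match(line)
        if match:
            steps.append((match.group(1), match.group(4)))
    return steps

def summarize(output):
    """Reduce the whole output to one label."""
    if "Upgrade is not in progress" in output:
        return "NOT_IN_PROGRESS"
    statuses = [status for _, status in parse_steps(output)]
    if "IN_PROGRESS" in statuses:
        return "IN_PROGRESS"
    if statuses and all(status == "SUCCESS" for status in statuses):
        return "SUCCESS"
    return "UNKNOWN"

print(summarize(SAMPLE), parse_steps(SAMPLE)[-1])
```

A polling loop would call `summarize` on each fresh output and stop once it no longer returns IN_PROGRESS.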

NSX-T 3.1.2 – Configuring and Upgrading Hosts – 3/4

Configure Hosts

You can customize the upgrade sequence of the hosts, disable certain hosts from the upgrade, or pause the upgrade at various stages of the upgrade process.

Option | Description
Serial | Upgrade all the host upgrade unit groups consecutively. This selection is useful to maintain the step-by-step upgrade of the host components.
Parallel | Upgrade all the host upgrade unit groups simultaneously. You can upgrade up to five hosts simultaneously.
When an upgrade unit fails to upgrade | Select to pause the upgrade process if any host upgrade fails.
After each group completes | Select to pause the upgrade process after each host upgrade unit group finishes upgrading.

Manage Host Upgrade Unit Groups

Hosts in an ESXi cluster appear in one host upgrade unit group in the upgrade coordinator. You can move these hosts from one host upgrade unit group to another host upgrade unit group.

Upgrade Hosts
  • Click Start to upgrade the hosts.
  • Click Refresh and monitor the upgrade process.
  • Click Run Post Checks to make sure that the upgraded hosts and NSX-T Data Center do not have any problems.
  • BEFORE the upgrade, check the version of the NSX-T Data Center packages:
esxcli software vib list | grep nsx
 nsx-adf              3.1.1.0.0-7.0.17483060 VMware  VMwareCertified
 nsx-cfgagent         3.1.1.0.0-7.0.17483060 VMware  VMwareCertified
 nsx-context-mux      3.1.1.0.0-7.0.17483060 VMware  VMwareCertified
 nsx-cpp-libs         3.1.1.0.0-7.0.17483060 VMware  VMwareCertified
 nsx-esx-datapath     3.1.1.0.0-7.0.17483060 VMware  VMwareCertified
 nsx-exporter         3.1.1.0.0-7.0.17483060 VMware  VMwareCertified
 nsx-host             3.1.1.0.0-7.0.17483060 VMware  VMwareCertified
 nsx-ids              3.1.1.0.0-7.0.17483060 VMware  VMwareCertified
 nsx-monitoring       3.1.1.0.0-7.0.17483060 VMware  VMwareCertified
 nsx-mpa              3.1.1.0.0-7.0.17483060 VMware  VMwareCertified
 nsx-nestdb           3.1.1.0.0-7.0.17483060 VMware  VMwareCertified
 nsx-netopa           3.1.1.0.0-7.0.17483060 VMware  VMwareCertified
 nsx-opsagent         3.1.1.0.0-7.0.17483060 VMware  VMwareCertified
 nsx-platform-client  3.1.1.0.0-7.0.17483060 VMware  VMwareCertified
 nsx-proto2-libs      3.1.1.0.0-7.0.17483060 VMware  VMwareCertified
 nsx-proxy            3.1.1.0.0-7.0.17483060 VMware  VMwareCertified
 nsx-python-gevent    1.1.0-15366959         VMware  VMwareCertified
 nsx-python-greenlet  0.4.14-16723199        VMware  VMwareCertified
 nsx-python-logging   3.1.1.0.0-7.0.17483060 VMware  VMwareCertified
 nsx-python-protobuf  2.6.1-16723197         VMware  VMwareCertified
 nsx-python-utils     3.1.1.0.0-7.0.17483060 VMware  VMwareCertified
 nsx-sfhc             3.1.1.0.0-7.0.17483060 VMware  VMwareCertified
 nsx-shared-libs      3.1.1.0.0-7.0.17483060 VMware  VMwareCertified
 nsx-vdpi             3.1.1.0.0-7.0.17483060 VMware  VMwareCertified
 nsxcli               3.1.1.0.0-7.0.17483060 VMware  VMwareCertified
  • AFTER the upgrade is successful, verify that the latest version of the NSX-T Data Center packages is installed:
esxcli software vib list | grep nsx
 nsx-adf              3.1.2.0.0-7.0.17883598 VMware  VMwareCertified
 nsx-cfgagent         3.1.2.0.0-7.0.17883598 VMware  VMwareCertified
 nsx-context-mux      3.1.2.0.0-7.0.17883598 VMware  VMwareCertified
 nsx-cpp-libs         3.1.2.0.0-7.0.17883598 VMware  VMwareCertified
 nsx-esx-datapath     3.1.2.0.0-7.0.17883598 VMware  VMwareCertified
 nsx-exporter         3.1.2.0.0-7.0.17883598 VMware  VMwareCertified
 nsx-host             3.1.2.0.0-7.0.17883598 VMware  VMwareCertified
 nsx-ids              3.1.2.0.0-7.0.17883598 VMware  VMwareCertified
 nsx-monitoring       3.1.2.0.0-7.0.17883598 VMware  VMwareCertified
 nsx-mpa              3.1.2.0.0-7.0.17883598 VMware  VMwareCertified
 nsx-nestdb           3.1.2.0.0-7.0.17883598 VMware  VMwareCertified
 nsx-netopa           3.1.2.0.0-7.0.17883598 VMware  VMwareCertified
 nsx-opsagent         3.1.2.0.0-7.0.17883598 VMware  VMwareCertified
 nsx-platform-client  3.1.2.0.0-7.0.17883598 VMware  VMwareCertified
 nsx-proto2-libs      3.1.2.0.0-7.0.17883598 VMware  VMwareCertified
 nsx-proxy            3.1.2.0.0-7.0.17883598 VMware  VMwareCertified
 nsx-python-gevent    1.1.0-15366959         VMware  VMwareCertified
 nsx-python-greenlet  0.4.14-16723199        VMware  VMwareCertified
 nsx-python-logging   3.1.2.0.0-7.0.17883598 VMware  VMwareCertified
 nsx-python-protobuf  2.6.1-16723197         VMware  VMwareCertified
 nsx-python-utils     3.1.2.0.0-7.0.17883598 VMware  VMwareCertified
 nsx-sfhc             3.1.2.0.0-7.0.17883598 VMware  VMwareCertified
 nsx-shared-libs      3.1.2.0.0-7.0.17883598 VMware  VMwareCertified
 nsx-vdpi             3.1.2.0.0-7.0.17883598 VMware  VMwareCertified
 nsxcli               3.1.2.0.0-7.0.17883598 VMware  VMwareCertified
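
Comparing the two listings by eye is error-prone, so here is a minimal sketch that compares them in Python. It assumes you saved the text of `esxcli software vib list | grep nsx` before and after the upgrade. The samples are a three-line excerpt of the listings above, and the helper names are made up for this illustration.

```python
# Excerpts of the BEFORE and AFTER listings above.
BEFORE = """\
 nsx-adf              3.1.1.0.0-7.0.17483060 VMware  VMwareCertified
 nsx-python-gevent    1.1.0-15366959         VMware  VMwareCertified
 nsxcli               3.1.1.0.0-7.0.17483060 VMware  VMwareCertified
"""
AFTER = """\
 nsx-adf              3.1.2.0.0-7.0.17883598 VMware  VMwareCertified
 nsx-python-gevent    1.1.0-15366959         VMware  VMwareCertified
 nsxcli               3.1.2.0.0-7.0.17883598 VMware  VMwareCertified
"""

def parse_vib_list(output):
    """Map VIB name -> version for every line whose name starts with "nsx"."""
    vibs = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].startswith("nsx"):
            vibs[parts[0]] = parts[1]
    return vibs

def unchanged_vibs(before, after):
    """Names whose version is identical in both listings."""
    return sorted(name for name in after if before.get(name) == after[name])

def still_on(after, old_prefix):
    """Names whose version still starts with the old release prefix."""
    return sorted(name for name, version in after.items() if version.startswith(old_prefix))

before, after = parse_vib_list(BEFORE), parse_vib_list(AFTER)
print("unchanged:", unchanged_vibs(before, after))
print("still on 3.1.1:", still_on(after, "3.1.1"))
```

In the listings above, the bundled Python libraries (`nsx-python-gevent`, `nsx-python-greenlet`, `nsx-python-protobuf`) keep the same version in both releases, so an unchanged entry is not by itself a sign of a failed upgrade. The second check, for versions that still start with the old release prefix, is the more useful one.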

NSX-T 3.1.2 – Upgrade Management Plane – 4/4

Back up the NSX Manager

More info in the NSX-T Data Center Administration Guide.

Click Start to upgrade the Management plane
Accept the upgrade notification
Monitor the upgrade progress from the NSX Manager CLI for the orchestrator node.
tanzu-nsx> get upgrade progress-status
 
 Node Upgrade has been started. Please do not make any changes, until
 the upgrade operation is complete. Run "get upgrade progress-status"
 to show the progress of last upgrade step.
 
 Tue May 18 2021 UTC 22:38:49.533
 Upgrade info:
 From-version: 3.1.1.0.0.17483186
 To-version: 3.1.2.0.0.17883600
 Upgrade steps:
 download_os [2021-05-18 22:35:57 - 2021-05-18 22:38:04] SUCCESS
 shutdown_manager [2021-05-18 22:38:15 - ] IN_PROGRESS
     Status: cloud-init status command returned output active.
 Some appliance components are not functioning properly.
 Component health: MANAGER:UNKNOWN, POLICY:UNKNOWN, SEARCH:UNKNOWN, NODE_MGMT:UP, UI:UP. 
 Error code: 101
tanzu-nsx> get upgrade progress-status
 
 Node Upgrade has been started. Please do not make any changes, until
 the upgrade operation is complete. Run "get upgrade progress-status"
 to show the progress of last upgrade step.
 
 Tue May 18 2021 UTC 23:09:58.930
 Upgrade info:
 From-version: 3.1.1.0.0.17483186
 To-version: 3.1.2.0.0.17883600
 Upgrade steps:
 download_os [2021-05-18 22:35:57 - 2021-05-18 22:38:04] SUCCESS
 shutdown_manager [2021-05-18 22:38:15 - 2021-05-18 22:56:22] SUCCESS
 install_os [2021-05-18 22:56:22 - 2021-05-18 22:58:34] SUCCESS
 migrate_manager_config [2021-05-18 22:58:34 - 2021-05-18 22:58:44] SUCCESS
 switch_os [2021-05-18 22:58:44 - 2021-05-18 22:59:06] SUCCESS
 reboot [2021-05-18 22:59:06 - 2021-05-18 23:00:04] SUCCESS
 run_migration_tool [2021-05-18 23:03:11 - 2021-05-18 23:03:16] SUCCESS
 start_manager [2021-05-18 23:03:16 - ] IN_PROGRESS
tanzu-nsx> get upgrade progress-status
 Tue May 18 2021 UTC 23:21:33.659
 Upgrade is not in progress
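
The timestamps in the step lines also show where the time went. The following sketch computes the duration of each finished step from output in the format above. The function name `step_durations` is made up for this example, and steps without an end timestamp are skipped.

```python
import re
from datetime import datetime

TS = r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"
# Matches only finished steps, i.e. lines with both a start and an end timestamp.
DONE_RE = re.compile(rf"^\s*(\S+) \[({TS}) - ({TS})\] SUCCESS\s*$")
FMT = "%Y-%m-%d %H:%M:%S"

# Step lines copied from the second Management Plane sample above.
SAMPLE = """\
 download_os [2021-05-18 22:35:57 - 2021-05-18 22:38:04] SUCCESS
 shutdown_manager [2021-05-18 22:38:15 - 2021-05-18 22:56:22] SUCCESS
 install_os [2021-05-18 22:56:22 - 2021-05-18 22:58:34] SUCCESS
 migrate_manager_config [2021-05-18 22:58:34 - 2021-05-18 22:58:44] SUCCESS
 switch_os [2021-05-18 22:58:44 - 2021-05-18 22:59:06] SUCCESS
 reboot [2021-05-18 22:59:06 - 2021-05-18 23:00:04] SUCCESS
 run_migration_tool [2021-05-18 23:03:11 - 2021-05-18 23:03:16] SUCCESS
 start_manager [2021-05-18 23:03:16 - ] IN_PROGRESS
"""

def step_durations(output):
    """Map step name -> duration in seconds for every finished step."""
    durations = {}
    for line in output.splitlines():
        match = DONE_RE.match(line)
        if match:
            start = datetime.strptime(match.group(2), FMT)
            end = datetime.strptime(match.group(3), FMT)
            durations[match.group(1)] = int((end - start).total_seconds())
    return durations

durations = step_durations(SAMPLE)
slowest = max(durations, key=durations.get)
print(slowest, durations[slowest], "seconds")
```

In this run, shutdown_manager was the longest finished step at about 18 minutes, which matches the long window in which the first status output reported it as IN_PROGRESS.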

Breaking: VMware Board Names Raghu Raghuram as CEO

VMware today announced that its Board of Directors has appointed Rangarajan (Raghu) Raghuram as Chief Executive Officer effective June 1, 2021. Raghuram is a strategic business leader who currently holds the position of Executive Vice President and COO, Products and Cloud Services at VMware.
