
Troubleshooting Guide for Hardware Health for ESX host Server Polled Through vCenter

Updated December 28, 2018

Overview

This guide helps users troubleshoot problems with Hardware Health information on ESXi host servers that are polled through a vCenter server (Management Server).

 

One or more VMware ESXi hosts show an incorrect Hardware Health status. For example, the Memory or CPU status shows Critical and does not update even after you have fixed the original issue.

clipboard_e760ba8878eeb09426970557f9fb302cb.png

Environment

  • Orion Platform
  • VMware ESXi

Troubleshooting Steps

Background on the vSphere API and how it is used by Orion core products:

  • A hardware sensor for a hard disk, fan, CPU, memory, or similar component can appear in the SolarWinds Web Console in a critical or warning state.

  • Even after the faulty part is replaced, the hardware health sensors may still display the component status as critical or warning in the SolarWinds Web Console. This is a common occurrence when ESX hosts are polled through vCenter.

  • SAM / NPM / VMAN polls this information via the VMware API, which can fail to update itself or can return cached vSphere data for the node. This can also result in false positive alerts.

  • It seems that vCenter sometimes fails to update, or that there is a significant delay before the hardware sensors are updated and the warning messages are cleared from the VMware API.

 

 

1. You can see whether your ESX host is polled through vCenter or polled directly by checking the Virtualization Settings page (Settings > Virtualization Settings).

 

vCenterSettings.png


2. To access the VMware API:

Open https://your_vCenter_server/mob in a browser, replacing 'your_vCenter_server' with the IP address or hostname of your vCenter server. This opens the Managed Object Browser page.

vCenterTroubleshooting1.png
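
If you check several vCenter servers, the URL from step 2 can be built programmatically. The following is a minimal sketch; the function name `mob_url` and the example hostname are hypothetical, and it only assembles the address described above without contacting the server.

```python
from urllib.parse import urlunsplit


def mob_url(vcenter: str) -> str:
    """Return the Managed Object Browser URL for a vCenter hostname or IP."""
    # Scheme and path come from the article: https://your_vCenter_server/mob
    return urlunsplit(("https", vcenter.strip(), "/mob", "", ""))


# Hypothetical hostname, used only for illustration.
print(mob_url("vcenter01.example.com"))
```

You still need to log in to the Managed Object Browser with vCenter credentials when you open the resulting address.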

 

3. Now find the ESX host object. The path is somewhat long, but you can reach it by selecting the following properties in order:

content -> rootFolder -> childEntity (choose datacenter) -> hostFolder -> childEntity (choose domain) -> host (choose ESX host)
 

You should see a page similar to this:

vCenterTroubleshooting2.png
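
To make the click path in step 3 easier to follow, here is a rough sketch that models the inventory as nested dictionaries and walks it in the same order. The data is a hypothetical, heavily trimmed stand-in for what the Managed Object Browser pages show; the identifiers `datacenter-2`, `domain-c7`, and `host-28` and the host name are invented for illustration.

```python
# Hypothetical, trimmed model of the hierarchy shown in the Managed Object Browser.
inventory = {
    "content": {
        "rootFolder": {
            "childEntity": {
                "datacenter-2": {
                    "hostFolder": {
                        "childEntity": {
                            "domain-c7": {
                                "host": {
                                    "host-28": {"name": "esx01.example.com"},
                                },
                            },
                        },
                    },
                },
            },
        },
    },
}

# The same sequence as step 3, with the choices made at each "choose" point.
STEPS = [
    "content", "rootFolder", "childEntity", "datacenter-2",
    "hostFolder", "childEntity", "domain-c7", "host", "host-28",
]


def walk(tree, steps):
    """Follow each step in turn and report the available choices on a miss."""
    node = tree
    for step in steps:
        if step not in node:
            raise KeyError(f"'{step}' not found; available: {sorted(node)}")
        node = node[step]
    return node


print(walk(inventory, STEPS)["name"])
```

The error message lists the entries available at the point where the walk failed, which mirrors what you would do by hand: look at the links on the current page and pick the correct datacenter, domain, or host.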

 

4. Now find the ESX host's Hardware Health information:

runtime -> healthSystemRuntime -> systemHealthInfo -> numericSensorInfo
 

You should see a page similar to this:

vCenterTroubleshooting3.png

 

Besides the numeric sensors, we monitor a number of other sensors in other locations.

Here are alternative paths that you can substitute for the numeric sensors path:

runtime -> healthSystemRuntime -> hardwareStatusInfo -> cpuStatusInfo

runtime -> healthSystemRuntime -> hardwareStatusInfo -> memoryStatusInfo

runtime -> healthSystemRuntime -> hardwareStatusInfo -> storageStatusInfo

hardware -> systemInfo -> otherIdentifyInfo

config -> storageDevice -> scsiLun
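
Instead of clicking through each property, the paths above can be turned into a direct link. The sketch below assumes that the Managed Object Browser accepts `moid` and `doPath` query parameters, which is commonly the case but should be verified on your vCenter version. The function name `mob_property_url`, the hostname, and the managed object ID `host-28` are hypothetical.

```python
from urllib.parse import urlencode


def mob_property_url(vcenter: str, moid: str, arrow_path: str) -> str:
    """Build a Managed Object Browser link for one property path of a host.

    arrow_path uses the notation from this article, for example
    "runtime -> healthSystemRuntime -> hardwareStatusInfo -> memoryStatusInfo".
    """
    # Convert "a -> b -> c" into the dotted form "a.b.c".
    do_path = ".".join(part.strip() for part in arrow_path.split("->"))
    # Assumption: the MOB understands the moid and doPath query parameters.
    query = urlencode({"moid": moid, "doPath": do_path})
    return f"https://{vcenter}/mob/?{query}"


print(mob_property_url(
    "vcenter01.example.com",  # hypothetical vCenter hostname
    "host-28",                # hypothetical ID of the ESX host object
    "runtime -> healthSystemRuntime -> hardwareStatusInfo -> memoryStatusInfo",
))
```

The host's ID is the value shown for the ESX host object once you have navigated to it in the Managed Object Browser.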

 

Let's look at some examples:

Example 1: ESX host showing "Memory Critical" issue

Note: There are more properties than numericSensorInfo. Check the one that is relevant to the Warning/Critical message you are seeing for your ESX host.
 

1. Navigate to the memory status by following these properties:

 runtime -> healthSystemRuntime -> hardwareStatusInfo -> memoryStatusInfo

The error as it appears in the Orion Web Console:

hmc.PNG

 

2. When we check the ESX host itself in the vSphere console, there appear to be no issues.

vmactual.PNG

 

3. But when we dig down into the VMware API, we can see that it is reporting a 'Red' status, which is why we display the Warning message in the SAM Web Console.

vmpagediag.PNG
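
When a status list is long, it can help to pick out only the entries that the API reports as anything other than green. The sketch below works on hypothetical sample data shaped like the entries under memoryStatusInfo, each with a name and a status key; the sample values and the helper name `non_green` are invented for illustration and are not output from a real host.

```python
def non_green(elements, status_field="status"):
    """Return (name, status key) pairs for entries whose status is not green."""
    flagged = []
    for element in elements:
        key = element.get(status_field, {}).get("key", "unknown")
        # Compare case-insensitively, since the key may be "Green" or "green".
        if key.lower() != "green":
            flagged.append((element.get("name", "<unnamed>"), key))
    return flagged


# Hypothetical sample entries, copied by hand from a memoryStatusInfo page.
memory_status_info = [
    {"name": "Memory Device 1", "status": {"key": "Green"}},
    {"name": "Memory Device 2", "status": {"key": "Red"}},
]

for name, key in non_green(memory_status_info):
    print(f"{name}: {key}")
```

For numericSensorInfo entries, the assumption here is that the state is held in a field named healthState, so you would call `non_green(sensors, status_field="healthState")`.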

Example 2: Power supply replaced, but the status still shows Red

 

1. When checking the host via vCenter, there appears to be no issue.

repl.PNG

 

2. SAM reports an issue with one of the power supply units.

psupactul.PNG


Again, once we check the VMware API, which is where SAM actually polls this information from, we can see a 'Red' warning message being flagged in the API.

powsuphtml.PNG

Resolution

 

Solution 1: 

  • Contact VMware directly to see if there is a way to clear the incorrect statuses from the API. VMware will be able to assist you from their side; unfortunately, we are unable to provide a workaround at this time for the incorrect states reported by the API.

 

 

Solution 2:

 

This KB was created with help from this THWACK post: https://thwack.solarwinds.com/thread/109143

If you do not find the above information useful or the problem is not resolved, log a ticket with SolarWinds Support.

 
