NPM 11.5.3 - Orion is flooding us with messages (1000s Hardware Health Alerts)

Created by Kevin Twomey, last modified by Kevin Twomey on Nov 10, 2016

Overview

In NPM 11.5.3, Orion floods users with messages: hundreds of Hardware Health alerts are triggered.

Environment

NPM 11.5 or higher 

 

Cause 

  • There is no fault in Orion, Alerts, or Hardware Health (HWH).
  • The devices are reporting sensors with a Down or Unknown status.
  • This is a user configuration issue.

 

Example:

  • Alert ID = 125
  • Alert Name: Hardware Component Status Warning
  • ObjectType = Hardware SENSOR
  • Affecting Multiple Nodes
  • The alert is currently disabled, but it last flooded the customer with repetitive email and SMS alert actions.
  • The customer will not enable it again in case the same flooding recurs.

 

Example (with Fan):

  • Type: VMware ESXi 5.5.0
  • NodeID: 280
  • Name: VMWARENODE
  • PollingType: 1 VMware (direct ESX polling, CIM polling)
     

HWH polling methods [APM_HardwareInfo.PollingMethod]

  • 0 Unknown
  • 1 VMware (direct ESX polling, CIM polling)
  • 2 SNMPDell
  • 3 SNMPHP
  • 4 SNMPIBM
  • 5 VMwareAPI (through vCenter VMware API)
  • 6 WmiDell
  • 7 WmiHP
  • 8 WmiIBM
  • 9 Snmp.NPM.Cisco
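
When reading raw APM_HardwareInfo rows, it can help to translate the numeric PollingMethod into a readable name. The following is a minimal sketch that only encodes the list above; the class name `PollingMethod` and the helper `describe_polling_method` are hypothetical and are not part of Orion.

```python
from enum import IntEnum


class PollingMethod(IntEnum):
    """Numeric APM_HardwareInfo.PollingMethod values, as listed in this article."""
    UNKNOWN = 0
    VMWARE = 1          # Direct ESX polling (CIM polling)
    SNMP_DELL = 2
    SNMP_HP = 3
    SNMP_IBM = 4
    VMWARE_API = 5      # Through vCenter VMware API
    WMI_DELL = 6
    WMI_HP = 7
    WMI_IBM = 8
    SNMP_NPM_CISCO = 9


def describe_polling_method(value: int) -> str:
    """Return the name for a raw PollingMethod value, or a marker for unlisted values."""
    try:
        return PollingMethod(value).name
    except ValueError:
        # Values outside the list above are not documented in this article.
        return f"UNLISTED({value})"


if __name__ == "__main__":
    # Node 279 in the investigation below is polled with method 1.
    print(describe_polling_method(1))
```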

     

Investigation shows:

Nodes Table

NodeID  Description        Status  EngineID
277     VMware ESXi 5.5.0  1       3
279     VMware ESXi 5.5.0  1       3
280     VMware ESXi 5.5.0  1       3
574     VMware ESXi 5.5.0  1       3

APM_HardwareInfo

NodeID  PollingMethod  Manufacturer  Model              ServiceTag
279     1              HP            ProLiant DL380 G6  XCVVSDS4353

APM_HardwareItem

ID    NodeID  UniqueName  DisplayName                                       Status  OriginalStatus
1932  279     4.0.32.0    Power Supply 2 Power Supply 2: Presence detected  0       Unknown
1934  279     3.0.32.0    Power Supply 1 Power Supply 1: Presence detected  0       Unknown
1936  279     10.1        Power Supply 1                                    1       Ok
1937  279     10.2        Power Supply 2                                    1       Ok
1931  279     44.0.32.99  System Board 10 Power Meter                       1       Ok
1933  279     4.0.32.1    Power Supply 2 Power Supply 2: Failure status     1       Ok
1935  279     3.0.32.1    Power Supply 1 Power Supply 1: Failure status     1       Ok
1955  279     21.0.32.99  Power Domain 2 Temp 9                             1       Ok
1956  279     20.0.32.99  Power Domain 1 Temp 8                             1       Ok
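
To see which sensors are likely to drive the alert flood, the APM_HardwareItem rows above can be screened for an OriginalStatus of Unknown. This is only an illustrative sketch over the sample rows from this article, hard-coded as Python data; the `HardwareItem` type and `unknown_sensors` function are hypothetical names, and no Orion API or database connection is involved.

```python
from typing import List, NamedTuple


class HardwareItem(NamedTuple):
    """One APM_HardwareItem row, using the columns shown in this article."""
    id: int
    node_id: int
    unique_name: str
    display_name: str
    status: int
    original_status: str


# Sample rows copied from the APM_HardwareItem data for node 279.
ITEMS = [
    HardwareItem(1932, 279, "4.0.32.0", "Power Supply 2 Power Supply 2: Presence detected", 0, "Unknown"),
    HardwareItem(1934, 279, "3.0.32.0", "Power Supply 1 Power Supply 1: Presence detected", 0, "Unknown"),
    HardwareItem(1936, 279, "10.1", "Power Supply 1", 1, "Ok"),
    HardwareItem(1937, 279, "10.2", "Power Supply 2", 1, "Ok"),
    HardwareItem(1931, 279, "44.0.32.99", "System Board 10 Power Meter", 1, "Ok"),
    HardwareItem(1933, 279, "4.0.32.1", "Power Supply 2 Power Supply 2: Failure status", 1, "Ok"),
    HardwareItem(1935, 279, "3.0.32.1", "Power Supply 1 Power Supply 1: Failure status", 1, "Ok"),
    HardwareItem(1955, 279, "21.0.32.99", "Power Domain 2 Temp 9", 1, "Ok"),
    HardwareItem(1956, 279, "20.0.32.99", "Power Domain 1 Temp 8", 1, "Ok"),
]


def unknown_sensors(items: List[HardwareItem]) -> List[HardwareItem]:
    """Return the sensors whose device-reported status is Unknown."""
    return [item for item in items if item.original_status.lower() == "unknown"]


if __name__ == "__main__":
    for sensor in unknown_sensors(ITEMS):
        print(sensor.id, sensor.unique_name, sensor.display_name)
```

In this sample, only the two "Presence detected" sensors report Unknown, which makes them candidates for exclusion from the alert or for disabling.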

Resolution

HWH and Alerts are working by design.

HWH reports the issues that the device sends to Orion, and the alerts are merely alerting on them.
Note: Not everything in the data above matches up, because the diagnostics were collected at a time when the HWH sensors had a different status.

 

  1. Copy the alert, then remove the Email, SMS, and Event actions from the copy.
  2. Create a test action that logs to a file, or edit the Email action so that it sends only to your own email address.
  3. Fine-tune the alert until it triggers actions only when you need it to.

For example, edit the trigger condition to add STATUS <> UNKNOWN, and set the alert so that it does not trigger unless the condition has existed for more than 10 minutes.
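
The effect of that tuning can be sketched in a few lines. This is not how Orion evaluates alerts internally; it is a hypothetical illustration of the two filters described above (skip Unknown sensors, and require the condition to persist for more than 10 minutes). The function name `should_trigger` and its parameters are assumptions made for this example.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumed hold time, matching the 10-minute example in this article.
SUSTAIN = timedelta(minutes=10)


def should_trigger(
    sensor_status: str,
    condition_since: Optional[datetime],
    now: datetime,
    sustain: timedelta = SUSTAIN,
) -> bool:
    """Decide whether a tuned alert would fire for one sensor.

    sensor_status   -- status reported for the sensor, e.g. "Warning" or "Unknown"
    condition_since -- when the alert condition first became true, or None if it is not true
    now             -- current evaluation time
    """
    # Filter 1: STATUS <> UNKNOWN, so Unknown sensors never fire the alert.
    if sensor_status.strip().lower() == "unknown":
        return False
    # The alert condition is not currently met at all.
    if condition_since is None:
        return False
    # Filter 2: the condition must have existed for more than the hold time.
    return now - condition_since > sustain


if __name__ == "__main__":
    now = datetime(2016, 11, 10, 12, 0)
    print(should_trigger("Warning", now - timedelta(minutes=11), now))
    print(should_trigger("Unknown", now - timedelta(minutes=60), now))
```

With these two filters, a sensor that flaps briefly or that always reports Unknown would not generate an email or SMS action.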



If a device sends bad results, or always reports UNKNOWN for some sensors, disable those sensors.

NPM 11.5 - Using Device Studio to disable standard Orion poller jobs:

  1. Enable or disable Hardware Health sensors.
  2. Open the Orion Web Console > Web Settings > Manage Pollers. Here you can disable Hardware Health for many nodes at once.
  3. Toggle the on/off button for HARDWARE as needed.
  4. Select Hardware Health (it shows that XX nodes are assigned; click this to drill down).
  5. Go to Settings > Node & Group Management > Manage Pollers > SolarWinds Native Poller > Hardware Health.
    http://localhost/Orion/Admin/Pollers/ManagePollers.aspx
  6. Drill down to Hardware Health using the group list on the left or by searching for it.

 

Last modified
11:46, 10 Nov 2016
