NTA issues after a successful HA failover

UPDATED April 24, 2017

Overview

The High Availability (HA) failover completes successfully, but the following issues occur with NTA:

  • The system becomes very unresponsive, and CPU usage for the Information Service and the IIS Worker Process is maxed out.
  • When you attempt to restart the Information Service, the triggered failover succeeds, but issues occur on the additional web server.
  • There are connection issues between the additional web server and NTA.
  • The primary platform does not indicate any connection issues.

Environment

  • NPM 12.0.1
  • NTA 4.2.1

Cause 

This is a known bug in NPM 12.0.1 and 12.1, with the following behavior:

  • The additional web server stops working after an HA failover that occurs when the main polling engine is unplugged.
  • After the main polling engine is unplugged and the backup server takes over, the additional web server is unable to connect to the license store in the HA environment.

 

For example,

Primary: Main polling engine 01 (MP01)

Secondary: Main polling engine 02 (MP02)

VIP: 10.10.10.99

Events in the log:

16/04/2017 03:17:53 SolarWinds.Orion.Licensing.BusinessLayer - Could not refresh license. Connectivity error: 'ProvideFault failed, check fault information.'.

16/04/2017 03:17:53 SolarWinds.Orion.Licensing.BusinessLayer - Machine refreshed license.

28/03/2017 18:27:21 SolarWinds.Orion.Licensing.BusinessLayer - Could not refresh license. Connectivity error: 'Could not connect to net.tcp://MainPoller01:17777/orion/licensing/licenseserver. The connection attempt lasted for a time span of 00:00:01.0312520. TCP error code 10061: No connection could be made because the target machine actively refused it 10.10.10.99:17777. '.

Note: 10.10.10.99 in the entry above is the VIP.
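To find entries like these in a larger log, a short script can help. The following is a minimal sketch, assuming the log text uses the same timestamp and message layout as the excerpt above. The function name `find_license_failures` and the returned field names are illustrative and not part of any SolarWinds tool.

```python
import re

# Entries in the excerpt start with a "dd/mm/yyyy hh:mm:ss " timestamp.
ENTRY_START = re.compile(r"(?=\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2} )")
# Host and port of the licensing endpoint, e.g. net.tcp://MainPoller01:17777/...
HOST_RE = re.compile(r"net\.tcp://([^:/\s]+):(\d+)")
# Address that refused the connection, e.g. "refused it 10.10.10.99:17777"
REFUSED_RE = re.compile(r"refused it (\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def find_license_failures(log_text):
    """Return one dict per 'Could not refresh license' entry in log_text."""
    failures = []
    for entry in ENTRY_START.split(log_text):
        if "Could not refresh license" not in entry:
            continue
        host = HOST_RE.search(entry)
        refused = REFUSED_RE.search(entry)
        failures.append({
            "timestamp": entry[:19],
            "host": host.group(1) if host else None,
            "port": int(host.group(2)) if host else None,
            "refused_ip": refused.group(1) if refused else None,
        })
    return failures


if __name__ == "__main__":
    sample = (
        "28/03/2017 18:27:21 SolarWinds.Orion.Licensing.BusinessLayer - "
        "Could not refresh license. Connectivity error: 'Could not connect to "
        "net.tcp://MainPoller01:17777/orion/licensing/licenseserver. "
        "TCP error code 10061: No connection could be made because the target "
        "machine actively refused it 10.10.10.99:17777. '."
    )
    for failure in find_license_failures(sample):
        print(failure)
```

If the refused address is the VIP while the endpoint host name is that of the primary polling engine, the entry matches the pattern described in this article.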

 

When the connection fails over from MP01 to MP02, the additional web server is unable to obtain a license.

Resolution

This issue has been fixed in Orion HAv2, which is due to be released after NPM 12.1 (see What We're Working on for NPM).

As a workaround, edit the hosts file on the additional web server so that the host names of both the primary and secondary polling engines resolve to the VIP:

  • Edit the Windows hosts file (typically C:\Windows\System32\drivers\etc\hosts) to include an entry for each machine's host name, with both entries pointing to the same VIP.
  • 10.10.10.99 - Main polling engine 01
  • 10.10.10.99 - Main polling engine 02

This resolves the issue while the secondary polling engine is still active, and you do not need to restart any services.
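After editing the hosts file, you may want to confirm from the additional web server that both host names now resolve to the VIP and that the licensing port answers. The following is a minimal sketch, not a SolarWinds utility. The host names `MP01` and `MP02` are placeholders for your actual polling engine host names, the VIP 10.10.10.99 is taken from the example in this article, and port 17777 is the port shown in the log excerpt.

```python
import socket


def resolves_to(hostname, expected_ip):
    """Return True if hostname resolves (hosts file or DNS) to expected_ip."""
    try:
        return socket.gethostbyname(hostname) == expected_ip
    except OSError:
        # Covers socket.gaierror when the name cannot be resolved
        return False


def port_open(ip, port, timeout=3.0):
    """Return True if a TCP connection to ip:port can be established."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    vip = "10.10.10.99"            # VIP from the example in this article
    hostnames = ["MP01", "MP02"]   # placeholders: use your real host names
    license_port = 17777           # port seen in the log excerpt

    for name in hostnames:
        print(name, "resolves to VIP:", resolves_to(name, vip))
    print("Licensing port reachable on VIP:", port_open(vip, license_port))
```

If either host name does not resolve to the VIP, recheck the hosts file entries on the additional web server.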

 

Last modified
23:07, 24 Apr 2017
