InternalAgentOffline events

Created by Justin Rouviere, last modified by MindTouch on Jun 23, 2016


Overview

This article explains how the Log & Event Manager (LEM) currently processes InternalAgentOffline events.

 

Environment

  • All currently supported versions of LEM.

Detail

The Log and Event Manager relies on one of two signals to determine that an agent is offline:

1)  The agent sends a disconnect message to the Manager.

2)  The Manager fires the event when the TCP Timeout is reached for the connection.

 

As a result there are two main categories of agent offline events:

1)  The agent fires the disconnect message.

a)  This happens when the SolarWinds Log and Event Manager Agent service is stopped.

b)  This happens when the SolarWinds Log and Event Manager Agent or javaw task is stopped in Task Manager.

c)  This happens when the node is rebooted normally via Windows.

2)  The agent does not get a chance to send the disconnect message.

a)  This happens if the node is shut down normally.

b)  This happens if the node loses network connectivity.

c)  This happens when the node suffers a critical fault and shuts down (hardware failure).
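The difference between the two categories can be illustrated with ordinary socket behavior. The following is a minimal sketch, not LEM's actual protocol or code: it uses a local socket pair as a stand-in for the Agent/Manager connection, and the names `observe`, `manager_side`, and `agent_side` are hypothetical. A peer that closes its end is noticed immediately, while a peer that simply goes silent is only noticed when a timeout expires.

```python
import socket

def observe(peer_action):
    """Return what the 'manager' end sees after the 'agent' end runs peer_action."""
    # A connected local socket pair stands in for the Agent/Manager connection.
    manager_side, agent_side = socket.socketpair()
    # Short timeout for the demo; the article describes a far longer TCP timeout.
    manager_side.settimeout(0.2)
    try:
        peer_action(agent_side)
        try:
            data = manager_side.recv(1024)
        except socket.timeout:
            # No data and no close notification: only the timeout reveals the loss.
            return "timeout"
        # An empty read means the peer closed its end of the connection.
        return "closed by peer" if data == b"" else "data"
    finally:
        manager_side.close()
        agent_side.close()

# Category 1 analogue: the agent end closes the connection itself.
graceful = observe(lambda s: s.close())
# Category 2 analogue: the agent end says nothing at all.
silent = observe(lambda s: None)

print(graceful)
print(silent)
```

In the silent case nothing distinguishes a dead peer from an idle one until the timeout is reached, which is why the second category of events is delayed.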

 

In the second case, where the TCP Timeout must be reached, the InternalAgentOffline event can be delayed by 17 minutes or more.  In the manager log you will see the following entries:

 

(Mon Dec 07 08:54:45 MST 2015) II:INFO [ConnectionStateHandler] {XML Communication Worker - 2:217} Connection [id: 0x5551c46f, /<Agent IP>:37895 => /<Manager IP>:37891] in state ACTIVE has been inactive for a long time. Closing.

 

The above entry is generated roughly two minutes after the connection with the agent is lost, even if the node was shut down normally.

 

Then about 15 minutes later these entries are generated in the manager log:

 

(Mon Dec 07 09:11:12 MST 2015) WW:WARNING [NioComNetworkChild] {NioReads-1:290} Reporting canceled key: ConnectionsKey:16 IOException while reading from key: ConnectionsKey:16 java.io.IOException: No route to host
(Mon Dec 07 09:11:12 MST 2015) WW:WARNING [Communications] {NioReads-1:290} {NioReads-1:290}Child disconnection signaled: ( id:10000087 Agent name: <Agent Hostname> ):  Disconnect reason: IOException while reading from key: ConnectionsKey:16 java.io.IOException: No route to host
(Mon Dec 07 09:11:12 MST 2015) WW:WARNING [Communications] {NioExitQueueHandler:56} {NioExitQueueHandler:56}Child disconnection complete: ( id:10000087 Agent name: <Agent Hostname> )
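To measure this gap in your own manager log, a small script can compare the timestamps of the two entries. The following is a minimal sketch under stated assumptions: the function names `parse_timestamp` and `offline_delay` are hypothetical, the log lines are assumed to start with the timestamp format shown above, both entries are assumed to be in the same time zone, and the sketch does not match entries to a specific agent, so it is only meaningful on an excerpt covering one agent.

```python
import re
from datetime import datetime

# Matches the leading timestamp, e.g. "(Mon Dec 07 08:54:45 MST 2015)".
# The zone abbreviation is matched but ignored, because strptime's %Z
# does not reliably parse names such as "MST".
TIMESTAMP = re.compile(r"^\((\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2}) \w+ (\d{4})\)")

def parse_timestamp(line):
    """Return the line's timestamp as a naive datetime, or None if absent."""
    match = TIMESTAMP.match(line)
    if match is None:
        return None
    return datetime.strptime(
        f"{match.group(1)} {match.group(2)}", "%a %b %d %H:%M:%S %Y"
    )

def offline_delay(lines):
    """Time from the 'inactive ... Closing.' entry to the next
    'Child disconnection complete' entry, or None if either is missing."""
    closed_at = None
    for line in lines:
        stamp = parse_timestamp(line)
        if stamp is None:
            continue
        if "has been inactive for a long time. Closing." in line:
            closed_at = stamp
        elif "Child disconnection complete" in line and closed_at is not None:
            return stamp - closed_at
    return None

# The two relevant entries from the excerpt above.
sample = [
    "(Mon Dec 07 08:54:45 MST 2015) II:INFO [ConnectionStateHandler] "
    "{XML Communication Worker - 2:217} Connection [id: 0x5551c46f, "
    "/<Agent IP>:37895 => /<Manager IP>:37891] in state ACTIVE has been "
    "inactive for a long time. Closing.",
    "(Mon Dec 07 09:11:12 MST 2015) WW:WARNING [Communications] "
    "{NioExitQueueHandler:56} {NioExitQueueHandler:56}Child disconnection "
    "complete: ( id:10000087 Agent name: <Agent Hostname> )",
]

delay = offline_delay(sample)
print(delay)  # 0:16:27
```

For the excerpt above, the gap between the two entries is 16 minutes 27 seconds.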

 

Currently, these are the only ways that the LEM can fire an InternalAgentOffline event.  There is no Agent/Manager heartbeat, so the LEM must either receive a disconnect message or reach the TCP Timeout before it can report the node offline.  As a result, in the scenarios listed under 2) above, an agent can continue to show as online for a long time before the event is eventually fired and any rule that sends a notification is triggered.

 

 

 

Last modified
20:03, 22 Jun 2016
