
InternalAgentOffline events

Created by Justin Rouviere, last modified by MindTouch on Jun 23, 2016


Overview

This article provides background on how the Log & Event Manager (LEM) currently processes InternalAgentOffline events.

 

Environment

  • All currently supported versions of LEM.

Detail

The Log & Event Manager relies on one of two signals to determine that an agent is offline:

1)  The agent sends a disconnect message to the Manager.

2)  The Manager fires the event when the TCP Timeout is reached for the connection.

 

As a result, there are two main categories of agent offline events:

1)  The agent fires the disconnect message.

a)  This happens when the SolarWinds Log and Event Manager Agent service is stopped.

b)  This happens when the SolarWinds Log and Event Manager Agent or javaw task is ended in Task Manager.

c)  This happens when the node is rebooted normally via Windows.

2)  The agent does not get a chance to fire the offline event.

a)  This happens if the node is shut down normally.

b)  This happens if the node loses network connectivity.

c)  This happens when the node suffers a critical fault and shuts down (hardware failure).
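
The two categories above can be summarized as a small lookup table. This is only an illustrative sketch: the names `Detection`, `CAUSES`, and `is_delayed` are hypothetical and are not part of LEM. It may be useful when annotating alert rules with whether a given outage cause is likely to be reported late.

```python
from enum import Enum


class Detection(Enum):
    """How the Manager learns that an agent went offline (per this article)."""
    DISCONNECT_MESSAGE = "agent sends a disconnect message"
    TCP_TIMEOUT = "Manager waits for the TCP timeout"


# Hypothetical mapping of the causes listed above to their detection path.
CAUSES = {
    "agent service stopped": Detection.DISCONNECT_MESSAGE,
    "agent or javaw task ended in Task Manager": Detection.DISCONNECT_MESSAGE,
    "node rebooted normally via Windows": Detection.DISCONNECT_MESSAGE,
    "node shut down normally": Detection.TCP_TIMEOUT,
    "node lost network connectivity": Detection.TCP_TIMEOUT,
    "node critical fault (hardware failure)": Detection.TCP_TIMEOUT,
}


def is_delayed(cause):
    """Return True when the offline event only fires after the TCP timeout."""
    return CAUSES[cause] is Detection.TCP_TIMEOUT
```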

 

In the second case, where the TCP Timeout must be reached, the InternalAgentOffline event can be delayed by 17 minutes or more.  The manager log first shows an entry like the following:

 

(Mon Dec 07 08:54:45 MST 2015) II:INFO [ConnectionStateHandler] {XML Communication Worker - 2:217} Connection [id: 0x5551c46f, /<Agent IP>:37895 => /<Manager IP>:37891] in state ACTIVE has been inactive for a long time. Closing.

 

The above entry is logged roughly two minutes after the connection to the agent is lost, even if the agent node is shut down normally.

 

Then, about 15 minutes later, these entries are generated in the manager log:

 

(Mon Dec 07 09:11:12 MST 2015) WW:WARNING [NioComNetworkChild] {NioReads-1:290} Reporting canceled key: ConnectionsKey:16 IOException while reading from key: ConnectionsKey:16 java.io.IOException: No route to host
(Mon Dec 07 09:11:12 MST 2015) WW:WARNING [Communications] {NioReads-1:290} {NioReads-1:290}Child disconnection signaled: ( id:10000087 Agent name: <Agent Hostname> ):  Disconnect reason: IOException while reading from key: ConnectionsKey:16 java.io.IOException: No route to host
(Mon Dec 07 09:11:12 MST 2015) WW:WARNING [Communications] {NioExitQueueHandler:56} {NioExitQueueHandler:56}Child disconnection complete: ( id:10000087 Agent name: <Agent Hostname> )
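
To measure this delay from a manager log excerpt, a small script along the following lines could be used. It is a minimal sketch, not a SolarWinds tool: the function names are hypothetical, it assumes the excerpt covers a single agent (the first entry carries a connection id rather than an agent id, so the two entries are not correlated here), and it adds the "roughly two minutes" mentioned above as a fixed estimate.

```python
import re
from datetime import datetime, timedelta

# Matches the leading "(Mon Dec 07 08:54:45 MST 2015)" stamp. The zone
# abbreviation is skipped because strptime's %Z does not reliably parse "MST".
STAMP = re.compile(r"^\((\w{3} \w{3} \d{2} \d{2}:\d{2}:\d{2}) \w+ (\d{4})\)")

# Assumed lag between losing the connection and the "inactive" entry,
# taken from the article text ("roughly two minutes").
INACTIVE_LAG = timedelta(minutes=2)


def parse_stamp(line):
    """Return the entry's timestamp as a naive datetime, or None."""
    match = STAMP.match(line)
    if match is None:
        return None
    # English day and month names are assumed (the default C locale).
    return datetime.strptime(" ".join(match.groups()), "%a %b %d %H:%M:%S %Y")


def offline_delay(lines):
    """Return (gap between the two entries, estimated total delay), or None."""
    closed_at = None
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is None:
            continue
        if closed_at is None and "has been inactive for a long time" in line:
            closed_at = stamp
        elif closed_at is not None and "Child disconnection complete" in line:
            gap = stamp - closed_at
            return gap, gap + INACTIVE_LAG
    return None


SAMPLE = [
    "(Mon Dec 07 08:54:45 MST 2015) II:INFO [ConnectionStateHandler] "
    "{XML Communication Worker - 2:217} Connection [id: 0x5551c46f, "
    "/<Agent IP>:37895 => /<Manager IP>:37891] in state ACTIVE has been "
    "inactive for a long time. Closing.",
    "(Mon Dec 07 09:11:12 MST 2015) WW:WARNING [Communications] "
    "{NioExitQueueHandler:56} {NioExitQueueHandler:56}Child disconnection "
    "complete: ( id:10000087 Agent name: <Agent Hostname> )",
]

gap, total = offline_delay(SAMPLE)
print(gap, total)
```

For the entries shown in this article, the gap between the two log entries is 16 minutes 27 seconds, which gives an estimated total of about 18 and a half minutes once the assumed two-minute lag is added.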

 

Currently, these are the only two ways that the LEM is able to fire an InternalAgentOffline event.  There is no Agent/Manager heartbeat.  The LEM requires either a disconnect message or the TCP Timeout to be reached before it is able to report the node offline.  As a result, in a scenario such as those under 2) above, an agent can continue to show as online for a long time before the event is eventually fired and any rule that sends a notification is triggered.

 

 

 

Last modified
20:03, 22 Jun 2016
